[TriLUG] Re: spam solutions - spamassassin

Sat May 22 23:17:48 EDT 2004

On Sat, 2004-05-22 at 18:23, Gregory Woodbury wrote:
> On Sat, May 22, 2004 at 04:42:37PM -0400, Myrhillion wrote:
> > Hi Rick,
> > 
> > I've been struggling a bit trying to set up a spam solution.
> > I have tried ASSP, but find it wanting: too much training.
> > 
> > Can you give me your opinions on spamassassin?  I hear a lot of people
> > are using it, but was curious what kind of commitment is required to
> > get it set up to actually filter spam as opposed to spam + legit email.
> > 
> > I thought an opinion from someone running it might be helpful.
> > Thanks for your time.
> > 
> > Doug Taggart
> 
> I've been using spamassassin for several years now.  I have it trained
> fairly well to detect spam with 0 false positives and a low rate of
> false negatives.  In other words, it never classes legit email as spam
> and only rarely lets spam thru to my mail box.
> 
> The training effort is steep on the front end.  At first you have to
> feed a large selection of spam into the trainer, and then you just have
> to tweak the thing occasionally as new trends in spam arise.
> 
> I have a collection of selected spam messages that I use to seed newly
> installed user filters (or if I re-install the whole system). This is
> not as big a deal as it might sound.  I use "selected" messages that
> don't have many of the "cache poisoning" random word collections that
> have become one of the recent attempts by spammers to bypass SA and other
> Bayesian detectors.
> 
> If so inclined you can install the optional extended interface to
> Vipul's Razor collective database of spam but I find that a bit much for
> my lazy tastes.
> 
> If you are on a Linux platform (easy guess huh?) an anti-virus filter
> like ClamAV is worth installing.  I slipped ClamAV into my system in
> about 15 minutes just a week or so ago [Fedora Core 1/sendmail] and it
> does wonders at catching virus-laden emails before they can slip past
> the SA filters.
> 
> Installing SA can take two forms: a system-wide filtering with all mail
> being passed into SA by sendmail before local delivery; or a per-user
> installation where you have each user's procmail script process thru SA
> as part of local delivery.  I opted for the local procmail solution as
> there are a few usernames that get no spam and thus don't require the
> overhead. (Besides, I was too lazy to figure out the SA "milter" at the
> time of initial install and don't particularly relish being the "censor"
> for all the users - I wasn't sure how tuning/training worked when the
> system-wide method is used. I installed at home before I installed at
> any of my work locations.)
> 
> For per-user installations, the following is used at the top of user
> procmailrc files to pass stuff thru the spamassassin daemon:
> 
> 
> ---------------------------(cut here)
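> # Filter every message through spamc; "w" waits for it to finish
> :0fw
> |	/usr/bin/spamc -f
> 
> # File anything SpamAssassin flagged, locking the mbox while writing
> :0:
> *	^X-Spam-Flag: YES
> 	mbox.caughtspam
> 
> # File TriLUG list mail, also with a lock on the mbox
> :0:
> *	^TO_.*@trilug\.org
> 	mbox.trilug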
> ---------------------------(cut here)
> 
> As you can see, the spam filtering is done before sorting mailing
> lists into separate folders, though you could choose to sort mail
> into folders before passing it thru spamc.
> 
> I'm supporting about 10 users and ~20 accounts on 6 machines here at
> home.  At one job we were supporting >5000 accounts, on a network of
> matching scale, with 1 SA daemon on the incoming mail server. Perhaps
> others will chime in with how it scales for them.
> 
> 
> -- 
> G.Wolfe Woodbury     `- -'
>                        U
> The Line Eater is a boojum!

I'm running SA (via MailScanner) at a client with over 2000 nodes. It
currently filters out about 90% of the spam. I'm using the SpamHaus
lists to filter the mail as well.

To feed it Ham I create imaginary users and subscribe them to various
internal and external lists used by the organization. Any whitelisted
mail or mail scored lower than 2 that comes in for these imaginary users
is submitted as Ham.

To feed it Spam I create other bogus users (using very common names) and
hide the email addresses on web-sites and publicized guest lists.
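
The ham/spam feeding rule above can be sketched roughly like this. It is
only an illustration in Python, not what MailScanner or SpamAssassin
actually run: the trap addresses, the function name, and the way the
score and whitelist flag are passed in are all assumptions made up for
the example. Only the "scored lower than 2" cutoff comes from the
description above.

```python
# Hypothetical trap addresses; the real ones are site-specific.
HAM_TRAPS = {"listreader1@example.org", "listreader2@example.org"}
SPAM_TRAPS = {"john.smith@example.org", "mary.jones@example.org"}

# Mail to a ham trap is only trusted as ham below this score.
HAM_SCORE_LIMIT = 2.0


def training_label(recipient, score, whitelisted=False):
    """Return "ham", "spam", or None (do not train on this message)."""
    rcpt = recipient.strip().lower()
    if rcpt in SPAM_TRAPS:
        # Nobody legitimate writes to a bait address, so it is all spam.
        return "spam"
    if rcpt in HAM_TRAPS and (whitelisted or score < HAM_SCORE_LIMIT):
        # List traffic that is whitelisted or scored low is taken as ham.
        return "ham"
    # Real users, or ham-trap mail that scored too high: leave it alone.
    return None
```

Messages labelled this way would then be handed to whatever trainer is
in use; anything that returns None is simply skipped.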

This works fairly well at keeping the filtering up-to-date. Still, the
random crap does make it through occasionally.  I've been thinking about
running an additional SpamAssassin test which keeps track of a hash for
every email that passes through from an external source.  If the number
of emails arriving with the same hash passes a certain threshold within
an hour (or if a mail matches a hash that came in for one of my bogus
Spam users), then any emails with that hash are marked as suspected Spam.
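
A minimal sketch of that hash-counting idea, in Python rather than as a
real SpamAssassin plugin. Everything named here is an assumption for
illustration: the class name, the threshold of 50, the whitespace and
case normalization, and the choice of SHA-256 over the body. An exact
hash like this will not match messages padded with random words; a
fuzzy hash would be needed for those.

```python
import hashlib
import time
from collections import defaultdict, deque


class HashRateTracker:
    """Flag bodies whose hash arrives too often within a time window."""

    def __init__(self, threshold=50, window_seconds=3600):
        self.threshold = threshold          # hypothetical default
        self.window = window_seconds        # one hour, as described above
        self.seen = defaultdict(deque)      # digest -> arrival timestamps
        self.trap_hashes = set()            # digests seen at bait addresses

    @staticmethod
    def digest(body):
        # Collapse whitespace and case so trivial variations hash alike.
        normalized = " ".join(body.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def observe(self, body, to_trap=False, now=None):
        """Record one external message; return True if it looks suspect."""
        now = time.time() if now is None else now
        h = self.digest(body)
        if to_trap:
            self.trap_hashes.add(h)
        times = self.seen[h]
        times.append(now)
        # Forget arrivals that have aged out of the window.
        while times and times[0] <= now - self.window:
            times.popleft()
        return h in self.trap_hashes or len(times) >= self.threshold
```

Each distinct hash keeps up to an hour of timestamps in memory, so the
cost grows with the number of distinct messages per hour.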

I don't do that now due to the volume of mail flowing through this site.

SpamAssassin rocks. Used with a virus scanner, the SpamHaus RBLs, and
MailScanner, it really does a great job.

Good Luck - Jon Carnes