[TriLUG] Spamassassin question - Bayesian filtering

Brandon L. Newport bnewport at appws.com
Thu Mar 13 15:08:00 EST 2003


We typically use qmail in conjunction with SpamAssassin, but sometimes
clients want to manage spam filtering on their desktops, so I normally
suggest POPFile: http://popfile.sourceforge.net/

We can also do spam checking at the firewall level with OpenBSD and PF.
Here is some info about doing that:
http://www.benzedrine.cx/relaydb.html

-brandon


 

-----Original Message-----
From: trilug-admin at trilug.org [mailto:trilug-admin at trilug.org] On Behalf Of
Jeremy Portzer
Sent: Thursday, March 13, 2003 2:37 PM
To: TriLUG List
Subject: Re: [TriLUG] Spamassassin question - Bayesian filtering


On Thu, 2003-03-13 at 14:26, Mike Broome wrote:

> 
>   http://marc.theaimsgroup.com/?t=104634355700008&r=1&w=2
> 
> The posts that might provide answers to your questions are these:
> 
>   http://marc.theaimsgroup.com/?l=mutt-users&m=104639189329937&w=2

Thanks for the links.  Talk about ugly though... white on black!  Ugh.
:-)

> This describes how to save the false positives to an mbox style 
> mailbox and the commands to feed those mailboxes to SA to update the 
> spam/ham databases.  The short answer is using the following commands:
> 
>   sa-learn-spam --mbox uncaught-spam-mbox
>   sa-learn-nonspam --mbox false-positive-mbox
> 
> There was a follow-up comment that SpamAssassin v2.50 combined those 
> two programs into a single sa-learn program with "-spam" and "-ham" 
> options that give the same effect as sa-learn-spam and 
> sa-learn-nonspam, respectively.

Right, that's what I'm doing; with SA 2.50 the commands are "sa-learn --spam"
and "sa-learn --ham", respectively (note double-dashes).  But the problem
that I'm trying to figure out is that "autolearning" will already have put
the uncaught spam into the "ham" side of the database, because it didn't
realize it was spam.  So if I do 'sa-learn --spam', will it automatically
*remove* that spam's data from the ham database, since I'm now telling it
that it's spam?  Or do I need to run it through with the "--forget" option
first?  If the same email is in both databases, then it will cancel itself
out, which doesn't help the learning at all!
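
To be on the safe side, one approach is to forget first and then relearn.
Here is a minimal sketch, assuming sa-learn accepts "--forget" together with
"--mbox" the same way it does "--spam"; the function name relearn_as_spam and
the SA_LEARN variable are my own invention, not part of SpamAssassin.  Whether
the explicit forget step is actually needed in 2.50 is exactly the open
question above, so treat it as belt-and-suspenders:

```shell
#!/bin/sh
# Hypothetical helper, not part of SpamAssassin.
# SA_LEARN lets you point at a different sa-learn binary (assumed name).
SA_LEARN="${SA_LEARN:-sa-learn}"

# relearn_as_spam MBOX
# Forget whatever was previously learned for the messages in MBOX
# (for example, auto-learned as ham), then learn them as spam.
relearn_as_spam() {
    mbox="$1"
    # Step 1: drop any earlier learning for these messages.
    "$SA_LEARN" --forget --mbox "$mbox" || return 1
    # Step 2: learn the same messages as spam.
    "$SA_LEARN" --spam --mbox "$mbox"
}
```

Usage would be something like "relearn_as_spam uncaught-spam-mbox"; a mirror
image for false positives would swap "--spam" for "--ham".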

> This post is also interesting:
> 
>   http://marc.theaimsgroup.com/?l=mutt-users&m=104648274900735&w=2
> 
> It describes one user's finding that the weighting for the bayesian 
> filtering didn't match his expectations and desires and gives the new 
> weighting that he configured.

It must be a real PITA for the SpamAssassin developers to figure out what
weight to give to each rule.  I'll probably leave the weights for the
BAYES_* rules the way they are for now, until I get some more confidence in
how well it works...

--Jeremy

-- 
/=====================================================================\
| Jeremy Portzer       jeremyp at pobox.com       trilug.org/~jeremy     |
| GPG Fingerprint: 712D 77C7 AB2D 2130 989F  E135 6F9F F7BC CC1A 7B92 |
\=====================================================================/



