[TriLUG] Spamassassin question - Bayesian filtering

Jeremy Portzer jeremyp at pobox.com
Thu Mar 13 14:36:55 EST 2003


On Thu, 2003-03-13 at 14:26, Mike Broome wrote:

> 
>   http://marc.theaimsgroup.com/?t=104634355700008&r=1&w=2
> 
> The posts that might provide answers to your questions are these:
> 
>   http://marc.theaimsgroup.com/?l=mutt-users&m=104639189329937&w=2

Thanks for the links.  Talk about ugly though... white on black!  Ugh.
:-)

> This describes how to save the false positives to an mbox style mailbox
> and the commands to feed those mailboxes to SA to update the spam/ham
> databases.  The short answer is using the following commands:
> 
>   sa-learn-spam --mbox uncaught-spam-mbox
>   sa-learn-nonspam --mbox false-positive-mbox
> 
> There was a follow-up comment that SpamAssassin v2.50 combined those
> two programs into a single sa-learn program with "-spam" and "-ham"
> options that give the same effect as sa-learn-spam and sa-learn-nonspam,
> respectively.

Right, that's what I'm doing; with SA 2.50 the commands are "sa-learn
--spam" and "sa-learn --ham", respectively (note the double dashes).  But
the problem I'm trying to figure out is that "autolearning" will already
have put the uncaught spam into the "ham" side of the database, because
it didn't realize it was spam.  So if I run 'sa-learn --spam', will it
automatically *remove* that spam's data from the ham database, since I'm
now telling it that it's spam?  Or do I need to run it through with the
"--forget" option first?  If the same email is in both databases, it
will cancel itself out, which doesn't help the learning at all!
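To make the "cancels itself out" worry concrete, here is a rough sketch
using a simplified Graham-style per-token probability.  That formula is an
assumption for illustration only; SpamAssassin's real Bayes code combines
tokens differently.  token_prob is a hypothetical helper, and the message
counts are made up.

```shell
#!/bin/sh
# Hypothetical helper: simplified Graham-style spam probability for ONE token.
# This is NOT SpamAssassin's actual Bayes code; it only illustrates the idea.
# Arguments: spam_hits nspam ham_hits nham (nspam and nham must be > 0)
token_prob() {
    awk -v s="$1" -v ns="$2" -v h="$3" -v nh="$4" 'BEGIN {
        ps = s / ns    # fraction of learned spam containing the token
        ph = h / nh    # fraction of learned ham containing the token
        if (ps + ph == 0) { print "0.50"; exit }   # unseen token: treat as neutral
        printf "%.2f\n", ps / (ps + ph)
    }'
}

# Assume 100 spam and 100 ham were already learned, and a token that occurs
# only in the one uncaught spam message.
echo "autolearned as ham:      $(token_prob 0 100 1 101)"
echo "then --spam, no forget:  $(token_prob 1 101 1 101)"
echo "forget, then --spam:     $(token_prob 1 101 0 100)"
```

In this toy model the token only climbs from 0.00 to a neutral 0.50 if the
message stays counted as both ham and spam, but reaches 1.00 if the ham
learning is forgotten first.  Whether sa-learn 2.50 does that forgetting
on its own is exactly the open question.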

> This post is also interesting:
> 
>   http://marc.theaimsgroup.com/?l=mutt-users&m=104648274900735&w=2
> 
> It describes one user's finding that the weighting for the bayesian
> filtering didn't match his expectations and desires and gives the new
> weighting that he configured.

It must be a real PITA for the SpamAssassin developers to figure out
what weight to give each rule.  I'll probably leave the weights for
the BAYES_* rules the way they are for now, until I have more
confidence in how well it works...

--Jeremy

-- 
/=====================================================================\
| Jeremy Portzer       jeremyp at pobox.com       trilug.org/~jeremy     |
| GPG Fingerprint: 712D 77C7 AB2D 2130 989F  E135 6F9F F7BC CC1A 7B92 |
\=====================================================================/