[TriLUG] Spamassassin question - Bayesian filtering

Jeremy Portzer jeremyp at pobox.com
Thu Mar 13 17:37:55 EST 2003


On Thu, 2003-03-13 at 14:36, Jeremy Portzer wrote:
> 
> Right, that's what I'm doing; with SA 2.50 the commands are "sa-learn
> --spam" and "sa-learn --ham", respectively (note double-dashes).  But the
> problem that I'm trying to figure out is the fact that "autolearning"
> will already have put the uncaught spam into the "ham" side of the
> database, because it didn't realize it was spam.  So if I do 'sa-learn
> --spam' will it automatically *remove* that spam's data from the ham
> database, since I'm now telling it that it's spam?   Or do I need to run
> it through with the "--forget" option first?    If the same email is in
> both databases, then it will cancel itself out, which doesn't help the
> learning at all!

Here's something I just found that eases my concerns somewhat, from the
man page for configuration options:

<quote>
auto_learn_threshold_nonspam n.nn (default -2.0)
    The score threshold below which a mail has to score, to be fed into
    SpamAssassin's learning systems automatically as a non-spam message.

auto_learn_threshold_spam n.nn (default 15.0)
    The score threshold above which a mail has to score, to be fed into
    SpamAssassin's learning systems automatically as a spam message.
</quote>

These values are pretty conservative.  The spam emails that I was
worried about normally have a score of 2 or 3, nowhere near the -2.0
cutoff, so it looks like they're not being put into the Bayesian
database anyway.  So I don't need to worry about "forgetting" them from
the "ham" side of the DB.
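To sanity-check that reasoning, here is a minimal Python sketch of the
threshold comparison described in the man page excerpt.  It is a
simplified model (an assumption on my part): it only compares a
message's total score against the two defaults and ignores any other
conditions SpamAssassin may apply before auto-learning.  The name
autolearn_decision is hypothetical.

```python
# Defaults quoted from the SpamAssassin configuration man page above.
AUTO_LEARN_THRESHOLD_NONSPAM = -2.0
AUTO_LEARN_THRESHOLD_SPAM = 15.0


def autolearn_decision(score,
                       nonspam=AUTO_LEARN_THRESHOLD_NONSPAM,
                       spam=AUTO_LEARN_THRESHOLD_SPAM):
    """Return "ham", "spam", or None (not auto-learned) for a score.

    Simplified model: only the two thresholds are considered.
    """
    if score < nonspam:   # "below which" -> strictly less than
        return "ham"
    if score > spam:      # "above which" -> strictly greater than
        return "spam"
    return None           # in between: left out of the Bayesian DB


if __name__ == "__main__":
    for s in (-3.5, 2.0, 3.0, 7.3, 16.0):
        print(s, autolearn_decision(s))
```

Under this model the uncaught spam scoring 2 or 3 falls in the gap and
is never auto-learned as ham, and neither is caught spam scoring
between 5 and 15.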

This also shows that not all spam being caught (by default, a score of
5 or more is tagged as spam) is going into the Bayesian system.   So what I could
do is find everything that is spam, but with a score lower than 15, and
feed that to sa-learn manually.  Hmm.
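A rough sketch of that manual step, assuming the caught spam sits in an
mbox file and each message carries SpamAssassin's X-Spam-Status header
in the 2.5x style ("Yes, hits=7.3 required=5.0 ..."; newer releases
write "score=" instead, so both are accepted).  The function names and
file paths are hypothetical, and the script only selects messages; it
does not run sa-learn itself.

```python
import mailbox
import re

# Assumed header format: "Yes, hits=7.3 required=5.0 tests=..."
# ("hits=" in 2.5x, "score=" in later releases).
SCORE_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")
AUTO_LEARN_THRESHOLD_SPAM = 15.0  # default quoted from the man page


def parse_status(header):
    """Return (tagged_as_spam, score) from an X-Spam-Status value, or None."""
    if not header:
        return None
    match = SCORE_RE.search(header)
    if match is None:
        return None
    tagged = header.strip().lower().startswith("yes")
    return tagged, float(match.group(1))


def needs_manual_learning(header, spam_threshold=AUTO_LEARN_THRESHOLD_SPAM):
    """True for mail tagged as spam whose score was too low to auto-learn."""
    parsed = parse_status(header)
    if parsed is None:
        return False
    tagged, score = parsed
    # Not above the threshold, so autolearning skipped it.
    return tagged and score <= spam_threshold


def collect_for_sa_learn(in_path, out_path):
    """Copy qualifying messages from one mbox to another; return the count."""
    source = mailbox.mbox(in_path, create=False)
    target = mailbox.mbox(out_path)
    count = 0
    try:
        for msg in source:
            if needs_manual_learning(msg.get("X-Spam-Status")):
                target.add(msg)
                count += 1
    finally:
        target.close()  # close() also flushes pending writes to disk
        source.close()
    return count
```

For example, collect_for_sa_learn("caught-spam", "to-learn.mbox") would
write the candidates to to-learn.mbox, which could then be handed to
sa-learn --spam (check the sa-learn man page for how your version reads
mbox files, e.g. an --mbox option).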

--Jeremy

-- 
/=====================================================================\
| Jeremy Portzer       jeremyp at pobox.com       trilug.org/~jeremy     |
| GPG Fingerprint: 712D 77C7 AB2D 2130 989F  E135 6F9F F7BC CC1A 7B92 |
\=====================================================================/

