[TriLUG] Spamassassin question - Bayesian filtering
Jeremy Portzer
jeremyp at pobox.com
Thu Mar 13 17:37:55 EST 2003
On Thu, 2003-03-13 at 14:36, Jeremy Portzer wrote:
>
> Right, that's what I'm doing; with SA 2.50 the commands are "sa-learn
> --spam" and "sa-learn --ham", respectively (note the double dashes). But the
> problem that I'm trying to figure out is the fact that "autolearning"
> will already have put the uncaught spam into the "ham" side of the
> database, because it didn't realize it was spam. So if I do 'sa-learn
> --spam' will it automatically *remove* that spam's data from the ham
> database, since I'm now telling it that it's spam? Or do I need to run
> it through with the "--forget" option first? If the same email is in
> both databases, then it will cancel itself out, which doesn't help the
> learning at all!
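The "cancel itself out" worry can be illustrated with a deliberately simplified token-count model. This is a hypothetical toy (the class `ToyBayesDB` and its methods are made up for illustration), not SpamAssassin's Bayes code. SpamAssassin combines token probabilities differently and keeps its own bookkeeping, so this says nothing about what sa-learn 2.50 actually does with a message that was already learned in the other category.

```python
from collections import Counter


class ToyBayesDB:
    """Hypothetical toy token database; NOT SpamAssassin's Bayes code."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.nspam = 0         # spam messages learned
        self.nham = 0          # ham messages learned

    def learn(self, tokens, is_spam):
        # Count each distinct token once per message.
        target = self.spam if is_spam else self.ham
        target.update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def forget(self, tokens, is_spam):
        # Undo an earlier learn() of the same message in the same category.
        target = self.spam if is_spam else self.ham
        target.subtract(set(tokens))
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1

    def spam_prob(self, token):
        # Graham-style estimate: relative frequency in spam vs. ham.
        s = self.spam[token] / self.nspam if self.nspam else 0.0
        h = self.ham[token] / self.nham if self.nham else 0.0
        if s + h == 0:
            return 0.5  # unseen token: neutral
        return s / (s + h)


msg = ["cheap", "pills", "viagra"]

# Autolearned as ham by mistake, then learned as spam on top of that.
db = ToyBayesDB()
db.learn(msg, is_spam=False)
db.learn(msg, is_spam=True)
print(db.spam_prob("pills"))   # 0.5: the two entries cancel out

# Same mistake, but the ham entry is forgotten before learning as spam.
db2 = ToyBayesDB()
db2.learn(msg, is_spam=False)
db2.forget(msg, is_spam=False)
db2.learn(msg, is_spam=True)
print(db2.spam_prob("pills"))  # 1.0
```

In the toy, a message counted on both sides leaves its tokens exactly neutral, which is the effect the question above is worried about.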
Here's something I just found that eases my concerns somewhat, from the
man page for configuration options:
<quote>
auto_learn_threshold_nonspam n.nn (default -2.0)
The score threshold below which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a non-spam message.
auto_learn_threshold_spam n.nn (default 15.0)
The score threshold above which a mail has to score, to be fed into
SpamAssassin's learning systems automatically as a spam message.
</quote>
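A minimal sketch of the two thresholds quoted above, treating "below" and "above" as strict comparisons (an assumption). The function name `autolearn_decision` is hypothetical, and it models only the thresholds as the man page describes them, not any other conditions SpamAssassin may check before autolearning.

```python
# Defaults quoted from the man page above.
AUTO_LEARN_THRESHOLD_NONSPAM = -2.0
AUTO_LEARN_THRESHOLD_SPAM = 15.0


def autolearn_decision(score,
                       nonspam_threshold=AUTO_LEARN_THRESHOLD_NONSPAM,
                       spam_threshold=AUTO_LEARN_THRESHOLD_SPAM):
    """Return 'ham', 'spam', or None (not autolearned) for a message score."""
    if score < nonspam_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None  # everything in between is left alone


for score in (-3.1, 2.5, 7.0, 18.2):
    print(score, autolearn_decision(score))
# -3.1 ham
# 2.5 None
# 7.0 None
# 18.2 spam
```

Only scores below -2.0 or above 15.0 land in the database automatically, so a wide band in the middle is never autolearned either way.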
These values are pretty conservative. The spam emails that I was
worried about normally have a score of 2 or 3, nowhere near the -2.0
nonspam threshold, so it looks like they're not being put into the Bayesian database
anyway. So I don't need to worry about "forgetting" them from the "ham"
side of the DB.
This also shows that not all spam being caught (by default, a score of 5
or more gets tagged as spam) is going into the Bayesian system. So what I could
do is find everything that is spam, but with a score lower than 15, and
feed that to sa-learn manually. Hmm.
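One way to pick those messages out, sketched under assumptions: the mail sits in an mbox file and already carries SpamAssassin's X-Spam-Status header with the score written as hits=N.N (2.5x style) or score=N.N (later style). The helper names are hypothetical, and the 5.0 and 15.0 cut-offs are just the defaults mentioned above. The selected messages could then be handed to sa-learn --spam.

```python
import mailbox
import re

# Assumption: the header looks like "Yes, hits=7.3 required=5.0 tests=..."
# (SA 2.5x style) or "Yes, score=7.3 required=5.0 ..." (later versions).
SCORE_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")

SPAM_TAG_THRESHOLD = 5.0          # default score at which mail is tagged as spam
AUTO_LEARN_THRESHOLD_SPAM = 15.0  # default auto_learn_threshold_spam


def parse_score(status_header):
    """Extract the numeric score from an X-Spam-Status value, or None."""
    match = SCORE_RE.search(status_header or "")
    return float(match.group(1)) if match else None


def needs_manual_spam_learning(status_header):
    """True if the mail was tagged as spam but did not score above the
    spam autolearn threshold, so Bayes never saw it automatically."""
    score = parse_score(status_header)
    if score is None:
        return False
    return SPAM_TAG_THRESHOLD <= score <= AUTO_LEARN_THRESHOLD_SPAM


def candidates_for_sa_learn(mbox_path):
    """Yield messages from an mbox file worth feeding to 'sa-learn --spam'."""
    for msg in mailbox.mbox(mbox_path):
        if needs_manual_spam_learning(msg.get("X-Spam-Status")):
            yield msg
```

Filtering on the score rather than on the "Yes" flag keeps the 5.0 cut-off explicit; anyone with a different required score would adjust SPAM_TAG_THRESHOLD to match.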
--Jeremy
--
/=====================================================================\
| Jeremy Portzer jeremyp at pobox.com trilug.org/~jeremy |
| GPG Fingerprint: 712D 77C7 AB2D 2130 989F E135 6F9F F7BC CC1A 7B92 |
\=====================================================================/