[TriLUG] Spamassassin question - Bayesian filtering

Jeremy Portzer jeremyp at pobox.com
Thu Mar 13 14:08:36 EST 2003


Hi Jon,

The Bayesian filtering that I've been working with uses per-user
databases in ~/.spamassassin, not a site-wide database. So the site-wide
administrator need not worry about it (if the users are savvy).  But you
make some good points about how to make it easier if there is a
site-wide database.

My question was more along the lines of, what's the proper way to 
"submit it back via the sa-learn command" (specifically for missed
spams; I haven't seen any false positives yet.)   Do I use the --forget
option because the message would have been counted as non-spam earlier? 
Or do I just use the --spam option?  None of the documentation is
specific on this.

--Jeremy
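For the missed-spam case, a retraining run could be scripted along these lines. This is a minimal sketch that only assembles the `sa-learn` command line and, by default, just prints it. It assumes your `sa-learn` accepts `--spam`, `--ham` and `--forget` (the options discussed in this thread) plus an `--mbox` option for mbox-format folders. The helper names and the `missed-spam.mbox` path are hypothetical, so check `sa-learn --help` on your installed version first.

```python
import shlex
import subprocess
from typing import List

# Training modes mentioned in this thread.
VALID_MODES = ("--spam", "--ham", "--forget")


def sa_learn_cmd(mode: str, mbox_path: str) -> List[str]:
    """Build an sa-learn argument list for one mbox folder (hypothetical helper).

    Assumes this sa-learn version reads mbox-format files via --mbox;
    check `sa-learn --help` before relying on it.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return ["sa-learn", mode, "--mbox", mbox_path]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command line; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "missed-spam.mbox" is a hypothetical folder of collected missed spams.
    run(sa_learn_cmd("--spam", "missed-spam.mbox"))
```

Defaulting to a dry run lets you inspect the command before touching the database. If your version turns out not to handle a message that was previously auto-learned as ham, the same helper produces the two-step form: a `--forget` run over the folder, followed by the `--spam` run.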

On Thu, 2003-03-13 at 14:00, Jon Carnes wrote:
> The training looks like a pain in the a**, but I think you could make it
> easier on the folks by setting up some scripts to accept forwarded
> messages from your local users.
> 
> Local users would forward mistakenly tagged messages to one of two
> addresses:
>   sa_nospam - indicating that this shouldn't have been marked as spam
>   sa_spam - indicating that this spam message slipped through
> 
> It would be up to you, Jeremy, to have your script reshape the message to
> its original form and then submit it back via the sa-learn command.
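Jon's forwarding idea could be sketched in Python roughly as follows. It assumes users forward the misfiled message *as an attachment*, so the original arrives intact as a `message/rfc822` part, and that the mail system hands the script the local part of the address it was sent to. The `MODES` table and both function names are hypothetical. A message forwarded inline cannot be restored to its original form this way.

```python
import email
from email import policy
from typing import Tuple

# Hypothetical feedback addresses (from this thread) mapped to sa-learn flags.
MODES = {
    "sa_spam": "--spam",    # spam that slipped through
    "sa_nospam": "--ham",   # message wrongly marked as spam
}


def extract_original(raw: bytes) -> bytes:
    """Return the first message/rfc822 attachment of a forwarded message."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part carries exactly one embedded message.
            return part.get_payload(0).as_bytes()
    raise ValueError("no message/rfc822 attachment; was it forwarded inline?")


def handle_feedback(local_part: str, raw: bytes) -> Tuple[str, bytes]:
    """Map the feedback address to a flag and recover the original message."""
    try:
        flag = MODES[local_part]
    except KeyError:
        raise ValueError(f"unknown feedback address: {local_part}") from None
    return flag, extract_original(raw)
```

The recovered bytes could then be written to a file and passed to sa-learn with the returned flag.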
> 
> Just an idea, but it may work (and be a good contrib back into the
> community).
> 
> Jon 
> 
> On Thu, 2003-03-13 at 13:11, Jeremy Portzer wrote:
> > Good afternoon folks,
> > 
> > I've been playing around with the new SpamAssassin, version 2.50, which
> > includes Bayesian filtering (see http://www.paulgraham.com/spam.html for
> > the paper about this, mentioned at ESR's talk, and see the man page for
> > the "sa-learn" command).
> > 
> > As per the sa-learn man page, the default in SA 2.50 is to operate in
> > unsupervised auto-learning mode.  This means that mail is added to the
> > "ham/spam" databases based on whether SpamAssassin's other rules mark it
> > as spam or not.  The man page mentions that this "should be
> > supplemented with some supervised training in addition, if possible."
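The missed-spam problem is easier to see in a toy model of unsupervised auto-learning as described above. Each message is filed into the ham or spam counts purely on the verdict of the other rules, with no human check. This is not SpamAssassin's implementation. The 5.0 threshold, the whitespace tokenizer and all names are simplifying assumptions; the real auto-learner applies its own thresholds.

```python
from collections import Counter
from dataclasses import dataclass, field

SPAM_THRESHOLD = 5.0  # hypothetical cut-off for this toy model


@dataclass
class ToyBayesDB:
    """Toy stand-in for a Bayes database: per-class token counts."""
    ham: Counter = field(default_factory=Counter)
    spam: Counter = field(default_factory=Counter)

    def learn(self, text: str, as_spam: bool) -> None:
        tokens = text.lower().split()
        (self.spam if as_spam else self.ham).update(tokens)


def auto_learn(db: ToyBayesDB, text: str, rule_score: float) -> str:
    """File a message by the other rules' verdict, with no human check."""
    verdict_spam = rule_score >= SPAM_THRESHOLD
    db.learn(text, as_spam=verdict_spam)
    return "spam" if verdict_spam else "ham"


db = ToyBayesDB()
# A spam the rules missed (score below the threshold) is learned as ham.
print(auto_learn(db, "cheap pills online", rule_score=3.2))  # ham
print(db.ham["pills"], db.spam["pills"])  # 1 0
```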
> > 
> > How do I go about "supplementing" the auto-learning mode?  One problem I
> > can see with auto-learning is that missed spams become marked as "ham"
> > (non-spam) and could mess up the database.  So I'm collecting these
> > mistakes, but how do I properly adjust the database?  Do I need to make
> > it "forget" the mistaken emails first, and then run them through
> > sa-learn with --spam?   Or is running them through with --spam enough?
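Why "forget first" can matter shows up in a simple count model (again a toy, not SpamAssassin's code, with hypothetical names). If a message already counted as ham is just learned again as spam, its tokens end up counted on both sides; forgetting it first removes the stale ham counts. Whether sa-learn performs that forget step for you when a message is relearned with the opposite flag is a question for the man page of your version.

```python
from collections import Counter


class ToyBayesDB:
    """Toy token-count database with learn and forget (not SpamAssassin's)."""

    def __init__(self) -> None:
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.seen = {}  # message id -> (label, tokens) it was learned with

    def learn(self, msg_id: str, text: str, label: str) -> None:
        tokens = text.lower().split()
        self.counts[label].update(tokens)
        self.seen[msg_id] = (label, tokens)

    def forget(self, msg_id: str) -> None:
        entry = self.seen.pop(msg_id, None)
        if entry is not None:
            label, tokens = entry
            self.counts[label].subtract(tokens)
            self.counts[label] += Counter()  # drop zero/negative counts


text = "cheap pills online"

naive = ToyBayesDB()
naive.learn("msg1", text, "ham")    # auto-learned by mistake
naive.learn("msg1", text, "spam")   # corrected without forgetting
print(naive.counts["ham"]["pills"], naive.counts["spam"]["pills"])  # 1 1

fixed = ToyBayesDB()
fixed.learn("msg1", text, "ham")
fixed.forget("msg1")                # undo the mistaken ham training
fixed.learn("msg1", text, "spam")
print(fixed.counts["ham"]["pills"], fixed.counts["spam"]["pills"])  # 0 1
```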
> > 
> > Anyone know of resources/HOWTOs/examples with actual commands, instead
> > of generalized statements like "supplement with supervised training"?
> > 
> > ====
> > 
> > If anyone else is interested in testing SpamAssassin, it is installed on
> > the TriLUG mail server now.  Just put something like this in your
> > .procmailrc:
> > 
> > :0fw
> > | /usr/bin/spamc
> > 
> > Then your spam will be marked with the X-Spam-Status header, which you
> > can filter on if you like.
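Filtering on that header just means checking whether the `X-Spam-Status` value begins with `Yes`. Here is a minimal Python sketch for filtering outside procmail. It assumes the header value starts with `Yes` or `No` as SpamAssassin writes it; the function name is hypothetical.

```python
import email
from email import policy


def is_marked_spam(raw: bytes) -> bool:
    """True if SpamAssassin's X-Spam-Status header starts with "Yes"."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    status = msg.get("X-Spam-Status")
    if status is None:
        return False  # message was not run through SpamAssassin
    return str(status).strip().lower().startswith("yes")
```

Within procmail itself, the equivalent is a recipe whose condition line matches `^X-Spam-Status: Yes`.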
> > 
> > Regards,
> > Jeremy
> > 
> > -- 
> > /=====================================================================\
> > | Jeremy Portzer       jeremyp at pobox.com       trilug.org/~jeremy     |
> > | GPG Fingerprint: 712D 77C7 AB2D 2130 989F  E135 6F9F F7BC CC1A 7B92 |
> > \=====================================================================/
> 
> 
> _______________________________________________
> TriLUG mailing list
>     http://www.trilug.org/mailman/listinfo/trilug
> TriLUG Organizational FAQ:
>     http://www.trilug.org/~lovelace/faq/TriLUG-faq.html
> 