[TriLUG] Spamassassin question - Bayesian filtering
Jeremy Portzer
jeremyp at pobox.com
Thu Mar 13 14:08:36 EST 2003
Hi Jon,
The Bayesian filtering that I've been working with uses per-user
databases in ~/.spamassassin, not a site-wide database. So the site-wide
administrator need not worry about it (if the users are savvy). But you
make some good points about how to make training easier if there is a
site-wide database.
My question was more along the lines of: what's the proper way to
"submit it back via the sa-learn command"? (Specifically for missed
spams; I haven't seen any false positives yet.) Do I use the --forget
option first, because auto-learning would already have counted the
message as non-spam? Or do I just use the --spam option? None of the
documentation is specific on this.
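
To make the two alternatives concrete, here is a rough Python sketch of
what I mean. It only builds (and optionally runs) the sa-learn calls for
a list of saved missed spams. The function names and the dry_run switch
are made up for illustration, and passing the message file as a bare
path is an assumption; check the sa-learn man page for how your version
expects input. Whether the --forget step is needed is exactly the open
question, so it is just a switch here.

```python
import subprocess


def relearn_commands(message_path, forget_first=True):
    """Build the sa-learn calls for one missed spam (hypothetical helper).

    Assumption: sa-learn accepts the message file as a bare path argument.
    """
    commands = []
    if forget_first:
        # Conservative variant: drop whatever was auto-learned first.
        commands.append(["sa-learn", "--forget", str(message_path)])
    # Then learn the message as spam.
    commands.append(["sa-learn", "--spam", str(message_path)])
    return commands


def relearn_missed_spam(message_paths, forget_first=True, dry_run=True):
    """Return the commands for every message; run them unless dry_run is set."""
    planned = []
    for path in message_paths:
        for command in relearn_commands(path, forget_first):
            if not dry_run:
                subprocess.run(command, check=True)
            planned.append(command)
    return planned


if __name__ == "__main__":
    # Dry run: print what would be executed for two saved messages.
    for command in relearn_missed_spam(["missed1.txt", "missed2.txt"]):
        print(" ".join(command))
```

With dry_run left on, nothing touches the Bayes database, so it is safe
to try before deciding which variant is right.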
--Jeremy
On Thu, 2003-03-13 at 14:00, Jon Carnes wrote:
> The training looks like a pain in the a**, but I think you could make it
> easier on the folks by setting up some scripts to accept forwarded
> messages from your local users.
>
> Local users would forward mistakenly tagged messages to one of two
> addresses:
> sa_nospam - indicating that this shouldn't have been marked as spam
> sa_spam - indicating that this spam message slipped through
>
> It would be up to you, Jeremy, to have your script restore the message
> to its original form and then submit it back via the sa-learn command.
>
> Just an idea, but it may work (and be a good contrib back into the
> community).
>
> Jon
>
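
A minimal sketch of the "reshape the message" step, assuming users
forward the mistaken mail as an attachment (so the original arrives as a
message/rfc822 part). Inline forwards lose the original headers and
would not work with this. The function names are hypothetical; the two
addresses are the ones Jon suggested. Note that re-serializing the
message may not reproduce the original byte for byte.

```python
from email import message_from_bytes

# Map the local part of the forwarding address to an sa-learn option.
LEARN_FLAGS = {"sa_spam": "--spam", "sa_nospam": "--ham"}


def learn_flag_for(recipient):
    """Return the sa-learn option for a recipient address, or None."""
    local_part = recipient.split("@", 1)[0].strip().lower()
    return LEARN_FLAGS.get(local_part)


def extract_original(forwarded_bytes):
    """Return the attached original message as bytes, or None if absent."""
    outer = message_from_bytes(forwarded_bytes)
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the embedded message object.
            inner = part.get_payload()[0]
            return inner.as_bytes()
    return None
```

The bytes returned by extract_original could then be written to a
temporary file and handed to sa-learn with the option from
learn_flag_for.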
> On Thu, 2003-03-13 at 13:11, Jeremy Portzer wrote:
> > Good afternoon folks,
> >
> > I've been playing around with the new SpamAssassin, version 2.50, which
> > includes Bayesian filtering (see http://www.paulgraham.com/spam.html for
> > the paper about this, mentioned at ESR's talk, and see the man page for
> > the "sa-learn" command).
> >
> > As per the sa-learn man page, the default in SA 2.50 is to operate in
> > unsupervised auto-learning mode. This means that mail is added to the
> > "ham/spam" databases based on whether SpamAssassin's other rules mark
> > it as spam or not. The man page mentions that this "should be
> > supplemented with some supervised training in addition, if possible."
> >
> > How do I go about "supplementing" the auto-learning mode? One problem I
> > can see with auto-learning is that missed spams get learned as "ham"
> > (non-spam) and could mess up the database. So I'm collecting these
> > mistakes, but how do I properly adjust the database? Do I need to make
> > it "forget" the mistaken emails first, and then run them through
> > sa-learn with --spam? Or is running them through with --spam enough?
> >
> > Anyone know of resources/HOWTOs/examples with actual commands, instead
> > of generalized statements like "supplement with supervised training" ?
> >
> > ====
> >
> > If anyone else is interested in testing SpamAssassin, it is installed on
> > the TriLUG mail server now. Just put something like this in your
> > .procmailrc :
> >
> > :0fw
> > | /usr/bin/spamc
> >
> > Then your spam will be marked with the X-Spam-Status header, which you
> > can filter on if you like.
> >
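
For anyone filtering outside procmail, here is a small Python sketch of
checking that header. It assumes the X-Spam-Status value begins with
"Yes" for tagged mail and "No" otherwise; the function name is made up
for illustration.

```python
from email import message_from_string


def is_marked_spam(raw_message):
    """Return True if the X-Spam-Status header value starts with 'Yes'."""
    message = message_from_string(raw_message)
    status = message.get("X-Spam-Status", "")
    return status.strip().lower().startswith("yes")
```

A message with no X-Spam-Status header at all is treated as not spam,
which is the safer default if spamc was never run on it.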
> > Regards,
> > Jeremy
> >
> > --
> > /=====================================================================\
> > | Jeremy Portzer jeremyp at pobox.com trilug.org/~jeremy |
> > | GPG Fingerprint: 712D 77C7 AB2D 2130 989F E135 6F9F F7BC CC1A 7B92 |
> > \=====================================================================/
--
/=====================================================================\
| Jeremy Portzer jeremyp at pobox.com trilug.org/~jeremy |
| GPG Fingerprint: 712D 77C7 AB2D 2130 989F E135 6F9F F7BC CC1A 7B92 |
\=====================================================================/