[TriLUG] Spamassassin question - Bayesian filtering

Thu Mar 13 14:26:40 EST 2003

On Thu, Mar 13, 2003 at 02:08:36PM -0500, Jeremy Portzer wrote:
> My question was more along the lines of, what's the proper way to 
> "submit it back via the sa-learn command" (Specifically for missed
> spams; I haven't seen any false positives yet.)   Do I use the --forget
> option because the message would have been counted as non-spam earlier? 
> Or do I just use the --spam option?  None of the documentation is
> specific on this.

Hi, Jeremy,

There was a recent discussion about spam filtering and SpamAssassin's
Bayesian filtering on the mutt-users list.  Note that I have no
personal experience with SpamAssassin yet.

The thread -- with subject "OT: spam", which started 2/27/03 and is
still getting some replies trickling in today -- can be found in a
web-based archive here:

  http://marc.theaimsgroup.com/?t=104634355700008&r=1&w=2

The posts that might provide answers to your questions are these:

  http://marc.theaimsgroup.com/?l=mutt-users&m=104639189329937&w=2

This describes how to save uncaught spam and false positives to
mbox-style mailboxes, and the commands to feed those mailboxes to SA to
update the spam/ham databases.  The short answer is to use the
following commands:

  sa-learn-spam --mbox uncaught-spam-mbox
  sa-learn-nonspam --mbox false-positive-mbox

There was a follow-up comment that SpamAssassin v2.50 combined those
two programs into a single sa-learn program with "--spam" and "--ham"
options that have the same effect as sa-learn-spam and sa-learn-nonspam,
respectively.
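
For what it's worth, here is a rough sketch of what the equivalent
might look like with the combined sa-learn program.  I have not run this
against a real install, so treat it as an assumption: the mailbox names
are just the placeholders from above, and the learn helper is something
I made up for illustration.  It only calls sa-learn if the program and
the mailbox both exist, and otherwise prints the command it would run.

```shell
# Hypothetical helper: $1 is "spam" or "ham", $2 is an mbox file.
learn() {
    if command -v sa-learn >/dev/null 2>&1 && [ -f "$2" ]; then
        # Assumes sa-learn accepts --spam/--ham together with --mbox.
        sa-learn "--$1" --mbox "$2"
    else
        # Dry run: show the command instead of running it.
        echo "would run: sa-learn --$1 --mbox $2"
    fi
}

# Placeholder mailbox names, as in the older commands above.
learn spam uncaught-spam-mbox
learn ham false-positive-mbox
```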

This post is also interesting:

  http://marc.theaimsgroup.com/?l=mutt-users&m=104648274900735&w=2

It describes one user's finding that the default weighting for the
Bayesian filtering didn't match his expectations, and gives the new
weighting that he configured instead.

Hope this helps,
Mike

-- 
Mike Broome
mbroome(at)employees.org