[TriLUG] Bayesian filtering

Turnpike Man turnpike420 at yahoo.com
Tue Sep 2 11:56:23 EDT 2003


HA! I used to work for this guy.... back in 97/98 at Productivity Point Intl!!

David M.

--- Jim Ray wrote:
> i thought some of you would appreciate and actually understand this article
> from Tech Republic:
> 
> Use DSPAM to reduce spam from a Linux mail server
> 
> http://techrepublic.com.com/5102-6261-5063820.html
> 
> August 27, 2003
> Scott Lowe MCSE
> 
> Spam: It’s what’s for dinner. And breakfast. And lunch. And every snack in
> between. Wherever you turn these days, spam is invading inboxes, quickly
> making the jump from an annoyance to a major business problem. Because spam
> is far more operating-system-agnostic than most e-mail viruses, anti-spam
> solutions are available for a wide range of products and platforms.
> 
> One solution for UNIX and Linux mail servers is DSPAM, which acts as the
> local delivery agent for the server and learns to recognize spam to ease the
> administrative burden of constantly keeping up with blacklists. DSPAM uses
> Bayesian statistical analysis to improve its detection rate and reduce the
> percentage of false positives (legitimate mail wrongly flagged as spam).
> 
> ----------------------------------------------------------------------------
> What's Bayesian analysis?
> "Bayesian," according to Merriam-Webster Online, is “being, relating to, or
> concerned with a theory (as of decision making or statistical inference)
> involving the application of Bayes' theorem and the use of probabilities
> based on prior knowledge and accumulated experience.” Simply put, DSPAM
> learns from the mail it has already classified, so its spam-detection rate
> improves as it sees more messages.
> ----------------------------------------------------------------------------
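To make the idea concrete, here is a minimal sketch of how a Bayesian filter can combine per-word spam probabilities into a single score. It uses the naive-Bayes combining formula popularized by Paul Graham, not necessarily the exact calculation DSPAM performs. The function name combine_probs and the 0.01/0.99 clamping are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSPAM): combine per-token spam
# probabilities p1..pn with the naive-Bayes formula
#   P = (p1*...*pn) / (p1*...*pn + (1-p1)*...*(1-pn))
# Each probability is clamped to [0.01, 0.99] so that no single token
# can force the result to exactly 0 or 1 (or cause a division by zero).
combine_probs() {
    printf '%s\n' "$@" | awk '
        BEGIN { spam = 1; ham = 1 }
        {
            p = $1 + 0
            if (p < 0.01) p = 0.01
            if (p > 0.99) p = 0.99
            spam *= p          # running product of p_i
            ham  *= 1 - p      # running product of (1 - p_i)
        }
        END { printf "%.4f\n", spam / (spam + ham) }'
}
```

For example, combine_probs 0.99 0.95 0.20 prints 0.9979. Two strongly "spammy" words outweigh one mildly innocent one.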
> 
> System requirements
> DSPAM requires two things: a mail transfer agent that lets you configure
> which local delivery agent it uses, and the Berkeley DB4 database. Berkeley
> DB4 is easy to install, and full instructions are provided in its
> accompanying README file. As of this writing, the current version of DSPAM
> is 2.6.3, and you can download it from the DSPAM Web site. Let's walk
> through the process of installing and configuring DSPAM.
> 
> ----------------------------------------------------------------------------
> My lab configuration
> For this article, I am using Red Hat 9 and my mail server is Sendmail.
> ----------------------------------------------------------------------------
> 
> Installing DSPAM
> First, download the latest version of DSPAM. For my
> example, the filename is dspam-2.6.tar.gz. From the directory where you have
> saved the download, execute the following command to expand the
> distribution:
> gunzip -dc dspam-2.6.tar.gz | tar xvf -
> 
> Now, change to the expanded directory with the command cd dspam-2.6. You can
> then configure DSPAM by running the standard configure script with the
> options shown in Table A.
> 
> 
> Table A: DSPAM configuration options
>
> --with-local-delivery-agent=[mail program]
>     Use the specified program as the local mail delivery agent.
>     Default: depends on your system.
>
> --with-userdir=[user directory]
>     Directory where user dictionaries, signatures, etc. are stored.
>     Default: /etc/mail/dspam
>
> --with-signature-life=[# of days]
>     Number of days to keep signatures (the signature life).
>     Default: 14 days
>
> --with-db4-includes=[location of DB4 includes]
>     Where to find the Berkeley DB 4.1.x headers.
>     Default: depends on your DB4 installation.
> 
> 
> Since I did a typical install using Sendmail, I could use the following
> command to begin the installation process:
> ./configure --with-db4-includes=/usr/local/BerkeleyDB.4.1/include/
> 
> I included the path to the DB4 includes to make sure that the configuration
> script could find them. Unfortunately, on my Red Hat Linux 9 system, the
> configuration failed with an error relating to the Berkeley DB 4 libraries,
> even though I provided the location to find them. After finding the source
> of the error and visiting the helpful user discussion forums at the DSPAM
> Web site, I issued the following command before executing the configure
> script again:
> export LDFLAGS='-Wl,--rpath -Wl,/usr/local/BerkeleyDB.4.1/lib -Wl,--library-path -Wl,/usr/local/BerkeleyDB.4.1/lib'
> 
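> The LDFLAGS variable holds extra options for the linker. The configure
> script picks it up for its test links and for the Makefiles it generates.
> Here, --library-path tells the linker where to find the DB4 libraries at
> build time, and --rpath records that directory so the dspam binary can find
> them at run time. Run configure and make from the same shell session in
> which you exported the variable.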
> 
> Once configure finishes without errors, compile DSPAM by running make. Then
> run make install to copy the compiled binaries to their final location.
> This step must be performed as the root user. After it completes
> successfully, DSPAM is ready to be used by your mail program.
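The build steps above can be collected into a short script. This is a sketch built on three assumptions:

- the dspam-2.6.tar.gz tarball sits in the current directory;
- Berkeley DB 4.1 was installed under /usr/local/BerkeleyDB.4.1, as in the author's lab;
- the function names db4_ldflags and build_dspam are hypothetical.

```shell
#!/bin/sh
# Assumed DB4 install prefix (matches the lab setup described above).
DB4_PREFIX=${DB4_PREFIX:-/usr/local/BerkeleyDB.4.1}

# Print the article's linker flags for a given DB4 prefix:
# --library-path is used at link time, --rpath is recorded for run time.
db4_ldflags() {
    printf '%s\n' "-Wl,--rpath -Wl,$1/lib -Wl,--library-path -Wl,$1/lib"
}

# Unpack, configure, and compile DSPAM. Each step runs only if the
# previous one succeeded (for the pipeline, that means tar succeeded).
build_dspam() {
    gunzip -c dspam-2.6.tar.gz | tar xvf - &&
    cd dspam-2.6 &&
    LDFLAGS=$(db4_ldflags "$DB4_PREFIX") &&
    export LDFLAGS &&
    ./configure --with-db4-includes="$DB4_PREFIX/include/" &&
    make
}
```

Run build_dspam as an ordinary user. Then run make install from the dspam-2.6 directory as root.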
> 
> Changes to the Sendmail configuration
> Once DSPAM is installed, you need to modify your Sendmail configuration to
> use DSPAM as the local delivery agent. Doing this will force mail through
> the DSPAM engine so that it can do its job.
> 
> Changing the local delivery agent to the DSPAM executable is accomplished by
> modifying the Sendmail configuration file, sendmail.cf. Be sure to make a
> copy of sendmail.cf before changing it.
> 
> To make DSPAM active, find the line near the end of sendmail.cf that begins
> with Mlocal. If you are not using procmail, the first option after Mlocal will
> read something like P=/bin/mail. In this case, replace the contents of the
> Mlocal line with the following:
> Mlocal, P=/usr/local/bin/dspam, F=lsDFMAw5:/|@qfSmn9, S=EnvFromL/HdrFromL, R=EnvToL/HdrToL,
>        T=DNS/RFC822/X-Unix,
>        A=dspam -d $u
> 
> If you are using procmail, which is identifiable by looking at the original
> Mlocal line, you need to use a slightly different configuration. With
> procmail, the first configuration option on the Mlocal line will read
> P=/usr/bin/procmail, and you will replace the contents of the Mlocal line
> with the following:
> Mlocal, P=/usr/local/bin/dspam, F=lsDFMAw5:/|@qSPfhn9, S=EnvFromL/HdrFromL, R=EnvToL/HdrToL,
>        T=DNS/RFC822/X-Unix,
>        A=dspam -t -Y -a $h -d $u
> 
> If you installed DSPAM to a different location, provide that location in
> place of /usr/local/bin/dspam.
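The right replacement depends on what the existing Mlocal line runs. A small helper can check this for you and also make the recommended backup. This sketch rests on three assumptions:

- sendmail.cf lives at /etc/mail/sendmail.cf (adjust for your system);
- as described above, P= is the first option on the Mlocal line;
- mlocal_variant and backup_cf are hypothetical names.

```shell
#!/bin/sh
# Assumed location of the Sendmail configuration file.
CF=${CF:-/etc/mail/sendmail.cf}

# Print which Mlocal replacement applies: "procmail", "mail", or "dspam"
# (the last meaning the change has already been made).
mlocal_variant() {
    prog=$(sed -n 's/^Mlocal,[[:space:]]*P=\([^,]*\).*/\1/p' "${1:-$CF}")
    [ -n "$prog" ] || { echo "no Mlocal line found in ${1:-$CF}" >&2; return 1; }
    case $prog in
        *dspam)    echo dspam ;;
        *procmail) echo procmail ;;
        *)         echo mail ;;
    esac
}

# Keep a copy of the original file (with its permissions) before editing.
backup_cf() {
    cp -p "${1:-$CF}" "${1:-$CF}.orig"
}
```

As root, running backup_cf && mlocal_variant saves sendmail.cf.orig and tells you which of the two Mlocal lines to use.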
> 
> Adding mail aliases
> DSPAM learns what each user considers spam when that user forwards spam to a
> dedicated account set up just for this purpose. For each user who will use
> DSPAM, you need to add a spam alias to the aliases file, which is typically
> located in either /etc or /etc/mail. On my Red Hat 9 system, it is in /etc.
> 
> Use a text editor to edit this file and add an entry similar to the
> following for each user:
> spam-slowe: "|/usr/local/bin/dspam -d slowe --addspam"
> 
> The first part, spam-slowe, is simply an existing user ID with spam- as the
> prefix. The second part, |/usr/local/bin/dspam, pipes mail received at this
> alias through the named executable (in this case, DSPAM). The -d slowe
> portion indicates that the dictionary to use is slowe; a separate dictionary
> is created for each user. Finally, --addspam tells DSPAM to learn the
> forwarded message as spam, which helps it recognize similar spam in future.
> 
> After you have added an alias to the aliases file, run the command
> newaliases to rebuild the aliases dictionary, aliases.db.
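With many users, typing each alias by hand gets tedious. The sketch below generates alias lines in the format shown above. It assumes DSPAM is installed at /usr/local/bin/dspam and that the aliases file is /etc/aliases, as on the author's Red Hat 9 system. The spam_alias function and the user names in the usage line are hypothetical.

```shell
#!/bin/sh
# Print one aliases-file entry that pipes mail for spam-USER into
# "dspam -d USER --addspam". The optional second argument overrides
# the DSPAM path (assumed default: /usr/local/bin/dspam).
spam_alias() {
    printf 'spam-%s: "|%s -d %s --addspam"\n' "$1" "${2:-/usr/local/bin/dspam}" "$1"
}
```

As root, you could add entries for several users and rebuild the database in one step:

for u in slowe jdoe; do spam_alias "$u"; done >> /etc/aliases && newaliases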
> 
> DSPAM with smrsh
> If you are using a Sendmail system that uses smrsh (Sendmail restricted
> shell), you also need to add DSPAM's executable as a program that is allowed
> to be used by Sendmail. This is as easy as placing a link to the DSPAM
> executable in the smrsh configuration directory, which is typically
> /etc/smrsh. The following two commands accomplish this goal:
> cd /etc/smrsh
> ln -s /usr/local/bin/dspam dspam
> 
> If you use smrsh and fail to do this, you will be unable to forward spam to
> the spam identification accounts, and DSPAM will be unable to learn.
> 
> Using DSPAM
> At this point, you should have a working DSPAM/Sendmail system with
> appropriate aliases for your users. Now, if your users receive spam, they
> should forward it to the "spam-username" alias you set up for them. As DSPAM
> learns what kind of mail the user considers spam, it will eventually begin
> simply blocking the spam items. In general, DSPAM can begin blocking after
> fewer than 50 e-mails have been forwarded to the spam alias, but it takes
> 200 to 300 before it is truly useful.
> 
> As a test, I sent a few e-mails to the root user's spam account on my lab
> system to see what kind of statistics DSPAM compiled. You can view these
> statistics by running /usr/local/bin/dspam_stats. For the root user, I got
> the following output:
> root 0 TS 7 TI 1 TM 0 FP
> 
> Each count precedes its label, so this indicates that seven innocent
> messages (TI) and one spam miss (TM) have been recorded, while no spam
> messages have been caught (TS) and there have been no false positives (FP).
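If you want a quick summary from that output, the counts can be pulled apart with awk. This sketch assumes the one-line-per-user format shown above: a user name followed by count/label pairs. The detection_rate function is hypothetical. It defines the rate as caught spam divided by all spam seen, TS / (TS + TM).

```shell
#!/bin/sh
# Read dspam_stats-style lines such as "root 0 TS 7 TI 1 TM 0 FP" and print
# each user's spam detection rate, TS / (TS + TM), as a percentage.
detection_rate() {
    awk '{
        split("", n)                                  # clear previous line
        for (i = 2; i < NF; i += 2) n[$(i + 1)] = $i  # label -> count
        total = n["TS"] + n["TM"]
        if (total > 0) printf "%s %.1f%%\n", $1, 100 * n["TS"] / total
        else           printf "%s n/a\n", $1
    }'
}
```

Usage: /usr/local/bin/dspam_stats | detection_rate. For the root line above this prints "root 0.0%": no spam has been caught yet, and one spam message was missed.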
> 
> Administrative tasks
> You need to perform some administrative tasks to keep DSPAM running
> efficiently and to keep it from gobbling up too much disk space. Each night,
> run the dspam_clean program from cron to clean the signature database. To do
> this, add the following entry to a crontab (for example, root's, edited with
> crontab -e) so that it runs every night at midnight:
> 0 0 * * * /usr/local/bin/dspam_clean
> 
> Every five days or so, you should also run the dspam_purge program to
> optimize the user dictionary files. The following crontab entry runs it at
> midnight on the 5th, 10th, 15th, 20th, 25th, and 30th of each month:
> 0 0 5,10,15,20,25,30 * * /usr/local/bin/dspam_purge
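A crontab line has exactly five time fields followed by the command. An extra asterisk would therefore be taken as the command itself. The sketch below prints the two maintenance entries and checks their field count before you install them. The function names are hypothetical. It assumes both entries go into the crontab of the user running it.

```shell
#!/bin/sh
# The two DSPAM maintenance entries (five time fields + command each).
dspam_cron_lines() {
    printf '%s\n' \
        '0 0 * * * /usr/local/bin/dspam_clean' \
        '0 0 5,10,15,20,25,30 * * /usr/local/bin/dspam_purge'
}

# Fail if any entry does not split into 5 time fields + 1 command
# (valid here because these commands take no arguments).
check_cron_fields() {
    awk 'BEGIN { bad = 0 }
         NF != 6 { bad = 1; print "bad crontab entry: " $0 > "/dev/stderr" }
         END { exit bad }'
}
```

To append the entries to your existing crontab only if they pass the check:

dspam_cron_lines | check_cron_fields && { crontab -l 2>/dev/null; dspam_cron_lines; } | crontab -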
> 
> Effective and free
> DSPAM is not difficult to configure and maintain, and it can spare an
> organization much of the administrative hassle and cost created by the
> massive amounts of spam that employees have to deal with. Best of all, DSPAM
> is free, which makes it more economical than most other spam-fighting
> products.
> 
> 
> -- 
> TriLUG mailing list        : http://www.trilug.org/mailman/listinfo/trilug
> TriLUG Organizational FAQ  : http://trilug.org/faq/
> TriLUG Member Services FAQ : http://members.trilug.org/services_faq/
> TriLUG PGP Keyring         : http://trilug.org/~chrish/trilug.asc

