[TriLUG] stopping Cyrillic spam.

Daniel Sterling dan at lost-habit.com
Sun Jan 28 01:05:06 EST 2007


Cristóbal Palmer wrote:
> We're already using content checks... and other techniques.
Excellent! I hate to be repetitive, but please keep using statistical
analysis! I run spamassassin with the bayes *off*. Spam that
spamassassin misses is filtered by Thunderbird's built-in statistical
analysis. I have a silly setup like this mostly because it works and I
am too lazy to change it.

Anyway, my Thunderbird's filters are catching the Cyrillic spam. I
noticed that the following fun keyword is in mine:

charset="windows-1251"

windows-1251 is a Cyrillic encoding (the Windows code page commonly used for Russian). If you never expect legitimate Cyrillic mail, you can trash messages with that string.
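
For anyone who wants to try the charset check outside a mail client, here is a rough sketch in Python using the standard email module. The function name and the CYRILLIC_CHARSETS set are made up for illustration; only windows-1251 comes from the spam I've actually seen, so treat the rest as an assumption to tune.

```python
import email

# Assumption: only windows-1251 is listed because that is the charset seen in
# the spam described above. Add others (for example "koi8-r") if you need them.
CYRILLIC_CHARSETS = {"windows-1251"}


def declares_cyrillic_charset(raw_message: str) -> bool:
    """Return True if any MIME part declares a charset in CYRILLIC_CHARSETS."""
    msg = email.message_from_string(raw_message)
    for part in msg.walk():
        # get_content_charset() returns the charset parameter of the
        # Content-Type header in lower case, or None if there is none.
        charset = part.get_content_charset()
        if charset and charset.lower() in CYRILLIC_CHARSETS:
            return True
    return False


spam = 'Subject: hi\nContent-Type: text/plain; charset="windows-1251"\n\nbody\n'
ham = 'Subject: hi\nContent-Type: text/plain; charset="utf-8"\n\nbody\n'
print(declares_cyrillic_charset(spam), declares_cyrillic_charset(ham))
```

This only looks at what the message claims about itself, so it misses spam that lies about (or omits) its charset.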

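Also, you may or may not have good luck with the following bit of regex, which covers the Unicode Cyrillic and Cyrillic Supplement blocks (U+0400 through U+052F): [\x{400}-\x{52f}] -- let me know! (I suppose it mostly depends on whether the string being matched uses byte or character semantics; against raw, undecoded windows-1251 bytes it cannot match, since every byte is below 0x100.)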
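
To make the byte versus character point concrete, here is a small sketch in Python rather than Perl, so the regex syntax differs slightly from the one above. The names CYRILLIC_RE and contains_cyrillic are hypothetical, and the sample text is just an illustration.

```python
import re

# Same range as the regex above: Cyrillic plus Cyrillic Supplement.
CYRILLIC_RE = re.compile("[\u0400-\u052f]")


def contains_cyrillic(text: str) -> bool:
    """Return True if the decoded text contains a code point in U+0400..U+052F."""
    return CYRILLIC_RE.search(text) is not None


sample = "Привет"
raw_bytes = sample.encode("windows-1251")

# Character semantics: the bytes are decoded with the right charset first,
# so the regex sees real Cyrillic code points.
decoded_properly = raw_bytes.decode("windows-1251")

# Byte semantics (simulated): each byte becomes one character below U+0100,
# so nothing can fall inside the Cyrillic range.
treated_as_bytes = raw_bytes.decode("latin-1")

print(contains_cyrillic(decoded_properly))
print(contains_cyrillic(treated_as_bytes))
```

So the regex is only useful if whatever runs it decodes the message body before matching.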

-- Dan




