[TriLUG] Base 64 (perl) regular expressions

matt at noway2.thruhere.net matt at noway2.thruhere.net
Wed Mar 7 17:09:14 EST 2012


Once again I have been seeing an increase in the amount of Chinese spam
passing through my mail filters.  The problem seems to be that the
spammers are sending UTF-8 text with Base64 encoding to try to evade
filtering.

This has led me down the path of trying to understand the regular
expressions used to match the subject line, which I have learned is
UTF-8 text that has been Base64 encoded.  This site
(http://it-blog.timk.de/it-blog/page/howto-find-chinese-or-russian-spam-encoded-in-utf-8-with-spamassassin.htm#more-29)
has a pretty good explanation of how the decoded text is matched using
Perl regular expressions.

Ultimately, I would like to tell Postfix to reject the messages, with a
rude response, based upon the character set used in the header, rather
than process the message and discard it via SpamAssassin.

One of the regular expressions that triggers is: Subject =~
/(?:[\xe4][\xb8-\xbf][\x80-\xbf]|[\xe5-\xe9][\x80-\xbf][\x80-\xbf])/

I understand this to be a match of the Subject against the hex byte values
of the character set, applied once the string has been Base64 decoded.
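
To check that understanding, here is a minimal sketch of the decode-then-
match step as I believe it works.  It assumes the subject arrives as an
RFC 2047 "B" encoded-word in UTF-8 and uses the core MIME::Base64 module;
the helper names decode_subject and looks_cjk and the sample subject are
made up for illustration and are not taken from SpamAssassin or the site
above.  As far as I can tell, the byte ranges in the pattern cover the
UTF-8 encodings of U+4E00 through U+9FFF.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# The pattern from the rule quoted above, applied to raw UTF-8 bytes.
my $cjk = qr/(?:[\xe4][\xb8-\xbf][\x80-\xbf]|[\xe5-\xe9][\x80-\xbf][\x80-\xbf])/;

# Hypothetical helper: replace each UTF-8 "B" encoded-word in a raw
# Subject value with its decoded bytes.  Other charsets and "Q"
# encoded-words are left untouched.
sub decode_subject {
    my ($raw) = @_;
    (my $out = $raw) =~ s/=\?UTF-8\?B\?([A-Za-z0-9+\/=]+)\?=/decode_base64($1)/gie;
    return $out;
}

# Hypothetical helper: 1 if the decoded subject contains a byte
# sequence matched by the pattern, 0 otherwise.
sub looks_cjk {
    my ($raw) = @_;
    return decode_subject($raw) =~ $cjk ? 1 : 0;
}

# Made-up sample: "5L2g5aW9" is Base64 for the bytes E4 BD A0 E5 A5 BD.
print looks_cjk('=?UTF-8?B?5L2g5aW9?=') ? "match\n" : "no match\n";
print looks_cjk('Meeting on Thursday')  ? "match\n" : "no match\n";
```

The first sample should print "match" and the plain ASCII one "no match".
Note that the pattern only makes sense against the decoded bytes; the raw
header still contains the Base64 text.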

What I am not understanding, and would like to ask the perl experts here
is: what is the (?: part of the expression?  I don't recall reading about
that in any perl book and Googling is giving me some pages with it used,
but no explanation.

Would someone please shed some light on this one for me?





