[TriLUG] A curious regular expression

T. Bryan tbryan at python.net
Tue Apr 10 00:17:23 EDT 2007


On Sunday 08 April 2007 16:32, Aaron S. Joyner wrote:
> Marty Ferguson wrote:
> > (.... get it... "world peace" ... the  [.*\@{,99}]#&% regex hippie)
>
> Forgive me for latching onto a rather curious portion of your message,
> but that REGEX is a most unusual one. It's syntactically valid (your
> brackets match, your braces contain an actual numerical range
> representation, etc), so I have to think you spent a moment constructing
> it, but it's very unlikely to match a sensible string?
>
> Potential matches include:
> .#&%
> *#&%
> .@#&%
> *@@#&%

Huh?  I missed the first part of the thread, but which REGEX language are you 
talking about?  If we're talking Perlish regex, don't the brackets make it a 
character class?  That is, the {,99} doesn't indicate a quantifier; it just 
adds the characters '{', ',', '9', and '}' to the character class.
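A quick way to see the difference is a small Python 3 sketch. The second pattern below is a hypothetical rewrite, not anything posted in the thread: it moves the {,99} outside the brackets, where Python's re module reads it as a quantifier meaning "0 to 99 repetitions" of the preceding @.

```python
import re

# The pattern from the thread: everything between [ and ] is ONE character
# class, so "{,99}" only contributes the literal characters { , 9 }.
bracketed = re.compile(r"[.*\@{,99}]#&%")

# Hypothetical variant (not from the thread): {,99} outside the brackets is a
# quantifier, i.e. 0 to 99 '@' characters after a single '.' or '*'.
quantified = re.compile(r"[.*]@{,99}#&%")

for s in ["{#&%", "9#&%", ".@#&%", "*@@#&%"]:
    # fullmatch() succeeds only if the pattern covers the entire string.
    print(repr(s),
          "bracketed:", bool(bracketed.fullmatch(s)),
          "quantified:", bool(quantified.fullmatch(s)))
```

With fullmatch(), the bracketed pattern accepts "{#&%" and "9#&%" but rejects ".@#&%" and "*@@#&%", while the quantified variant does the opposite.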

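#---------
#!/usr/bin/env python3
import re

# Raw string, so the backslash reaches the regex engine unchanged.
regex = re.compile(r"[.*\@{,99}]#&%")
for s in [".#&%", "*#&%", ".@#&%", "*@@#&%", "{#&%"]:
    # match() only succeeds if the pattern matches at the start of s.
    if regex.match(s) is not None:
        print(s, "matched!")
    else:
        print(s, "didn't match.")
#---------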

.#&% matched!
*#&% matched!
.@#&% didn't match.
*@@#&% didn't match.
{#&% matched!

Of course, I'm cheating a bit there.  regex.search would actually find a match 
in all of those strings.  regex.match forces the match to begin at the start 
of the string that we're checking.  I did that because you said...

> or any combination of the leading . and * followed by 0 to 99 @'s, 
> then followed by the string #&%.

See?  Those strings still contain a match for the regex, but the 0 to 99 @'s 
aren't relevant.  The regex [.*\@{,99}]#&% matches a string that consists of 
exactly one of the following characters
. * @ { , 9 }
followed by the string
#&%
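To see what match() and search() actually pick out of those five strings, here is a small Python 3 sketch (not part of the original script). Whatever search() finds is always one character from the class followed by #&%; the leading . or * and any extra @'s are never part of the matched text.

```python
import re

regex = re.compile(r"[.*\@{,99}]#&%")

for s in [".#&%", "*#&%", ".@#&%", "*@@#&%", "{#&%"]:
    anchored = regex.match(s)   # must match starting at index 0
    anywhere = regex.search(s)  # may match starting at any index
    print(repr(s),
          "match:", anchored.group() if anchored else None,
          "search:", anywhere.group() if anywhere else None)
```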

So, 99 @'s followed by #&% would contain a match, but only the final @#&%.  
The previous 98 @'s aren't part of the matching text.  You could just as 
easily have had a string like 
s = "Some long sentence with exactly ninety-eight characters in it that doesn't match the regex itself @#&%"
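Both cases can be checked with a short Python 3 sketch. The 99-@ string is built with "@" * 99, and the sentence above is split across two literals only to keep the line short. In each case search() reports that only the last four characters match.

```python
import re

regex = re.compile(r"[.*\@{,99}]#&%")

at_signs = "@" * 99 + "#&%"
sentence = ("Some long sentence with exactly ninety-eight characters in it "
            "that doesn't match the regex itself @#&%")

for s in (at_signs, sentence):
    m = regex.search(s)
    # Both strings are 102 characters long; the match is the final "@#&%".
    print(len(s), m.group(), m.span())  # prints: 102 @#&% (98, 102)
```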

So, taken as whole strings, the only ones the regex matches are
.#&%
*#&%
@#&%
{#&%
,#&%
9#&%
}#&%
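One way to double-check that list is a Python 3 sketch that tries every printable ASCII character (Python's string.printable) in front of #&% and keeps the ones for which the whole string matches. Characters outside string.printable aren't tried, but none of them appear in the class anyway.

```python
import re
import string

regex = re.compile(r"[.*\@{,99}]#&%")

# fullmatch() requires the entire 4-character string to match the pattern.
leaders = [c for c in string.printable if regex.fullmatch(c + "#&%")]
print(leaders)  # ['9', '*', ',', '.', '@', '{', '}']
```

The 9 shows up only once even though the class spells it twice; repeating a character inside brackets changes nothing.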

You can, of course, still find a match within a larger string, but I don't 
think that's what you were saying in your reply.

Gosh, I killed 20 minutes on this e-mail.  Next time I'll delete the thread 
*without* a brief scan to see if any interesting tangents popped up.  ;-)

---Tom
