[TriLUG] A curious regular expression
T. Bryan
tbryan at python.net
Tue Apr 10 00:17:23 EDT 2007
On Sunday 08 April 2007 16:32, Aaron S. Joyner wrote:
> Marty Ferguson wrote:
> > (.... get it... "world peace" ... the [.*\@{,99}]#&% regex hippie)
>
> Forgive me for latching onto a rather curious portion of your message,
> but that REGEX is a most unusual one. It's syntactically valid (your
> brackets match, your braces contain an actual numerical range
> representation, etc), so I have to think you spent a moment constructing
> it, but it's very unlikely to match a sensible string?
>
> Potential matches include:
> .#&%
> *#&%
> .@#&%
> *@@#&%
Huh? I missed the first part of the thread, but which REGEX language are you
talking about? If we're talking Perlish regex, don't the brackets make it a
character class? That is, the {,99} doesn't indicate a quantifier, it just
adds the characters '{', ',', '9', and '}' to the character class.
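#---------
#!/usr/bin/env python
import re

# Inside [...] the {,99} is not a quantifier; its characters join the class.
regex = re.compile(r"[.*\@{,99}]#&%")
for s in [".#&%", "*#&%", ".@#&%", "*@@#&%", "{#&%"]:
    # match() only succeeds if the match starts at the beginning of s
    if regex.match(s) is not None:
        print(s, "matched!")
    else:
        print(s, "didn't match.")
#---------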
.#&% matched!
*#&% matched!
.@#&% didn't match.
*@@#&% didn't match.
{#&% matched!
Of course, I'm cheating a bit there. regex.search actually matches all of
those strings. regex.match forces the match to start at the start of the
string that we're checking. I did that because you said...
> or any combination of the leading . and * followed by 0 to 99 @'s,
> then followed by the string #&$.
See? They'll still match the regex, but the "0 to 99 @'s" part isn't relevant.
The regex [.*\@{,99}]#&% matches any string that consists of exactly one of the
following characters
. * @ { , 9 }
followed by the string
#&%
So, 99 @'s followed by #&% would be a match, but only on the final @#&%. The
previous 98 @'s aren't part of the matching text. You could just as easily
have had a string like
s = "Some long sentence with exactly ninety-eight characters in it that
doesn't match the regex itself @#&%"
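To check the claim about the 99 @'s, here is a minimal sketch. It assumes Python's re module and the same pattern as in the script above; the variable names are just for illustration. It shows that an anchored match fails on such a string, while a search succeeds only on the last four characters.

```python
import re

# Same pattern as in the thread, written as a raw string so the
# backslash reaches the re module untouched.
regex = re.compile(r"[.*\@{,99}]#&%")

s = "@" * 99 + "#&%"       # 99 @'s followed by #&%

anchored = regex.match(s)  # has to start at index 0, so it fails here
found = regex.search(s)    # free to start anywhere in the string

print(anchored)            # None
print(found.span())        # (98, 102)
print(found.group())       # @#&%
```

The span starts at index 98, so the first 98 @'s play no part in the match.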
So, the regex matches the strings
.#&%
*#&%
@#&%
{#&%
,#&%
9#&%
}#&%
You can, of course, still find a match within a larger string, but I don't
think that's what you were saying in your reply.
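For contrast, here is a rough sketch that compares the character-class regex with a quantifier version. The second pattern, [.*]@{0,99}#&%, is only my guess at what the earlier reply had in mind; it does not appear anywhere in the thread. The sketch uses re.fullmatch (Python 3.4 and later) so that a string counts only if the whole thing matches.

```python
import re
import string

char_class = re.compile(r"[.*\@{,99}]#&%")   # the regex from the thread
quantified = re.compile(r"[.*]@{0,99}#&%")   # hypothetical: my guess at the intended regex

# Which single printable characters can precede #&% under the original regex?
prefixes = sorted(c for c in string.printable
                  if char_class.fullmatch(c + "#&%"))
print(prefixes)

# Compare both patterns on strings from the thread.
for s in [".@#&%", "*@@#&%", "{#&%"]:
    print(s,
          "class:", char_class.fullmatch(s) is not None,
          "quantified:", quantified.fullmatch(s) is not None)
```

Only the guessed quantifier version accepts .@#&% and *@@#&% as whole strings, and only the original accepts {#&%.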
Gosh, I killed 20 minutes on this e-mail. Next time I'll delete the thread
*without* a brief scan to see if any interesting tangents popped up. ;-)
---Tom