[TriLUG] A curious regular expression
William Sutton
william at trilug.org
Tue Apr 10 00:07:08 EDT 2007
> >
> > Potential matches include:
> > .#&%
> > *#&%
> > .@#&%
> > *@@#&%
>
> Huh? I missed the first part of the thread, but which REGEX language are you
> talking about? If we're talking Perlish regex, don't the brackets make it a
> character class? That is, the {,99} doesn't indicate a quantifier, it just
> adds the characters '{', ',', '9', and '}' to the character class.
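{x,y} in perl regex is a quantifier, but only outside a character
class. Inside [...] it is not: the '{', ',', '9', and '}' are just more
literal members of the class, so the brackets match exactly one
character out of . * @ { , 9 }. The pattern is not anchored, so any
string that has one of those characters directly before '#&%' matches;
a string with '@' matches on its final '@', not on a counted run.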
Short perl program to illustrate:
#####
use strict;
use warnings;

my @strings = (
    '#&%',    # no leading character, no '@'
    '@#&%',   # no leading character, has '@'
    '+#&%',   # leading character, no '@'
    '+@#&%',  # leading character, has '@'
);
foreach my $string (@strings) {
    # Unanchored match: succeeds if the pattern occurs anywhere in $string.
    print "string $string "
        . ($string =~ m/[.*\@{,99}]#&%/ ? "matches" : "does not match")
        . "\n";
}
#####
Now then, the regex (in perl, dunno what regex language was originally
being used) breaks down as follows:
[ # start character class
.* # a literal '.' or '*' (no wildcard or repetition inside a class)
\@{,99} # a literal '@', '{', ',', '9', or '}'
] # end character class; matches exactly one of the characters above
#&% # followed by the literal string '#&%'
In other words, the string has to contain one character from the class
immediately followed by '#&%'. The test above bears this out: '@#&%' and
'+@#&%' match on the '@' directly before '#&%', while '#&%' and '+#&%'
do not match because nothing from the class precedes '#&%'.
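
A rough cross-check of how the bracketed part is parsed, sketched in
Python on the assumption that its character-class rules agree with
Perl's for this pattern. The names CLASS_PATTERN, INTENDED_PATTERN, and
class_members are made up for this example, and INTENDED_PATTERN is only
my guess at what the original author may have wanted (quantifiers moved
outside the class), not something taken from the thread.

```python
import re

# Same pattern as in the thread; inside [...] every character is literal.
CLASS_PATTERN = re.compile(r"[.*\@{,99}]#&%")

# Hypothetical "intended" pattern (an assumption): any run of '.' or '*',
# then 0 to 99 '@' characters, then the literal '#&%'.
INTENDED_PATTERN = re.compile(r"[.*]*@{0,99}#&%")


def class_members(candidates):
    """Return the candidate characters that the bracket expression accepts."""
    return [c for c in candidates if CLASS_PATTERN.fullmatch(c + "#&%")]


# Probe a few characters, including some that are not in the class.
members = class_members(".*@{,9}+a1")
print(members)

# Compare the two patterns on whole strings (fullmatch = anchored at both ends).
for s in ["@@#&%", ".*@@#&%", "#&%"]:
    print(s, bool(CLASS_PATTERN.fullmatch(s)), bool(INTENDED_PATTERN.fullmatch(s)))
```

fullmatch is used here so that an unanchored hit further along the
string can't hide what the class itself accepts.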
William
>
> #---------
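> #!/usr/bin/env python
> import re
>
> # Raw string so the backslash reaches the regex engine untouched.
> regex = re.compile(r"[.*\@{,99}]#&%")
> for s in [".#&%", "*#&%", ".@#&%", "*@@#&%", "{#&%"]:
>     # match() only succeeds at the very start of the string.
>     if regex.match(s) is not None:
>         print(s, "matched!")
>     else:
>         print(s, "didn't match.")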
> #---------
>
> .#&% matched!
> *#&% matched!
> .@#&% didn't match.
> *@@#&% didn't match.
> {#&% matched!
>
> Of course, I'm cheating a bit there. regex.search actually matches all of
> those strings. regex.match forces the match to start at the start of the
> string that we're checking. I did that because you said...
>
> > or any combination of the leading . and * followed by 0 to 99 @'s,
> > then followed by the string #&%.
>
> See? They'll still match the regex, but the 0 to 99 @'s aren't relevant. The
> regex [.*\@{,99}]#&% matches the string that consists of one of the following
> characters
> . * @ { , 9 }
> followed by the string
> #&%
>
> So, 99 @'s followed by #&% would be a match, but only on the final @#&%. The
> previous 98 @'s aren't part of the matching text. You could just as easily
> have had a string like
> s = "Some long sentence with exactly ninety-eight characters in it that
> doesn't match the regex itself @#&%"
>
> So, the regex matches the strings
> .#&%
> *#&%
> @#&%
> {#&%
> ,#&%
> 9#&%
> }#&%
>
> You can, of course, still find a match within a larger string, but I don't
> think that's what you were saying in your reply.
>
> Gosh, I killed 20 minutes on this e-mail. Next time I'll delete the thread
> *without* a brief scan to see if any interesting tangents popped up. ;-)
>
> ---Tom
>
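
The match-versus-search distinction in the quoted script can be sketched
like this. It is only an illustration: matched_text is a helper name
invented for this example, and it reports which text each call actually
consumes rather than just whether there was a hit.

```python
import re

regex = re.compile(r"[.*\@{,99}]#&%")


def matched_text(s):
    """Return (text found by match(), text found by search()); None if no hit."""
    at_start = regex.match(s)    # must succeed at position 0
    anywhere = regex.search(s)   # may succeed at any position
    return (
        at_start.group(0) if at_start else None,
        anywhere.group(0) if anywhere else None,
    )


print(matched_text(".@#&%"))
print(matched_text("@" * 99 + "#&%"))
print(matched_text("{#&%"))
```

For the string of 99 @'s, search() should report only the last four
characters as the matched text, which is the point made above about the
other 98 not being part of the match.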