[TriLUG] A curious regular expression
Tom Bryan
tbryan at python.net
Tue Apr 10 13:23:18 EDT 2007
William Sutton wrote:
>> Huh? I missed the first part of the thread, but which REGEX language are you
>> talking about? If we're talking Perlish regex, don't the brackets make it a
>> character class? That is, the {,99} doesn't indicate a quantifier, it just
>> adds the characters '{', ',', '9', and '}' to the character class.
> {x,y} in perl regex is in fact a quantifier; empirical tests show that
> when x isn't specified, it is treated as '1'; thus, we're talking about
> 1-99 (inclusive) sequential '@' characters.
Sure, but *not* inside a character class. That was the entire point of
my previous message. I'm not a master of regular expressions, but
character classes don't work the way you're saying they do. :)
> '+@#&%', # leading .*, has @*
As I said before, the + isn't part of the match. It's a red herring.
Just because 'a\wc' matches 'abc' and 'azc' and 'xyzabc' does not mean
that the 'xyz' in the last example are part of the match. They're just
extra characters that the regex scans past on its way to the matching
string. :)
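A quick sketch of that point, assuming Perl 5. It uses the built-in @- and @+ arrays (start and end offsets of the last match) to cut out exactly the text that matched; matched_part() is a hypothetical helper name invented for this illustration.

```perl
use strict;
use warnings;

# Return the substring the regex actually matched, or undef on failure.
# matched_part() is a hypothetical helper used only for illustration.
sub matched_part {
    my ($string, $regex) = @_;
    return undef unless $string =~ $regex;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    return substr($string, $-[0], $+[0] - $-[0]);
}

for my $string ('abc', 'azc', 'xyzabc') {
    my $part = matched_part($string, qr/a\wc/);
    print "string $string matches this text '$part'.\n";
}
```

For 'xyzabc' the reported text is just 'abc'; the 'xyz' is skipped over, not matched.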
> Now then, the regex (in perl, dunno what regex language was originally
> being used) is as follows:
>
> [ # character class
> .* # 0 or more characters
> \@{,99} # and 1-99 '@' characters
> ] # end character class
> #&% # followed by '#&%'
And I would disagree entirely. Once you're in the character class, the
., *, and { are just characters. They have no special meaning.
> in other words, you must have at least a single '@' somewhere in the
> character class;
No, you don't. See below.
> before or after .* doesn't matter;
No. Here, the .* are just other options. It can be a . or * or @. I
can put something before the @, but that's irrelevant. It's *not* part
of the string that matches the regular expression.
> can have 0 or more
> characters (of unspecified value) either before or after the '@'
> character(s), and the string has to also contain '#&%' following the
> character class.
I'll say it again. You must have one of
. * @ { , 9 }
followed by
#&%
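That list can be checked mechanically. Here's a minimal sketch (not part of the original program) that puts each printable ASCII character in front of '#&%' and keeps the ones the character class accepts. The ^ and $ anchors are added here only so that exactly one leading character is tested.

```perl
use strict;
use warnings;

# Try every printable ASCII character in front of '#&%' and collect
# the ones that the character class accepts.
my @accepted;
for my $code (32 .. 126) {
    my $char = chr $code;
    my $candidate = $char . '#&%';
    push @accepted, $char if $candidate =~ m/^[.*\@{,99}]#&%$/;
}
print "accepted: @accepted\n";
```

Only seven distinct characters come back, in ASCII order: * , . 9 @ { }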
Here's a modified version of your program. Let's use the $& special
variable here to show what part of each matching string actually matched
the regular expression. That might clear some things up. I also
include some counterexamples that show that I can match without an @ in
the string. I also show that when I match a longer string (0 or more
characters plus 1-99 @'s), the characters preceding the @#&% at the end
of the string are not actually part of the match.
####
my @strings = (
    '#&%',    # no leading .*, no @*
    '@#&%',   # no leading .*, has @*
    '+#&%',   # leading .*, no @*
    '+@#&%',  # leading .*, has @*
    'not_part_of_the_match@@@@@#&%', # matches last 4 chars
    ',#&%',   # also a valid match, no @'s
    '9#&%',   # also a valid match, no @'s
    '{#&%',   # also a valid match, no @'s
    '}#&%',   # also a valid match, no @'s
    '*#&%'    # also a valid match, no @'s
);
foreach my $string (@strings)
{
    print "string $string ";
    if ($string =~ m/[.*\@{,99}]#&%/)
    {
        print "matches this text '$&'.\n";
    }
    else
    {
        print "does not match.\n";
    }
}
####
string #&% does not match.
string @#&% matches this text '@#&%'.
string +#&% does not match.
string +@#&% matches this text '@#&%'.
string not_part_of_the_match@@@@@#&% matches this text '@#&%'.
string ,#&% matches this text ',#&%'.
string 9#&% matches this text '9#&%'.
string {#&% matches this text '{#&%'.
string }#&% matches this text '}#&%'.
string *#&% matches this text '*#&%'.
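For contrast, here is a sketch of the pattern as the quoted description reads it, with the quantifier outside any character class. The {1,99} with an explicit lower bound is my choice for this sketch, so that it doesn't depend on how a given Perl version treats {,99}; match_text() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Any characters, then 1 to 99 '@' characters, then '#&%'.
# This is NOT the regex under discussion, just the quoted reading of it.
my $quantified = qr/.*\@{1,99}#&%/;
# The regex under discussion: one character from the class, then '#&%'.
my $class = qr/[.*\@{,99}]#&%/;

# Return the text that matched ($&), or undef if there was no match.
sub match_text {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $& : undef;
}

for my $string ('9#&%', '+@@#&%', '#&%') {
    my $q = match_text($string, $quantified);
    my $c = match_text($string, $class);
    printf "string %s: quantifier %s, class %s\n", $string,
        defined $q ? "matched '$q'" : 'did not match',
        defined $c ? "matched '$c'" : 'did not match';
}
```

The two patterns disagree on '9#&%' (only the class version matches) and on how much of '+@@#&%' is part of the match.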
Regards,
---Tom