[TriLUG] A curious regular expression
William Sutton
william at trilug.org
Tue Apr 10 13:29:28 EDT 2007
You are so right. I sit corrected :)
--
William Sutton
On Tue, 10 Apr 2007, Tom Bryan wrote:
> William Sutton wrote:
>
> >> Huh? I missed the first part of the thread, but which REGEX language are you
> >> talking about? If we're talking Perlish regex, don't the brackets make it a
> >> character class? That is, the {,99} doesn't indicate a quantifier, it just
> >> adds the characters '{', ',', '9', and '}' to the character class.
>
> > {x,y} in perl regex is in fact a quantifier; empirical tests show that
> > when x isn't specified, it is treated as '1'; thus, we're talking about
> > 1-99 (inclusive) sequential '@' characters.
>
> Sure, but *not* inside a character class. That was the entire point of
> my previous message. I'm not a master of regular expressions, but
> character classes don't work the way you're saying they do. :)
>
> > '+@#&%', # leading .*, has @*
>
> As I said before, the + isn't part of the match. It's a red herring.
> Just because 'a\wc' matches 'abc' and 'azc' and 'xyzabc' does not mean
> that the 'xyz' in the last example are part of the match. They're just
> extra characters that the regex scans past on its way to the matching
> string. :)
>
> > Now then, the regex (in perl, dunno what regex language was originally
> > being used) is as follows:
> >
> > [ # character class
> > .* # 0 or more characters
> > \@{,99} # and 1-99 '@' characters
> > ] # end character class
> > #&% # followed by '#&%'
>
> And I would disagree entirely. Once you're in the character class, the
> ., *, and { are just characters. They have no special meaning.
>
> > in other words, you must have at least a single '@' somewhere in the
> > character class;
>
> No, you don't. See below.
>
> > before or after .* doesn't matter;
>
> No. Here, the .* are just other options. It can be a . or * or @. I
> can put something before the @, but that's irrelevant. It's *not* part
> of the string that matches the regular expression.
>
> > can have 0 or more
> > characters (of unspecified value) either before or after the '@'
> > character(s), and the string has to also contain '#&%' following the
> > character class.
>
> I'll say it again. You must have one of
> . * @ { , 9 }
> followed by
> #&%
>
>
> Here's a modified version of your program. Let's use the $& special
> variable here to show what part of my matching strings actually matched
> the regular expression. That might clear some things up. I also
> include some counterexamples that show that I can match without an @ in
> the string. I also show that when I match a longer string (0 or more
> characters plus 1-99 @'s), the characters preceding the @#&% at the end
> of the string are not actually part of the match.
>
> ####
> my @strings = (
> '#&%', # no leading .*, no @*
> '@#&%', # no leading .*, has @*
> '+#&%', # leading .*, no @*
> '+@#&%', # leading .*, has @*
> 'not_part_of_the_match@@@@@#&%', # matches last 4 chars
> ',#&%', # also a valid match, no @'s
> '9#&%', # also a valid match, no @'s
> '{#&%', # also a valid match, no @'s
> '}#&%', # also a valid match, no @'s
> '*#&%' # also a valid match, no @'s
> );
>
> foreach my $string (@strings)
> {
> print "string $string ";
> if ($string =~ m/[.*\@{,99}]#&%/)
> {
> print "matches this text '$&'.\n";
> }
> else
> {
> print "does not match.\n";
> }
> }
> ####
>
> string #&% does not match.
> string @#&% matches this text '@#&%'.
> string +#&% does not match.
> string +@#&% matches this text '@#&%'.
> string not_part_of_the_match@@@@@#&% matches this text '@#&%'.
> string ,#&% matches this text ',#&%'.
> string 9#&% matches this text '9#&%'.
> string {#&% matches this text '{#&%'.
> string }#&% matches this text '}#&%'.
> string *#&% matches this text '*#&%'.
>
> Regards,
> ---Tom
>
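
A minimal Perl sketch of the two points in the quoted message: inside
[...] the ., *, {, and } are ordinary characters, and text the regex
scans past is not part of the match. The pattern is the one from the
thread; the helper name matched_text and the sample string 'xyz@@@#&%'
are made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the thread: one character from the class, then '#&%'.
my $re = qr/[.*\@{,99}]#&%/;

# Return the matched substring, or undef when there is no match.
# (matched_text is a hypothetical helper, not something from the thread.)
sub matched_text {
    my ($string) = @_;
    return $string =~ /($re)/ ? $1 : undef;
}

# Try every printable ASCII character in front of '#&%' and keep the
# ones the character class accepts.
my @accepted = grep { defined(matched_text($_ . '#&%')) }
               map  { chr($_) } 33 .. 126;
print "class accepts: @accepted\n";

# Leading text is scanned past; only the last four characters match.
print "matched: ", matched_text('xyz@@@#&%'), "\n";
```

This should print the seven characters * , . 9 @ { } on the first line
and @#&% on the second. That agrees with the list in the quoted message:
any one of those characters followed by #&% matches, and no @ is
required.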