[TriLUG] A curious regular expression

Tue Apr 10 13:29:28 EDT 2007

You are so right.  I sit corrected :)

-- 
William Sutton

On Tue, 10 Apr 2007, Tom Bryan wrote:

> William Sutton wrote:
> 
> >> Huh?  I missed the first part of the thread, but which REGEX language are you 
> >> talking about?  If we're talking Perlish regex, don't the brackets make it a 
> >> character class?  That is, the {,99} doesn't indicate a quantifier, it just 
> >> adds the characters '{', ',', '9', and '}' to the character class.
> 
> > {x,y} in perl regex is in fact a quantifier; empirical tests show that 
> > when x isn't specified, it is treated as '1'; thus, we're talking about 
> > 1-99 (inclusive) sequential '@' characters.
> 
> Sure, but *not* inside a character class.  That was the entire point of 
> my previous message.  I'm not a master of regular expressions, but 
> character classes don't work the way you're saying they do.  :)
> 
> >                '+@#&%',    # leading .*, has @*
> 
> As I said before, the + isn't part of the match.  It's a red herring. 
> Just because 'a\wc' matches 'abc' and 'azc' and 'xyzabc' does not mean 
> that the 'xyz' in the last example are part of the match.  They're just 
> extra characters that the regex scans past on its way to the matching 
> string. :)
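A minimal Perl sketch of the 'xyzabc' point. It assumes a Perl recent enough to have the @- offset array (5.6 or later), and the variable names are only for illustration. The engine scans past 'xyz', and the match itself starts at offset 3:

```perl
use strict;
use warnings;

my $string = 'xyzabc';
my ($start, $text, $skipped);

if ($string =~ m/a\wc/) {
    $text    = $&;                         # the text that actually matched
    $start   = $-[0];                      # offset where the match begins
    $skipped = substr($string, 0, $start); # scanned past, not part of the match
    print "matched '$text' at offset $start; skipped '$skipped'\n";
}
```

The same idea explains '+@#&%': the '+' is skipped just as 'xyz' is here.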
> 
> > Now then, the regex (in perl, dunno what regex language was originally 
> > being used) is as follows:
> > 
> > [		# character class
> > 	.*	# 0 or more characters
> > 	\@{,99}	# and 1-99 '@' characters
> > ]		# end character class
> > #&%		# followed by '#&%'
> 
> And I would disagree entirely.  Once you're in the character class, the 
> ., *, and { are just characters.  They have no special meaning.
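A small sketch that tests single characters against the bracketed class on its own (the names $class_re and %result are hypothetical). If '.' were still a wildcard inside the brackets, 'a' would be accepted; it is not, and neither are '1' or '+', because the class only lists specific characters:

```perl
use strict;
use warnings;

# Anchored so the whole one-character string must come from the class.
my $class_re = qr/^[.*\@{,99}]$/;

my %result;
for my $ch ('.', '*', '@', '{', ',', '9', '}', 'a', '1', '+') {
    $result{$ch} = ($ch =~ $class_re) ? 1 : 0;
    printf "%-2s %s\n", $ch, $result{$ch} ? 'in class' : 'not in class';
}
```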
> 
> > in other words, you must have at least a single '@' somewhere in the 
> > character class; 
> 
> No, you don't.  See below.
> 
> > before or after .* doesn't matter; 
> 
> No.  Here, the .* are just other options.  It can be a . or * or @.  I 
> can put something before the @, but that's irrelevant.  It's *not* part 
> of the string that matches the regular expression.
> 
> > can have 0 or more
> > characters (of unspecified value) either before or after the '@' 
> > character(s), and the string has to also contain '#&%' following the 
> > character class.
> 
> I'll say it again.  You must have one of
> . * @ { , 9 }
> followed by
> #&%
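The "one of . * @ { , 9 }" rule can also be checked exhaustively, at least for printable ASCII (an assumption of this sketch, not a proof for every possible input). It puts each character in front of '#&%', compares the result with that set, and dies on any disagreement:

```perl
use strict;
use warnings;

my $original = qr/[.*\@{,99}]#&%/;
my %expected = map { $_ => 1 } ('.', '*', '@', '{', ',', '9', '}');

my @matching;
for my $code (32 .. 126) {             # printable ASCII only
    my $ch      = chr $code;
    my $matches = ("$ch#&%" =~ $original) ? 1 : 0;
    push @matching, $ch if $matches;
    die "mismatch for '$ch'\n" if $matches != ($expected{$ch} ? 1 : 0);
}
print "characters that can precede #&%: @matching\n";
# prints: characters that can precede #&%: * , . 9 @ { }
```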
> 
> 
> Here's a modified version of your program.  Let's use the $& special 
> variable here to show what part of my matching strings actually matched 
> the regular expression.  That might clear some things up.  I also 
> include some counterexamples that show that I can match without an @ in 
> the string.  I also show that when I match a longer string (0 or more 
> characters plus 1-99 @'s), the characters preceding the @#&% at the end 
> of the string are not actually part of the match.
> 
> ####
> my @strings = (
>                 '#&%',      # no leading .*, no @*
>                 '@#&%',     # no leading.*, has @*
>                 '+#&%',     # leading .*, no @*
>                 '+@#&%',    # leading .*, has @*
>                 'not_part_of_the_match@@@@@#&%',  # matches last 4 chars
>                 ',#&%',     # also a valid match, no @'s
>                 '9#&%',     # also a valid match, no @'s
>                 '{#&%',     # also a valid match, no @'s
>                 '}#&%',     # also a valid match, no @'s
>                 '*#&%'      # also a valid match, no @'s
>                );
> 
> foreach my $string (@strings)
> {
>      print "string $string ";
>      if ($string =~ m/[.*\@{,99}]#&%/)
>      {
>          print "matches this text '$&'.\n";
>      }
>      else
>      {
>          print "does not match.\n";
>      }
> }
> ####
> 
> string #&% does not match.
> string @#&% matches this text '@#&%'.
> string +#&% does not match.
> string +@#&% matches this text '@#&%'.
> string not_part_of_the_match@@@@@#&% matches this text '@#&%'.
> string ,#&% matches this text ',#&%'.
> string 9#&% matches this text '9#&%'.
> string {#&% matches this text '{#&%'.
> string }#&% matches this text '}#&%'.
> string *#&% matches this text '*#&%'.
> 
> Regards,
> ---Tom
>
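If the intent really was "1 to 99 @ characters followed by #&%", the quantifier has to sit outside any brackets. This is a hedged sketch of that reading, with an explicit lower bound of 1. An omitted lower bound such as {,99} is not recognised as a quantifier at all before Perl 5.34, and from 5.34 on it means {0,99}, so neither version gives the "treated as 1" behavior described earlier. The %matched hash is only for illustration:

```perl
use strict;
use warnings;

# Quantifier outside any character class: 1 to 99 '@' characters,
# immediately followed by '#&%'.
my $intended = qr/\@{1,99}#&%/;

my %matched;
for my $string ('#&%', '@#&%', '+@#&%', ',#&%', 'not_part_of_the_match@@@@@#&%') {
    if ($string =~ $intended) {
        $matched{$string} = $&;
        print "string $string matches this text '$&'.\n";
    }
    else {
        print "string $string does not match.\n";
    }
}
```

Unlike the bracketed version, this one rejects ',#&%' and captures the whole run of @'s in the last string.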