[TriLUG] [hopefully] quickie help with a regex
Kevin Hunter
hunteke at earlham.edu
Mon Aug 9 00:38:01 EDT 2010
At 8:35pm -0600 Sun, 08 Aug 2010, Warren Myers wrote:
> (technically this text is in XML/HTML documents (and yes, I know
> regexes are bad for HTML, but in this instance, it's what I
> need)),
You know the details, of course, but in my experience, when folks have
asked me for help with regexes and HTML, literally 19 times out of 20 a
regex was not the approach they really wanted (just the first one that
came to mind).
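For comparison, here is a rough sketch of what a parser-based approach
might look like. It assumes PHP's DOM extension is available and that
the goal is to pull out attribute values; the helper name
attributeValues() and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: list every attribute value in an HTML fragment
// by walking the parsed document instead of pattern-matching the text.
function attributeValues($html)
{
    $doc = new DOMDocument();
    // Suppress parser warnings for sloppy markup; real code might
    // inspect the libxml errors instead.
    @$doc->loadHTML($html);

    $xpath  = new DOMXPath($doc);
    $values = array();
    // '//@*' selects every attribute node, in document order.
    foreach ($xpath->query('//@*') as $attr) {
        $values[] = $attr->nodeValue;
    }
    return $values;
}

print_r(attributeValues("<p class='intro' title='it is quoted'>text</p>"));
```

The parser deals with quoting and escaping, so the question of which
quote ends the value never comes up.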
> I'm looking to match text inside single quotes using PHP [...]but
> am having a little trouble with the formatting.
>
> I *think* what I want is:
> [\'][.]*]\']
          ^  ^
Thomas already pointed out the possibly mismatched brackets of the
character class.
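To make that concrete, here is a small sketch of how PCRE appears to
read the pattern as posted, next to one possible correction. The
correction is only a guess at the intent, and the sample strings are
made up.

```php
<?php
// The pattern as posted, with delimiters added. Read left to right it
// asks for: a quote, zero or more literal dots, a literal ']',
// another quote, and a final literal ']'.
$posted = "/[\\'][.]*]\\']/";

// One guess at the intent: a quote, any run of non-quote characters,
// and a closing quote.
$guess = "/'[^']*'/";

$samples = array("attr='asdf'", "'..]']");
foreach ($samples as $s) {
    printf("%-12s posted=%d guess=%d\n",
        $s, preg_match($posted, $s), preg_match($guess, $s));
}
```

An unmatched ']' is an ordinary character to PCRE, so the posted
pattern compiles without complaint and simply looks for text that most
documents do not contain.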
> Is that right, or am I way off? It seems to only match sometimes.
It's difficult to say if you're way off without seeing the larger
context of the problem. Regexes are difficult not in concept but in
implementation, because the smallest detail can keep them from matching
what you intended. The question you need to ask is "What -- /exactly/ --
am I trying to match?" Are embedded quotes accepted? How about
embedded newlines? Do I want the values of XML element attributes?
Does this script help elucidate anything for you?
-----
$ cat test.php
#!/usr/bin/php
<?php
$str = "<element attr1='asdf\'jkl' attr2='asd\nf'>";
echo "Regexes against string: -->$str<--\n";

$regexes = array(
    "Match w/ greedy regex"      => "/'.*'/",
    "Match with nongreedy regex" => "/'.*?'/",
    "Match embedded newline"     => "/'[^']*'/",
        # won't match the newline in current string, but try
        # removing all the internal single quotes of $str
    "Match attributes w/ anchor" => "/='.*?'/",
    "Match w/ embedded quote"    => "/'.*?(?<!\\\\)'/",
        # That's really '(?<!\\)', but it's interpreted twice,
        # so need to escape the backslash not once, but twice.
        # 'negative look behind assertion' prevents an escaped
        # quote. Not perfect, because, for example, the regex
        # would miss the final quote of "'asdf\\'"
);

foreach ( $regexes as $description => $regex ) {
    echo "\n$description\n";
    if ( preg_match( $regex, $str, $matches ) ) {
        foreach ( $matches as $k => $v ) {
            echo "$k -> $v\n";
        }
    }
}
?>
$ php test.php
# ...
-----
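On the last caveat in the script: one common way around the lookbehind
limitation is to consume backslash escapes as pairs. The sketch below is
one such variant, not something tested against your real data; the
variable names and sample strings are made up, and it assumes a
backslash always escapes the character that follows it.

```php
<?php
// Nowdoc syntax hands the pattern to PCRE untouched, so each backslash
// is escaped once (for the regex) rather than twice.
$escapeAware = <<<'RE'
/'(?:[^'\\]|\\.)*'/s
RE;
// The lookbehind version from the script, for comparison.
$lookbehind = "/'.*?(?<!\\\\)'/";

$tests = array(
    "x='asdf\\'jkl' y",   // contains 'asdf\'jkl'  (escaped quote)
    "x='asdf\\\\' y",     // contains 'asdf\\'     (escaped backslash)
);
$patterns = array('escape-aware' => $escapeAware, 'lookbehind' => $lookbehind);

foreach ($tests as $t) {
    foreach ($patterns as $name => $re) {
        $found = preg_match($re, $t, $m) ? $m[0] : '(no match)';
        echo "$name on $t -> $found\n";
    }
}
```

Inside the quotes, the pattern accepts either a character that is
neither a quote nor a backslash, or a backslash followed by any single
character, so a trailing escaped backslash no longer hides the closing
quote.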
Cheers,
Kevin