Figuring out the arcane arts of Regex

Dig_Gil
9 years ago | edited 9 years ago

I’ve been learning Regex, especially as applied to PHP, but there are some things that seem over my head.
One thing I’ve been trying to do is replace different kinds of simple gibberish in a text with useful HTML (kinda like parsing BBCode).
For example:
This is just an image img([SRC]). But it can be an hyperlink image img([SRC][LINK]).
Seems easy. I can use preg_replace with a regex like this:
Pattern: @img\(\[(.*?)\]\)@si Replacement: <img src='$1' />
And it will replace every img([SRC]) in the text. But to replace the more complex image, I guessed I could do:
Pattern: @img\((?:\[(.*?)\])+?\)@si Replacement: <a href='$2'><img src='$1' /></a>
But it doesn’t work as I expected. I thought about preg_match_all, which outputs an array of groups, but unlike preg_replace it doesn’t take arrays as input, and that’s inconvenient.
I feel there’s something stupid in my understanding but I can’t pinpoint what.

Luke [flabbyrabbit]
9 years ago

This was my initial thought but it doesn’t work as I expected either. For the first block it returns SRC and for the second it only returns LINK.
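```
$str = "This is just an image img([SRC]). But it can be an hyperlink image img([SRC][LINK]).";
$pattern = '/img\((?:\[([^\]]+)\])+\)/i';

$str = preg_replace_callback(
    $pattern,
    function ($matches) {
        // Dump the groups captured for each img(...) block.
        print_r($matches);
        // Return the match unchanged so the text is not blanked out.
        return $matches[0];
    },
    $str
);

echo $str;
```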

My next thought would be just to get the string between the brackets and use the callback function to do further processing.
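
A rough sketch of that idea, not a finished solution: capture the whole run of [..] blocks as a single group, then split it inside the callback. The function name render_img and the sample file name and URL are made up for illustration.

```php
<?php
// Hypothetical helper: turns img([SRC]) and img([SRC][LINK]) into HTML.
function render_img(string $text): string
{
    return preg_replace_callback(
        // Group 1 holds the whole bracket run, e.g. "[SRC][LINK]".
        '/img\(((?:\[[^\]]+\])+)\)/i',
        function (array $m): string {
            // Pull each bracketed value out of the run.
            preg_match_all('/\[([^\]]+)\]/', $m[1], $parts);
            $args = $parts[1];
            $img = "<img src='" . $args[0] . "' />";
            if (count($args) === 1) {
                return $img;
            }
            if (count($args) === 2) {
                return "<a href='" . $args[1] . "'>" . $img . "</a>";
            }
            // Unexpected number of arguments: leave the text as it was.
            return $m[0];
        },
        $text
    );
}

echo render_img("An image img([a.png]) and a linked one img([a.png][http://example.com]).");
```

The values are inserted into the HTML unescaped here, just as in the patterns above, so this is only suitable for trusted input.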

dloser
9 years ago

And this is why I usually quote the following:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Even in this relatively simple case there is already too much confusion. You have to read the documentation carefully (and/or understand the underlying mechanisms) to understand what is going on. For example, in this case the documentation tells us: “When a capturing subpattern is repeated, the value captured is the substring that matched the final iteration.” In the original patterns posted, there are some oddities like ‘((?:’ and odd use of ‘?’ after repetitions.
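
To make the quoted rule concrete, here is a small sketch. The first pattern is the repeated-group one from the earlier post; the second, which gives each bracket its own group and makes the second optional, is only one possible alternative and not something proposed in the thread.

```php
<?php
// A repeated capturing group keeps only the text from its final iteration.
preg_match('/img\((?:\[([^\]]+)\])+\)/i', 'img([SRC][LINK])', $repeated);
// $repeated[1] is "LINK"; "SRC" was overwritten by the second iteration.

// Giving each bracket its own group (the second one optional) keeps both.
preg_match('/img\(\[([^\]]+)\](?:\[([^\]]+)\])?\)/i', 'img([SRC][LINK])', $separate);
// $separate[1] is "SRC" and $separate[2] is "LINK".

print_r($repeated);
print_r($separate);
```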

Besides this confusion, it is hard to give proper feedback when everything is packed into one regular expression. If you use this on big blocks of text and there is a mistake somewhere, like ‘img([…]a[…])’, how are you going to know? Using a simple parser is probably much more flexible and maintainable.

Dig_Gil
9 years ago

So I figured that this problem couldn’t be solved with simple regex alone. Maybe some Regex whiz will eventually tell me otherwise. I’ve heard about people who compete at solving Regex tasks with one-liners (like ’80s action heroes :D ). This little challenge was just a way to learn Regex myself. Nothing serious. But dloser’s quote is still amusing.

For those interested, here’s how I got the intended effect:

$pattern = "@(\S*?)\((.*?)\)@s";

// $matches[1][$num] is the function name and $matches[2][$num] its arguments;
// $num is the index of the match within the text.
$count = preg_match_all($pattern, $text, $matches);

foreach ($matches[0] as $num => $match) {
    $args = explode("||", $matches[2][$num]);
    $n = count($args);
    $args = array_pad($args, 4, ""); // avoid undefined-index warnings below
    $rule = [];

    switch ($matches[1][$num]) {
        case "u":
            $reput = "<span style='text-decoration:underline;' >" . $matches[2][$num] . "</span>";
            break;
        case "f":
            // You write a color code and the text takes that color automatically! Awesome, no?
            $rule[1] = "<span style='color:" . $args[0] . ";' >" . $args[0] . "</span>";
            $rule[2] = "<span style='color:" . $args[1] . ";' >" . $args[0] . "</span>";
            $rule[3] = "<span style='color:" . $args[1] . ";font-size:" . $args[2] . ";' >" . $args[0] . "</span>";
            $rule[4] = "<span style='color:" . $args[1] . ";font-size:" . $args[2] . ";font-family:" . $args[3] . ";' >" . $args[0] . "</span>";

            $reput = $rule[$n] ?? $match; // too many arguments: leave unchanged
            break;
        case "img":
            $rule[1] = "<img src='" . $args[0] . "' />";
            $rule[2] = "<a href='" . $args[1] . "'><img src='" . $args[0] . "' /></a>";

            $reput = $rule[$n] ?? $match; // too many arguments: leave unchanged
            break;
        default:
            // Not one of our tags (e.g. ordinary parentheses): skip it,
            // otherwise $reput from a previous iteration would be reused.
            continue 2;
    }
    $text = str_replace($match, $reput, $text);
}

return nl2br($text);

I’ve thrown in some usage examples too. But the important point is that a single preg_match_all will grab something like
img(SRC||LINK)
and output one array with that whole expression, another with the img part, and a third with SRC||LINK, which is then exploded so the individual “parameters” can be used as needed.
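
For anyone who wants to see those arrays, here is a minimal sketch using the same pattern on that one tag. The variable names are only for illustration.

```php
<?php
$text = "img(SRC||LINK)";
preg_match_all('@(\S*?)\((.*?)\)@s', $text, $matches);

// $matches[0][0] is the whole expression: "img(SRC||LINK)"
// $matches[1][0] is the function name:    "img"
// $matches[2][0] is the argument string:  "SRC||LINK"
$args = explode("||", $matches[2][0]); // ["SRC", "LINK"]

print_r($matches);
print_r($args);
```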