php preg_match_all returning array of arrays

Question

I want to replace some template tags:

$tags = '{name} text {first}';
preg_match_all('~\{(\w+)\}~', $tags, $matches);
var_dump($matches);

output is:

array(2) { 
          [0]=> array(2) { 
                         [0]=> string(6) "{name}" 
                         [1]=> string(7) "{first}" 
                         } 
          [1]=> array(2) { 
                         [0]=> string(4) "name" 
                         [1]=> string(5) "first" 
                         }
         }

Why does the result contain two arrays? How can I get only the second one?

Why not just use var_dump($matches[1]);? That gives you only the second, one-dimensional array.
– Jeffrey, Commented Sep 26, 2015 at 13:38
If your goal is to make a replacement, don't worry about what preg_match_all returns, since what you need is probably preg_replace_callback.
– Casimir et Hippolyte, Commented Sep 26, 2015 at 14:52
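
A minimal sketch of the preg_replace_callback approach mentioned in the comment above, assuming the goal is to swap each tag for a value. The $values array and its contents are hypothetical and not part of the question; the pattern and $tags string are the ones from the question.

```php
<?php
// Hypothetical replacement values; the keys mirror the tag names in the template.
$values = ['name' => 'Smith', 'first' => 'John'];
$tags = '{name} text {first}';

$result = preg_replace_callback(
    '~\{(\w+)\}~',
    function (array $m) use ($values) {
        // $m[0] is the full tag, e.g. "{name}"; $m[1] is the captured name.
        // Tags without a known value are left untouched.
        return array_key_exists($m[1], $values) ? $values[$m[1]] : $m[0];
    },
    $tags
);

echo $result, PHP_EOL; // Smith text John
```

The callback receives a flat array per match, so the nested $matches structure never has to be handled here.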

Elias Van Ootegem · Accepted Answer · 2015-09-26 14:33:42Z

The short answer:

Is there an alternative? Of course there is: lookaround assertions are zero-width, so they let you require a character before or after the match without making it part of the match:

preg_match_all('/(?<=\{)\w+(?=})/', $tags, $matches);
var_dump($matches);

Will dump this:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(4) "name"
    [1]=>
    string(5) "first"
  }
}

The pattern:

(?<=\{): positive lookbehind - only match the rest of the pattern if there's a { character directly in front of it (the { itself is not part of the match)
\w+: one or more word characters are matched
(?=}): positive lookahead - only match the preceding pattern if it is followed by a } character (the } itself is not part of the match)

It's that simple: the pattern uses the { and } delimiter chars as conditions for the match, but doesn't include them in it.

Explaining this `$matches` array structure a bit:

The reason why $matches looks the way it does is quite simple: when using preg_match(_all), the first entry in the match array will always be the entire string matched by the given regex. That's why I used zero-width lookaround assertions, instead of groups. Your expression matches "{name}" in its entirety, and extracts "name" through grouping.
The matches array will hold the full match at index 0, and add the groups at every subsequent index. In your case that means:

$matches[0] will contain all substrings matching /\{\w+\}/ as a pattern.
$matches[1] will contain all substrings that were captured (/\{(\w+)\}/ captures (\w+)).
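
As a small sketch of the two indexes just described, using the pattern and string from the question (the variable names $fullMatches and $names are only illustrative):

```php
<?php
$tags = '{name} text {first}';
preg_match_all('~\{(\w+)\}~', $tags, $matches);

$fullMatches = $matches[0]; // every substring matching the whole pattern
$names       = $matches[1]; // every substring captured by (\w+)

print_r($fullMatches); // {name}, {first}
print_r($names);       // name, first
```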

If you were to have a regex like this: /\{((\w)([^}]+))}/, the matches array for the tag {name} would look something like this:

[
    0 => [
        '{name}', // full match, as if you'd written /\{\w[^}]+}/
    ],
    1 => [
        'name', // outer group ((\w)([^}]+)), as if you wrote (\w[^}]+)
    ],
    2 => [
        'n', // the (\w) group
    ],
    3 => [
        'ame', // and this is the ([^}]+) group, obviously
    ]
]

Why? Simply because the pattern contains 3 capturing groups. Like I said: the first index in the matches array will always be the full match, regardless of capture groups. The groups are then appended to the array in the order they appear in the expression. So if we analyze the expression:

\{: not captured, but part of the pattern, so it will only show up in the $matches[0] values
((\w)([^}]+)): Start of the first capturing group, the \w[^}]+ match is grouped here, $matches[1] will contain these values
(\w): Second group, a single \w char (i.e. the first character after {). $matches[2] will therefore contain all first characters after a {
([^}]+): Third group, matches the rest of the string after {\w until a } is encountered, this will make up the $matches[3] values

To better understand, and be able to predict the way $matches will get populated, I'd strongly recommend you use this site: regex101. Write your expression there, and it'll break it all down for you on the right hand side, listing the groups. For example:

/\{((\w)([^}]+))}/

Is broken down like this:

/\{((\w)([^}]+))}/
  \{ matches the character { literally
  1st Capturing group ((\w)([^}]+))
    2nd Capturing group (\w)
      \w match any word character [a-zA-Z0-9_]
    3rd Capturing group ([^}]+)
      [^}]+ match a single character not present in the list below
      Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
      } the literal character }
  } matches the character } literally

Looking at the capturing groups, you can now confidently say what $matches will look like, and you can safely say that $matches[2] will be an array of single characters.
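
To check that prediction, here is a short sketch that runs the same three-group pattern against the sample string from the question. Nothing here goes beyond preg_match_all itself; the comments state what each index holds.

```php
<?php
$tags = '{name} text {first}'; // sample string from the question

preg_match_all('/\{((\w)([^}]+))}/', $tags, $matches);

// One sub-array for the full match plus one per capture group.
echo count($matches), PHP_EOL; // 4

// Group 2 is (\w): the first character after each "{".
print_r($matches[2]); // n, f

// Group 3 is ([^}]+): the remainder up to, but not including, "}".
print_r($matches[3]); // ame, irst
```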

Of course, this may leave you wondering why $matches is a 2D array. Well, that again is really quite easy: what you can predict is how many indexes a $matches array will contain: 1 for the full pattern, then +1 for each capture group. What you can't predict, though, is how many matches you'll find.
So what preg_match_all does is really quite simple: it fills $matches[0] with all substrings that match the entire pattern, then extracts each group's substring from these matches and appends that value to the respective $matches arrays. In other words, the number of arrays in $matches is a given: it depends on the pattern. The number of keys in the sub-arrays of $matches is unknown: it depends on the string you're processing. If preg_match_all were to return a 1D array, it would be a lot harder to process the matches. As it is, you can simply write this:

$total = count($matches); // full match + one entry per capture group
foreach ($matches[0] as $k => $full) {
    echo $full . ' contains: ' . PHP_EOL;
    // start at 1: index 0 is the full match itself
    for ($i = 1; $i < $total; ++$i) {
        printf(
            'Group %d: %s' . PHP_EOL,
            $i, $matches[$i][$k]
        );
    }
}

If preg_match_all created a flat array, you'd have to keep track of the number of groups in your pattern. Whenever the pattern changed, you'd also have to make sure to update the rest of the code to reflect the change, making your code harder to maintain and more error-prone, too.

Philipp · Accepted Answer · 2015-09-26 13:26:42Z

That's because your regex could have multiple match groups: if you had more (..) groups, you would have more entries in your array. The first one, [0], is always the whole match.

If you want a different ordering of the array, you can pass PREG_SET_ORDER as the fourth argument to preg_match_all. Doing this results in the following:

array(2) { 
          [0]=> array(2) { 
                         [0]=> string(6) "{name}" 
                         [1]=> string(4) "name" 
                         } 
          [1]=> array(2) { 
                         [0]=> string(7) "{first}" 
                         [1]=> string(5) "first" 
                         }
         }

This layout can be easier to work with if you loop over the result with foreach.
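
A minimal sketch of such a loop, assuming the pattern and string from the question; the $lines array is only there to collect the output for illustration:

```php
<?php
$tags = '{name} text {first}'; // sample string from the question

preg_match_all('~\{(\w+)\}~', $tags, $matches, PREG_SET_ORDER);

$lines = [];
foreach ($matches as $match) {
    // Each $match holds one complete match:
    // [0] is the full tag, [1] is the captured name.
    $lines[] = sprintf('%s => %s', $match[0], $match[1]);
}

echo implode(PHP_EOL, $lines), PHP_EOL;
// {name} => name
// {first} => first
```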

If you are only interested in the first capture group, you should stay with the default PREG_PATTERN_ORDER and just use $matches[1].
