I need to find a predefined set of hashtags in a blob of text, then extract the number that immediately follows each one, if any. E.g. I'd need to extract 30 from "Test string with #other30 hashtag". I assumed preg_match_all would be the right choice.

Some test code:

$hashtag = '#other';
$string  = 'Test string with #other30 hashtag';
$matches = [];
preg_match_all('/' . $hashtag . '\d*/', $string, $matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => #other30
        )
)

Perfect... Works as expected. Now to extract the number:

$string = $matches[0][0]; // #other30
$matches = [];
preg_match_all('/\d*/', $string, $matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] =>
            [1] =>
            [2] =>
            [3] =>
            [4] =>
            [5] =>
            [6] => 30
            [7] =>
        )
)

What? Looks like it's trying to match every character?

I'm aware of some preg_match_all related answers (one, two), but they all use a parenthesized subpattern. According to the documentation, it is optional.

What am I missing? How do I simply get all matches of a basic regex like /\d*/ into an array? There doesn't seem to be a more appropriate function in PHP for that.

I never thought I'd be scratching my head with such a basic thing in PHP. Much appreciated.

4 Answers

You need to replace:

preg_match_all('/\d*/', $string, $matches);

with:

preg_match_all('/\d+/', $string, $matches);

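Replace * with +, because:

* matches zero or more times, so \d* also succeeds with an empty match at every position where there is no digit to consume.

+ matches one or more times, so \d+ only matches where there is at least one digit.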
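
To make the difference concrete, here is a minimal sketch that counts the matches each quantifier produces on the string from the question. The variable names $countStar and $countPlus are only illustrative.

```php
<?php
$string = '#other30';

// \d* may match the empty string, so it succeeds at every position:
// six empty matches for "#other", then "30", then one empty match at the end.
$countStar = preg_match_all('/\d*/', $string, $star);

// \d+ needs at least one digit, so only "30" matches.
$countPlus = preg_match_all('/\d+/', $string, $plus);

echo $countStar, "\n"; // 8, the same eight entries shown in the question
echo $countPlus, "\n"; // 1
print_r($plus[0]);     // the single match "30"
```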

1 Comment

Such a silly mistake... * matches zero or more of the preceding element... It tries to match zero digits. Duh

You can use a capturing group:

preg_match_all('/' . $hashtag . '(\d*)/', $string, $matches); 
echo $matches[1][0] . "\n";
//=> 30

Here (\d*) will capture the number after $hashtag.
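
Since the question asks for the number "if any", it may help to see what the capturing group returns when a hashtag has no number after it. The sketch below assumes a test string of my own with two occurrences of the hashtag, and also applies preg_quote to it; in that case (\d*) captures an empty string rather than failing.

```php
<?php
$hashtag = '#other';
$string  = 'First #other30, then #other with no number';

preg_match_all('/' . preg_quote($hashtag, '/') . '(\d*)/', $string, $matches);

// $matches[1] holds the captured digits for each occurrence of the hashtag;
// an empty string means no number followed it.
foreach ($matches[1] as $digits) {
    echo $digits === '' ? "(no number)\n" : (int) $digits . "\n";
}
```

With \d+ inside the group instead, occurrences without a number would simply not be matched at all, so which quantifier fits depends on whether those occurrences should be reported.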

1 Comment

Even better! Thank you!

You can also use \K to reset the start of the reported match at that point, so only the part after it is returned. And of course you need \d+ instead of \d* to match one or more digits. Otherwise \d* also produces empty matches in the gaps between characters, where zero digits match.

So your code can be reduced to

$hashtag = '#other';
$string  = 'Test string with #other30 #other31 hashtag';
preg_match_all('/' . $hashtag . '\K\d+/', $string, $matches);
print_r($matches[0]);

See the demo at eval.in and consider using preg_quote for $hashtag.
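
As a rough sketch of the preg_quote suggestion applied to a whole set of hashtags: the list below, including '#c++', is a made-up example chosen because it contains characters that would otherwise be read as regex quantifiers.

```php
<?php
// Hypothetical set of hashtags; '#c++' contains regex metacharacters.
$hashtags = ['#other', '#c++'];
$string   = 'Test string with #other30 and #c++17 hashtags';

$numbers = [];
foreach ($hashtags as $hashtag) {
    // Escape metacharacters and the '/' delimiter before embedding the hashtag.
    $pattern = '/' . preg_quote($hashtag, '/') . '\K\d+/';
    if (preg_match_all($pattern, $string, $matches)) {
        $numbers[$hashtag] = $matches[0];
    }
}

print_r($numbers); // '#other' maps to ['30'], '#c++' maps to ['17']
```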

1 Comment

Interesting, I wasn't aware of the \K flag. Thank you!

PHP Fiddle

<?php

    $hashtag = '#other';
    $string  = 'Test string with #other30 hashtag';
    $matches = [];
    preg_match_all('/' . $hashtag . '\d*/', $string, $matches);
    // preg_match_all returns the number of matches, so there is no need to assign it to $string.
    preg_match_all('#\d+#', $matches[0][0], $m);
    echo $m[0][0];

?>
