1

I'm using preg_replace to match and replace improperly encoded UTF-8 characters with their proper characters. I've created an "old" array containing the wrong characters and a corresponding "new" array with the replacements. Here is a snippet of each array:

$old = array(
  '/â€/',
  '/â€™/',
);
$new = array(
  '†',
  '’',
);

(Note: If you're curious about why I'm doing this, read more here)

A sample string that may contain the wrong data could be:

The programmerâ€™s becoming very frustrated

Which should become:

The programmer's becoming very frustrated

I'm using this function:

$result = preg_replace($old, $new, $str);

But the subject is actually becoming:

The programmer†™s becoming very frustrated

It's clear that PHP is doing what I call a non-greedy match on the subject (not the correct term to use here, I know). preg_replace executes the replacement for the first pair in the old/new arrays without considering whether a different pattern in the pattern array may be more appropriate. If I reverse the order of the replacement pairs, it works as expected.

My question is: Is there an approach that will allow preg_replace to consider all elements of the pattern array before executing a replacement, or is my only option to re-order the arrays?

3 Answers

2

I don't think there is any option like that. However, you could use an associative array to store your replacements and sort it using uasort and strlen, so larger matches would come first and you wouldn't need to manage your array order manually.

Then you can use array_keys and array_values to act just like your separated $old and $new arrays.

$replacements = array(
    '†' => '/â€/',
    '’' => '/â€™/',
);

// sorts the replacements array by value string length keeping the indexes intact
uasort($replacements, function($a, $b) {
    return strlen($b) - strlen($a);
});

$str = 'The programmerâ€™s becoming very frustrated';
$result = preg_replace(array_values($replacements), array_keys($replacements), $str);

EDIT: As @CasimiretHippolyte pointed out, using array_values is not necessary on the first parameter of the preg_replace function in this case. It would only return a copy of $replacements with numeric indexes, in the same order. Unless you need an array with the same structure as $old from your question, you do not need it.
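
A minimal sketch of the simplified call described in the edit, relying on preg_replace walking the pattern and replacement arrays by position rather than by key. The variable names follow the snippet in this answer, and the sample string assumes the input contains the mis-encoded sequence `â€™`:

```php
<?php
// Replacement => pattern map, as in the snippet above.
$replacements = array(
    '†' => '/â€/',
    '’' => '/â€™/',
);

// Longest pattern first; uasort keeps each replacement attached to its pattern.
uasort($replacements, function ($a, $b) {
    return strlen($b) - strlen($a);
});

$str = 'The programmerâ€™s becoming very frustrated';

// The patterns are the array values, so $replacements can be passed directly:
// preg_replace processes both arrays in order and ignores the string keys.
$result = preg_replace($replacements, array_keys($replacements), $str);

echo $result, "\n";
```

Note that strlen counts bytes, not characters, which is sufficient here because one pattern is a byte-for-byte prefix of the other.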


1

Order the arrays $old and $new in such a way that the longest regex comes first:

$old = array(
  '/â€™/',
  '/â€/',
);
$new = array(
  '’',
  '†',
);
$str = 'The programmerâ€™s becoming very frustrated';
$result = preg_replace($old, $new, $str);
echo $result,"\n";

output:

The programmer’s becoming very frustrated
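
To see why the order matters, the sketch below runs the same two pairs in both orders. It assumes the input contains the mis-encoded sequence `â€™` and that the shorter pattern is a prefix of the longer one; `$shortFirst` and `$longFirst` are names made up for this illustration.

```php
<?php
$str = 'The programmerâ€™s becoming very frustrated';

// Short pattern first: 'â€' is replaced before '/â€™/' is ever tried,
// so the trailing '™' is left behind in the subject.
$shortFirst = preg_replace(array('/â€/', '/â€™/'), array('†', '’'), $str);

// Long pattern first: the full sequence is consumed by a single replacement.
$longFirst = preg_replace(array('/â€™/', '/â€/'), array('’', '†'), $str);

echo $shortFirst, "\n";
echo $longFirst, "\n";
```

preg_replace applies each pattern to the result of the previous one, so once the short pattern has rewritten part of the sequence, the longer pattern no longer has anything to match.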


0

I don't believe there is a way to do this using only preg_replace. However, you can easily do it by sorting your array beforehand:

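// Pair each pattern with its replacement so they stay together when sorted
$replacements = array_combine($old, $new);
// Reverse-sort by key (the pattern); this is a string comparison, not a length sort
krsort($replacements);
$result = preg_replace(array_keys($replacements), array_values($replacements), $str);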
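
If the patterns are plain literal strings rather than real regular expressions (an assumption, though it holds for the two pairs shown in the question), strtr may be a simpler option: given an array, it tries the longest keys first and does not search text it has already replaced. A minimal sketch, with `$map` as a hypothetical name:

```php
<?php
// Hypothetical map of mis-encoded sequence => intended character.
// No regex delimiters are used: strtr works on literal strings.
$map = array(
    'â€'  => '†',
    'â€™' => '’',
);

$str = 'The programmerâ€™s becoming very frustrated';

// The array order does not matter here: strtr tries the longest key first.
$result = strtr($str, $map);

echo $result, "\n";
```

This avoids sorting altogether, at the cost of giving up regex features such as character classes or anchors.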
