Using preg_replace to replace all occurrences in PHP

Question

Regex is absolutely my weak point and this one has me completely stumped. I am building a fairly basic search functionality and I need to be able to alter my user input based on the following pattern:

Subject:

%22first set%22 %22second set%22-drupal -wordpress

Desired output:

+"first set" +"second set" -drupal -wordpress

I wish I could be more help as I normally like to at least post the solution I have so far, but on this one I'm at a loss.

Any help is appreciated. Thank you.

It seems your data is URL-encoded. If you apply urldecode, you will get "first set" "second set"-drupal -wordpress. Do you actually have a space before -drupal, or should this be inserted too?
– Felix Kling, Commented Jan 10, 2011 at 3:29
I can manage the space. The only issue with using urldecode is that this is going into an SQL query, and I only want to decode double quotes, and only if they're in this pattern.
– S16, Commented Jan 10, 2011 at 3:35

Felix Kling · Accepted Answer · 2011-01-10 03:47:32Z

2

Seems your data is URL encoded. If you apply urldecode, you will get

"first set" "second set" -drupal -wordpress

(I assume you have a space before -drupal).

Now you have to add the +. Again, I assume you have to add it before every word that is not prefixed with - and is not inside quotes:

$str = '"first set" "second set" -drupal -wordpress foo';
echo preg_replace('#( |^)(?!(?:\w+"|-| ))#', '\1+', $str);
// prints +"first set" +"second set" -drupal -wordpress +foo

Update: If you cannot use urldecode, you could just use str_replace to replace %22 with ".
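
Putting the two steps together, a minimal sketch might look like the following. The function name `toBooleanQuery` is hypothetical (it does not appear in the answer), and the sample input assumes there is a space before `-drupal`, as the answer does.

```php
<?php
// Hypothetical helper: the name and structure are illustrative, not from the answer.
function toBooleanQuery(string $raw): string
{
    // Decode only the encoded double quotes; every other %XX sequence is left untouched.
    $decoded = str_replace('%22', '"', $raw);

    // Insert "+" at the start of the string or after a space, unless what follows is
    // a word that closes a quoted phrase (\w+"), a "-" prefix, or another space.
    return preg_replace('#( |^)(?!(?:\w+"|-| ))#', '\1+', $decoded);
}

// Assumption: the input has a space before -drupal.
echo toBooleanQuery('%22first set%22 %22second set%22 -drupal -wordpress foo'), PHP_EOL;
// prints +"first set" +"second set" -drupal -wordpress +foo
```

Note that the lookahead only recognises the last word of a quoted phrase, so a phrase of three or more words would get a stray + before its middle words. For two-word phrases like the ones in the question it behaves as shown.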

edited Jan 10, 2011 at 3:47

answered Jan 10, 2011 at 3:27

Felix Kling


Ming-Tang · Accepted Answer · 2014-06-27 17:54:33Z

1

preg_replace('/%22((?:[^%]|%[^2]|%2[^2])*)%22/', '+"$1"', $str);

Explanation: The $1 is a backreference to the first ()-section of the regular expression, in this case ((?:[^%]|%[^2]|%2[^2])*). The alternation (?:[^%]|%[^2]|%2[^2]) cannot consume a complete %22, so the greedy * cannot run past the closing %22 and swallow the next phrase. See http://en.wikipedia.org/wiki/Regular_expression#Lazy_quantification.

I found that technique in a JavaCC example of matching block comments (/* */), and I can't find any other webpages explaining it, so here is a cleaner example. To match a block of text between two 12345 delimiters (12345........12345) with no 12345 in between: /12345([^1]|1[^2]|12[^3]|123[^4]|1234[^5])*12345/
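
To see the technique run, here is a small sketch that applies both patterns. The sample string for the 12345 case is made up for illustration, and the alternation there is wrapped in an extra capture group so the enclosed text can be printed.

```php
<?php
// Pattern from the answer: the group body cannot consume a complete "%22".
$input = '%22first set%22 %22second set%22-drupal -wordpress';
$quoted = preg_replace('/%22((?:[^%]|%[^2]|%2[^2])*)%22/', '+"$1"', $input);
echo $quoted, PHP_EOL;
// prints +"first set" +"second set"-drupal -wordpress

// The generic 12345 delimiter example. The outer (...) is added here only
// to capture the text between the delimiters; the input is illustrative.
$pattern = '/12345((?:[^1]|1[^2]|12[^3]|123[^4]|1234[^5])*)12345/';
preg_match($pattern, '12345abc12345def12345', $m);
echo $m[1], PHP_EOL;
// prints abc
```

The missing space before -drupal is left as it is in the input; this pattern only rewrites the quoted phrases.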

edited Jun 27, 2014 at 17:54

answered Jan 10, 2011 at 3:30

Ming-Tang


3 Comments

S16 Over a year ago

You rock. Thank you, very much. Any chance you could offer an explanation on the solution?

Ming-Tang Over a year ago

The $1 is a backreference to the first ()-section of the regular expression, in this case ((?:[^%]|%[^2]|%2[^2])*). The [^%] alternation keeps a %22 in between from being matched, which stops greedy matching from running past the closing delimiter. Greediness is explained at en.wikipedia.org/wiki/Regular_expression#Lazy_quantification, and the [^%] method at shinkirou.org/blog/2010/12/tricky-regular-expression-problems (first seen in a JavaCC example).

trejder Over a year ago

@SHiNKiROU An explanation of the code given in an answer should go in the answer itself, not in the comments, where many people may miss it. I wonder why you didn't edit your own answer when asked for clarification, and used a tiny comment instead?

phooji · Accepted Answer · 2011-01-10 03:36:59Z

1

Is this what you're looking for?

<?php
  $input = "%22first set%22 %22second set%22-drupal -wordpress";
  // \1 is the text captured by (.+?); the trailing space separates the phrase from "-drupal"
  $res = preg_replace("/\%22(.+?)\%22/", "+\"\\1\" ", $input);
  print $res;
?>

answered Jan 10, 2011 at 3:36

phooji


1 Comment

phooji Over a year ago

Explanation: each \%22 matches a literal "%22". The key here is the (.+?) part, which finds the shortest (i.e., "ungreedy") match between the %22s. In the replacement, \1 stands for the value matched by (.+?).
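
The difference the ? makes can be seen by running the greedy and ungreedy forms side by side. This is only a sketch on the question's sample input; the % is left unescaped here because it is not a regex metacharacter, and the replacement omits the trailing space.

```php
<?php
$input = '%22first set%22 %22second set%22-drupal -wordpress';

// Greedy: (.+) runs to the LAST %22, so both phrases collapse into one match.
$greedy = preg_replace('/%22(.+)%22/', '+"\1"', $input);
echo $greedy, PHP_EOL;
// prints +"first set%22 %22second set"-drupal -wordpress

// Ungreedy: (.+?) stops at the NEAREST %22, so each phrase is matched on its own.
$lazy = preg_replace('/%22(.+?)%22/', '+"\1"', $input);
echo $lazy, PHP_EOL;
// prints +"first set" +"second set"-drupal -wordpress
```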
