As much as I've tried I can't seem to find the correct regex to locate what I'm after here.

I only want to select the first instance of the url that matches the domain www.myweb.com from the following...

Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr

I need to completely ignore the second url www.adifferentsite.com and only work with the first one that matches www.myweb.com, ignoring any other possible instances of www.myweb.com

Once the first matching domain is discovered I need to store the rest of the url that comes after it...

page/cat/323123442321-rghe432

...into a new variable $newvar, so...

$newvar = 'page/cat/323123442321-rghe432';

I'm trying :

return preg_replace_callback( '/http://www.myweb.com/\/[0-9a-zA-Z]+/', array( __CLASS__, 'my_callback' ), $newvar );

I've read tons of documents on how to detect URLs in general, but can't find anything about detecting one specific URL.

I really can't grasp how to formulate regex so this formula is incorrect. Any help would be greatly appreciated.

EDIT Edited the question to be a bit more specific and hopefully a bit easier to resolve.

  • If you need to match, why replace? And when you created the regex, did you pay attention to the regex delimiters? I guess you get an Unknown delimiter error. Commented Dec 2, 2015 at 12:20
  • I'm replacing because I'm going to turn the link into an oEmbed link, where the URL format is different. So myweb.com/page/cat/323123442321-rghe432 will become embed.myweb.com/page/cat/323123442321-rghe432 when rendered after this filter. Commented Dec 2, 2015 at 12:22
  • Regex wise, I just don't understand how it works. I've read and read and read and just can't seem to grasp the correct way of doing it. Commented Dec 2, 2015 at 12:25
  • Ok, I think you can just use '~\bhttps?://www\.myweb\.com/(\S+)~' regex and push the $m[1] into the array for "the rest of URL". Commented Dec 2, 2015 at 12:25
  • Here is a demo of what I mean. Commented Dec 2, 2015 at 12:37
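If only the "rest of the URL" is needed, the pattern from the comment above can also be used with preg_match, which stops at the first match on its own. The following is a minimal sketch under that assumption; initialising $newvar to null for the "not found" case is an illustrative choice, not something from the question.

```php
<?php
// Sketch: preg_match() returns after the first match, so later
// occurrences of www.myweb.com (and any other domain) are ignored.
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr";

$newvar = null; // assumption: null means "domain not found"
if (preg_match('~\bhttps?://www\.myweb\.com/(\S+)~', $str, $m)) {
    $newvar = $m[1]; // everything after "www.myweb.com/"
}

echo $newvar, PHP_EOL; // page/cat/323123442321-rghe432
```

If the link also has to be rewritten in place, preg_replace_callback is still the better fit, because it does the match and the replacement in one pass.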

1 Answer

You can use preg_replace_callback and pass an array by reference into the anonymous function (or into your own callback function) to fill it with all the URL parts you need.

Here is a demo:

$rests = array();
$re = '~\b(https?://)www\.myweb\.com/(\S+)~'; 
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr"; 
echo $result = preg_replace_callback($re, function ($m) use (&$rests) {
    array_push($rests, $m[2]);
    return $m[1] . "embed.myweb.com/" . $m[2];
}, $str) . PHP_EOL;
print_r($rests);

Results:

Some text https://embed.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr
Array
(
    [0] => page/cat/323123442321-rghe432
)

A couple of words:

  • '~\b(https?://)www\.myweb\.com/(\S+)~' has ~ as a regex delimiter, so you do not have to escape /
  • It is declared with a single-quoted literal, so you do not have to double the backslash in \S (that is, write \\S)
  • It matches and captures 2 substrings into capturing groups: \b(https?://) (which matches http or https as a whole word, followed by ://) and (\S+) (which matches 1 or more non-whitespace characters). Capturing groups are marked with (...) in the pattern and can be accessed in the callback via $m[n], where n is the number of the capturing group.
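The points about the delimiter and the capturing groups can be checked with a small sketch. It writes the same pattern once with ~ and once with / as the delimiter (the $tilde and $slash names are only illustrative) and prints the groups that preg_match fills in.

```php
<?php
$url = "https://www.myweb.com/page/cat/323123442321-rghe432";

// Same pattern, two delimiters: with "/" every literal slash needs a backslash.
$tilde = '~\b(https?://)www\.myweb\.com/(\S+)~';
$slash = '/\b(https?:\/\/)www\.myweb\.com\/(\S+)/';

preg_match($tilde, $url, $a);
preg_match($slash, $url, $b);

var_dump($a === $b); // bool(true): both patterns produce the same match array
print_r($a);         // [0] whole match, [1] "https://", [2] the rest of the URL
```

Index 0 always holds the whole match, which is why the first pair of parentheses ends up in index 1.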

UPDATE

If you only need to replace the first occurrence of the URL, pass 1 as the limit argument (the fourth parameter) of preg_replace_callback:

$rest = "";
$re = '~\b(https?://)www\.myweb\.com/(\S+\b)~'; 
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432, another http://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr"; 
echo $result = preg_replace_callback($re, function ($m) use (&$rest) {
    $rest = $m[2];
    return $m[1] . "embed.myweb.com/" . $m[2];
}, $str, 1) . PHP_EOL; // limit = 1: only the first match is replaced
echo $rest;

See another IDEONE demo
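The updated pattern also ends the second group with \b, which the answer does not comment on. A plausible reading is that it keeps trailing punctuation, such as the comma in the sample text, out of the capture. The sketch below compares the two variants; the shortened $str and the variable names are assumptions made for illustration.

```php
<?php
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432, another one";

// Without \b the greedy \S+ runs up to the next whitespace and keeps the comma.
preg_match('~https?://www\.myweb\.com/(\S+)~', $str, $greedy);

// With \b the group has to end at a word boundary, so it backs off the comma.
preg_match('~https?://www\.myweb\.com/(\S+\b)~', $str, $trimmed);

var_dump($greedy[1]);  // "page/cat/323123442321-rghe432,"
var_dump($trimmed[1]); // "page/cat/323123442321-rghe432"
```

The same rule also drops a trailing slash or any other non-word character at the end of a URL, so this variant only suits URLs that end in a letter, digit or underscore.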

11 Comments

This works perfectly, but I don't need an array in this instance. I only need to match and collect the url for the first instance of the url. Is this routine possible without using an array?
Do you mean that you have Some text https://www.myweb.com/page1 and https://www.myweb.com/page2 and you want only the first one replaced? Use 1 as the last argument to preg_replace_callback.
Thank you @stribizhev that is perfect!
Use negated character class [^/]* to match 0 or more characters other than / to stay inside the URL parts. See demo.
Ok, use this one. $re = '~(https?://)www\.myweb\.com/(([^/]*/[^/]*)\S+\b)~'; and then $result = preg_replace_callback($re, function ($m) use (&$rest) { $rest = $m[3]; return $m[1] . "embed.myweb.com/" . $m[2]; }, $str, 1)
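Part of the comment thread that led to this last pattern is not shown here, so the following is only a sketch of what each group captures when the pattern is run against the sample text from the question (shortened, with an assumed trailing "and more").

```php
<?php
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432 and more";
$re  = '~(https?://)www\.myweb\.com/(([^/]*/[^/]*)\S+\b)~';

$rest = "";
$result = preg_replace_callback($re, function ($m) use (&$rest) {
    // $m[2] is the full path; $m[3] is only its first two segments,
    // because [^/]* cannot cross a slash.
    $rest = $m[3];
    return $m[1] . "embed.myweb.com/" . $m[2];
}, $str, 1);

echo $rest, PHP_EOL;   // page/cat
echo $result, PHP_EOL; // Some text https://embed.myweb.com/page/cat/323123442321-rghe432 and more
```

Group 3 is nested inside group 2, so the replacement can still rebuild the complete URL from $m[2] while $rest keeps only the leading part.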