Regex only for specific domain name in URL

Question

As much as I've tried I can't seem to find the correct regex to locate what I'm after here.

I only want to select the first instance of the URL that matches the domain www.myweb.com from the following...

Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr

I need to completely ignore the second URL (www.adifferentsite.com) and only work with the first one that matches www.myweb.com, ignoring any other possible instances of www.myweb.com.

Once the first matching domain is found, I need to store the rest of the URL that comes after it...

page/cat/323123442321-rghe432

...into a new variable $newvar, so...

$newvar = 'page/cat/323123442321-rghe432';

I'm trying :

return preg_replace_callback( '/http://www.myweb.com/\/[0-9a-zA-Z]+/', array( __CLASS__, 'my_callback' ), $newvar );

I've read tons of documents on how to detect URLs but can't find anything about detecting a specific URL.

I really can't grasp how to write regular expressions, so I know the pattern above is incorrect. Any help would be greatly appreciated.

EDIT Edited the question to be a bit more specific and hopefully a bit easier to resolve.

If you need to match, why replace? And when you created the regex, did you pay attention to the regex delimiters? I guess you get an Unknown delimiters error.
– Wiktor Stribiżew, Commented Dec 2, 2015 at 12:20
I'm replacing because I'm going to turn the link into an oEmbed link, where the URL format is different. So myweb.com/page/cat/323123442321-rghe432 will become embed.myweb.com/page/cat/323123442321-rghe432 when rendered after this filter.
– Grant, Commented Dec 2, 2015 at 12:22
Regex-wise, I just don't understand how it works. I've read and read and read and just can't seem to grasp the correct way of doing it.
– Grant, Commented Dec 2, 2015 at 12:25
Ok, I think you can just use the '~\bhttps?://www\.myweb\.com/(\S+)~' regex and push $m[1] into the array for "the rest of URL".
– Wiktor Stribiżew, Commented Dec 2, 2015 at 12:25

Wiktor Stribiżew · Accepted Answer · 2015-12-02 13:21:06Z

You can use preg_replace_callback and pass an array by reference into the anonymous function (or into your own callback function) to fill it with all the necessary URL parts.

Here is a demo:

$rests = array();
$re = '~\b(https?://)www\.myweb\.com/(\S+)~'; 
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr"; 
echo $result = preg_replace_callback($re, function ($m) use (&$rests) {
    array_push($rests, $m[2]);
    return $m[1] . "embed.myweb.com/" . $m[2];
}, $str) . PHP_EOL;
print_r($rests);

Results:

Some text https://embed.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr
Array
(
    [0] => page/cat/323123442321-rghe432
)

A couple of words:

'~\b(https?://)www\.myweb\.com/(\S+)~' uses ~ as the regex delimiter, so you do not have to escape the / characters inside the pattern.
It is declared as a single-quoted literal, so backslash sequences such as \S reach the regex engine as written and you do not need to double the backslashes (\\S).
It captures 2 substrings into capturing groups: (https?://), preceded by the word boundary \b, matches http or https followed by ://, and (\S+) matches 1 or more non-whitespace characters. Capturing groups are marked with (...) in the pattern and are available in the callback as $m[n], where n is the number of the capturing group ($m[0] is the whole match).
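If you only want to read the groups and not replace anything, the same pattern can be tried with preg_match, which stops at the first match. This is just a minimal sketch to show what ends up in each group; $newvar is the variable name from the question, and the input is the sample string from above.

```php
<?php
// Same pattern as in the answer: ~ is the delimiter, so / needs no escaping.
$re  = '~\b(https?://)www\.myweb\.com/(\S+)~';
$str = 'Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr';

$newvar = null;

// preg_match() returns 1 on the first match and fills $m with the groups.
if (preg_match($re, $str, $m) === 1) {
    echo $m[0], PHP_EOL; // whole match: https://www.myweb.com/page/cat/323123442321-rghe432
    echo $m[1], PHP_EOL; // group 1:     https://
    echo $m[2], PHP_EOL; // group 2:     page/cat/323123442321-rghe432
    $newvar = $m[2];     // the part after the domain
}
```

The second URL is never touched because the domain is spelled out literally in the pattern (the dots are escaped, so they only match a real dot).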

UPDATE

If you only need to replace the first occurrence of the URL, pass the limit argument to the preg_replace_callback:

$rest = "";
$re = '~\b(https?://)www\.myweb\.com/(\S+\b)~'; 
$str = "Some text https://www.myweb.com/page/cat/323123442321-rghe432, another http://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr"; 
echo $result = preg_replace_callback($re, function ($m) use (&$rest) {
    $rest = $m[2];
    return $m[1] . "embed.myweb.com/" . $m[2];
}, $str, 1) . PHP_EOL;
// The last argument (1) is the limit: only the first match is replaced
echo $rest;
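To see what the limit argument changes, here is a small sketch that runs the same replacement twice and uses the optional fifth argument of preg_replace_callback to count the replacements. The short paths /one, /two and /three and the variable names are made up for the illustration; they are not from the question.

```php
<?php
$re  = '~\b(https?://)www\.myweb\.com/(\S+\b)~';
// Hypothetical sample text with two matching URLs and one foreign URL.
$str = 'A https://www.myweb.com/one, B http://www.myweb.com/two and C https://www.adifferentsite.com/three';

// Rewrites a matched URL to the embed host, keeping scheme and path.
$toEmbed = function (array $m): string {
    return $m[1] . 'embed.myweb.com/' . $m[2];
};

// Limit -1 (the default) replaces every match; limit 1 stops after the first.
$all   = preg_replace_callback($re, $toEmbed, $str, -1, $countAll);
$first = preg_replace_callback($re, $toEmbed, $str, 1, $countFirst);

echo $countAll, PHP_EOL;   // 2
echo $countFirst, PHP_EOL; // 1
echo $first, PHP_EOL;      // only the first myweb.com URL is rewritten
```

The trailing \b inside the second group is what keeps the comma after the first URL out of the match.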


edited Dec 2, 2015 at 13:21

answered Dec 2, 2015 at 13:08

Wiktor Stribiżew


11 Comments

Grant Over a year ago

This works perfectly, but I don't need an array in this instance. I only need to match and collect the url for the first instance of the url. Is this routine possible without using an array?

Wiktor Stribiżew Over a year ago

Do you mean that you have Some text https://www.myweb.com/page1 and https://www.myweb.com/page2 and you want only the first one replaced? Use 1 as the last argument to preg_replace_callback.

Grant Over a year ago

Thank you @stribizhev that is perfect!

Wiktor Stribiżew Over a year ago

Use a negated character class, [^/]*, to match 0 or more characters other than /, so that each piece of the match stays inside one part of the URL.

Wiktor Stribiżew Over a year ago

Ok, use this one:

$re = '~(https?://)www\.myweb\.com/(([^/]*/[^/]*)\S+\b)~';

and then

$result = preg_replace_callback($re, function ($m) use (&$rest) {
    $rest = $m[3];
    return $m[1] . "embed.myweb.com/" . $m[2];
}, $str, 1);
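For anyone reading the last pattern later: it nests one capturing group inside another, and groups are numbered by the position of their opening parenthesis. The sketch below only prints the groups with preg_match so the numbering is visible; it uses the pattern from the comment and the sample string from the question, and it assumes the URL has at least three path segments, as it does here.

```php
<?php
$re  = '~(https?://)www\.myweb\.com/(([^/]*/[^/]*)\S+\b)~';
$str = 'Some text https://www.myweb.com/page/cat/323123442321-rghe432 and then another https://www.adifferentsite.com/fsdhjss/erwr';

if (preg_match($re, $str, $m) === 1) {
    echo $m[1], PHP_EOL; // group 1 (scheme):              https://
    echo $m[2], PHP_EOL; // group 2 (everything after /):  page/cat/323123442321-rghe432
    echo $m[3], PHP_EOL; // group 3 (first two segments):  page/cat
}
```

Note that [^/]* will also match spaces, since a space is a character other than /. If that matters for your input, [^/\s]* may be a safer choice, but that is a suggestion on top of the comment, not something tested against your real data.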
