
I have been looking online (and here too) for every possible function for getting the domain from a URL. The latest code I found is this one: https://gist.github.com/pocesar/5366899

            <?php
            /**
             * @param string $domain Pass $_SERVER['SERVER_NAME'] here
             * @param bool $debug Print each parsing step as HTML
             *
             * @return string
             */
            function get_domain($domain, $debug = false)
            {
                $original = $domain = strtolower($domain);

                if (filter_var($domain, FILTER_VALIDATE_IP)) { return $domain; }

                $debug ? print('<strong style="color:green">&raquo;</strong> Parsing: '.$original) : false;

                $arr = array_slice(array_filter(explode('.', $domain, 4), function($value){
                    return $value !== 'www';
                }), 0); //rebuild array indexes

                if (count($arr) > 2)
                {
                    $count = count($arr);
                    $_sub = explode('.', $count === 4 ? $arr[3] : $arr[2]);

                    $debug ? print(" (parts count: {$count})") : false;

                    if (count($_sub) === 2) // two level TLD
                    {
                        $removed = array_shift($arr);
                        if ($count === 4) // got a subdomain acting as a domain
                        {
                            $removed = array_shift($arr);
                        }
                        $debug ? print("<br>\n" . '[*] Two level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
                    }
                    elseif (count($_sub) === 1) // one level TLD
                    {
                        $removed = array_shift($arr); //remove the subdomain

                        if (strlen($_sub[0]) === 2 && $count === 3) // TLD domain must be 2 letters
                        {
                            array_unshift($arr, $removed);
                        }
                        else
                        {
                            // non country TLD according to IANA
                            $tlds = array(
                                'aero',
                                'arpa',
                                'asia',
                                'biz',
                                'cat',
                                'com',
                                'coop',
                                'edu',
                                'gov',
                                'info',
                                'jobs',
                                'mil',
                                'mobi',
                                'museum',
                                'name',
                                'net',
                                'org',
                                'post',
                                'pro',
                                'tel',
                                'travel',
                                'xxx',
                            );

                            if (count($arr) > 2 && in_array($_sub[0], $tlds) !== false) //special TLD don't have a country
                            {
                                array_shift($arr);
                            }
                        }
                        $debug ? print("<br>\n" .'[*] One level TLD: <strong>'.join('.', $_sub).'</strong> ') : false;
                    }
                    else // more than 3 levels, something is wrong
                    {
                        for ($i = count($_sub); $i > 1; $i--)
                        {
                            $removed = array_shift($arr);
                        }
                        $debug ? print("<br>\n" . '[*] Three level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
                    }
                }
                elseif (count($arr) === 2)
                {
                    $arr0 = array_shift($arr);

                    if (strpos(join('.', $arr), '.') === false
                        && in_array($arr[0], array('localhost','test','invalid')) === false) // not a reserved domain
                    {
                        $debug ? print("<br>\n" .'Seems invalid domain: <strong>'.join('.', $arr).'</strong> re-adding: <strong>'.$arr0.'</strong> ') : false;
                        // seems invalid domain, restore it
                        array_unshift($arr, $arr0);
                    }
                }

                $debug ? print("<br>\n".'<strong style="color:gray">&laquo;</strong> Done parsing: <span style="color:red">' . $original . '</span> as <span style="color:blue">'. join('.', $arr) ."</span><br>\n") : false;

                return join('.', $arr);
            }

            $urls = array(
                'www.example.com' => 'example.com',
                'example.com' => 'example.com',
                'example.com.br' => 'example.com.br',
                'www.example.com.br' => 'example.com.br',
                'www.example.gov.br' => 'example.gov.br',
                'localhost' => 'localhost',
                'www.localhost' => 'localhost',
                'subdomain.localhost' => 'localhost',
                'www.subdomain.example.com' => 'example.com',
                'subdomain.example.com' => 'example.com',
                'subdomain.example.com.br' => 'example.com.br',
                'www.subdomain.example.com.br' => 'example.com.br',
                'www.subdomain.example.biz.br' => 'example.biz.br',
                'subdomain.example.biz.br' => 'example.biz.br',
                'subdomain.example.net' => 'example.net',
                'www.subdomain.example.net' => 'example.net',
                'www.subdomain.example.co.kr' => 'example.co.kr',
                'subdomain.example.co.kr' => 'example.co.kr',
                'example.co.kr' => 'example.co.kr',
                'example.jobs' => 'example.jobs',
                'www.example.jobs' => 'example.jobs',
                'subdomain.example.jobs' => 'example.jobs',
                'insane.subdomain.example.jobs' => 'example.jobs',
                'insane.subdomain.example.com.br' => 'example.com.br',
                'www.doubleinsane.subdomain.example.com.br' => 'example.com.br',
                'www.subdomain.example.jobs' => 'example.jobs',
                'test' => 'test',
                'www.test' => 'test',
                'subdomain.test' => 'test',
                'www.detran.sp.gov.br' => 'sp.gov.br',
                'www.mp.sp.gov.br' => 'sp.gov.br',
                'ny.library.museum' => 'library.museum',
                'www.ny.library.museum' => 'library.museum',
                'ny.ny.library.museum' => 'library.museum',
                'www.library.museum' => 'library.museum',
                'info.abril.com.br' => 'abril.com.br',
                '127.0.0.1' => '127.0.0.1',
                '::1' => '::1',
            );

            $failed = 0;
            $total = count($urls);

            foreach ($urls as $from => $expected)
            {
                $result = get_domain($from, true);
                if ($result !== $expected)
                {
                    $failed++;
                    print("<div style='color:fuchsia;'>expected {$from} to be {$expected}, got {$result}</div>");
                }
            }

            if ($failed)
            {
                print("{$failed} tests failed out of {$total}");
            }
            else
            {
                print("Success");
            }

But I found that it does not work in these cases:

blog.ebaum.tv
api.outside.in
chip.cuccio.us
brushes.net.tc
beta.wua.la
core.windows.net
dd.cron.ru
compute-1.amazonaws.com
docs.rinet.ru
dupont.free.fr
edusim.greenbush.us
dtek.chalmers.se
fifthgear.five.tv
friizu.pri.ee
fortune.cnn.com
grondziowski.neostrada.pl
iden.tify.us
fb.joyent.us
blog.tr.im
jspec.jaxa.jp
mashable.blogs.mu
lists.burri.to
com.edgesuite.net
my.noovo.us
blog.bit.ly
moon.dominos.jp

For all of the hosts above, the function returns the subdomain instead of the domain. Does anybody have an idea how to fix this function?
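A different approach is to stop inferring the suffix from label counts and TLD lengths and instead look the host up in a list of known public suffixes, keeping the longest matching suffix plus one more label. This is a minimal sketch under stated assumptions: `SUFFIXES` and `registrable_domain()` are hypothetical names, the suffix set is a small hand-picked subset for illustration only, and falling back to the last two labels for an unknown suffix is an assumption rather than a rule. A real implementation would load the full Public Suffix List from https://publicsuffix.org/list/.

```php
<?php
// Hypothetical, hand-picked subset of public suffixes, for illustration only.
// A real implementation would load the full Public Suffix List instead.
const SUFFIXES = [
    'com', 'net', 'jobs', 'museum', 'tv', 'in', 'us', 'la', 'ru', 'fr',
    'se', 'ee', 'pl', 'jp', 'mu', 'to', 'ly', 'im', 'br', 'kr', 'tc',
    'com.br', 'gov.br', 'co.kr', 'net.tc', 'pri.ee',
];

function registrable_domain(string $host): string
{
    // Normalize case and drop a trailing root dot ("example.com.").
    $host = strtolower(rtrim($host, '.'));

    // IP addresses have no registrable domain; return them unchanged.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return $host;
    }

    $labels = explode('.', $host);
    $n = count($labels);

    // Try candidate suffixes from longest (whole host) to shortest (last label);
    // the first hit is therefore the longest public suffix in the host.
    for ($i = 0; $i < $n; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, SUFFIXES, true)) {
            // Registrable domain = the suffix plus one more label,
            // unless the host itself is the suffix.
            return $i === 0 ? $host : implode('.', array_slice($labels, $i - 1));
        }
    }

    // Unknown suffix (e.g. "localhost"): assume the last two labels.
    return implode('.', array_slice($labels, -2));
}
```

With this subset, `registrable_domain('blog.ebaum.tv')` returns `ebaum.tv` and `registrable_domain('dupont.free.fr')` returns `free.fr`. The quality of the result depends entirely on how complete the suffix list is.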

  • The function always returns the third level domain under country-specific top-level domains. It assumes they're all of the form organization.type.country, e.g. oxford.ac.uk and example.co.cr. Commented Dec 27, 2014 at 17:33
  • @Barmar do you have an idea how to fix it? Also, "compute-1.amazonaws.com" is another bug, and "dupont.free.fr" should give "free.fr", etc. Commented Dec 27, 2014 at 17:57
  • Without knowing the rules for every country, and all the exceptions, I don't think there's any good solution. Commented Dec 27, 2014 at 17:59
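The "rules for every country, and all the exceptions" are roughly what the Public Suffix List encodes, using three kinds of rules: plain rules (such as `co.uk`), wildcard rules (such as `*.kawasaki.jp`, where `*` matches any single label) and exception rules (such as `!city.kawasaki.jp`, which override a wildcard). The following is a minimal, non-authoritative sketch of the matching logic described in the list's format documentation: if an exception rule matches, it wins and its leftmost label is not part of the suffix; otherwise the matching rule with the most labels wins; an unmatched host treats its last label as the suffix. `RULES`, `rule_matches()`, `public_suffix_length()` and `psl_registrable_domain()` are hypothetical names, and the rule set is only an illustration of the format, not a copy of the list.

```php
<?php
// Illustrative rules in Public Suffix List syntax (not the real, complete list).
const RULES = ['com', 'jp', '*.kawasaki.jp', '!city.kawasaki.jp', 'uk', 'co.uk'];

// Does $rule (labels, possibly containing '*') match the right end of $labels?
function rule_matches(array $rule, array $labels): bool
{
    $r = count($rule);
    if ($r > count($labels)) {
        return false;
    }
    $tail = array_slice($labels, -$r); // re-indexed from 0, like $rule
    foreach ($rule as $k => $label) {
        if ($label !== '*' && $label !== $tail[$k]) {
            return false;
        }
    }
    return true;
}

// Number of trailing labels that form the public suffix.
function public_suffix_length(array $labels): int
{
    $best = 1; // implicit "*" rule: an unknown TLD counts as one label
    foreach (RULES as $rule) {
        $isException = $rule[0] === '!';
        $ruleLabels = explode('.', ltrim($rule, '!'));
        if (!rule_matches($ruleLabels, $labels)) {
            continue;
        }
        if ($isException) {
            // A matching exception wins outright; drop its leftmost label.
            return count($ruleLabels) - 1;
        }
        $best = max($best, count($ruleLabels));
    }
    return $best;
}

function psl_registrable_domain(string $host): ?string
{
    $labels = explode('.', strtolower($host));
    $len = public_suffix_length($labels);
    // A host that is itself a public suffix has no registrable domain.
    if (count($labels) <= $len) {
        return null;
    }
    return implode('.', array_slice($labels, -($len + 1)));
}
```

Here `psl_registrable_domain('foo.city.kawasaki.jp')` yields `city.kawasaki.jp` because the exception rule applies, while `www.example.kawasaki.jp` comes back whole, since the wildcard rule makes `example.kawasaki.jp` itself a public suffix.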

2 Answers


Try:

function getDomain($address) {

    # Keys are quoted: bare words like Hostname are undefined constants (a fatal error in PHP 8)
    $uri = [];

    # Establishes Hostname (everything before the first dot)
    $uri['Hostname'] = substr($address, 0, strpos($address, '.'));

    # Establishes Domainname (everything after the first dot)
    $uri['Domainname'] = substr($address, strlen($uri['Hostname']) + 1);

    # Drops any path after the host
    if (preg_match("/\//", $uri['Domainname'])) {
        $uri['Domainname'] = substr($uri['Domainname'], 0, strpos($uri['Domainname'], '/'));
    }

    # Establishes TLD
    if (preg_match("/\./", $uri['Domainname'])) {
        $uri['TLD'] = substr($uri['Domainname'], strpos($uri['Domainname'], '.') + 1);
        $uri['Domainname'] = substr($uri['Domainname'], 0, -(strlen($uri['TLD']) + 1));
    }

    if (isset($uri['TLD']) && preg_match("/\//", $uri['TLD'])) {
        $uri['TLD'] = substr($uri['TLD'], 0, strpos($uri['TLD'], '/'));
    }

    # Re-labels parts if there are only 2 (instead of 3)
    if (count($uri) == 2) {
        $uri['TLD'] = $uri['Domainname'];
        $uri['Domainname'] = $uri['Hostname'];
        unset($uri['Hostname']);
    }

    # Added to handle domains of type .co.rs, .co.uk, .co.jp etc.
    if (isset($uri['Hostname']) && $uri['Domainname'] == 'co') {
        $uri['TLD'] = $uri['Domainname'] . '.' . $uri['TLD'];
        $uri['Domainname'] = $uri['Hostname'];
        unset($uri['Hostname']);
    }

    return $uri;
}

This function takes any standard web address (i.e. one that does not include multiple subdomains) and returns an array containing the hostname (if present), the domain name, and the TLD.


3 Comments

  • If you also need to process web addresses that include multiple subdomains, you can turn the block containing the 2nd and 3rd if statements into a loop to extract every subdomain from the web address.
  • Does not work. For the domain "slajer.co.rs" it returns "Array ( [Hostname] => slajer [Domainname] => co [TLD] => rs )", but .co.rs is the TLD.
  • Yep. I have added a block at the end to handle TLDs like .co.rs, .co.uk, .co.jp etc.
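The idea from the first comment (handling any number of subdomains) can be sketched without fixed Hostname/Domainname slots: split the host on dots, take labels off the right end for the TLD and the domain name, and whatever remains is the list of subdomains. Instead of a literal loop, this sketch uses array_pop(), which works for any depth. `splitHost()` and the `Subdomains` key are hypothetical, and `SECOND_LEVEL` is an assumed, incomplete set of labels (generalizing the answer's `co` check) that get folded into a two-label TLD such as .co.rs; like the answer, it cannot recognize suffixes outside that set.

```php
<?php
// Assumed, incomplete set of second-level labels that form part of a
// two-label TLD (e.g. co.rs, co.uk, com.br). Illustrative only.
const SECOND_LEVEL = ['co', 'com', 'ac', 'gov', 'org', 'net'];

function splitHost(string $host): array
{
    $labels = explode('.', strtolower($host));

    // The last label is always part of the TLD.
    $tld = [array_pop($labels)];

    // Fold a known second-level label into the TLD (co + rs => co.rs),
    // but only if a domain-name label would still remain.
    if (count($labels) > 1 && in_array(end($labels), SECOND_LEVEL, true)) {
        array_unshift($tld, array_pop($labels));
    }

    // The next label from the right is the registered name, e.g. "slajer".
    $domain = array_pop($labels);

    return [
        'Subdomains' => $labels, // whatever is left, outermost first
        'Domainname' => $domain,
        'TLD'        => implode('.', $tld),
    ];
}
```

For example, `splitHost('a.b.example.co.uk')` gives Subdomains `['a', 'b']`, Domainname `example` and TLD `co.uk`.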

You can easily get the domain name of the current request from the $_SERVER superglobal, like this:

     echo $_SERVER['SERVER_NAME'];

OR

First, create a function like the following:

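<?php

function getDomain($url) {
    // FILTER_FLAG_HOST_REQUIRED was deprecated in PHP 7.3 and removed in PHP 8.0,
    // so only FILTER_VALIDATE_URL is passed here.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    /*** to get the url parts ***/
    $parts = parse_url($url);

    // Some valid URLs (e.g. mailto:) have no host component
    if (!isset($parts['scheme'], $parts['host'])) {
        return false;
    }

    /*** return the scheme and host ***/
    return $parts['scheme'] . '://' . $parts['host'];
}
?>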

Then call the function like this:

<?php

  $url = 'http://phpro.org/classes/Phproogle-Docs.html';
  echo getDomain($url);
?>
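The function above returns the scheme plus the full host (for the example URL, http://phpro.org), not a bare domain. If only the host name is wanted, a small variant can ask parse_url() for the PHP_URL_HOST component directly. `getHost()` is a hypothetical name; this sketch returns null rather than false for invalid input, and it still returns the full host including any subdomains, so reducing it to the registrable domain would need a suffix list.

```php
<?php
// Returns only the lower-cased host of an absolute URL, or null if the
// string is not a valid URL or has no host component.
function getHost(string $url): ?string
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

echo getHost('http://phpro.org/classes/Phproogle-Docs.html'), PHP_EOL; // phpro.org
```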

2 Comments

  • It's definitely a useful global variable to know. But... it won't extract a domain from a URI.
  • Now it should hopefully return the exact domain. If it doesn't work, just send an example of the domain format you expect.
