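To find the domain of any given URL (e.g. google.co.uk for books.google.co.uk), a three-step approach seems to work best: (1) extract the hostname, (2) walk the hostname from the top-level domain downwards and take the first name that has a valid A record, and (3) discard answers that are only the ISP's default servers for nonexistent domains.
I have only run a few tests, but the results look as expected. The function below echoes its result directly, but it can be modified to return the domain name instead: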
<?php
getDomain("http://www.stackoverflow.com");
getDomain("http://www.google.co.uk");
getDomain("http://books.google.co.uk");
getDomain("http://a.b.c.google.co.uk");
getDomain("http://www.nominet.org.uk/intelligence/statistics/registration/");
getDomain("http://invalid.fail.pooo");
getDomain("http://AnotherOneThatShouldFail.com");

function getDomain($url) {
    echo "Searching Domain for '" . $url . "': ";

    // Step 1: Get the actual hostname (null or false if the URL has none).
    $actualHostname = parse_url($url, PHP_URL_HOST);
    if (!is_string($actualHostname) || $actualHostname === "") {
        echo " not resolved.<br />";
        return;
    }

    // Step 2: Top-down approach: check DNS records for the first valid A record.
    // Re-assemble the hostname label by label, i.e. for www.google.co.uk, check:
    // - uk
    // - co.uk
    // - google.co.uk (will match here)
    // - www.google.co.uk (will be skipped)
    $domainParts = explode(".", strtolower(rtrim($actualHostname, ".")));
    $currentCountry = end($domainParts); // top-level domain, e.g. "uk"

    for ($i = count($domainParts) - 1; $i >= 0; $i--) {
        $domain = implode(".", array_slice($domainParts, $i));

        $validRecord = checkdnsrr($domain, "A"); // looking for A (IPv4 address) records
        if ($validRecord) {
            // If the host can be resolved to an IP, it seems valid.
            // gethostbyname() returns the hostname unchanged on failure, which means invalid.
            $hostIp = gethostbyname($domain);
            $validRecord = ($hostIp !== $domain);
        }
        if ($validRecord) {
            // Step 3: the DNS server might answer with one of the ISP's default server IPs
            // for invalid domains. Query a name under the same top-level domain that is
            // certainly invalid to obtain that IP list, then compare it with $hostIp.
            // gethostbynamel() returns false if the name does not resolve, hence "?: []".
            $defaultIps = gethostbynamel("iiiiiiiiiiiiiiiiiinvaliddomain." . $currentCountry) ?: [];
            $validRecord = !in_array($hostIp, $defaultIps, true);
        }
        if ($validRecord) {
            // return $domain;
            echo $domain . "<br />";
            return;
        }
    }
    // return null;
    echo " not resolved.<br />";
}
?>
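Step 2 boils down to listing the hostname's suffixes from shortest to longest. A minimal sketch of just that part as a pure function with no DNS access, assuming the hostname has already been extracted with parse_url(); domainCandidates() is a hypothetical name and not part of getDomain():

```php
<?php
// Hypothetical helper (not part of getDomain()): list every suffix of a
// hostname, shortest first, in the order step 2 checks them.
function domainCandidates(string $host): array
{
    // DNS names are case-insensitive and may end with a root dot, so normalise.
    $labels = explode(".", strtolower(rtrim($host, ".")));
    $candidates = [];
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        // Join the labels from position $i to the end, e.g. "co" + "uk" -> "co.uk".
        $candidates[] = implode(".", array_slice($labels, $i));
    }
    return $candidates;
}

echo implode(", ", domainCandidates("www.google.co.uk")), "\n";
// uk, co.uk, google.co.uk, www.google.co.uk
```

Keeping this part free of network calls makes the lookup order easy to check on its own.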
Output of the getDomain() calls above:
Searching Domain for 'http://www.stackoverflow.com': stackoverflow.com
Searching Domain for 'http://www.google.co.uk': google.co.uk
Searching Domain for 'http://books.google.co.uk': google.co.uk
Searching Domain for 'http://a.b.c.google.co.uk': google.co.uk
Searching Domain for 'http://www.nominet.org.uk/intelligence/statistics/registration/': nominet.org.uk
Searching Domain for 'http://invalid.fail.pooo': not resolved.
Searching Domain for 'http://AnotherOneThatShouldFail.com': not resolved.
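This is only a very limited set of test cases, but I cannot imagine a case where a domain has no A record. Note that the approach depends on exactly that: it assumes the domain itself has an A record and that no public suffix above it (uk, co.uk, com, ...) has one; otherwise the first name that does resolve is returned instead.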
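A small offline illustration of that first-match behaviour, using a hypothetical firstResolving() helper and a made-up list of names standing in for "has a usable A record" (example.com is a placeholder, not real DNS data):

```php
<?php
// Hypothetical helper: return the shortest suffix of $host that appears in
// $namesWithA (a stand-in for "resolves to a valid A record"), or null.
function firstResolving(string $host, array $namesWithA): ?string
{
    $labels = explode(".", strtolower($host));
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        $candidate = implode(".", array_slice($labels, $i));
        if (in_array($candidate, $namesWithA, true)) {
            return $candidate;
        }
    }
    return null;
}

// Normal case: the domain itself has an A record.
echo firstResolving("www.example.com", ["example.com", "www.example.com"]), "\n"; // example.com
// Domain without its own A record: the longer www name comes back.
echo firstResolving("www.example.com", ["www.example.com"]), "\n";                // www.example.com
// Public suffix with an A record: the suffix itself wins.
echo firstResolving("www.example.com", ["com", "example.com"]), "\n";             // com
```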
As a nice side effect, this also validates URLs against live DNS instead of relying only on a syntactically valid format, as the last two examples show.
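Following the suggestion at the top, here is a sketch of a variant that returns the domain (or null) instead of echoing it, so URL validation becomes a simple null check. It assumes the DNS lookups are passed in as callables so the logic can be tried offline; findDomain(), $dnsResolve, $fakeResolve and the other names are hypothetical. The fake resolver imitates an ISP that answers 10.0.0.1 for every nonexistent name, and the 192.0.2.x addresses are documentation placeholders. Unlike getDomain(), it runs the step 3 probe once up front rather than per candidate; since the probe depends only on the top-level domain, the result is the same.

```php
<?php
// Hypothetical returning variant of getDomain() with injectable DNS lookups.
// $resolve($name) returns an IPv4 address string or null;
// $resolveAll($name) returns the list of IPs for $name ([] if none).
function findDomain(string $url, callable $resolve, callable $resolveAll): ?string
{
    // Step 1: extract the hostname.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === "") {
        return null;
    }
    $labels = explode(".", strtolower(rtrim($host, ".")));

    // Step 3 probe: IPs the resolver hands out for a name that cannot exist.
    $bogusIps = $resolveAll("iiiiiiiiiiiiiiiiiinvaliddomain." . end($labels));

    // Step 2: try suffixes from shortest to longest; the first valid one wins.
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        $domain = implode(".", array_slice($labels, $i));
        $ip = $resolve($domain);
        if ($ip !== null && !in_array($ip, $bogusIps, true)) {
            return $domain;
        }
    }
    return null;
}

// Real lookups (need network access), mirroring the checks in getDomain().
$dnsResolve = function (string $name): ?string {
    if (!checkdnsrr($name, "A")) {
        return null;
    }
    $ip = gethostbyname($name); // returns $name unchanged on failure
    return $ip !== $name ? $ip : null;
};
$dnsResolveAll = fn(string $name): array => gethostbynamel($name) ?: [];

// Offline fake: every unknown name "resolves" to 10.0.0.1, like a hijacking ISP.
$records = ["google.co.uk" => "192.0.2.10", "www.google.co.uk" => "192.0.2.11"];
$fakeResolve = fn(string $n): ?string => $records[$n] ?? "10.0.0.1";
$fakeResolveAll = fn(string $n): array => [$records[$n] ?? "10.0.0.1"];

var_dump(findDomain("http://books.google.co.uk", $fakeResolve, $fakeResolveAll)); // string(12) "google.co.uk"
var_dump(findDomain("http://invalid.fail.pooo", $fakeResolve, $fakeResolveAll));  // NULL
```

In real use you would pass $dnsResolve and $dnsResolveAll; the fake pair only shows the control flow without network access.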
best,
dognose