29

Capture the domain up to the first terminating character: end of string ($), ?, /, or :. I need a regex that captures example.com in all of these:

example.com:3000
example.com?pass=gas
example.com/
example.com
2

6 Answers

57

If you actually have valid URLs, this will work:

var urls = [
    'http://example.com:3000',
    'http://example.com?pass=gas',
    'http://example.com/',
    'http://example.com'
];

// forEach avoids the implicit global and the index-based for...in loop
urls.forEach(function (u) {
    var a = document.createElement('a');
    a.href = u;
    console.log(a.hostname);
});

//=> example.com
//=> example.com
//=> example.com
//=> example.com

Note: using a regex for this kind of thing is silly when the environment you're using already has built-in methods for it.

Other properties are available on anchor (a) elements:

var a = document.createElement('a');
a.href = "http://example.com:3000/path/to/something?query=string#fragment";

a.protocol   //=> http:
a.hostname   //=> example.com
a.port       //=> 3000
a.pathname   //=> /path/to/something
a.search     //=> ?query=string
a.hash       //=> #fragment
a.host       //=> example.com:3000
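The same property names are also exposed by the WHATWG URL class, which needs no DOM. A minimal sketch, assuming a runtime where URL is a global (modern browsers, Node 10 and later); the variable names parts and summary are only for illustration:

```javascript
// Parse the same URL without creating an anchor element.
const parts = new URL('http://example.com:3000/path/to/something?query=string#fragment');

// Collect the properties listed above into a plain object for inspection.
const summary = {
  protocol: parts.protocol,
  hostname: parts.hostname,
  port: parts.port,
  pathname: parts.pathname,
  search: parts.search,
  hash: parts.hash,
  host: parts.host
};

console.log(summary);
```

Note that port is returned as a string, just as it is on an anchor element.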

EDIT #2

Upon further consideration, I looked into the Node.js docs and found this little gem: url#parse

The code above can be rewritten as:

var url = require('url');

var urls = [
    'http://example.com:3000',
    'http://example.com?pass=gas',
    'http://example.com/',
    'http://example.com'
];

// forEach avoids the implicit global and the index-based for...in loop
urls.forEach(function (u) {
    console.log(url.parse(u).hostname);
});

//=> example.com
//=> example.com
//=> example.com
//=> example.com

EDIT #1

See the revision history of this post if you'd like to see how to solve this problem using jsdom and nodejs


11 Comments

javascript but I would really just like a regex
This would be great, but I'm working server-side. No doc =[. Might be a way to fake it.
Have you heard of jsdom? Also, you should've mentioned you were using something like node.js in the tags :P
@ThomasReggi, I discovered that nodejs has its own url#parse method. Please see Edit #2 above.
Using DOM objects is not a JS feature but a DOM binding feature. The DOM doesn't exist in many JS environments. Also, it is very slow, and the proper way to perform simple string parsing is EXACTLY using regexps.
32

Since you're using node, just use the built-in url.parse() method; you want the resulting hostname property:

var url=require('url');
var urls = [
  'http://example.com:3000',
  'http://example.com?pass=gas',
  'http://example.com/',
  'http://example.com'
];

urls.forEach(function(x) {
  console.log(url.parse(x).hostname);
});

4 Comments

returns { pathname: '0', path: '0', href: '0' } { pathname: '1', path: '1', href: '1' } { pathname: '2', path: '2', href: '2' } { pathname: '3', path: '3', href: '3' }
Goofed-up test harness (copied from another answer), updated in my answer. Lesson: don't use for (...in...) to iterate over arrays.
it includes subdomain
@MuhammadUmer subdomain is part of the hostname.
28

A new challenger has appeared. According to node docs, you can also use

   var url = new URL(urlString);
   console.log(url.hostname);

https://nodejs.org/api/url.html#url_the_whatwg_url_api

This seems to be a more current way.
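One caveat: the URL constructor expects an absolute URL, so the scheme-less strings from the question (example.com:3000 and so on) need a scheme before they parse as intended. A rough sketch of one way to handle that, assuming http:// is an acceptable default; hostnameOf is a hypothetical helper name, not part of any API:

```javascript
// Hypothetical helper: return the hostname, or null if the input cannot be parsed.
function hostnameOf(input) {
  // Assumption: inputs without "scheme://" are treated as http URLs.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  try {
    return new URL(hasScheme ? input : 'http://' + input).hostname;
  } catch (err) {
    // new URL throws a TypeError on input it cannot parse.
    return null;
  }
}

const inputs = [
  'example.com:3000',
  'example.com?pass=gas',
  'example.com/',
  'example.com',
  'http://example.com:3000'
];

const hostnames = inputs.map(hostnameOf);
console.log(hostnames);
```

Each of the five inputs yields example.com. Prepending a default scheme is a design choice for this sketch; stricter code might reject scheme-less input instead.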


6

I'm using Node ^10 and this is how I extract the hostname from a URL.

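var URL = require('url') // assumption: URL is Node's built-in 'url' module; the global URL class in Node 10 has no static parse()
var url = URL.parse('https://stackoverflow.com/q/13506460/2535178')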
console.log(url.hostname)
//=> stackoverflow.com


1

I recommend using the new URL class that is now included in most browsers.

var urls = [
  'http://example.com:3000',
  'http://example.com?pass=gas',
  'http://example.com/',
  'http://example.com'
];

urls.forEach(url => {
  const u = new URL(url)
  console.log(u.hostname)
})


0
/^((?:[a-z0-9-_]+\.)*[a-z0-9-_]+\.?)(?::([0-9]+))?(.*)$/i

The capture groups are, in order: host, port, and path.
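To illustrate, here is a small sketch that applies this regex to the four scheme-less inputs from the question. The helper name splitHostPortPath is hypothetical, and the hyphen has been moved to the end of each character class for readability, which does not change the set of accepted characters:

```javascript
// Regex from the answer, with "-" moved to the end of each character class.
const HOST_PORT_PATH = /^((?:[a-z0-9_-]+\.)*[a-z0-9_-]+\.?)(?::([0-9]+))?(.*)$/i;

// Hypothetical helper: return the three capture groups as an object,
// or null when the input does not start with a host-like token.
function splitHostPortPath(input) {
  const match = HOST_PORT_PATH.exec(input);
  if (!match) return null;
  return { host: match[1], port: match[2], path: match[3] };
}

const samples = [
  'example.com:3000',
  'example.com?pass=gas',
  'example.com/',
  'example.com'
];

const results = samples.map(splitHostPortPath);
results.forEach(function (r) {
  console.log(r.host, r.port, r.path);
});
```

In every case host is example.com. The port group is undefined when no port is present, and the third group holds whatever follows (a query string, a slash, or nothing). As the comments below point out, this pattern assumes there is no scheme prefix such as http://.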

4 Comments

Does not work : s="stackoverflow.com/questions/13506460/…" s.match(/^((?:[a-z0-9-]+\.)*[a-z0-9-]+\.?)(?::([0-9]+))?(.*)$/i) gives the following result : ["stackoverflow.com/questions/13506460/…", "http", undefined, "://stackoverflow.com/questions/13506460/how-to-extract-the-host-from-a-url-in-javascript"]
Don't post fake tests, please. Your results contain the string "http" as a match, while the string you say you ran the regexp on doesn't contain an "http" substring. You either patched the execution result or the source code of your JS virtual machine to achieve these results. "stackoverflow.com/questions/13506460/how-to-extract...".match(/^((?:[a-z0-9-]+\.)*[a-z0-9-]+\.?)(?::([0-9]+))?(.*)$/i) works perfectly fine, resulting in ["stackoverflow.com/questions/13506460/how-to-extract...", "stackoverflow.com", undefined, "/questions/13506460/how-to-extract..."]
nope, stackoverflow auto cuts the url... Now, please check this fiddle : jsfiddle.net/WLGmv and let me know if I'm doing anything wrong.
Sure thing. You try to use this regexp for the wrong purpose. If you reread the original question, it was not supposed to do what you want. You need to parse URLs with URI scheme, try this: /^(?:https?:\/\/)?((?:[a-z0-9-_]+\.)*[a-z0-9-_]+\.?)(?::([0-9]+))?(.*)$/i (works only for http and https or no URI scheme at all). Fiddle is here: jsfiddle.net/WLGmv/1
