2

I'm trying to build a list of domain names from an Enom API call. I get back a lot of information and need to locate the domain name related lines, and then join them together.

The string that comes back from Enom looks somewhat like this:

SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1

I'd like to build a list from that which looks like this:

[domain1.com, domain2.org, domain3.co.uk, domain4.net]

To find the different domain name components I've tried the following (where "enom" is the string above) but have only been able to get the SLD and TLD matches.

re.findall("^.*(SLD|TLD).*$", enom, re.M) 

9 Answers 9

6

Edit: Every time I see a question asking for regular expression solution I have this bizarre urge to try and solve it without regular expressions. Most of the times it's more efficient than the use of regex, I encourage the OP to test which of the solutions is most efficient.

Here is the naive approach:

a = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""

b = a.split("\n")
c = [x.split("=")[1] for x in b if x != 'TLDOverride=1']
for x in range(0,len(c),2):
    print ".".join(c[x:x+2])

>> domain1.com
>> domain2.org
>> domain3.co.uk
>> domain4.net
Sign up to request clarification or add additional context in comments.

1 Comment

I agree! Pythons string manipulation functions are so powerful, I feel like banging one's head against the walls with regex is often unneeded. I rarely use import re anymore.
4

You have a capturing group in your expression. re.findall documentation says:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

That's why only the conent of the capturing group is returned.

try:

re.findall("^.*((?:SLD|TLD)\d*)=(.*)$", enom, re.M)

This would return a list of tuples:

[('SLD1', 'domain1'), ('TLD1', 'com'), ('SLD2', 'domain2'), ('TLD2', 'org'), ('SLD3', 'domain3'), ('TLD4', 'co.uk'), ('SLD5', 'domain4'), ('TLD5', 'net')]

Combining SLDs and TLDs is then up to you.

Comments

3

this works for you example,

>>> sld_list = re.findall("^.*SLD[0-9]*?=(.*?)$", enom, re.M)
>>> tld_list = re.findall("^.*TLD[0-9]*?=(.*?)$", enom, re.M)
>>> map(lambda x: x[0] + '.' + x[1], zip(sld_list, tld_list))
['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']

Comments

3

I'm not sure why are you talking about regular expressions. I mean, why don't you just run a for loop?

A famous quote seems to be appropriate here:

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

domains = []
components = []
for line in enom.split('\n'):
  k,v = line.split('=')
  if k == 'TLDOverride':
    continue
  components.append(v)
  if k.startswith('TLD'):
    domains.append('.'.join(components))
    components = []

P.S. I'm not sure what's this TLDOverride so the code just ignores it.

1 Comment

I think this is the best solution, because it's O(n) algorithm.
2

Here's one way:

import re
print map('.'.join,  zip(*[iter(re.findall(r'^(?:S|T)LD\d+=(.*)$', text, re.M))]*2))
# ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']

Comments

2

Just for fun, map -> filter -> map:

input = """
SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
"""

splited = map(lambda x: x.split("="), input.split())
slds = filter(lambda x: x[1][0].startswith('SLD'), enumerate(splited))
print map(lambda x: '.'.join([x[1][1], splited[x[0] + 1][1], ]), slds)

>>> ['domain1.com', 'domain2.org', 'domain3.co.uk', 'domain4.net']

Comments

1

This appears to do what you want:

domains = re.findall('SLD\d+=(.+)', re.sub(r'\nTLD\d+=', '.', enom))

It assumes that the lines are sorted and SLD always comes before its TLD. If that can be not the case, try this slightly more verbose code without regexes:

d = dict(x.split('=') for x in enom.strip().splitlines())

domains = [
    d[key] + '.' + d.get('T' + key[1:], '') 
    for key in d if key.startswith('SLD')
]

Comments

1

You need to use multiline regex for this. This is similar to this post.

data = """SLD1=domain1
TLD1=com
SLD2=domain2
TLD2=org
TLDOverride=1
SLD3=domain3
TLD4=co.uk
SLD5=domain4
TLD5=net
TLDOverride=1"""

domain_seq = re.compile(r"SLD\d=(\w+)\nTLD\d=(\w+)", re.M)
for item in domain_seq.finditer(data):
    domain, tld = item.group(1), item.group(2)
    print "%s.%s" % (domain,tld)

Comments

1

As some other answers already said, there's no need to use a regular expression here. A simple split and some filtering will do nicely:

lines = data.split("\n") #assuming data contains your input string
sld, tld = [[x.split("=")[1] for x in lines if x[:3] == t] for t in ("SLD", "TLD")]
result = [x+y for x, y in zip(sld, tld)]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.