I was getting errors from the tld library because it didn't know how to handle certain proxy request URLs. To fix this I added a few exception handlers, and that worked for one particular day's worth of data.
import numpy as np
import tld
from tld import get_fld

# Custom try-except function to handle IPs and garbage http requests
def try_get_fld(x):
    try:
        return get_fld(x)
    except tld.exceptions.TldBadUrl:
        return np.nan
    except tld.exceptions.TldDomainNotFound:
        return np.nan

# Apply the function above to the request dataframe
request['flds'] = request['request'].apply(try_get_fld)
But on a different day I ran into a new error:
ValueError: Invalid IPv6 URL
So I added to the exceptions:
def try_get_fld(x):
    try:
        return get_fld(x)
    except tld.exceptions.TldBadUrl:
        return np.nan
    except tld.exceptions.TldDomainNotFound:
        return np.nan
    except tld.exceptions.ValueError:
        return np.nan
Then I ran into an AttributeError:
AttributeError: 'module' object has no attribute 'ValueError'
So I added that to the exceptions:
def try_get_fld(x):
    try:
        return get_fld(x)
    except tld.exceptions.TldBadUrl:
        return np.nan
    except tld.exceptions.TldDomainNotFound:
        return np.nan
    except tld.exceptions.ValueError:
        return np.nan
    except tld.exceptions.AttributeError:
        return np.nan
Then I got the same error again: AttributeError: 'module' object has no attribute 'ValueError'.
Does anybody know what I'm doing wrong or how to fix my issue? The goal is just to mark the request URLs that can't be parsed with NaN so that I can apply the function to my whole dataset.
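Use a plain except ValueError: clause. ValueError is a built-in Python exception class, not something defined in tld.exceptions, which is why looking it up there raises the AttributeError.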
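To illustrate, here is a minimal sketch that reproduces the same pattern using only the standard library. It assumes urllib.parse.urlsplit as a stand-in for get_fld (the "Invalid IPv6 URL" message is one that urlsplit raises, so the error in the question probably surfaces from URL parsing underneath tld, but that is a guess). The function names hostname_broken and hostname_fixed are made up for this example. It also shows why the code seemed fine on the first day: the expression in an except clause is only evaluated when an exception actually reaches that clause.

```python
import math
import urllib.parse
from urllib.parse import urlsplit


def hostname_broken(url):
    """Mirrors the question: looks up ValueError on a module that does not define it."""
    try:
        return urlsplit(url).hostname
    except urllib.parse.ValueError:  # evaluated only when an exception gets here
        return math.nan


def hostname_fixed(url):
    """Catches the built-in ValueError directly."""
    try:
        return urlsplit(url).hostname
    except ValueError:  # built-in name, no module prefix needed
        return math.nan


good_url = "http://example.com/page"
bad_url = "http://[::1/page"  # unbalanced bracket makes urlsplit raise ValueError

# Both versions behave the same as long as nothing raises.
print(hostname_broken(good_url))
print(hostname_fixed(good_url))

# The fixed version turns the parsing failure into NaN.
print(hostname_fixed(bad_url))

# The broken version fails while evaluating its own except clause.
try:
    hostname_broken(bad_url)
except AttributeError as exc:
    print("AttributeError:", exc)
```

In try_get_fld from the question, the equivalent change would be to replace the tld.exceptions.ValueError clause with except ValueError: and drop the tld.exceptions.AttributeError clause. That last clause could never help, because the failing lookup happens in the clause before it.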