After using pandas to read a JSON object into a pandas Series, we want to print only the first year in each row. E.g., if we have 2013-2014(2015), we want to print 2013.

Full code (here)

x = '{"0":"1985\\u2013present","1":"1985\\u2013present",......}'
a = pd.read_json(x, typ='series')
for i, row in a.iteritems():
    print row.split('-')[0].split('—')[0].split('(')[0]

This raises the following error:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-1333-d8ef23860c53> in <module>()
      1 for i, row in a.iteritems():
----> 2     print row.split('-')[0].split('—')[0].split('(')[0]

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

Why is this happening? How can we fix the problem?

1 Answer

The strings in your JSON data are unicode strings, which you can see by printing one of the values:

In: a[0]
Out: u'1985\u2013present'

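You then try to split the string on the en dash (U+2013), but the separator you pass to split is a byte string, not a unicode string. Python 2 therefore tries to decode the separator as ASCII before comparing it with the unicode value, and fails on its first byte, 0xe2, because the en dash is not an ASCII character. That is the 'ascii' codec can't decode byte 0xe2 error you see.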
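To see where the byte 0xe2 in the error message comes from, here is a small sketch that encodes the en dash as UTF-8. It assumes the separator in your source was stored as UTF-8, which is what the traceback suggests; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
# The en dash as a unicode string (the same value that appears in the JSON data).
en_dash = u'\u2013'

# Encoding it as UTF-8 gives the raw bytes a non-unicode literal would contain.
encoded = en_dash.encode('utf-8')

# Slicing (rather than indexing) behaves the same on Python 2 and Python 3.
first_byte = encoded[0:1]

# The code point is far outside the ASCII range (0-127).
is_ascii = ord(en_dash) < 128

print(repr(encoded))
print(repr(first_byte))
print(is_ascii)
```

The first byte is 0xe2, at position 0, which matches the byte and position reported in the traceback.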

To make your example work, pass a unicode separator to split:

for i, row in a.iteritems():
    print row.split('-')[0].split(u'\u2013')[0].split('(')[0]

Note the u prefix on the dash separator, which makes it a unicode string. The escape u'\u2013' names the en dash unambiguously; a literal en dash written as u'–' works too.
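As an alternative that does not depend on which kind of dash separates the years, you could pull out the first run of four digits with a regular expression. This is only a sketch: the helper first_year and the sample values are made up for illustration, and it assumes every year is written with exactly four digits.

```python
import re

# Hypothetical helper, not part of the original question or answer.
# Assumes a year is always written as four consecutive digits.
FIRST_YEAR = re.compile(r'\d{4}')


def first_year(value):
    """Return the first four-digit run in value, or None if there is none."""
    match = FIRST_YEAR.search(value)
    return match.group(0) if match else None


# Illustrative values using an en dash, a hyphen, and an em dash as separators.
samples = [u'1985\u2013present', u'2013-2014(2015)', u'2001\u20142003']
years = [first_year(s) for s in samples]
print(years)
```

Because the pattern looks for digits rather than for a separator, no byte-string or unicode literal for the dash is needed at all.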

For details on unicode in Python, see https://docs.python.org/2/howto/unicode.html
