UnicodeDecodeError on Python 3.5 when doing SQL query

Question

I was using Python 3.5 + pyodbc (on windows 10) to do apply some SQL queries on a server, and everything was doing fine like this:

In [1]: import pyodbc
   ...:
In [2]: server = 'tcp:192.168.100.76'
   ...: database = 'database'
   ...: username = 'username'
   ...: password = 'password'
   ...:
In [3]: cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+use
   ...: rname+';PWD='+ password)
   ...: cursor = cnxn.cursor()

In [29]: sqlQuery = "SELECT TOP 50 * " + \
    ...: "FROM dbo.DS10_CIBBASE " + \
    ...: "LEFT JOIN dbo.DS10_CIBCITY AS O_CITY ON dbo.DS10_CIBBASE.O_CITY=O_CITY.CITYCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBCITY AS B_CITY ON dbo.DS10_CIBBASE.B_CITY=B_CITY.CITYCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBMETH AS METHOD1 ON dbo.DS10_CIBBASE.METHOD1=METHOD1.METHCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBMETH AS METHOD2 ON dbo.DS10_CIBBASE.METHOD2=METHOD2.METHCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBMETH AS METHOD3 ON dbo.DS10_CIBBASE.METHOD3=METHOD3.METHCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBPERS ON dbo.DS10_CIBBASE.BASENO=dbo.DS10_CIBPERS.BASENO " + \
    ...: "LEFT JOIN dbo.DS10_CIBFEAT ON dbo.DS10_CIBPERS.FEATURE=dbo.DS10_CIBFEAT.FEATCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBOCCU ON dbo.DS10_CIBPERS.OCCUPATION= dbo.DS10_CIBOCCU.OCCUCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBEDUC ON dbo.DS10_CIBPERS.EDU=dbo.DS10_CIBEDUC.EDUCCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBCITY AS L_CITY ON dbo.DS10_CIBPERS.CITY=L_CITY.CITYCODE ;"

In [30]: cursor.execute(sqlQuery)
Out[30]: <pyodbc.Cursor at 0x262625e5750>

So far so good, and then I changed the query a little bit:

In [31]: sqlQuery0 = "SELECT TOP 50 * " + \
    ...: "FROM dbo.LAB1_EXP3_CARNO1 ; "

In [32]: cursor.execute(sqlQuery0)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-32-769696011e11> in <module>()
----> 1 cursor.execute(sqlQuery0)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 0: invalid start byte

I could not figure why the previous query was fine, while the later encountered the UnicodeDecodeError, since all the queries were written with ASCII characters.. So what did I do wrong, and how could I get rid of this error?

cco · Accepted Answer · 2016-11-21 04:14:32Z

2

You're getting this because your data is not UTF-8 encoded (\U00A8 is "spacing diaeresis", and \xA8 is the same in Latin-1, but \xA8 is š in Latin-15. The error is not because the query has non-ascii characters (obviously, it doesn't), but because the result is returning a string of 8-bit characters that aren't UTF-8 encoded.

For SQL server, you can use cast or convert to change the varchar column causing the error into an nvarchar (UTF-16 Unicode) value.

answered Nov 21, 2016 at 4:14

cco

6,3712 gold badges20 silver badges22 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Sidney Over a year ago

Thanks, it seems to be this problem. I ran the script on linux machine, and this UnicodeDecodeError did not occur. So one more thing, when using CAST, should I specify the columns that cause this error? or can I simply use something like SELECT CAST(TOP 50 * AS nvarchar) (btw, this did not work..)?

cco Over a year ago

No, you'll have to cast each column individually (but you only need to do the varchar columns, and only those that might have non-ascii characters). Alternatively, if you know what encoding is being used, you can do the conversion in Python (using .encode)

Collectives™ on Stack Overflow

UnicodeDecodeError on Python 3.5 when doing SQL query

1 Answer 1

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related