1

I was using Python 3.5 + pyodbc (on windows 10) to do apply some SQL queries on a server, and everything was doing fine like this:

In [1]: import pyodbc
   ...:
In [2]: server = 'tcp:192.168.100.76'
   ...: database = 'database'
   ...: username = 'username'
   ...: password = 'password'
   ...:
In [3]: cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+use
   ...: rname+';PWD='+ password)
   ...: cursor = cnxn.cursor()

In [29]: sqlQuery = "SELECT TOP 50 * " + \
    ...: "FROM dbo.DS10_CIBBASE " + \
    ...: "LEFT JOIN dbo.DS10_CIBCITY AS O_CITY ON dbo.DS10_CIBBASE.O_CITY=O_CITY.CITYCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBCITY AS B_CITY ON dbo.DS10_CIBBASE.B_CITY=B_CITY.CITYCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBMETH AS METHOD1 ON dbo.DS10_CIBBASE.METHOD1=METHOD1.METHCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBMETH AS METHOD2 ON dbo.DS10_CIBBASE.METHOD2=METHOD2.METHCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBMETH AS METHOD3 ON dbo.DS10_CIBBASE.METHOD3=METHOD3.METHCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBPERS ON dbo.DS10_CIBBASE.BASENO=dbo.DS10_CIBPERS.BASENO " + \
    ...: "LEFT JOIN dbo.DS10_CIBFEAT ON dbo.DS10_CIBPERS.FEATURE=dbo.DS10_CIBFEAT.FEATCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBOCCU ON dbo.DS10_CIBPERS.OCCUPATION= dbo.DS10_CIBOCCU.OCCUCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBEDUC ON dbo.DS10_CIBPERS.EDU=dbo.DS10_CIBEDUC.EDUCCODE " + \
    ...: "LEFT JOIN dbo.DS10_CIBCITY AS L_CITY ON dbo.DS10_CIBPERS.CITY=L_CITY.CITYCODE ;"

In [30]: cursor.execute(sqlQuery)
Out[30]: <pyodbc.Cursor at 0x262625e5750>

So far so good, and then I changed the query a little bit:

In [31]: sqlQuery0 = "SELECT TOP 50 * " + \
    ...: "FROM dbo.LAB1_EXP3_CARNO1 ; "

In [32]: cursor.execute(sqlQuery0)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-32-769696011e11> in <module>()
----> 1 cursor.execute(sqlQuery0)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa8 in position 0: invalid start byte

I could not figure why the previous query was fine, while the later encountered the UnicodeDecodeError, since all the queries were written with ASCII characters.. So what did I do wrong, and how could I get rid of this error?

1 Answer 1

2

You're getting this because your data is not UTF-8 encoded (\U00A8 is "spacing diaeresis", and \xA8 is the same in Latin-1, but \xA8 is š in Latin-15. The error is not because the query has non-ascii characters (obviously, it doesn't), but because the result is returning a string of 8-bit characters that aren't UTF-8 encoded.

For SQL server, you can use cast or convert to change the varchar column causing the error into an nvarchar (UTF-16 Unicode) value.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, it seems to be this problem. I ran the script on linux machine, and this UnicodeDecodeError did not occur. So one more thing, when using CAST, should I specify the columns that cause this error? or can I simply use something like SELECT CAST(TOP 50 * AS nvarchar) (btw, this did not work..)?
No, you'll have to cast each column individually (but you only need to do the varchar columns, and only those that might have non-ascii characters). Alternatively, if you know what encoding is being used, you can do the conversion in Python (using .encode)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.