
I am loading data from one table into pandas and then inserting that data into a new table. However, instead of a normal string value I am seeing a bytearray:

bytearray(b'TM16B0I8') when it should be TM16B0I8

What am I doing wrong here?

My code:

import pandas as pd
import sqlalchemy

engine_str = 'mysql+mysqlconnector://user:pass@localhost/db'
engine = sqlalchemy.create_engine(engine_str, echo=False, encoding='utf-8')
connection = engine.connect()

# Read the source rows into a DataFrame
th_df = pd.read_sql('select ticket_id, history_date', con=connection)

# Copy each row into the new table
for row in th_df.to_dict(orient="records"):
    var_ticket_id = row['ticket_id']          # arrives as bytearray(b'...')
    var_history_date = row['history_date']

    query = 'INSERT INTO new_table(ticket_id, history_date)....'
  • Where do you see the bytearray? Is it anywhere related to the code above? Commented Dec 1, 2016 at 22:26
  • So when I print th_df['ticket_id'], instead of giving me the string 'TM16A0JY' it gives me this array: [77, 83, 90, 45, 48, 50, 53, 52, 57, 56]. After the insert, when I look in the DB, it shows bytearray(b'TM16A0JY'). Interestingly, integer IDs do not show up as a bytearray, and an integer value (e.g. 4567) is inserted into the DB. Commented Dec 1, 2016 at 22:31

4 Answers

Answer (score 10)

For some reason the Python MySQL connector returns bytearrays for these columns (more info in "How to return str from MySQL using mysql.connector?"), but you can decode them into Unicode strings with:

var_ticket_id = row['ticket_id'].decode()
var_history_date = row['history_date'].decode()
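A short, self-contained illustration of the decode step, using a literal bytearray in place of a value fetched from the database (the value is simply the example from the question). It also shows why a bytearray can print as a list of numbers: it is a sequence of integer byte values (0-255).

```python
# Stand-in for a value fetched from a binary column (example value from the question)
raw = bytearray(b"TM16A0JY")

# A bytearray is a sequence of ints, so converting it to a list shows byte values
print(list(raw))  # [84, 77, 49, 54, 65, 48, 74, 89]

# decode() uses UTF-8 by default and returns a regular str
ticket_id = raw.decode()
print(ticket_id)  # TM16A0JY
```

Note that integer columns come back as plain int values, which have no decode() method, so only call it on byte values.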

Answer (score 6)

Make sure you are using the right collation and encoding. I was using utf8mb4_bin for one of my website's DB tables; changing it to utf8mb4_general_ci did the trick.

1 Comment

This is perfect; I had a similar problem that was solved by changing the collation from latin1_bin to latin1_general_ci. Thank you @Yongju Lee

Answer (score 2)

Producing a bytearray is now the expected behaviour.

This changed in mysql-connector-python 8.0.24 (2021-04-20). According to the v8.0.24 release notes, the behaviour where "Binary columns were returned as strings instead of 'bytes' or 'bytearray'" was a bug, and it was fixed in that release.

So producing a Python bytearray is the correct behaviour if the database column is a binary type (e.g. BINARY or VARBINARY). Previously it produced a Python string; now it produces a bytearray.

So either change the column to a non-binary data type in the database, or convert the bytearray to a string in your code. If the column is nullable, check for None first, since calling the decode() method on None raises an error. You also need to be sure the bytes form a valid string in the character encoding used for the decoding.
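A minimal sketch of that conversion in code, assuming the data is UTF-8. The helper name to_str and the sample row are hypothetical, not part of mysql-connector-python or pandas:

```python
def to_str(value, encoding="utf-8"):
    """Decode bytes/bytearray to str; pass None and other types through unchanged."""
    if value is None:
        return None  # NULL column value: nothing to decode
    if isinstance(value, (bytes, bytearray)):
        # Raises UnicodeDecodeError if the bytes are not valid in `encoding`
        return value.decode(encoding)
    return value  # e.g. int IDs are already usable as-is


# Hypothetical row shaped like one record from th_df.to_dict(orient="records")
row = {"ticket_id": bytearray(b"TM16B0I8"), "history_date": None, "id": 4567}
clean = {key: to_str(val) for key, val in row.items()}
print(clean)  # {'ticket_id': 'TM16B0I8', 'history_date': None, 'id': 4567}
```

Passing non-byte values through unchanged lets the same helper be applied to every column, so mixed rows (strings, integers, NULLs) need no per-column special cases.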

2 Comments

Do you have any insight as to why a bytearray is returned rather than bytes?
I do not know why the library authors used the mutable bytearray instead of immutable bytes. Maybe the implementation constructs the result by modifying a bytearray, so it was more efficient to just return it (rather than adding another operation to convert it to bytes to return).

Answer (score 0)

Much easier...

How to return str from MySQL using mysql.connector?

Adding mysql-connector-python==8.0.17 to requirements.txt resolved this issue for me (or run "pip install mysql-connector-python" from a terminal).
