I have a column in a PostgreSQL table of type BYTEA. The model class defines the column as a LargeBinary field, about which the documentation says: "The Binary type generates BLOB or BYTEA when tables are created, and also converts incoming values using the Binary callable provided by each DB-API."
I have a Python string which I would like to insert into this table.
The Python string is:
'\x83\x8a\x13,\x96G\xfd9ae\xc2\xaa\xc3syn\xd1\x94b\x1cq\xfa\xeby$\xf8\xfe\xfe\xc5\xb1\xf5\xb5Q\xaf\xc3i\xe3\xe4\x02+\x00ke\xf5\x9c\xcbA8\x8c\x89\x13\x00\x07T\xeb3\xbcp\x1b\xff\xd0\x00I\xb9'
The relevant snippet of my SQLAlchemy code is:
migrate_engine.execute(
    """
    UPDATE table
    SET x=%(x)s
    WHERE id=%(id)s
    """,
    x=the_string_above,
    id='1')
I am getting the error:
sqlalchemy.exc.DataError: (DataError) invalid byte sequence for encoding "UTF8": 0x83
'\n UPDATE table\n SET x=%(x)s\n WHERE id=%(id)s\n ' {'x': '\x83\x8a\x13,\x96G\xfd9ae\xc2\xaa\xc3syn\xd1\x94b\x1cq\xfa\xeby$\xf8\xfe\xfe\xc5\xb1\xf5\xb5Q\xaf\xc3i\xe3\xe4\x02+\x00ke\xf5\x9c\xcbA8\x8c\x89\x13\x00\x07T\xeb3\xbcp\x1b\xff\xd0\x00I\xb9', 'id': '1',}
If I go into the pgadmin3 console and enter the UPDATE command directly, the update works fine. The error is clearly from SQLAlchemy. The string is a valid Python 2 string. The column has type BYTEA. The query works without SQLAlchemy. Can anyone see why Python thinks this byte string is in UTF-8?
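One thing worth checking is whether the value reaches the driver as a plain string rather than through the DB-API's Binary callable that the quoted documentation mentions. A raw SQL string passed to execute() carries no column type information, so it is plausible that the LargeBinary conversion never runs. The sketch below illustrates the idea in Python 3, using the standard library's sqlite3 module as a stand-in for PostgreSQL (an assumption, made so the example runs without a server). The table name t, the helper store_blob, and the shortened payload are hypothetical. With psycopg2 the corresponding callable is psycopg2.Binary.

```python
import sqlite3

# The first few bytes of the value from the question, as a Python 3 bytes literal.
payload = b"\x83\x8a\x13,\x96G\xfd9ae"


def store_blob(conn, row_id, data):
    """Update column x, wrapping the value in the DB-API Binary callable."""
    conn.execute(
        "UPDATE t SET x = :x WHERE id = :id",
        {"x": sqlite3.Binary(data), "id": row_id},
    )


# Stand-in database (assumption): in-memory SQLite instead of PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x BLOB)")
conn.execute("INSERT INTO t (id) VALUES (1)")
store_blob(conn, 1, payload)

# Read the value back and compare it with what was sent.
stored = conn.execute("SELECT x FROM t WHERE id = 1").fetchone()[0]
print(bytes(stored) == payload)
```

The point of the wrapper is that the driver then treats the parameter as binary data instead of text in the client encoding, which is where a "invalid byte sequence for encoding" complaint would otherwise originate.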
'\xa9' (the copyright symbol). Something, somewhere, is trying to decode it as UTF-8. The last line in the stack trace shows "sqlalchemy/engine/default.py", line 331, in do_execute
The """ string is presumably UTF-8 and then you're trying to embed non-UTF-8 data inside it using simple-minded string operations. You might need to manually handle the ASCII-ification of the binary data to do it that way (or maybe there's a %-code that will do it for you, not a Python guy, sorry). PostgreSQL supports a couple of different encoding schemes for embedding binary data in strings, maybe try one of those.
len("""abc%sdef""" % '\xa9') produces 7 as expected, sans errors. However, the execute method may do its own brand of interpolation. For now an extra base64 encoding is a sufficient workaround, but it will slow things down :(
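The observations in these comments can be checked without a database. The sketch below is written for Python 3 (an assumption; the question uses Python 2 strings) and shows three things: a value starting with 0x83, or a lone 0xa9 byte, is not valid UTF-8; plain %-interpolation of bytes does not decode anything and so does not fail by itself; and base64 turns the payload into pure ASCII at the cost of roughly one third more data. The helper name is_valid_utf8 and the shortened payload are hypothetical.

```python
import base64

# The first few bytes of the value from the question.
payload = b"\x83\x8a\x13,\x96G\xfd9ae"


def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Plain %-interpolation on bytes (Python 3.5+) only concatenates; no decoding.
interpolated = b"abc%sdef" % b"\xa9"

# Workaround from the comments: send base64 text instead of the raw bytes.
encoded = base64.b64encode(payload)  # ASCII-only bytes
decoded = base64.b64decode(encoded)  # round-trips to the original value

print(is_valid_utf8(payload), is_valid_utf8(b"\xa9"))
print(len(interpolated))
print(is_valid_utf8(encoded), decoded == payload)
print(len(payload), len(encoded))
```

Because the base64 form is plain ASCII, it passes any UTF-8 validation along the way, which would explain why it works as a stopgap; the cost is the extra encode and decode step on every read and write.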