
I have a column in a PostgreSQL table of type BYTEA. The model class defines the column as a LargeBinary field, whose documentation says: "The Binary type generates BLOB or BYTEA when tables are created, and also converts incoming values using the Binary callable provided by each DB-API."

I have a Python string which I would like to insert into this table.

The Python string is:

    '\x83\x8a\x13,\x96G\xfd9ae\xc2\xaa\xc3syn\xd1\x94b\x1cq\xfa\xeby$\xf8\xfe\xfe\xc5\xb1\xf5\xb5Q\xaf\xc3i\xe3\xe4\x02+\x00ke\xf5\x9c\xcbA8\x8c\x89\x13\x00\x07T\xeb3\xbcp\x1b\xff\xd0\x00I\xb9'

The relevant snippet of my SQLAlchemy code is:

    migrate_engine.execute(
        """
        UPDATE table
        SET x=%(x)s
        WHERE id=%(id)s
        """,
        x=the_string_above,
        id='1')

I am getting the error:

    sqlalchemy.exc.DataError: (DataError) invalid byte sequence for encoding "UTF8": 0x83
    '\n            UPDATE table\n            SET x=%(x)s\n            WHERE id=%(id)s\n            ' {'x': '\x83\x8a\x13,\x96G\xfd9ae\xc2\xaa\xc3syn\xd1\x94b\x1cq\xfa\xeby$\xf8\xfe\xfe\xc5\xb1\xf5\xb5Q\xaf\xc3i\xe3\xe4\x02+\x00ke\xf5\x9c\xcbA8\x8c\x89\x13\x00\x07T\xeb3\xbcp\x1b\xff\xd0\x00I\xb9', 'id': '1',}

If I go into the pgadmin3 console and enter the UPDATE command directly, the update works fine. The error is clearly from SQLAlchemy. The string is a valid Python2 string. The column has type BYTEA. The query works without SQLAlchemy. Can anyone see why Python thinks this byte string is in UTF-8?
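To confirm that the bytes genuinely are not valid UTF-8 (so the decode attempt must be happening somewhere in the driver layer, not in my code), a quick check in Python 3 syntax:

```python
# The same byte string as above, written as a bytes literal.
data = b'\x83\x8a\x13,\x96G\xfd9ae\xc2\xaa\xc3syn\xd1\x94b\x1cq\xfa\xeby$\xf8\xfe\xfe\xc5\xb1\xf5\xb5Q\xaf\xc3i\xe3\xe4\x02+\x00ke\xf5\x9c\xcbA8\x8c\x89\x13\x00\x07T\xeb3\xbcp\x1b\xff\xd0\x00I\xb9'

try:
    data.decode('utf-8')
    is_valid_utf8 = True
except UnicodeDecodeError:
    # 0x83 is a UTF-8 continuation byte and cannot start a sequence.
    is_valid_utf8 = False

print(is_valid_utf8)  # False
```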

  • No idea; what happens if you make it a raw string? As in r"\x83\x8a..." Commented Jun 12, 2014 at 22:48
  • Then the backslashes would be in the string. I get the same problem with a single-character string like '\xa9' (the copyright symbol). Something, somewhere, is trying to decode it as UTF-8. The last line in the stacktrace shows "sqlalchemy/engine/default.py", line 331, in do_execute. Commented Jun 12, 2014 at 22:52
  • 1
    Is it the placeholder substitution that's doing it? The """ string is presumably UTF-8 and then you're trying to embed non-UTF-8 data inside it using simpleminded string operations. You might need to manually handle the ASCII-ification of the binary data to do it that way (or maybe there's a %-code that will do it for you, not a Python guy, sorry). PostgreSQL supports a couple different encoding schemes for embedding binary data in strings, maybe try one of those. Commented Jun 12, 2014 at 23:22
  • Could it be the client_encoding settings? Just spitballing here... Commented Jun 13, 2014 at 1:26
  • 1
    @muistooshort That may be. For plain string interpolation that is not the case, e.g. len("""abc%sdef""" % '\xa9') produces 7 as expected, sans errors. However the execute method may do its own brand of interpolation. For now an extra base64 encoding is a sufficient workaround, but it will slow things down :( Commented Jun 13, 2014 at 3:36
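The base64 workaround mentioned in the last comment can be sketched as follows (the surrounding UPDATE is omitted; the point is simply that the encoded form is pure ASCII, so nothing in the text codepath can choke on it, at the cost of an encode/decode on every read and write):

```python
import base64

raw = b'\x83\x8a\x13\x00\xff'            # arbitrary binary payload
encoded = base64.b64encode(raw)          # ASCII-only bytes, safe to send as text
restored = base64.b64decode(encoded)     # lossless round trip

print(restored == raw)  # True
```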

1 Answer


Try wrapping the data in a buffer:

    migrate_engine.execute(
        """
        UPDATE table
        SET x=%(x)s
        WHERE id=%(id)s
        """,
        x=buffer(the_string_above),
        id='1')
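For context: buffer() is a Python 2 built-in, and psycopg2 adapts buffer values (and DB-API psycopg2.Binary(...) wrappers) as BYTEA rather than as text, which is why the wrapping avoids the UTF-8 check. On Python 3, where buffer is gone, the rough equivalent is memoryview or plain bytes. A minimal sketch of the wrapping, with no database needed to see that the bytes survive intact:

```python
# Python 2: buffer(data) signals the driver to adapt the value as binary.
# Python 3 equivalent: memoryview(data), or simply pass bytes directly.
data = b'\x83\x8a\x13\xff\x00'
wrapped = memoryview(data)

print(bytes(wrapped) == data)  # True: same bytes, but flagged as binary
```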
