I just whipped up this little extension to
MySQLdb/cursors.py today. It automatically
translates strings coming out of MySQL to Unicode
if appropriate. I suspect it's not quite what you'll
want (it does double the number of cursor classes),
but it does give you some code to start with.
Typical usage for people using predominantly
western European languages would be to just pick
the appropriate Unicode variant of the various
cursor classes when connecting to the database.
People who use encodings other than the defaults
listed in UnicodeMixIn.encodings can simply
subclass the appropriate cursor class and define
their own list of encodings to try.
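To make the subclassing idea concrete, here is a minimal sketch, not the code from the patch. It assumes the mix-in keeps its candidate encodings in a class attribute and tries them in order. The default encodings and the CyrillicMixIn name are hypothetical, and a real cursor class would combine the mix-in with one of the MySQLdb cursor classes.

```python
class UnicodeMixIn:
    # Encodings tried in order; these defaults are an assumption for the
    # sketch, not the list shipped in the patch.
    encodings = ("utf-8", "latin-1")

    def encode_string(self, value):
        """Return value decoded with the first encoding that accepts it."""
        if not isinstance(value, bytes):
            return value  # already text, or not a string column at all
        for encoding in self.encodings:
            try:
                return value.decode(encoding)
            except UnicodeDecodeError:
                continue
        return value  # nothing matched; hand back the raw bytes


class CyrillicMixIn(UnicodeMixIn):
    # Hypothetical subclass: only the encoding list changes.
    encodings = ("utf-8", "koi8-r")
```

Note that latin-1 accepts every byte sequence, so it only makes sense as the last entry in the list.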
The patch is against 0.9.1 but I compared my version
of cursors.py with that in 0.9.2b1 and didn't see
any obvious conflicts.
Skip
Logged In: YES
user_id=71372
I'm probably not going to use this patch. It seems to me
that the easiest way to do this is to add a new converter, i.e.
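
    import MySQLdb
    import MySQLdb.converters
    from MySQLdb.constants import FIELD_TYPE

    # Copy the defaults so the module-level dictionary is left untouched.
    conv = MySQLdb.converters.conversions.copy()
    conv[FIELD_TYPE.VAR_STRING] = Char2Unicode
    # XXX maybe also FIELD_TYPE.STRING
    db = MySQLdb.connect(..., conv=conv)
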
where Char2Unicode is pretty much your encode_string method
as a function.
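For illustration, Char2Unicode could look roughly like the sketch below. This is an assumption about its shape, not code from the patch or from MySQLdb. The latin-1 default and the stand-in names are hypothetical; the stand-ins replace FIELD_TYPE and MySQLdb.converters.conversions only so the example runs without MySQLdb installed.

```python
# Stand-ins for the sketch; in real use these come from
# MySQLdb.constants.FIELD_TYPE and MySQLdb.converters.conversions.
VAR_STRING = 253  # numeric value of FIELD_TYPE.VAR_STRING
STRING = 254      # numeric value of FIELD_TYPE.STRING
default_conversions = {}


def Char2Unicode(value, charset="latin-1"):
    """Decode a raw column value to text; pass anything else through."""
    if isinstance(value, bytes):
        return value.decode(charset)
    return value


# Register the converter on a copy, for both string column types.
conv = default_conversions.copy()
conv[VAR_STRING] = Char2Unicode
conv[STRING] = Char2Unicode
```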
If anything, I think you would want to subclass the
Connection object to add the unicode translation stuff,
rather than the various Cursor classes. Take a look at
Connection.__init__() to see how it handles writing out
unicode objects.
Logged In: YES
user_id=44345
My first thought was to add a new converter, but it
seems that only has an effect when passing data to
MySQL. I need something that works for data coming
out of MySQL. I don't see where the connection object
gets involved with data coming out of the database
either.
Skip
Logged In: YES
user_id=71372
The converter works both ways. When sending data to the
database, it looks for the Python type or class as the key.
When retrieving data from the database, it uses a MySQL
FIELD_TYPE. These conversions are actually done in _mysql.c;
see _mysql_field_to_python(), _mysql_row_to_tuple(), and
_mysql_ResultObject_New() (for reading); and
_mysql_escape*() (for writing).
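A toy model of that two-way lookup, in pure Python, may help. It is only a sketch of the idea, not what _mysql.c does internally. The helper names to_sql and from_sql are hypothetical, and 253 stands in for FIELD_TYPE.VAR_STRING.

```python
VAR_STRING = 253  # stand-in for FIELD_TYPE.VAR_STRING

# One dictionary, two kinds of keys.
conversions = {
    int: lambda value: str(value),                 # writing: keyed by Python type
    VAR_STRING: lambda raw: raw.decode("latin-1"), # reading: keyed by field type
}


def to_sql(value, conv):
    """Writing side: look the converter up by the value's Python type."""
    return conv[type(value)](value)


def from_sql(raw, field_type, conv):
    """Reading side: look the converter up by the column's field type."""
    converter = conv.get(field_type)
    return converter(raw) if converter else raw
```

Because the type object int and the integer 253 are different keys, both directions can share the same dictionary without clashing.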
Another reason you would want to do this as part of the
connection is that MySQL (3.23.21+) has a default character
set associated with each connection
(connection.character_set_name()), which is usually latin1.
Thus some of the default conversion functions are overridden
by bound Connection object methods.
Logged In: YES
user_id=71372
0.9.2c1 returns CHAR and VARCHAR columns as unicode if the
correct connection option is used. Can you give 0.9.2c1 a try?
Logged In: YES
user_id=71372
0.9.2c2 should resolve this issue for you.