I am a little confused right now. I have a PDO connection with charset=utf8, and the DB uses latin1.

What does this mean?

My thought is that every exchange PHP has with the DB, whether sending or receiving, is encoded as UTF-8. However, I have read in many places that the DB should use the same charset as PHP.

Can anyone please explain in detail the role of the character set in PHP and in the MySQL DB, and what the benefit of aligning them is?

1 Answer

Say PHP sends some text to MySQL to be stored, something like:

INSERT INTO `some_table` (`foo`) VALUES
('The quick brown fox jumps over the lazy dog');

The basic intent of this query is obviously to tell MySQL to store the string "The quick brown fox jumps over the lazy dog" in the database.

If PHP is configured to use UTF-8, then when it converts the human-readable characters to binary in order to transmit them to MySQL, it encodes them using UTF-8.
MySQL can read characters encoded in UTF-8, so it has no problem understanding that the byte sequence it receives means T, h, e, and so on in human-readable characters.
If MySQL is configured to store data in the some_table table using latin1, then when it receives the string it converts the characters from their UTF-8 encodings to the latin1 equivalents before saving the data to disk.
In this case there is no problem, because the English alphabet characters can be represented in both UTF-8 and Latin1.
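As a small illustration of the point above, the following sketch encodes the example sentence under both encodings and checks that the bytes match. Python is used here only as a convenient stand-in for showing the byte-level behaviour; it is not what PHP or MySQL run internally, and the variable names are made up for the illustration.

```python
# Pure-ASCII text encodes to the same bytes in UTF-8 and Latin-1.
text = "The quick brown fox jumps over the lazy dog"

utf8_bytes = text.encode("utf-8")      # roughly what a UTF-8 client would send
latin1_bytes = text.encode("latin-1")  # roughly what a Latin-1 column would store

print(utf8_bytes == latin1_bytes)          # True
print(len(utf8_bytes), len(latin1_bytes))  # 43 43

# Reading the UTF-8 bytes back as Latin-1 gives the original text unchanged.
print(utf8_bytes.decode("latin-1") == text)  # True
```

For ASCII-only text, UTF-8 uses one byte per character, the same as Latin-1, so neither the content nor the size changes in the conversion.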
However, problems occur if the string PHP sent contains characters that can be represented in UTF-8 but not in Latin1, e.g. a smart quote (’). When MySQL tries to convert the smart quote, it won't be able to, because Latin1 has no encoding defined to represent ’.
I'm not sure what MySQL's exact error handling is when it encounters this situation, or whether the situation is recoverable, but generally the end result is that the stored text is corrupted and unusable.
Because this problem only occurs for characters that cannot be represented in both systems, if 99% of your communications involve English characters you may not notice a problem for quite a while, and even then it will affect only the occasional character. Trying to recover once you do notice problems could be frustrating.
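To make the failure case concrete, here is a minimal sketch of the same conversion in Python. It shows Python's behaviour, not MySQL's: what MySQL does with an unconvertible character depends on server settings, and MySQL documents its own latin1 as cp1252 (Windows-1252), which does include curly quotes, so the exact set of problem characters on a real server may differ from strict ISO-8859-1. The function name `to_latin1` is hypothetical.

```python
def to_latin1(text: str, errors: str = "strict") -> bytes:
    """Encode text as ISO-8859-1 (Python's 'latin-1' codec)."""
    return text.encode("latin-1", errors=errors)


quote = "\u2019"  # right single quotation mark, a "smart quote"

print(quote.encode("utf-8"))  # b'\xe2\x80\x99' (three bytes in UTF-8)

try:
    to_latin1(quote)
    converted = True
except UnicodeEncodeError:
    converted = False  # ISO-8859-1 has no code point for this character
print(converted)  # False

# A lossy conversion substitutes '?', and the original character is gone.
lossy = to_latin1("it" + quote + "s", errors="replace")
print(lossy)  # b'it?s'

# Windows-1252 does define the character, as the single byte 0x92.
print(quote.encode("cp1252"))  # b'\x92'
```

Once a character has been replaced by `?`, nothing in the stored bytes records what it originally was, which is why recovery after the fact is so awkward.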

5 Comments

This is awesome and it explains a lot, thanks. But I wanted to ask: if I use utf8 in PHP, is it recommended that I use the same in MySQL, even though I'm trying to save performance on a DB with nearly 20 columns and 2000 rows? For the time being I only use English (that might change in the future).
Cool. I can't imagine why switching MySQL's charset would cause performance issues, especially with a database that small. But if you are that worried, you can always set up a DB and time your queries with the old and new MySQL charsets. My understanding is that when MySQL releases v6.0, UTF-8 will be the default charset moving forward anyway.
Switching itself won't cause an issue; it's the queries that would be slower. My understanding is that UTF-8 takes more space than Latin, which makes it slower to query. Please correct me if I'm wrong, and give me your final suggestion for my situation. I know the DB is small, but I'm running on minimal resources, so I don't want to burn all the CPU power on queries.
If you are worried about performance, set up the new charset and measure the time it takes to run a complex query. In your PHP, take a timestamp using microtime(true), run your query, then take another timestamp and measure the difference. For more info refer to: <stackoverflow.com/a/5267918/4668401>
Thanks a lot. I compared some queries and they ran at almost the same speed, so I converted to utf8mb4.
