I am reading a JSON string from a file, parsing it, then inserting the data into a MySQL database. My insert query is throwing the following error:

SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xE3\xADs' for column 'fname' at row 1

I believe the content causing the error is the í in the name Ailís (I echoed IDs until the error was thrown).

  • The file is UTF8 encoded
  • I am reading the file using a UTF8 context
  • I am checking the encoding of the data to be UTF8 (it is)
  • My PDO connection has a UTF8 charset, as well as SET NAMES utf8
  • The database is UTF8 encoded
  • The table is UTF8 encoded
  • The column is UTF8 encoded

Code:

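// Note: 'http' context options only apply to http:// streams; they have no effect on a local file path.
$opts = ['http' => ['header' => 'Accept-Charset: UTF-8, *;q=0']];
$context = stream_context_create($opts);
$post = file_get_contents('sample_data/11111a_json_upload.json', false, $context);
if ($post === false) {
    throw new Exception('Could not read the JSON file.');
}
if (!mb_check_encoding($post, 'UTF-8')) {
    throw new Exception('Invalid encoding detected.');
}
$data = json_decode($post, true);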

I also inserted the following function before I decoded the JSON:

static function clean_unicode_literals($string)
{
    return preg_replace_callback('@\\\(x)?([0-9a-zA-Z]{2,3})@',
        function ($m) {
            if ($m[1]) {
                $hex = substr($m[2], 0, 2);
                $unhex = chr(hexdec($hex));
                if (strlen($m[2]) > 2) {
                    $unhex .= substr($m[2], 2);
                }
                return $unhex;
            } else {
                return chr(octdec($m[2]));
            }
        }, $string);
}

When I read the raw file, and when I echo the parsed data to the browser, the name appears correctly. I assume therefore the issue is somewhere in my connection?

I create a new PDO instance like so:

public function __construct($db_user, $db_pass, $db_name, $db_host, $charset)
{
    if(!is_null($db_name))
        $dsn = 'mysql:host=' . $db_host . ';dbname=' . $db_name . ';charset=' . $charset;
    else
        $dsn = 'mysql:host=' . $db_host . ';charset=' . $charset;

    $options = [
        PDO::ATTR_PERSISTENT => true,
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
        PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"
    ];

    try
    {
        $this->db_handler = new PDO($dsn, $db_user, $db_pass, $options);
        $this->db_handler->exec('SET NAMES utf8');
        $this->db_valid = true;
    }
    catch(PDOException $e)
    {
        $this->db_error = $e->getMessage();
        $this->db_valid = false;
    }

    return $this->db_valid;
}

(SET NAMES is there twice as I'm troubleshooting...)
The database, table, and column all use the utf8 character set with the utf8_general_ci collation.

My IDE is PhpStorm, and I am running MySQL 5.7.14 under WAMP on Windows 10.

  • So where is the code that actually does the insert? Commented Oct 10, 2017 at 17:48

1 Answer

Something is definitely wrong with that input string: \xE3\xADs

The lead byte \xE3 (high nibble E) announces a 3-byte UTF-8 sequence, but only one continuation byte (\xAD) follows before the ASCII s, so the sequence is truncated.

And it's definitely not the í, as that's the two-byte sequence \xC3\xAD.
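
To make the byte-level difference concrete, here is a minimal sketch that compares the bytes from the error message with the UTF-8 encoding of í. It assumes PHP 7 or later (for the "\u{...}" escape) and the mbstring extension; the variable names are only illustrative.

```php
<?php
// The bytes quoted in the MySQL error message: E3 AD followed by ASCII "s".
$fromError = "\xE3\xADs";

// U+00ED (í) followed by "s", written as an escape so that the encoding
// of this script file does not affect the result.
$expected = "\u{00ED}s";

echo bin2hex($expected), PHP_EOL;   // c3ad73
echo bin2hex($fromError), PHP_EOL;  // e3ad73

var_dump(mb_check_encoding($expected, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($fromError, 'UTF-8'));  // bool(false)
```

Only the first byte differs (C3 versus E3), which is enough to turn a valid two-byte character into a truncated three-byte one.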

I have to wonder why you've got that clean_unicode_literals function in there at all, since JSON text is supposed to be valid UTF-8 according to the JSON spec, and json_decode itself only accepts UTF-8 input.
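
As a rough illustration of why no extra unescaping step should be needed, the sketch below feeds json_decode the name from the question in both legal spellings (a \u escape and raw UTF-8 bytes), and then the byte sequence from the error message. The sample documents are made up for the example; only the name Ailís comes from the question.

```php
<?php
// Two legal JSON spellings of "Ailís": a \u escape and raw UTF-8 bytes.
$escaped = '{"fname":"Ail\u00eds"}';            // single quotes: PHP leaves \u00ed untouched
$raw     = "{\"fname\":\"Ail\xC3\xADs\"}";

$a = json_decode($escaped, true);
$b = json_decode($raw, true);

var_dump($a['fname'] === $b['fname']);  // bool(true)
echo bin2hex($a['fname']), PHP_EOL;     // 41696cc3ad73

// A document containing the bytes from the error message is rejected outright.
$bad = json_decode("{\"fname\":\"Ail\xE3\xADs\"}", true);
var_dump($bad);                                   // NULL
var_dump(json_last_error() === JSON_ERROR_UTF8);  // bool(true)
```

If json_decode succeeded in your script, this suggests the \xE3\xAD bytes were not in the decoded input in that form, so it may be worth looking at whatever touches the value between decoding and the insert.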

Try removing the clean_unicode_literals calls, and if you're still getting an error then the source data is corrupt.
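
One way to narrow down where the bytes go wrong is to check the same value at each stage (after reading the file, after json_decode, right before the insert). The helper below is a hypothetical sketch, not part of your code; require_utf8 and the labels are names I made up.

```php
<?php
/**
 * Hypothetical helper: returns $value unchanged when it is valid UTF-8,
 * otherwise throws with a hex dump of the offending bytes.
 */
function require_utf8(string $value, string $label): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new UnexpectedValueException(
            sprintf('%s is not valid UTF-8: %s', $label, bin2hex($value))
        );
    }
    return $value;
}

// A valid value passes through untouched.
$fname = require_utf8("Ail\xC3\xADs", 'fname after json_decode');

// The bytes from the error message are caught before they reach MySQL.
try {
    require_utf8("Ail\xE3\xADs", 'fname before insert');
} catch (UnexpectedValueException $e) {
    echo $e->getMessage(), PHP_EOL;
    // fname before insert is not valid UTF-8: 41696ce3ad73
}
```

The first stage at which the exception fires is the one that corrupts the string.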
