
I have a Django app that uses tastypie to serialize some data.

There is a name

"Glòria" 

(with an accented 'o') in the database, but it is not being serialized correctly. In the JSON produced by tastypie, it comes out as

"GlÃ²ria"

The serializer class looks like this:

import json as simplejson

from django.core.serializers.json import DjangoJSONEncoder
from tastypie.serializers import Serializer


class PrettyJSONSerializer(Serializer):
    json_indent = 2

    def to_json(self, data, options=None):
        options = options or {}
        data = self.to_simple(data, options)
        return simplejson.dumps(data, cls=DjangoJSONEncoder,
            sort_keys=True, ensure_ascii=False, indent=self.json_indent)

Changing the keyword argument passed to simplejson.dumps to

ensure_ascii=True 

returns the following:

"Gl\u00f2ria"
  • Is this Python 2 or 3? If it's Python 2, is the name represented by a str or a unicode object? Commented Aug 11, 2015 at 9:52
  • Python 2.7; it's stored as unicode internally. The debugger shows: u'Gl\xf2ria ' Commented Aug 11, 2015 at 9:58
  • The "Gl\u00f2ria" version is actually a valid JSON representation of Glòria. Are you sure the problem with ensure_ascii=False is with the serializer and not the client? Commented Aug 11, 2015 at 10:05
  • I don't see a problem with "Gl\u00f2ria", but it's not what I want to return. I would like to set ensure_ascii=False and have it output 'ò' rather than 'Ã²'. Commented Aug 11, 2015 at 10:10
  • Hmmm. I don't know Django or tastypie, so there might be a proper way to fix this, but FWIW, you can easily convert that Unicode escape to proper Unicode. E.g., s="this is a Gl\u00f2ria test".decode('unicode-escape');print s,repr(s) prints this is a Glòria test u'this is a Gl\xf2ria test'. At least, it'll print that if your console is set to use utf-8 encoding. :) Commented Aug 11, 2015 at 10:20
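
To illustrate the points raised in these comments, here is a minimal sketch. It is written for Python 3, where str is always Unicode, whereas the thread itself uses Python 2.7, so treat it as an illustration rather than a drop-in fix. It shows that both ensure_ascii settings produce valid JSON for the same value, and that json.loads undoes the \u escape without a manual unicode-escape decode.

```python
import json

# Same text as u'Gl\xf2ria' in the Python 2 debugger output above.
name = "Gl\u00f2ria"

escaped = json.dumps(name, ensure_ascii=True)   # ASCII-only, uses a \u escape
literal = json.dumps(name, ensure_ascii=False)  # keeps the accented character

print(escaped)  # "Gl\u00f2ria"

# Any conforming JSON parser decodes both forms to the same string.
assert json.loads(escaped) == name
assert json.loads(literal) == name
```

The two outputs differ only in how the character is written, not in the value a client gets after parsing.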

1 Answer


I can't comment (yet..) so I'm posting a reply. Python 2 isn't exactly fun with encodings.

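GlÃ²ria is what the correct UTF-8 bytes for Glòria look like when they are displayed using a single-byte encoding such as Latin-1. Gl\u00f2ria is the ASCII-only JSON escape for the same text (Python 2's own repr of the unicode string is u'Gl\xf2ria'). With ensure_ascii=False and unicode input, json.dumps returns a Python unicode string. What you probably want to do is encode the output of json.dumps as UTF-8.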

import json
data = u'Gl\xf2ria'
encoded_data = json.dumps(data, ensure_ascii=False).encode("utf8")
print(encoded_data)

On a UTF-8 terminal under Python 2, this prints "Glòria" (including the JSON quotes).

Edit: Just to make sure

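The bytes Gl\xc3\xb2ria are the UTF-8 encoding of Glòria; GlÃ²ria is those same bytes read as Latin-1. Printed with the print statement on a UTF-8 terminal, they should display correctly as Glòria.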
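
The relationship between the two spellings can be checked directly. The following is a small sketch in Python 3 syntax (the answer's own code targets Python 2.7); the variable names are illustrative only.

```python
name = "Gl\u00f2ria"  # the name with the accented o

# Encoding to UTF-8 turns the single accented character into two bytes.
utf8_bytes = name.encode("utf-8")
assert utf8_bytes == b"Gl\xc3\xb2ria"

# Decoding those bytes with the wrong codec yields the garbled form.
mojibake = utf8_bytes.decode("latin-1")
assert mojibake == "Gl\u00c3\u00b2ria"  # displays as GlÃ²ria

# Decoding with the right codec recovers the original text.
assert utf8_bytes.decode("utf-8") == name
```

This suggests the serializer output is fine and that whatever displays it is assuming the wrong encoding.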


2 Comments

You are right, it may be a web-browser problem, as using curl on the command line displays it properly.
Maybe you're familiar with the matter, but in case somebody reading this later isn't: it's necessary to declare the content encoding. HTML provides tags for this, but something like application/json may require adding the encoding to the Content-Type HTTP header to display correctly in browsers (Content-Type: application/json; charset=utf-8).
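
As a rough sketch of the header suggested in the comment above, here is a bare WSGI application built with only the standard library (Python 3 syntax). The names json_app and fake_start_response are hypothetical and not part of Django or tastypie; how those frameworks set the header is not shown here, and whether a particular browser needs the charset parameter is an assumption taken from the comment.

```python
import json


def json_app(environ, start_response):
    """Hypothetical WSGI app that returns UTF-8 JSON with an explicit charset."""
    body = json.dumps({"name": "Gl\u00f2ria"}, ensure_ascii=False).encode("utf-8")
    headers = [
        ("Content-Type", "application/json; charset=utf-8"),
        ("Content-Length", str(len(body))),  # length in bytes, not characters
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly with a stub start_response to inspect what it sends.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


chunks = json_app({}, fake_start_response)
assert captured["headers"]["Content-Type"] == "application/json; charset=utf-8"
assert b"".join(chunks).decode("utf-8") == '{"name": "Gl\u00f2ria"}'
```

The body is encoded to UTF-8 bytes before it is sent, and the header states that same encoding, so a client has no reason to guess.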
