
I want to concatenate some Persian strings in Python:

    for t in lstres:
        with conn:
            c = conn.cursor()
            q="SELECT fa FROM words WHERE en ='"+t+"'"
            c.execute(q)
            lst=c.fetchall()

            if lst:
                W.append(lst)
            else:
                W.append(t)

    cnum=1
    for can in W:
        cnum=cnum*len(W)

    candida=Set()

    for ii in range(1,min(20,cnum)):
        candid=""
        for w in W:
            candid+=str(" "+random.choice (w)[0]).encode('utf-8')
        candida.add(candid)

But it raises:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 1: ordinal not in range(128)

What is the problem?

  • Post the full traceback please... Commented Sep 13, 2012 at 15:58
  • I meant the error message, not the code ^^ Also, the part where you fill W would be interesting. Commented Sep 13, 2012 at 16:04

3 Answers


Somewhere along the line, Python is implicitly converting between a byte string (str) and a unicode string using the default ASCII codec, which fails on any non-ASCII character. Where this happens is difficult to tell from what you've posted, but it's better to make sure that you always use unicode anyway. To do this, put a u in front of all your string literals, like so: u"A unicode string", and always use unicode() instead of str().
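As a minimal sketch of that kind of implicit conversion (the sample word is an assumption, not data from the question): in Python 2, calling .encode('utf-8') on a byte string, or mixing it with a unicode string, makes the interpreter first decode the bytes with the ASCII codec. Doing that decode explicitly produces the same kind of error message:

```python
# Hypothetical sample: a space followed by the Persian word "salam",
# held as UTF-8 bytes (what a plain byte string would contain).
raw = u" \u0633\u0644\u0627\u0645".encode("utf-8")

try:
    # Python 2 performs this decode implicitly when a byte string is
    # mixed with unicode or has .encode() called on it.
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# Decoding with the codec the bytes were actually written in works.
text = raw.decode("utf-8")
```

If the values are byte strings, decoding them once with the right codec as soon as they enter the program, and encoding only on output, avoids the implicit ASCII step.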

Unicode is often overlooked by English language programmers and tutorials because in English you can get away with just using ASCII encoded characters. Unfortunately the rest of the world suffers for this because most languages use characters not supported by ASCII. It might be useful to look over the Python Unicode HOWTO to get some guidance on good programming practice in Unicode.

I also found this article very useful.


The problem is here:

for ii in range(1,min(20,cnum)):
    candid=""
    for w in W:
        candid+=str(" "+random.choice (w)[0]).encode('utf-8')
    candida.add(candid)

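It should be something like this (assuming the values in W are already unicode objects; byte strings would first need .decode('utf-8')):

for ii in range(1,min(20,cnum)):
    candid=u""
    for w in W:
        # no str() and no .encode(): both force an implicit ASCII conversion
        candid+=u" "+random.choice (w)[0]
    candida.add(candid)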

But that is not idiomatic Python.

You should write:

for ii in range(1,min(20,cnum)):
    candida.add(u" ".join(random.choice (w)[0] for w in W))
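Here is a small self-contained sketch of the join-based version. The contents of W and the helper name make_candidates are made up for illustration; W is assumed to hold lists of one-column rows, as fetchall() would return them, with unicode values. It also uses the built-in set rather than Set from the sets module.

```python
import random

# Hypothetical data: each entry of W is a list of one-column rows.
W = [
    [(u"\u0633\u0644\u0627\u0645",), (u"\u062f\u0631\u0648\u062f",)],
    [(u"\u062f\u0646\u06cc\u0627",)],
]


def make_candidates(rows_per_word, attempts=20):
    """Build up to `attempts` random phrases, one translation per word."""
    candidates = set()
    for _ in range(attempts):
        # Everything stays unicode; u" ".join never touches the ASCII codec.
        candidates.add(u" ".join(random.choice(rows)[0] for rows in rows_per_word))
    return candidates


candidates = make_candidates(W)

# Encode only at the boundary, for example just before writing to a file.
encoded = [c.encode("utf-8") for c in candidates]
```

Unlike the loop with +=, join puts the separator only between words, so the phrases have no leading space.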

Moreover, there is a potential SQL injection in your script:

q="SELECT fa FROM words WHERE en ='"+t+"'"
c.execute(q)

You should use a parameterized query instead:

q="SELECT fa FROM words WHERE en =?"
c.execute(q, (t,))

Here (t,) is a tuple with only one element.
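To see the difference, here is a rough sketch using an in-memory database. It assumes the sqlite3 module (the question does not say which driver is in use, although the ? placeholder matches sqlite3), and the sample row, the lookup helper, and the hostile input are invented for the demonstration.

```python
import sqlite3

# Assumption: sqlite3 with the words(en, fa) table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (en TEXT, fa TEXT)")
conn.execute(
    "INSERT INTO words VALUES (?, ?)",
    (u"hello", u"\u0633\u0644\u0627\u0645"),
)


def lookup(connection, en_word):
    """Return every translation of en_word using a bound parameter."""
    cur = connection.execute("SELECT fa FROM words WHERE en = ?", (en_word,))
    return [row[0] for row in cur.fetchall()]


hostile = u"x' OR '1'='1"

# Bound parameter: the input is treated as a plain value and matches nothing.
safe_rows = lookup(conn, hostile)

# String concatenation: the input rewrites the WHERE clause and matches all rows.
unsafe_rows = conn.execute(
    "SELECT fa FROM words WHERE en ='" + hostile + "'"
).fetchall()
```

With the placeholder, quoting is handled by the driver, so a quote character inside t cannot change the query.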


You need to declare your strings as Unicode:

u'Your string here éàèç×...'
