Python - Replace list of characters with another list

Question

I have two lists:

wrong_chars = [
    ['أ','إ','ٱ','ٲ','ٳ','ٵ'],
    ['ٮ','ݕ','ݖ','ﭒ','ﭓ','ﭔ'],
    ['ڀ','ݐ','ݔ','ﭖ','ﭗ','ﭘ'],
    ['ٹ','ٺ','ٻ','ټ','ݓ','ﭞ'],
]

true_chars = [
    ['ا'],
    ['ب'],
    ['پ'],
    ['ت'],
]

For a given string, I want to replace the entries in wrong_chars with those in true_chars. Is there a clean way to do that in Python?

Community · Accepted Answer · 2017-05-23 12:14:08Z

8

string module to the rescue!

There's a really handy function called translate, available in the string module and as a method on string objects, that does exactly what you're looking for. For Unicode strings you pass the translation mapping as a dictionary keyed by code point.

The documentation is here

An example based on a tutorial from tutorialspoint is shown below:

>>> from string import maketrans  # Python 2 only
>>> trantab = maketrans("aeiou", "12345")
>>> "this is string example....wow!!!".translate(trantab)
'th3s 3s str3ng 2x1mpl2....w4w!!!'

It looks like you're using unicode here though, which works slightly differently. You can look at this question to get a sense, but here's an example that should work for you more specifically:

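translation_dict = {}
for i, char_list in enumerate(wrong_chars):
    for char in char_list:
        # true_chars[i] is a one-element list, so take its only character
        translation_dict[ord(char)] = true_chars[i][0]

# example is the unicode string you want to clean up
example.translate(translation_dict)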
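On Python 3, where every str is already Unicode, the same idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper name build_table and the sample string are hypothetical, only the first two rows of the question's lists are used, and each inner list of true_chars is assumed to hold exactly one replacement character.

```python
wrong_chars = [
    ['أ', 'إ', 'ٱ', 'ٲ', 'ٳ', 'ٵ'],
    ['ٮ', 'ݕ', 'ݖ', 'ﭒ', 'ﭓ', 'ﭔ'],
]
true_chars = [
    ['ا'],
    ['ب'],
]


def build_table(wrong, true):
    """Map the code point of every wrong character to its replacement (hypothetical helper)."""
    table = {}
    for wrong_group, true_group in zip(wrong, true):
        replacement = true_group[0]  # assumption: one replacement per row
        for char in wrong_group:
            table[ord(char)] = replacement
    return table


table = build_table(wrong_chars, true_chars)

# Sample input built from the lists themselves, plus one character that is not mapped.
sample = wrong_chars[0][0] + wrong_chars[1][0] + 'x'
fixed = sample.translate(table)
print(fixed)
```

The table is built once and can be reused for any number of strings; characters without an entry pass through unchanged.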

edited May 23, 2017 at 12:14

CommunityBot


answered Jul 1, 2015 at 16:56

Slater Victoroff



3 Comments

Chalist Over a year ago

Thanks for the good answer, but I have another question. I changed your code to translation_dict[ord(char.decode('utf-8'))] = true_chars[i]. Is this correct? I get the error "expected a character buffer object" on this line.

Slater Victoroff Over a year ago

@chalist you shouldn't have to decode the character to get the ord. Have you tried on the raw unicode object?

TheFaultInOurStars Over a year ago

Note that the string module does not contain the maketrans function in Python 3; it is only available there in Python 2. Anyone who wants to use maketrans in Python 3 needs to call it on str: str.maketrans(...)
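For reference, a minimal Python 3 version of the vowel example from the answer might look like this. It is a sketch that simply swaps string.maketrans for str.maketrans; the variable name result is added here for illustration.

```python
# Python 3: maketrans is a static method on str, not a function in the string module.
trantab = str.maketrans("aeiou", "12345")
result = "this is string example....wow!!!".translate(trantab)
print(result)  # th3s 3s str3ng 2x1mpl2....w4w!!!
```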

Amir Yousefi · Accepted Answer · 2015-07-02 20:24:25Z

I merged your wrong and true characters into a single list of dictionaries, each pairing a group of wrong characters with the character that should replace them.
Link to a working sample: http://ideone.com/mz7E0R
And the code itself:

given_string = "ayznobcyn"
correction_list = [
    {"wrongs": ['x', 'y', 'z'], "true": 'x'},
    {"wrongs": ['m', 'n', 'o'], "true": 'm'},
    {"wrongs": ['q', 'r', 's', 't'], "true": 'q'},
]

processed_string = ""

for s in given_string:
    true_char = s  # default: keep the character unchanged
    for correction in correction_list:
        if s in correction['wrongs']:
            true_char = correction['true']
            break
    processed_string += true_char

print(given_string)
print(processed_string)

This code could be optimized further, and I have not dealt with any Unicode problems that may come up. Since you appear to be working with Farsi text, you should take care of that yourself.
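One possible optimization, sketched here as an assumption rather than as the answerer's own code, is to flatten correction_list once into a plain dictionary so that each character costs a single lookup instead of a scan over every group. The name lookup is introduced only for this illustration.

```python
correction_list = [
    {"wrongs": ['x', 'y', 'z'], "true": 'x'},
    {"wrongs": ['m', 'n', 'o'], "true": 'm'},
    {"wrongs": ['q', 'r', 's', 't'], "true": 'q'},
]

# Flatten once into {wrong_char: true_char}.
lookup = {}
for correction in correction_list:
    for wrong in correction["wrongs"]:
        lookup[wrong] = correction["true"]

given_string = "ayznobcyn"
# Characters missing from the lookup are kept unchanged.
processed_string = "".join(lookup.get(ch, ch) for ch in given_string)
print(processed_string)  # axxmmbcxm
```

Building the result with join also avoids repeated string concatenation inside the loop.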

jfs · Accepted Answer · 2015-07-01 21:48:35Z

1

#!/usr/bin/env python
from __future__ import unicode_literals

wrong_chars = [
    ['1', '2', '3'],
    ['4', '5', '6'],
    ['7'],
]
true_chars = 'abc'

table = {}
for keys, value in zip(wrong_chars, true_chars):
    table.update(dict.fromkeys(map(ord, keys), value))
print("123456789".translate(table))

Output

aaabbbc89

edited Jul 1, 2015 at 21:48

answered Jul 1, 2015 at 21:43

jfs


3 Comments

jfs Over a year ago

@chalist: the code works as is on Python 2 and 3. Do you have from __future__ import unicode_literals at the top of your code?

jfs Over a year ago

@chalist: here's a live example that demonstrates that it works. Update your question to include the complete (but minimal) code example with the full traceback, if any.

jfs Over a year ago

@chalist: a single user-perceived character may span several Unicode codepoints. (I've used 'abc' as a shortcut for ['a', 'b', 'c'].) Use a list to see the character boundaries: ideone.com/cweBU9. If a "wrong character" contains more than one Unicode codepoint, then you could use text.replace(multiple_codepoints, true_char) or re.sub("|".join(map(re.escape, ['1', '2', '3'])), 'a', text)
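The re.sub suggestion in this comment can be sketched as below. The data is hypothetical: a Latin example is used so the code point boundaries are easy to see, and the first "wrong" entry is deliberately written as two code points (a base letter followed by a combining accent), which is the case that a translate table keyed by ord() cannot handle.

```python
import re

# Hypothetical data: the first entry spans two code points.
wrong_sequences = ['e\u0301', '\u00e8', '\u00ea']  # e + combining acute, è, ê
true_char = 'e'

# Try longer alternatives first so a multi-codepoint entry is not shadowed by a shorter one.
alternatives = sorted(wrong_sequences, key=len, reverse=True)
pattern = re.compile("|".join(map(re.escape, alternatives)))

text = 'caf' + 'e\u0301' + ' cr' + '\u00e8' + 'me'
fixed = pattern.sub(true_char, text)
print(fixed)  # cafe creme
```

Compiling the pattern once keeps the cost down when many strings have to be cleaned with the same set of sequences.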

Hosein Remezan · Accepted Answer · 2015-07-02 10:11:12Z

0

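In my opinion, you can use a single list of lists in which each inner list also contains its true character, like this:

NewChars = [["ا", "أ", "إ", "آ"], ["ب", "بِ", "بِ"]]
# Put the true character first in each inner list and collect the inner lists in one outer list, then:
def fix_char(ch):
    for group in NewChars:
        if ch in group:
            return group[0]
    return ch  # not a known wrong character, keep it unchanged

fix_char("إ")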

answered Jul 2, 2015 at 10:11

Hosein Remezan


1 Comment

Chalist Over a year ago

Thanks, but the list is very big. Each row sometimes has over 100 characters.
