
Is there any built in way to do this?

rawstr = r"3 \u176? \u177? 0.2\u176? (2\u952?)"
#required str is 3 ° ± 0.2° (2θ).

Something like:

In [1]: rawstr.unescape()
Out[1]: '3 ° ± 0.2° (2θ)'

The question is how to convert rawstr to 'utf-8'.

Please see my answer for more clarity.

Please answer if there is a better option than what I am doing right now.

  • You could use codecs.raw_unicode_escape_decode. Unfortunately, your raw string contains invalid Unicode escapes (I'm referring to \u176?; they should be in the form \uXXXX), so it does not work. Commented Mar 2, 2017 at 6:37
  • Alternatively, create a bytestring (use rb as prefix) and use .decode('unicode-escape'), but this again fails because \u176? is not a valid unicode escape. Commented Mar 2, 2017 at 6:39
  • Thanks. I think I will have to write my own decoder. Commented Mar 2, 2017 at 6:45
  • Possible duplicate of How to decode string representative of utf-8 with python? Commented Mar 2, 2017 at 7:58

2 Answers

2

Yep, there is!

For Python 2 (use 'unicode_escape' here; 'string_escape' does not interpret \uXXXX escapes):

print r'your string'.decode('unicode_escape')

For Python 3, you need to convert it to bytes and then decode it:

print(rb'your string'.decode('unicode_escape'))

Note that this doesn't work in your case, since your symbols aren't escaped properly: \u176? is not a valid \uXXXX escape, which needs exactly four hexadecimal digits (even if you print them the "normal" way, it doesn't work).
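To make the failure concrete, here is a minimal Python 3 sketch that feeds the question's string to the unicode_escape codec. The variable names are illustrative only.

```python
# The string from the question, as bytes (Python 3).
raw_bytes = rb"3 \u176? \u177? 0.2\u176? (2\u952?)"

try:
    raw_bytes.decode("unicode_escape")
    failed = False
except UnicodeDecodeError as exc:
    # \u must be followed by exactly four hex digits; "176?" is not.
    failed = True
    print("decode failed:", exc)
```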


Your string should be like this:

rb'3\u00B0 \u00b1 0.2\u00B0 2\u03B8'

Note that if you need to convert a string to bytes in Python 3, you can use the bytes function.

my_str = r'3\u00B0 \u00b1 0.2\u00B0 2\u03B8'
# Python 3: encode the text to bytes first, then decode the escapes
my_bytes = bytes(my_str, 'utf-8')
print(my_bytes.decode('unicode_escape'))
# Python 2: my_str is already a byte string, so decode it directly:
# print my_str.decode('unicode_escape')
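The \uN? sequences in the question look like RTF-style escapes: a decimal code point followed by a one-character fallback, here '?'. Assuming that is what they are, and assuming the fallback is always '?', the sketch below rewrites them into \uXXXX form so that unicode_escape can decode them. The helper name rtf_escapes_to_python is hypothetical, and the approach only covers code points that fit in 16 bits.

```python
import re

def rtf_escapes_to_python(raw: str) -> str:
    """Rewrite RTF-style \\uN? escapes (decimal N, '?' fallback) as \\uXXXX."""
    def repl(match):
        # RTF stores N as a signed 16-bit decimal; mask it to 0..0xFFFF.
        code_point = int(match.group(1)) & 0xFFFF
        return "\\u{:04x}".format(code_point)
    # Assumption: the fallback character is always a literal '?'.
    return re.sub(r"\\u(-?\d+)\?", repl, raw)

rawstr = r"3 \u176? \u177? 0.2\u176? (2\u952?)"
fixed = rtf_escapes_to_python(rawstr)
# The rewritten string is pure ASCII, so it can be encoded and then decoded.
decoded = fixed.encode("ascii").decode("unicode_escape")
print(decoded)
```

This only works if the input contains no non-ASCII characters and no other backslash sequences that unicode_escape would interpret; real RTF input would need a proper parser.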

2 Comments

I think it is ANSI text.
"ANSI text" is not a well-defined term. On Windows, it was misleadingly used in the past to refer to the system's local default encoding, which was widely further misinterpreted to be a particular code page (commonly 1252, though you see all of 437, 850, and whatever is the default in the reader's locale).
1

If you are on Windows and have pythonnet installed:

import clr
clr.AddReference("System")
clr.AddReference("System.Windows.Forms")
import System.Windows.Forms as WinForms

def rtf_to_text(rtf_str):
    """Converts rtf to text"""

    rtf = r"{\rtf1\ansi\ansicpg1252" + '\n' + rtf_str + '\n' + '}'
    richTextBox = WinForms.RichTextBox()
    richTextBox.Rtf = rtf
    return richTextBox.Text

print(rtf_to_text(r'3 \u176? \u177? 0.2\u176? (2\u952?)'))
# --> '3 ° ± 0.2° (2θ)'
