I have a text file from which I need to read many numbers (doubles). It contains ASCII control characters such as DLE and NUL, which are visible when I open the file. When I read a line and try to extract only the doubles/ints, I get errors like "invalid literals \x10". The first 2 lines of my file are shown below.

DLE NUL NUL NUL [1, 167, 133, 6]DLE NUL NUL   
YS FS NUL[0.0, 4.3025989e-07, 1.5446712e-06, 3.1393029e-06, 5.0430463e-06, 7.1382601e-06

How do I remove all these control characters from a text file at once, using Python? I want this to be done before I parse the file into numbers ...

Any help is appreciated!

  • Perhaps you should consider parsing them instead so that you know how to parse the rest of the file. Commented Jul 5, 2013 at 3:34
  • However, I still really need to remove these characters before I do any sort of reading with them.... Commented Jul 5, 2013 at 3:40

2 Answers


Use string.printable.

>>> import string
>>> # ''.join(...) is needed on Python 3, where filter() returns an iterator
>>> ''.join(filter(string.printable.__contains__, '\x00\x01XYZ\x00\x10'))
'XYZ'

3 Comments

Using regex (see this answer) is an order of magnitude faster.
@WesleyBaugh, If speed matters, you can use str.translate.
@alvas, How about using unicode(string.printable) if you want to use exactly the same characters?
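As a rough illustration of the str.translate suggestion in the comments, here is a minimal sketch. It assumes Python 3, where str.translate accepts a dict that maps code points to None for deletion. The names KEEP, CONTROL_TABLE and strip_control are made up for this example, and keeping tab, LF and CR is a design choice of the sketch, not something the comments specify.

```python
# Assumption: Python 3 (str.translate takes a dict of code point -> None).
# Keep tab, line feed and carriage return so the line structure survives.
KEEP = {0x09, 0x0A, 0x0D}

# Delete the remaining C0 control characters (0x00-0x1F) and DEL (0x7F).
CONTROL_TABLE = {
    cp: None for cp in list(range(0x20)) + [0x7F] if cp not in KEEP
}


def strip_control(text):
    """Return text with ASCII control characters (except tab/LF/CR) removed."""
    return text.translate(CONTROL_TABLE)


print(repr(strip_control('\x10\x00\x00[1, 167, 133, 6]\x10\x00\n')))
```

The table is built once and reused, so each call is a single pass over the string.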

I know this is a very old post, but I am answering because I think it could help others.

I did the following. It replaces every ASCII control character in the range \x00-\x1F (which includes tab and newline) with an empty string.

import re
line = re.sub(r'[\x00-\x1F]+', '', line)
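To tie this back to the question, here is a hedged sketch of how the substitution might be combined with number extraction. It assumes Python 3 and that the file can be read as latin-1, which maps every byte to a character so that decoding never fails. The names CONTROL_CHARS, NUMBER, numbers_from_line and numbers_from_file are hypothetical, and the number pattern is a loose guess at the format shown in the question, not a full float grammar.

```python
import re

# Drop everything below 0x20 except tab, LF and CR, plus DEL (0x7F).
CONTROL_CHARS = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]+')

# Loose pattern for ints and floats, including exponents like 4.3025989e-07.
NUMBER = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?')


def numbers_from_line(line):
    """Strip control characters, then return the numbers on the line as floats."""
    cleaned = CONTROL_CHARS.sub('', line)
    return [float(token) for token in NUMBER.findall(cleaned)]


def numbers_from_file(path):
    """Return one list of floats per line of the file at path."""
    # Assumption: latin-1 is only used so that arbitrary bytes decode cleanly.
    with open(path, encoding='latin-1') as handle:
        return [numbers_from_line(line) for line in handle]


print(numbers_from_line('\x10\x00\x00\x00[1, 167, 133, 6]\x10\x00\x00'))
```

Every value comes back as a float; if integers must stay integers, the conversion step would need to check each token for a decimal point or exponent first.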

Ref: ASCII (American Standard Code for Information Interchange) Code

Ref: Python re.sub()
