Remove ASCII control characters from text file Python

Question

I have a text file from which I have to read a lot of numbers (doubles). It contains ASCII control characters such as DLE and NUL, which are visible in the text file. When I read a line to extract only the doubles/ints, I get errors like "invalid literals \x10". Shown below are the first 2 lines of my file.

DLE NUL NUL NUL [1, 167, 133, 6]DLE NUL NUL   
YS FS NUL[0.0, 4.3025989e-07, 1.5446712e-06, 3.1393029e-06, 5.0430463e-06, 7.1382601e-06

How do I remove all these control characters from a text file at once, using Python? I want this to be done before I parse the file into numbers ...

Any help is appreciated!

Perhaps you should consider parsing them instead so that you know how to parse the rest of the file.
– Ignacio Vazquez-Abrams, Commented Jul 5, 2013 at 3:34
However, I still really need to remove these characters before I do any sort of reading with them....
– atmaere, Commented Jul 5, 2013 at 3:40

falsetru · Accepted Answer · 2013-07-05 03:39:38Z

3

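Use string.printable to keep only the printable characters. In Python 3, filter returns an iterator, so join the result back into a string (this form also works in Python 2).

>>> import string
>>> ''.join(filter(string.printable.__contains__, '\x00\x01XYZ\x00\x10'))
'XYZ'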
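
As a minimal sketch of how the same idea could be applied to a whole line from the file, the helper below filters against a set built from string.printable. The name strip_nonprintable and the sample line are assumptions made for illustration; the sample is modeled on the first line shown in the question, with DLE written as \x10 and NUL as \x00.

```python
import string

# Build the lookup set once; set membership is faster than scanning the string.
PRINTABLE = set(string.printable)


def strip_nonprintable(text):
    """Return text with every character outside string.printable removed."""
    return "".join(ch for ch in text if ch in PRINTABLE)


# Hypothetical sample modeled on the question's first line (DLE = \x10, NUL = \x00).
raw_line = "\x10\x00\x00\x00[1, 167, 133, 6]\x10\x00\x00\n"
clean_line = strip_nonprintable(raw_line)
print(repr(clean_line))  # '[1, 167, 133, 6]\n'
```

Note that string.printable includes all of string.whitespace, so tab, newline, carriage return, vertical tab and form feed survive this filter.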


3 Comments

Wesley Baugh Over a year ago

Using regex (see this answer) is an order of magnitude faster.

falsetru Over a year ago

@WesleyBaugh, If speed matters, you can use str.translate.

falsetru Over a year ago

@alvas, How about using unicode(string.printable) if you want to use exactly same characters?
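
The str.translate suggestion in the comments above might look something like the following in Python 3. This is a sketch, not the commenter's code: the names KEEP, DELETE_TABLE and strip_control_chars are hypothetical, and keeping tab, line feed and carriage return is an assumption made so that line structure survives.

```python
# Code points 0-31 and 127 (DEL) are the ASCII control characters.
# Assumption: tab, line feed and carriage return are kept on purpose.
KEEP = {0x09, 0x0A, 0x0D}

# A translation table mapping a code point to None deletes that character.
DELETE_TABLE = dict.fromkeys(
    code for code in list(range(32)) + [127] if code not in KEEP
)


def strip_control_chars(text):
    """Delete ASCII control characters (except tab, LF, CR) in one pass."""
    return text.translate(DELETE_TABLE)


print(repr(strip_control_chars("\x10\x00\x00[0.0, 4.3e-07]\x7f\n")))
# '[0.0, 4.3e-07]\n'
```

The table is built once and reused, so each call does a single pass over the string without a per-character Python function call.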

user1012513 · Accepted Answer · 2017-04-20 13:54:33Z

2

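I know this is a very old post, but I am answering because I think it could help others.

I did it as follows. This replaces every ASCII control character in the range \x00-\x1F with an empty string. Note that this range includes tab (\x09) and newline (\x0A), so they are removed as well.

import re

line = re.sub(r'[\x00-\x1F]+', '', line)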

Ref: ASCII (American Standard Code for Information Interchange) Code

Ref: Python re.sub()
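
To clean a whole file before parsing, as the question asks, the regex approach could be wrapped in a generator like the sketch below. Several things here are assumptions rather than part of the answer: the names CONTROL_RE and iter_clean_lines, the choice to keep tab, LF and CR, the extra removal of DEL (\x7f), the latin-1 decoding (chosen because it maps every byte to a character), and the sample bytes written to a temporary file for the demonstration.

```python
import os
import re
import tempfile

# ASCII control characters except tab (\x09), LF (\x0a) and CR (\x0d), plus DEL.
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]+")


def iter_clean_lines(path):
    """Yield each line of the file with control characters removed."""
    # Assumption: latin-1 is used only so that decoding never fails.
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            yield CONTROL_RE.sub("", line)


# Demonstration with hypothetical sample bytes written to a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x10\x00\x00\x00[1, 167, 133, 6]\x10\x00\x00\n")
    tmp.write(b"\x1c\x00[0.0, 4.3025989e-07]\n")
    sample_path = tmp.name

cleaned = list(iter_clean_lines(sample_path))
os.remove(sample_path)
print(cleaned)  # ['[1, 167, 133, 6]\n', '[0.0, 4.3025989e-07]\n']
```

Because newlines are left in place, the cleaned lines can be handed to whatever number parsing comes next, one line at a time.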

