Python Regex matching and replacing

Question

I have a pdf file with its contents formatted as follows:

00:12 There once lived a man...

00:18 who was thought to have...

and the list goes on, following the same pattern. I'm trying to write a regex program that reads the file, removes all of the timestamps, and replaces the line breaks with spaces. In other words, I want to make one big paragraph out of it.

This is what I came up with for the regular expression:

transcript.replace(transcript.matches("^[0-9:]+$"),"")

That should get rid of any numbers and colons, meaning the timestamps. Now I'm not sure how to replace the line breaks. Would I do something like this?

transcript.replace(transcript.matches("^[\n]+$"), " ")

Any help would be appreciated. Thanks!

Possible duplicate of Python regex over multiple newlines

– Guillaume, Commented Nov 22, 2016 at 10:18

Community · Accepted Answer · 2020-06-20 09:12:55Z

Couldn't you just check for a blank line, skip (or delete) those lines and use your transcript code to handle the timestamps?

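import re

cleaned = []  # hypothetical list collecting the processed lines
for line in file:
    if line.strip() == "":  # a blank line is read as "\n", so strip before testing
        continue  # skip blank lines
    # remove the leading timestamp, e.g. "00:12 "
    cleaned.append(re.sub(r"^[0-9:]+\s*", "", line))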

This may return a block of text with the following appearance:

There once lived a man...

who was thought to have...

You would still need to join these lines into one continuous paragraph. Do the three dots appear at the end of each line, as in your quoted text?
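
For what it's worth, both steps (dropping the timestamps and joining the lines) can probably be done with two re.sub calls on the whole text. This is only a minimal sketch: it assumes the PDF text has already been extracted into a plain string, that every timestamp sits at the start of a line in mm:ss (or hh:mm:ss) form, and the function name flatten_transcript is made up for illustration.

```python
import re

def flatten_transcript(text):
    """Strip leading timestamps and join all lines into one paragraph.

    Hypothetical helper; assumes timestamps look like 00:12 or 01:02:03
    and appear only at the start of a line.
    """
    # re.MULTILINE makes ^ match at the start of every line,
    # not only at the start of the whole string.
    no_stamps = re.sub(r"^\d{1,2}:\d{2}(?::\d{2})?[ \t]*", "", text, flags=re.MULTILINE)
    # Collapse every run of whitespace, blank lines included, into one space.
    return re.sub(r"\s+", " ", no_stamps).strip()

sample = "00:12 There once lived a man...\n\n00:18 who was thought to have...\n"
print(flatten_transcript(sample))
# There once lived a man... who was thought to have...
```

The pattern in the question, ^[0-9:]+$, only matches a line that consists of nothing but digits and colons, because $ anchors the match to the end of the line. Dropping the $ and matching the trailing space is what lets the timestamp be removed while the rest of the line is kept.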

answered Nov 22, 2016 at 10:26 by CJC

edited Jun 20, 2020 at 9:12 by CommunityBot
