I have a string, line:

>>> line = "  abc\n  def\n\n  ghi\n  jkl"
>>> print(line)
  abc
  def

  ghi
  jkl

and I want to convert it to "  abcdef\n\n  ghijkl", i.e.:

>>> print("  abcdef\n\n  ghijkl")
  abcdef

  ghijkl

I tried Python's re module and wrote something like this:

import re
re.sub(r'(?P<word1>[^\n\s])\n\s*(?P<word2>[^\n\s])', r'\g<word1>\g<word2>', line)

but I get this:

>>> re.sub(r'(?P<word1>[^\n\s])\n\s*(?P<word2>[^\n\s])', r'\g<word1>\g<word2>', line)
'  abcdefghijkl'

It seems to me that the \n\s* part is also matching \n\n. Can anyone point out where I went wrong?

3 Answers

\s matches spaces, \t, \n and (depending on your regex engine) a few other whitespace characters, so the \s* in your pattern also swallows the second newline of the blank line.
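A quick way to see this in Python (a small check; tail is just the part of the question's string that follows "def"):

```python
import re

# \s is a character class that includes the newline character itself
print(re.findall(r"\s", "a b\tc\nd"))  # [' ', '\t', '\n']

# So after the newline that follows "def", \s* also consumes the blank line's newline
tail = "\n\n  ghi"
print(repr(re.match(r"\n\s*", tail).group()))  # '\n\n  '
```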

So if you only want to remove single line breaks (plus any spaces/tabs that follow them), you can use this:

newline = re.sub(r"(?<!\n)\n[ \t]*(?!\n)", "", line)

Explanation:

(?<!\n) # Assert that the previous character isn't a newline
\n      # Match a newline
[ \t]*  # Match any number of spaces/tabs
(?!\n)  # Assert that the next character isn't a newline
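The commented breakdown above can also be written straight into the pattern with re.VERBOSE. This is a sketch (SINGLE_NEWLINE is just an illustrative name). The raw string is required here: in a normal string "\n" would become a real newline character, which verbose mode would then ignore as whitespace.

```python
import re

SINGLE_NEWLINE = re.compile(r"""
    (?<!\n)   # assert that the previous character isn't a newline
    \n        # match a newline
    [ \t]*    # match any number of spaces/tabs (the space inside [] is kept in VERBOSE mode)
    (?!\n)    # assert that the next character isn't a newline
""", re.VERBOSE)

line = "  abc\n  def\n\n  ghi\n  jkl"
print(SINGLE_NEWLINE.sub("", line))
```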

In Python:

>>> import re
>>> line = "  abc\n  def\n\n  ghi\n  jkl"
>>> newline = re.sub(r"(?<!\n)\n[ \t]*(?!\n)", "", line)
>>> print(newline)
  abcdef

  ghijkl

Try this:

import re
line = "  abc\n  def\n\n  ghi\n  jkl"
print(re.sub(r'\n(?!\n)\s*', '', line))

It gives:

  abcdef
ghijkl

It says: "Replace a newline that is not followed by another newline, together with any whitespace after it, with nothing." Note that this loses the blank line.
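To see why the blank line disappears, list where this pattern matches in the question's string. Index 11 is the first newline of the blank-line pair and index 12 is the second: the first one is skipped because another newline follows it, but the second one is still matched and removed.

```python
import re

line = "  abc\n  def\n\n  ghi\n  jkl"

# Start positions of every match of the first version's pattern
starts = [m.start() for m in re.finditer(r'\n(?!\n)\s*', line)]
print(starts)  # [5, 12, 18]
```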

UPDATE: Here's a better version, which keeps the blank line:

>>> re.sub(r'([^\n])\n(?!\n)\s*', r'\1', line)
'  abcdef\n\n  ghijkl'

This gives exactly the output you asked for in the question.

You could simplify the regexp if you used \S, which matches any non-whitespace character:

>>> import re
>>> line = "  abc\n  def\n\n  ghi\n  jkl"
>>> print(re.sub(r'(\S+)\n\s*(\S+)', r'\1\2', line))
  abcdef

  ghijkl

As for your own regexp: your <word1> and <word2> groups only match a single character (i.e. they don't use +). With that simple correction, your regexp produces the expected output for this input:

>>> print(re.sub(r'(?P<word1>[^\n\s]+)\n\s*(?P<word2>[^\n\s]+)', r'\g<word1>\g<word2>', line))
  abcdef

  ghijkl
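The + version gives the right result for the question's string, but \s* can still run across a blank line. And because each match consumes the word after the line break, that word cannot start the next match, so a third consecutive line is left unjoined. A small check on two made-up inputs (text1 and text2 are hypothetical, not from the question), compared with the lookaround pattern from the first answer:

```python
import re

plus_pattern = r'(\S+)\n\s*(\S+)'              # the \S+ version above
lookaround_pattern = r"(?<!\n)\n[ \t]*(?!\n)"  # the pattern from the first answer

# Hypothetical input 1: a blank line directly after the first line
text1 = "  abc\n\n  def"
print(repr(re.sub(plus_pattern, r'\1\2', text1)))   # '  abcdef' (blank line swallowed)
print(repr(re.sub(lookaround_pattern, "", text1)))  # '  abc\n\n  def'

# Hypothetical input 2: three consecutive non-blank lines
text2 = "  abc\n  def\n  ghi"
print(repr(re.sub(plus_pattern, r'\1\2', text2)))   # '  abcdef\n  ghi' (third line not joined)
print(repr(re.sub(lookaround_pattern, "", text2)))  # '  abcdefghi'
```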
