0

I want to split the string below into its individual statements using a regular expression.

Input string:

str1 = "1. Write down what you eat for one week and you will lose weight.  2. Add 10 percent to the amount of daily calories you think you're eating.  3. Get an online weight loss buddy to lose more weight.  4. Get a mantra.  5. Eat three fewer bites of your meal, one less treat a day, or one less glass of orange juice. More items"

My attempt:

re.split(r'[A-z]\.',str1)

Output:

['1. Write down what you eat for one week and you will lose weigh', "  2. Add 10 percent to the amount of daily calories you think you're eatin", '  3. Get an online weight loss buddy to lose more weigh', '  4. Get a mantr', '  5. Eat three fewer bites of your meal, one less treat a day, or one less glass of orange juic', ' More items']

In the output, the last letter of every statement is missing. I want the output to look like this:

['1. Write down what you eat for one week and you will lose weight', " 2. Add 10 percent to the amount of daily calories you think you're eating", ' 3. Get an online weight loss buddy to lose more weight', ' 4. Get a mantra', ' 5. Eat three fewer bites of your meal, one less treat a day, or one less glass of orange juice', ' More items']

2 Answers

3

This happens because the pattern matches two characters, the last letter and the period, and re.split discards everything it matches. If you don't mind losing the ., you can use a lookbehind to keep the last letter:

re.split(r'(?<=[a-z])\.',str1)
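
As a quick check, here is a minimal sketch that runs this pattern on the string from the question. The names parts and cleaned are only illustrative, and the final strip() step is an optional extra that is not part of the answer itself; it is there because the lookbehind keeps the spaces that follow each period.

```python
import re

# Input string from the question, split across lines only for readability.
str1 = ("1. Write down what you eat for one week and you will lose weight.  "
        "2. Add 10 percent to the amount of daily calories you think you're eating.  "
        "3. Get an online weight loss buddy to lose more weight.  "
        "4. Get a mantra.  "
        "5. Eat three fewer bites of your meal, one less treat a day, "
        "or one less glass of orange juice. More items")

# The lookbehind only checks for a preceding lowercase letter;
# the match itself is the period alone, so no letter is lost.
parts = re.split(r'(?<=[a-z])\.', str1)

# Optional: drop the leading spaces left over after each period.
cleaned = [p.strip() for p in parts]

for p in cleaned:
    print(p)
```

The periods after the item numbers ("1.", "2.", ...) are left alone because the character before them is a digit, not a letter.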

Also, note that [A-z] does not mean "all letters": the range also covers several non-letter characters that sit between Z and a in ASCII. Use [A-Za-z] to match letters only.
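
To see exactly which extra characters are involved, a small sketch like the following lists every ASCII character that [A-z] matches but that is not a letter. The variable name extras is just illustrative, and re.fullmatch assumes Python 3.4 or later.

```python
import re

# Collect ASCII characters that [A-z] accepts even though they are not letters.
extras = [
    chr(code)
    for code in range(128)
    if re.fullmatch(r'[A-z]', chr(code)) and not chr(code).isalpha()
]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

# Consequence: a pattern built on [A-z] also splits after an underscore.
print(re.split(r'[A-z]\.', 'x_.y'))  # ['x', 'y']
```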

1 Comment

Yes: the symbols [ \ ] ^ _ ` sit between "Z" and "a" in ASCII.
2

Use a positive lookbehind, so that the preceding character is not consumed:

re.split(r'(?<=[A-Za-z])\.',str1)

See https://docs.python.org/2/library/re.html:

(?<=...) Matches if the current position in the string is preceded by a match for ... that ends at the current position.
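
To illustrate what "not consumed" means in practice, this small sketch compares the span matched with and without the lookbehind on a short sample sentence. The sample text and the names consuming and lookbehind are made up for the illustration.

```python
import re

text = "Get a mantra."

# Without a lookbehind, the letter is part of the match and is therefore
# thrown away by re.split along with the period.
consuming = re.search(r'[A-Za-z]\.', text)

# With a lookbehind, the letter is only checked; the match is the period alone.
lookbehind = re.search(r'(?<=[A-Za-z])\.', text)

print(consuming.group(), consuming.span())    # a. (11, 13)
print(lookbehind.group(), lookbehind.span())  # . (12, 13)
```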

1 Comment

OK, I see - "[ \ ] ^ _ `" are in-between - I updated the answer :)
