
I would like to extract a number from a large HTML file with Python. My idea was to use a regex like this:

import re
text = 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    found = ''

found

Unfortunately, I'm not used to regex and I can't adapt this example to extract 0,54125 from:

(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)

Is there another way to extract the number, or could someone help me with the regex?

2 Comments
  • Extract the contents of the tag you need with BeautifulSoup, then split the string and take item 0. Commented Apr 27, 2018 at 9:15
  • Do not use regex for HTML parsing: there are tools better suited to this purpose, e.g. BeautifulSoup or lxml.html. Commented Apr 27, 2018 at 9:23
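
As a rough sketch of the parser-based approach these comments suggest, here is a version that uses only the standard library's html.parser rather than BeautifulSoup or lxml (a deliberate swap so the example has no third-party dependency). The names DivTextCollector and first_number are hypothetical, and the class name vk_ans is taken from the snippet in the question.

```python
from html.parser import HTMLParser


class DivTextCollector(HTMLParser):
    """Collect the text of every <div> whose class list contains a given name."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0   # > 0 while we are inside a matching <div>
        self.texts = []  # one string per matching <div>

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.depth or self.wanted_class in classes:
            if not self.depth:
                self.texts.append('')  # start a new text buffer
            self.depth += 1            # track nested <div> elements

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def first_number(html, wanted_class='vk_ans'):
    """Return the first whitespace-separated token of the first matching div."""
    parser = DivTextCollector(wanted_class)
    parser.feed(html)
    parser.close()
    for text in parser.texts:
        tokens = text.split()
        if tokens:
            return tokens[0]
    return ''


html = '<p>x</p><div class="vk_ans vk_bk">0,54125 count id</div><p>y</p>'
print(first_number(html))
```

Unlike a regex on the raw markup, this keeps working if the attribute order or the whitespace inside the tag changes, as long as the div still carries the expected class.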

2 Answers


If you want the output 0,54125 (that is, text matching \d+,\d+), you need to decide which surrounding context the match must appear in.

From the following input,

 (...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)

To extract 0,54125, you can try a regex such as

(?<=>)\d+,\d+

which matches a number of that form directly after any >, or the more specific

(?<=<div class="vk_ans vk_bk">)\d+,\d+

which only matches directly after that exact opening tag.
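
A minimal sketch of how the more specific lookbehind pattern could be used from Python, assuming the opening tag appears in the file exactly as written in the question (same attribute order, quoting, and spacing). The variable names are illustrative only.

```python
import re

html = '(...)<div class="vk_ans vk_bk">0,54125 count id</div>(...)'

# Lookbehind: match digits,digits only directly after this exact opening tag.
pattern = re.compile(r'(?<=<div class="vk_ans vk_bk">)\d+,\d+')

match = pattern.search(html)
found = match.group(0) if match else ''  # empty string when nothing matches
print(found)
```

Checking the match object before calling group() avoids the AttributeError that the try/except in the question works around.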


You can replace some characters in your text before searching it. For example, to capture numbers like 12,34 you can do this:

import re

text = 'gfgfdAAA12,34ZZZuijjk'
text = text.replace(',', '')  # drop the comma so the digits are contiguous
try:
    found = re.search(r'AAA(\d+)ZZZ', text).group(1)
except AttributeError:
    # re.search returned None, i.e. no match
    found = ''

print(found)
# 1234

If you need to capture the digits inside a line, you can make your pattern more general, like this:

text = '<div class="vk_ans vk_bk">0,54125 count id</div>'
text = text.replace(',', '')  # note: this also removes the decimal separator
found = re.search(r'(\d+)', text).group(1)

print(found)
# 054125
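
If the comma needs to survive, one possible variation is to match it as part of the pattern instead of stripping it first. This is only a sketch, and it assumes the comma is a decimal separator (as in 0,54125), so it is swapped for a dot before converting to float.

```python
import re

text = '<div class="vk_ans vk_bk">0,54125 count id</div>'

# Match digits, a comma, then digits, keeping the comma in the result.
match = re.search(r'\d+,\d+', text)
if match:
    found = match.group(0)
    value = float(found.replace(',', '.'))  # assumes a decimal comma
else:
    found, value = '', None

print(found, value)
```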
