I am trying to do a regex search on some text where I am only interested in the text between some patterns.
Sample text:
<h2><font color='#fff'>some text </font></h2><HR noshade size="5" width="50%" align="center"><table><tr><th id='t1'>Host </th><th>10.0.1.1</th></tr><th id='t1'>Port </th><th>8080</th></tr><th id='t1'>User </th><th>chris</th></tr><th id='t1'>Password </th><th>chris</th></tr></table><h4><font color='#fff'>
...
<h2><font color='#fff'>some more text </font></h2><HR noshade size="5" width="50%" align="center"><table><tr><th id='t1'>Host </th><th>10.0.1.2</th></tr><th id='t1'>Port </th><th>9090</th></tr><th id='t1'>User </th><th>bob</th></tr><th id='t1'>Password </th><th>bob</th></tr></table><h4><font color='#fff'>
This is my regex:
Host.*?<th>(.*?)<.*Port.*?<th>(.*?)<.*User.*?<th>(.*?)<.*Password.*?<th>(.*?)<
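The pattern has four capturing groups, so every match carries four captured values. Here is a minimal check with re.search. It assumes the first sample record, trimmed to its table only to keep the example short. It shows that match.groups() returns those four values together as one tuple:

```python
import re

# The regex from the question: four capturing groups, one per value.
PATTERN = r"Host.*?<th>(.*?)<.*Port.*?<th>(.*?)<.*User.*?<th>(.*?)<.*Password.*?<th>(.*?)<"

# First sample record, trimmed to the table (the trimming is an assumption for brevity).
record = (
    "<table><tr><th id='t1'>Host </th><th>10.0.1.1</th></tr>"
    "<th id='t1'>Port </th><th>8080</th></tr>"
    "<th id='t1'>User </th><th>chris</th></tr>"
    "<th id='t1'>Password </th><th>chris</th></tr></table>"
)

match = re.search(PATTERN, record)
# groups() returns all captured groups of this one match as a tuple.
print(match.groups())  # ('10.0.1.1', '8080', 'chris', 'chris')
```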
Each regex match comes back as a tuple of the captured groups, and that is not what I want. I would like the groups of each match combined into a single string.
This is the output I want:
10.0.1.1 8080 chris chris
10.0.1.2 9090 bob bob
Here is what I am doing:
import re
lines = []
# s holds the sample text shown above
lines.extend(re.findall(r"Host.*?<th>(.*?)<.*Port.*?<th>(.*?)<.*User.*?<th>(.*?)<.*Password.*?<th>(.*?)<", s))
print(lines)
Which gives me:
[('10.0.1.1', '8080', 'chris', 'chris'), ('10.0.1.2', '9090', 'bob', 'bob')]
Can anyone explain why this happens and how I can get the output I want?
Thanks, Chris
findall returns a list of matches. Because your pattern has more than one capturing group, each match is a tuple of the group values. Have you tried using join to combine the elements of each tuple?
print('\n'.join([' '.join(values) for values in lines]))
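Below is a self-contained sketch of the join approach. It assumes the two sample records are trimmed to their tables and sit on separate lines, as in the question. The names PATTERN, output, same, and ports are illustrative only. According to the Python re documentation, findall returns whole-match strings when the pattern has no groups, plain strings of that group when it has exactly one, and one tuple per match when it has several. By default . does not match a newline, so the greedy .* between fields cannot run past the end of a record's line, and each match stays inside one record.

```python
import re

PATTERN = r"Host.*?<th>(.*?)<.*Port.*?<th>(.*?)<.*User.*?<th>(.*?)<.*Password.*?<th>(.*?)<"

# Two records on separate lines, trimmed from the question's sample text (an assumption).
s = (
    "<table><tr><th id='t1'>Host </th><th>10.0.1.1</th></tr>"
    "<th id='t1'>Port </th><th>8080</th></tr><th id='t1'>User </th><th>chris</th></tr>"
    "<th id='t1'>Password </th><th>chris</th></tr></table>\n"
    "<table><tr><th id='t1'>Host </th><th>10.0.1.2</th></tr>"
    "<th id='t1'>Port </th><th>9090</th></tr><th id='t1'>User </th><th>bob</th></tr>"
    "<th id='t1'>Password </th><th>bob</th></tr></table>\n"
)

# Several groups: findall yields one tuple of group values per match.
lines = re.findall(PATTERN, s)

# Join each tuple's fields with spaces, then join the records with newlines.
output = "\n".join(" ".join(values) for values in lines)
print(output)
# 10.0.1.1 8080 chris chris
# 10.0.1.2 9090 bob bob

# Same result with finditer: each item is a Match object whose groups() is the tuple.
same = "\n".join(" ".join(m.groups()) for m in re.finditer(PATTERN, s))
assert same == output

# Exactly one group: findall returns plain strings instead of tuples.
ports = re.findall(r"<th>(\d+)<", s)  # ['8080', '9090']
```

Suppose the records were not separated by newlines, or re.DOTALL were used. Then the greedy .* before Port, User, and Password could jump ahead to a later record, and the lazy .*? would be the safer choice there.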