I am trying to parse a site, but the HTML is a mess. Can anyone with more experience parsing sites help me?
<tr>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Date</b></font></td>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Place</b></font></td>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Situation</b></font></td>
</tr>
<tr><td rowspan=2>16/09/2011 10:11</td><td>New York</td><td><FONT COLOR="000000">Situation Red</font></td></tr>
<tr><td colspan=2>Optional comment hello new york</td></tr>
<tr><td rowspan=2>16/09/2011 10:08</td><td>Texas</td><td><FONT COLOR="000000">Situation Green</font></td></tr>
<tr><td colspan=2>Optional comment hello texas </td></tr>
<tr><td rowspan=1>06/09/2011 13:14</td><td>California</td><td><FONT COLOR="000000">Yellow Situation</font></td></tr>
</TABLE>
The strange part is that the comment has no column in the table header, and the starting point (California) has no comment at all. So the starting point will always look like this:
Date: 06/09/2011 13:14
Place: California
Situation: Yellow Situation
Comment: null
All other places have a comment and will look like this:
Date: 16/09/2011 10:11
Place: New York
Situation: Situation Red
Comment: Optional comment hello new york.
I have tried a few approaches, but I don't have much experience with node.js and even less with HTML parsing. I need a starting point for parsing messy markup like this.
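To make the mapping I am after concrete, here is a rough sketch in plain Node.js with no dependencies. It is regex-based and only assumes what the sample shows: each data row has three cells and starts with a `<td rowspan=...>` date cell, and an optional comment row with a single `<td colspan=2>` cell directly follows its data row. The names `parseRows`, `stripTags` and `sample`, and the field names, are placeholders I made up. A real HTML parser would probably be more robust than this.

```javascript
// Hypothetical helper: removes any tags (e.g. <font>, <b>) and trims the text.
function stripTags(s) {
  return s.replace(/<[^>]*>/g, '').trim();
}

// Hypothetical helper: turns the table rows into plain objects.
// Assumption: a data row has 3 cells and its first cell carries rowspan;
// a comment row has 1 cell with colspan and belongs to the previous data row.
function parseRows(html) {
  const records = [];
  const rowRe = /<tr[^>]*>([\s\S]*?)<\/tr>/gi;
  let row;
  while ((row = rowRe.exec(html)) !== null) {
    const cells = [];
    const cellRe = /<td([^>]*)>([\s\S]*?)<\/td>/gi;
    let cell;
    while ((cell = cellRe.exec(row[1])) !== null) {
      cells.push({ attrs: cell[1], text: stripTags(cell[2]) });
    }
    if (cells.length === 3 && /rowspan/i.test(cells[0].attrs)) {
      // Data row: the comment stays null unless a comment row follows.
      records.push({
        date: cells[0].text,
        place: cells[1].text,
        situation: cells[2].text,
        comment: null,
      });
    } else if (cells.length === 1 && /colspan/i.test(cells[0].attrs) && records.length > 0) {
      // Comment row: attach it to the most recent data row.
      records[records.length - 1].comment = cells[0].text;
    }
    // Anything else (such as the header row) is skipped.
  }
  return records;
}

// The sample markup from above, trimmed to the rows that matter.
const sample = `
<tr>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Date</b></font></td>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Place</b></font></td>
<td><font FACE=Tahoma color='#CC0000' size=2><b>Situation</b></font></td>
</tr>
<tr><td rowspan=2>16/09/2011 10:11</td><td>New York</td><td><FONT COLOR="000000">Situation Red</font></td></tr>
<tr><td colspan=2>Optional comment hello new york</td></tr>
<tr><td rowspan=2>16/09/2011 10:08</td><td>Texas</td><td><FONT COLOR="000000">Situation Green</font></td></tr>
<tr><td colspan=2>Optional comment hello texas </td></tr>
<tr><td rowspan=1>06/09/2011 13:14</td><td>California</td><td><FONT COLOR="000000">Yellow Situation</font></td></tr>
`;

console.log(parseRows(sample));
```

The header row is skipped because its first cell has no rowspan attribute, and the California row keeps a null comment because no colspan row follows it.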