
I have HTML code in a string named gridHTML:

    <html>
<body>
<style>a{text-decoration:none; color: black;} th { border: solid thin; }

td{text-align: center;vertical-align: middle;font-family: Arial;font-size: 8pt; height: 50px;
border-width: 1px;border-left-style: solid;border-right-style: solid;}

table { border-collapse: collapse; } tr:nth-child(1) { border: solid thin; border-width: 2px;}
tr{ border: solid thin; border-style: dashed solid dashed solid;}
</style>
<div>
    <table >
        <tr class='leftColumnTableHeadO' align='center' style='font-family: Arial; font-size: 8pt; font-weight: normal; width: 100px;'>
            <th scope='col'>TM No.</th>
            <th scope='col' style='width: 83px;'>Filing Date</th>
            <th scope='col'>TradeMark</th>
            <th scope='col'>Class</th>
            <th scope='col'>Jr#</th>
            <th scope='col'>Applicant</th>
            <th scope='col'>Agent / Attorney</th>
            <th scope='col'>Status</th>
            <th scope='col'>City</th>
            <th scope='col'>Logo</th>
        </tr>
        <tr class='lightGrayBg' >
            <td ><a title='View Report' class='calBtn' href='javascript:__doPostBack(&#39;ctl00$MainContent$grdTradeMarkNumber$ctl02$ctl00&#39;,&#39;&#39;)'>38255</a>                                        </td>
            <td ><span id='MainContent_grdTradeMarkNumber_lblFilingDate_0'>09-12-1962</span>                                        </td>
            <td >IMIDAN</td>
            <td >5</td>
            <td >158</td>
            <td >test</td>
            <td >test</td>
            <td >Registered</td>
            <td >DELWARE</td>
            <td ></td>
        </tr>
        <tr >
            <td ><a title='View Report' class='calBtn' href='javascript:__doPostBack(&#39;ctl00$MainContent$grdTradeMarkNumber$ctl03$ctl00&#39;,&#39;&#39;)'>188389</a>                                        </td>
            <td ><span id='MainContent_grdTradeMarkNumber_lblFilingDate_1'>09-09-2003</span>                                        </td>
            <td >RAND</td>
            <td >16</td>
            <td >682</td>
            <td >Ttest </td>
            <td >test </td>
            <td >Advertised</td>
            <td >CALIFORNIA</td>
            <td ></td>
        </tr>
        <tr class='lightGrayBg' >
            <td ><a title='View Report' class='calBtn' href='javascript:__doPostBack(&#39;ctl00$MainContent$grdTradeMarkNumber$ctl04$ctl00&#39;,&#39;&#39;)'>207063</a>                                        </td>
            <td ><span id='MainContent_grdTradeMarkNumber_lblFilingDate_2'>11-03-2005</span>                                        </td>
            <td >FP DIESEL</td>
            <td >7</td>
            <td >690</td>
            <td >testtest</td>
            <td >testtest</td>
            <td >Advertised</td>
            <td >-</td>
            <td ></td>
        </tr>

    </table>
</div>
</body>
</html>

I want to get all rows separately in a list. I am using the Split method to do this:

List<string> rows = gridHTML.Split(new string[] { "<tr" }, StringSplitOptions.None).ToList();

But the problem is that when I look into the list, "<tr" has been removed from every item.

Is there any (other) way to get all rows in a list?

  • Why do you parse ASP.NET controls as string at all? What is your desired result? Commented Nov 25, 2015 at 9:17
  • I can't explain it here :( Commented Nov 25, 2015 at 9:21
  • When I use List<string> rows = gridHTML.Split(new string[] { "<tr" }, StringSplitOptions.None).ToList(); the "<tr" is removed from every item in the list. Commented Nov 25, 2015 at 9:25

2 Answers


For this one, you could easily use LINQ to XML, i.e.:

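// Parse the markup as XML and collect every <tr> element
var rows = XElement.Parse(gridHTML).Descendants("tr");
// Direct <td> children of those rows
var cells = rows.Elements("td");
// Text content of each cell
var cellContentsAsString = cells.Select(c => (string)c);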

etc.
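
Since the question asks for the rows as a list of strings, here is a minimal sketch of how the same LINQ to XML idea could be wrapped up. It assumes gridHTML is well-formed XML (the sample in the question appears to be); XElement.Parse throws an XmlException otherwise. RowExtractor, GetRowMarkup and GetCellText are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, not part of the original answer.
public static class RowExtractor
{
    // Returns the markup of every <tr> element as its own string.
    // Assumes the input is well-formed XML.
    public static List<string> GetRowMarkup(string html)
    {
        return XElement.Parse(html)
            .Descendants("tr")
            .Select(tr => tr.ToString(SaveOptions.DisableFormatting))
            .ToList();
    }

    // Returns the trimmed text of each <td>, grouped per row.
    // Rows without <td> children (such as the header row) are skipped.
    public static List<List<string>> GetCellText(string html)
    {
        return XElement.Parse(html)
            .Descendants("tr")
            .Where(tr => tr.Elements("td").Any())
            .Select(tr => tr.Elements("td")
                .Select(td => td.Value.Trim())
                .ToList())
            .ToList();
    }
}
```

Calling RowExtractor.GetRowMarkup(gridHTML) should give one string per row, each still starting with "<tr", which is what the Split approach loses. Note that the serializer writes attribute values with double quotes, so the strings are equivalent to the original markup but not necessarily character-for-character identical.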


2 Comments

LINQ to XML has some issues with HTML entities: blogs.msdn.com/b/bethmassi/archive/2008/04/25/…
I know. That is why I specifically said "For this one". Otherwise, I could use SgmlReader to sanitize the HTML and still use LINQ to XML. I often find myself choosing between LINQ to XML and HtmlAgilityPack, and I find LINQ easier, maybe because it is easier to add the required namespaces and poke around.

You should not use string methods (or regex) to parse HTML. I recommend HtmlAgilityPack:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(gridHTML); 
List<HtmlNode> trList = doc.DocumentNode.Descendants("tr").ToList();

Since it seems that you want to load this table data into a collection, the following approach may suit your requirement better. It loads the rows and cells into a DataTable, and the DataColumns are initialized with the table-header values:

DataTable table = new DataTable();
bool firstRowContainsHeader = true;
var tableRows =  doc.DocumentNode.Descendants("tr");
var tableData = tableRows.Skip(firstRowContainsHeader ? 1 : 0)
    .Select(row => row.Descendants("td")
        .Select((cell, index) => new { row, cell, index, cell.InnerText })
        .ToList());

var headerCells = tableRows.First().Descendants()
    .Where(n => n.Name == "td" || n.Name == "th");
int columnIndex = 0;
foreach (HtmlNode cell in headerCells)
{ 
    string colName = firstRowContainsHeader 
        ? cell.InnerText 
        : String.Format("Column {0}", (++columnIndex).ToString());
    table.Columns.Add(colName, typeof(string));
}
foreach (var rowCells in tableData)
{
    DataRow row = table.Rows.Add();
    for (int i = 0; i < Math.Min(rowCells.Count, table.Columns.Count); i++)
    {
        row.SetField(i, rowCells[i].InnerText);
    }
}

5 Comments

I have a string variable, not a text file!
What is the header for HtmlAgilityPack?
Error: HtmlAgilityPack and HtmlNode could not be found.
@zulqarnainjalil: I've added the link to my answer. You have to download the DLL and add the reference to your project.
@zulqarnainjalil: I've added another approach that fills a DataTable with all the data from your table; even the column names are filled in correctly.
