I use C# and need to parse an HTML element and read its attributes into key/value pairs. For example, given the following HTML snippet:

<DIV myAttribute style="BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none" id=my_ID anotherAttribNamedDIV class="someclass">

Please note that the attributes can take three forms:
1. key="value" pairs, e.g. class="someclass"
2. key=value pairs, e.g. id=my_ID (no quotes around the value)
3. plain attributes, e.g. myAttribute, which have no value

I need to store them in a dictionary as key/value pairs, as follows:
key=myAttribute value=""
key=style value="BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"
key=id value="my_ID"
key=anotherAttribNamedDIV value=""
key=class value="someclass"

I am looking for regular expressions to do this.

  • You can't parse [X]HTML with regex. stackoverflow.com/questions/1732348/… Commented Apr 11, 2011 at 14:50
  • Don't use capitals for your html tags. Commented Apr 11, 2011 at 17:26

2 Answers

You can do this with the HtmlAgilityPack:

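// Requires the HtmlAgilityPack NuGet package and: using HtmlAgilityPack;
string myDiv = @"<DIV myAttribute style=""BORDER-BOTTOM: medium none; BACKGROUND-COLOR: transparent; BORDER-TOP: medium none"" id=my_ID anotherAttribNamedDIV class=""someclass""></DIV>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(myDiv);
// HtmlAgilityPack exposes element names in lower case, so "div" matches <DIV>.
HtmlNode node = doc.DocumentNode.SelectSingleNode("div");

// Literal1 is an ASP.NET Literal control used here to display the result.
Literal1.Text = "";

foreach (HtmlAttribute attr in node.Attributes)
{
    // attr.Name is lower-cased by the parser; attr.Value holds the attribute's value.
    Literal1.Text += attr.Name + ": " + attr.Value + "<br />";
}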
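
Since the question asks for a dictionary rather than display text, the same loop could fill a Dictionary<string, string> instead. The following is only a sketch under a few assumptions: the class and method names (AttributeReader, ReadAttributes) are made up for illustration, the fragment is assumed to contain at least one element, and OriginalName is used instead of Name because Name is lower-cased while the question expects keys such as myAttribute. Attributes without a value are stored as an empty string.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class AttributeReader
{
    // Hypothetical helper: returns the attributes of the first element
    // found in the given HTML fragment as key/value pairs.
    public static Dictionary<string, string> ReadAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new Dictionary<string, string>();

        // "//*" selects the first element node in document order.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*");
        if (node == null)
        {
            return result; // no element found, return an empty dictionary
        }

        foreach (HtmlAttribute attr in node.Attributes)
        {
            // OriginalName keeps the casing used in the source markup.
            // If an attribute is repeated, the last occurrence wins.
            result[attr.OriginalName] = attr.Value ?? "";
        }

        return result;
    }
}
```

Calling AttributeReader.ReadAttributes(myDiv) with the string from the snippet above should then give one entry per attribute, which can be read with the usual dictionary indexer or TryGetValue.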

HtmlDocument docHtml = new HtmlWeb().Load(url);

1 Comment

Could you add some explanation (and formatting for the code)?
