Parsing HTML data on the iPhone with XPath

Question

I have this web page: http://www.westminster.ac.uk/schools/computing/undergraduate. I'm using hpple to retrieve data (I've just started learning about it). Specifically, I want to retrieve the href values from the main page content. How can I do this?

I have this line, which retrieves all of the href links on the page:

NSArray *elements = [xpathParser search:@"//a"];

However, how can I retrieve just the ones in the main content, e.g. "BSc Honors Business Information Systems"? What's the syntax for it?

What is the main content? Can you provide a sample?

– Kirill Polishchuk, Commented Aug 17, 2011 at 14:34

Accepted Answer

It looks like all of the "main content" is found underneath elements with id attributes like "content_div_XXXX", where XXXX is some randomly generated sequence. You might be able to get at what you want using an XPath that looks something like:

//div[starts-with(@id,'content_div')]//a

You should be able to get something like this working, although you'd have to try it out and perhaps tweak it a bit to make it work precisely as you want. Refer to the W3Schools XPath page for a good set of XPath tutorials.
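To make concrete what `//div[starts-with(@id,'content_div')]//a` selects, here is a minimal sketch using only Python's standard library. It is not the hpple API: `xml.etree.ElementTree` has no `starts-with()` function, so the predicate is emulated with `str.startswith`, and the markup is a hypothetical, well-formed stand-in for the real page (ElementTree, unlike the libxml2 HTML parser that hpple wraps, rejects malformed HTML). The helper name `content_links` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: only the "content_div_XXXX" id pattern is
# taken from the answer; the links and the other ids are invented.
SAMPLE = """
<html><body>
  <div id="nav"><a href="/home">Home</a></div>
  <div id="content_div_8f3a">
    <p><a href="/courses/bsc-bis">BSc Honours Business Information Systems</a></p>
    <p><a href="/courses/bsc-cs">BSc Honours Computer Science</a></p>
  </div>
</body></html>
""".strip()


def content_links(root, prefix="content_div"):
    """Emulate //div[starts-with(@id, prefix)]//a, keeping document order."""
    seen = set()
    links = []
    for div in root.iter("div"):  # root.iter() walks the tree in document order
        if div.get("id", "").startswith(prefix):
            for a in div.iter("a"):
                # An XPath node-set holds each node once, so skip links already
                # collected through an enclosing matching div.
                if id(a) not in seen:
                    seen.add(id(a))
                    links.append(a)
    return links


root = ET.fromstring(SAMPLE)
hrefs = [a.get("href") for a in content_links(root)]
print(hrefs)  # ['/courses/bsc-bis', '/courses/bsc-cs']
```

The navigation link is excluded because its enclosing div's id does not start with "content_div". With hpple you would pass the same XPath string to the search call from the question in place of `//a`.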

edited Jan 20, 2016 at 15:16 by Developer

answered Aug 17, 2011 at 14:37 by Tim Dean


4 Comments

Spike Lee

That actually works pretty well, but I do have some questions. From what I gather from the tutorial, whatever is in '[]' is used to filter data. So in this case, we're looking for a div that has an id attribute containing the word 'content_div'?

Tim Dean

The above XPath selects all <a> elements that have a <div> ancestor (at any depth, not only a direct parent) with an "id" attribute that starts with the string "content_div". The bracket notation is how you implement conditional checks, and the '@' syntax is how you reference attributes. If you have additional questions, please update the post.
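Two details in that explanation are easy to miss: the predicate tests "starts with", not "contains", and `//a` after the div means any descendant, not just a direct child. The sketch below illustrates both with Python's standard library rather than hpple; the markup and ids are invented. `str.startswith` stands in for XPath's `starts-with()`, the `in` operator for XPath's `contains()`, and ElementTree's `findall("a")` / `findall(".//a")` behave like the XPath steps `div/a` / `div//a`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one link is a direct child of the content div, one is
# nested deeper, and a second div's id merely contains "content_div".
SAMPLE = """<html><body>
  <div id="content_div_8f3a">
    <a href="/direct">Direct child link</a>
    <p><span><a href="/nested">Deeply nested link</a></span></p>
  </div>
  <div id="main_content_div"><a href="/other">Other</a></div>
</body></html>"""

root = ET.fromstring(SAMPLE)

# Predicate: starts-with(@id, 'content_div') versus contains(@id, 'content_div')
starts = [d.get("id") for d in root.iter("div")
          if d.get("id", "").startswith("content_div")]
contains = [d.get("id") for d in root.iter("div")
            if "content_div" in d.get("id", "")]

# Axis: div/a (direct children only) versus div//a (any descendant)
div = root.find(".//div[@id='content_div_8f3a']")
children = [a.get("href") for a in div.findall("a")]
descendants = [a.get("href") for a in div.findall(".//a")]

print(starts)       # ['content_div_8f3a']
print(contains)     # ['content_div_8f3a', 'main_content_div']
print(children)     # ['/direct']
print(descendants)  # ['/direct', '/nested']
```

So an id such as "main_content_div" would match `contains(@id,'content_div')` but not the `starts-with` predicate used in the answer, and the nested link is only found because of the `//` step.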

Spike Lee

Thanks. Assuming that I want to retrieve all the text from this page (the page content, in text format), westminster.ac.uk/schools/computing/undergraduate/…, would the syntax be div[starts-with(@id,'content_div')], without the hyperlink part //a? Also, when you look at the structure of the web page (to see the elements, etc.), how do you do it? Do you just use Firefox/Mozilla etc. to look at the raw source code, or is there a way to see it in XML format? Thanks.

Tim Dean

If you want to select the text within the <a> element rather than the <a> element itself, use //div[starts-with(@id,'content_div')]//a/text(). To see the elements of the HTML for this web page, I typically just view source from my browser. However, if you want more robust ways to analyze the HTML, most browsers have built-in developer tools or plugins to help with that.
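One caveat about `text()`: it returns only the text nodes that are direct children of each `<a>`, so link text interrupted by inline markup comes back in pieces, and text inside nested elements is skipped. The following sketch (Python standard library only, not hpple, with hypothetical markup; `direct_text_nodes` is an illustrative helper name) emulates `//div[starts-with(@id,'content_div')]//a/text()` and contrasts it with each link's full string value, i.e. all descendant text concatenated.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the second link has an inline element inside it.
SAMPLE = """<html><body>
  <div id="content_div_8f3a">
    <a href="/courses/bsc-bis">BSc Honours Business Information Systems</a>
    <a href="/courses/bsc-cs">BSc <em>Honours</em> Computer Science</a>
  </div>
</body></html>"""


def direct_text_nodes(el):
    """Emulate el/text(): only text nodes that are direct children of el."""
    # ElementTree stores the text before the first child in el.text and the
    # text following each child in child.tail.
    nodes = [el.text] + [child.tail for child in el]
    return [t for t in nodes if t]  # drop the None placeholders


root = ET.fromstring(SAMPLE)
# Only one content div in this sample, so no de-duplication is needed here.
links = [a for d in root.iter("div") if d.get("id", "").startswith("content_div")
         for a in d.iter("a")]

text_nodes = [t for a in links for t in direct_text_nodes(a)]
full_text = ["".join(a.itertext()) for a in links]

print(text_nodes)  # ['BSc Honours Business Information Systems', 'BSc ', ' Computer Science']
print(full_text)   # ['BSc Honours Business Information Systems', 'BSc Honours Computer Science']
```

If the real links contain inline markup, reading each matched element's full text may be more useful than `text()`.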
