I´m with an odd issue here, I´ve been using Jsoup 1.7.2 for a while, with no issues, only now, when I try to retrieve the main headlines from this website: www.jornaldamarinha.pt, using this code:
// Connecting...
Document doc = Jsoup.connect("http://www.jornaldamarinha.pt")
.timeout(0)
.get();
// "*[class*=zincontent-wrap]" in "Jsoup idiom", means:
// Select all tags that contains classes with "zincontent-wrap" on its name.
Elements elems = doc.select("*[class*=zincontent-wrap]"); // Retrieves 0 results!
int t = elems.size();
Log.w("INFO", "Total Headlines: " + t);
// Loop trought all retrieved headlines:
for (Element e : elems) {
String headline = e.select("a").text().toString();
Log.w("HEADLINE", headline);
};
It fails!... Retrieves 0 results. (Should retrieve ~8)
The chances are that the issue is caused by:
- Aliens... (Similar to androids, but uglier...)
- Website encoding. (I tried to encode incoming HTML with ISO-8859-15, to handle portuguese special characters, but the issue remains)
- Mal-formatted incoming HTML. (I doubt this could be the issue, since the selector works fine on "Try jsoup online webpage", and Jsoup usually handles broken HTML very well)
- The use of the minus symbol in the class name ("-") is messing with Jsoup. (Seems, to me, to be the main (or at least, one) cause of the issue)
- Something else... (Very probably!)
BUT... at http://try.jsoup.org if I fetch the URL: http://www.jornaldamarinha.pt using this CSS Query:
*[class*=zincontent-wrap]
Everything works just great, there! (Retrieves all the ~8 correct results!)
SO... to resume, all I need is to do exactly what that webpage does, but using code.
THANKS, in advance, for any light or workaround, about this! :)