Jsoup - CSS Query selector issue (?)

Question

I've run into an odd issue. I've been using Jsoup 1.7.2 for a while without problems, but now, when I try to retrieve the main headlines from www.jornaldamarinha.pt, this code finds nothing:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import android.util.Log;

// Connect and fetch the page (a timeout of 0 means "wait indefinitely").
Document doc = Jsoup.connect("http://www.jornaldamarinha.pt")
                    .timeout(0)
                    .get();

// In Jsoup's selector syntax, "*[class*=zincontent-wrap]" means:
// select every element whose class attribute contains the substring "zincontent-wrap".
Elements elems = doc.select("*[class*=zincontent-wrap]"); // Retrieves 0 results!

int t = elems.size();
Log.w("INFO", "Total Headlines: " + t);

// Loop through all retrieved headlines:
for (Element e : elems) {
   String headline = e.select("a").text(); // text() already returns a String
   Log.w("HEADLINE", headline);
}

It fails: it retrieves 0 results (it should retrieve about 8).

My guesses at the cause:

Aliens... (Similar to androids, but uglier...)
Website encoding. (I tried decoding the incoming HTML as ISO-8859-15 to handle Portuguese special characters, but the issue remains.)
Malformed incoming HTML. (I doubt this is the issue, since the selector works fine on the "Try jsoup" online page, and Jsoup usually handles broken HTML very well.)
The hyphen ("-") in the class name is confusing Jsoup. (This seems to me the main cause, or at least one of them.)
Something else... (Very probably!)

BUT... at http://try.jsoup.org, if I fetch the URL http://www.jornaldamarinha.pt and use this CSS query:

*[class*=zincontent-wrap]

everything works just fine there! (It retrieves all ~8 correct results.)

So, to sum up, all I need is to do exactly what that webpage does, but in code.

Thanks in advance for any light or workaround on this! :)

2 revs · Accepted Answer · 2013-06-12 20:19:34Z

3

SOLUTION! Everything in the code above was working correctly, as I suspected, with one exception: the CSS query finds nothing when the request goes out with Android's default user agent (presumably the site sends different markup to that client). I just figured out that setting "userAgent" on Jsoup's connection is VERY important! I changed my code as follows, and it now works like a charm, with exactly the same results as on the http://try.jsoup.org page:

Document doc = Jsoup.connect("http://www.jornaldamarinha.pt")
                    .userAgent("Mozilla/5.0 Gecko/20100101 Firefox/21.0")
                    .timeout(0)
                    .get();
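
A quick way to tell "the selector is wrong" apart from "the server sent different HTML" is to check the raw markup for the class name before blaming Jsoup. The sketch below is plain Java with no Jsoup dependency; the class name MarkupCheck, the method countClassMatches, and the sample markup are all hypothetical, and the regex is only a rough stand-in for what [class*=...] does (it only sees double-quoted class attributes), not Jsoup's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MarkupCheck {
    // Matches double-quoted class attributes. A rough approximation only;
    // this is NOT how Jsoup evaluates [class*=...].
    private static final Pattern CLASS_ATTR =
            Pattern.compile("\\bclass\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Counts class attributes in the raw HTML whose value contains the fragment. */
    public static int countClassMatches(String html, String fragment) {
        int count = 0;
        Matcher m = CLASS_ATTR.matcher(html);
        while (m.find()) {
            if (m.group(1).contains(fragment)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for two different server responses.
        String desktop = "<div class=\"zincontent-wrap first\"><a href=\"#\">A</a></div>"
                       + "<div class=\"box zincontent-wrap\"><a href=\"#\">B</a></div>";
        String mobile = "<ul class=\"mobile-list\"><li><a href=\"#\">A</a></li></ul>";

        System.out.println(countClassMatches(desktop, "zincontent-wrap")); // prints 2
        System.out.println(countClassMatches(mobile, "zincontent-wrap"));  // prints 0
    }
}
```

In the real app, the string to check would be what Jsoup actually received, for example doc.outerHtml(). If the count is 0 there, the class never arrived in the response, and the hyphen in the selector is not the culprit.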

Hope this helps anyone else too! :)

edited Jun 12, 2013 at 20:19

community wiki

2 revs
Jorge Rosa
