0

I´m with an odd issue here, I´ve been using Jsoup 1.7.2 for a while, with no issues, only now, when I try to retrieve the main headlines from this website: www.jornaldamarinha.pt, using this code:

// Connecting...
Document doc = Jsoup.connect("http://www.jornaldamarinha.pt")
                    .timeout(0)
                    .get();

// "*[class*=zincontent-wrap]" in "Jsoup idiom", means:  
// Select all tags that contains classes with "zincontent-wrap" on its name.
Elements elems = doc.select("*[class*=zincontent-wrap]"); // Retrieves 0 results!

int t = elems.size();
Log.w("INFO", "Total Headlines: " + t);

// Loop trought all retrieved headlines:
for (Element e : elems) {
   String headline = e.select("a").text().toString();
   Log.w("HEADLINE", headline);
};

It fails!... Retrieves 0 results. (Should retrieve ~8)

The chances are that the issue is caused by:

  1. Aliens... (Similar to androids, but uglier...)
  2. Website encoding. (I tried to encode incoming HTML with ISO-8859-15, to handle portuguese special characters, but the issue remains)
  3. Mal-formatted incoming HTML. (I doubt this could be the issue, since the selector works fine on "Try jsoup online webpage", and Jsoup usually handles broken HTML very well)
  4. The use of the minus symbol in the class name ("-") is messing with Jsoup. (Seems, to me, to be the main (or at least, one) cause of the issue)
  5. Something else... (Very probably!)

BUT... at http://try.jsoup.org if I fetch the URL: http://www.jornaldamarinha.pt using this CSS Query:

*[class*=zincontent-wrap]

Everything works just great, there! (Retrieves all the ~8 correct results!)


SO... to resume, all I need is to do exactly what that webpage does, but using code.

THANKS, in advance, for any light or workaround, about this! :)

0

1 Answer 1

3

SOLUTION!... After all, everything in the above code, was working correctly, as I suspected, except... That CSS Query breaks on Android´s "default user agent". I just figured that setting "userAgent" to Jsoup´s connection method is VERY important! So, I´ve edited my code on the following way, and... Works like a charm now !! (Exactly with same results, as in http://try.jsoup.org webpage)

Document doc = Jsoup.connect("http://www.jornaldamarinha.pt")
                    .userAgent("Mozilla/5.0 Gecko/20100101 Firefox/21.0")
                    .timeout(0)
                    .get();

Hope this helps anyone else too! :)

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.