2

Im tring to get the infobox from wiki pages. For this I'm using wiki api. The following is the url from which I'm getting json data.

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=json&titles="+first+"&rvsection=0

Where first is a variable containing the article title for Wikipedia.

I'm finding it extremely complex to parse this data to make a meaningful html out of it.

I was usuing $.each function initially. But the loop is very deep that I had to use 6-7 times to get to the actual data that I want. I think there would be better alternative than this. Please help me.

json data for reference

jQuery16209061950308827726_1334683337112({"query":{"pages":{"11039790":{"pageid":11039790,"ns":0,"title":"Animal","revisions":[{"*":"{{Redirect|Animalia}}\n{{Other uses}}\n{{pp-semi-protected|small=yes}}\n{{pp-move-indef}}\n{{Taxobox\n| color = {{taxobox color|[[animalia]]}}\n| name = Animals\n| fossil_range = [[Ediacaran]] \u2013 Recent {{fossilrange|610|0|}}\n| image = Animal diversity.png\n| image_width = 250px\n| domain = [[Eukaryota]]\n{{Taxobox_norank_entry | taxon = [[Opisthokonta]]}}\n{{Taxobox_norank_entry | taxon = [[Holozoa]]}}\n{{Taxobox_norank_entry | taxon = [[Filozoa]]}}\n| regnum = '''Animalia'''\n| regnum_authority = [[Carolus Linnaeus|Linnaeus]], [[Systema Naturae|1758]]\n| subdivision_ranks = [[Phylum|Phyla]]\n| subdivision =\n* '''Subkingdom [[Parazoa]]'''\n** [[Sponge|Porifera]]\n** [[Placozoa]]\n* '''Subkingdom [[Eumetazoa]]'''\n** '''[[Radiata]] (unranked)'''\n*** [[Ctenophora]]\n*** [[Cnidaria]]\n** '''[[Bilateria]] (unranked)'''\n*** [[Orthonectida]]\n*** [[Rhombozoa]]\n*** [[Acoelomorpha]]\n*** [[Chaetognatha]]\n*** '''Superphylum [[Deuterostomia]]'''\n**** [[Chordata]]\n**** [[Hemichordata]]\n**** [[Echinoderm]]ata\n**** [[Xenoturbellida]]\n**** [[Vetulicolia]] [[extinction|\u2020]]\n*** '''[[Protostomia]] (unranked)'''\n**** '''Superphylum [[Ecdysozoa]]'''\n***** [[Kinorhyncha]]\n***** [[Loricifera]]\n***** [[Priapulida]]\n***** [[Nematoda]]\n***** [[Nematomorpha]]\n***** [[Lobopodia]]\n***** [[Onychophora]]\n***** [[Tardigrada]]\n***** [[Arthropoda]]\n**** '''Superphylum [[Platyzoa]]'''\n***** [[Platyhelminthes]]\n***** [[Gastrotricha]]\n***** [[Rotifera]]\n***** [[Acanthocephala]]\n***** [[Gnathostomulida]]\n***** [[Micrognathozoa]]\n***** [[Cycliophora]]\n**** '''Superphylum [[Lophotrochozoa]]'''\n***** [[Sipuncula]]\n***** [[Hyolitha]] [[extinction|\u2020]]\n***** [[Nemertea]]\n***** [[Phoronida]]\n***** [[Bryozoa]]\n***** [[Entoprocta]]\n***** [[Brachiopoda]]\n***** [[Mollusca]]\n***** [[Annelida]]\n***** [[Echiura]]\n}}\n\n'''Animals''' are a major group of multicellular, [[eukaryotic]] [[organism]]s of the [[Kingdom (biology)|kingdom]] '''Animalia''' or '''Metazoa'''. Their [[body plan]] eventually becomes fixed as they [[Developmental biology|develop]], although some undergo a process of [[metamorphosis]] later on in their life. Most animals are [[Motility|motile]], meaning they can move spontaneously and independently. All animals are also [[heterotroph]]s, meaning they must ingest other organisms or their products for [[sustenance]].\n\nMost known animal [[phylum|phyla]] appeared in the fossil record as marine species during the [[Cambrian explosion]], about 542 million years ago."}]}}}})
8
  • Have you tried $.parseJSON(jsonObject)? Commented Apr 17, 2012 at 17:42
  • @SpYk3HH -- He obviously already has it parsed as an object/array if he's using it in $.each(), And, since this is a cross-domain request and jsonp datatype, it gets parsed automatically and $.parseJSON is not needed. Commented Apr 17, 2012 at 17:47
  • @Napster -- What part of the result do you want to loop through? What does first equal? Commented Apr 17, 2012 at 17:48
  • first is just a variable here. Commented Apr 17, 2012 at 17:53
  • How do you wish to display the data? As i see it, there's only one array in that result for you to loop through. Commented Apr 17, 2012 at 17:53

1 Answer 1

5

If you want the actual html as it is displayed in the wikipage, use action=parse instead. And yes, the result objects are deeply nested. But no reason to loop over them!

  • the first property is always the action, here: query
  • you have requested properties of pages, so you will receive pages
  • which are keyed by their page id. This is the only step to use a loop
  • Each page object has certain properties (like a title), you're interested in the revisions
  • this is an array of revision objects, you need the only and first
  • the sourcetext property of a revision object is the *

So, just do it:

if (data && data.query && data.query.pages)
    var pages = data.query.pages;
else
    // error: No pages returned / other problems!
for (var id in pages) { // in your case a loop over one property
    if (pages[id].revisions && pages[id].revisions[0] && pages[id].revisions[0]["*"])
        var content = pages[id].revisions[0]["*"];
    else
        // error: No revision content returned for whatever reasons!
}
// use "content" variable here

Dont forget to check for the existance of each object! If you requested no pages, there will be no pages object; this is only the case when the pages "array" is empty. A page may be missing/invalid title or something else, so that is has no revisions. etc.

Sign up to request clarification or add additional context in comments.

3 Comments

this is the first time im dealing with json data.Please can u explain in detail this line var pages = data && data.query && data.query.pages;
Thats a short form to extract properties of maybe-not-there objects. It may be better and cleaner to use the extended one (updated answer)...
thank you very much. can u help me with this also please stackoverflow.com/questions/10207480/wikimedia-api-getting-relavant-data-from-json-string

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.