I've successfully used Ruby (1.8) and Nokogiri's CSS parsing to pull front-facing data out of web pages.

However, I now need to pull some data from a series of pages where the data is in the "meta" tags in the page source.

One of the lines I need is the following:

<meta name="geo.position" content="35.667459;139.706256" />

I've tried using XPath but haven't been able to get it right.

Any help with the syntax needed would be much appreciated.

Thanks

  • You say "I've tried using XPath but haven't been able to get it right." Show us what you have tried so that we can help you do it right. Commented Oct 27, 2010 at 4:58
  • Thx Andy - just various ways of saying '//meta[blah]' etc. I just couldn't get the syntax correct to pull it out. I really wanted to do it with the css selector and now I know how. Commented Oct 27, 2010 at 5:32
  • Use github.com/BorisBresciani/rails_parse_head Commented Dec 18, 2019 at 10:23

2 Answers

This is a good case for a CSS attribute selector. For example:

doc.css('meta[name="geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end

The equivalent XPath expression is almost identical:

doc.xpath('//meta[@name = "geo.position"]').each do |meta_tag|
  puts meta_tag['content'] # => 35.667459;139.706256
end
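
The content value is still a single "lat;long" string at this point. As a minimal sketch of splitting it into numbers, assuming the value always has the `latitude;longitude` shape shown in the question (the helper name `parse_geo_position` is made up for illustration and is not part of Nokogiri):

```ruby
# Hypothetical helper: turn a geo.position content value such as
# "35.667459;139.706256" into [latitude, longitude] as Floats.
# Returns nil when the value is missing or not in the expected shape.
def parse_geo_position(content)
  return nil if content.nil?

  parts = content.split(';').map { |part| part.strip }
  return nil unless parts.length == 2

  # Float() raises ArgumentError on non-numeric input, unlike String#to_f,
  # which would silently return 0.0.
  parts.map { |part| Float(part) }
rescue ArgumentError
  nil
end

lat, long = parse_geo_position('35.667459;139.706256')
puts "lat=#{lat} long=#{long}"
```

In practice the argument would be the `meta_tag['content']` value from either loop above.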

3 Comments

Wow, thanks. I had no idea you could use the CSS selector for meta tags too. If I wanted to get the lat/long from the JS, does the same apply?
  <script type="text/javascript">
  //<![CDATA[
  function onLoad() {
    if (GBrowserIsCompatible()) {
      var map = new GMap2(document.getElementById("map"));
      map.addControl(new GSmallMapControl());
      var point1 = new GLatLng(35.667459, 139.706256);
      map.setCenter(point1, 15, G_NORMAL_MAP);
      var marker = new GMarker(point1,{clickable:false});
      map.addOverlay(marker);
    }
  }
  //]]>
  </script>
No, Nokogiri doesn't do JavaScript. You could extract the JavaScript from the HTML using Nokogiri, then use a regex to get the lat/long, for instance: `doc.at('script').content[/GLatLng\(([^)]+)\)/,1] # => "35.667459, 139.706256"`
Aha, okay, awesome. Thanks very much for your help; that has really made things much clearer.
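
The regex idea from the comment above can be sketched in plain Ruby. This is only an illustration: it assumes the script text has already been pulled out of the page (for example with `doc.at('script').content`), that the coordinates appear as plain decimal literals inside a single `GLatLng(...)` call, and the helper name `extract_lat_long` is invented here.

```ruby
# Sample script text, copied from the comment above. In a real scraper this
# string would come from the parsed page rather than a heredoc.
script_text = <<-'JS'
function onLoad() {
  if (GBrowserIsCompatible()) {
    var map = new GMap2(document.getElementById("map"));
    var point1 = new GLatLng(35.667459, 139.706256);
    map.setCenter(point1, 15, G_NORMAL_MAP);
  }
}
JS

# Hypothetical helper: find the first GLatLng(lat, long) call and return
# [lat, long] as Floats, or nil when no such call is present.
def extract_lat_long(js)
  number = '-?\d+(?:\.\d+)?'
  pattern = /GLatLng\(\s*(#{number})\s*,\s*(#{number})\s*\)/
  match = js.match(pattern)
  return nil unless match

  [match[1].to_f, match[2].to_f]
end

lat, long = extract_lat_long(script_text)
puts "lat=#{lat} long=#{long}"
```

Capturing the two numbers separately avoids a second split on the comma, but it is stricter than the `[^)]+` pattern in the comment: anything other than two decimal literals (a variable name, say) will not match and the helper returns nil.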
1
require 'nokogiri'

doc = Nokogiri::HTML('<meta name="geo.position" content="35.667459;139.706256" />')
doc.at('//meta[@name="geo.position"]')['content'] # => "35.667459;139.706256"
