Tag: nokogiri

Using Nokogiri's CSS method to get all elements within an alt tag

I am trying to use Nokogiri’s CSS method to get some names from my HTML. This is an example of the HTML: <section class="container partner-customer padding-bottom--60"> <div> <div> <a id="technologies"></a> <h4 class="center-align">The Team</h4> </div> </div> <div class="consultant list-across wrap"> <div class="engineering"> <img class="" src="https://v0001.jpg" alt="Person 1"/> <p>Person 1<br>Founder, Chairman &amp; CTO</p> </div> <div class="engineering"> <img […]

How to access multiple <p> tags one at a time

I have the following HTML: <div id="test_id"> <p>Some words.</p> <p>Some more words.</p> <p>Even more words.</p> </div> If I parse the HTML using: doc = Nokogiri::HTML(open("http://my_url")) and run doc.css('#test_id').text in the console I get: => "Some words.\nSome more words.\nEven more words" How do I get the first <p> element only? I think I figured it out […]

How to scrape data using Ruby which is generated by a JavaScript function?

I am trying to scrape the data URL link from the latest date (first row of the table) from this page. But it seems like the content of the table is generated by a JavaScript function. I tried using Nokogiri to get it, but in vain, as Nokogiri cannot scrape JavaScript-generated content. Then, I tried […]

“syntax error, unexpected tIDENTIFIER, expecting $end”

I put together this script based on this tutorial. require 'nokogiri' require 'open-uri' url = "http://sfbay.craigslist.org/sby/jjj/" data = Nokogiri::HTML(open(url)) puts data.at_css('.itempn').text puts data.at_css('.itemcg').text I keep getting this error: Macintosh:nokogiri rgrush$ ruby aaa.rb aaa.rb:1: syntax error, unexpected tIDENTIFIER, expecting $end url = "http://sf… ^ Any ideas? Could it be that one of my dependencies is out […]

Segmentation fault when I run rails S (can't compile nokogiri)

I have been in configuration hell for two days and I have tried just about everything on Stack Overflow to fix it. I feel like some of the stuff I have tried may have made things worse. I was using RVM, then I tried using rbenv, and now I am back to using RVM again. […]

Getting, visiting and limiting the number of links using Nokogiri and Mechanize?

I am trying to scrape the five latest stories from CNN.com and retrieve their links along with the first paragraph of each story. I have this simple script: url = "http://edition.cnn.com/?refresh=1" agent = Mechanize.new agent.get("http://edition.cnn.com/?refresh=1").search("//div[@id='cnn_maintt2bul']/div/div/ul/li[count(*)=3]/a").each do |headline| article = headline.text link = URI.join(url, headline[:href]).to_s page = headline.click(link) paragraph1 = page.at_css(".adtag15090+ p").text puts "#{article}" puts "#{link}" […]

How do I parse this data structure returned by Nokogiri in Ruby?

So I am cycling through an array element and this is the result returned: [nil, [#<Nokogiri::XML::Element:0x835386d4 name="a" attributes=[#<Nokogiri::XML::Attr:0x835385f8 name="href" value="http://bham.craigslist.org/web/2961573018.html">] children=[#<Nokogiri::XML::Text:0x835381c0 "Web Designer Full time">]> What I would like to do is access href value, and then the text value. How do I do that? I tried this: puts i[:href] But that generates this error: […]

How can I get the first element's text using Nokogiri?

I am trying to get the text for Last sold date from this HTML: <td class="browse-cell-date"> <span title="Last sold date"> May 2002 </span> <button class="btn btn-previous-sales js-btn-previous-sales"> Previous sales (1) <i class="icon icon-down-open-1"/> </button> <div class="previous-sales-panel is-hidden"> <span style="display: block;"> Aug 1997 <span class="fright">£60,000</span> </span> </div> </td> I tried: date = val.search(".//td[@class='browse-cell-date']").children[1] It gave me […]

How do I loop through items in XML in Nokogiri in Ruby?

Given I have XML from this address, how would I loop through the search/events/event events? What I mean is loop through the event items and print their details to the screen? So far I have the following code: get_xml('/events/search', :location => 'London, United Kingdom', :date => 'Today').at('events') get_xml being the method which takes the XML […]

FF Xpather to Nokogiri — Can I just copy and paste?

I was doing this manually and then I got stuck and I can’t figure out why it’s not working. I downloaded xpather and it is giving me: /html/body/center/table/tbody/tr[3]/td/table as the path to the item I want. I have manually confirmed that this is correct but when I paste it into my code, all it does […]
