Using Ruby to Scrape a Web Page
August 6, 2007 – 6:50 am
I recently needed to grab a web page and check if a certain string existed on that page. Here is how to achieve that task simply. You can easily take this example and incorporate it into your Ruby on Rails application.
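require 'open-uri'
string = "string to find"
url = "http://www.ThemBid.com/"
# Ruby 3.0+ needs URI.open; a bare open(url) no longer fetches URLs
result = URI.open(url)
text = result.read
# Regexp.escape makes characters such as "." or "?" match literally
text.scan(/#{Regexp.escape(string)}/)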
The last line returns an array containing every instance of the string you are looking for (irb prints it automatically; in a script, wrap the call in p to display it). You can store that array in a variable and call length on it to determine how many times the string was found, like so:
found = text.scan(/#{Regexp.escape(string)}/)
num_found = found.length
The text variable holds all of the HTML from the page you requested as an ordinary Ruby String, so you can call any of the String methods on it. Enjoy!
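As a rough sketch of what those String methods make possible, the snippet below runs against a hard-coded HTML sample instead of a live page, so it works without a network connection. The sample markup and the helper names page_contains? and count_matches are made up for illustration; they are not part of open-uri or of the original example.

```ruby
# Hypothetical sample standing in for the value returned by result.read
text = "<html><body><h1>Bids</h1><p>Place a bid. Bids close at 5 p.m.</p></body></html>"

# Hypothetical helper: true if the page contains the string, ignoring case
def page_contains?(text, string)
  text.downcase.include?(string.downcase)
end

# Hypothetical helper: number of exact, case-sensitive occurrences.
# Regexp.escape makes characters such as "." match literally.
def count_matches(text, string)
  text.scan(Regexp.new(Regexp.escape(string))).length
end

puts page_contains?(text, "place a bid")  # prints true
puts count_matches(text, "Bids")          # prints 2
puts count_matches(text, "p.m.")          # prints 1
```

If all you need is a yes-or-no answer, include? is probably enough; scan is worth the extra step only when you want the matches themselves or a count.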
5 Responses to “Using Ruby to Scrape a Web Page”
That works for simple scraping, but for more complicated scraping, we have scRUBYt! (http://scrubyt.org/), too.
By Shadowfiend on Aug 6, 2007
This is nice and easy. Perl generally uses LWP::UserAgent to do the same, but I do think that when you want to get into the text parsing a bit more, Perl is in a world of its own for simplicity.
By ed on Aug 6, 2007
Good solution for that simple case. For better HTML parsing, take a look at Mechanize.
By Dan Manges on Aug 6, 2007
Have you seen hpricot? http://code.whytheluckystiff.net/hpricot/
It calls itself “A Fast, Enjoyable HTML Parser for Ruby” and that’s true!
By Dave on Aug 7, 2007
Check out my article that discusses authentication strategies too.
http://cocoalocker.blogspot.com/2007/01/java-ruby-http-clients.html
By sundog on Aug 7, 2007