
Using Ruby to Scrape a Web Page

August 6, 2007 – 6:50 am

I recently needed to grab a web page and check whether a certain string existed on that page. Here is a simple way to do it, which you can easily incorporate into a Ruby on Rails application.

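require 'open-uri'
string = "string to find"
url = "http://www.ThemBid.com/"
# On Ruby 3.0+ a bare open(url) no longer fetches URLs; use URI.open instead.
# The block form closes the connection once the body has been read.
text = URI.open(url) { |result| result.read }
# Regexp.escape keeps characters such as "." or "?" in the search string literal
text.scan(/#{Regexp.escape(string)}/)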

The last line returns an array containing every match of the string you are looking for (irb displays it; in a script, wrap it in p to print it). You can store that array in a variable and use its length method to count how many times the string was found, like so:

found = text.scan(/#{Regexp.escape(string)}/)
num_found = found.length
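Because scan returns an Array, the counting step can be pulled into a small method that works on any string, which makes it easy to try out without fetching a page. This is a minimal sketch; the method names count_occurrences and contains? are hypothetical, not from the original post. It passes the search string straight to String#scan, which treats a String argument literally rather than as a regular expression, and uses String#include? for the plain yes/no check the post set out to do.

```ruby
# Hypothetical helpers: count and detect a literal substring in fetched HTML.

# String#scan with a String argument matches it literally, so characters
# such as "." or "?" in the search text need no escaping.
def count_occurrences(text, needle)
  text.scan(needle).length
end

# For a simple yes/no check, String#include? avoids building an Array.
def contains?(text, needle)
  text.include?(needle)
end

html = "<p>string to find</p><p>another string to find?</p>"
puts count_occurrences(html, "string to find")  # → 2
puts contains?(html, "find?")                   # → true
puts contains?(html, "missing")                 # → false
```

Note that String#count is not a substitute here: it counts characters drawn from a set (for example, "hello".count("lo") is 3), not occurrences of a substring.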

The text variable holds all of the HTML from the page you fetched, so you can use any of Ruby's String methods on it. Enjoy!
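If you want to reuse this inside an application, it can help to wrap the fetch and the search in one method that copes with failed requests. The sketch below assumes Ruby 2.5 or later (for URI.open); the method names occurrences_on_page and count_ignoring_case are hypothetical, and the case-insensitive match and the choice to return nil on failure are assumptions of this sketch rather than anything from the original post.

```ruby
require 'open-uri'

# Hypothetical helper: fetch a page and report how often a string appears,
# ignoring case. Returns nil (and prints a warning) if the request fails.
def occurrences_on_page(url, needle)
  # URI.open fetches the URL; the block form closes the connection
  # once the body has been read.
  html = URI.open(url) { |io| io.read }
  count_ignoring_case(html, needle)
rescue OpenURI::HTTPError, SocketError => e
  warn "Could not fetch #{url}: #{e.message}"
  nil
end

# Regexp.escape makes the needle literal; the /i flag ignores case.
def count_ignoring_case(text, needle)
  text.scan(/#{Regexp.escape(needle)}/i).length
end
```

For example, occurrences_on_page("http://www.ThemBid.com/", "string to find") would return the number of matches, or nil if the page could not be fetched. Drop the i flag to get the exact, case-sensitive search used in the post.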

5 Responses to “Using Ruby to Scrape a Web Page”

  1. That works for simple scraping, but for more complicated scraping, we have scRUBYt! (http://scrubyt.org/), too.

    By Shadowfiend on Aug 6, 2007

  2. This is nice and easy. Perl generally uses LWP::UserAgent to do the same, but I do think that when you want to get into the text parsing a bit more, Perl is in a world of its own for simplicity.

    By ed on Aug 6, 2007

  3. Good solution for that simple case. For better HTML parsing, take a look at Mechanize.

    By Dan Manges on Aug 6, 2007

  4. Have you seen hpricot? http://code.whytheluckystiff.net/hpricot/
    It calls itself “A Fast, Enjoyable HTML Parser for Ruby” and that’s true!

    By Dave on Aug 7, 2007

  5. Check out my article, which discusses authentication strategies too.

    http://cocoalocker.blogspot.com/2007/01/java-ruby-http-clients.html

    By sundog on Aug 7, 2007
