Handy HTTP requests with Curb and Ruby

While working on a project, I looked for a multi-purpose HTTP request class that could use different network interfaces or IP addresses and retry a request when the connection is slow or the server is not responding. Below is a small wrapper built on top of the Ruby Curb gem, implemented as a module.

require 'curb'

module ApiRequest
  USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3',
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.70 Safari/533.4',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.2) Gecko/20100323 Namoroka/3.6.2',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100401 Ubuntu/9.10 (karmic) Firefox/3.5.9'
  ]

  CONNECTION_TIMEOUT = 10

  @@interfaces = []

  # Pick a random User-Agent string.
  # rand(n) returns an integer in 0...n, so every entry can be chosen.
  def random_agent
    USER_AGENTS[rand(USER_AGENTS.size)]
  end

  # Pick a random IP address or network interface from @@interfaces,
  # or nil when none are configured.
  def random_interface
    size = @@interfaces.size
    size > 0 ? @@interfaces[rand(size)] : nil
  end

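  # Perform a single GET request. Pass assign_to to bind the request to a
  # specific network interface or IP address; otherwise a random one is used.
  # Returns the response body, or nil on any failure.
  def perform(url, assign_to=nil)
    puts url
    interface = assign_to.nil? ? random_interface : assign_to
    req = Curl::Easy.new(url)
    req.timeout = CONNECTION_TIMEOUT
    req.interface = interface unless interface.nil?
    req.headers['User-Agent'] = random_agent
    begin
      req.perform
      # Only a 200 response with a non-empty body counts as success.
      if req.response_code == 200 && req.downloaded_bytes > 0
        req.body_str
      else
        nil
      end
    rescue StandardError
      # Timeouts and connection errors become nil so that fetch can retry.
      nil
    end
  end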

  # Call perform up to `attempts` times, stopping at the first non-nil result.
  def fetch(url, attempts=3)
    result = nil
    attempts.times do
      result = perform(url)
      break unless result.nil?
    end
    result
  end
end

And sample usage:

class TestRequest
  include ApiRequest

  def foo
    fetch('http://google.com')
  end
end

If the module variable “@@interfaces” holds an array of IP addresses or network interface names, one of them is picked at random for each request. If it is empty, no interface is set on the request.

The “fetch” function takes an “attempts” parameter, which is 3 by default. The request is retried up to that many times until a body is downloaded from the URL. If every attempt fails, “fetch” returns nil.

The “perform” function takes an “assign_to” parameter (not used by “fetch”) that binds the request to a specific interface. This is useful when some workers are bound to an exact interface while others use random IPs.

The ApiRequest module also has a list of user agents and picks one at random for each request.

Pastie: http://pastie.org/private/j19j3hbebte9bjqaydslmg
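
Since “fetch” never passes “assign_to”, a worker that must stay on one interface has to call “perform” itself. Here is a rough sketch of one way to do that with retries. The names PinnedFetch, fetch_via and StubWorker are made up for this example and are not part of the module above, and “perform” is stubbed out (it fails twice, then succeeds) so the snippet runs without curb or network access.

```ruby
# Hypothetical helper, not part of the original ApiRequest module.
module PinnedFetch
  # Retry perform on one fixed interface, up to `attempts` times.
  # Returns the first non-nil result, or nil if every attempt fails.
  def fetch_via(url, interface, attempts = 3)
    attempts.times do
      result = perform(url, interface)
      return result unless result.nil?
    end
    nil
  end
end

# Stand-in for a class that would include ApiRequest in real use.
# perform is a stub with the same signature: it records the interface
# it was given, returns nil on the first two calls and a body on the third.
class StubWorker
  include PinnedFetch

  attr_reader :calls

  def initialize
    @calls = []
  end

  def perform(url, assign_to = nil)
    @calls << assign_to
    @calls.size < 3 ? nil : "body of #{url} via #{assign_to}"
  end
end

worker = StubWorker.new
puts worker.fetch_via('http://example.com', 'eth1')
puts worker.calls.inspect
```

In a real worker the class would include both ApiRequest and the helper, and drop the stubbed “perform”, so that every attempt goes out through the interface passed to fetch_via.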
