Handy HTTP requests with Curb and Ruby
June 13th, 2010 in Web Development, Code
While working on a project, I needed a multi-purpose HTTP request class that can send requests through different network interfaces or IP addresses and retry when the connection is slow or the server is not responding. Here is a small wrapper built on top of the Ruby Curb gem, implemented as a module.
require 'curb'

module ApiRequest
  USER_AGENTS = [
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3',
    'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.3) Gecko/20100423 Ubuntu/10.04 (lucid) Firefox/3.6.3',
    'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.70 Safari/533.4',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.2) Gecko/20100323 Namoroka/3.6.2',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100401 Ubuntu/9.10 (karmic) Firefox/3.5.9'
  ]

  CONNECTION_TIMEOUT = 10 # seconds

  # IP addresses or interface names to send requests from (empty = system default)
  @@interfaces = []

  # get a random user-agent string
  def random_agent
    # rand(n) returns 0...n, so every entry can be picked
    USER_AGENTS[rand(USER_AGENTS.size)]
  end

  # get a random IP/network interface from @@interfaces (nil if none are set)
  def random_interface
    size = @@interfaces.size
    size > 0 ? @@interfaces[rand(size)] : nil
  end

  # perform a single request; assign_to pins it to a specific network interface/IP
  def perform(url, assign_to=nil)
    interface = assign_to.nil? ? random_interface : assign_to
    req = Curl::Easy.new(url)
    req.timeout = CONNECTION_TIMEOUT
    req.interface = interface unless interface.nil?
    req.headers['User-Agent'] = random_agent
    begin
      req.perform
      if req.response_code == 200
        req.downloaded_bytes > 0 ? req.body_str : nil
      else
        nil
      end
    rescue Curl::Err::CurlError
      # timeouts, DNS failures, refused connections etc. count as a failed attempt
      nil
    end
  end

  # retry the request up to `attempts` times, return the first non-nil body
  def fetch(url, attempts=3)
    result = nil
    1.upto(attempts) do
      result = perform(url)
      break unless result.nil?
    end
    result
  end
end
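A short note on the random pick, since it is easy to get wrong: rand(n) with a positive integer returns a value from 0 up to but not including n. Indexing with rand(size) (or calling Array#sample on Ruby 1.9+) can therefore reach every element, while rand(size - 1) never reaches the last one. Here is a minimal sketch of the difference. The interface names and method names are made up for illustration and are not part of the module.

```ruby
# Hypothetical interface names, for illustration only.
ITEMS = %w[eth0 eth1 eth2].freeze

# rand(items.size - 1) yields an index in 0...(size - 1),
# so the last item can never be selected.
def pick_off_by_one(items)
  items[rand(items.size - 1)]
end

# Array#sample picks uniformly from all items (nil for an empty array),
# equivalent to items[rand(items.size)] for a non-empty array.
def pick_uniform(items)
  items.sample
end

# Draw many times and record which items ever show up.
seen_off_by_one = Array.new(1_000) { pick_off_by_one(ITEMS) }.uniq.sort
seen_uniform    = Array.new(1_000) { pick_uniform(ITEMS) }.uniq.sort

puts "rand(size - 1) reached: #{seen_off_by_one.join(', ')}"
puts "sample reached:         #{seen_uniform.join(', ')}"
```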
And sample usage:
class TestRequest
  include ApiRequest

  def foo
    body = self.fetch('http://google.com')
  end
end
If the module variable "@@interfaces" holds an array of IP addresses or network interface names, one of them (selected at random) is used for each request. The "fetch" function takes an "attempts" parameter, which is 3 by default: the request is tried up to that many times until a body is downloaded from the URL, and nil is returned if every attempt fails. The "perform" function has an "assign_to" parameter (not used by "fetch") that binds the request to a specific interface. This is useful when some of your workers are bound to one exact interface while others use random IPs. The ApiRequest module also has a list of user agents, one of which is picked at random for each request.
Pastie: http://pastie.org/private/j19j3hbebte9bjqaydslmg
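Since "fetch" does not pass "assign_to" along, a worker that must stay on one interface needs its own retry loop around "perform". Below is a rough sketch of how that could look. It is not part of the module: FlakyRequest and fetch_via are hypothetical names, the address is a placeholder, and perform is a stand-in that fails a fixed number of times instead of calling Curb, so the retry behaviour can be tried without a network.

```ruby
# Sketch only: FlakyRequest and fetch_via are hypothetical names.
class FlakyRequest
  attr_reader :calls

  def initialize(failures_before_success)
    @failures_left = failures_before_success
    @calls = []
  end

  # Stand-in for ApiRequest#perform: records the interface it was given and
  # returns nil until the configured number of failures has been used up.
  def perform(url, assign_to = nil)
    @calls << assign_to
    if @failures_left > 0
      @failures_left -= 1
      return nil
    end
    "body of #{url}"
  end

  # Same retry idea as fetch, but the interface is passed through to perform,
  # so every attempt is bound to the same interface.
  def fetch_via(url, interface, attempts = 3)
    attempts.times do
      result = perform(url, interface)
      return result unless result.nil?
    end
    nil
  end
end

# Fails twice, succeeds on the third attempt (placeholder address).
worker = FlakyRequest.new(2)
body = worker.fetch_via('http://example.com', '192.168.0.10')

# Fails more often than the attempt limit allows, so the caller gets nil.
gave_up = FlakyRequest.new(5).fetch_via('http://example.com', 'eth0')
```

With the real module, the same fetch_via method could sit next to "fetch" and call the Curb-backed "perform" unchanged.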
Dan Sr. Software Engineer
Comments (1)
Thanks. Just what I was looking for.
by Alexandre de Oliveira - 07/31/2010 @ 08:28pm