Specifying a network interface with an HTTP GET request

Andrew Parlane · Aug 28, 2008

Hi all, I'm fairly new to Ruby but have learnt a lot in the last month
and am enjoying the diverse features it includes.

My problem is that we have a machine with multiple IP addresses (eth0,
eth0:0, eth0:1, ...) and I need to be able to select which IP it uses at
runtime.

The reason for this is to be able to parallelize web page downloads from
a site that allows a maximum of 1 hit per second per IP address.

The only solution I've thought of so far is to set up the server with
several proxies and send all requests through them.

Is there a more elegant and efficient solution in Ruby? Some way to
choose eth0 / eth0:0 for each request?

Thanks in advance for any replies.

Andy

John Pritchard-williams · Aug 28, 2008

Hi Andy,

If you are using 'Mongrel' as your webserver (possibly this also works
on WEBrick?) then you could at least set up your HTTP servers on a
per-NIC basis (I think..)

http://mongrel.rubyforge.org/web/mongrel/files/README.html

Has an example that appears to say 'listen on all NICs' (0.0.0.0).

So you could listen separately to each NIC with a different web-server
(and share a single web-app )...the trick is how to divert your
different users to each NIC though? I'm not sure how Ruby (or any
runtime) is going to be able to do this without some sort of
load-balancer sorting this out.

Maybe you could have one NIC dedicated as your 'incoming' request card,
but then send different URLs (which map to the different NICs) back
to your client... but that would probably cause chaos for cookies etc.

Dunno if that gives you any ideas or not...sorry if I'm way off track !

Cheers

John

Andrew Parlane · Aug 29, 2008

Hey John,

Thanks for your response, however I think you misunderstood my post. I'm
looking for a way to bind to a specific IP for outgoing requests. That
is, on the server I want to use something like the following pseudocode:

def getPage(whichIP)
  # pick the local (source) IP for this request
  ip = case whichIP
       when 0 then '123.456.789.001'
       when 1 then '123.456.789.002'
       when 2 then '123.456.789.003'
       else '123.456.789.004'
       end

  soc = bind(ip, 80)  # pseudocode: bind to specific ip and port 80

  soc.open(myUrl) do |sh|
    return sh.read
  end
end

Now, if the URL was for a page that had no content except for the IP
address of the requester, then the following code:

puts getPage(0);
puts getPage(1);
puts getPage(2);
puts getPage(3);

would output:
123.456.789.001
123.456.789.002
123.456.789.003
123.456.789.004

Hope that makes it clearer

Andy
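
For reference, Ruby's standard socket library can already express the
"bind" step in the pseudocode above: TCPSocket.new takes an optional
local address as its third argument and binds to it before connecting.
The following is only a minimal sketch under stated assumptions. The
helper name connect_from is made up for illustration, and the loopback
server stands in for the hypothetical "page that returns the requester's
IP". On a real machine you would pass one of the eth0 / eth0:0 addresses
as local_ip.

```ruby
require 'socket'

# Hypothetical helper (not from the thread): open a TCP connection to
# host:port whose source address is local_ip. The third argument of
# TCPSocket.new is the local address to bind before connecting.
def connect_from(local_ip, host, port)
  TCPSocket.new(host, port, local_ip)
end

# Loopback stand-in for "a page that returns the requester's IP".
server = TCPServer.new('127.0.0.1', 0)  # port 0: let the OS pick a free port
port = server.addr[1]

reporter = Thread.new do
  client = server.accept
  client.write(client.peeraddr[3])      # numeric address of the requester
  client.close
end

sock = connect_from('127.0.0.1', '127.0.0.1', port)
seen_ip = sock.read                     # read until the server closes
sock.close
reporter.join
server.close

puts seen_ip
```

Newer Ruby versions (2.0 and later) also expose a local_host attribute
on Net::HTTP for the same purpose, which avoids speaking HTTP by hand.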

Lex Williams · Aug 29, 2008

I'm no Ruby expert, but I don't think that can be done from Ruby.
Using a Linux binary, let's say wget, how would you access a web page
using a certain interface? If we could figure that out, we could
figure out how to automate the whole process.

Andrew Parlane · Aug 29, 2008

I'm about to dig into the Ruby source code and see how much socket
access is available. If we can use raw sockets then I should be able to
bind to the IP I want; otherwise I'm going to code a very simple proxy
program in C that does it for me.

As for wget, it seems you can use the --bind-address option to specify
which local IP to bind to. So that seems like a plausible option, using
IO.popen and so on.

cheers.

Andy
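
A rough sketch of the wget route mentioned above, assuming wget is
installed and on the PATH. The helper names wget_command and
fetch_via_wget are invented for this example, and the URL and address in
the usage comment are placeholders. The flags used (--quiet,
--output-document=- and --bind-address) are standard GNU wget options.

```ruby
# Build the wget argument list. Kept separate from the call itself so
# the command can be inspected or logged before it is run.
def wget_command(url, local_ip)
  ['wget', '--quiet', '--output-document=-',
   "--bind-address=#{local_ip}", url]
end

# Run wget bound to local_ip and return the page body as a string.
# Passing an array to IO.popen avoids going through a shell.
def fetch_via_wget(url, local_ip)
  IO.popen(wget_command(url, local_ip)) { |io| io.read }
end

# Example usage (placeholder values, requires network access):
#   body = fetch_via_wget('http://example.com/', '192.0.2.10')
```

Spawning one process per request is heavier than binding a socket inside
Ruby, but it needs no socket code at all.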

Antonin Amand · Aug 29, 2008

As you're on the client side, you do not bind, you connect.

More generally, you should not try to modify your socket at runtime
because that is not the way it is designed to be used.
I'm not talking about how Ruby implements it but about the underlying
system calls.

You'd do better to create a pool of sockets before your main loop and
iterate over the pool during your main loop.

Regards,

Antonin
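
One way to read the pool idea above is as a fixed list set up before the
main loop and cycled through round-robin. The sketch below rotates over
source addresses rather than open sockets, which is a simplification on
my part; the class name SourcePool and the 192.0.2.x addresses are
purely illustrative.

```ruby
# Hypothetical round-robin pool of local source addresses, created once
# before the main loop and consulted on every request.
class SourcePool
  def initialize(addresses)
    @addresses = addresses
    @index = 0
  end

  # Return the next address in round-robin order.
  def next_address
    address = @addresses[@index % @addresses.size]
    @index += 1
    address
  end
end

pool = SourcePool.new(%w[192.0.2.1 192.0.2.2 192.0.2.3 192.0.2.4])

# Main loop stand-in: five requests spread over four addresses.
picks = Array.new(5) { pool.next_address }
puts picks.join(' ')
```

With four addresses and a limit of one hit per second per address, each
address would still need its own one-second spacing between uses.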

John Pritchard-williams · Aug 29, 2008

Hi Andy,

Thanks for the clarification, but I still don't quite understand what
you are trying to do here.

You said originally:

//
The reason for this is to be able to parallelize web page downloads from
a site that has a max of 1 hit per second per IP address.
//

And then in a follow-up post: (Which clarified that you are writing a
HTTP client of some sort I think).

//
now if the url was for a page that had no content, except for the IP
address
//

So you seem to have control over both client and server here I think?

My networking isn't that great, and I'm a Ruby-Newbie...but I think from
the client side this is a non-issue: the client just connects to an IP
(or host) directly and a port: if you know the IPs ahead of time, just
connect to them.

So I think then you are designing something like:

- A webserver which has multiple NICs , and one (perhaps) 'master' NIC.
- This webserver will return as plain-text (or XML or some data-format)
a list of these IP addresses.
- The client is able to make an initial-request to the 'master' NIC/URL
to retrieve a list of other hosts (the fact they are in fact located on
the same machine is not really relevant to the client).
- Once the client has this list of IPs it can just connect to them and
download whatever it wants.

So, (I think): so long as your webserver is listening on all NICs (or
you have multiple dedicated webservers, one listening per NIC), you have
a fairly straightforward programming task:

- A servlet to generate the IP address file, which sits on the server.
- Some client code to retrieve and process that IP address file and set
off some threads in parallel.

Correct ?

I guess you might want to share some sort of context between the
different parallel clients: you could use a cookie and some sort of
shared background context between webservers: maybe a database, or
files in a shared directory?

In short: I *think* there is essentially no big deal about getting a TCP
client to talk to a specific NIC - you just address with the IP address
and everything is just taken care of at the network layer.

I think you are writing an ad-hoc load-balancer here by the sound of
it...

Cheers

John

Andrew Parlane · Aug 29, 2008

Antonin,

I know what you mean; I said it was pseudocode, and bind was the best I
could think of to describe what I meant. However, your socket on the
client side has to bind to an IP for it to work; if you look at raw
sockets, you are able to specify which IP address your connection goes
through.

John,

I'm trying to write an HTTP client to download pages off a server I
don't own. The client, however, can only connect to the server once a
second if it only uses one IP. The client is running on a machine that
has 4 IP addresses, defined as eth0, eth0:0, eth0:1, eth0:2, and I want
the client to send the request from a different one of those each time.
The example I gave about the server returning the IP address was just an
example to show what should be returned.

I think the answer is to use wget with the --bind-address option,
although I have yet to test this due to debugging other things atm.

If that doesn't work then running a proxy that I write in C, using
raw sockets, is the way forward.

Thanks for all your help

Andy

John Pritchard-williams · Aug 29, 2008

Hi Andy,

If you use IP addresses (rather than a host), and those IP addresses are
bound to single NICs, and you know these IP addresses, then I *think*
you don't have to do anything more complicated than literally specify
the IP addresses when you connect...

So, if you can do (as you say 'wget'):

telnet ip0 80
telnet ip1 80

And issue a :

GET HTTP://ip0/webapp HTTP/1.0

Then you are away.

One complication I can think of, which is sometimes enforced on
webservers, is that the 'GET' request has to correspond to the host you
have connected to.

I mean:

www.mywebsite.com, might be on IP1, IP2, so you could get to it (in a
browser) like:

http://www.mywebsite.com
-or-
http://x1.x1.x1.x1
-or-
http://x2.x2.x2.x2

Your browser will implicitly issue a 'GET' command based on the host you
provided in the URL. The server _may_ verify this is the same.

So, if you were to 'telnet' to http://www.mywebsite.com on port 80 but
then issue a GET like this:

GET http://x1.x1.x1.x1 HTTP/1.0

The server may reject you (since you connected on hostname, but tried to
GET a different 'host')

So your program may have to take this into account (i.e., you may need
to rewrite your URLs depending on what host/IP you are talking to).

I hope that makes sense , and I hope I'm not wildly inaccurate
there...(as I say, not a n/w expert...)

John
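
To make the host-matching point above concrete: in HTTP/1.1 the host
name the client asks for travels in a Host header, separately from the
IP address the TCP connection goes to. The sketch below only builds the
request text; the helper name build_get and the example host are
assumptions made for illustration, not anything from the posts.

```ruby
# Hypothetical helper: build a minimal HTTP/1.1 GET request. The Host
# header carries the name the server may check, independent of which
# IP address the socket is connected to.
def build_get(path, host)
  "GET #{path} HTTP/1.1\r\n" \
  "Host: #{host}\r\n" \
  "Connection: close\r\n" \
  "\r\n"
end

request = build_get('/webapp', 'www.mywebsite.com')
puts request
```

A client connecting by IP address can still send the site's host name
this way, so the URLs themselves need not be rewritten in that case.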

Andrew Parlane · Aug 29, 2008

It's not running as a web request; the code for this is executed via the
terminal, and as wget has nothing to do with Apache / Mongrel it won't
be able to object to which IP is used regardless.

Andy

John Pritchard-williams · Aug 29, 2008

Andrew said:
It's not running as a web request; the code for this is executed via the
terminal, and as wget has nothing to do with Apache / Mongrel it won't
be able to object to which IP is used regardless.

Andy

Sorry Andy, I'm lost as to what you are asking here; I thought you were
doing HTTP requests from your initial description. (Also, wget is a
command-line HTTP client...)

//The reason for this is to be able to parallelize web page downloads
from a site that has a max of 1 hit per second per IP address.//

No worries I think you have worked out what you need to do from your
earlier posts....

Cheers

John

Andrew Parlane · Aug 29, 2008

Hey John, yeah, I've worked out the solution now using wget.
