Trying to GET google with socket....problem

Hey You · Apr 7, 2007

Well I don't know why the socket can't connect to Google. Here is my
source code:

require 'socket'
h = TCPSocket.new('www.google.ca',80)
h.print "GET /index.html HTTP/1.0\n\n"
a = h.read
puts a

I tried changing the HTTP to 1.1 but it still doesn't work.

Michael Gorsuch · Apr 7, 2007

I just ran this code in irb, and it worked without issue.

Can you provide the specific exception or unexpected results?

Michael Gorsuch · Apr 7, 2007

Also, can you provide the platform that you are using? I was using OS X.

Ryan Davis · Apr 7, 2007

Well I don't know why the socket can't connect to Google. Here is my
source code:

require 'socket'
h = TCPSocket.new('www.google.ca',80)
h.print "GET /index.html HTTP/1.0\n\n"
a = h.read
puts a

If you just want to get google (or whatever), use:

ruby -ropen-uri -e 'puts URI.parse("http://www.google.com/index.html").read'

If you want to know the inner-workings of HTTP clients and servers,
use the above and trace it backwards. There is a lot of good code in
there.

Hey You · Apr 7, 2007

Michael said:
I just ran this code in irb, and it worked without issue.

Can you provide the specific exception or unexpected results?

Well I just ran the code and got this:

HTTP/1.0 302 Found

Location: http://www.google.ca/index.html

Cache-Control: private

Set-Cookie:
PREF=ID=e20f9edec5958042:TM=1175979001:LM=1175979001:S=shwmC1m6Amdg20nV;
expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com

Content-Type: text/html

Server: GWS/2.1

Content-Length: 228

Date: Sat, 07 Apr 2007 20:50:01 GMT

Connection: Keep-Alive

<HTML><HEAD><meta http-equiv="content-type"
content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.ca/index.html">here</A>.

</BODY></HTML>

Also I would like to stick to using sockets instead of other HTTP
clients.

Hey You · Apr 7, 2007

Michael said:
Also, can you provide the platform that you are using? I was using OS X.

Well I don't know what you meant right there but I'm using Windows XP.

Michael Gorsuch · Apr 7, 2007

OK, so you are getting a response back from the server.

I have no idea why you're getting a redirect from them, but you are getting a proper response over your socket.
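
To see what the server actually said, the raw text read from the socket can be taken apart. This is only a minimal sketch: parse_response is a hypothetical helper (not part of Ruby's socket library), and the sample string is a shortened copy of the 302 response pasted above.

```ruby
# Split a raw HTTP response into status code, headers, and body.
# parse_response is a hypothetical helper written for this sketch.
def parse_response(raw)
  # The header block ends at the first blank line; accept CRLF or bare LF.
  head, body = raw.split(/\r?\n\r?\n/, 2)
  lines = head.to_s.split(/\r?\n/)
  # The status line looks like "HTTP/1.0 302 Found".
  version, code, reason = lines.shift.to_s.split(' ', 3)
  headers = {}
  lines.each do |line|
    name, value = line.split(':', 2)
    next if value.nil? # skip anything that is not "Name: value"
    headers[name.strip.downcase] = value.strip
  end
  { version: version, code: code.to_i, reason: reason,
    headers: headers, body: body.to_s }
end

sample = "HTTP/1.0 302 Found\r\n" \
         "Location: http://www.google.ca/index.html\r\n" \
         "Content-Type: text/html\r\n" \
         "\r\n" \
         "<HTML>...</HTML>"

res = parse_response(sample)
puts res[:code]                # the numeric status code
puts res[:headers]['location'] # where the server is redirecting to
```

With the original script, the string returned by h.read could be passed to the same helper.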

Hey You · Apr 7, 2007

Michael said:
OK, so you are getting a response back from the server.

I have no idea why you're getting a redirect from them, but you are
getting a proper response over your socket.

Well thank you for the answer. The thing is that it's weird that even
when I put the host as google.ca it still redirects me to google.ca.
Well thank you to everyone that has helped me and I appreciate it but I
am wondering something else now: Why, when I put HTTP/1.1, does the
program load but just stay blank, not doing anything?

Philipp Taprogge · Apr 7, 2007

Hi!

The answers to both of your questions are simple...

Thus spake Hey You on 04/07/2007 11:51 PM:

Well thank you for the answer. The thing is that it's weird that even
when I put the host as google.ca it still redirects me to google.ca.

That's because google redirects you to your localized version of
google and you did not specify the hostname in your get. You open a
socket to www.google.ca, but you only tell it to deliver some
"index.html". If that machine hosted multiple domains (which in fact
it does), it would not know whether to send you
www.google.ca/index.html or perhaps www.google.de/index.html.
So it informs you that it has an "/index.html" for you which it
figures might best suit your needs and that this page can be found
by issuing the following HTTP command:

GET www.google.ca/index.html HTTP/1.0\n\n

Well thank you to everyone that has helped me and I appreciate it but I
am wondering something else now: Why when I put HTTP/1.1 the program
loads but it just stays blank, not doing anything.

The answer to that question is even simpler:
In HTTP/1.0, you open a socket, issue a request, get a response and
close the socket again for each and every single item you need. You
open a socket for the html-page itself, another one to request an
image specified in that page and so on. So after each request, the
socket is closed by the server.

When you specify HTTP/1.1, you have another option: persistent
connections (which also make pipelining possible). When you request a
resource via HTTP/1.1, a compliant server MAY keep the socket open for
you after its response so that you might specify another request
without having to open a whole new socket. If the server does this, it
is the client's responsibility to close the socket when it does not
require any more data. That is why your h.read, which waits for the
socket to be closed, appears to hang.
Try it: open up a telnet connection to www.google.ca and issue your
request as HTTP/1.0. The socket will close immediately after the
response from the server.
Now do the same thing again but specify HTTP/1.1. This time the
socket stays open and you can issue another request (or the same
request again to keep things simple).
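
One way around the blocking read on a kept-open connection is to stop reading once the announced body length has arrived. The following is a rough sketch only: read_one_response is a hypothetical helper, it assumes the server sends a Content-Length header, and it ignores chunked transfer encoding. A StringIO stands in for the socket so the example runs without a network.

```ruby
require 'stringio'

# Read exactly one HTTP response from an IO without waiting for end-of-file.
# read_one_response is a hypothetical helper; it assumes a Content-Length
# header is present and does not handle chunked transfer encoding.
def read_one_response(io)
  status = io.gets.to_s.chomp # chomp strips a trailing "\r\n" or "\n"
  headers = {}
  while (line = io.gets)
    line = line.chomp
    break if line.empty?      # a blank line ends the header block
    name, value = line.split(':', 2)
    headers[name.strip.downcase] = value.to_s.strip
  end
  length = headers.fetch('content-length', '0').to_i
  body = length > 0 ? io.read(length) : ''
  { status: status, headers: headers, body: body }
end

# Two responses back to back, as they could appear on a kept-open socket.
stream = StringIO.new(
  "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello" \
  "HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nbye"
)
first  = read_one_response(stream)
second = read_one_response(stream)
puts first[:body]
puts second[:body]
```

The other common approach is to add a "Connection: close" header to the request, which asks the server to close the socket after its response so that a plain h.read returns.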

For further information I suggest you read rfc1945 and rfc2616
respectively.

HTH, HAND,

Phil

Hey You · Apr 7, 2007

Philipp said:
Hi!

The answers to both of your questions are simple...

Thus spake Hey You on 04/07/2007 11:51 PM:

That's because google redirects you to your localized version of
google and you did not specify the hostname in your get. You open a
socket to www.google.ca, but you only tell it to deliver some
"index.html". If that machine hosted multiple domains (which in fact
it does), it would not know whether to send you
www.google.ca/index.html or perhaps www.google.de/index.html.
So it informs you that it has an "/index.html" for you which it
figures might best suit your needs and that this page can be found
by issuing the following HTTP command:

GET www.google.ca/index.html HTTP/1.0\n\n

The answer to that question is even simpler:
In HTTP/1.0, you open a socket, issue a request, get a response and
close the socket again for each and every single item you need. You
open a socket for the html-page itself, another one to request an
image specified in that page and so on. So after each request, the
socket is closed by the server.

When you specify HTTP/1.1, you have another option: persistent
connections (which also make pipelining possible). When you request a
resource via HTTP/1.1, a compliant server MAY keep the socket open for
you after its response so that you might specify another request
without having to open a whole new socket. If the server does this, it
is the client's responsibility to close the socket when it does not
require any more data.
Try it: open up a telnet connection to www.google.ca and issue your
request as HTTP/1.0. The socket will close immediately after the
response from the server.
Now do the same thing again but specify HTTP/1.1. This time the
socket stays open and you can issue another request (or the same
request again to keep things simple).

For further information I suggest you read rfc1945 and rfc2616
respectively.

HTH, HAND,

Phil

Thank you a lot Phil! I have learned a lot from you like how to POST
data (Yup, I learned) and much more and I am very grateful for all the
help you have given me. It makes sense why it didn't connect to
google.ca and I learned how to fix it right after my last post but I had
to go offline. I have also read RFC2616 but only bits and pieces of what
I have read are stuck in my head so I will keep re-reading it to learn
more. I will also read RFC1945 and I'm sorry for my newbish posts. It's
not that I'm lazy because I really am a hard worker but it's just that I
needed someone to point me in the right direction and that is what you
did.

Brian Candler · Apr 8, 2007

Well I don't know why the socket can't connect to Google. Here is my
source code:

require 'socket'
h = TCPSocket.new('www.google.ca',80)
h.print "GET /index.html HTTP/1.0\n\n"
a = h.read
puts a

I tried changing the HTTP to 1.1 but it still doesn't work.

Two problems:
(1) Line terminator for HTTP is \r\n not \n
(2) You have not supplied a Host: header

h.print "GET /index.html HTTP/1.0\r\nHost: www.google.ca\r\n\r\n"

I say again: you must read and understand RFC 2616.

This documents HTTP/1.1, which has gained a lot of features. You could try
reading the earlier RFCs for HTTP/1.0 or HTTP/0.9 for a simplified protocol.
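
The two fixes above can be wrapped in a small function so the line endings are never typed by hand. This is just a sketch: build_get and its parameters are hypothetical names, not part of any library.

```ruby
# Build a minimal GET request with CRLF line endings and a Host header.
# build_get is a hypothetical helper name used only for this sketch.
def build_get(host, path = '/', version = '1.0', extra = {})
  lines = ["GET #{path} HTTP/#{version}", "Host: #{host}"]
  extra.each { |name, value| lines << "#{name}: #{value}" }
  # Every line ends in CRLF; one more CRLF marks the end of the headers.
  lines.join("\r\n") + "\r\n\r\n"
end

request = build_get('www.google.ca', '/index.html')
p request
```

With the socket from the original script this would be sent as h.print build_get('www.google.ca', '/index.html').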

B.

Hey You · Apr 8, 2007

Brian said:
Two problems:
(1) Line terminator for HTTP is \r\n not \n
(2) You have not supplied a Host: header

h.print "GET /index.html HTTP/1.0\r\nHost: www.google.ca\r\n\r\n"

I say again: you must read and understand RFC 2616.

This documents HTTP/1.1, which has gained a lot of features. You could try
reading the earlier RFCs for HTTP/1.0 or HTTP/0.9 for a simplified protocol.

B.

Have you read what I last posted? Or did you just ignore it and give me
the answer to an already answered question? Yes, I have read RFC 2616 more
than once and I do understand a lot of it, but not all of it stays in my
head after the few times I have read the document. I don't know, but I
have read in a lot of places that for a line terminator you can also use
"\n\n" and it seems to work fine. Also putting the Host header or adding
the full domain to the code such as "GET www.google.ca/index.html" both
specify which host we want so I don't see why I should change them.

Karl-Heinz Wild · Apr 8, 2007

Two problems:
(1) Line terminator for HTTP is \r\n not \n
(2) You have not supplied a Host: header

This means to send something like

GET /index.html HTTP/1.1\r\n
Host: www.google.ca\r\n
\r\n

Regards
Karl-Heinz

Brian Candler · Apr 8, 2007

Have you read what I last posted? Or did you just ignore it and give me
the answer to an already answered question? Yes, I have read RFC 2616 more
than once and I do understand a lot of it, but not all of it stays in my
head after the few times I have read the document. I don't know, but I
have read in a lot of places that for a line terminator you can also use
"\n\n" and it seems to work fine.

Read RFC 2616 section 2.2:

" HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body (see appendix 19.3 for
tolerant applications)."

and appendix 19.3 says:

" The line terminator for message-header fields is the sequence CRLF.
However, we recommend that applications, when parsing such headers,
recognize a single LF as a line terminator and ignore the leading CR."

So the upshot is: you're sending a malformed request, but some servers may
honour it.
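
What a "tolerant application" in the sense of appendix 19.3 does can be sketched in a few lines. This is an illustration only: tolerant_lines is a hypothetical helper, not code from any real server.

```ruby
# Split a header block into lines, treating either CRLF or a bare LF as
# the line terminator. tolerant_lines is a hypothetical helper.
def tolerant_lines(block)
  # each_line splits after every LF; the sub drops the LF and a CR before it.
  block.each_line.map { |line| line.sub(/\r?\n\z/, '') }
end

strict  = "GET /index.html HTTP/1.0\r\nHost: www.google.ca\r\n\r\n"
lenient = "GET /index.html HTTP/1.0\nHost: www.google.ca\n\n"

# A tolerant parser sees the same lines in both requests.
p tolerant_lines(strict) == tolerant_lines(lenient)
```

A strict parser that only splits on CRLF would see the second request as one long line, which is why sending bare LF works with some servers and not with others.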

Also putting the Host header or adding the full
domain to the code such as "GET www.google.ca/index.html" both specify
which host we want so I don't see why I should change them.

No, "GET www.google.ca/index.html" is a completely malformed request and
will be rejected. In any case this is different to the GET request you
actually sent, quoted at the very top of this posting.

The hostname is *never* supplied as part of the GET line.

Of course you supplied it to Ruby's TCPSocket.new method, but at that point
the hostname is converted to an IP address before the connection is opened.
The name is not passed to the far end and therefore you must provide a Host:
header.

I'm sorry, but I'm dropping out of this conversation now. Your response was
arrogant. If you know nothing about HTTP, then I suggest you don't go around
telling people who know something about HTTP that they are wrong.

Regards,

Brian.

Xavier Noria · Apr 8, 2007

Have you read what I last posted? Or did you just ignore it and give me
the answer to an already answered question? Yes, I have read RFC 2616 more
than once and I do understand a lot of it, but not all of it stays in my
head after the few times I have read the document. I don't know, but I
have read in a lot of places that for a line terminator you can also use
"\n\n" and it seems to work fine.

Perhaps you read that in a CGI context?

"The server MUST translate the header data from the CGI header field
syntax to the HTTP header field syntax if these differ. For example,
the character sequence for newline (such as Unix's ASCII NL) used by
CGI scripts may not be the same as that used by HTTP (ASCII CR
followed by LF)."

That's what allows CGIs to output things like

print "Content-Type: text/plain\n\n"

and forget about CRLFs.
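
The translation step described in that quote can be pictured roughly as follows. This is a simplified sketch, not how any particular web server implements it: cgi_to_http is a hypothetical name, and real servers also handle status lines and header validation.

```ruby
# Rewrite the header part of CGI output from bare LF to CRLF and leave the
# body untouched. cgi_to_http is a hypothetical name for this sketch.
def cgi_to_http(output)
  # Headers and body are separated by the first blank line.
  head, body = output.split(/\r?\n\r?\n/, 2)
  lines = head.to_s.split(/\r?\n/)
  lines.join("\r\n") + "\r\n\r\n" + body.to_s
end

p cgi_to_http("Content-Type: text/plain\n\nhello\n")
```

A raw socket client talks to the server directly, with no such layer in between, so it has to send the CRLFs itself.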

-- fxn

Zephyr Pellerin · Apr 8, 2007

Brian said:
Two problems:
(1) Line terminator for HTTP is \r\n not \n
(2) You have not supplied a Host: header

h.print "GET /index.html HTTP/1.0\r\nHost: www.google.ca\r\n\r\n"

I say again: you must read and understand RFC 2616.

This documents HTTP/1.1, which has gained a lot of features. You could try
reading the earlier RFCs for HTTP/1.0 or HTTP/0.9 for a simplified protocol.

B.

That would be the issue.

Gary Wright · Apr 8, 2007

Also putting the Host header or adding the full
domain to the code such as "GET www.google.ca/index.html" both specify
which host we want so I don't see why I should change them.

The URI provided in the GET request can be an absolute URI only if
the request is going to a proxy server. In *that* case the GET would
look like:

GET http://www.google.ca/index.html HTTP/1.0

Otherwise the URI must be an absolute path (i.e., a path starting
with '/').
In that case the GET would look like:

GET /index.html

The problem with only having the path is that a web server that is
hosting several websites can't determine from the GET request which site
the request pertains to. The incoming TCP connections only have a
destination IP address, not a destination domain name. The solution to
this problem is the "Host:" header. By looking at the "Host:" header, the
web server can multiplex several websites at the same IP address. Without
the Host: header you would have to have a separate IP address for every
website.

So your request should be sent as:

GET /index.html HTTP/1.0
Host: www.google.ca
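
How a server multiplexes sites on the Host: header can be sketched from the server's point of view. Everything here is hypothetical and for illustration only: site_for, the sites table, and the site labels are made up, and a real server does considerably more.

```ruby
# Pick a site based on the Host header of a raw request.
# site_for and the sites table are hypothetical names for this sketch.
def site_for(raw_request, sites)
  host_line = raw_request.each_line.find { |line| line =~ /\AHost:/i }
  return nil if host_line.nil? # no Host header: the server cannot tell
  host = host_line.split(':', 2)[1].strip.downcase
  host = host.sub(/:\d+\z/, '') # drop an optional ":port" suffix
  sites[host]
end

sites = {
  'www.google.ca' => 'Canadian site',
  'www.google.de' => 'German site'
}

with_host    = "GET /index.html HTTP/1.0\r\nHost: www.google.ca\r\n\r\n"
without_host = "GET /index.html HTTP/1.0\r\n\r\n"

p site_for(with_host, sites)
p site_for(without_host, sites)
```

Both requests arrive at the same IP address and ask for the same path; only the first one gives the server enough information to choose a site.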
