Browser versus Java URLConnection

little_mm

Hi All

Perhaps someone knows the answer to this problem. I open a connection
to a URL and read lines one at a time from the URL using an
InputStreamReader and a BufferedReader:

// Open connection to URL (openConnection() already returns a URLConnection,
// so no cast is needed)
URLConnection conn = pageURL.openConnection();
conn.setReadTimeout(timeout);
conn.setConnectTimeout(timeout);
conn.setUseCaches(false);

// No charset is passed to InputStreamReader here, so the platform
// default encoding is used to decode the bytes.
StringBuilder pageBuffer = new StringBuilder();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
        // readLine() strips the line terminator, so put one back
        pageBuffer.append(line).append('\n');
    }
}
return pageBuffer.toString();


However, the text I get back from the URL differs from the text saved
out of a browser from the same URL. In particular, the browser saves
£ characters, whereas the lines read in Java are missing these
characters altogether. Some other characters have also been dropped
from the Java lines. I have tried passing different character encodings
as the second argument of the InputStreamReader, but this has virtually
no effect, except for UTF-16, which returns a large number of "?"
characters in the stream. The Content-Type header of the page says it
is ISO-8859-1, but passing this encoding name to the InputStreamReader
changes nothing in the Java code: the £ symbol is still missing.
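
As a minimal sketch of the step described above, the charset can be taken from the Content-Type header rather than typed in by hand. The class name ContentTypeCharset and the method charsetOf are hypothetical names made up for this illustration, and the fallback charset is an assumption the caller has to choose.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback when the header is
    // null, has no charset parameter, or names a charset this JVM does not know.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for a "charset=" prefix (8 characters).
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim();
                // The value may be quoted, e.g. charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset cs = charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8);
        // The single byte 0xA3 is the pound sign in ISO-8859-1.
        String decoded = new String(new byte[] {(byte) 0xA3, '5'}, cs);
        System.out.println(cs + " decodes 0xA3 to U+"
                + Integer.toHexString(decoded.charAt(0)).toUpperCase());
    }
}
```

With a live connection, the result could be passed straight to the reader, for example new InputStreamReader(conn.getInputStream(), charsetOf(conn.getContentType(), StandardCharsets.ISO_8859_1)). This only helps if the header is accurate; it does not explain characters that the server never sends.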

In the browser, if I change the character encoding to "UTF-8" then the
£ symbol is still properly displayed. In other words, it looks like I
am receiving different data from the server depending upon whether I
use the browser or the code. I'm not sure whether it has anything to
do with the encoding; I'm just guessing.
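
One way to check whether the server really sends different data is to look at the raw bytes before any decoding. The sketch below is a rough heuristic, not a full parser: it counts three common ways a pound sign can appear in a response body. PoundProbe and count are hypothetical names made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class PoundProbe {

    // Counts how the pound sign appears in raw, undecoded bytes.
    // Result: [0] UTF-8 sequence 0xC2 0xA3,
    //         [1] any other 0xA3 byte (the pound sign in ISO-8859-1),
    //         [2] the ASCII text "&pound;" (an HTML entity).
    // Heuristic only: a 0xA3 byte that belongs to some other multi-byte
    // UTF-8 sequence is also counted in [1].
    public static int[] count(byte[] raw) {
        byte[] entity = "&pound;".getBytes(StandardCharsets.US_ASCII);
        int utf8 = 0;
        int latin1 = 0;
        int ent = 0;
        for (int i = 0; i < raw.length; i++) {
            int b = raw[i] & 0xFF;
            if (b == 0xC2 && i + 1 < raw.length && (raw[i + 1] & 0xFF) == 0xA3) {
                utf8++;
                i++; // skip the second byte of the UTF-8 sequence
            } else if (b == 0xA3) {
                latin1++;
            } else if (startsWith(raw, i, entity)) {
                ent++;
            }
        }
        return new int[] {utf8, latin1, ent};
    }

    // True if data contains prefix starting at the given offset.
    private static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (offset + prefix.length > data.length) {
            return false;
        }
        for (int j = 0; j < prefix.length; j++) {
            if (data[offset + j] != prefix[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "Price: &pound;10 or \u00A320".getBytes(StandardCharsets.ISO_8859_1);
        int[] c = count(sample);
        System.out.println("utf8=" + c[0] + " latin1=" + c[1] + " entity=" + c[2]);
    }
}
```

On Java 9 or later the raw bytes could be obtained with conn.getInputStream().readAllBytes() and compared against the bytes of the file saved by the browser. If all three counts are zero for the Java response but not for the browser's file, the two responses really do differ.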

Thanks,
Nubs.
 
Andrew Thompson

Perhaps someone knows the answer to this problem. I open a connection
to a URL ...

What URL (specifically)?
...However, the actual text I get back from the URL is different from that
saved out of a browser ...

What browser (make, version, OS - specifically)?

Is the saved text identical to the text shown when
you 'view source' in that browser?

Andrew T.
 
Chris Uppal

Perhaps someone knows the answer to this problem. I open a connection
to a URL and read lines one at a time from the URL using an
InputStreamReader and a BufferedReader: [...]
However, the actual text I get back from the URL is different from that
saved out of a browser from the same URL. Particularly, the browser
saves £ characters, whereas the lines read in Java are missing
these characters altogether. Also, some of the characters have actually
been deleted in the Java lines.

Maybe the website is using something like the Accept-Language: field in the
request to decide what currency (etc) to send back. I don't know what the Java
HTTP client will send in that field by default, but it is unlikely to be
'en-GB' which is what my browser would send.

I just tried it myself, but -- most unfortunately -- the site has just stopped
responding. I /do/ hope my little experiment didn't kill it...

-- chris
 
little_mm

Hi Chris - thanks for the response. So, question: how do you mimic the
browser's HTTP requests precisely, so that a website generally behaves
in the same way? For example, how do you change the Accept-Language
field?

Thanks,
Nubs.
 
Tor Iver Wilhelmsen

Hi Chris - thanks for the response. So, question: how do you mimic the
browser's HTTP requests precisely, so that a website generally behaves
in the same way? For example, how do you change the Accept-Language
field?

Look at URLConnection.setRequestProperty().
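
A minimal sketch of that suggestion, assuming the headers are set before the connection is actually opened (setRequestProperty() throws IllegalStateException once connected). The class BrowserLikeRequest, the method open, and the header values are hypothetical choices for illustration, not a record of what any particular browser sends.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class BrowserLikeRequest {

    // Creates a URLConnection with browser-style request headers.
    // Nothing is sent over the network until the caller connects,
    // for example by calling getInputStream().
    public static URLConnection open(String url, String acceptLanguage, String userAgent) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            conn.setRequestProperty("Accept-Language", acceptLanguage);
            conn.setRequestProperty("User-Agent", userAgent);
            conn.setRequestProperty("Accept", "text/html,*/*;q=0.8");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection conn = open("http://example.com/",
                "en-GB,en;q=0.8",
                "Mozilla/5.0 (compatible; ExampleClient)");
        // Read back what will be sent, without touching the network.
        System.out.println("Accept-Language: " + conn.getRequestProperty("Accept-Language"));
        System.out.println("User-Agent: " + conn.getRequestProperty("User-Agent"));
    }
}
```

To mimic a browser more closely, the header values can be copied from a real request captured with the browser's developer tools or a proxy. Matching every header still does not guarantee identical responses, since a server may also vary its output on cookies or other state.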
 
Chris Uppal

Hi Chris - thanks for the response. So, question: how do you mimic the
browser's HTTP requests precisely, so that a website generally behaves
in the same way?

I see that Tor has already answered. I want to add that their server is back
up this morning, and I've just tried again (it stayed up this time!). The bad
news is that changing the Accept-Language field to, say, "da" made no
difference -- it still sent back a page where the price of the first boot was
&pound; <some jaw-droppingly large number>. So that was a red herring, I'm
afraid.

-- chris
 