Browser versus Java URLConnection

Discussion in 'Java' started by little_mm@ntlworld.com, Oct 4, 2006.

  1. Guest

    Hi All

    Perhaps someone knows the answer to this problem. I open a connection
    to a URL and read lines one at a time from the URL using an
    InputStreamReader and a BufferedReader:

    // Open connection to URL
    URLConnection conn = (URLConnection) pageURL.openConnection();
    conn.setReadTimeout(timeout);
    conn.setConnectTimeout(timeout);
    conn.setUseCaches(false);
    InputStream pageStream = conn.getInputStream();
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(pageStream));

    String line;
    StringBuffer pageBuffer = new StringBuffer();
    while ((line = reader.readLine()) != null)
    {
        System.out.println(line);
        pageBuffer.append(line);
    }
    return pageBuffer.toString();


    However, the actual text I get back from the URL is different from that
    saved out of a browser from the same URL. Particularly, the browser
    saves £ characters, whereas the lines read in Java are missing
    these characters altogether. Also, some of the characters have actually
    been deleted in the Java lines. I have tried using different character
    encodings as the second argument of the InputStreamReader, but this has
    virtually no effect, except for UTF-16, which returns a large number
    of "?" characters in the stream. The Content-Type header of the page
    says it is ISO-8859-1, but passing this character encoding to the
    InputStreamReader changes nothing in the Java code: the £ symbol is
    still missing.

    In the browser, if I change the character encoding to "UTF-8" then the
    £ symbol is still properly displayed in the browser. In other words,
    it looks like I am receiving different data from the server depending
    upon whether I use the browser or the code. I'm not sure if it has
    anything to do with the encoding, but I'm just guessing.
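
To separate "what the server sent" from "how it was decoded", it can help to read the raw bytes first and only then decode them with the charset named in the Content-Type header. The following is a minimal sketch, not a diagnosis of this particular site: the class name `CharsetAwareFetch` and the helpers `charsetOf` and `readBytes` are hypothetical, and falling back to ISO-8859-1 is an assumption based on the header value reported above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetAwareFetch {

    // Hypothetical helper: extract the charset parameter from a Content-Type
    // value such as "text/html; charset=ISO-8859-1"; otherwise use the fallback.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // unknown or malformed charset name
                }
            }
        }
        return fallback;
    }

    // Read the raw bytes first, so they can be inspected before any decoding.
    static byte[] readBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the page URL (an assumption of this sketch).
        URLConnection conn = URI.create(args[0]).toURL().openConnection();
        conn.setUseCaches(false);
        byte[] raw;
        try (InputStream in = conn.getInputStream()) {
            raw = readBytes(in);
        }
        Charset cs = charsetOf(conn.getContentType(), StandardCharsets.ISO_8859_1);
        System.out.println("Decoding " + raw.length + " bytes as " + cs);
        System.out.println(new String(raw, cs));
    }
}
```

Unlike a readLine() loop that appends each line without its terminator, this keeps the line breaks. Note also that System.out uses the console's own encoding, so a £ that was decoded correctly can still be printed as something else.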

    Thanks,
    Nubs.
     
    , Oct 4, 2006
    #1

  2. wrote:
    ....
    > Perhaps someone knows the answer to this problem. I open a connection
    > to a URL ...


    What URL (specifically)?

    > ...However, the actual text I get back from the URL is different from that
    > saved out of a browser ...


    What browser (make, version, OS - specifically)?

    Is the saved text identical to the text shown when
    you 'view source' in the 'a browser'?

    Andrew T.
     
    Andrew Thompson, Oct 4, 2006
    #2

  3. Guest

    Thanks for the response Andrew.

    URL: http://www.net-a-porter.com/Shop/Shop/Shoes/All?pageNumber=0

    Browser: Mozilla Firefox, but same effect in IE6, OS: Windows XP.

    Yes, I think view source and save page are identical, although I
    haven't checked byte-for-byte.
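
A byte-for-byte check is straightforward to script. The sketch below is illustrative only: the class `ByteDiff` and its helpers are hypothetical names, and the two file paths are assumed to be the browser-saved page and a dump of the bytes fetched from Java.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ByteDiff {

    // Index of the first differing byte, or -1 if the arrays are identical.
    // If one array is a prefix of the other, the shorter length is returned.
    static int firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) return i;
        }
        return a.length == b.length ? -1 : n;
    }

    // Hex dump of up to 'count' bytes starting at 'from', separated by spaces.
    static String hex(byte[] data, int from, int count) {
        StringBuilder sb = new StringBuilder();
        int end = Math.min(data.length, from + count);
        for (int i = Math.max(0, from); i < end; i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: page saved from the browser; args[1]: bytes fetched in Java.
        byte[] saved = Files.readAllBytes(Paths.get(args[0]));
        byte[] fetched = Files.readAllBytes(Paths.get(args[1]));
        int at = firstDifference(saved, fetched);
        if (at < 0) {
            System.out.println("Files are identical");
        } else {
            System.out.println("First difference at byte offset " + at);
            System.out.println("saved:   " + hex(saved, at, 16));
            System.out.println("fetched: " + hex(fetched, at, 16));
        }
    }
}
```

In ISO-8859-1 the £ character is the single byte a3, while in UTF-8 it is the pair c2 a3, so the hex dump around the first difference shows directly which form each file contains.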

    Nubs.

    Andrew Thompson wrote:

    > wrote:
    > ...
    > > Perhaps someone knows the answer to this problem. I open a connection
    > > to a URL ...

    >
    > What URL (specifically)?
    >
    > > ...However, the actual text I get back from the URL is different from that
    > > saved out of a browser ...

    >
    > What browser (make, version, OS - specifically)?
    >
    > Is the saved text identical to the text shown when
    > you 'view source' in the 'a browser'?
    >
    > Andrew T.
     
    , Oct 4, 2006
    #3
  4. Chris Uppal Guest

    wrote:

    > Perhaps someone knows the answer to this problem. I open a connection
    > to a URL and read lines one at a time from the URL using an
    > InputStreamReader and a BufferedReader:

    [...]
    > However, the actual text I get back from the URL is different from that
    > saved out of a browser from the same URL. Particularly, the browser
    > saves £ characters, whereas the lines read in Java are missing
    > these characters altogether. Also, some of the characters have actually
    > been deleted in the Java lines.


    Maybe the website is using something like the Accept-Language: field in the
    request to decide what currency (etc) to send back. I don't know what the Java
    HTTP client will send in that field by default, but it is unlikely to be
    'en-GB' which is what my browser would send.

    I just tried it myself, but -- most unfortunately -- the site has just stopped
    responding. I /do/ hope my little experiment didn't kill it...

    -- chris
     
    Chris Uppal, Oct 4, 2006
    #4
  5. Guest

    Chris Uppal wrote:

    > > Perhaps someone knows the answer to this problem. I open a connection
    > > to a URL and read lines one at a time from the URL using an
    > > InputStreamReader and a BufferedReader:

    > [...]
    > > However, the actual text I get back from the URL is different from that
    > > saved out of a browser from the same URL. Particularly, the browser
    > > saves £ characters, whereas the lines read in Java are missing
    > > these characters altogether. Also, some of the characters have actually
    > > been deleted in the Java lines.

    >
    > Maybe the website is using something like the Accept-Language: field in the
    > request to decide what currency (etc) to send back. I don't know what the Java
    > HTTP client will send in that field by default, but it is unlikely to be
    > 'en-GB' which is what my browser would send.
    >
    > I just tried it myself, but -- most unfortunately -- the site has just stopped
    > responding. I /do/ hope my little experiment didn't kill it...
    >
    > -- chris


    Hi Chris - thanks for the response. So, question: how do you mimic the
    browser's HTTP requests precisely, so that a website generally behaves
    in the same way? For example, how do you change the Accept-Language
    field?

    Thanks,
    Nubs.
     
    , Oct 4, 2006
    #5
  6. writes:

    > Hi Chris - thanks for the response. So, question: how do you mimic the
    > browser's HTTP requests precisely, so that a website generally behaves
    > in the same way? For example, how do you change the Accept-Language
    > field?


    Look at URLConnection.setRequestProperty().
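
As a rough illustration of that call, request headers can be set on the connection before it is opened. This is a sketch under assumptions: the class `BrowserLikeRequest`, the helper names, and the header values (including the placeholder User-Agent string) are illustrative, not what any particular browser sends.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

public class BrowserLikeRequest {

    // Illustrative header values; a real browser's values will differ.
    static Map<String, String> browserLikeHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept-Language", "en-GB,en;q=0.8");
        headers.put("User-Agent", "Mozilla/5.0 (illustrative placeholder)");
        headers.put("Accept", "text/html,*/*;q=0.8");
        return headers;
    }

    // Creates the connection object and sets the headers. No network traffic
    // happens here; the request is sent on connect() or getInputStream().
    static URLConnection open(String url, Map<String, String> headers) {
        try {
            URLConnection conn = URI.create(url).toURL().openConnection();
            for (Map.Entry<String, String> e : headers.entrySet()) {
                conn.setRequestProperty(e.getKey(), e.getValue());
            }
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // args[0] is the page URL (an assumption of this sketch).
        URLConnection conn = open(args[0], browserLikeHeaders());
        System.out.println(conn.getRequestProperties());
    }
}
```

setRequestProperty() must be called before the connection is established; calling it afterwards throws IllegalStateException.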
     
    Tor Iver Wilhelmsen, Oct 4, 2006
    #6
  7. Guest

    Tor Iver Wilhelmsen wrote:

    > > Hi Chris - thanks for the response. So, question: how do you mimic the
    > > browser's HTTP requests precisely, so that a website generally behaves
    > > in the same way? For example, how do you change the Accept-Language
    > > field?

    >
    > Look at URLConnection.setRequestProperty().


    OK, many thanks Iver.
     
    , Oct 4, 2006
    #7
  8. Chris Uppal Guest

    wrote:

    > Hi Chris - thanks for the response. So, question: how do you mimic the
    > browser's HTTP requests precisely, so that a website generally behaves
    > in the same way?


    I see that Tor has already answered. I want to add that their server is back
    up this morning, and I've just tried again (it stayed up this time!). The bad
    news is that changing the Accept-Language field to, say, "da" made no
    difference -- it still sent back a page where the price of the first boot was
    &pound; <some jaw-droppingly large number>. So that was a red herring, I'm
    afraid.

    -- chris
     
    Chris Uppal, Oct 5, 2006
    #8
