URLConnection and Cookies (googled already but still cannot solve it)?

Discussion in 'Java' started by Kaidi, Jan 16, 2004.

  1. Kaidi


    Hi,
    (I googled this topic but still cannot solve my problem. :-( )

    Basically, my problem is:
    I am programming a crawler in Java, and some sites use cookies. As
    Java does not handle cookies automatically, I find I cannot access
    some pages.
    I read some articles, such as:
    http://martin.nobilitas.com/java/cookies.html
    http://www.informit.com/isapi/product_id~{1DF8B22B-055F-48DB-BD36-20B8017E9956}/content/index.asp
    Basically, what we need to do is read the Set-Cookie header and then
    send the cookie back (in a Cookie header) on later requests.
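A minimal sketch of the read-Set-Cookie, send-Cookie technique described above, assuming the response headers come from `URLConnection.getHeaderFields()`. The class `CookieEcho` and its methods are hypothetical names, and the sketch deliberately ignores the Domain, Path, Expires and Secure attributes that a real cookie store has to honor.

```java
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: echo cookies from one response onto the next request. */
final class CookieEcho {

    /** Collects every Set-Cookie value from a map like URLConnection.getHeaderFields(). */
    static List<String> setCookieValues(Map<String, List<String>> headers) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : headers.entrySet()) {
            // The status line is stored under a null key, and header
            // names are case-insensitive, so compare with equalsIgnoreCase.
            if (e.getKey() != null && e.getKey().equalsIgnoreCase("Set-Cookie")) {
                values.addAll(e.getValue());
            }
        }
        return values;
    }

    /** Keeps only the leading name=value pair of each Set-Cookie value, joined with "; ". */
    static String cookieHeader(List<String> setCookieValues) {
        StringBuilder sb = new StringBuilder();
        for (String value : setCookieValues) {
            int semi = value.indexOf(';');
            String pair = (semi >= 0 ? value.substring(0, semi) : value).trim();
            if (pair.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(pair);
        }
        return sb.toString();
    }

    /**
     * Copies cookies from a response that has already been read onto a
     * request that has NOT been connected yet (setRequestProperty throws
     * IllegalStateException once the connection is open).
     */
    static void echoCookies(URLConnection response, URLConnection nextRequest) {
        String header = cookieHeader(setCookieValues(response.getHeaderFields()));
        if (!header.isEmpty()) {
            nextRequest.setRequestProperty("Cookie", header);
        }
    }
}
```

In use, one would open the next connection with `url.openConnection()`, pass it to `echoCookies` together with the previous connection, and only then call `getInputStream()` on it.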

    However, when I tested this on Best Buy's site, it did not work well.
    Some pages do not seem to ask the browser to store cookies, yet they
    cannot be accessed without cookies. One example is:
    http://www.bestbuy.com/site/olspage.jsp?j=1&id=cat12074&type=page&categoryRep=cat02000

    When I try to crawl this page with my Java program, it only returns a
    page saying that my browser does not support cookies. :-(
    (IE can access it properly. In IE's options, I deleted the cookies
    before trying the above page, and it still works.)

    Does anyone have any idea about this? Thanks a lot.
    PS: the code I am using is from the end of this page:
    http://www.hccp.org/java-net-cookie-how-to.html
    http://www.hccp.org/cvs/org/hccp/net/CookieManager.java
    In that code, I added a print statement in storeCookies so that I can
    see all the headers:
    .........
    for (int i = 1; (headerName = conn.getHeaderFieldKey(i)) != null; i++) {
        System.out.println("In storeCookies, " + headerName + "-->" + conn.getHeaderField(i));
    .........
    The only headers I can see are:

    In storeCookies, Server-->Apache
    In storeCookies, Last-Modified-->Mon, 24 Nov 2003 15:19:52 GMT
    In storeCookies, ETag-->"b0da7d-14ee-3fc22198"
    In storeCookies, Accept-Ranges-->bytes
    In storeCookies, Content-Length-->5358
    In storeCookies, Content-Type-->text/html
    In storeCookies, Date-->Fri, 16 Jan 2004 09:37:10 GMT
    In storeCookies, Connection-->keep-alive
    {bestbuy.com={}}

    So, since the response contains no Set-Cookie header, why can't my
    Java program crawl it?
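One possible explanation, not confirmed anywhere in this thread: `HttpURLConnection` follows HTTP redirects automatically by default, so if the server sets its cookie on a redirect (3xx) response, `storeCookies` only ever sees the headers of the final page, like the list shown here. The sketch below is a hypothetical helper (`RedirectCookieFetcher` is not part of any library) that turns automatic redirects off, follows them by hand, and resends the cookies collected so far. It keeps a single flat cookie jar and ignores cookie attributes, so it is an illustration rather than a complete cookie implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: follow redirects by hand, carrying cookies along. */
final class RedirectCookieFetcher {

    static String fetch(String start, int maxRedirects) throws IOException {
        Map<String, String> jar = new LinkedHashMap<>(); // cookie name -> value (no attributes)
        URL url = new URL(start);
        for (int hop = 0; hop <= maxRedirects; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setInstanceFollowRedirects(false); // we want to see every response
            if (!jar.isEmpty()) {
                conn.setRequestProperty("Cookie", toCookieHeader(jar));
            }
            int status = conn.getResponseCode();
            // Remember cookies from this response, including 3xx responses.
            for (Map.Entry<String, List<String>> h : conn.getHeaderFields().entrySet()) {
                if (h.getKey() == null || !h.getKey().equalsIgnoreCase("Set-Cookie")) {
                    continue;
                }
                for (String value : h.getValue()) {
                    String pair = value.split(";", 2)[0].trim();
                    int eq = pair.indexOf('=');
                    if (eq > 0) {
                        jar.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
                    }
                }
            }
            String location = conn.getHeaderField("Location");
            if (status >= 300 && status < 400 && location != null) {
                conn.disconnect();
                url = new URL(url, location); // Location may be relative
                continue;
            }
            // Not a redirect: read the body (getInputStream throws on 4xx/5xx).
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    body.write(buf, 0, n);
                }
                // ISO-8859-1 maps bytes 1:1; a real crawler should use the page's charset.
                return body.toString("ISO-8859-1");
            }
        }
        throw new IOException("Too many redirects starting at " + start);
    }

    /** Joins the jar into a single Cookie header value: "a=1; b=2". */
    private static String toCookieHeader(Map<String, String> jar) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> c : jar.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(c.getKey()).append('=').append(c.getValue());
        }
        return sb.toString();
    }
}
```

Whether this helps with this particular page depends on how the site really checks for cookie support, which the headers above do not reveal.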

    For page crawling, I am using this code:
    --------------
    try {
        // try opening the URL
        URL url = new URL(url_string);
        URLConnection urlConnection = url.openConnection();
        urlConnection.setAllowUserInteraction(false);
        // Read from urlConnection itself: url.openStream() would open a
        // second, separate connection, so the setting above (and any
        // Cookie header set on urlConnection) would never be used.
        InputStream urlStream = urlConnection.getInputStream();
        // search the input stream for links: first, read in the entire page
        // (up to about MAXSIZE bytes) into a byte buffer and decode it once,
        // so multi-byte characters are not split between reads.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] b = new byte[1000];
        int numRead;
        while (buffer.size() < MAXSIZE && (numRead = urlStream.read(b)) != -1) {
            buffer.write(b, 0, numRead);
        }
        urlStream.close();
        // Decodes with the platform default charset, as the original code did.
        String content = buffer.toString();
        return content;
    --------------
     
    Kaidi, Jan 16, 2004

  2. "Kaidi" <> wrote in message
    news:...
    > Hi,
    > (I googled this topic but still cannot solve my problem. :-( )
    >
    > Basically, my problem is:
    > I am programming a crawler in Java, and some sites use cookies. As
    > Java does not handle cookies automatically, I find I cannot access
    > some pages.


    You may find Jakarta Commons HttpClient helpful:
    http://jakarta.apache.org/commons/httpclient/applications.html
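A JDK-only alternative, for readers on newer Java: Java 6 and later (released after this thread) include `java.net.CookieManager`. Once it is installed with `CookieHandler.setDefault`, `HttpURLConnection` stores `Set-Cookie` headers and sends matching `Cookie` headers on its own. A minimal sketch follows; `JdkCookies`, `install` and `cookiesAfter` are hypothetical names, and `cookiesAfter` calls the manager directly only so the behavior can be seen without any network access.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical wrapper around the JDK's built-in cookie support (Java 6+). */
final class JdkCookies {

    /** Installs one in-memory cookie store for every HttpURLConnection in this JVM. */
    static CookieManager install() {
        // null store = default in-memory store; accept cookies only from the host that sent them.
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);
        CookieHandler.setDefault(manager);
        return manager;
    }

    /**
     * Offline demonstration of what HttpURLConnection asks the manager to do:
     * store a Set-Cookie response header, then build the Cookie request header.
     */
    static List<String> cookiesAfter(CookieManager manager, URI uri, String setCookie)
            throws IOException {
        Map<String, List<String>> response = new HashMap<>();
        response.put("Set-Cookie", Collections.singletonList(setCookie));
        manager.put(uri, response);
        Map<String, List<String>> request =
                manager.get(uri, Collections.<String, List<String>>emptyMap());
        List<String> cookies = request.get("Cookie");
        return cookies == null ? Collections.<String>emptyList() : cookies;
    }
}
```

In a crawler, `install()` would be called once before any connection is opened; after that, ordinary `url.openConnection()` calls for http URLs share the same cookie store.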





    William Brogden, Jan 16, 2004