Searching Google in Java

Discussion in 'Java' started by mfasoccer@gmail.com, May 18, 2006.

  1. Guest

    I'm working on a project that involves searching with Google. I have
    been getting an HTTP 403 error with the following code:

    import java.net.*;
    import java.io.*;

    public class GoogleSearchTest
    {
        public static void main(String[] args) throws Exception {
            URL hp = new URL("http://www.google.com/search?q=babelfish");
            URLConnection hpCon = hp.openConnection();
            hpCon.connect();
            InputStream input = hpCon.getInputStream(); // error traces to here

            /*
            This code is all irrelevant to my problem because
            the input stream is never obtained:
            String content = "";
            int c;
            while((c = input.read()) != -1)
                content += (char)c;
            */
        }
    }

    I know that an HTTP 403 error means that the server understood the
    request, yet refused it. As you can probably tell, I have very little
    network programming experience, so maybe more experienced programmers
    could help alter my approach, or explain a better one? Thanks.
    , May 18, 2006
    #1

  2. wrote:
    > Im working on a project that involves searching with google. I have
    > been getting an http 403 error with the following code:

    ....

    Google offers a Java API, see http://www.google.com/apis/. It is much
    easier than trying to get and parse a web page.

    Note that they limit automated searching to 1000 queries per day,
    non-commercial, and require a license key with each request.

    Patricia
    Patricia Shanahan, May 18, 2006
    #2

  3. Guest

    wrote:
    ....
    > I know that http 403 error means that the server understood the
    > request, yet refused it. As you can probably tell I have very little
    > network programming experience, so maybe more experienced programmers
    > could help alter my approach, or explain a better one? Thanks.


    A better approach would be to use Google's APIs, as Patricia pointed
    out.

    However, this is not always an option (the API didn't help
    for, e.g., groups.google.com last time I checked [but this was
    a long time ago, I admit]).

    Faking your user agent string will allow you to bypass the 403
    (and it probably would be a breach of Google's terms).



    --
    (Don't pay attention to my .sig) Text file size: 1509 bytes
    SHA1: bbfa3226005c2d4d04e3d72d49bfb1eb17e67f12
    MD5: 38dfd87012a2754059a88341d66e2ef4
    , May 18, 2006
    #3
  4. Guest

    > Faking your user agent string will allow you to bypass the 403

    Could anyone provide a sample of how to fake my agent string?
    , May 18, 2006
    #4
  5. Guest

    In your example, you insert one line:

    URLConnection hpCon = hp.openConnection();
    hpCon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; "
        + "Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
    hpCon.connect();

    and that may work.

    But you still should respect Google's terms...
    , May 18, 2006
    #5
  6. wrote:
    > In your example, you insert one line:
    >
    > URLConnection hpCon = hp.openConnection();
    > hpCon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U;
    > Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
    > hpCon.connect();
    >
    > and that may work.


    I'm not sure this is enough.
    You probably have to set the http.agent system property:

    http://java.sun.com/j2se/1.5.0/docs/guide/net/properties.html
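
    A minimal sketch of what setting that property could look like. The
    class name, the agent string and the example.com URL are made up for
    illustration. The linked properties page notes that the JDK appends
    its own "Java/<version>" token to whatever value is given, so this
    changes the default agent rather than fully replacing it. The sketch
    only opens a connection object and does not send a request.

```java
import java.net.URL;
import java.net.URLConnection;

public class HttpAgentSketch {
    // Hypothetical agent string, for illustration only.
    public static final String AGENT = "MyResearchTool/0.1";

    /** Sets the JVM-wide default agent; call this before opening any HTTP connection. */
    public static void configureAgent() {
        System.setProperty("http.agent", AGENT);
    }

    public static void main(String[] args) throws Exception {
        configureAgent();

        // Placeholder URL (assumption); openConnection() does not contact the server yet.
        URL url = new URL("http://www.example.com/");
        URLConnection con = url.openConnection();

        // No per-connection User-Agent is set here, so the default
        // derived from http.agent would be used when the request is sent.
        System.out.println("http.agent = " + System.getProperty("http.agent"));
        System.out.println("connection class = " + con.getClass().getName());
    }
}
```

    The same value can also be passed on the command line with
    -Dhttp.agent=..., which avoids touching the code at all.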
    Andrea Desole, May 18, 2006
    #6
  7. Andrea Desole wrote:
    > wrote:
    >> In your example, you insert one line:
    >>
    >> URLConnection hpCon = hp.openConnection();
    >> hpCon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U;
    >> Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
    >> hpCon.connect();
    >>
    >> and that may work.

    >
    > I'm not sure this is enough.
    > You probably have to set the http.agent property:
    >
    > http://java.sun.com/j2se/1.5.0/docs/guide/net/properties.html


    Additional hint: better use a decent HTTP client such as Apache's as the
    standard library classes are quite limited.

    Regards

    robert
    Robert Klemme, May 18, 2006
    #7
  8. Guest

    > URLConnection hpCon = hp.openConnection();
    > hpCon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U;
    > Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
    > hpCon.connect();
    >

    it works, thanks.
    , May 18, 2006
    #8
  9. VisionSet Guest

    <> wrote in message
    news:...
    > > URLConnection hpCon = hp.openConnection();
    > > hpCon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U;
    > > Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511");
    > > hpCon.connect();
    > >

    > it works, thanks.


    But you'll still get the same restriction of 1000 hits per day however you
    do it.

    --
    Mike W
    VisionSet, May 18, 2006
    #9
  10. Guest

    > But you'll still get the same restriction of 1000 hits per day however you
    > do it.


    Does this mean that even regular searches that are executed through
    their website with an actual browser are also limited to 1000 hits per
    day?
    , May 18, 2006
    #10
  11. wrote:
    >> But you'll still get the same restriction of 1000 hits per day however you
    >> do it.

    >
    > Does this mean that even regular searches that are executed through
    > their website with an actual browser are also limited to 1000 hits per
    > day?


    You are not doing a regular search via a browser. You are trying to do
    some automated querying. Google's ToS prohibits this; see
    http://www.google.com/terms_of_service.html. Whatever you are trying to
    do, your idea is flawed, since it is based on the concept of violating
    the terms of service of the service you are using.

    And do you really think you are the first one who had the glorious idea
    to "work around" the API limitation (read: violate the ToS) by
    simulating a browser?

    The irony is that you even use a Google mail address to plan and
    announce your intended violation of Google's ToS in public. What a great
    idea.

    --
    The comp.lang.java.gui FAQ:
    ftp://ftp.cs.uu.nl/pub/NEWS.ANSWERS/computer-lang/java/gui/faq
    http://www.uni-giessen.de/faq/archiv/computer-lang.java.gui.faq/
    Thomas Weidenfeller, May 19, 2006
    #11
  12. wrote:
    >> But you'll still get the same restriction of 1000 hits per day however you
    >> do it.

    >
    > Does this mean that even regular searches that are executed through
    > their website with an actual browser are also limited to 1000 hits per
    > day?


    Google is *extremely* good at detecting automated queries. Just get
    your program working, query Google a few hundred times, then try to
    visit google.com in your browser. You will very likely see a message
    that they have detected you.

    Someone at my employer tried this the other day. A few hundred
    automated queries later and the entire Fortune 50 company had to go
    through a CAPTCHA each time we wanted to use Google. 180,000+ people.
    jeremiah johnson, May 19, 2006
    #12
  13. In article <>,
    jeremiah johnson <> wrote:
    >
    >Someone at my employer tried this the other day. A few hundred
    >automated queries later and the entire Fortune 50 company had to go
    >through a CAPTCHA each time we wanted to use Google. 180,000+ people.


    How good are their CAPTCHAs? Is there a way to see them without first
    getting oneself banned?

    Cheers
    Bent D
    --
    Bent Dalager - - http://www.pvv.org/~bcd
    powered by emacs
    Bent C Dalager, May 19, 2006
    #13
  14. ashesh Guest

    Hi! Does anyone know about Hibernet? If you do, please tell me
    about it.
    ashesh, May 21, 2006
    #14
  15. IchBin Guest

    ashesh wrote:
    > hi!! have any one have idea about Hibernet,if u do then plz tell me
    > about this.
    >

    Do a google search on hibernet java

    then look at the first article.

    Thanks in Advance...
    IchBin, Pocono Lake, Pa, USA
    http://weconsultants.servebeer.com/JHackerAppManager
    __________________________________________________________________________

    'If there is one, Knowledge is the "Fountain of Youth"'
    -William E. Taylor, Regular Guy (1952-)
    IchBin, May 21, 2006
    #15
  16. Luke Webber Guest

    ashesh wrote:
    > hi!! have any one have idea about Hibernet,if u do then plz tell me
    > about this.


    I think you're looking for Hibernate, the Java ORM...

    http://www.hibernate.org/

    Cheers,
    Luke
    Luke Webber, May 23, 2006
    #16
  17. Oliver Wong Guest

    "Bent C Dalager" <> wrote in message
    news:e4k4ah$cjb$...
    > In article <>,
    > jeremiah johnson <> wrote:
    >>
    >>Someone at my employer tried this the other day. A few hundred
    >>automated queries later and the entire Fortune 50 company had to go
    >>through a CAPTCHA each time we wanted to use Google. 180,000+ people.

    >
    > How good are their CAPTCHAs? Is there a way to see them without first
    > getting oneself banned?


    When I google for "google captcha", I get
    http://www.spy.org.uk/spyblog/2005/06/stupid_google_virusspyware_cap.html
    which has a screenshot of their captcha test.

    - Oliver
    Oliver Wong, May 25, 2006
    #17
  18. Roedy Green Guest

    On 17 May 2006 18:01:13 -0700, ""
    <> wrote, quoted or indirectly quoted someone who
    said :

    >I know that http 403 error means that the server understood the
    >request, yet refused it.


    Here is what I would do. I don't know if this is the problem though.

    Use a sniffer to watch the same query given by a browser. See
    http://mindprod.com/jgloss/sniffer.html

    Pad your request header out with additional fields the browser sends,
    e.g. info on what encodings are acceptable in reply.
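
    A rough sketch of that padding step, assuming the sniffer showed the
    browser sending Accept, Accept-Language and Accept-Charset. The header
    values, the class name and the example.com URL are illustrative
    guesses, not captured traffic; copy the real ones from your own trace.
    Nothing is sent over the network here, the connection is only
    prepared.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

public class BrowserLikeHeaders {
    /** Extra request headers to add; the values are illustrative assumptions. */
    public static Map<String, String> extraHeaders() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8");
        headers.put("Accept-Language", "en-US,en;q=0.5");
        headers.put("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
        return headers;
    }

    /** Opens (but does not connect) a connection with the extra headers applied. */
    public static URLConnection prepare(String spec) {
        try {
            URLConnection con = new URL(spec).openConnection();
            for (Map.Entry<String, String> e : extraHeaders().entrySet()) {
                con.setRequestProperty(e.getKey(), e.getValue());
            }
            return con;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URL (assumption); replace with the page you are testing.
        URLConnection con = prepare("http://www.example.com/");
        // Print the headers that would be sent once the connection is used.
        con.getRequestProperties().forEach((name, values) ->
                System.out.println(name + ": " + values));
    }
}
```

    I left Accept-Encoding out on purpose: if you advertise gzip, you
    have to decompress the reply yourself.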
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
    Roedy Green, May 26, 2006
    #18
  19. Roedy Green Guest

    On Fri, 19 May 2006 09:49:05 +0000 (UTC), (Bent C
    Dalager) wrote, quoted or indirectly quoted someone who said :

    >How good are their CAPTCHAs? Is there a way to see them without first
    >getting oneself banned?


    It would not take too much cleverness. All they have to do is monitor
    hits per hour from a given IP. If it suddenly jumps up, and if the
    hits have a stereotyped rigidity of format and timing, they have you
    nailed.
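
    To make the idea concrete, here is a toy sliding-window counter that
    flags an IP once it exceeds a hit threshold within a time window. This
    is only my guess at the simplest possible scheme, not how Google
    actually does it; the class name, the threshold and the window are
    invented, and the check on format and timing regularity is not
    covered.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class HitRateMonitor {
    private final long windowMillis;
    private final int maxHits;
    // Timestamps of recent hits per IP address, oldest first.
    private final Map<String, Deque<Long>> hitsByIp = new HashMap<>();

    public HitRateMonitor(long windowMillis, int maxHits) {
        this.windowMillis = windowMillis;
        this.maxHits = maxHits;
    }

    /** Records one hit and returns true if the IP now exceeds maxHits within the window. */
    public boolean recordHit(String ip, long nowMillis) {
        Deque<Long> hits = hitsByIp.computeIfAbsent(ip, k -> new ArrayDeque<>());
        hits.addLast(nowMillis);
        // Drop timestamps that have fallen out of the window.
        while (!hits.isEmpty() && nowMillis - hits.peekFirst() >= windowMillis) {
            hits.removeFirst();
        }
        return hits.size() > maxHits;
    }

    public static void main(String[] args) {
        // Hypothetical policy: more than 100 hits per hour is suspicious.
        HitRateMonitor monitor = new HitRateMonitor(60L * 60L * 1000L, 100);
        String ip = "192.0.2.1"; // documentation-only address
        for (int i = 0; i < 150; i++) {
            // Simulate one request per second from the same address.
            if (monitor.recordHit(ip, i * 1000L)) {
                System.out.println("Flagged " + ip + " at request number " + (i + 1));
                break;
            }
        }
    }
}
```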
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
    Roedy Green, May 26, 2006
    #19
