problem with search engines

Discussion in 'Java' started by suman.tedla@gmail.com, Jan 31, 2005.

  1. Guest

    Hello All,

    I am having a problem with search engines.
    I am able to connect to Google and Yahoo through Java and retrieve the
    HTML code.

    Say, for Google the URL is http://www.google.com/search?q=java

    I get the HTML code when "java" is the query.

    But when I click "Next" in Google (for the next set of results),
    the URL is:


    http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N

    When I try to connect to this URL, I am getting malformed URL
    exceptions.

    It is the same with Yahoo.

    I hope you understand my problem.
    How can I get the HTML code from this URL?

    Can anyone help me?
     
    , Jan 31, 2005
    #1

  2. On 31 Jan 2005 01:51:55 -0800, wrote:

    > http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N


    Try.. <http://www.google.com/search?q=java&start=10>

    Alternately,
    <http://java.sun.com/j2se/1.5.0/docs/api/java/net/URLEncoder.html#encode(java.lang.String,%20java.lang.String)>
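    A minimal sketch of how the URLEncoder suggestion might be applied. The class name SearchUrlBuilder and the helper build() are made up for illustration; the q and start parameter names are taken from the URLs quoted in this thread, and treating start as a result offset is an assumption. Only the query value is encoded, not the whole URL, so the & and = separators stay intact.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of any library.
class SearchUrlBuilder {

    // Builds a results-page URL. "start" is assumed to be the offset of the
    // first result, as suggested by the start=10 parameter in the thread.
    static String build(String query, int start) {
        try {
            // Encode only the user-supplied value, never the whole URL.
            String encoded = URLEncoder.encode(query, "UTF-8");
            return "http://www.google.com/search?q=" + encoded + "&start=" + start;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("java", 10));
        System.out.println(build("java & c++", 0));
    }
}
```

    Characters such as spaces, & and + inside the search term are escaped by the encoder, so they cannot be mistaken for parameter separators.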

    > When I try to connect to this url , i am getting malformed url
    > exceptions.


    Using Java code? (If yes) What Java code?

    > Hope u understood my problem.


    No, not clearly. Code speaks volumes though.
    <http://www.physci.org/codes/sscce.jsp>

    --
    Andrew Thompson
    http://www.PhySci.org/codes/ Web & IT Help
    http://www.PhySci.org/ Open-source software suite
    http://www.1point1C.org/ Science & Technology
    http://www.LensEscapes.com/ Images that escape the mundane
     
    Andrew Thompson, Jan 31, 2005
    #2

  3. nsc Guest

    I think Google expects some cookie information along with the URL.
     
    nsc, Jan 31, 2005
    #3
  4. nsc Guest

    nsc, Jan 31, 2005
    #4
  5. boom Guest

    http://www.plsgoogleit.com


    <> wrote in message
    news:...
    > Hello All,
    >
    > I am having a problem with search engines.
    > I am able to connect to google and yahoo thro java and retrieve the
    > html code.
    >
    > say,for google the url is http://www.google.com/search?q=java
    >
    > I am getting the html code when "java" is the query.
    >
    > But when I click next in google(for next set of results) ,
    > the url is ..
    >
    >
    > http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N
    >
    > When I try to connect to this url , i am getting malformed url
    > exceptions.
    >
    > It is the same with yahoo.
    >
    > Hope u understood my problem.
    > How can i get the html code from this url???
    >
    > Can anyone help me??
    >
     
    boom, Jan 31, 2005
    #5
  6. Guest

    Hi, this is my code. I am trying to display the HTML code of Google
    search results. I am using command-line arguments to give input. Later I
    have to extract links.

    I executed it like this:

    java Demo http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N

    import java.net.*;
    import java.io.*;

    class Demo {

        public static void main(String[] args) throws Exception {
            // The URL to fetch is taken from the first command-line argument.
            URL url = new URL(args[0]);
            URLConnection conn = url.openConnection();
            // Override the default User-Agent header with an empty value.
            conn.setRequestProperty("User-Agent", "");
            conn.connect();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()));
            String line;
            // Print the response body line by line.
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
            in.close();
        }
    }



    Why is this not working?
    Can anyone help me?
     
    , Jan 31, 2005
    #6
  7. wrote:

    > Hi, this is my code.I am trying to display the html code of google
    > search results.I am using command line arguments to give input.Later I
    > have to extract links.
    >

    [cut]
    > Why this is not working?
    > Can anyone help me?
    >


    http://www.google.com/intl/en/terms_of_service.html
    Read carefully: No Automated Querying

    --
    Olek
     
    Aleksander Strączek, Feb 1, 2005
    #7
  8. On Tue, 1 Feb 2005 18:45:21 +0000 (UTC), Aleksander Strączek wrote:
    > http://www.google.com/intl/en/terms_of_service.html
    > Read carefully: No Automated Querying


    I think automated in this context means "unattended" or something
    along those lines (they are somewhat vague about what they mean, and
    maybe that's intentional). At any rate I'm pretty sure they don't
    intend to forbid the use of "computer software" which, as far as I can
    tell, most web browsers consist of.

    Nothing in the OP's posts indicates he intends to do automated queries.
    Anyway, what difference should it make to the technical issue?

    /gordon

    --
    [ do not email me copies of your followups ]
    g o r d o n + n e w s @ b a l d e r 1 3 . s e
     
    Gordon Beaton, Feb 2, 2005
    #8
  9. Chris Uppal Guest

    Gordon Beaton wrote:

    > Nothing in the OP's posts indicate he intends to do automated queries.
    > Anyway, what difference should it make to the technical issue?


    One is that wanting a technical solution /suggests/ that the OP's interested in
    scanning more pages than it would be easy to do by hand. If Google decides to
    block over-frequent queries (which it does automatically, or so I believe) then
    the time spent coding solutions to the technical problems of constructing HTTP
    requests and parsing HTML may have been wasted.

    Much easier to use the Google API, I'd have thought:

    http://www.google.com/apis/.

    /if/ that still works...

    -- chris
     
    Chris Uppal, Feb 2, 2005
    #9
  10. On Wed, 2 Feb 2005 09:22:04 -0000, Chris Uppal wrote:
    > One is that wanting a technical solution /suggests/ that the OP's
    > interested in scanning more pages than it would be easy to do by
    > hand. If Google decides to block over-frequent queries (which it
    > does automatically, or so I believe) then the time spent coding
    > solutions to the technical problems of constructing HTTP requests
    > and parsing HTML may have been wasted.


    I've heard that some people write web browsers, web proxies, search
    engines, etc. There is also practical value in learning techniques
    that could have other uses. The technical issues are the same, and
    have nothing to do with website policies.

    > Much easier to use the Google API, I'd have thought:


    Probably, if it's just Google you're interested in (I believe the OP
    only used Google as a concrete example).

    /gordon

    --
    [ do not email me copies of your followups ]
    g o r d o n + n e w s @ b a l d e r 1 3 . s e
     
    Gordon Beaton, Feb 2, 2005
    #10
  11. Guest

    >> Much easier to use the Google API, I'd have thought:
    >>
    >> http://www.google.com/apis/.
    >>
    >> /if/ that still works...


    It works. Still works. Has not changed, has not been updated. I've
    sampled it but prefer to make a URL, URLConnection, and then delegate
    parsing to HTMLEditorKit.ParserCallback.
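    A rough sketch of what delegating the parsing to HTMLEditorKit.ParserCallback could look like for the link extraction the OP mentioned. The class name LinkExtractor and the method extractLinks() are hypothetical; the parser used here is the Swing one (javax.swing.text.html.parser.ParserDelegator), which is one possible choice and not necessarily what the poster used.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration; not part of any library.
class LinkExtractor {

    // Collects the href value of every <a> start tag found in the HTML.
    static List<String> extractLinks(Reader html) {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };
        try {
            // The last argument tells the parser to ignore charset declarations.
            new ParserDelegator().parse(html, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"http://example.com/a\">A</a>"
                + "<p>text</p><a href=\"/b\">B</a></body></html>";
        for (String link : extractLinks(new StringReader(page))) {
            System.out.println(link);
        }
    }
}
```

    The Reader could equally wrap the InputStream of a URLConnection, so the same callback works on a fetched page.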

    Google's terms of service are woefully unclear.
     
    , Feb 3, 2005
    #11
  12. Guest

    Code looks alright. Perhaps it is the parameter you are passing in?
    What happens when you put the URL string into your code instead of
    having your shell deal with it? Just for a test put the URL into the
    code instead of passing it as a parameter to your program and see if
    that makes it work.
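    The test suggested here can be sketched without touching the network at all. The class name HardcodedUrlCheck and the helper queryOf() are invented for this example. It only checks whether java.net.URL accepts the string when it is a literal in the code, which takes the shell out of the picture.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical test class for illustration only.
class HardcodedUrlCheck {

    // The "next page" URL from the original post, as a string literal.
    static final String NEXT_PAGE =
            "http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N";

    // Returns the query part as java.net.URL parses it, or throws
    // IllegalArgumentException if the string is not a well-formed URL.
    static String queryOf(String spec) {
        try {
            return new URL(spec).getQuery();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("not a valid URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(queryOf(NEXT_PAGE));
    }
}
```

    If this runs cleanly while the command-line version fails, the string reaching the program differs from the one typed at the prompt.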

    Are you on Windows or Unix?

    Pawel
     
    , Feb 3, 2005
    #12
  13. Tilman Bohn Guest

    In message <>,
    wrote on 31 Jan 2005 13:31:28 -0800:

    > Hi, this is my code.I am trying to display the html code of google
    > search results.I am using command line arguments to give input.Later I
    > have to extract links.
    >
    > i have executed like
    >
    > java Demo http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N


    At this point whatever shell you're using to execute this gets
    confused by either the ampersands or the equals signs (or both).

    [snip working code]

    > Why this is not working?


    It is working, but you're not passing the parameter you think you're
    passing. To see this, insert an appropriate System.out.println() at the
    beginning of your main(). To avoid the problem, try surrounding it with
    whatever quote signs are applicable to your shell or escaping the
    offending characters. ' or " are good candidates for the former, a
    backslash is a good candidate for the latter, but the precise fix depends
    entirely on your shell.
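    As a small illustration of the diagnostic described above, the following sketch prints exactly what the shell hands to the program. The class name ArgsEcho and the method describe() are invented for this example. Each argument is wrapped in square brackets so that truncation or stray whitespace is easy to spot.

```java
// Hypothetical diagnostic class for illustration only.
class ArgsEcho {

    // Formats the argument count followed by one bracketed line per argument.
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.append("argument count: ").append(args.length).append('\n');
        for (int i = 0; i < args.length; i++) {
            sb.append("args[").append(i).append("] = [").append(args[i]).append("]\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(args));
    }
}
```

    Running it once with the URL unquoted and once quoted should show the difference; double quotes are one option that common Unix shells and the Windows command prompt both accept:

    java ArgsEcho "http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N"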

    > Can anyone help me?


    I don't know.

    --
    Cheers, Tilman

    `Boy, life takes a long time to live...' -- Steven Wright
     
    Tilman Bohn, Feb 6, 2005
    #13