problem with search engines

suman.tedla

Hello all,

I am having a problem with search engines.
I am able to connect to Google and Yahoo through Java and retrieve the
HTML code.

Say, for Google, the URL is http://www.google.com/search?q=java

I get the HTML code when "java" is the query.

But when I click Next in Google (for the next set of results),
the URL is:

http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N

When I try to connect to this URL, I get a MalformedURLException.

It is the same with Yahoo.

I hope you understand my problem.
How can I get the HTML code from this URL?

Can anyone help me?
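One way to sidestep the command line entirely is to build the "next page" URL inside the program. The sketch below is only an illustration: the class name `PagedSearchUrl` and the helper `buildSearchUrl` are hypothetical, the extra parameters (`hl`, `lr`, `sa`) are copied verbatim from the URL above without any claim about what they mean, and it assumes Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PagedSearchUrl {

    // Hypothetical helper: builds a URL shaped like the "next page" URL in the
    // post. hl, lr and sa are copied verbatim from that URL; their meaning is
    // assumed, not documented here.
    static String buildSearchUrl(String query, int start) {
        // Escape characters such as spaces, '&' and '=' inside the query text
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return "http://www.google.com/search?q=" + q
                + "&hl=en&lr=&start=" + start + "&sa=N";
    }

    public static void main(String[] args) throws Exception {
        // start=0, 10, 20: presumably the first three pages of ten results each
        for (int start = 0; start <= 20; start += 10) {
            URL url = new URL(buildSearchUrl("java", start));
            System.out.println(url);
        }
    }
}
```

Because the string never passes through a shell, the `&` separators reach `new URL(...)` intact, and `URLEncoder` only escapes the query text itself (for example, `"c++ & java"` becomes `c%2B%2B+%26+java`).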
 
suman.tedla

Hi, this is my code. I am trying to display the HTML code of Google
search results. I am using command-line arguments to give the input. Later I
have to extract the links.

I ran it like this:

java Demo http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N

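import java.net.*;
import java.io.*;

class Demo {

    public static void main(String[] args) throws Exception {
        // args[0] is the URL given on the command line
        URL url = new URL(args[0]);
        URLConnection conn = url.openConnection();
        // Override Java's default User-Agent request property with an empty value
        conn.setRequestProperty("User-Agent", "");
        conn.connect();
        // try-with-resources closes the reader once the page has been read
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                // println restores the line breaks that readLine() removes
                System.out.println(line);
            }
        }
    }
}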



Why is this not working?
Can anyone help me?
 
Gordon Beaton


I think "automated" in this context means "unattended" or something
along those lines (they are somewhat vague about what they mean, and
maybe that's intentional). At any rate, I'm pretty sure they don't
intend to forbid the use of "computer software", which, as far as I can
tell, is what most web browsers consist of.

Nothing in the OP's posts indicates he intends to do automated queries.
Anyway, what difference should it make to the technical issue?

/gordon
 
Chris Uppal

Gordon said:
> Nothing in the OP's posts indicates he intends to do automated queries.
> Anyway, what difference should it make to the technical issue?

One is that wanting a technical solution /suggests/ that the OP's interested in
scanning more pages than it would be easy to do by hand. If Google decides to
block over-frequent queries (which it does automatically, or so I believe) then
the time spent coding solutions to the technical problems of constructing HTTP
requests and parsing HTML may have been wasted.

Much easier to use the Google API, I'd have thought:

http://www.google.com/apis/.

/if/ that still works...

-- chris
 
Gordon Beaton

> One is that wanting a technical solution /suggests/ that the OP's
> interested in scanning more pages than it would be easy to do by
> hand. If Google decides to block over-frequent queries (which it
> does automatically, or so I believe) then the time spent coding
> solutions to the technical problems of constructing HTTP requests
> and parsing HTML may have been wasted.

I've heard that some people write web browsers, web proxies, search
engines, etc. There is also practical value in learning techniques
that could have other uses. The technical issues are the same, and
have nothing to do with website policies.

> Much easier to use the Google API, I'd have thought:

Probably, if it's just Google you're interested in (I believe the OP
only used Google as a concrete example).

/gordon
 
opalpa

> Much easier to use the Google API, I'd have thought:

It works. It still works; it has not changed and has not been updated. I've
sampled it, but I prefer to create a URL and a URLConnection and then delegate
parsing to HTMLEditorKit.ParserCallback.

Google's terms of service are woefully unclear.
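For the link-extraction step the OP mentioned, delegating to `HTMLEditorKit.ParserCallback` might look roughly like the sketch below. It is an assumption-laden illustration, not opalpa's actual code: the class `LinkExtractor` and method `extractLinks` are hypothetical names, and it only collects the `href` of `<a>` tags reported through `handleStartTag`. A `StringReader` stands in for the `InputStreamReader` you would wrap around `conn.getInputStream()`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    // Hypothetical helper: collects the href attribute of every <a> start tag
    static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {          // skip <a name="..."> anchors
                        links.add(href.toString());
                    }
                }
            }
        };
        // true = ignore any charset declared inside the document itself
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><a href=\"http://example.com/a\">A</a>"
                + "<p>text</p><a href=\"/b\">B</a></body></html>";
        System.out.println(extractLinks(new StringReader(page)));
    }
}
```

Relative links such as `/b` come back exactly as written; resolving them against the page URL (for example with `new URL(base, href)`) would be a separate step.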
 
opalpa

The code looks all right. Perhaps it is the parameter you are passing in?
What happens when you put the URL string into your code instead of
having your shell deal with it? Just as a test, put the URL into the
code instead of passing it as a parameter to your program, and see if
that makes it work.

Are you on Windows or Unix?

Pawel
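The hard-coded test Pawel suggests could be as small as the sketch below (the class name `UrlCheck` is hypothetical). It only parses the URL and prints its parts, without opening a connection, so it shows whether the string itself is a valid `java.net.URL`.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlCheck {

    // The "next page" URL from the original post, hard-coded so that no shell
    // gets a chance to interpret the '&' characters
    static final String SPEC =
            "http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N";

    public static void main(String[] args) throws MalformedURLException {
        // The constructor throws MalformedURLException if the string is invalid
        URL url = new URL(SPEC);
        System.out.println("host:  " + url.getHost());
        System.out.println("path:  " + url.getPath());
        System.out.println("query: " + url.getQuery());
    }
}
```

This prints `www.google.com`, `/search` and `q=java&hl=en&lr=&start=10&sa=N`, i.e. the URL is well-formed once it reaches Java intact.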
 
Tilman Bohn

In message <[email protected]>,
> Hi, this is my code. I am trying to display the HTML code of Google
> search results. I am using command-line arguments to give the input. Later I
> have to extract the links.
>
> I ran it like this:
>
> java Demo http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N

At this point whatever shell you're using to execute this gets
confused by either the ampersands or the equals signs (or both).

[snip working code]
> Why is this not working?

It is working, but you're not passing the parameter you think you're
passing. To see this, insert an appropriate System.out.println() at the
beginning of your main(). To avoid the problem, try surrounding the URL with
whatever quote characters are applicable to your shell, or escape the
offending characters. ' or " are good candidates for the former, and a
backslash is a good candidate for the latter, but the precise fix depends
entirely on your shell.
> Can anyone help me?

I don't know.
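The diagnostic Tilman describes, printing what actually arrives in `args` at the start of `main()`, might be sketched like this. `ShowArgs` and `describe` are hypothetical names chosen for the example.

```java
public class ShowArgs {

    // Hypothetical helper: formats every argument exactly as the JVM received it
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.append("argument count: ").append(args.length).append('\n');
        for (int i = 0; i < args.length; i++) {
            // Surrounding quotes make empty or truncated arguments easy to spot
            sb.append("args[").append(i).append("] = \"").append(args[i]).append("\"\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(args));
    }
}
```

Run with the URL in double quotes, for example `java ShowArgs "http://www.google.com/search?q=java&hl=en&lr=&start=10&sa=N"`, it should report one argument containing the whole URL; double quotes stop the `&` from being treated as a command separator in both the Windows command prompt and common Unix shells. Without the quotes, the shell typically ends the Java command at the first `&`, which you can see in the printed output.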
 