How to get the HTML document of a web site?

Victoria

I have 2 questions:

Q1: Given the URL of a site, how can I get the web document (HTML file)
of this site? (It is a Chinese web site.)

Q2: How can I write a program that searches Google for a given
keyword (in Chinese) and then returns a list of links (e.g. the top 10
links)?

I really need help because I have very little time to finish the
project, and I don't have experience writing these kinds of Java
programs. If you have time, can you help me think about how to do it?

Victoria
 
Marco Schmidt

Victoria:
| Q1: Given the URL of a site, how can I get the web document (HTML file)
| of this site? (It is a Chinese web site.)

Read the API docs for java.net.URLConnection. Create one for your URL
and call getInputStream to access the data. It doesn't matter whether the
content is Chinese text or anything else; at that level you are
dealing with binary data.

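
A minimal sketch of that approach, assuming the whole page fits comfortably in memory: it opens a URLConnection for the URL, reads getInputStream() until end of stream, and returns the raw bytes without decoding them. FetchBytes and fetch are hypothetical names, not part of any library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public class FetchBytes {
    // Hypothetical helper: download whatever the URL points to as raw bytes.
    // No character decoding happens here, so Chinese text is just bytes.
    static byte[] fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java FetchBytes <url>");
            return;
        }
        byte[] page = fetch(args[0]);
        System.out.println("Downloaded " + page.length + " bytes from " + args[0]);
    }
}
```

Turning those bytes into text is a separate step that needs the page's charset, e.g. new String(bytes, charset).
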
| Q2: How can I write a program that searches Google for a given
| keyword (in Chinese) and then returns a list of links (e.g. the top 10
| links)?

Check out <http://www.google.com/apis/>.

You can also do it manually (assemble a URL, download the search result
page and parse it for links), but that is most likely a breach of
Google's terms of service.

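
The "assemble a URL ... parse it for links" steps can be sketched in general terms (not for Google specifically, given the terms-of-service caveat). URLEncoder.encode percent-encodes the Chinese keyword as UTF-8, and a regular expression pulls href values out of the downloaded HTML. The endpoint search.example.com and its q parameter are placeholders, the class and method names are hypothetical, and a regex is only a rough stand-in for a real HTML parser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkSketch {
    // Placeholder endpoint; a real service defines its own URL and parameters.
    static final String SEARCH_BASE = "https://search.example.com/search?q=";

    // Percent-encode the keyword as UTF-8 so Chinese characters survive in the URL.
    static String buildQueryUrl(String keyword) throws UnsupportedEncodingException {
        return SEARCH_BASE + URLEncoder.encode(keyword, "UTF-8");
    }

    // Crude href extractor: misses unquoted attributes, links built by
    // JavaScript, and anything else a real HTML parser would handle.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Return at most max link targets, in document order.
    static List<String> extractLinks(String html, int max) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (links.size() < max && m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The keyword is the two characters U+4E2D U+6587.
        System.out.println(buildQueryUrl("\u4e2d\u6587"));
        // prints https://search.example.com/search?q=%E4%B8%AD%E6%96%87
        String html = "<p><a href=\"http://a.example/\">A</a> "
                + "<a class=\"x\" href='http://b.example/'>B</a></p>";
        System.out.println(extractLinks(html, 10));
        // prints [http://a.example/, http://b.example/]
    }
}
```
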
| I really need help because I have very little time to finish the
| project, and I don't have experience writing these kinds of Java
| programs. If you have time, can you help me think about how to do it?

Little time, no experience with Java and a rather complex task (Q2).
Not looking good. :/

Regards,
Marco
 
david m-

Q1

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class Main
{
    public static void main(String[] args)
    {
        System.out.println("Please wait...");
        // No charset is given, so InputStreamReader decodes the bytes
        // with the platform default charset.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new URL("http://www.chinanews.com.cn/").openStream())))
        {
            int iLine = 0;
            String line;

            // Retrieve first 10 lines only
            while ((iLine < 10) && ((line = in.readLine()) != null))
            {
                iLine++;
                System.out.println(line);
            }
        }
        catch (Exception e)
        {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}
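
The example above decodes the page with the platform default charset, which only works when that default happens to match the page. A minimal sketch that takes the charset from the Content-Type header reported by URLConnection.getContentType() instead. The class and helper names are hypothetical, the UTF-8 fallback is an assumption, and a charset declared only in an HTML meta tag is not detected.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetAwareFetch {
    // Pull "charset=..." out of a Content-Type value such as
    // "text/html; charset=GB2312". Returns null if none is declared.
    static String charsetFrom(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return p.substring("charset=".length()).replace("\"", "").trim();
            }
        }
        return null;
    }

    // Assumption: fall back to UTF-8 when no usable charset is declared.
    static Charset pickCharset(String contentType) {
        String name = charsetFrom(contentType);
        if (name != null) {
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalCharsetNameException e) {
                // Malformed name in the header: use the fallback below.
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java CharsetAwareFetch <url>");
            return;
        }
        URLConnection conn = new URL(args[0]).openConnection();
        Charset cs = pickCharset(conn.getContentType());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), cs))) {
            String line;
            // First 10 lines only, as in the example above
            for (int i = 0; i < 10 && (line = in.readLine()) != null; i++) {
                System.out.println(line);
            }
        }
    }
}
```

Running java CharsetAwareFetch http://www.chinanews.com.cn/ prints the first 10 lines like the original, but decoded with whatever charset the server declares.
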
 
Victoria

Macro:

Thank you very much for your reply!
Do you know of any sample programs for my questions...
Because time is really limited for me~

Thanks~
Victoria
 
Andrew Thompson

| Macro:

No Victoria, a 'Macro' is a group of instructions,
whereas the gentleman's name is 'Marco'.

| Do you know of any sample programs for my questions...

You've already been given the first part
on a platter Victoria, do you need me to
'gift wrap' it for you?

| Because time is really limited for me~

Between you and me (and whoever else reads
this public forum), we do not actually care.
Your time constraints are yours alone,
and generally come down to either:
a) you should develop better time planning, or
b) you should hire more people.

Show some effort (_any_ effort) and I
am confident you will get more help.
 
Victoria

I've tried the program and I can see the Chinese text displayed.

Actually, my task is to get the number of occurrences of some specified
keywords in a web site using http://www.yahoo.com.hk.

So I created a text file containing a keyword, and I use

    File f_keyword = new File("test.txt");
    FileInputStream FIS = new FileInputStream(f_keyword);
    BufferedReader BR = new BufferedReader(new InputStreamReader(FIS));

to read this file. The keyword is in Traditional Chinese.

However, there is a NullPointerException when I try to call this method:

    public static int Count(String str, String keyword) {
        int count = 0;

        int index = str.indexOf(keyword);

        while (index != -1)
        {
            count++;
            index = str.indexOf(keyword, index + 1);
        }
        return count;
    }


Is this a charset problem?
How can I solve it?
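
A NullPointerException from Count is not, by itself, a charset symptom: str.indexOf(keyword) throws it when str is null (a method call on a null reference) or when keyword is null, for example because readLine() returned null on an empty file or the page text was never filled in. A charset mismatch typically shows up instead as a count of 0, because the decoded keyword and the decoded page no longer contain the same characters. The sketch below, with hypothetical names (KeywordCount, readKeyword, count), reads the keyword with an explicit charset and guards the counting loop against null. UTF-8 is an assumption here; if test.txt was saved in another encoding such as Big5, pass that charset instead, and decode the web page with the charset it actually uses.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class KeywordCount {
    // Read the first line of the keyword file with an explicit charset instead
    // of the platform default. Returns null if the file is empty.
    static String readKeyword(String path, Charset cs) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), cs))) {
            String line = br.readLine();
            if (line == null) {
                return null;
            }
            // Java's UTF-8 decoder keeps a leading byte-order mark, which
            // some editors write; drop it so it does not spoil the match.
            if (line.startsWith("\uFEFF")) {
                line = line.substring(1);
            }
            return line.trim();
        }
    }

    // Same counting loop as Count above, but guarded: returns 0 instead of
    // throwing when either argument is null, and rejects an empty keyword,
    // for which the original loop would never terminate.
    static int count(String str, String keyword) {
        if (str == null || keyword == null || keyword.isEmpty()) {
            return 0;
        }
        int count = 0;
        int index = str.indexOf(keyword);
        while (index != -1) {
            count++;
            // index + 1 counts overlapping matches, as the original does;
            // use index + keyword.length() to count non-overlapping ones.
            index = str.indexOf(keyword, index + 1);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Usage: java KeywordCount <keyword-file> <page-file>");
            return;
        }
        // Assumption: both files are UTF-8; substitute their real charsets.
        String keyword = readKeyword(args[0], StandardCharsets.UTF_8);
        String page = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println(count(page, keyword));
    }
}
```
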
 
