Getting data from a web page back in C

ddk1965

Hello,

I want to get certain data from a web page.
Example:
http://finance.yahoo.com/q?s=dow
I want to grab the number behind the field P/E -> 12.8 to a variable
in C.

Any idea to tackle this problem?
I don't want a commercial package because I need to use it in windows
& Linux.

Tx,

Danny
Belgium
 
Jens Thoms Toerring

ddk1965 said:
I want to get certain data from a web page.
Example:
http://finance.yahoo.com/q?s=dow
I want to grab the number behind the field P/E -> 12.8 to a variable
in C.
Any idea to tackle this problem?
I don't want a commercial package because I need to use it in windows
& Linux.

That's a rather complex undertaking. The first thing you need to
do is get the HTML text of the web page. That can't be done
with standard C; you will need some external library for
that. Best use one that does all the work for you (perhaps
libcurl will do). Once you have that data (in a file or in
memory) you can use all the available string comparison
functions to find the bit of information you're looking for.
That can get a bit ugly since the data are rather likely to be
surrounded by lots of HTML tags etc. (which may even change
from time to time). Once you have found the place you could use
e.g. fscanf() (for a file) or sscanf() (for a string in memory)
to read in the number. Voila, you are done ;-)
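
A minimal sketch of the "search, then read the number" step, assuming the
page text is already in memory as a NUL-terminated string. The function
name number_after_label() and the sample markup in the usage note are made
up for illustration; the real Yahoo page will look different. strtod() is
used in place of sscanf() because it reports where the number ended.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Find `label` in `html`, then return the first number that appears in the
 * text after it, ignoring anything inside <...> tags on the way.
 * Returns 0 and stores the value in *out, or -1 if nothing was found.
 * Naive on purpose: entities such as &#160; would be misread as numbers. */
int number_after_label(const char *html, const char *label, double *out)
{
    const char *p = strstr(html, label);
    if (p == NULL)
        return -1;
    p += strlen(label);

    int in_tag = 0;
    for (; *p != '\0'; p++) {
        if (*p == '<') {
            in_tag = 1;
        } else if (*p == '>') {
            in_tag = 0;
        } else if (!in_tag &&
                   (isdigit((unsigned char)*p) ||
                    ((*p == '-' || *p == '.') &&
                     isdigit((unsigned char)p[1])))) {
            char *end;
            double v = strtod(p, &end);
            if (end != p) {
                *out = v;
                return 0;
            }
        }
    }
    return -1;
}
```

With a (hypothetical) fragment such as
"<th>P/E (ttm):</th><td class=\"c1\">12.8</td>", calling
number_after_label(page, "P/E", &pe) stores 12.8 in pe. If the site
changes its markup the label search may silently hit the wrong place, so
the result should be sanity-checked.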

I actually would rather not do all that in C. There are other
languages that will probably make the whole task a lot simpler
(my personal preference would probably be Perl, but there are
lots of other languages that will get the job done). If you
have a program for just the task of extracting the bit of
information you need, you could then start it via e.g. system()
and, when it's done, read what it produced from within your
C program.
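
A rough sketch of that idea, assuming a POSIX system. It uses popen()
rather than system(), because popen() hands the helper's output straight
back as a stream; popen() is not ISO C (Windows compilers usually offer
_popen() instead). The function name number_from_command() is invented
here, and the helper program is assumed to print the number on its first
line of output.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen()/pclose(); POSIX, not ISO C */
#include <stdio.h>
#include <stdlib.h>

/* Run `cmd` through the shell, read its first line of output and parse
 * it as a number. Returns 0 on success, -1 on any failure (command could
 * not be started, printed nothing, exited non-zero, or printed no number). */
int number_from_command(const char *cmd, double *out)
{
    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;

    char line[128];
    int got_line = fgets(line, sizeof line, pipe) != NULL;
    int status = pclose(pipe);
    if (!got_line || status != 0)
        return -1;

    char *end;
    double v = strtod(line, &end);
    if (end == line)
        return -1;          /* first line did not start with a number */

    *out = v;
    return 0;
}
```

A call might look like number_from_command("./get_pe.pl dow", &pe), where
get_pe.pl stands for whatever (hypothetical) script does the extraction.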

BTW, some providers of such data don't allow "scraping" their
web pages, i.e. extracting information for further use (and
some may even try to keep you from doing so by introducing
some technical hurdles). I don't know if this is also the
case for Yahoo, but you should check their terms of use
before you go on.

Regards, Jens
 
jacob navia

ddk1965 said:
Hello,

I want to get certain data from a web page.
Example:
http://finance.yahoo.com/q?s=dow
I want to grab the number behind the field P/E -> 12.8 to a variable
in C.

Any idea to tackle this problem?
I don't want a commercial package because I need to use it in windows
& Linux.

Tx,

Danny
Belgium

(1) Get the source of the web page into a text file on your computer.
If you use the lcc-win compiler, there is a library that allows you
to do that with a single function call. Under Linux you can just use
system("wget ... ");

(2) The data is embedded within a table (called "Dow chemical"). You will
have to parse the HTML table format, which is relatively easy: 6th
row, second column, and you are all set. This means skipping the
first five line separators of the table, then skipping the first
column separator. In principle you can do that with strstr();
there are no technical difficulties in doing what you have in mind.

(3) Forget all the people who tell you that "C is not good for this task".
They are just not very well informed. I have done very similar jobs
in C without any problems!
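
For step (1), once the page has been saved to a file, something like the
following could load it into memory so that strstr() can be run over it.
This is only a sketch in portable ISO C; read_whole_file() is a name made
up for this example, and it assumes the page contains no embedded NUL
bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a NUL-terminated, malloc'd buffer.
 * Returns NULL on failure; the caller must free() the result. */
char *read_whole_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    while (buf != NULL) {
        /* Always leave one byte free for the terminating NUL. */
        len += fread(buf + len, 1, cap - len - 1, fp);
        if (len < cap - 1)
            break;                  /* short read: end of file or error */
        cap *= 2;                   /* buffer full: grow and keep reading */
        char *tmp = realloc(buf, cap);
        if (tmp == NULL)
            free(buf);
        buf = tmp;
    }
    if (buf != NULL && ferror(fp)) {
        free(buf);
        buf = NULL;
    }
    if (buf != NULL)
        buf[len] = '\0';
    fclose(fp);
    return buf;
}
```

Assuming wget is installed, the file could be produced beforehand with
system("wget -q -O page.html \"http://finance.yahoo.com/q?s=dow\""); and
then loaded with read_whole_file("page.html"). The file name page.html is
arbitrary.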
 
Wolfgang Draxinger

ddk1965 said:
Hello,

I want to get certain data from a web page.
Example:
http://finance.yahoo.com/q?s=dow
I want to grab the number behind the field P/E -> 12.8 to a
variable in C.

This task involves using the HTTP protocol to fetch the data
and some HTML parser to analyse the code. Unfortunately the page
is not XHTML. If it were XHTML, you could use an XPath expression
to address the element of interest.
Any idea to tackle this problem?
I don't want a commercial package because I need to use it in
windows & Linux.

Use "libcurl" for the HTTP part. If the page is XML-like enough
you can use "expat" for parsing (expat itself is only a stream
parser; for XPath addressing you would need a library that
supports it, such as libxml2).

Neither is specific to the C language, so both are off topic here.


Wolfgang
 
Antoninus Twink

Under linux you can just use system("wget ... ");

Using a library (e.g. libcurl) would be a much better solution!
(3) Forget all the people who tell you that "C is not good for this task".
They are just not very well informed. I have done very similar jobs
in C without any problems!

All things being equal, using a language like Perl in which regular
expressions are first-class objects will /greatly/ simplify the code if
the main purpose of the program is to manipulate text.
 
jacob navia

Antoninus said:
Using a library (e.g. libcurl) would be a much better solution!


All things being equal, using a language like Perl in which regular
expressions are first-class objects will /greatly/ simplify the code if
the main purpose of the program is to manipulate text.

lcc-win features full Perl regular expressions. If you do not
use lcc-win you can always use the library and link it with
your code, if you need full Perl regexps.

For this specific case, I would just

strstr(PageText,tableMarker);

then do a little verification to see that this is the right table, like
looking at the title.

When the table is found you can strstr with the line separator 5 times
to get into the right line, then strstr the column separator to get to
the right column...

And there you are. You read the number from the text.

All in a few ms.
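
As a sketch of those steps (not the code actually used for the Mars
pages), the following assumes the row and column separators are the
literal strings "</tr>" and "</td>" in lower case, that the marker text
appears before the table's first "<tr", and that the wanted cell starts
with the number. The names skip_n() and table_number() are invented for
the example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer just past the n-th occurrence of `sep` in `s`,
 * or NULL if there are fewer than n occurrences. */
static const char *skip_n(const char *s, const char *sep, int n)
{
    while (s != NULL && n-- > 0) {
        s = strstr(s, sep);
        if (s != NULL)
            s += strlen(sep);
    }
    return s;
}

/* Locate the table via `marker`, move to row `row` and column `col`
 * (both 1-based) and parse the number at the start of that cell.
 * Returns 0 and stores the value in *out, or -1 on failure. */
int table_number(const char *page, const char *marker,
                 int row, int col, double *out)
{
    const char *p = strstr(page, marker);
    if (p == NULL)
        return -1;
    p = strstr(p, "<tr");                /* start of the first row */
    p = skip_n(p, "</tr>", row - 1);     /* skip the rows before ours */
    p = skip_n(p, "</td>", col - 1);     /* skip the cells before ours */
    if (p == NULL)
        return -1;

    /* Step over whitespace and any opening tags in front of the text. */
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p != '<')
            break;
        p = strchr(p, '>');
        if (p == NULL)
            return -1;
        p++;
    }

    char *end;
    double v = strtod(p, &end);
    if (end == p)
        return -1;                       /* cell does not start with a number */
    *out = v;
    return 0;
}
```

For the 6th row, second column this would be called as
table_number(PageText, tableMarker, 6, 2, &pe). Note that a row with
fewer cells than expected makes the "</td>" search run on into the next
row, so the verification step mentioned above is still worth doing.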

For instance I developed (in C) a package to scan the pages in

http://marsrovers.jpl.nasa.gov/gallery/all/opportunity.html

to automatically download the latest images from the planet Mars.

I do a very similar scanning, ignoring 99% of the web page text
and using some specific markers to get to the URL of the images.

I assume a fixed format, but the program has been working for 5 years
and it has never failed...
 
Flash Gordon

jacob said:
(1) Get the source of the web page into a text file in your computer.
If you use the lcc-win compiler, there is a library that allows you
to do that with a single function call. Under linux you can just use
system("wget ... ");

wget is not installed on all Linux boxes. It's installed on all the ones
I build because I use it, but I've come across ones without it. There
are other potential problems with using it.

In any case, I would use one library available on both. Personally I use
libcurl, but there are other alternatives.
(2) The data is embedded within a table (called "Dow chemical"). You will
have to parse the HTML table format, which is relatively easy: 6th
row, second column, and you are all set. This means skipping the
first five line separators of the table, then skipping the first
column separator. In principle you can do that with strstr();
there are no technical difficulties in doing what you have in mind.


I would use a library that parses HTML so that the code can be robust. I
doubt that page is specifically intended for automated processing, so it
could easily change.
(3) Forget all the people who tell you that "C is not good for this task".
They are just not very well informed. I have done very similar jobs
in C without any problems!

In my opinion, the best language to use for this task (in this case) is
the one for which you can get the best libraries to do most of the work.
 
Richard Bos

Antoninus Twink said:
Using a library (e.g. libcurl) would be a much better solution!


All things being equal, using a language like Perl in which regular
expressions are first-class objects will /greatly/ simplify the code if
the main purpose of the program is to manipulate text.

Both of those strongly depend on the set of HTML files that is to be
manipulated. When I did something akin to what the OP wants, I used
system("wget...") and C code - and apart from the argument to the system
call, and from the assumption of ASCII, which, after all, is inherent in
HTML, it was perfectly portable ISO C code - but I could do so because I
knew in advance that both the structure of the web site and the text of
the individual web pages were completely regular. For files that simple,
and the purpose to which I put the data, PERL would have been
considerably less legible than C.

Richard
 
