not sure who to ask... sorting data from a webpage...

Eric

Hi there, I'm wondering if anyone might know how I can sort through
data from a web site.

Here's what I mean: I go to a page like this,
http://biz.yahoo.com/research/earncal/20050727.html

and make lists in a text file that look like this,
""""""""
July 27/05
am:
zbra ycc xel wec wlp wlm vcg vitx uco umc tup trps twti tmo mos faf ba
tin tds tem sup su seo fon see std res rcl rol rok resp quot pub px
prai plug plc pas pfsb ptc pnp pfcb oxgn ocas nus nsc nfx mpp mnst mx
mtlk mdp mwv mso mpx mmp lz liz tvl lii kyo komg iris ips ipt intt iff
ifcj ilog ibas.ob holx hit hw hhs gifi gbbk gemp grmn fcl forr fsrv
fmsb fmbi fnf eqr eog dyax dtc dbd do cfr cgx cop cgen cbbo cnh cksw
ctec cbi gib cra csar caj calp cach bc biom brg bhl bms beav bol rate
ava attu arw ant apu ahc amrn agn ati apd amg actu acpw

time not:
wgbc wri wlt vitr upl ttmi toc eml skx rai rjet rgen o rndc pnw ptnr
oste opy omx nwpx nu nem njr nls mnc mips mesa mth lpx kmg kmt hmc hlt
hca gsic sab flyi flml fe xide exac eeft eqix eni ele csx covd.ob cnxt
cpts chrz cl chir cra belfb augt aspm amkr alda agu aby

pm:
zmh xl wxs wits wsh wpi wsii vas vrtx vrlk vtr vvc vari var uhs xprsa
tyl trid twp twi thrx ttek tk te talx smbi sxl stnr stts sfn sspi ssi
sp sfcc sero sanyy rop rsg rrc rhd qdel quik str phm pgi plxs pdg pxlw
pmtr osip open ntri ntct cetl mtsc mrvc motv mrh mcel mcrl wfr mdsi
mck mxo mant mtw lmnx lsi linn.ob ltbg lpnt psco kex jll ipas issx
imgc ingr ifsia idti imdc htrn hlex hrs hgr gmk gva job fbn fbr foxh
fbc chrx fcgi fic eyet esrx esst eres epix ets edap dre hill driv dtpi
ddr dnb cytk cybe cts clb cnqr ctg clrk cogt clf cenx cv ctlm cldn
cald cdn cbt vnt bne bpfh bcgi blkb bjfi bjct belm bgf acls atsi ahl
arrs amcc appb apac anik adpi atgn alex acl arg atac actu (1) akr atx
"""""""""

I do this by hand. As you can see, there are 3 main categories:
1) before market open, 2) time not supplied, and 3) after market close,
plus some specific times of earnings release.

Can anyone tell me how to create these lists without typing them all
out by hand?

thanks for any help
Eric
 
Adrienne

Eric said:
Hi there, I'm wondering if anyone might know how I can sort through
data from a web site.

Here's what I mean: I go to a page like this,
http://biz.yahoo.com/research/earncal/20050727.html

and make lists in a text file that look like this,
""""""""
I do this by hand. As you can see, there are 3 main categories:
1) before market open, 2) time not supplied, and 3) after market close,
plus some specific times of earnings release.

Can anyone tell me how to create these lists without typing them all
out by hand?

thanks for any help

It's a cheat, but it works. Open the page you want in IE, and open
Excel. Copy the information from IE, paste into Excel. Then you can use
Excel to manipulate it and save it as a text file, or save it as a dbf
file, whichever is better for you.

I do not think this will work with any other browser except IE. Of
course, I could be wrong.
 
mbstevens

Eric said:
Hi there, I'm wondering if anyone might know how I can sort through
data from a web site.

Here's what I mean: I go to a page like this,
http://biz.yahoo.com/research/earncal/20050727.html

and make lists in a text file that look like this,
""""""""
July 27/05
am:
zbra ycc xel wec wlp wlm vcg vitx uco umc tup trps twti tmo mos faf ba
tin tds tem sup su seo fon see std res rcl rol rok resp quot pub px
I do this by hand. As you can see, there are 3 main categories:
1) before market open, 2) time not supplied, and 3) after market close,
plus some specific times of earnings release.

Can anyone tell me how to create these lists without typing them all
out by hand?

thanks for any help
Eric

It could be completely automated all the way from the web page to a
formatted file on your local machine.

You could use Perl's LWP::Simple module to get the webpage and put it
into a variable.
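
For anyone who would rather try the Python route, here is a minimal sketch of the same "get the page into a variable" step using only the standard library. The function name fetch_page and the latin-1 fallback are my own choices for illustration, not anything the site requires.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=30):
    """Download a URL and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset the server declares, if any; otherwise fall back
        # to latin-1 (an assumption), which can decode any byte sequence.
        charset = response.headers.get_content_charset() or "latin-1"
    return raw.decode(charset, errors="replace")
```

Calling fetch_page("http://biz.yahoo.com/research/earncal/20050727.html") would then give you the raw HTML as one string, provided that address is still being served.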

Next you could use Perl's HTML::Parser module to extract the plain text
you want from the HTML. You would likely also have to use the split
function and regular expressions as supplements to this.

Perl has sophisticated sorting facilities once you get the information
you want sucked into an array. The array could then be written in
whatever format you want to a file.
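
The grouping and sorting step could be sketched in Python along these lines. It assumes you already have (symbol, time text) pairs, and that the time text reads something like "Before Market Open", "After Market Close", "Time Not Supplied", or a clock time such as "8:30 am ET". Those exact strings are my guess based on Eric's description, and the function names are invented for the example.

```python
import re
from collections import defaultdict


def bucket_for(time_text):
    """Map a release-time string to one of the three list headings."""
    text = time_text.lower()
    if "before" in text:
        return "am"
    if "after" in text:
        return "pm"
    # Assumed format for specific times, e.g. "8:30 am ET".
    match = re.search(r"\d\s*(am|pm)\b", text)
    if match:
        return match.group(1)
    return "time not"


def build_lists(pairs):
    """Group (symbol, time_text) pairs and sort each group's symbols."""
    groups = defaultdict(list)
    for symbol, time_text in pairs:
        groups[bucket_for(time_text)].append(symbol.lower())
    return {name: sorted(symbols) for name, symbols in groups.items()}


def format_lists(date_label, lists):
    """Lay the groups out in the same shape as the hand-typed file."""
    lines = [date_label]
    for name in ("am", "time not", "pm"):
        if name in lists:
            lines.append(name + ":")
            lines.append(" ".join(lists[name]))
            lines.append("")
    return "\n".join(lines)
```

The string from format_lists can be written out with an ordinary open(filename, "w") call. Symbols are sorted alphabetically here; pass reverse=True to sorted if you want the descending order of the hand-made lists.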

There is lots of Perl documentation online, and you can get ActivePerl
for Windows at activestate.com. If you haven't programmed Perl before
there will be a learning period, but it will automate your task
completely. Similar facilities exist for Python, the language Google's
early crawlers were written in.
 
Animesh Kumar

mbstevens said:
There is lots of Perl documentation online, and you can get ActivePerl
for Windows at activestate.com. If you haven't programmed Perl before
there will be a learning period, but it will automate your task
completely. Similar facilities exist for Python, the language Google's
early crawlers were written in.

Among all that documentation, I find the following the most useful and
succinct:

http://www.comp.leeds.ac.uk/Perl/start.html

I tried installing ActiveState Perl but I didn't like it. It takes way
too long to install and doesn't run properly on Win-XP with SP2.
Instead I use Perl inside Cygwin. Soon I will get back to Linux like
the good old days.

Best
A
 
mbstevens

Animesh said:
Among all that documentation, I find the following the most useful and
succinct:

http://www.comp.leeds.ac.uk/Perl/start.html

I tried installing ActiveState Perl but I didn't like it. It takes way
too long to install and doesn't run properly on Win-XP with SP2.
Instead I use Perl inside Cygwin.

Hmm. Haven't tried it on Win since SP2 -- I would be interested in
knowing if anyone else is having trouble running ActiveState Perl on
Win with SP2.

Animesh said:
Soon I will get back to Linux like
the good old days.

An operating system that comes with Perl, Python, and Common Lisp is much
more comfortable than one that comes with proprietary languages, all right.
You can buy a big hard disk for $50 US these days, leave your XP on the
machine, and install 4 or 5 Linux systems on the same machine. Just
study GRUB and LILO.
 
data64

mbstevens said:
You could use Perl's LWP::Simple module to get the webpage and put it
into a variable.

Next you could use Perl's HTML::Parser module to extract the plain text
you want from the HTML. You would likely also have to use the split
function and regular expressions as supplements to this.

Actually, in this case I would suggest Template::Extract rather than
HTML::Parser as a simpler way of extracting the data.
But then with Perl there's usually more than one way of doing it.

data64
 
