not sure who to ask... sorting data from a webpage...

Discussion in 'HTML' started by Eric, Jul 27, 2005.

  1. Eric

    Hi there, I'm wondering if anyone might know how I can sort through
    data from a web site.

    Here's what I mean: I go to a page like this,
    http://biz.yahoo.com/research/earncal/20050727.html

    and make lists in a text file that look like this,
    """"""""
    July 27/05
    am:
    zbra ycc xel wec wlp wlm vcg vitx uco umc tup trps twti tmo mos faf ba
    tin tds tem sup su seo fon see std res rcl rol rok resp quot pub px
    prai plug plc pas pfsb ptc pnp pfcb oxgn ocas nus nsc nfx mpp mnst mx
    mtlk mdp mwv mso mpx mmp lz liz tvl lii kyo komg iris ips ipt intt iff
    ifcj ilog ibas.ob holx hit hw hhs gifi gbbk gemp grmn fcl forr fsrv
    fmsb fmbi fnf eqr eog dyax dtc dbd do cfr cgx cop cgen cbbo cnh cksw
    ctec cbi gib cra csar caj calp cach bc biom brg bhl bms beav bol rate
    ava attu arw ant apu ahc amrn agn ati apd amg actu acpw

    time not:
    wgbc wri wlt vitr upl ttmi toc eml skx rai rjet rgen o rndc pnw ptnr
    oste opy omx nwpx nu nem njr nls mnc mips mesa mth lpx kmg kmt hmc hlt
    hca gsic sab flyi flml fe xide exac eeft eqix eni ele csx covd.ob cnxt
    cpts chrz cl chir cra belfb augt aspm amkr alda agu aby

    pm:
    zmh xl wxs wits wsh wpi wsii vas vrtx vrlk vtr vvc vari var uhs xprsa
    tyl trid twp twi thrx ttek tk te talx smbi sxl stnr stts sfn sspi ssi
    sp sfcc sero sanyy rop rsg rrc rhd qdel quik str phm pgi plxs pdg pxlw
    pmtr osip open ntri ntct cetl mtsc mrvc motv mrh mcel mcrl wfr mdsi
    mck mxo mant mtw lmnx lsi linn.ob ltbg lpnt psco kex jll ipas issx
    imgc ingr ifsia idti imdc htrn hlex hrs hgr gmk gva job fbn fbr foxh
    fbc chrx fcgi fic eyet esrx esst eres epix ets edap dre hill driv dtpi
    ddr dnb cytk cybe cts clb cnqr ctg clrk cogt clf cenx cv ctlm cldn
    cald cdn cbt vnt bne bpfh bcgi blkb bjfi bjct belm bgf acls atsi ahl
    arrs amcc appb apac anik adpi atgn alex acl arg atac actu (1) akr atx
    """""""""

    I do this by hand. As you can see, there are 3 main categories:
    1) before market open, 2) time not supplied, and 3) after market close,
    plus some specific times of earnings release.

    Can anyone tell me how to create these lists without typing them all
    out by hand?

    thanks for any help
    Eric
    Eric, Jul 27, 2005

  2. Adrienne

    Gazing into my crystal ball I observed Eric writing:

    > Hi there, I'm wondering if anyone might know how I can sort through
    > data from a web site.
    >
    > Here's what I mean: I go to a page like this,
    > http://biz.yahoo.com/research/earncal/20050727.html
    >
    > and make lists in a text file that look like this,
    > """"""""

    <snip list>

    > I do this by hand. As you can see, there are 3 main categories:
    > 1) before market open, 2) time not supplied, and 3) after market close,
    > plus some specific times of earnings release.
    >
    > Can anyone tell me how to create these lists without typing them all
    > out by hand?
    >
    > thanks for any help


    It's a cheat, but it works. Open the page you want in IE, and open
    Excel. Copy the information from IE and paste it into Excel. Then you
    can use Excel to manipulate it and save it as a text file or as a dbf
    file, whichever is better for you.

    I do not think this will work with any browser other than IE. Of
    course, I could be wrong.
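Once the sheet is saved from Excel as tab-delimited text, the grouping Eric does by hand can be scripted. A minimal Python sketch, assuming a header row with columns named "Symbol" and "Time" (those names are hypothetical; use whatever headings the pasted table actually has):

```python
import csv
import io
from collections import defaultdict


def group_tickers(tsv_text):
    """Group ticker symbols by their earnings-release time.

    Assumes (hypothetically) a tab-delimited export from Excel whose
    header row names a "Symbol" column and a "Time" column.
    """
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        symbol = (row.get("Symbol") or "").strip().lower()
        when = (row.get("Time") or "").strip()
        if symbol:  # skip rows that have no symbol
            groups[when].append(symbol)
    return dict(groups)
```

Reading the saved file and passing its contents to `group_tickers` returns a dict keyed by the time label, with lowercase symbols in file order.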

    --
    Adrienne Boswell (Opera lover)
    http://www.cavalcade-of-coding.info
    Please respond to the group so others can share
    Adrienne, Jul 27, 2005

  3. mbstevens

    Eric wrote:
    > Hi there, I'm wondering if anyone might know how I can sort through
    > data from a web site.
    >
    > Here's what I mean: I go to a page like this,
    > http://biz.yahoo.com/research/earncal/20050727.html
    >
    > and make lists in a text file that look like this,
    > """"""""
    > July 27/05
    > am:
    > zbra ycc xel wec wlp wlm vcg vitx uco umc tup trps twti tmo mos faf ba
    > tin tds tem sup su seo fon see std res rcl rol rok resp quot pub px


    > I do this by hand. As you can see, there are 3 main categories:
    > 1) before market open, 2) time not supplied, and 3) after market close,
    > plus some specific times of earnings release.
    >
    > Can anyone tell me how to create these lists without typing them all
    > out by hand?
    >
    > thanks for any help
    > Eric


    It could be completely automated all the way from the web page to a
    formatted file on your local machine.

    You could use Perl's LWP::Simple module to get the webpage and put it
    into a variable.
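For comparison, a rough Python equivalent of fetching a page into a variable uses the standard library's `urllib.request.urlopen`, which plays roughly the role of LWP::Simple's `get`. The function name `fetch_page` is only for illustration, and falling back to UTF-8 when the server declares no charset is an assumption:

```python
from urllib.request import urlopen


def fetch_page(url, timeout=30):
    """Download a URL and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares; UTF-8 is an assumed fallback.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling it with the calendar URL from the first post would return that page's HTML as one string, provided the page is still being served.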

    Next you could use Perl's HTML::Parser module to extract the plain text
    you want from the HTML. You would likely also have to use the split
    function and regular expressions as supplements to this.
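A minimal sketch of the parsing step in Python, using the built-in `html.parser.HTMLParser` rather than Perl's HTML::Parser. It assumes, without having checked the real page, that the calendar is an HTML table with one company per `<tr>`. It collects the text of each cell row by row and tolerates omitted `</td>` and `</tr>` end tags. Nested tables are not handled:

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows: lists of cell strings
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            # collapse runs of whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:  # keep only rows that had at least one cell
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()  # a new <tr> implicitly closes the previous one
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()  # a new cell implicitly closes the previous one
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # keep a final row even if its end tags are missing


def extract_rows(html):
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```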

    Perl has sophisticated sorting facilities once you get the information
    you want sucked into an array. The array could then be written in
    whatever format you want to a file.
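The sort-and-write step might look like this in Python. The mapping from the page's time labels ("Before Market Open" and so on) to the headings in Eric's text file is hypothetical, as are the function names. The input is assumed to be (symbol, time) pairs, such as the rows pulled out of the table:

```python
import textwrap
from collections import defaultdict

# Hypothetical labels: how the calendar page might word each release time,
# mapped to the headings used in the hand-made text file.
HEADINGS = {
    "before market open": "am",
    "time not supplied": "time not",
    "after market close": "pm",
}
STANDARD_ORDER = ["am", "time not", "pm"]


def format_calendar(date_label, rows, width=72):
    """Build the text-file layout from (symbol, time) pairs.

    Unrecognized time strings (e.g. a specific clock time) become their
    own headings, listed after the three standard ones.
    """
    groups = defaultdict(set)  # a set drops duplicate symbols
    for symbol, when in rows:
        when = when.strip()
        heading = HEADINGS.get(when.lower(), when)
        groups[heading].add(symbol.strip().lower())

    extra = sorted(h for h in groups if h not in STANDARD_ORDER)
    lines = [date_label]
    for heading in STANDARD_ORDER + extra:
        if heading not in groups:
            continue
        lines.append(heading + ":")
        # sort alphabetically, then wrap into lines of at most `width` chars
        lines.extend(textwrap.wrap(" ".join(sorted(groups[heading])), width))
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def save_calendar(path, text):
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

Alphabetical order is just one choice here. To sort by some other key, change the `sorted` call inside the loop.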

    There is a lot of Perl documentation online, and you can get ActivePerl
    for Windows at activestate.com. If you haven't programmed Perl before
    there will be a learning period, but it will automate your task
    completely. Similar facilities exist for Python, the language the
    Google search engine was written in.
    --
    mbstevens
    http://www.mbstevens.com/
    mbstevens, Jul 27, 2005
  4. Animesh Kumar

    mbstevens wrote:

    > Eric wrote:
    >

    <snip>

    >
    > There is a lot of Perl documentation online, and you can get ActivePerl
    > for Windows at activestate.com. If you haven't programmed Perl before
    > there will be a learning period, but it will automate your task
    > completely. Similar facilities exist for Python, the language the
    > Google search engine was written in.


    Of all that documentation, I find the following the most useful and
    succinct:

    http://www.comp.leeds.ac.uk/Perl/start.html

    I tried installing ActiveState Perl, but I didn't like it. It takes way
    too long to install and doesn't run properly on Windows XP with SP2.
    Instead I use Perl inside Cygwin. Soon I will go back to Linux, like
    the good old days.

    Best
    A
    Animesh Kumar, Jul 27, 2005
  5. mbstevens

    Animesh Kumar wrote:
    > mbstevens wrote:
    >
    >> Eric wrote:
    >>

    > <snip>
    >
    >>
    >> There is a lot of Perl documentation online, and you can get ActivePerl
    >> for Windows at activestate.com. If you haven't programmed Perl before
    >> there will be a learning period, but it will automate your task
    >> completely. Similar facilities exist for Python, the language the
    >> Google search engine was written in.

    >
    >
    > Of all that documentation, I find the following the most useful and
    > succinct:
    >
    > http://www.comp.leeds.ac.uk/Perl/start.html
    >
    > I tried installing ActiveState Perl, but I didn't like it. It takes way
    > too long to install and doesn't run properly on Windows XP with SP2.
    > Instead I use Perl inside Cygwin.


    Hmm. I haven't tried it on Windows since SP2. I would be interested in
    knowing whether anyone else is having trouble running ActiveState Perl
    on Windows with SP2.

    > Soon I will go back to Linux, like
    > the good old days.


    An operating system that comes with Perl, Python, and Common Lisp is
    much more comfortable than one that comes with proprietary languages,
    all right. You can buy a big hard disk for $50 US these days, leave XP
    on the machine, and install 4 or 5 Linux systems alongside it. Just
    study GRUB and LILO.
    mbstevens, Jul 28, 2005
  6. data64

    >
    > You could use Perl's LWP::Simple module to get the webpage and put it
    > into a variable.
    >
    > Next you could use Perl's HTML::Parser module to extract the plain text
    > you want from the HTML. You would likely also have to use the split
    > function and regular expressions as supplements to this.
    >


    Actually, in this case I would suggest Template::Extract rather than
    HTML::Parser as a simpler way of extracting data.
    But then with Perl there's usually more than one way of doing it.
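Template::Extract itself is a Perl module. To illustrate the underlying idea (write a template that looks like the HTML, mark the parts you want, and let the tool fill them in), here is a small Python sketch that turns `[% name %]` placeholders into named regular-expression groups. It is not Template::Extract's API. It handles only flat, repeated records, not the module's loops or nested data structures, and a template should end with literal text so the last placeholder has something to stop at:

```python
import re


def template_to_regex(template):
    """Turn a template with [% name %] placeholders into a compiled regex.

    Literal text must match exactly, except that any run of whitespace
    in the template may match any amount of whitespace (including none).
    Each placeholder becomes a non-greedy named group.
    """
    parts = re.split(r"\[%\s*(\w+)\s*%\]", template)
    pattern = []
    for i, part in enumerate(parts):
        if i % 2:  # odd indexes hold the captured placeholder names
            pattern.append("(?P<%s>.*?)" % part)
        else:
            pattern.append(r"\s*".join(re.escape(chunk) for chunk in part.split()))
    return re.compile("".join(pattern), re.S)  # re.S: values may span lines


def extract_all(template, html):
    """Return one dict per place the template matches in the HTML."""
    regex = template_to_regex(template)
    return [{k: v.strip() for k, v in m.groupdict().items()}
            for m in regex.finditer(html)]
```

For example, the template `<tr><td>[% symbol %]</td><td>[% time %]</td></tr>` would pull a symbol and a time out of every matching table row.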

    data64
    data64, Jul 29, 2005
