JavaScript web scraping test cases?

Discussion in 'Python' started by John J. Lee, Aug 20, 2003.

  1. John J. Lee

    John J. Lee Guest

    I've put together a Python package for scraping / testing pages that
    depend on embedded JavaScript code (without depending on IE, Mozilla
    or Konqueror, and with the DOM etc. all implemented in pure Python --
    mostly a hacked 4DOM, with some bits from pxdom; the JavaScript
    interpreter I'm using ATM is spidermonkey). It's still missing a lot
    and is pre-alpha, but it works, just barely.

    Anyway, the point of this post is that I'm looking for pages to test
    it on, so if you have a page that you'd like scraped (one that uses
    JavaScript in some non-trivial way, of course! -- for dynamically
    modifying forms, setting cookies, or whatever), mail me the details:
    better that than some randomly-selected site from the Internet.
    Obviously, it should be something that doesn't violate any terms &
    conditions of use or otherwise cause people trouble, and preferably
    that doesn't require any signup.


    [In fact, TBH, my completely ad-hoc methodology with this is to write
    some web scraping code, discover that the JavaScript breaks things,
    often by depending on some nonstandard DOM feature, hack the DOM a
    bit, etc. Hopefully I'll reach a point in understanding where I can
    rewrite the DOM from scratch ('scratch' here being 4DOM), properly, to
    match some approximation of 'HTML DOM as deployed'...]


    John
     
    John J. Lee, Aug 20, 2003
    #1

  2. John J. Lee

    John J. Lee Guest

    (John J. Lee) writes:
    [...]
    > Anyway, the point of this post is that I'm looking for pages to test
    > it on, so if you have a page that you'd like scraped (one that uses
    > JavaScript in some non-trivial way, of course! -- for dynamically
    > modifying forms, setting cookies, or whatever), mail me the details:
    > better that than some randomly-selected site from the Internet.
    > Obviously, it should be something that doesn't violate any terms &
    > conditions of use or otherwise cause people trouble, and preferably
    > that doesn't require any signup.

    [...]

    Nobody?

    I'll get my coat. ;-)


    John
     
    John J. Lee, Aug 22, 2003
    #2


  3. Skip Montanaro

    >> Anyway, the point of this post is that I'm looking for pages to test
    >> it on, so if you have a page that you'd like scraped (one that uses
    >> JavaScript in some non-trivial way, of course! ...


    John> Nobody?

    Sorry, I couldn't think of anything off the top of my head. In my own pages
    I've only ever used JS in trivial ways. Aside from a calendar on the Mojam
    search results pages, I don't think JS is used on our sites at all. Still,
    you're welcome to try it out on something like

    http://www.mojam.com/concerts/search?key=performer&value=greg brown

    Skip
     
    Skip Montanaro, Aug 23, 2003
    #3
  4. John J. Lee

    John J Lee Guest

    On Fri, 22 Aug 2003, Skip Montanaro wrote:

    > >> Anyway, the point of this post is that I'm looking for pages to test
    > >> it on, so if you have a page that you'd like scraped (one that uses
    > >> JavaScript in some non-trivial way, of course! ...
    >
    > John> Nobody?
    >
    > Sorry, I couldn't think of anything off the top of my head. In my own pages

    [...]

    Oh, I'm sure I'll have no trouble finding test cases -- I just thought
    that, rather than some random sites that are of no use to anyone, there is
    bound to be somebody out there who actually wanted to scrape a particular
    page in the past, and had not bothered previously thanks to the
    inconvenience of having to read & reproduce the effect of the JS code
    (particularly code that messes about with forms). It would be nice to be
    doing something useful at the same time as writing tests!

    Of course, I already have those sites that gave rise to the 'itch' to do
    this in the first place, but I'm sure there's lots of the browser object
    model that they don't exercise...


    John
     
    John J Lee, Aug 23, 2003
    #4
  5. Cousin Stanley, Aug 25, 2003
    #5
  6. John J. Lee

    John J. Lee Guest

    "Cousin Stanley" writes:

    > I'm not sure what types of applications
    > you're looking for,


    The kind that people actually want to use <wink>.

    As I said, there's no problem finding test cases, I just thought that
    while I was about this, somebody might happen to be reading who was
    actually trying to scrape a JS page.


    > but I have some JavaScript plots
    > that might be interesting to test ...
    >
    > http://fastq.com/~sckitching/JS/Circle_MH.htm

    [...]

    Konqueror 3.1 didn't show anything, Mozilla 1.4 printed some pretty
    circles, then froze!


    John
     
    John J. Lee, Aug 25, 2003
    #6
  7. Cousin Stanley

    John ...

    Although it's been a while since I tested these scripts
    I thought I remembered testing successfully in both
    Mozilla 0.95 and IE 5.1 at the time ...

    I tested this morning using Moz 1.3.1 and 2 out of 3 failed,
    but all 3 worked in IE 6 ...

    The JS used in these scripts, although a bit hackish,
    doesn't use any particular IE magic ...

    I zipped up all 3 scripts for convenience,
    if you want to look at the sources ...

    http://fastq.com/~sckitching/JS/JS_Plots.zip

    Differences in JS/DOM implementations from browser to browser
    hurt my head and seem to be an endless source of problems
    for web developers ...

    --
    Cousin Stanley
    Human Being
    Phoenix, Arizona
     
    Cousin Stanley, Aug 25, 2003
    #7
