screen scraping

Discussion in 'ASP General' started by Roland Hall, Mar 26, 2005.

  1. Roland Hall

    Roland Hall Guest

    Am I correct in assuming screen scraping is just the response text sent to
    the browser? If so, would that mean that this could not be screen scraped?

    function moi() {
    var tag = '<a href=';
    var tagType1 = '"mail'+'to:', tagType2 = '">', tagType3 = '<\/a>';
    var user1 = 'web', user2 = 'master', user3 = '@';
    var dom1 = 'danger', dom2 = 'ous', dom3 = 'ly';
    var tld = '.us';
    document.write(tag+tagType1+user1+user2+user3+dom1+dom2+dom3+tld+tagType2+user1+user2+user3+dom1+dom2+dom3+tld+tagType3);
    }

    --
    Roland Hall
    /* This information is distributed in the hope that it will be useful, but
    without any warranty; without even the implied warranty of merchantability
    or fitness for a particular purpose. */
    Technet Script Center - http://www.microsoft.com/technet/scriptcenter/
    WSH 5.6 Documentation - http://msdn.microsoft.com/downloads/list/webdev.asp
    MSDN Library - http://msdn.microsoft.com/library/default.asp
    Roland Hall, Mar 26, 2005
    #1

  2. Roland Hall

    Mark Schupp Guest

    Screen scraping is a technique, not a format. The technique is to intercept
    the raw data (in this case HTML) that would normally be displayed on the
    client system's screen and extract data from it. In an ASP context, screen
    scraping would typically be done by having a server-side component (such as
    XMLHttpRequest) perform a GET or POST to a URL and return the raw HTML as
    text. A parser of some kind is then used to extract the desired information.
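
    As a rough sketch of the parsing half of that technique (the fetch itself
    is left out), something like the following could pull addresses out of
    mailto: links once the raw HTML is in hand. The function name, the sample
    markup, and the use of a regular expression in place of a real HTML parser
    are all assumptions made for illustration.

```javascript
// Hypothetical helper: collect the addresses found in mailto: links.
// A regular expression stands in for the "parser of some kind"; a real
// scraper would more likely use a proper HTML parser.
function extractMailtoAddresses(html) {
  const pattern = /href\s*=\s*["']mailto:([^"'?]+)/gi;
  const found = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    found.push(match[1]); // the part after "mailto:", before any ?query
  }
  return found;
}

// Sample response text, as it might come back from a GET request.
const rawHtml =
  '<p>Contact: <a href="mailto:someone@example.com">someone@example.com</a></p>';

console.log(extractMailtoAddresses(rawHtml));
```

    This only sees what is literally in the response text, which is why markup
    produced later by script does not show up in it.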

    The example you present would be difficult (though not impossible) to
    screen-scrape server-side. The parser would have to be able to evaluate the
    output of the JavaScript function to get the data. I have seen references to
    using the HTML browser component (MSHTML object) to do things like this but
    I don't think it works well server-side.
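
    To illustrate what "evaluate the output of the JavaScript function" could
    look like without a browser component, here is a minimal sketch that runs
    the posted function against a stub document and records what it writes.
    It assumes Node.js and its built-in vm module rather than ASP or MSHTML,
    and the helper name captureDocumentWrite is made up for this example.

```javascript
// Sketch only: assumes Node.js. The vm module is not a security boundary,
// so untrusted scripts should not be run this way in practice.
const vm = require('vm');

// The function from the first post plus a call to it, held as plain text
// the way a scraper would find it inside the fetched page.
const scriptSource = String.raw`
function moi() {
  var tag = '<a href=';
  var tagType1 = '"mail'+'to:', tagType2 = '">', tagType3 = '<\/a>';
  var user1 = 'web', user2 = 'master', user3 = '@';
  var dom1 = 'danger', dom2 = 'ous', dom3 = 'ly';
  var tld = '.us';
  document.write(tag+tagType1+user1+user2+user3+dom1+dom2+dom3+tld+tagType2+user1+user2+user3+dom1+dom2+dom3+tld+tagType3);
}
moi();
`;

// Hypothetical helper: run page script with a fake "document" whose
// write() just records its argument, then return everything written.
function captureDocumentWrite(source) {
  const written = [];
  const sandbox = { document: { write: (text) => written.push(String(text)) } };
  vm.runInNewContext(source, sandbox, { timeout: 1000 });
  return written.join('');
}

const rendered = captureDocumentWrite(scriptSource);
console.log(rendered);
// prints: <a href="mailto:webmaster@dangerously.us">webmaster@dangerously.us</a>
```

    The stub covers only document.write; a page that touches other browser
    objects would need more of them faked, which is where the real effort lies.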

    --
    Mark Schupp
    Head of Development
    Integrity eLearning
    www.ielearning.com


    "Roland Hall" <nobody@nowhere> wrote in message
    news:...
    > Am I correct in assuming screen scraping is just the response text sent to
    > the browser? If so, would that mean that this could not be screen scraped?
    >
    > function moi() {
    > var tag = '<a href=';
    > var tagType1 = '"mail'+'to:', tagType2 = '">', tagType3 = '<\/a>';
    > var user1 = 'web', user2 = 'master', user3 = '@';
    > var dom1 = 'danger', dom2 = 'ous', dom3 = 'ly';
    > var tld = '.us';
    > document.write(tag+tagType1+user1+user2+user3+dom1+dom2+dom3+tld+tagType2+user1+user2+user3+dom1+dom2+dom3+tld+tagType3);
    > }
    Mark Schupp, Mar 28, 2005
    #2

  3. Roland Hall

    Guest

    Roland Hall wrote:
    > Am I correct in assuming screen scraping is just the response text sent to
    > the browser? If so, would that mean that this could not be screen scraped?
    >
    > function moi() {
    > var tag = '<a href=';
    > var tagType1 = '"mail'+'to:', tagType2 = '">', tagType3 = '<\/a>';
    > var user1 = 'web', user2 = 'master', user3 = '@';
    > var dom1 = 'danger', dom2 = 'ous', dom3 = 'ly';
    > var tld = '.us';
    > document.write(tag+tagType1+user1+user2+user3+dom1+dom2+dom3+tld+tagType2+user1+user2+user3+dom1+dom2+dom3+tld+tagType3);
    > }


    Anything can be scraped. If you want to hide an email address, put a
    form up and send the email server side so that the email address can
    never be retrieved over HTML.
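
    A minimal sketch of that idea, assuming a JavaScript server-side handler
    rather than classic ASP: the page carries only a form, and the recipient
    is filled in on the server. The address, the /contact URL, the field names
    and the function name are all placeholders, and the actual hand-off to a
    mail component is left out.

```javascript
// Placeholder address: it lives only in server-side code.
const RECIPIENT = 'owner@example.com';

// What the browser (and any scraper) receives: a form, no address.
const contactFormHtml = [
  '<form method="post" action="/contact">',
  '  <input name="email">',
  '  <input name="subject">',
  '  <textarea name="message"></textarea>',
  '  <button type="submit">Send</button>',
  '</form>',
].join('\n');

// Hypothetical helper: build the outgoing message from posted form fields.
// Line breaks are stripped from header values so a visitor cannot smuggle
// extra mail headers into them.
function buildContactMessage(fields) {
  const clean = (value) => String(value || '').replace(/[\r\n]+/g, ' ').trim();
  return {
    to: RECIPIENT,
    replyTo: clean(fields.email),
    subject: 'Contact form: ' + clean(fields.subject),
    body: String(fields.message || ''),
  };
}

const message = buildContactMessage({
  email: 'visitor@example.org',
  subject: 'Hello',
  message: 'Please get in touch.',
});
console.log(message.to, '|', message.subject);
```

    Whatever sends the mail would take the returned object from here; nothing
    in the markup sent to the client ever mentions the recipient.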
    , Mar 29, 2005
    #3
  4. Roland Hall

    Roland Hall Guest

    <> wrote in message
    news:...
    :
    : Anything can be scraped. If you want to hide an email address, put a
    : form up and send the email server side so that the email address can
    : never be retrieved over HTML.

    Hi Larry...

    Thanks for responding...

    I understand a form is best, but I was looking for a way to defeat the
    JavaScript. Surely a spammer is not going to capture all scripts and
    process them in hopes of finding a single email address. The goal of a
    spammer is to be lazy and get as much as possible with as little effort as
    possible. There is no benefit to processing every script they spider with
    no guarantee of finding an email address encoded in it somewhere. I see
    the benefit of finding one in plain sight since 99.99% of them will be that
    way.

    I also shouldn't have said "screen" scraped as it's not really the screen
    memory that's being queried but rather the response text. JavaScript
    doesn't show the results, except to the browser. I have not seen a way to
    grab those results although I can think of some possibilities which appear
    to be a lot of effort. I just don't see the ROI but would welcome any info
    on how it is accomplished.

    Roland Hall, Mar 29, 2005
    #4
  5. Roland Hall

    Roland Hall Guest

    "Mark Schupp" wrote in message news:...
    : Screen scraping is a technique, not a format.

    Hi Mark...

    Thanks for responding. I didn't realize I said it was a format and I should
    have said HTML scraping since it's not really screen scraping like it would
    be on a terminal.

    : The technique is to intercept
    : the raw data (in this case HTML)that would normally be displayed on the
    : client system screen and extract data from it. In ASP context screen
    : scraping would typically be done by having a server-side component (such as
    : xmlhttprequest) perform a get or post to a url and return the raw HTML as
    : text. Then a parser of some kind is used to extract the desired information.

    Yes, I'm familiar with that process.

    : The example you present would be difficult (though not impossible) to
    : screen-scrape server-side. The parser would have to be able to evaluate the
    : output of the JavaScript function to get the data. I have seen references to
    : using the HTML browser component (MSHTML object) to do things like this but
    : I don't think it works well server-side.

    I have not been able to do it either. I think it may require HTML scraping
    the site and then "screen" scraping my page, implying printing it to a text
    file and then reloading and parsing that or capturing it from my screen
    memory, the former being the easier of the two. This would require the
    result look like instead of user at domain dot com. I think
    I'll test the former since so many suggest using encoded JavaScript to hide
    from spammers.

    Roland Hall, Mar 29, 2005
    #5