download blocking

Discussion in 'HTML' started by Helmut Blass, Apr 23, 2005.

  1. Helmut Blass

    Helmut Blass Guest

    hi,
    I have written a VB program which automatically downloads web pages that
    are linked to RSS feeds. Unfortunately, there are some sites which cannot be
    downloaded by the program but only viewed online.
    I guess there must be some HTML or JavaScript trick which blocks the
    download process.
    Does anybody know how this dirty trick works?

    Thanks for your help, Helmut

    --
    The state is the great fiction by which everyone endeavours to live at the
    expense of everyone else.

    Frédéric Bastiat
    Helmut Blass, Apr 23, 2005
    #1

  2. lostinspace

    lostinspace Guest

    ----- Original Message -----
    From: "Helmut Blass" <>
    Newsgroups: alt.html
    Sent: Saturday, April 23, 2005 3:57 AM
    Subject: download blocking


    >hi,
    >I have written a VB program which automatically downloads web pages that
    >are linked to RSS feeds. Unfortunately, there are some sites which cannot be
    >downloaded by the program but only viewed online.
    >I guess there must be some HTML or JavaScript trick which blocks the
    >download process.
    >Does anybody know how this dirty trick works?


    >Thanks for your help, Helmut



    Please help me understand this?
    You created software which crawls and scrapes websites, thereby needlessly
    using websites' bandwidth for your own purposes?

    Perhaps even violating UAGs and TOSes.

    Then you want other webmasters to advise you on how to circumvent (hack)
    prevention tactics?

    PISS-OFF!
    lostinspace, Apr 23, 2005
    #2

  3. Travis Newbury

    Travis Newbury Guest

    lostinspace wrote:

    >>I have written a VB program which automatically downloads web pages that
    >>are linked to RSS feeds. Unfortunately, there are some sites which cannot be
    >>downloaded by the program but only viewed online.
    >>I guess there must be some HTML or JavaScript trick which blocks the
    >>download process.
    >>Does anybody know how this dirty trick works?


    No dirty tricks, just some bad VB code on your part. If you can see it
    in a browser, you can grab it with VB and the Inet control, and save it
    to a file.
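
    (For the curious, a minimal sketch of that approach, assuming a VB6 form
    carrying a Microsoft Internet Transfer Control named Inet1; the URL and
    file path below are placeholders:)

        ' Fetch a page as a string with the Inet control and save it to disk.
        Dim html As String
        html = Inet1.OpenURL("http://example.com/page.html", icString)

        Dim f As Integer
        f = FreeFile
        Open "C:\page.html" For Output As #f
        Print #f, html
        Close #f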

    > Please help me understand this?
    > You created software which crawls and scrapes websites, thereby needlessly
    > using websites' bandwidth for your own purposes?


    Or more innocently, they want to read it offline later.

    > PISS-OFF!


    Better to be pissed off, than pissed on....

    --
    -=tn=-
    Travis Newbury, Apr 23, 2005
    #3
  4. Helmut Blass

    Helmut Blass Guest

    "lostinspace" <> wrote:

    >Please help me understand this?
    >You created software which crawls and scrapes websites, thereby needlessly
    >using websites' bandwidth for your own purposes?


    Every web surfer uses bandwidth for his own purposes. My program just does
    automatically what you are doing manually. Is there much difference?

    Helmut

    --
    The state is the great fiction by which everyone endeavours to live at the
    expense of everyone else.

    Frédéric Bastiat
    Helmut Blass, Apr 23, 2005
    #4
  5. Helmut Blass

    Helmut Blass Guest

    In article <3mqae.5338$>, Travis Newbury <> wrote:

    >No dirty tricks, just some bad VB code on your part. If you can see it
    >in a browser, you can grab it with VB and the Inet control, and save it to a file.


    In most cases it works. However, in a few cases I can't grab it with VB and
    the Inet control. So there must be some tricky mechanism...

    Helmut
    Helmut Blass, Apr 23, 2005
    #5
  6. lostinspace

    lostinspace Guest

    ----- Original Message -----
    From: "Helmut Blass" <>
    Newsgroups: alt.html
    Sent: Saturday, April 23, 2005 8:25 AM
    Subject: Re: download blocking


    "lostinspace" <> wrote:

    >>Please help me understand this?
    >>You created software which crawls and scrapes websites, thereby needlessly
    >>using websites' bandwidth for your own purposes?


    >Every web surfer uses bandwidth for his own purposes. My program just does
    >automatically what you are doing manually. Is there much difference?


    >Helmut


    Most assuredly there is a difference, and if you are incapable of realizing
    the difference, you're no different than a thief in the night!

    The majority of websites were neither created nor intended with this type of
    delivery and presentation in mind.
    That's why, before scraping/downloading, you might try reading the website's
    UAG/TOS, and your own internet provider's as well.
    lostinspace, Apr 23, 2005
    #6
  7. lostinspace

    lostinspace Guest

    ----- Original Message -----
    From: "Travis Newbury" <>
    Newsgroups: alt.html
    Sent: Saturday, April 23, 2005 7:31 AM
    Subject: Re: download blocking


    "> Or more innocently, they want to read it off line later."

    That's a violation of my site's TOS, and it will get you (as well as
    innocents in the same IP range at your provider) denied access in the future.
    lostinspace, Apr 23, 2005
    #7
  8. Oli Filth

    Oli Filth Guest

    Helmut Blass wrote:
    > hi,
    > I have written a VB program which automatically downloads web pages that
    > are linked to RSS feeds. Unfortunately, there are some sites which cannot be
    > downloaded by the program but only viewed online.
    > I guess there must be some HTML or JavaScript trick which blocks the
    > download process.
    > Does anybody know how this dirty trick works?
    >


    What are you sending as your User-Agent HTTP header? If you "fake" this
    by setting it to that of a standard browser, it might help, as the site's
    server will just assume you're a browser.

    (P.S. This is a complete guess, but give it a go :) )
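
    (A rough sketch of that idea in VB, using the MSXML2.ServerXMLHTTP object,
    which, unlike the plain client-side XMLHTTP object, lets you override the
    User-Agent header; the URL and UA string here are just examples:)

        ' Fetch a page while presenting the program as IE 6.
        Dim http As Object, html As String
        Set http = CreateObject("MSXML2.ServerXMLHTTP")
        http.Open "GET", "http://example.com/feed.xml", False
        http.setRequestHeader "User-Agent", _
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
        http.send
        If http.Status = 200 Then html = http.responseText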


    --
    Oli
    Oli Filth, Apr 23, 2005
    #8
  9. Andy Dingley

    Andy Dingley Guest

    On Sat, 23 Apr 2005 07:57:55 GMT, (Helmut Blass)
    wrote:
    >I have written a VB program which automatically downloads web pages that
    >are linked to RSS feeds. Unfortunately, there are some sites which cannot be
    >downloaded by the program but only viewed online.


    We can guess, but if you tell us the URLs then we can look at the actual
    examples. Also tell us why you can't download them - do you get
    anything, the wrong thing, or just a 404?

    My two guesses:

    First, it's related to the HTTP user-agent string that you're sending. The
    site only accepts browsers that it recognises. This is stupid behaviour on
    the site's part, so stupid that I don't think it is likely. You should be
    able to work around it easily by impersonating IE.

    Secondly (and more likely), you're probably using the MSXML component
    within your VB program. This expects XML, and RSS 0.9* isn't an XML
    protocol. It looks a lot like XML, but most feeds are either not valid
    RSS, or not even well-formed XML. For a "production grade" RSS reader
    you can't rely on all feeds being well-formed XML, all the time.
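
    (If the second guess is the cause, a minimal defensive sketch is to check
    MSXML's parseError rather than assuming the feed parses; the feed URL below
    is hypothetical:)

        ' Try to parse a feed as XML, but don't assume it's well-formed.
        Dim doc As Object
        Set doc = CreateObject("MSXML2.DOMDocument")
        doc.async = False
        doc.validateOnParse = False

        If doc.Load("http://example.com/feed.rss") Then
            ' Well-formed XML: walk the DOM as usual.
            Debug.Print doc.documentElement.nodeName
        Else
            ' Not well-formed: log the reason, fall back to raw-text handling.
            Debug.Print "Parse error: " & doc.parseError.reason
        End If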


    And I don't know what "lostinspace"'s problem is, but he's a clueless
    muppet if he doesn't realise what RSS is about.
    Andy Dingley, Apr 23, 2005
    #9
  10. lostinspace

    lostinspace Guest

    ----- Original Message -----
    From: "Andy Dingley" <>
    Newsgroups: alt.html
    Sent: Saturday, April 23, 2005 11:49 AM
    Subject: Re: download blocking


    > And I don't know what "lostinspace"'s problem is, but he's a clueless
    > muppet if he doesn't realise what RSS is about.

    http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
    http://blogs.law.harvard.edu/tech/rss#whatIsRss
    http://www.webreference.com/authoring/languages/xml/rss/intro/

    As a webmaster with very unique and copyrighted content (which exists
    NOWHERE else,) I should allow crawling of my sites under the pretense of
    offline use while the material is harvested to either sell to 3rd parties,
    present to third parties outside my websites, or have the material
    interpretated for any other 3rd-party benefit.

    Hogwash.

    If viable orgs desire my content, then let them approach me with
    compensation and/or permission for the sweat of my brow; otherwise let them
    eat 403s.

    My sites are unique in these types of materials, however so are many others.
    Few issues regarding traffic and visitors as related to websites are cut and
    dried, or black and white.
    Each webmaster must make their own decisions on what is beneficial and
    detrimental to their websites and base their websites' actions on what they
    desire.

    One example would be "Helmut," who would never get into my sites from a DE
    IP range or a DE referral search.
    Of course he may fake his IP for limited access. That's not the same as a
    full scrape.
    WHY?
    There is no possible way for a DE visitor or traffic to enhance or benefit
    my websites. They only draw resources and materials, which I have little
    time to spend monitoring for plagiarism.
    lostinspace, Apr 23, 2005
    #10
  11. Travis Newbury

    Travis Newbury Guest

    lostinspace wrote:

    > "> Or more innocently, they want to read it off line later."
    > Violation of my sites TOS and will get you (as well as innocents in the same
    > IP range as your provider) denied access in the future.


    So our money is no good. Great business decision there...


    --
    -=tn=-
    Travis Newbury, Apr 23, 2005
    #11
  12. lostinspace

    lostinspace Guest

    ----- Original Message -----
    From: "Travis Newbury" <>
    Newsgroups: alt.html
    Sent: Saturday, April 23, 2005 12:47 PM
    Subject: Re: download blocking


    > lostinspace wrote:
    >
    >> "> Or more innocently, they want to read it offline later."
    >> That's a violation of my site's TOS, and it will get you (as well as
    >> innocents in the same IP range at your provider) denied access in the
    >> future.

    >
    > So our money is no good. Great business decision there...


    Webmasters only have two options for dealing with violations of UAG/TOS:

    1) litigation
    2) denial of service

    The easiest and quickest solution is denial of service.
    In many instances this is possible based on UA or "referrer"; however, in
    many instances an IP range is necessary.

    Were there established procedures by internet providers for enforcing their
    own UAGs when their customers violate them, then these aforementioned
    limitations would not be necessary.

    Lately many internet providers are breaking up previously large IP ranges
    into smaller, more localized ranges, making the denial of many innocents
    less likely.

    BTW, my content is a specific breed of horses, and your only interest in such
    critters is likely in the feasting of ;-)))))
    lostinspace, Apr 23, 2005
    #12
  13. Travis Newbury

    Travis Newbury Guest

    lostinspace wrote:
    >>And I don't know what "lostinspace"'s problem is, but he's a clueless
    >>muppet if he doesn't realise what RSS is about.


    I think you are right...

    > http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
    > http://blogs.law.harvard.edu/tech/rss#whatIsRss
    > http://www.webreference.com/authoring/languages/xml/rss/intro/


    Ah... Your "proof" that you knows what rss is, are also the first 3
    links when you google "what is rss" Coincidence?

    > As a webmaster with very unique and copyrighted content (which exists
    > NOWHERE else,)


    Can we see this totally unique RSS feed?

    --
    -=tn=-
    Travis Newbury, Apr 23, 2005
    #13
  14. Toby Inkster

    Toby Inkster Guest

    Helmut Blass wrote:

    > In most cases it works. However, in a few cases I can't grab it with VB and
    > the Inet control. So there must be some tricky mechanism...


    Can you give us an example URL for such a page?

    --
    Toby A Inkster BSc (Hons) ARCS
    Contact Me ~ http://tobyinkster.co.uk/contact
    Toby Inkster, Apr 23, 2005
    #14
  15. lostinspace

    lostinspace Guest

    ----- Original Message -----
    From: "Travis Newbury" <>
    Newsgroups: alt.html
    Sent: Saturday, April 23, 2005 12:56 PM
    Subject: Re: download blocking


    > Can we see this totally unique RSS feed?

    Never said my sites were RSS!

    I've provided links to my websites in these forums long ago. The only result
    is visitors who are not interested in my content and instead become pests.
    I still get referrals from the Google archives for threads that are more than
    five years old.

    The original mail in this thread is repeated below.
    Please note the subject line on this thread?
    There is mention of RSS in the inquiry; however, the brunt of the inquiry is
    related to "the how of circumventing blocked downloads." It also suggests
    that webmasters who practice such things are pulling dirty and unscrupulous
    tricks.

    When in fact, it's the downloader who is atrocious.


    ----- Original Message -----
    From: "Helmut Blass" <>
    Newsgroups: alt.html
    Sent: Saturday, April 23, 2005 3:57 AM
    Subject: download blocking


    >hi,
    >I have written a VB program which automatically downloads web pages that
    >are linked to RSS feeds. Unfortunately, there are some sites which cannot be
    >downloaded by the program but only viewed online.
    >I guess there must be some HTML or JavaScript trick which blocks the
    >download process. Does anybody know how this dirty trick works?


    >Thanks for your help, Helmut
    lostinspace, Apr 24, 2005
    #15
  16. Andy Dingley

    Andy Dingley Guest

    On Sat, 23 Apr 2005 16:27:48 GMT, "lostinspace"
    <> wrote:

    >As a webmaster with very unique and copyrighted content (which exists
    >NOWHERE else,) I should allow crawling of my sites


    Yes you should. Or else you should _prevent_ it by technical means.
    Whining about people stealing it when you've got it online and hanging
    out in the breeze is just pathetic.

    >under the pretense of
    >offline use while the material is harvested to either sell to 3rd parties,
    >present to third parties outside my websites


    If I can read it, I can steal it. Get over it.

    Or else get a LiveJournal, the perfect soapbox for teenage angst.

    >or have the material
    >interpretated for any other 3rd-party benefit.


    "Interpretated" ? Are you channelling George Bush ?


    >My sites are unique in these types of materials, however so are many others.


    What part of "unique" is confusing you here ?

    Now I know your patterns for the Ultimate Tinfoil Hat are very important
    to you, but quite honestly the rest of the world doesn't actually _want_
    them. If we really wanted your content, we'd grab a .torrent of it.

    But what does this have to do with RSS anyway ? Is the concept of
    syndication entirely alien to you ? It's about _publishing_, you know,
    that stuff about _distributing_ data that the web was built for doing ?

    There _are_ CUG (closed user group) RSS feeds, but even those ought to reject
    unauthorised access with a reasonable error (and 401 is more appropriate than
    403), not just a technical glitch.
    Andy Dingley, Apr 24, 2005
    #16
  17. Travis Newbury

    Travis Newbury Guest

    lostinspace wrote:
    > The original mail in this thread is repeated below.
    > Please note the subject line on this thread?
    > There is mention of RSS in the inquiry; however, the brunt of the inquiry is
    > related to "the how of circumventing blocked downloads." It also suggests
    > that webmasters who practice such things are pulling dirty and unscrupulous
    > tricks.


    Whatever, the OP's message was obviously about RSS feeds. But it
    doesn't matter anyway; let's all just walk away from this thread as
    friends. I think the OP's issue has been addressed.

    EVERYONE SING....

    we are the world......
    we are the children....

    Hey, I'm not hearing you....

    --
    -=tn=-
    Travis Newbury, Apr 24, 2005
    #17
  18. lostinspace

    lostinspace Guest

    "EVERYONE SING....

    we are the world......
    we are the children....

    Hey, I'm not hearing you...."

    DITTO :)
    lostinspace, Apr 24, 2005
    #18
