download blocking

Helmut Blass

Hi,
I have written a VB program which automatically downloads web pages that
are linked to RSS feeds. Unfortunately there are some sites which cannot be
downloaded by the program, only viewed online.
I guess there must be some HTML or JavaScript trick which blocks the
download process.
Does anybody know how this dirty trick works?

Thanks for your help, Helmut
 
lostinspace

----- Original Message -----
From: "Helmut Blass" <[email protected]>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 3:57 AM
Subject: download blocking

> [Helmut's original post quoted in full; snipped]


Please help me understand this?
You created software which crawls and scrapes websites, thereby needlessly
using the websites' bandwidth for your own purposes?

Perhaps even violating UAGs and TOSes.

Then you desire other webmasters to advise you of how to circumvent (hack)
prevention tactics?

PISS-OFF!
 
Travis Newbury

No dirty tricks, just some bad VB code on your part. If you can see it
in a browser, you can grab it with VB and Inet, and save it to a file.

> Please help me understand this?
> You created software which crawls and scrapes websites, thereby needlessly
> using the websites' bandwidth for your own purposes?

Or more innocently, they want to read it offline later.

> PISS-OFF!

Better to be pissed off than pissed on....
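
To make the "grab it with VB and Inet, and save it to a file" part concrete,
here is a minimal sketch. It assumes a form carrying the Microsoft Internet
Transfer Control (named Inet1 here); the URL and the output path are
placeholders:

    ' Download a page as text and save it to a file
    Dim page As String
    page = Inet1.OpenURL("http://example.com/page.html")

    Dim f As Integer
    f = FreeFile
    Open "C:\page.html" For Output As #f
    Print #f, page
    Close #f

OpenURL blocks until the whole document has arrived, which keeps the example
short; a real crawler would more likely use the control's Execute method and
the StateChanged event.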
 
Helmut Blass

lostinspace said:
> Please help me understand this?
> You created software which crawls and scrapes websites, thereby needlessly
> using the websites' bandwidth for your own purposes?

Every web surfer uses bandwidth for his own purposes. My program just does
automatically what you are doing manually. Is there much difference?

Helmut
 
Helmut Blass

> No dirty tricks, just some bad VB code on your part. If you can see it
> in a browser, you can grab it with VB and Inet, and save it to a file.

In most cases it works. However, in a few cases I can't grab it with VB
and Inet, so there must be some tricky mechanism...

Helmut
 
lostinspace

----- Original Message -----
From: "Helmut Blass" <[email protected]>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 8:25 AM
Subject: Re: download blocking


> Every web surfer uses bandwidth for his own purposes. My program just does
> automatically what you are doing manually. Is there much difference?

Most assuredly there is a difference, and if you're incapable of realizing
the difference, you're no different than a thief in the night!

The majority of websites were neither created nor intended with this type of
delivery and presentation in mind.
That's why, before scraping/downloading, you might try reading the website's
UAG/TOS, and your own internet provider's as well.
 
lostinspace

----- Original Message -----
From: "Travis Newbury" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 7:31 AM
Subject: Re: download blocking

> No dirty tricks, just some bad VB code on your part. If you can see it in
> a browser, you can grab it with VB and Inet, and save it to a file.
>
> Or more innocently, they want to read it offline later.
>
> Better to be pissed off than pissed on....

"> Or more innocently, they want to read it offline later."

Violation of my site's TOS, and it will get you (as well as innocents in the
same IP range as your provider) denied access in the future.
 
Oli Filth

Helmut said:
> I have written a VB program which automatically downloads web pages that
> are linked to RSS feeds. Unfortunately there are some sites which cannot
> be downloaded by the program, only viewed online.
> I guess there must be some HTML or JavaScript trick which blocks the
> download process.
> Does anybody know how this dirty trick works?

What are you sending as your User-Agent HTTP header? If you "fake" this
by setting it to that of a standard browser, it might help, as the
server of the site will just assume you're a browser.

(P.S. This is a complete guess, but give it a go :) )
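
A minimal sketch of that idea, assuming the MSXML2.ServerXMLHTTP component
(the plain XMLHTTP object does not reliably let you override User-Agent);
the UA string shown is just one example of an IE signature:

    ' Request the page while presenting a browser User-Agent
    Dim http As Object
    Set http = CreateObject("MSXML2.ServerXMLHTTP")
    http.Open "GET", "http://example.com/page.html", False
    http.setRequestHeader "User-Agent", _
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    http.Send

    If http.Status = 200 Then
        ' Save http.responseText to a file as before
    End If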
 
Andy Dingley

> I have written a VB program which automatically downloads web pages that
> are linked to RSS feeds. Unfortunately there are some sites which cannot
> be downloaded by the program, only viewed online.

We can guess, but if you tell us the URLs then we can look at the actual
examples. Also tell us why you can't download them - do you get
anything, the wrong thing, or just a 404?

My two guesses:

First, it's related to the HTTP user-agent string that you're sending: the
site only accepts browsers that it recognises. This is stupid behaviour on
the site's part - so stupid that I don't think it's likely. You should be
able to work around it easily by impersonating IE.

Secondly (and more likely), you're probably using the MSXML component
within your VB program. It expects well-formed XML, and RSS 0.9* isn't an
XML protocol. It looks a lot like XML, but most feeds are either not valid
RSS or not even well-formed XML. For a "production grade" RSS reader you
can't rely on all feeds being well-formed XML, all the time.
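
A rough sketch of a defensive fetch, assuming the MSXML2.ServerXMLHTTP and
MSXML2.DOMDocument components are available: pull the feed down as plain
text first, so a parse failure is distinguishable from a blocked download,
and only then attempt a strict XML parse:

    ' Fetch the raw feed text first
    Dim http As Object, doc As Object, raw As String
    Set http = CreateObject("MSXML2.ServerXMLHTTP")
    http.Open "GET", "http://example.com/feed.rss", False
    http.Send
    raw = http.responseText

    ' Then try to parse it strictly
    Set doc = CreateObject("MSXML2.DOMDocument")
    doc.async = False
    If doc.loadXML(raw) Then
        ' Well-formed XML: walk the DOM as usual
    Else
        ' Not well-formed: doc.parseError.reason says why; fall back
        ' to more tolerant string handling here
    End If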


And I don't know what "lostinspace"'s problem is, but he's a clueless
muppet if he doesn't realise what RSS is about.
 
lostinspace

----- Original Message -----
From: "Andy Dingley" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 11:49 AM
Subject: Re: download blocking

> [most of Andy's reply snipped]
>
> And I don't know what "lostinspace"'s problem is, but he's a clueless
> muppet if he doesn't realise what RSS is about.

http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
http://blogs.law.harvard.edu/tech/rss#whatIsRss
http://www.webreference.com/authoring/languages/xml/rss/intro/

As a webmaster with very unique and copyrighted content (which exists
NOWHERE else), I should allow crawling of my sites under the pretense of
offline use while the material is harvested to either sell to 3rd parties,
present to third parties outside my websites, or have the material
interpretated for any other 3rd-party benefit?

Hogwash.

If viable orgs desire my content, then let them approach me with
compensation and/or permission for the sweat of my brow; otherwise let them
eat 403's.

My sites are unique in these types of materials, however so are many others.
Few issues regarding traffic and visitors as related to websites are cut and
dried, or black and white.
Each webmaster must make their own decisions on what is beneficial and
detrimental to their websites and base their websites' actions on what they
desire.

One example would be "Helmut", who would never get into my sites from a DE
IP range or a DE referral search.
Of course he may fake his IP for limited access. That's not the same as a
full scrape.
WHY?
There is no possible way for a DE visitor or DE traffic to enhance or
benefit my websites. They only draw resources and materials, which I have
little time to spend monitoring for plagiarism.
 
Travis Newbury

lostinspace said:
> "> Or more innocently, they want to read it offline later."
> Violation of my site's TOS, and it will get you (as well as innocents in
> the same IP range as your provider) denied access in the future.

So our money is no good. Great business decision there...
 
lostinspace

----- Original Message -----
From: "Travis Newbury" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 12:47 PM
Subject: Re: download blocking

> So our money is no good. Great business decision there...

Webmasters only have two options for dealing with violations of UAG/TOS:

1) litigation
2) denial of service

The easiest and quickest solution is denial of service.
In many instances this is possible based on UA or "referrer"; however, in
many instances an IP range is necessary.
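
For illustration only, a Classic ASP (VBScript) sketch of the UA variant;
the "BadBot" marker is a placeholder, and real deployments more often do
this in the web server's own configuration:

    <%
    ' Deny requests whose User-Agent matches a blocked pattern
    Dim ua
    ua = Request.ServerVariables("HTTP_USER_AGENT")
    If InStr(1, ua, "BadBot", vbTextCompare) > 0 Then
        Response.Status = "403 Forbidden"
        Response.End
    End If
    %>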

Were there established protocol procedures by internet providers for
enforcement of their own UAGs when their customers violate them, then these
aforementioned limitations would not be necessary.

Lately many internet providers are breaking up previously large IP ranges
into smaller, more localized multiple ranges, making the denial of many
innocents less likely.

BTW, my content is a specific breed of horses, and your only interest in
such critters is likely in the feasting of ;-)))))
 
Travis Newbury

I think you are right...

Ah... Your "proof" that you know what RSS is is also the first three links
when you Google "what is rss". Coincidence?

Can we see this totally unique RSS feed?

Toby Inkster

Helmut said:
> In most cases it works. However, in a few cases I can't grab it with VB
> and Inet, so there must be some tricky mechanism...

Can you give us an example URL for such a page?
 
lostinspace

----- Original Message -----
From: "Travis Newbury" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 12:56 PM
Subject: Re: download blocking

> I think you are right...

> Ah... Your "proof" that you know what RSS is is also the first three links
> when you Google "what is rss". Coincidence?

> Can we see this totally unique RSS feed?

Never said my sites were RSS!

I've provided links to my websites in these forums long ago. The only
result is visitors who are not interested in my content but rather become
pests. I still get referrals from the Google archives for threads that are
more than five years old.

The original mail in this thread is repeated below.
Please note the subject line on this thread.
There is mention of RSS in the inquiry, but the brunt of the inquiry is
related to the how of circumventing blocked downloads; it also suggests
that webmasters who practice such things are practicing dirty and
unscrupulous tricks.

When in fact, it's the downloader who is atrocious.

----- Original Message -----
From: "Helmut Blass" <[email protected]>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 3:57 AM
Subject: download blocking

[original post snipped]
 
Andy Dingley

> As a webmaster with very unique and copyrighted content (which exists
> NOWHERE else), I should allow crawling of my sites

Yes, you should. Or else you should _prevent_ it by technical means.
Whining about people stealing it when you've got it online and hanging
out in the breeze is just pathetic.

> under the pretense of offline use while the material is harvested to
> either sell to 3rd parties, present to third parties outside my websites

If I can read it, I can steal it. Get over it.

Or else get a LiveJournal, the perfect soapbox for teenage angst.

> or have the material interpretated for any other 3rd-party benefit

"Interpretated"? Are you channelling George Bush?

> My sites are unique in these types of materials, however so are many
> others.

What part of "unique" is confusing you here?

Now I know your patterns for the Ultimate Tinfoil Hat are very important
to you, but quite honestly the rest of the world doesn't actually _want_
them. If we really wanted your content, we'd grab a .torrent of it.

But what does this have to do with RSS anyway? Is the concept of
syndication entirely alien to you? It's about _publishing_, you know,
that stuff about _distributing_ data that the web was built for doing.

There _are_ CUG RSS feeds, but even those ought to reject unauthorised
access with a reasonable error (and 401 is more appropriate than 403),
not just a technical glitch.
 
Travis Newbury

lostinspace said:
> The original mail in this thread is repeated below.
> Please note the subject line on this thread.
> There is mention of RSS in the inquiry, but the brunt of the inquiry is
> related to the how of circumventing blocked downloads; it also suggests
> that webmasters who practice such things are practicing dirty and
> unscrupulous tricks.

Whatever; the OP's message was obviously about RSS feeds. But it
doesn't matter anyway, let's all just walk away from this thread as
friends. I think the OP's issue has been addressed.

EVERYONE SING....

we are the world......
we are the children....

Hey, I'm not hearing you....
 
lostinspace

"EVERYONE SING....

we are the world......
we are the children....

Hey, I'm not hearing you...."

DITTO :)
 
