----- Original Message -----
From: "Andy Dingley" <>
Newsgroups: alt.html
Sent: Saturday, April 23, 2005 11:49 AM
Subject: Re: download blocking
We can guess, but if you tell us the URLs then we can look at the actual
examples. Also tell us why you can't download them - do you get
anything, the wrong thing, or just a 404?
My two guesses:
It's related to the HTTP user-agent string that you're sending. The site
only accepts browsers that it recognises. This is stupid behaviour on
the part of the site - so stupid that I don't think it's likely. You
should be able to work around it easily by impersonating IE.
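To make "impersonating IE" concrete, here's a minimal sketch of sending a spoofed user-agent string. The feed URL and the exact UA string are illustrative (an IE 6 string from the era); substitute whatever the site expects:

```python
import urllib.request

# Hypothetical feed URL -- substitute the one you are actually trying to fetch.
FEED_URL = "http://example.com/feed.rss"

# A User-Agent string in the style IE 6 sent at the time (illustrative).
IE_USER_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

def build_request(url):
    """Build a request carrying a spoofed User-Agent instead of the default one."""
    return urllib.request.Request(url, headers={"User-Agent": IE_USER_AGENT})

# Opening it with urllib.request.urlopen(build_request(FEED_URL)) would then
# present the site with an "IE" visitor rather than your program's default UA.
request = build_request(FEED_URL)
```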
Secondly (and more likely): you're probably using the MSXML component
within your VB program. That expects XML, and RSS 0.9* isn't an XML
protocol. It looks a lot like XML, but most feeds are either not valid
RSS or not even well-formed XML. For a "production grade" RSS reader
you can't rely on all feeds being well-formed XML, all the time.
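A quick sketch of why a strict XML parser chokes on real-world feeds - the snippet below contains an unescaped ampersand, one of the commonest faults in published RSS, and any conforming parser (MSXML included) will reject it outright:

```python
import xml.etree.ElementTree as ET

# A deliberately malformed "RSS" snippet: the bare "&" is illegal in XML.
BAD_FEED = "<rss><channel><title>News & views</title></channel></rss>"

def parse_strict(text):
    """Return the parsed root element, or None if the feed is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None

result = parse_strict(BAD_FEED)  # None -- a strict parser refuses the feed
```

A production reader would fall back to a tag-soup or regex-based extraction when the strict parse fails, rather than reporting the feed as unreadable.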
And I don't know what "lostinspace"'s problem is, but he's a clueless
muppet if he doesn't realise what RSS is about.
http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
http://blogs.law.harvard.edu/tech/rss#whatIsRss
http://www.webreference.com/authoring/languages/xml/rss/intro/
As a webmaster with unique and copyrighted content (which exists
NOWHERE else), I should allow crawling of my sites under the pretense of
offline use while the material is harvested to be sold to third parties,
presented to third parties outside my websites, or interpreted for any
other third-party benefit.
Hogwash.
If viable orgs desire my content, then let them approach me with
compensation and/or permission for the sweat of my brow; otherwise let them
eat 403s.
My sites are unique in these types of materials; however, so are many others.
Few issues regarding traffic and visitors as related to websites are cut and
dried or black and white.
Each webmaster must make their own decisions about what is beneficial and
detrimental to their websites, and base their websites' actions on what they
desire.
One example would be "Helmut", who would never get into my sites from a DE
IP range or a DE referral search.
Of course he may fake his IP for limited access. That's not the same as a
full-scrape.
WHY?
There is no possible way for a DE visitor or DE traffic to enhance or benefit
my websites. They only draw resources and materials, which I have little
time to spend monitoring for plagiarism.
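The "let them eat 403s" approach boils down to a range check per request. A minimal sketch, with a hypothetical blocked range standing in for whatever DE CIDR blocks a webmaster actually maintains:

```python
import ipaddress

# Hypothetical blocklist: CIDR ranges that should receive a 403.
# 192.0.2.0/24 is a documentation-only range standing in for real DE blocks.
BLOCKED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]

def status_for(client_ip):
    """Return 403 for visitors inside a blocked range, 200 otherwise."""
    addr = ipaddress.ip_address(client_ip)
    return 403 if any(addr in net for net in BLOCKED_RANGES) else 200

print(status_for("192.0.2.17"))   # 403 -- inside the blocked range
print(status_for("203.0.113.5"))  # 200 -- allowed through
```

In practice this check usually lives in the web server's own config (deny rules) rather than application code, but the logic is the same either way.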