security quirk

RichD

I read Wall Street Journal, and occasionally check
articles on their Web site. It's mostly free, with some items
available to subscribers only. It seems random, which ones
they block, about 20%.

Anywho, sometimes I use their search utility, the usual author
or title search, and it blocks, then I look it up on Google, and
link from there, and it loads! ok, Web gurus, what's going on?
 
Roedy Green

> Anywho, sometimes I use their search utility, the usual author
> or title search, and it blocks, then I look it up on Google, and
> link from there, and it loads! ok, Web gurus, what's going on?

This is not Java, but one way this could happen is that Google buys or gets
a free subscription to the WSJ. That lets Google spider and index the
site.

The WSJ designed their security system around their own search engine
refusing to find pages, not around refusing to serve them once the URL is
known. I have a dim view of the WSJ for reasons unrelated to the
competence of their programmers.
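The difference between "refusing to find" and "refusing to serve" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how the WSJ site actually works: the article store, the URLs and the function names (`search`, `serve_flawed`, `serve_fixed`) are all hypothetical. The search layer hides subscriber-only articles, but the flawed serving layer hands the page to anyone who already has the URL, for example from a Google result.

```python
# Hypothetical article store; URLs and titles are made up for illustration.
ARTICLES = {
    "/articles/markets-today": {"title": "Markets today", "premium": False},
    "/articles/deep-dive": {"title": "Deep dive", "premium": True},
}


def search(query: str, is_subscriber: bool) -> list:
    """Search layer: hides subscriber-only articles from non-subscribers."""
    return [
        url
        for url, art in ARTICLES.items()
        if query.lower() in art["title"].lower()
        and (is_subscriber or not art["premium"])
    ]


def serve_flawed(url: str, is_subscriber: bool) -> tuple:
    """Serving layer with no check: anyone who knows the URL gets the page."""
    art = ARTICLES.get(url)
    if art is None:
        return 404, "Not found"
    # is_subscriber is ignored here; that is the bug.
    return 200, art["title"]


def serve_fixed(url: str, is_subscriber: bool) -> tuple:
    """Serving layer that enforces the paywall itself, however the URL was found."""
    art = ARTICLES.get(url)
    if art is None:
        return 404, "Not found"
    if art["premium"] and not is_subscriber:
        return 403, "Subscribers only"
    return 200, art["title"]


if __name__ == "__main__":
    print(search("deep", is_subscriber=False))         # []
    print(serve_flawed("/articles/deep-dive", False))  # (200, 'Deep dive')
    print(serve_fixed("/articles/deep-dive", False))   # (403, 'Subscribers only')
```

The point of the sketch is where the check lives: filtering search results only hides links, so the access decision has to be made by the code that serves the page.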
--
Roedy Green Canadian Mind Products http://mindprod.com
The first 90% of the code accounts for the first 90% of the development time.
The remaining 10% of the code accounts for the other 90% of the development
time.
~ Tom Cargill Ninety-ninety Law
 
Martin Musatov

> I read Wall Street Journal, and occasionally check
<NotepadPlus>
<UserLang name="MUSATOV" ext=".myl" udlVersion="2.0">
<Settings>
<Global caseIgnored="no" allowFoldOfComments="no"
forceLineCommentsAtBOL="no" foldCompact="yes" />
<Prefix Keywords1="no" Keywords2="no" Keywords3="no"
Keywords4="no" Keywords5="no" Keywords6="no" Keywords7="no"
Keywords8="no" />
</Settings>
<KeywordLists>
<Keywords name="Comments" id="0">00commentBegin 01comment
02commentEnd 03 04</Keywords>
<Keywords name="Numbers, additional" id="1"></Keywords>
<Keywords name="Numbers, prefixes" id="2"></Keywords>
<Keywords name="Numbers, extras with prefixes" id="3"></Keywords>
<Keywords name="Numbers, suffixes" id="4"></Keywords>
<Keywords name="Operators1" id="5">();</Keywords>
<Keywords name="Operators2" id="6"></Keywords>
<Keywords name="Folders in code1, open" id="7">Open</Keywords>
<Keywords name="Folders in code1, middle" id="8">middle</Keywords>
<Keywords name="Folders in code1, close" id="9">Close</Keywords>
<Keywords name="Folders in code2, open" id="10">Open</Keywords>
<Keywords name="Folders in code2, middle" id="11">middle</Keywords>
<Keywords name="Folders in code2, close" id="12">Close</Keywords>
<Keywords name="Folders in comment, open" id="13">Open</Keywords>
<Keywords name="Folders in comment, middle" id="14">middle</Keywords>
<Keywords name="Folders in comment, close" id="15">Close</Keywords>
<Keywords name="Keywords1" id="16">%%</Keywords>
<Keywords name="Keywords2" id="17"></Keywords>
<Keywords name="Keywords3" id="18"></Keywords>
<Keywords name="Keywords4" id="19"></Keywords>
<Keywords name="Keywords5" id="20"></Keywords>
<Keywords name="Keywords6" id="21"></Keywords>
<Keywords name="Keywords7" id="22"></Keywords>
<Keywords name="Keywords8" id="23"></Keywords>
<Keywords name="Delimiters" id="24"></Keywords>
</KeywordLists>
<Styles>
<WordsStyle name="DEFAULT" styleID="0" fgColor="FFFFFF"
bgColor="000000" fontName="Monotype Corsiva" fontStyle="7"
fontSize="14" nesting="0" />
<WordsStyle name="COMMENTS" styleID="1" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="LINE COMMENTS" styleID="2"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="NUMBERS" styleID="3" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="KEYWORDS1" styleID="4" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="KEYWORDS2" styleID="5" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="KEYWORDS3" styleID="6" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="KEYWORDS4" styleID="7" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="KEYWORDS5" styleID="8" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="KEYWORDS6" styleID="9" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="KEYWORDS7" styleID="10" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="KEYWORDS8" styleID="11" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="OPERATORS" styleID="12" fgColor="000000"
bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="FOLDER IN CODE1" styleID="13"
fgColor="FFFFFF" bgColor="000000" fontName="" fontStyle="7"
fontSize="10" nesting="0" />
<WordsStyle name="FOLDER IN CODE2" styleID="14"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="FOLDER IN COMMENT" styleID="15"
fgColor="FFFFFF" bgColor="000000" fontName="Times New Roman"
fontStyle="7" fontSize="8" nesting="0" />
<WordsStyle name="DELIMITERS1" styleID="16"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="DELIMITERS2" styleID="17"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="DELIMITERS3" styleID="18"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="DELIMITERS4" styleID="19"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="DELIMITERS5" styleID="20"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="DELIMITERS6" styleID="21"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="DELIMITERS7" styleID="22"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
<WordsStyle name="DELIMITERS8" styleID="23"
fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
</Styles>
</UserLang>
</NotepadPlus>
 
Gandalf Parker

> Web gurus, what's going on?

That is the fault of the site itself.
If they are going to block access to users then they should also block
access to the automated spiders that hit the site to collect data.
 
RichD

> That is the fault of the site itself.
> If they are going to block access to users then they should also block
> access to the automated spiders that hit the site to collect data.

well yeah, but what's going on, under the hood?
How does it get confused? How could this
happen? I'm looking for some insight, regarding a
hypothetical programming glitch -
 
Auric__

Martin said:
> I read Wall Street Journal, and occasionally check
> <NotepadPlus>
> <UserLang name="MUSATOV" ext=".myl" udlVersion="2.0"> [snip]
> </UserLang>
> </NotepadPlus>

Ignoring the big ol' unnecessary crosspost... What the ****?
 
alex23

> well yeah, but what's going on, under the hood?
> How does it get confused? How could this
> happen? I'm looking for some insight, regarding a
> hypothetical programming glitch -

As has been stated, this has nothing to do with Python, so please stop
posting your questions here.

However, here's an answer to get you to stop repeating yourself: it's
not uncommon to find that content you're restricted from accessing via
a site's own search is available to you through Google. This has to do
with Google's policy of _requiring_ that pages that it is allowed to
index _must_ be available for view. Any site that lets Google
index its pages but then blocks you from viewing them will swiftly
find itself web site-a non grata in Google search. As most
websites are attention whores, they'll do anything to ensure they
remain within Google's indices.
 
Arne Vajhøj

> I read Wall Street Journal, and occasionally check
> articles on their Web site. It's mostly free, with some items
> available to subscribers only. It seems random, which ones
> they block, about 20%.
>
> Anywho, sometimes I use their search utility, the usual author
> or title search, and it blocks, then I look it up on Google, and
> link from there, and it loads! ok, Web gurus, what's going on?

WSJ wants its articles to be findable from Google.

So it opens them up for Google to index.

If WSJ required any kind of registration to see an article,
Google would remove the link.

So WSJ (and many other web sites!) gives more access to visitors
who come from Google than to those who don't.
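One way a site could give more access to visitors from Google is to look at the HTTP `Referer` header the browser sends. The sketch below is a hypothetical illustration, not WSJ's actual logic; the `GOOGLE_HOSTS` list and the function names `came_from_google` and `may_view` are assumptions. Since the browser supplies the Referer header, anyone can set it to whatever they like, so this is a convenience rule rather than real access control.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: treating google.com and its subdomains as "Google" is illustrative only.
GOOGLE_HOSTS = ("google.com",)


def came_from_google(referer: Optional[str]) -> bool:
    """True if the Referer header names google.com or one of its subdomains."""
    if not referer:
        return False
    host = (urlparse(referer).hostname or "").lower()
    # Exact match or a real subdomain; "google.com.evil.example" does not count.
    return any(host == h or host.endswith("." + h) for h in GOOGLE_HOSTS)


def may_view(premium: bool, subscriber: bool, referer: Optional[str]) -> bool:
    """Free pages for all; premium pages for subscribers or Google referrals."""
    return (not premium) or subscriber or came_from_google(referer)


if __name__ == "__main__":
    print(may_view(True, False, "https://www.google.com/search?q=wsj"))  # True
    print(may_view(True, False, "https://online.wsj.com/search"))        # False
    print(may_view(True, False, None))                                    # False
```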

Arne
 
Roedy Green

> well yeah, but what's going on, under the hood?
> How does it get confused? How could this
> happen? I'm looking for some insight, regarding a
> hypothetical programming glitch -
Monitor the responses in all newsgroups you post to.
 
Gandalf Parker

> well yeah, but what's going on, under the hood?
> How does it get confused? How could this
> happen? I'm looking for some insight, regarding a
> hypothetical programming glitch -

(from alt.hacker)

You don't understand. It is not in the code; it is in the site.
It is as if someone comes and picks fruit off your tree, and you are
questioning the tree about how it bears fruit.

The site creates web pages.
Google collects web pages.
The site needs to set things like robots.txt to tell Google NOT to collect
the pages in the archives. That is not absolute protection, but at least
it's an effort that works for most sites.
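A robots.txt along the lines described above might look like the one below; the `/archives/` path and the example.com URLs are hypothetical. Python's standard `urllib.robotparser` module can show how a crawler that honors the file would read it. Keep in mind that robots.txt is only a request to well-behaved crawlers: it does nothing to stop a person who already has the URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to skip the archives.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archives/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# How a crawler that honors robots.txt would interpret it.
print(parser.can_fetch("Googlebot", "https://example.com/archives/2008/story.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/today/story.html"))          # True
```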
 
Roedy Green

> The site creates web pages.
> Google collects web pages.
> The site needs to set things like robots.txt to tell Google NOT to collect
> the pages in the archives. That is not absolute protection, but at least
> it's an effort that works for most sites.

To the site, Google is just a voracious reader. If they block readers
from hoovering up content, that automatically stops Google.

The site owners wanted Google to spider the site, bring in customers,
then hit them with a fee. They forgot that anyone coming in directly
via Google's links would bypass their own search engine.
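Because Google is just another reader, a site that wants to let Google's crawler in while charging everyone else has to tell the two apart, and a User-Agent header claiming to be Googlebot is trivially faked. Google documents a reverse-then-forward DNS check for verifying its crawler; the sketch below follows that idea. The function name `is_verified_googlebot` and the `GOOGLE_CRAWLER_DOMAINS` list are assumptions (check Google's current documentation for the real domain list), and the DNS lookups are passed in as parameters so the logic can be tried without network access.

```python
import socket
from typing import Callable, List

# Assumption: crawler host names end in one of these domains; confirm against
# Google's current documentation before relying on the list.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def _reverse_dns(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(name: str) -> List[str]:
    return socket.gethostbyname_ex(name)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse_dns,
    forward: Callable[[str], List[str]] = _forward_dns,
) -> bool:
    """Reverse-then-forward DNS check; a User-Agent header alone can be faked."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    # The PTR record is controlled by whoever owns the IP, so check the domain...
    if not host.endswith(GOOGLE_CRAWLER_DOMAINS):
        return False
    # ...then confirm that the name really resolves back to the same address.
    try:
        return ip in forward(host)
    except OSError:
        return False


if __name__ == "__main__":
    # Fake DNS tables so the example runs offline; addresses are illustrative.
    ptr = {
        "66.249.66.1": "crawl-66-249-66-1.googlebot.com",
        "203.0.113.5": "crawl.googlebot.com",  # forged PTR record
    }
    a_records = {
        "crawl-66-249-66-1.googlebot.com": ["66.249.66.1"],
        "crawl.googlebot.com": ["66.249.66.1"],
    }

    def fake_reverse(ip: str) -> str:
        if ip not in ptr:
            raise OSError("no PTR record")
        return ptr[ip]

    def fake_forward(name: str) -> List[str]:
        return a_records.get(name, [])

    print(is_verified_googlebot("66.249.66.1", fake_reverse, fake_forward))  # True
    print(is_verified_googlebot("203.0.113.5", fake_reverse, fake_forward))  # False
```

The forward lookup is what defeats the forged record: anyone can point their own address's reverse DNS at a googlebot-looking name, but they cannot make Google's forward DNS resolve that name back to their address.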
 
