security quirk

Discussion in 'Java' started by RichD, Jan 30, 2013.

  1. RichD

    RichD Guest

    I read the Wall Street Journal, and occasionally check
    articles on their Web site. It's mostly free, with some items
    available to subscribers only. Which ones they block seems
    random, about 20%.

    Anywho, sometimes I use their search utility, the usual author
    or title search, and it blocks the article. Then I look it up on
    Google, link from there, and it loads! OK, Web gurus, what's going on?


    --
    Rich
     
    RichD, Jan 30, 2013
    #1

  2. Roedy Green

    Roedy Green Guest

    On Tue, 29 Jan 2013 20:55:44 -0800 (PST), RichD
    wrote, quoted or indirectly quoted someone who said:

    >Anywho, sometimes I use their search utility, the usual author
    >or title search, and it blocks, then I look it up on Google, and
    >link from there, and it loads! ok, Web gurus, what's going on?


    This is not Java, but one way this could happen is Google buys or gets
    a free subscription to the WSJ. That enables them to spider and index
    it.

    The WSJ designed their security system around their own search engine
    refusing to find pages, not around refusing to serve them once the URL is
    known. I have a dim view of the WSJ for reasons unrelated to the
    competence of their programmers.
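
    The difference between hiding a page in search and refusing to serve it
    can be sketched in a few lines. This is only an illustration of the
    theory above, not the WSJ's actual code: the class name, the method
    names, and the article paths are all made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: every name and path here is invented for illustration.
final class SearchOnlyGate {
    // Article path -> true if the article is meant for subscribers only.
    static final Map<String, Boolean> ARTICLES = Map.of(
        "/articles/free-story", false,
        "/articles/paid-story", true);

    // The site's own search hides subscriber-only articles from non-subscribers.
    static List<String> search(boolean subscriber) {
        return ARTICLES.entrySet().stream()
            .filter(e -> subscriber || !e.getValue())
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    // The suspected flaw: serving ignores the subscription flag entirely,
    // so anyone who learns the URL elsewhere gets the page.
    static boolean serveUnchecked(String path, boolean subscriber) {
        return ARTICLES.containsKey(path);
    }

    // What a serve-time check would look like instead.
    static boolean serveChecked(String path, boolean subscriber) {
        Boolean paid = ARTICLES.get(path);
        return paid != null && (subscriber || !paid);
    }

    public static void main(String[] args) {
        System.out.println(search(false));
        System.out.println(serveUnchecked("/articles/paid-story", false));
        System.out.println(serveChecked("/articles/paid-story", false));
    }
}
```

    With serveUnchecked, a non-subscriber who gets the URL from an outside
    index can load the paid article even though search never listed it.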
    --
    Roedy Green Canadian Mind Products http://mindprod.com
    The first 90% of the code accounts for the first 90% of the development time.
    The remaining 10% of the code accounts for the other 90% of the development
    time.
    ~ Tom Cargill Ninety-ninety Law
     
    Roedy Green, Jan 30, 2013
    #2

  3. On Jan 29, 8:55 pm, RichD wrote:
    > I read Wall Street Journal, and occasionally check<NotepadPlus>

    <UserLang name="MUSATOV" ext=".myl" udlVersion="2.0">
    <Settings>
    <Global caseIgnored="no" allowFoldOfComments="no"
    forceLineCommentsAtBOL="no" foldCompact="yes" />
    <Prefix Keywords1="no" Keywords2="no" Keywords3="no"
    Keywords4="no" Keywords5="no" Keywords6="no" Keywords7="no"
    Keywords8="no" />
    </Settings>
    <KeywordLists>
    <Keywords name="Comments" id="0">00commentBegin 01comment
    02commentEnd 03 04</Keywords>
    <Keywords name="Numbers, additional" id="1"></Keywords>
    <Keywords name="Numbers, prefixes" id="2"></Keywords>
    <Keywords name="Numbers, extras with prefixes" id="3"></
    Keywords>
    <Keywords name="Numbers, suffixes" id="4"></Keywords>
    <Keywords name="Operators1" id="5">();</Keywords>
    <Keywords name="Operators2" id="6"></Keywords>
    <Keywords name="Folders in code1, open" id="7">Open</
    Keywords>
    <Keywords name="Folders in code1, middle" id="8">middle</
    Keywords>
    <Keywords name="Folders in code1, close" id="9">Close</
    Keywords>
    <Keywords name="Folders in code2, open" id="10">Open</
    Keywords>
    <Keywords name="Folders in code2, middle" id="11">middle</
    Keywords>
    <Keywords name="Folders in code2, close" id="12">Close</
    Keywords>
    <Keywords name="Folders in comment, open" id="13">Open</
    Keywords>
    <Keywords name="Folders in comment, middle"
    id="14">middle</Keywords>
    <Keywords name="Folders in comment, close" id="15">Close</
    Keywords>
    <Keywords name="Keywords1" id="16">%%</Keywords>
    <Keywords name="Keywords2" id="17"></Keywords>
    <Keywords name="Keywords3" id="18"></Keywords>
    <Keywords name="Keywords4" id="19"></Keywords>
    <Keywords name="Keywords5" id="20"></Keywords>
    <Keywords name="Keywords6" id="21"></Keywords>
    <Keywords name="Keywords7" id="22"></Keywords>
    <Keywords name="Keywords8" id="23"></Keywords>
    <Keywords name="Delimiters" id="24"></Keywords>
    </KeywordLists>
    <Styles>
    <WordsStyle name="DEFAULT" styleID="0" fgColor="FFFFFF"
    bgColor="000000" fontName="Monotype Corsiva" fontStyle="7"
    fontSize="14" nesting="0" />
    <WordsStyle name="COMMENTS" styleID="1" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="LINE COMMENTS" styleID="2"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="NUMBERS" styleID="3" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="KEYWORDS1" styleID="4" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="KEYWORDS2" styleID="5" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="KEYWORDS3" styleID="6" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="KEYWORDS4" styleID="7" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="KEYWORDS5" styleID="8" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="KEYWORDS6" styleID="9" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="KEYWORDS7" styleID="10" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="KEYWORDS8" styleID="11" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="OPERATORS" styleID="12" fgColor="000000"
    bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="FOLDER IN CODE1" styleID="13"
    fgColor="FFFFFF" bgColor="000000" fontName="" fontStyle="7"
    fontSize="10" nesting="0" />
    <WordsStyle name="FOLDER IN CODE2" styleID="14"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="FOLDER IN COMMENT" styleID="15"
    fgColor="FFFFFF" bgColor="000000" fontName="Times New Roman"
    fontStyle="7" fontSize="8" nesting="0" />
    <WordsStyle name="DELIMITERS1" styleID="16"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="DELIMITERS2" styleID="17"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="DELIMITERS3" styleID="18"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="DELIMITERS4" styleID="19"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="DELIMITERS5" styleID="20"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="DELIMITERS6" styleID="21"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="DELIMITERS7" styleID="22"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    <WordsStyle name="DELIMITERS8" styleID="23"
    fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
    </Styles>
    </UserLang>
    </NotepadPlus>

    > articles on their Web site.  It's mostly free, with some items
    > available to subscribers only.  It seems random, which ones
    > they block, about 20%.
    >
    > Anywho, sometimes I use their search utility, the usual author
    > or title search, and it blocks, then I look it up on Google, and
    > link from there, and it loads!  ok, Web gurus, what's going on?
    >
    > --
    > Rich
     
    Martin Musatov, Jan 30, 2013
    #3
  4. RichD contributed wisdom to news:b968c6c6-5aa9-
    :

    > Web gurus, what's going on?
    >


    That is the fault of the site itself.
    If they are going to block access to users then they should also block
    access to the automated spiders that hit the site to collect data.
     
    Gandalf Parker, Jan 30, 2013
    #4
  5. RichD

    RichD Guest

    On Jan 30, Gandalf Parker wrote:
    > > Web gurus, what's going on?

    >
    > That is the fault of the site itself.
    > If they are going to block access to users then they should also block
    > access to the automated spiders that hit the site to collect data.


    Well yeah, but what's going on under the hood?
    How does it get confused? How could this
    happen? I'm looking for some insight regarding a
    hypothetical programming glitch -


    --
    Rich
     
    RichD, Jan 30, 2013
    #5
  6. Auric__

    Auric__ Guest

    Martin Musatov wrote:

    > On Jan 29, 8:55 pm, RichD wrote:
    >> I read Wall Street Journal, and occasionally check<NotepadPlus>

    > <UserLang name="MUSATOV" ext=".myl" udlVersion="2.0">

    [snip]
    > </UserLang>
    > </NotepadPlus>


    Ignoring the big ol' unnecessary crosspost... What the ****?

    --
    Oooh, I just learned a new euphemism.
     
    Auric__, Jan 30, 2013
    #6
  7. alex23

    alex23 Guest

    On Jan 31, 5:39 am, RichD wrote:
    > well yeah, but what's going on, under the hood?
    > How does it get confused?  How could this
    > happen?  I'm looking for some insight, regarding a
    > hypothetical programming glitch -


    As has been stated, this has nothing to do with Python, so please stop
    posting your questions here.

    However, here's an answer to get you to stop repeating yourself: it's
    not uncommon to find that content you're restricted from accessing via
    a site's own search is available to you through Google. This has to do
    with Google's policy of _requiring_ that pages it is allowed to
    index _must_ be available to view. Any site that allows Google to
    index its pages and then blocks you from viewing them will swiftly
    find itself website non grata in Google search. As most
    websites are attention whores, they'll do anything to ensure they
    remain within Google's indices.
     
    alex23, Jan 31, 2013
    #7
  8. Arne Vajhøj

    Arne Vajhøj Guest

    On 1/29/2013 11:55 PM, RichD wrote:
    > I read Wall Street Journal, and occasionally check
    > articles on their Web site. It's mostly free, with some items
    > available to subscribers only. It seems random, which ones
    > they block, about 20%.
    >
    > Anywho, sometimes I use their search utility, the usual author
    > or title search, and it blocks, then I look it up on Google, and
    > link from there, and it loads! ok, Web gurus, what's going on?


    WSJ wants their articles to be findable from Google.

    So they allow Google to index them.

    If they required any type of registration to see an article,
    then Google would remove the link.

    So WSJ (and many other web sites!) give more access
    if you come from Google than if not.
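
    One plausible way a site could "give more access if you come from
    Google" is to look at the HTTP Referer header on the incoming request.
    The following is a minimal sketch under that assumption; nothing in this
    thread confirms how the WSJ actually does it, and RefererGate and its
    methods are hypothetical names.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical sketch of a Referer-based paywall decision.
final class RefererGate {
    // True if the Referer header points at google.com or one of its subdomains.
    static boolean cameFromGoogle(String referer) {
        if (referer == null || referer.isEmpty()) {
            return false;
        }
        try {
            String host = URI.create(referer).getHost();
            if (host == null) {
                return false;
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.equals("google.com") || host.endsWith(".google.com");
        } catch (IllegalArgumentException e) {
            // Malformed Referer values are treated as "not from Google".
            return false;
        }
    }

    // Free articles are always shown; subscriber-only articles are shown
    // to subscribers and to visitors arriving from a Google result page.
    static boolean showFullArticle(boolean subscriberOnly, boolean isSubscriber, String referer) {
        return !subscriberOnly || isSubscriber || cameFromGoogle(referer);
    }

    public static void main(String[] args) {
        System.out.println(showFullArticle(true, false, "https://www.google.com/search?q=wsj"));
        System.out.println(showFullArticle(true, false, null));
    }
}
```

    The Referer header is supplied by the browser, so a check like this is
    a marketing compromise rather than real access control: it explains why
    the same URL blocks from the site's own search but loads from Google.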

    Arne
     
    Arne Vajhøj, Jan 31, 2013
    #8
  9. Roedy Green

    Roedy Green Guest

    On Wed, 30 Jan 2013 11:39:41 -0800 (PST), RichD
    wrote, quoted or indirectly quoted someone who said:

    >well yeah, but what's going on, under the hood?
    >How does it get confused? How could this
    >happen? I'm looking for some insight, regarding a
    >hypothetical programming glitch -

    Monitor the responses in all newsgroups you post to.
     
    Roedy Green, Jan 31, 2013
    #9
  10. RichD contributed wisdom to news:badd4188-196b-
    :

    > On Jan 30, Gandalf Parker wrote:
    >> > Web gurus, what's going on?

    >>
    >> That is the fault of the site itself.
    >> If they are going to block access to users then they should also block
    >> access to the automated spiders that hit the site to collect data.

    >
    > well yeah, but what's going on, under the hood?
    > How does it get confused? How could this
    > happen? I'm looking for some insight, regarding a
    > hypothetical programming glitch -


    (from alt.hacker)

    You don't understand. It is not in the code. It is in the site.
    It is as if someone comes and picks fruit off of your tree, and you are
    questioning the tree for how it bears fruit.

    The site creates web pages.
    Google collects web pages.
    The site needs to set things like robots.txt to tell Google to NOT collect
    the pages in the archives. That is not an absolute protection, but at least
    it's an effort that works for most sites.
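
    For anyone who has not seen one, a robots.txt file is a short list of
    path prefixes that crawlers are asked to skip. The sketch below embeds a
    made-up robots.txt and shows roughly how a polite crawler might consult
    it. It is deliberately simplified (it only reads "User-agent: *" groups
    and plain Disallow prefixes, ignoring Allow lines and wildcards), and
    the /archives/ path and the class name RobotsSketch are assumptions for
    the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt check; not a full implementation.
final class RobotsSketch {
    // Made-up robots.txt asking every crawler to stay out of /archives/.
    static final String ROBOTS_TXT =
        "User-agent: *\n" +
        "Disallow: /archives/\n";

    // Collects Disallow prefixes from groups addressed to all crawlers ("*").
    static List<String> disallowedForAll(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean inStarGroup = false;
        for (String raw : robotsTxt.split("\n")) {
            String line = raw.trim();
            int colon = line.indexOf(':');
            if (line.isEmpty() || line.startsWith("#") || colon < 0) {
                continue; // skip blanks, comments, and lines without a field
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                inStarGroup = value.equals("*");
            } else if (field.equals("disallow") && inStarGroup && !value.isEmpty()) {
                prefixes.add(value);
            }
        }
        return prefixes;
    }

    // A well-behaved crawler skips any path under a disallowed prefix.
    static boolean mayCrawl(String path, String robotsTxt) {
        for (String prefix : disallowedForAll(robotsTxt)) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(mayCrawl("/archives/2012/story.html", ROBOTS_TXT));
        System.out.println(mayCrawl("/news/today.html", ROBOTS_TXT));
    }
}
```

    Nothing forces a crawler to run a check like mayCrawl, which is why
    this is a request to crawlers and not a protection for the pages.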
     
    Gandalf Parker, Jan 31, 2013
    #10
  11. Roedy Green

    Roedy Green Guest

    On Thu, 31 Jan 2013 14:07:21 +0000 (UTC), Gandalf Parker
    wrote, quoted or indirectly quoted someone who said:

    >The site creates web pages.
    >Google collects web pages.
    >The site needs to set things like robots.txt to tell Google to NOT collect
    >the pages in the archives. Which is not an absolute protection but at least
    >it's an effort that works for most sites.


    To the site, Google is just a voracious reader. If they block readers
    from hoovering up content, that automatically stops Google.

    The site owners wanted Google to spider the site, bring in customers,
    then hit them with a fee. They forgot that anyone coming in directly
    via Google's links would bypass their own search engine.
     
    Roedy Green, Feb 1, 2013
    #11
