Re: urllib2 FTP Weirdness

Discussion in 'Python' started by Chris Angelico, Jan 23, 2013.

  1. On Thu, Jan 24, 2013 at 7:07 AM, Nick Cash wrote:
    > Python 2.7.3 on linux
    >
    > This has me fairly stumped. It looks like
    > urllib2.urlopen("ftp://some.ftp.site/path").read()
    > will either immediately return '' or hang indefinitely. But
    > response = urllib2.urlopen("ftp://some.ftp.site/path")
    > response.read()
    > works fine and returns what is expected. This is only an issue with urllib2, vanilla urllib doesn't do it.
    >
    > The site I first noticed it on is private, but I can reproduce it with "ftp://ftp2.census.gov/".


    Confirmed on 2.6.5 on Windows, fwiw. This is extremely weird. Possibly
    it's some kind of race condition??

    ChrisA
     
    Chris Angelico, Jan 23, 2013
    #1

  2. On 24/01/13 00:58:04, Chris Angelico wrote:
    > On Thu, Jan 24, 2013 at 7:07 AM, Nick Cash wrote:
    >> Python 2.7.3 on linux
    >>
    >> This has me fairly stumped. It looks like
    >> urllib2.urlopen("ftp://some.ftp.site/path").read()
    >> will either immediately return '' or hang indefinitely. But
    >> response = urllib2.urlopen("ftp://some.ftp.site/path")
    >> response.read()
    >> works fine and returns what is expected. This is only an issue with urllib2, vanilla urllib doesn't do it.
    >>
    >> The site I first noticed it on is private, but I can reproduce it with "ftp://ftp2.census.gov/".

    >
    > Confirmed on 2.6.5 on Windows, fwiw. This is extremely weird.


    It works fine with 2.7.3 on my Mac.

    > Possibly it's some kind of race condition??


    If urllib2 is using active mode FTP, then a firewall on your box
    could explain what you're seeing. But then, that's why active
    mode is hardly used these days.


    Hope this helps,

    -- HansM
     
    Hans Mulder, Jan 24, 2013
    #2

  3. On Thu, 24 Jan 2013 01:45:31 +0100, Hans Mulder wrote:

    > On 24/01/13 00:58:04, Chris Angelico wrote:
    >> On Thu, Jan 24, 2013 at 7:07 AM, Nick Cash wrote:
    >>> Python 2.7.3 on linux
    >>>
    >>> This has me fairly stumped. It looks like
    >>> urllib2.urlopen("ftp://some.ftp.site/path").read()
    >>> will either immediately return '' or hang indefinitely. But
    >>> response = urllib2.urlopen("ftp://some.ftp.site/path")
    >>> response.read()
    >>> works fine and returns what is expected. This is only an issue with
    >>> urllib2, vanilla urllib doesn't do it.
    >>>
    >>> The site I first noticed it on is private, but I can reproduce it with
    >>> "ftp://ftp2.census.gov/".

    >>
    >> Confirmed on 2.6.5 on Windows, fwiw. This is extremely weird.

    >
    > It works fine with 2.7.3 on my Mac.
    >
    >> Possibly it's some kind of race condition??

    >
    > If urllib2 is using active mode FTP, then a firewall on your box could
    > explain what you're seeing. But then, that's why active mode is hardly
    > used these days.



    Explain please?

    I cannot see how the firewall could possibly distinguish between using a
    temporary variable or not in these two snippets:

    import urllib2

    # no temporary variable: hangs or fails
    urllib2.urlopen("ftp://ftp2.census.gov/").read()

    # temporary variable: succeeds
    response = urllib2.urlopen("ftp://ftp2.census.gov/")
    response.read()



    --
    Steven
     
    Steven D'Aprano, Jan 24, 2013
    #3
  4. On 24Jan2013 04:12, Steven D'Aprano wrote:
    | On Thu, 24 Jan 2013 01:45:31 +0100, Hans Mulder wrote:
    | > On 24/01/13 00:58:04, Chris Angelico wrote:
    | >> Possibly it's some kind of race condition??
    | >
    | > If urllib2 is using active mode FTP, then a firewall on your box could
    | > explain what you're seeing. But then, that's why active mode is hardly
    | > used these days.
    |
    | Explain please?

    You do know the difference between active and passive FTP, yes?

    | I cannot see how the firewall could possibly distinguish between using a
    | temporary variable or not in these two snippets:
    |
    | # no temporary variable hangs, or fails
    | urllib2.urlopen("ftp://ftp2.census.gov/").read()
    |
    | # temporary variable succeeds
    | response = urllib2.urlopen("ftp://ftp2.census.gov/")
    | response.read()

    Timing. (Let me say I consider this scenario unlikely, very unlikely.
    But...)

    If the latter is consistently slightly slower then the firewall may be an
    issue if active FTP is being used. "Active" FTP requires the FTP server
    to connect to you to deliver the data: your end opens a listening TCP
    socket and says "get", supplying the socket details.

    Really, TCP is supposed to be plenty robust enough for this not to be
    a timing issue - the opening SYN packet will get resent if the first
    try doesn't elicit a response.

    For this to work over a firewall the firewall must (1) read your FTP
    control connection to see the port announcements and then (2) open a
    firewall hole to let the FTP server connect in, probably including a
    NAT or RDR arrangement to catch the incoming connection and deliver it
    to your end. Let us not even consider other NATting firewalls further
    upstream with your ISP.
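
    A minimal sketch of what such a port announcement looks like, assuming
    IPv4 and the classic PORT command of RFC 959 (six comma-separated
    decimal numbers: four address octets, then the port split into a high
    and a low byte). The helper names are hypothetical; this is roughly the
    information a firewall would have to parse out of the control connection:

```python
def port_argument(host, port):
    """Build the argument of an FTP PORT command: h1,h2,h3,h4,p1,p2."""
    octets = host.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) < 256 for o in octets):
        raise ValueError("expected a dotted-quad IPv4 address: %r" % host)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % port)
    high, low = divmod(port, 256)  # port = high * 256 + low
    return ",".join(octets + [str(high), str(low)])


def parse_port_argument(arg):
    """Recover (host, port) from a PORT argument, as a middlebox would."""
    parts = arg.split(",")
    if len(parts) != 6:
        raise ValueError("expected six comma-separated fields: %r" % arg)
    host = ".".join(parts[:4])
    port = int(parts[4]) * 256 + int(parts[5])
    return host, port
```

    For instance, port_argument("192.168.1.10", 50000) gives
    "192,168,1,10,195,80", since 195 * 256 + 80 = 50000.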

    Active FTP (the original FTP mode) is horrible. Passive FTP is more
    conventional: the server listens and you connect to fetch the file. But
    it still requires the server to accept connections on multiple ports;
    ugh.
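
    For comparison, a small sketch of choosing the mode explicitly with the
    standard library's ftplib, written for Python 3 (the thread itself is
    about urllib2 on Python 2). make_client and list_directory are
    illustrative names only; list_directory needs network access and a
    server that allows anonymous login:

```python
import ftplib


def make_client(passive=True):
    """Return an unconnected ftplib.FTP object with the transfer mode chosen."""
    ftp = ftplib.FTP()     # no host given, so nothing is connected yet
    ftp.set_pasv(passive)  # True: passive (PASV) mode, False: active (PORT) mode
    return ftp


def list_directory(host, passive=True, timeout=30):
    """Log in anonymously and return the names in the server's start directory."""
    ftp = make_client(passive)
    ftp.connect(host, timeout=timeout)
    try:
        ftp.login()        # anonymous login by default
        return ftp.nlst()  # uses a data connection in the chosen mode
    finally:
        ftp.close()
```

    Running list_directory against the same host once with passive=True and
    once with passive=False is one way to see whether the mode matters on a
    given network.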

    I hate FTP and really don't understand why it is still in common use.
    --
    Cameron Simpson

    To be positive: To be mistaken at the top of one's voice.
    Ambrose Bierce (1842-1914), U.S. author. The Devil's Dictionary (1881-1906).
     
    Cameron Simpson, Feb 6, 2013
    #4
  5. On Thu, 07 Feb 2013 10:06:32 +1100, Cameron Simpson wrote:

    > | I cannot see how the firewall could possibly distinguish between using
    > | a temporary variable or not in these two snippets:
    > |
    > | # no temporary variable hangs, or fails
    > | urllib2.urlopen("ftp://ftp2.census.gov/").read()
    > |
    > | # temporary variable succeeds
    > | response = urllib2.urlopen("ftp://ftp2.census.gov/")
    > | response.read()
    >
    > Timing. (Let me say I consider this scenario unlikely, very unlikely.
    > But...)
    >
    > If the latter is consistently slightly slower


    On my laptop, the difference is of the order of 10 microseconds. About
    half a million times smaller than the amount of time it takes to open the
    connection in the first place.
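
    A rough way to measure that kind of difference without touching the
    network, as a Python 3 sketch. FakeResponse and fake_urlopen are
    hypothetical stand-ins for the real urlopen() and its return value, so
    only the Python-level overhead of the two spellings is timed:

```python
import timeit


class FakeResponse:
    """Hypothetical stand-in for the object urlopen() returns."""

    def read(self):
        return b"data"


def fake_urlopen(url):
    # No network traffic happens here; the URL is ignored.
    return FakeResponse()


def chained():
    return fake_urlopen("ftp://example.invalid/").read()


def with_temporary():
    response = fake_urlopen("ftp://example.invalid/")
    return response.read()


def per_call_seconds(func, number=100000):
    """Average wall-clock seconds per call of func."""
    return timeit.timeit(func, number=number) / number


if __name__ == "__main__":
    a = per_call_seconds(chained)
    b = per_call_seconds(with_temporary)
    print("chained:        %.3e s" % a)
    print("with temporary: %.3e s" % b)
    print("difference:     %.3e s" % abs(a - b))
```

    The numbers vary by machine and interpreter, so treat them as an order
    of magnitude at best.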


    > then the firewall may be
    > an issue if active FTP is being used. "Active" FTP requires the FTP
    > server to connect to you to deliver the data: your end opens a listening
    > TCP socket and says "get", supplying the socket details.


    If you are thinking that the socket gets closed if the read is delayed
    too much, that doesn't explain the results you are getting. The read
    succeeds when there is a delay, not when there is no delay. Almost as if
    something is saying "oh, the read request came in too soon after the
    connection was made, must block".

    What can I say? I cannot reproduce the issue you are having. If you can
    reproduce it, try again without the firewall. If bypassing the firewall
    makes the issue go away, then go and yell at your network admins until
    they fix it.


    --
    Steven
     
    Steven D'Aprano, Feb 7, 2013
    #5
  6. On 07Feb2013 02:43, Steven D'Aprano wrote:
    | On Thu, 07 Feb 2013 10:06:32 +1100, Cameron Simpson wrote:
    | > Timing. (Let me say I consider this scenario unlikely, very unlikely.
    | > But...)
    | > If the latter is consistently slightly slower
    |
    | On my laptop, the difference is of the order of 10 microseconds.

    Like I said, I do not consider this likely.

    | > then the firewall may be
    | > an issue if active FTP is being used. "Active" FTP requires the FTP
    | > server to connect to you to deliver the data: your end opens a listening
    | > TCP socket and says "get", supplying the socket details.
    |
    | If you are thinking that the socket gets closed if the read is delayed
    | too much, that doesn't explain the results you are getting. The read
    | succeeds when there is a delay, not when there is no delay. Almost as if
    | something is saying "oh, the read request came in too soon after the
    | connection was made, must block".

    Exactly so.

    For active FTP the firewall must accept an _inbound_ connection
    from the server. If that connection's opening SYN packet comes in
    _before_ the firewall has set up the special purpose rule to accept this
    (remember, the fw is not the FTP client) then the firewall will quite
    possibly _reject_ the inbound SYN packet, causing the server to see
    "connection refused".

    client:
        open listening socket for data
        say "GET foo" to server, with socket details
    server:
        connect to the socket
        send data...

    The firewall's in the middle, watching for the socket details. When it
    sees them it must create an inbound forwarding rule to let the server's
    inbound DATA connection through to the client. But the server believes
    the socket is already available (because it is - the client makes it
    before supplying the socket details) and may dispatch the DATA
    connection before the firewall gets its rule in place.
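
    A toy model of that race, purely illustrative. All times are in seconds
    from the moment the socket details pass the firewall; the function name,
    the "reject"/"drop" policy names and the retry interval are assumptions
    made for the sketch, not measurements of any real firewall or TCP stack:

```python
def data_connection_outcome(rule_ready_at, syn_sent_at, policy="reject",
                            retry_interval=1.0, max_retries=3):
    """Return 'accepted', 'refused' or 'timed out' for the server's SYN.

    rule_ready_at: when the firewall's inbound rule is in place.
    syn_sent_at:   when the server's first SYN reaches the firewall.
    policy:        'reject' answers an early SYN with a refusal;
                   'drop' discards it silently, so the server retransmits.
    """
    arrival = syn_sent_at
    for _ in range(max_retries + 1):
        if arrival >= rule_ready_at:
            return "accepted"        # the rule was in place in time
        if policy == "reject":
            return "refused"         # server sees "connection refused"
        arrival += retry_interval    # dropped: wait for the retransmitted SYN
    return "timed out"
```

    With policy="reject", a SYN that arrives at 0.001 when the rule is only
    ready at 0.005 is refused outright. With policy="drop" the same
    connection is accepted on the first retransmission, which matches the
    earlier point that SYN resends normally hide this kind of timing.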

    | What can I say? I cannot reproduce the issue you are having. If you can
    | reproduce it, try again without the firewall. If bypassing the firewall
    | makes the issue go away, then go and yell at your network admins until
    | they fix it.

    If it is a speed thing, it may not be fixable. The fix is to use PASV
    mode FTP or better still to avoid FTP entirely. I certainly don't support
    active FTP on the firewalls I administer.

    Cheers,
    --
    Cameron Simpson

    Symbol? What's a symbol? This is a rose.
    - R.A. MacAvoy, _Tea with the Black Dragon_
     
    Cameron Simpson, Feb 7, 2013
    #6