Binary data, command output, and Ruby

Discussion in 'Ruby' started by Phrogz, Oct 1, 2007.

  1. Phrogz

    Phrogz Guest

    I have a script that pulls pages from our wiki server. It was working
    using Net::HTTP and open-uri with basic_authentication, but our
    sysadmin disabled basic authentication and left NTLM as the only
    authentication method.

    Instead of trying to figure out how to use the Ruby NTLM library, I
    decided to just use curl. It was working nicely for the HTML pages
    using this form:
    def fetch_http_ntlm( url )
      `curl #{url} --ntlm -# -u #{USER}:#{PASS}`
    end

    However, the above fails for binary files. (Pulling down images
    embedded in pages.) So I had to switch it to this:
    def fetch_http_ntlm( url )
      file_name = "C:\\tmp_#{Time.new.to_i}"
      `curl #{url} --ntlm -# -u #{USER}:#{PASS} -o #{file_name}`
      raw = File.open( file_name, 'rb' ){ |f| f.read }
      File.delete( file_name )
      raw
    end

    In other words, I have curl write the output to a file, and then read
    in the file using binary mode, and delete the file.

    Should I have to do this? Is it a general problem that commands can't
    cleanly return binary data to the 'console', and hence can't be
    captured using the above format? Or is curl on Windows at fault, and
    should be doing something different? Or is Ruby on Windows at fault? Or
    is Windows itself at fault?


    Also - I didn't try using the Tempfile library for the above, since
    the documentation for Tempfile.new says:
    'Creates a temporary file of mode 0600 in the temporary directory
    whose name is basename.pid.n and opens with mode "w+".' If this
    documentation is correct, does this mean that the Tempfile library
    doesn't work for binary files on Windows?
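
On the Tempfile question, a minimal sketch, assuming a current Ruby rather than the 1.8.6 used in this thread: the handle that Tempfile.new returns can be switched to binary mode with binmode after it is opened, so the documented "w+" default need not rule out binary data. The helper name and the sample bytes are made up for illustration.

```ruby
require 'tempfile'

# Round-trip raw bytes through a Tempfile without newline translation.
# `roundtrip_via_tempfile` is a hypothetical helper name.
def roundtrip_via_tempfile(bytes)
  tmp = Tempfile.new('fetch')   # opened "w+" by default
  begin
    tmp.binmode                 # switch the open handle to binary mode
    tmp.write(bytes)
    tmp.rewind
    tmp.read
  ensure
    tmp.close
    tmp.unlink                  # delete the file explicitly
  end
end

# CR, LF, 0x1A, NUL and 0xFF: bytes that text mode may treat specially.
sample = [13, 10, 26, 0, 255].pack('C*')
p roundtrip_via_tempfile(sample) == sample
```

For the curl case, tmp.path could be handed to `curl -o` and the file then read back with File.open( tmp.path, 'rb' ); that variation is untested here.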
     
    Phrogz, Oct 1, 2007
    #1

  2. Phrogz

    Phrogz Guest

    On Oct 1, 10:15 am, Phrogz <> wrote:
    > I have a script that pulls pages from our wiki server. It was working
    > using Net::HTTP and open-uri with basic_authentication, but our
    > sysadmin disabled basic authentication and left NTLM as the only
    > authentication method.
    >
    > Instead of trying to figure out how to use the Ruby NTLM library, I
    > decided to just use curl. It was working nicely for the HTML pages
    > using this form:
    > def fetch_http_ntlm( url )
    > `curl #{url} --ntlm -# -u #{USER}:#{PASS}`
    > end
    >
    > However, the above fails for binary files. (Pulling down images
    > embedded in pages.) So I had to switch it to this:
    > def fetch_http_ntlm( url )
    > file_name = "C:\\tmp_#{Time.new.to_i}"
    > `curl #{url} --ntlm -# -u #{USER}:#{PASS} -o #{file_name}`
    > raw = File.open( file_name, 'rb' ){ |f| f.read }
    > File.delete( file_name )
    > raw
    > end
    >
    > In other words, I have curl write the output to a file, and then read
    > in the file using binary mode, and delete the file.
    >
    > Should I have to do this? Is it a general problem that commands can't
    > cleanly return binary data to the 'console', and hence can't be
    > captured using the above format? Or is curl on Windows at fault, and
    > should be doing something different? Or is Ruby Windows at fault? Or
    > is Windows itself at fault?


    Followup - this does not seem to be a core problem of terminal
    commands returning binary data, or a core failing of Ruby. From my OS
    X box at home:

    Slim2:~/Desktop phrogz$ cat send_bytes.rb
    print [13,7,129,250,0,70,111,111].map{ |b| b.chr }.join

    Slim2:~/Desktop phrogz$ cat get_bytes.rb
    result = `ruby send_bytes.rb`
    p result.length, result

    Slim2:~/Desktop phrogz$ ruby get_bytes.rb
    8
    "\r\a\201\372\000Foo"

    This is also not a problem with curl (at least on *nix):

    Slim2:~/Desktop phrogz$ curl -s -O http://phrogz.net/tmp/gkhead.jpg
    Slim2:~/Desktop phrogz$ irb
    irb(main):001:0> good = IO.read( 'gkhead.jpg' ); good.length
    => 21443
    irb(main):002:0> url = 'http://phrogz.net/tmp/gkhead.jpg'
    => "http://phrogz.net/tmp/gkhead.jpg"
    irb(main):003:0> test = `curl -s #{url}`; test.length
    => 21443
    irb(main):004:0> test == good
    => true

    Tomorrow I'll see which of the above fails back on my Windows box.
    Glad this isn't a fundamental Ruby or shell workflow problem, anyhow.
     
    Phrogz, Oct 2, 2007
    #2

  3. Phrogz

    Phrogz Guest

    On Oct 1, 9:38 pm, Phrogz <> wrote:
    > Followup - this does not seem to be a core problem of terminal
    > commands returning binary data, or a core failing of Ruby. From my OS
    > X box at home:
    >
    > Slim2:~/Desktop phrogz$ cat send_bytes.rb
    > print [13,7,129,250,0,70,111,111].map{ |b| b.chr }.join
    >
    > Slim2:~/Desktop phrogz$ cat get_bytes.rb
    > result = `ruby send_bytes.rb`
    > p result.length, result
    >
    > Slim2:~/Desktop phrogz$ ruby get_bytes.rb
    > 8
    > "\r\a\201\372\000Foo"
    >
    > This is also not a problem with curl (at least on *nix):
    >
    > Slim2:~/Desktop phrogz$ curl -s -O http://phrogz.net/tmp/gkhead.jpg
    > Slim2:~/Desktop phrogz$ irb
    > irb(main):001:0> good = IO.read( 'gkhead.jpg' ); good.length
    > => 21443
    > irb(main):002:0> url = 'http://phrogz.net/tmp/gkhead.jpg'
    > => "http://phrogz.net/tmp/gkhead.jpg"
    > irb(main):003:0> test = `curl -s #{url}`; test.length
    > => 21443
    > irb(main):004:0> test == good
    > => true
    >
    > Tomorrow I'll see which of the above fails back on my Windows box.


    Here are the results from Windows. Binary per se doesn't fail, but
    using it with curl makes it break eventually.

    Any suggestions on how to further pare this down to see if this is a
    Ruby-Windows problem, a Windows shell problem, or a Curl-Windows
    problem?


    c:\>type send_bytes.rb
    print [13,7,129,250,0,70,111,111].map{ |b| b.chr }.join

    c:\>type get_bytes.rb
    result = `ruby send_bytes.rb`
    p result.length, result

    c:\>ruby get_bytes.rb
    8
    "\r\a\201\372\000Foo"


    c:\>curl -s -O http://phrogz.net/tmp/gkhead.jpg

    c:\>irb
    irb(main):001:0> good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }; good.length
    => 21443

    irb(main):002:0> url = 'http://phrogz.net/tmp/gkhead.jpg'
    => "http://phrogz.net/tmp/gkhead.jpg"

    irb(main):003:0> test = `curl -s #{url}`; test.length
    => 2010

    irb(main):008:0> 0.step( test.length, 100 ){ |i|
    irb(main):009:1* range = i...(i+100)
    irb(main):010:1> if good[ range ] != test[ range ]
    irb(main):011:2> p good[ range ], test[ range ], range
    irb(main):012:2> break
    irb(main):013:2> end
    irb(main):014:1> }
    "\000\000\000\004\000\000\000\0008BIM\004\032\006Slices
    \000\000\000\000m
    \000\000\000\006\000\000\000\000\000\000\000\000\000\000\001\276\000\000\001\231\000\000\000\006\000g
    \000k\000h\000e\000a\000d
    \000\000\000\001\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\000\000\001\231\000\000"
    "\000\000\000\004\000\000\000\0008BIM\004$\023\222\vDW$\026\020EG
    \377\320\346\177\335q9}K\236:{5C\357L\026\372\330\251\207\261W>
    \372\301v\346O\222b\373\027/\276p\310\372\351\370\246\036\314\327~
    \366\260\\\t\037\002\236\253\356X\373\267\237\346)\352{\221\221\367I
    \352\177\322\2223z`\227\335W"
    700...800
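
The chunked comparison above narrows the mismatch to a 100-byte window. A byte-level sketch that reports the exact offset could look like the following. It assumes a Ruby with String#getbyte (not the 1.8.6 used here), the helper name is hypothetical, and the two sample strings are only the first eight bytes of each dump shown above.

```ruby
# Return the index of the first byte at which two strings differ,
# or nil when they are byte-for-byte identical.
def first_difference(a, b)
  shorter = [a.bytesize, b.bytesize].min
  shorter.times do |i|
    return i if a.getbyte(i) != b.getbyte(i)
  end
  # No mismatch in the common prefix: they differ only if lengths differ.
  a.bytesize == b.bytesize ? nil : shorter
end

good = "\x00\x00\x00\x04\x00\x00\x00\x00".b
test = "\x00\x00\x00\x04\x00\x00\x00\x00".b
p first_difference(good, test)          # identical prefixes
p first_difference("abc".b, "abd".b)    # differ at the last byte
```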
     
    Phrogz, Oct 2, 2007
    #3
  4. Phrogz

    Phrogz Guest

    OK, so this seems like a Ruby Windows problem:

    C:\>curl -s -O http://phrogz.net/tmp/gkhead.jpg
    C:\>curl -s http://phrogz.net/tmp/gkhead.jpg > test.jpg
    C:\>irb
    irb(main):001:0> good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }; good.length
    => 21443
    irb(main):002:0> test = File.open( 'test.jpg', 'rb' ){ |f| f.read }; test.length
    => 21443
    irb(main):003:0> suck = `curl -s http://phrogz.net/tmp/gkhead.jpg`; suck.length
    => 2010
    => 2010


    good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
    test = `curl -s http://phrogz.net/tmp/gkhead.jpg`

    0.upto( test.length-1 ){ |i|
      if test[ i ] != good[ i ]
        s1 = good[ (i-5)..(i+2) ]
        s2 = test[ (i-5)..(i+2) ]
        p s1, s2
        puts
        [ s1, s2 ].each{ |str|
          puts str.unpack( 'B8'*str.length ).join('|')
        }
        break
      end
    }

    #=> "8BIM\004\032\006S"
    #=> "8BIM\004$\023\222"
    #=>
    #=> 00111000|01000010|01001001|01001101|00000100|00011010|00000110|01010011
    #=> 00111000|01000010|01001001|01001101|00000100|00100100|00010011|10010010


    Windows console can properly redirect binary command output to a file,
    but (after a certain point or certain binary sequence?) Ruby gets
    munged binary data back instead.
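
One observation that may help narrow this down: the first byte that differs in the dump above is \032 (0x1A, Ctrl-Z), which the Windows C runtime treats as end-of-file when a stream is read in text mode. That suggests, though nothing in this thread confirms it, that backticks read the pipe in text mode. A sketch of a capture that asks for binary mode explicitly, assuming a Ruby recent enough to accept the array form of IO.popen (1.8.6 does not) and using a hypothetical helper name:

```ruby
require 'rbconfig'

# Capture a command's stdout as raw bytes by opening the pipe in
# binary mode. `capture_binary` is a hypothetical helper name.
def capture_binary(*command)
  IO.popen(command, 'rb') { |pipe| pipe.read }
end

# Child process: put its own stdout in binary mode, then emit bytes that
# text mode on Windows treats specially (CR, LF, 0x1A), plus NUL and "F".
child = '$stdout.binmode; $stdout.write([13, 10, 26, 0, 70].pack("C*"))'
bytes = capture_binary(RbConfig.ruby, '-e', child)
p bytes.bytesize, bytes
```

The same helper could wrap the curl call, for example capture_binary( 'curl', '-s', url ), which also avoids interpolating the URL into a shell string. Whether this cures the truncation on the Windows setup described here is untested.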

    I'll take this to ruby-core unless someone can point out why this flaw
    isn't Ruby's.
     
    Phrogz, Oct 3, 2007
    #4
  5. Phrogz

    Phrogz Guest

    For my last post on this topic, a simpler test case showing Ruby on OS
    X behaving as expected, and Ruby on Windows...not.

    ====

    Darwin Slim2.local 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23
    16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
    ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-darwin8.9.1]

    Slim2:~/Desktop phrogz$ cat put_bytes.rb
    File.open( 'gkhead.jpg', 'rb' ){ |f| print f.read }

    Slim2:~/Desktop phrogz$ cat get_bytes.rb
    raw_bytes = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
    rcv_bytes = `ruby put_bytes.rb`
    p raw_bytes.length, rcv_bytes.length

    Slim2:~/Desktop phrogz$ ruby get_bytes.rb
    21443
    21443

    ====

    Windows XP SP 2 (Microsoft Windows XP [Version 5.1.2600])
    ruby 1.8.6 (2007-03-13 patchlevel 0) [i386-mswin32] (latest one-click
    installer)

    C:\Documents and Settings\gavin.kistner\Desktop>type put_bytes.rb
    File.open( 'gkhead.jpg', 'rb' ){ |f| print f.read }

    C:\Documents and Settings\gavin.kistner\Desktop>type get_bytes.rb
    raw_bytes = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
    rcv_bytes = `ruby put_bytes.rb`
    p raw_bytes.length, rcv_bytes.length

    C:\Documents and Settings\gavin.kistner\Desktop>ruby get_bytes.rb
    21443
    5159
     
    Phrogz, Oct 4, 2007
    #5
  6. Daniel Sheppard

    > I have a script that pulls pages from our wiki server. It was working
    > using Net::HTTP and open-uri with basic_authentication, but our
    > sysadmin disabled basic authentication and left NTLM as the only
    > authentication method.


    Install http://ntlmaps.sourceforge.net/ and direct Net::HTTP through
    that as a proxy.
     
    Daniel Sheppard, Oct 4, 2007
    #6
  7. Daniel Sheppard

    > good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
    > test = `curl -s http://phrogz.net/tmp/gkhead.jpg`


    I would hazard a guess that if you took that 'b' off of the File.open,
    you'd get the same bytes `` is returning?
     
    Daniel Sheppard, Oct 4, 2007
    #7
  8. Phrogz

    Phrogz Guest

    On Oct 3, 10:06 pm, "Daniel Sheppard" <> wrote:
    > > good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
    > > test = `curl -s http://phrogz.net/tmp/gkhead.jpg`

    >
    > I would hazard a guess that if you took that 'b' off of the File.open,
    > you'd get the same bytes `` is returning?


    I doubt it, but will try when I get into work. My understanding was
    that (on Windows) opening a file without 'b' "helpfully" converts \n
    bytes to \r\n pairs; the 'b' is needed to say "Hey, don't be munging
    my data!".
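
To make the 'b' flag concrete: in Windows text mode, writes turn \n into \r\n and reads turn \r\n back into \n, while 'rb' and 'wb' ask for the bytes untouched. The sketch below writes a payload containing CR, LF and 0x1A, then reads it back both ways so the results can be compared on whichever platform it runs on. The helper and file names are hypothetical, and on Unix-like systems both reads are expected to match.

```ruby
require 'tmpdir'

# Write `payload` in binary mode, then read it back with and without 'b'.
# Returns [binary_read, text_read]. `read_both_ways` is a hypothetical name.
def read_both_ways(payload)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'sample.bin')
    File.open(path, 'wb') { |f| f.write(payload) }
    binary = File.open(path, 'rb') { |f| f.read }
    text   = File.open(path, 'r')  { |f| f.read }   # platform decides
    [binary, text]
  end
end

payload = [13, 10, 10, 26, 65].pack('C*')
binary, text = read_both_ways(payload)
p binary == payload
p binary.bytesize, text.bytesize
```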

    But like I said, I'll give it a shot.
     
    Phrogz, Oct 4, 2007
    #8
  9. Phrogz

    Phrogz Guest

    On Oct 4, 8:03 am, Phrogz <> wrote:
    > > > good = File.open( 'gkhead.jpg', 'rb' ){ |f| f.read }
    > > > test = `curl -s http://phrogz.net/tmp/gkhead.jpg`

    >
    > > I would hazard a guess that if you took that 'b' off of the File.open,
    > > you'd get the same bytes `` is returning?

    >
    > I doubt it, but will try when I get into work. My understanding was
    > that (on Windows) opening a file without 'b' "helpfully" converts \n
    > bytes to \r\n pairs; the 'b' is needed to say "Hey, don't be munging
    > my data!".
    >
    > But like I said, I'll give it a shot.


    OK, so this has nothing to do with reading files from disk. The crazy
    thing is that it isn't even deterministic! See the following:

    C:\>type put_bytes.rb
    print (0..12000).map{ |i| ((i % 255) + 1).chr }.join
    $stdout.flush
    sleep 1
    $stdout.flush

    C:\>type get_bytes.rb
    p `ruby put_bytes.rb`.length

    C:\>type multiget.bat
    @echo off
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb

    C:\>multiget.bat
    944
    696
    944
    1192
    944
    919
    1192
    1192
    944
    944
    1192
    1192
    944
    1167
    1192
    1192
    944
    1192
    1192
    1192

    Note that it also does the above with or without the sleep, and with
    or without the $stdout.flush calls.
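
The batch file can be folded into one script that repeats the capture and lists the distinct lengths it saw. This is only a sketch: it assumes a Ruby with the array form of IO.popen, it opens both ends of the pipe in binary mode (which is not what the backtick version above does), and the method names are made up. The payload is the same 12001 bytes as put_bytes.rb.

```ruby
require 'rbconfig'

# Source for the child process: the same byte pattern as put_bytes.rb,
# written with stdout in binary mode.
def put_bytes_script
  '$stdout.binmode; ' \
  '$stdout.write((0..12000).map { |i| (i % 255) + 1 }.pack("C*"))'
end

# Run the child `runs` times and return the byte count of each capture.
def captured_lengths(runs)
  Array.new(runs) do
    IO.popen([RbConfig.ruby, '-e', put_bytes_script], 'rb') do |pipe|
      pipe.read.bytesize
    end
  end
end

lengths = captured_lengths(5)
p lengths.uniq   # a single value means every run returned the same size
```

If the lengths still vary with binary mode on both ends, text-mode translation is probably not the whole story.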

    What is going on here?!
     
    Phrogz, Oct 4, 2007
    #9
  10. Peña, Botp

    Peña, Botp Guest

    From: Phrogz
    # OK, so this has nothing to do with reading files
    # from disk. The crazy thing is that it isn't even
    # deterministic! See the following:
    # <snip>
    #...
    # What is going on here?!

    can't help you there, but mine has a different yet consistent output...

    C:\family\ruby>type put_bytes.rb
    print (0..12000).map{ |i| ((i % 255) + 1).chr }.join
    $stdout.flush
    sleep 1
    $stdout.flush

    C:\family\ruby>type get_bytes.rb
    p `ruby put_bytes.rb`.length

    C:\family\ruby>type multi_get.bat
    @echo off
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb
    ruby get_bytes.rb

    C:\family\ruby> multi_get.bat
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348
    348

    C:\family\ruby>ver

    Microsoft Windows XP [Version 5.1.2600]

    C:\family\ruby>ruby -v
    ruby 1.8.6 (2007-09-23 patchlevel 110) [i386-mswin32]

    maybe we differ on the patchlevel?

    kind regards -botp
     
    Peña, Botp, Oct 5, 2007
    #10
