binmode and the diamond operator

Discussion in 'Perl Misc' started by J. Romano, Nov 8, 2003.

  1. J. Romano

    J. Romano Guest

    Hi,

    I've had a little Perl problem recently that I've been wondering if
    there is a solution for.

    I'm using ActiveState Perl for Win32, and I need to read in binary
    files. I use the diamond operator in a while loop after setting slurp
    mode (in order to read in the whole file at once). In other words, my
    code looks something like this:

    $/ = undef; # set "slurp" mode

    while (<>) {
        my $fileLen = length $_;
        print "File \"$ARGV\" contains $fileLen bytes.\n";
    }

    With this script, someone could type

    perl script.pl file1 file2 file3

    and get output like:

    File "file1" contains 15 bytes.
    File "file2" contains 21 bytes.
    File "file3" contains 133 bytes.

    Now, I realize that I can find the size of a file by using the -s
    filetest operator, but that's not what I want to do (I just printed
    the file length as an example). Ultimately I want to peek into the
    files and look at the values at specific bytes. But in order to do
    this I have to make sure that the \n\r (or \r\n) combination doesn't
    get converted to one character (I've been burned by this before).
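
    To make the conversion concrete, here is a minimal sketch that writes a
    small CRLF file and reads it back twice: once through Perl's :crlf
    layer (the translation that text mode applies on Win32) and once in
    binary mode. The file contents and the helper name slurp_with_layer
    are made up for illustration, and it assumes Perl 5.8 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write six bytes, "a\r\nb\r\n", with no translation on output.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "a\r\nb\r\n";
close($out) or die "Cannot close \"$path\": $!";

# Read the whole file through the given PerlIO layer and return its contents.
sub slurp_with_layer {
    my ($file, $layer) = @_;
    open(my $in, "<$layer", $file) or die "Cannot read \"$file\": $!";
    local $/;    # slurp mode, restored when the sub returns
    my $data = <$in>;
    close($in);
    return $data;
}

my $text_len = length slurp_with_layer($path, ':crlf');  # each CRLF becomes "\n"
my $raw_len  = length slurp_with_layer($path, ':raw');   # bytes exactly as stored

print "text mode: $text_len characters, binary mode: $raw_len bytes\n";
```

    The text-mode count comes out smaller than the byte count, and every
    offset after the first line ending shifts with it.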

    So I need to set binmode() on these files, but how do I do it with
    the diamond operator? The only immediate solution I can think of is
    to re-write the code so that it opens and closes a filehandle, like
    this:

    foreach my $file (@ARGV) {
        $/ = undef; # set "slurp" mode

        open(FILE, $file) or die "Cannot read \"$file\": $!";
        binmode(FILE);
        $_ = <FILE>;
        close(FILE);

        my $fileLen = length $_;
        print "File \"$file\" contains $fileLen bytes.\n";
    }

    This way I have to add four more lines of code and check if open was
    successful. I can definitely do it this way, but if there is a
    quicker way of using binmode() with the diamond operator, I'd like to
    know about it.
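
    For reference, the explicit loop can be tightened a little. The
    following is only a sketch, assuming Perl 5.8 or later, where
    three-argument open accepts a layer such as :raw and so avoids CRLF
    translation without a separate binmode() call. The helper name
    slurp_raw is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the exact bytes of a file, with no CRLF translation.
sub slurp_raw {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or die "Cannot read \"$file\": $!";
    local $/;                  # slurp mode, restored when the sub returns
    my $data = <$fh>;
    close($fh);
    return $data;
}

foreach my $file (@ARGV) {
    my $fileLen = length slurp_raw($file);
    print "File \"$file\" contains $fileLen bytes.\n";
}
```

    This still opens every file by hand, so it does not help with -n
    one-liners.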

    So, does anyone know if it is possible to set binmode() when using
    the diamond operator (specifically in "slurp" mode)?

    Thanks in advance,

    Jean-Luc
    J. Romano, Nov 8, 2003
    #1

  2. Tad McClellan

    J. Romano wrote:

    > I use the diamond operator in a while loop after setting slurp
    > mode (in order to read in the whole file at once).


    > So I need to set binmode() on these files, but how do I do it with
    > the diamond operator?



    binmode ARGV;


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Nov 8, 2003
    #2

  3. J. Romano

    J. Romano Guest

    Tad McClellan wrote:
    > J. Romano wrote:
    >
    > > I use the diamond operator in a while loop after setting slurp
    > > mode (in order to read in the whole file at once).

    >
    > > So I need to set binmode() on these files, but how do I do it with
    > > the diamond operator?

    >
    > binmode ARGV;


    Thanks for the response, Tad, but it doesn't work. At least, I
    haven't figured out where to put that line to make it work
    correctly. Should I put it before the "while (<>)" loop or inside it?
    I tried both ways out on this small program:

    #!/usr/bin/perl -w
    use strict;

    $/ = undef; # set "slurp" mode

    # binmode(ARGV); # Do I put the binmode() call here...

    while (<>) {
        binmode(ARGV); # ...or do I put it here?
        my $fileLen = -s $ARGV;
        my $numChars = length $_;
        print "File \"$ARGV\" contains $fileLen bytes",
              " and $numChars characters.\n";
    }
    __END__

    When I put "binmode(ARGV)" before the while loop I get the
    following warning:

    binmode() on unopened filehandle ARGV at script.pl line 6.

    and when I put it as the first line of the while loop, the file has
    already been read in before it is affected by the binmode() change.

    Therefore, if I run this script with the name of a one-line text
    file, the number of characters will always be one less than the number
    of bytes (due to the fact that the newline "character" is stored as
    two bytes on Win32), which shows that binmode() is not having the
    effect I wanted.

    One main reason I want binmode() with the diamond operator is that
    I want to use it with the -ne switches, like this:

    perl -lne "BEGIN{$/=undef} print ord substr($_,99,1)" file1 file2

    This one-liner prints out the ASCII value of the hundredth byte of
    file1 and file2. However, if there is a \n\r (or \r\n) before the
    hundredth byte, the offset will be affected and the output will no
    longer be correct.

    So, can I still use binmode() with the diamond operator (or with
    the -n switch)? If I have to use "binmode(ARGV)", where do I place
    it? Do I put it right after I undef the $/ variable, or inside the
    while loop? Or do I put it somewhere else entirely?

    (Keep in mind that I'm using ActiveState Perl on a Win32 machine,
    so setting binmode() really does make a difference in my case.)

    Thanks for any responses.

    -- Jean-Luc
    J. Romano, Nov 9, 2003
    #3
  4. Ben Morrow

    Ben Morrow Guest

    J. Romano wrote:
    > Tad McClellan wrote:
    > > J. Romano wrote:
    > >
    > > > I use the diamond operator in a while loop after setting slurp
    > > > mode (in order to read in the whole file at once).

    > >
    > > > So I need to set binmode() on these files, but how do I do it with
    > > > the diamond operator?

    > >
    > > binmode ARGV;

    >
    > Thanks for the response, Tad, but it doesn't work. At least, I
    > haven't figured out where to put that line to make it work
    > correctly. Should I put it before the "while (<>)" loop or inside it?


    A nice conundrum!

    I can't find any way to make it work with 5.6... if you're using that
    I think you'll have to write the loop 'properly' (ie. not use <> and
    ARGV, but open and then binmode each file yourself), which pretty much
    rules out one-liners. If you are using 5.8 then

    perl -Mopen=IO,:raw -0777nwe'$n += length; END{print "$n\n"}' crlf

    does what you want (this is Unix shell quoting, I'm afraid: you'll
    need to correct it to DOS syntax). The -0777 is equivalent to
    BEGIN{$/=undef}.
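
    Spelled out as a script, that one-liner is roughly the following. This
    is a sketch assuming Perl 5.8 or later: "local $/" stands in for
    -0777, the while (<>) loop stands in for -n, and the sub name
    total_bytes is made up so the counting can be reused.

```perl
#!/usr/bin/perl -w
use strict;
use open IO => ':raw';   # default layer for handles opened in this file,
                         # including the ones <> opens for each argument

# Sum the sizes of the named files as seen through <> in slurp mode.
sub total_bytes {
    my @files = @_;
    return 0 unless @files;   # avoid falling back to STDIN
    local @ARGV = @files;     # <> works through whatever is in @ARGV
    local $/;                 # slurp mode, same effect as -0777
    my $n = 0;
    while (<>) {
        $n += length;         # length of $_, one whole file per iteration
    }
    return $n;
}

print total_bytes(@ARGV), "\n" if @ARGV;
```

    Because the pragma is lexically scoped, it has to be in effect where
    the <> appears, which is why it sits at the top of the file here.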

    Ben

    --
    I've seen things you people wouldn't believe: attack ships on fire off the
    shoulder of Orion; I've watched C-beams glitter in the darkness near the
    Tannhauser Gate. All these moments will be lost, in time, like tears in rain.
    Time to die. |-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-|
    Ben Morrow, Nov 9, 2003
    #4
  5. J. Romano

    J. Romano Guest

    > > > > So I need to set binmode() on these files, but how do I do it with
    > > > > the diamond operator?
    > > >
    > > > binmode ARGV;

    > >
    > > Thanks for the response, Tad, but it doesn't work. At least, I
    > > haven't figured out where to put that line to make it work
    > > correctly. Should I put it before the "while (<>)" loop or inside it?

    >
    > A nice conundrum!
    >
    > I can't find any way to make it work with 5.6... if you're using that
    > I think you'll have to write the loop 'properly' (ie. not use <> and
    > ARGV, but open and then binmode each file yourself), which pretty much
    > rules out one-liners.


    I tried to find a solution to this problem, and I found a little
    workaround. For those of you who don't remember, I was trying to get
    the following code to use the diamond operator in binary mode (on
    Win32 platforms) so my \r\n or \n\r combinations wouldn't get
    converted to one character:

    #!/usr/bin/perl -w
    use strict;
    $/ = undef; # set "slurp" mode
    while (<>) {
        my $fileLen = -s $ARGV;
        my $numChars = length $_;
        print "File \"$ARGV\" contains $fileLen bytes",
              " and $numChars characters.\n";
    }
    __END__

    This code, when run on Win32 platforms with text file names as
    parameters, reports that there are more bytes to the file than
    characters. It says this because it considers the \r\n and \n\r
    combinations (that occur at newlines) as one character.

    My problem is that I wanted to stop this behavior by calling
    binmode on the filehandle, so that the number of characters reported
    would be the same as the number of bytes. But the diamond operator
    opens its filehandle (ARGV) implicitly, one file at a time! So how do
    you tell the diamond operator to open files in binmode?

    The obvious answer, "binmode(ARGV);", didn't work. If called
    before the while loop containing the diamond operator, a warning would
    appear stating that binmode is being called on an unopened filehandle.
    And calling it as the first line of the while loop is too late, since
    the file has already been read (in text mode) into $_.

    The only solution I could think of at the time was to write out the
    program the long way, looping through @ARGV and opening the files
    (and setting them with binmode) individually. But, as Ben Morrow
    said, this pretty much rules out one-liners.

    Well, like I said above, I found a little workaround. Instead of
    adding the line:

    binmode(ARGV);

    I add the following line as the first line of my while loop:

    binmode(ARGV), seek(ARGV,0,0), next if $a = !$a;

    so the above script now looks like:

    #!/usr/bin/perl -w
    use strict;
    $/ = undef; # set "slurp" mode
    while (<>) {
        binmode(ARGV), seek(ARGV,0,0), next if $a = !$a;
        my $fileLen = -s $ARGV;
        my $numChars = length $_;
        print "File \"$ARGV\" contains $fileLen bytes",
              " and $numChars characters.\n";
    }
    __END__

    Do you see what it's doing? When it reads a file in for the first
    time, it sets the filehandle to binary mode, then rewinds the file
    pointer and repeats the loop. The if condition (if $a = !$a) forces
    this statement to get executed only on every other loop (otherwise it
    would be an infinite loop).

    This solution definitely works, but it's obvious that it's not
    super-efficient since every file is read twice. If I really wanted to
    make it more efficient, I could set $/ to equal a reference to 1 (so
    only one byte is read), then set binmode, rewind the pointer, AND set
    $/ to undef before restarting the loop. That way only the first byte
    would be re-read. Of course, I would have to reset $/ to a reference
    to 1 at the end of the loop before the next file is read.
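
    A minimal sketch of that one-byte variation might look like this. It
    has only been reasoned through, not tried on Win32. The sub name
    slurp_each_binary is made up, and "ref $/" stands in for the
    $a = !$a toggle: it is true only while $/ still holds the reference
    to 1.

```perl
#!/usr/bin/perl -w
use strict;

# Slurp each named file in binary mode via <>, re-reading only one byte.
# Returns a list of [name, contents] pairs.
sub slurp_each_binary {
    my @names = @_;
    return unless @names;           # avoid falling back to STDIN
    local @ARGV = @names;
    local $/ = \1;                  # fixed-size records of one byte
    my @files;
    while (<>) {
        if (ref $/) {               # first read of a file: one byte, text mode
            binmode(ARGV);
            seek(ARGV, 0, 0);       # rewind so that byte is read again
            $/ = undef;             # the next read slurps the whole file
            next;
        }
        push @files, [$ARGV, $_];
        $/ = \1;                    # back to one-byte reads for the next file
    }
    return @files;
}

for my $pair (slurp_each_binary(@ARGV)) {
    my ($name, $data) = @$pair;
    print "File \"$name\" contains ", length($data), " bytes.\n";
}
```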

    (Some people might point out that I could set $/ to a reference to
    0 so I wouldn't have to re-read any bytes at all. Well, I already
    tried this and it seems like doing so causes the diamond operator to
    read in the entire pseudo-file all at once (in other words, instead of
    reading one file at a time, it reads all the files at once and puts
    all their contents into $_ as one long string). I tried to find some
    documentation that covered this, but I couldn't find any. I'm curious
    to know if this is normal behavior.)

    But if I use this more efficient solution of only re-reading one
    byte, then it's almost more trouble than it's worth, and difficult to
    remember for one-liners. So I'll probably stick to the solution:

    binmode(ARGV), seek(ARGV,0,0), next if $a = !$a;

    for one-liners if I'm too lazy to open the files individually.

    It's kind of a strange solution, isn't it? At least it works.

    -- Jean-Luc
    J. Romano, Nov 16, 2003
    #5
  6. Steve Grazzini

    J. Romano wrote:
    > So how do you tell the diamond operator to open files in binmode?


    use open IN => ':raw'; # [ untested ]

    --
    Steve
    Steve Grazzini, Nov 16, 2003
    #6
  7. J. Romano

    J. Romano Guest

    Steve Grazzini <> wrote in message news:<RZAtb.38543$>...
    > J. Romano <> wrote:
    > > So how do you tell the diamond operator to open files in binmode?

    >
    > use open IN => ':raw';


    Hey, thanks, Steve! That works perfectly! Now the following code
    reports the same number of bytes and characters on files on Win32
    platforms:

    #!/usr/bin/perl -w
    use strict;
    use open IN => ':raw';
    $/ = undef; # set "slurp" mode

    while (<>) {
        my $fileLen = -s $ARGV;
        my $numChars = length $_;
        print "File \"$ARGV\" contains $fileLen bytes",
              " and $numChars characters.\n";
    }
    __END__
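
    The same pragma also covers the original goal of looking at a specific
    byte. Here is a sketch along those lines, assuming Perl 5.8 or later.
    The sub name byte_at is made up, and offset 99 (the hundredth byte) is
    taken from the earlier one-liner.

```perl
#!/usr/bin/perl -w
use strict;
use open IN => ':raw';    # files opened by <> are read without CRLF translation

# Return the numeric value of the byte at $offset in each file, keyed by
# file name. Files too short to have that byte map to undef.
sub byte_at {
    my ($offset, @files) = @_;
    return unless @files;          # avoid falling back to STDIN
    local @ARGV = @files;
    local $/;                      # slurp mode
    my %value;
    while (<>) {
        $value{$ARGV} = $offset < length($_)
                      ? ord(substr($_, $offset, 1))
                      : undef;
    }
    return %value;
}

my %byte = byte_at(99, @ARGV);
for my $file (sort keys %byte) {
    my $shown = defined $byte{$file} ? $byte{$file} : 'n/a';
    print "$file: $shown\n";
}
```

    With the :raw layer in place, a \r\n earlier in the file no longer
    shifts the offset.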

    Thanks again!

    -- Jean-Luc
    J. Romano, Nov 17, 2003
    #7