&read_file in File::Slurp terminates unexpectedly on file

Discussion in 'Perl Misc' started by Charles R. Thompson, Jan 12, 2004.

  1. I'm working through the conversion of some fixed-length record files with
    extended ASCII data and a series of characters in some of the files
    appears to be causing read_file to assume it's at the end of the file. These
    won't translate in the various readers so I'll notate. My hex editor says
    the last characters where it terminates are:

    00 C7 07 CA 1A 29 00

    I see everything up to the 1A (Decimal 26), meaning I can see the CA as the
    end. According to an ASCII chart I found online 1A is the 'substitute'
    character.

    Is there a method in Perl I can use to ensure an entire file is read so I
    can read every character without incident?

    Charles
    Charles R. Thompson, Jan 12, 2004
    #1

  2. Charles R. Thompson

    Ben Morrow Guest

    "Charles R. Thompson" <> wrote:
    > I'm working through the conversion of some fixed-length record files with
    > extended ASCII data and a series of characters in some of the files
    > appears to be causing read_file to assume it's at the end of the file. These
    > won't translate in the various readers so I'll notate. My hex editor says
    > the last characters where it terminates are:
    >
    > 00 C7 07 CA 1A 29 00
    >
    > I see everything up to the 1A (Decimal 26), meaning I can see the CA as the
    > end. According to an ASCII chart I found online 1A is the 'substitute'
    > character.
    >
    > Is there a method in Perl I can use to ensure an entire file is read so I
    > can read every character without incident?


    Have you called binmode() on the filehandle concerned?

    Ben

    --
    If I were a butterfly I'd live for a day, / I would be free, just blowing away.
    This cruel country has driven me down / Teased me and lied, teased me and lied.
    I've only sad stories to tell to this town: / My dreams have withered and died.
    <=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>=<=> (Kate Rusby)
    Ben Morrow, Jan 12, 2004
    #2

  3. Charles R. Thompson

    Jay Tilton Guest

    "Charles R. Thompson" <> wrote:

    : My hex editor says
    : the last characters where it terminates are:
    :
    : 00 C7 07 CA 1A 29 00
    :
    : I see everything up to the 1A (Decimal 26), meaning I can see the CA as the
    : end. According to an ASCII chart I found online 1A is the 'substitute'
    : character.

    On DOS-ish filesystems, character 0x1A marks the end-of-file when
    reading the file as text.

    : Is there a method in Perl I can use to ensure an entire file is read so I
    : can read every character without incident?

    binmode() the filehandle. This will screw up the normal CRLF
    translation, but that's easily remedied.
    Jay Tilton, Jan 12, 2004
    #3
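
A minimal sketch of the binmode() advice above, using only core Perl. The helper name slurp_binary and the temporary file are illustrative assumptions, not part of the thread; the byte sequence is the one quoted from the hex editor.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the thread): read a whole file as raw bytes.
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # turn off text-mode handling (CRLF translation, Ctrl-Z as EOF)
    local $/;        # undefined record separator: read everything in one go
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Write the bytes from the original post to a scratch file, in binary mode.
my $bytes = "\x00\xC7\x07\xCA\x1A\x29\x00";
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} $bytes;
close($tmp_fh);

# Read it back and report how much arrived.
my $read = slurp_binary($path);
printf "read %d of %d bytes\n", length($read), length($bytes);
```

On Unix-like systems binmode() makes no difference for this data, so the truncation only shows up on Windows when the binmode() call is left out.
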
  4. Charles R. Thompson

    Trent Curry Guest

    Jay Tilton wrote:
    > "Charles R. Thompson" <> wrote:
    >
    >> My hex editor says
    >> the last characters where it terminates are:
    >>
    >> 00 C7 07 CA 1A 29 00
    >>
    >> I see everything up to the 1A (Decimal 26), meaning I can see the CA
    >> as the end. According to an ASCII chart I found online 1A is the
    >> 'substitute' character.

    >
    > On DOS-ish filesystems, character 0x1A marks the end-of-file when
    > reading the file as text.


    Yes, on my WinXP Pro system, if I insert 0x1A (Ctrl + Z) in the middle of a
    file and read it without binmode(), it gets cut off there.

    Just FYI, the same is not true in a Unix/Linux based environment. Inserting
    0x04 (Ctrl + D) or 0x03 (Ctrl + C) characters into the file does not
    prevent reading to the end. It is my understanding that this is a Win32
    quirk (at least NT based; I have no Win9x/ME systems to check with).

    >> Is there a method in Perl I can use to ensure an entire file is read
    >> so I can read every character without incident?

    >
    > binmode() the filehandle. This will screw up the normal CRLF
    > translation, but that's easily remedied.


    It still reads just fine, but if you want the end result to be just \n
    (LF) instead of \r\n (CRLF), a simple

    $line =~ s!\r\n!\n!g;

    for each line oughtta do it.

    Or one better:

    $line =~ s!\r\n|\r!\n!g;

    (Or if you don't mind reading the whole file into memory:)

    local $/ = undef;
    (my $file = <SOMEFILE>) =~ s!\r\n|\r!\n!g;

    Though if you know the file will be large, line by line is best suited, and
    is usually the way to go in most cases.

    --
    Trent Curry

    perl -e
    '($s=qq/e29716770256864702379602c6275605/)=~s!([0-9a-f]{2})!pack("h2",$1
    )!eg;print(reverse("$s")."\n");'
    Trent Curry, Jan 13, 2004
    #4
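
The newline clean-up described above can be sketched as a small function. The name normalize_newlines and the sample string are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: convert CRLF and bare CR line endings to LF.
sub normalize_newlines {
    my ($text) = @_;
    # \r\n is listed first so a CRLF pair becomes one \n, not two.
    $text =~ s/\r\n|\r/\n/g;
    return $text;
}

my $raw   = "one\r\ntwo\rthree\n";
my $clean = normalize_newlines($raw);
print $clean;
```

This only makes sense for text. In fixed-length binary records a 0x0D byte may be ordinary data, and rewriting it would shift every field after it, so it is probably safer to skip this step for such files.
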
  5. Charles R. Thompson

    Uri Guttman Guest

    >>>>> "BM" == Ben Morrow <> writes:

    BM> "Charles R. Thompson" <> wrote:

    >> Is there a method in Perl I can use to ensure an entire file is read so I
    >> can read every character without incident?


    BM> Have you called binmode() on the filehandle concerned?

    and you can enable binmode when using File::Slurp (a recent
    version). the older module couldn't set binmode, nor could it read
    from an already open handle that had binmode called on it.

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Jan 13, 2004
    #5
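
A hedged sketch of the File::Slurp route mentioned above. It relies on the binmode option of read_file documented in recent File::Slurp releases. File::Slurp is not a core module, so the sketch falls back to a plain open plus binmode() when it is not installed. The scratch file and variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file holding the bytes from the original post.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\x00\xC7\x07\xCA\x1A\x29\x00";
close($tmp_fh);

my $data;
if (eval { require File::Slurp; 1 }) {
    # Recent File::Slurp: ask read_file to apply binmode itself.
    $data = File::Slurp::read_file($path, binmode => ':raw');
}
else {
    # Fallback when File::Slurp is unavailable: core open + binmode.
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    local $/;
    $data = <$fh>;
    close($fh);
}

printf "got %d bytes\n", length($data);
```
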
  6. > >> Is there a method in Perl I can use to ensure an entire file is read so I
    > >> can read every character without incident?


    > and you can enable binmode when using File::Slurp (a recent
    > version). the older module couldn't do binmode nor an already open
    > handle that had binmode called on it.


    I am using an older version, you are correct. I found the binmode answer
    earlier after searching more on "1A" and Perl. I have to say that after that
    search I found an alarming number of posts with my same problem. I had
    previously gone to the FAQs first and tried all the examples under "How can
    I read in an entire file all at once?" hoping one of them provided a clue,
    no dice.

    Even though this appears to be Windows specific, I think including a note on
    binmode in that particular FAQ section would be very beneficial. Not a cop
    out... I fully realize now that searching a bit more with some specifics would
    have gotten my answer, but I also wouldn't have spent my time and others'
    here if it were in the FAQs.

    Just a thought.

    Charles
    Charles R. Thompson, Jan 13, 2004
    #6
  7. You need to use the binmode function, or the three-argument open with the :raw layer, or sysopen with O_BINARY.

    --
    Cheers,
    Ben Liddicott

    "Charles R. Thompson" <> wrote in message news:...

    > 00 C7 07 CA 1A 29 00
    >
    > I see everything up to the 1A (Decimal 26), meaning I can see the CA as the
    > end. According to an ASCII chart I found online 1A is the 'substitute'
    > character.
    >
    > Is there a method in Perl I can use to ensure an entire file is read so I
    > can read every character without incident?
    Ben Liddicott, Jan 13, 2004
    #7
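
As a sketch of the open-time alternative to calling binmode() afterwards: with Perl 5.8 or later, three-argument open accepts an I/O layer in the mode string, and the :raw layer requests binary reading. The scratch file and variable names are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file containing a Ctrl-Z (0x1A) in the middle of the data.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "before\x1Aafter";
close($tmp_fh);

# Request binary mode at open time through the :raw layer.
open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
my $content = do { local $/; <$fh> };    # slurp the whole file
close($fh);

printf "read %d bytes\n", length($content);
```
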
