Carriage Return / Line Feed question

Discussion in 'Perl Misc' started by Chris Kolosiwsky, Jul 11, 2003.

  1. <de-lurk>
    Hello all,

    Given the script listed below:

    LINE: while (<>)
    {
        while (! m/.+?<endad>/) {
            if (m/<cat:(\d+)>/) {
                $cat = $1;
            }
            if ($cat =~ /^14/) {
                if (m/(>(.+?)<endad>)/) {
                    print $cat . "\|" . $2 . "\n";
                }
            }
            next LINE;
        }
    }

    and the data format as:

    <cat:nnnnn>
    <some useless discarded text>
    <logo:>TEXT that I want to keep<endad>

    (each line is separate)

    and this is the expected output:

    nnnnn|TEXT that I want to keep


    Is there any reason that this script should work fine on files that
    use \x0d\x0a between lines, but not on files that use just \x0d?

    The script gives the expected output in the CR/LF scenario, but in the
    CR case, I get nothing.

    I'm exceptionally sorry if this is listed in the faq, but a perldoc -q
    "carriage return" returned zip.

    TIA

    Chris

    <re-lurk>
    Chris Kolosiwsky, Jul 11, 2003
    #1

  2. On Fri, 11 Jul 2003, Chris Kolosiwsky <phignutonatnospampleasehotmailcom> wrote:
    ><de-lurk>
    > Hello all,
    >
    > Given the script listed below:
    >
    > LINE: while (<>)
    > {
    > while (! m/.+?<endad>/){
    > if(m/<cat:(\d+)>/) {
    > $cat = $1;
    > }
    > if($cat =~ /^14/) {
    > if(m/(>(.+?)<endad>)/) {
    > print $cat . "\|" . $2 ."\n";
    >
    > }
    > }
    > next LINE;
    > }
    > }
    >
    > and the data format as:
    >
    ><cat:nnnnn>
    ><some useless discarded text>
    ><logo:>TEXT that I want to keep<endad>
    >
    > (each line is separate)
    >
    > and this is the expected output:
    >
    > nnnnn|TEXT that I want to keep
    >
    >
    > Is there any reason that this script should function fine in files
    > that use a \x0d\x0a between lines instead of just a \x0d?


    It depends what OS the script is running on. An OS that expects \x0d\x0a
    for line endings (DOS/Win) is not going to recognize just \x0d (old Mac)
    as a line ending. An OS that uses \x0a for line endings would not
    recognize \x0d as a line ending and may give unexpected results with
    \x0d\x0a line endings.

    So you should either convert the data's line endings to the proper type
    for the OS the script is running on, or set $/ to whatever you expect the
    actual line endings to be (see: perldoc perlvar).
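
    As a minimal sketch of the second option, assuming the input really is
    CR-only: the helper name read_records and the sample data are made up for
    illustration, and the in-memory filehandle needs Perl 5.8 or later.

```perl
use strict;
use warnings;

# Read records from a filehandle using an explicit record separator.
# $sep is the terminator expected in the data, e.g. "\x0d" for CR-only files.
sub read_records {
    my ($fh, $sep) = @_;
    local $/ = $sep;      # changes readline and chomp only until this sub returns
    my @records;
    while (my $line = <$fh>) {
        chomp $line;      # chomp strips the current value of $/
        push @records, $line;
    }
    return @records;
}

# Hypothetical sample data with CR-only line endings, read from memory.
my $data = "<cat:14001>\x0d<junk>\x0d<logo:>TEXT that I want to keep<endad>\x0d";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @records = read_records($fh, "\x0d");
close $fh;

print scalar(@records), " records\n";   # prints "3 records"
```

    Using local keeps the change to $/ confined to the helper, so other reads
    in the same program still see the default separator.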

    > The script gives the expected output in the CR/LF scenario, but in the
    > CR case, I get nothing.


    Because no line endings were found, all the data ended up in one long
    line, which breaks your regexes.
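
    A small sketch of that effect, with made-up sample data and a
    hypothetical helper count_records: the same CR-only text is read once
    with LF as the separator and once with CR.

```perl
use strict;
use warnings;

# Count how many records readline yields for $data when $/ is set to $sep.
sub count_records {
    my ($data, $sep) = @_;
    local $/ = $sep;
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    my @records = <$fh>;   # list context: read every record at once
    close $fh;
    return scalar @records;
}

# Hypothetical CR-only input: there is no \x0a anywhere in it.
my $cr_only = "<cat:14001>\x0d<junk>\x0d<logo:>TEXT<endad>\x0d";

printf "separator LF: %d record(s)\n", count_records($cr_only, "\x0a");   # 1
printf "separator CR: %d record(s)\n", count_records($cr_only, "\x0d");   # 3
```

    With LF as the separator the whole file arrives in $_ as a single record,
    so any logic that assumes one tag per line sees everything at once.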

    > I'm exceptionally sorry if this is listed in the faq, but a perldoc -q
    > "carriage return" returned zip.
    >
    > TIA
    >
    > Chris
    >
    ><re-lurk>


    --
    David Efflandt - All spam ignored http://www.de-srv.com/
    David Efflandt, Jul 12, 2003
    #2

  3. <original post -- 'snip'>

    On Sat, 12 Jul 2003 00:06:58 +0000, David Efflandt wrote:

    > It depends what OS the script is running on. An OS that expects \x0d\x0a
    > for line endings (DOS/Win) is not going to recognize just \x0d (old Mac)
    > as a line ending. An OS that uses \x0a for line endings would not
    > recognize \x0d as a line ending and may give unexpected results with
    > \x0d\x0a line endings.
    >
    > So you should either convert data line endings to proper type for the OS
    > the script is running on or, set $/ to whatever you expect actual line
    > endings to be (see: perldoc perlvar).
    >


    I should have included this in the initial post, but the text file is
    generated on a Solaris machine and the script is being run from a Linux
    box using Perl 5.8. When the file was FTP'd to a DOS box, the ASCII
    transfer converted the CR to CR/LF, but that was to the DOS box. Another
    file with only a CR, transferred via FTP in ASCII mode (but not to a DOS
    machine) and processed on the same Linux box, resulted in no output. A
    hex dump of the first (DOS FTP) file shows the CR/LF and a hex dump of
    the second file (unix -> linux FTP) shows only a CR.
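
    For anyone who wants the same check without a hex dump, here is a rough
    sketch in Perl. The helper name count_line_endings and the sample string
    are made up for illustration.

```perl
use strict;
use warnings;

# Count each line-ending style in a string of raw bytes.
# Returns a hash with the keys crlf, cr (lone CR) and lf (lone LF).
sub count_line_endings {
    my ($bytes) = @_;
    my %count;
    $count{crlf} = () = $bytes =~ /\x0d\x0a/g;        # CR immediately followed by LF
    $count{cr}   = () = $bytes =~ /\x0d(?!\x0a)/g;    # CR not followed by LF
    $count{lf}   = () = $bytes =~ /(?<!\x0d)\x0a/g;   # LF not preceded by CR
    return %count;
}

# Hypothetical sample containing one ending of each style.
my %seen = count_line_endings("one\x0d\x0atwo\x0dthree\x0a");
printf "CRLF=%d CR=%d LF=%d\n", @seen{qw(crlf cr lf)};   # CRLF=1 CR=1 LF=1
```

    On a real file, open it with the '<:raw' layer and slurp it (local $/;)
    before passing the contents in, so no layer translates the bytes first.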

    I will try setting $/ and update. Thanks!

    >> The script gives the expected output in the CR/LF scenario, but in the
    >> CR case, I get nothing.

    >
    > Because no line endings were found and the data all ended up in one long
    > line, therefore, breaking your regex's.
    >


    I had pretty much figured that this is what was happening (although it
    took me pretty much a whole day to hash it out... Ick.)

    Thanks

    Chris
    Chris Kolosiwsky, Jul 12, 2003
    #3
  4. On Sat, 12 Jul 2003, Chris Kolosiwsky
    <phignuton@_nospamplease_hotmail.com> wrote:
    ><original post -- 'snip'>
    >
    > On Sat, 12 Jul 2003 00:06:58 +0000, David Efflandt wrote:
    >
    >> It depends what OS the script is running on. An OS that expects \x0d\x0a
    >> for line endings (DOS/Win) is not going to recognize just \x0d (old Mac)
    >> as a line ending. An OS that uses \x0a for line endings would not
    >> recognize \x0d as a line ending and may give unexpected results with
    >> \x0d\x0a line endings.
    >>
    >> So you should either convert data line endings to proper type for the OS
    >> the script is running on or, set $/ to whatever you expect actual line
    >> endings to be (see: perldoc perlvar).
    >>

    >
    > I should have included this in the initial post, but the text file is
    > generated on a solaris machine and the script is being run from a linux
    > box using perl 5.8. When the file was ftp'd to a DOS box, the ascii
    > transfer converted the CR to CR/LF but that was to the DOS box. Another
    > file with only a CR (still running on a linux box) transferred via FTP
    > ascii (but not to a DOS machine) resulted in no output. A hex dump of the
    > first (DOS FTP) file shows the CR/LF and a hex dump of the second file
    > (unix -> linux FTP) shows only a CR.


    What generated the data with CRs in it? Both Solaris and Linux use LF
    for newlines in text files. If you transfer files directly between
    Solaris and Linux, ASCII or binary mode does not matter because no
    conversion is necessary (I typically use scp). If it passes through
    Windows, use ASCII mode both to and from Windows. I think only pre-OS X
    Mac uses CR only for line endings.

    > I will try setting $/ and update. Thanks!


    Maybe you need to look at what generates the data in the first place and
    see if it is malformed (if it is Perl it should be using "\n" for
    newlines). But note that data from web form textareas may contain CR-LF
    pairs regardless of browser OS.
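
    If the source of the data cannot be fixed, one option is to normalize
    whatever arrives before parsing it. This is only a sketch: the helper
    name normalize_newlines is hypothetical, and it assumes a lone CR in the
    data is always meant as a line ending.

```perl
use strict;
use warnings;

# Convert CRLF pairs and lone CRs to LF; existing lone LFs are left alone.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\x0d\x0a?/\x0a/g;   # CR optionally followed by LF becomes LF
    return $text;
}

# Hypothetical input mixing all three conventions.
my $mixed = "<cat:14001>\x0d\x0a<junk>\x0d<logo:>TEXT<endad>\x0a";
my @lines = split /\x0a/, normalize_newlines($mixed);
print scalar(@lines), " lines\n";   # prints "3 lines"
```

    After this step the default $/ of "\n" works on a Unix box, so the
    parsing loop does not need to know where the file came from.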

    --
    David Efflandt - All spam ignored http://www.de-srv.com/
    http://www.autox.chicago.il.us/ http://www.berniesfloral.net/
    http://cgi-help.virtualave.net/ http://hammer.prohosting.com/~cgi-wiz/
    David Efflandt, Jul 13, 2003
    #4
