Carriage Return / Line Feed question

Chris Kolosiwsky

<de-lurk>
Hello all,

Given the script listed below:

LINE: while (<>)
{
    while (! m/.+?<endad>/) {
        if (m/<cat:(\d+)>/) {
            $cat = $1;
        }
        if ($cat =~ /^14/) {
            if (m/(>(.+?)<endad>)/) {
                print $cat . "\|" . $2 . "\n";
            }
        }
        next LINE;
    }
}

and the data format as:

<cat:nnnnn>
<some useless discarded text>
<logo:>TEXT that I want to keep<endad>

(each line is separate)

and this is the expected output:

nnnnn|TEXT that I want to keep


Is there any reason that this script should function fine in files
that use a \x0d\x0a between lines instead of just a \x0d?

The script gives the expected output in the CR/LF scenario, but in the
CR case, I get nothing.

I'm exceptionally sorry if this is listed in the faq, but a perldoc -q
"carriage return" returned zip.

TIA

Chris

<re-lurk>
 
David Efflandt

Is there any reason that this script should function fine in files
that use a \x0d\x0a between lines instead of just a \x0d?

It depends on what OS the script is running on. An OS that expects \x0d\x0a
for line endings (DOS/Win) is not going to recognize just \x0d (old Mac)
as a line ending. An OS that uses \x0a for line endings would not
recognize \x0d as a line ending and may give unexpected results with
\x0d\x0a line endings.

So you should either convert the data's line endings to the proper type for
the OS the script is running on, or set $/ to whatever you expect the actual
line endings to be (see: perldoc perlvar).
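
As a minimal sketch of the second option, the snippet below sets $/ to a bare CR while reading. The sample data and the in-memory filehandle are assumptions made so the example runs without an input file; they are not the poster's real data.

```perl
use strict;
use warnings;

# Hypothetical sample data: three records separated by a bare CR (\x0d).
my $data = "<cat:14001>\x0d"
         . "<some useless discarded text>\x0d"
         . "<logo:>TEXT that I want to keep<endad>\x0d";

# Read from an in-memory filehandle so no input file is needed.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    # Treat CR as the input record separator, only inside this block.
    local $/ = "\x0d";
    while (my $line = <$fh>) {
        chomp $line;    # chomp strips the current value of $/
        push @records, $line;
    }
}
close $fh;

print scalar(@records), " records\n";    # prints "3 records"
print "$_\n" for @records;
```

Using local keeps the change to $/ from leaking into other code that still expects "\n" as the separator.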

The script gives the expected output in the CR/LF scenario, but in the
CR case, I get nothing.

Because no line endings were found and the data all ended up in one long
line, which breaks your regexes.
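
A quick way to see this effect, sketched with made-up sample data in the same shape as the original post: with the default $/ ("\n"), a CR-only input comes back from the read loop as a single record.

```perl
use strict;
use warnings;

# Hypothetical CR-only data; there is no \x0a anywhere in it.
my $data = "<cat:14001>\x0d"
         . "<some useless discarded text>\x0d"
         . "<logo:>TEXT that I want to keep<endad>\x0d";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# $/ is left at its default ("\n"), so nothing splits the input.
my $count = 0;
my $saw_both = 0;
while (my $line = <$fh>) {
    $count++;
    # The <cat:...> tag and <endad> now sit in the same record.
    $saw_both = 1 if $line =~ /<cat:\d+>/ && $line =~ /<endad>/;
}
close $fh;

print "$count record(s) read\n";    # prints "1 record(s) read"
print "cat tag and endad share one record\n" if $saw_both;
```

Any per-line logic that assumes <cat:...> and <endad> arrive in different records will see them together here.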
 
Chris Kolosiwsky

<original post -- 'snip'>

It depends on what OS the script is running on. An OS that expects \x0d\x0a
for line endings (DOS/Win) is not going to recognize just \x0d (old Mac)
as a line ending. An OS that uses \x0a for line endings would not
recognize \x0d as a line ending and may give unexpected results with
\x0d\x0a line endings.

So you should either convert the data's line endings to the proper type for
the OS the script is running on, or set $/ to whatever you expect the actual
line endings to be (see: perldoc perlvar).

I should have included this in the initial post, but the text file is
generated on a Solaris machine and the script is being run from a Linux
box using Perl 5.8. When the file was FTP'd to a DOS box, the ASCII
transfer converted the CR to CR/LF, but that was only on the DOS box.
Another file with only a CR, transferred via FTP in ASCII mode (but not to
a DOS machine) and processed on the same Linux box, resulted in no output.
A hex dump of the first (DOS FTP) file shows the CR/LF and a hex dump of
the second file (Unix -> Linux FTP) shows only a CR.
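
For anyone who would rather check this from Perl than from a hex dump, here is a rough sketch that counts each kind of line ending in a string. The helper name count_line_endings and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count CRLF pairs, bare CRs, and bare LFs in a string.
sub count_line_endings {
    my ($text) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    # \x0d\x0a is tried first so a CRLF pair is not counted as CR plus LF.
    while ($text =~ /(\x0d\x0a|\x0d|\x0a)/g) {
        if    ($1 eq "\x0d\x0a") { $count{crlf}++ }
        elsif ($1 eq "\x0d")     { $count{cr}++ }
        else                     { $count{lf}++ }
    }
    return \%count;
}

# Made-up sample containing one ending of each kind.
my $sample = "a\x0d\x0ab\x0dc\x0a";
my $c = count_line_endings($sample);
printf "CRLF=%d CR=%d LF=%d\n", @{$c}{qw(crlf cr lf)};    # CRLF=1 CR=1 LF=1
```

To run it on a real file, the contents would need to be read in one piece (for example with $/ undefined) after calling binmode on the handle, so that no I/O layer translates the bytes first.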

I will try setting $/ and update. Thanks!

Because no line endings were found and the data all ended up in one long
line, which breaks your regexes.

I had pretty much figured that this is what was happening (although it
took me pretty much a whole day to hash it out... Ick.)

Thanks

Chris
 
David Efflandt

<original post -- 'snip'>



I should have included this in the initial post, but the text file is
generated on a Solaris machine and the script is being run from a Linux
box using Perl 5.8. When the file was FTP'd to a DOS box, the ASCII
transfer converted the CR to CR/LF, but that was only on the DOS box.
Another file with only a CR, transferred via FTP in ASCII mode (but not to
a DOS machine) and processed on the same Linux box, resulted in no output.
A hex dump of the first (DOS FTP) file shows the CR/LF and a hex dump of
the second file (Unix -> Linux FTP) shows only a CR.

What generated the data with CRs in it? Both Solaris and Linux use LF
for newlines in text files. If you transfer files directly between
Solaris and Linux, ASCII or binary mode does not matter because no
conversion is necessary (I typically use scp). If it passes through
Windows, use ASCII mode both to and from Windows. I think only pre-OS X
Macs use CR alone for line endings.

I will try setting $/ and update. Thanks!

Maybe you need to look at what generates the data in the first place and
see if it is malformed (if it is Perl it should be using "\n" for
newlines). But note that data from web form textareas may contain CR-LF
pairs regardless of browser OS.
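
If fixing the data at its source is not practical, the line endings can be converted before parsing. A minimal sketch follows, assuming the whole input fits in memory; normalize_newlines is a made-up helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn CRLF pairs and bare CRs into LF.
sub normalize_newlines {
    my ($text) = @_;
    # \x0d\x0a? matches a CR with an optional LF after it, so CRLF and
    # bare CR both become a single LF; existing bare LFs are untouched.
    $text =~ s/\x0d\x0a?/\x0a/g;
    return $text;
}

# Made-up input mixing all three styles.
my $mixed = "one\x0d\x0atwo\x0dthree\x0a";
my $clean = normalize_newlines($mixed);

my @lines = split /\x0a/, $clean;
print scalar(@lines), " lines\n";    # prints "3 lines"
```

After this step a read loop with the default $/ sees one record per line again, whichever convention the data started with.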
 
