windows one liner to output unix line feed

boman

I have a simple one liner running on Windows that does a substitution.
However, with the -p option, the line endings are coming out \r\n, and
I need them to be just \n.

I searched for the -l option, but couldn't find how to specify unix
line breaks on output

Here's the code:

perl -pi.bak -e "s|foo|bar|g" myfile.txt

thanks
 
Brian Wakem

David said:
perhaps you need to chomp the windows line ending, and add the unix line
ending manually:

perl -pi.bak -e "s|foo|bar|g; chomp; print qq{$_\n}" myfile.txt

How could that work when the output stream converts \n to the
system-defined line ending?


It wouldn't.

Using \x0A instead of \n should work.
 
sln

That seems unlikely, since "\x0a" and "\n" have exactly the same
value.

I don't know how unix deals with CR. Apparently it just does LF's
on output instead of CRLF.
It really doesn't matter what the :crlf layer does in unix.
It only matters how the console device interprets CR's as a tty.
You can have a thousand CR's and one LF before you print and it will
still print on the next line (Mac may be different).

If his file eol is all CR's ala Mac, then he opened it without an eol
and processed the whole file as one line. Otherwise, a series of
text embedded with just CR's on a console where a CR is a control character,
will result in overwrites of the lines without LF's.

If I had a unix machine I could try it out. It seems it would be a major
faux pas if Perl didn't get the basic unix console correct, lol.

binmode (STDOUT, ":raw");
print " \nthis \x0a is \x0d\x0d\x0d\x0d\x0a a \x0a test\x0d\x0d\x0a";
print " \nthis \x0a\x0d is \x0d\x0d\x0d\x0d\x0a\x0d a \x0a test\x0d\x0d\x0a";
print " \nthis \x0d is \x0d\x0d\x0d\x0d\x0d a \x0d test\x0d\x0d\x0d";

-sln
 
Dr.Ruud

Scott said:
Not on a Windows system.

You are still mixing up layers.

Find out about PerlIO,
see for starters `perldoc -f binmode`:

The operating system, device drivers, C libraries, and Perl run-time
system all work together to let the programmer treat a single character
("\n") as the line terminator, irrespective of the external
representation. On many operating systems, the native text file
representation matches the internal representation, but on some
platforms the external representation of "\n" is made up of more than
one character.

Lasagna! How Latin to name the contents after the container.
 
Peter J. Holzer

Not on a Windows system.

Please.

Is it really so hard to test your assumptions before posting them?
Especially when several regulars have already explained that your
assumptions are wrong?

| Microsoft Windows [Version 6.0.6001]
| Copyright (c) 2006 Microsoft Corporation. Alle Rechte vorbehalten.
|
| C:\>perl -le "print qq{\n} eq qq{\x0A}"
| 1
|
| C:\>perl -v
|
| This is perl, v5.10.0 built for MSWin32-x64-multi-thread
| (with 5 registered patches, see perl -V for more detail)
|
| Copyright 1987-2007, Larry Wall
|
| Binary build 1004 [287188] provided by ActiveState http://www.ActiveState.com
| Built Sep 3 2008 12:22:07

hp
 
Scott Bryce

Peter said:
Please.

Is it really so hard to test your assumptions before posting them?

Actually, I DID test this.

-----------

use strict;
use warnings;

open my $OUTFILE, '>', 'test.txt' or die 'Cannot open test.txt';
print $OUTFILE "A line of text\n";
print $OUTFILE "A line of text\n";

close $OUTFILE or die 'cannot close test.txt';

-----------

Both lines in test.txt end in \x0D\x0A

But now I see my mistake. If I binmode $OUTFILE, then both lines end in
\x0A.

The assumption I was making is consistent with the documentation:

The operating system, device drivers, C libraries, and Perl run-time
system all work together to let the programmer treat a single character
(\n ) as the line terminator, irrespective of the external representation.

which suggests that "\n" does not have a specific ASCII value. So the
bottom line is that on a Windows system, the value of "\n" depends on
whether you are working in text mode or binary mode.

A search of the docs for "\n" came up blank, but as someone else pointed
out, everything is spelled out in the docs under binmode.

I stand somewhat corrected.
 
Peter J. Holzer

Actually, I DID test this.

-----------

use strict;
use warnings;

open my $OUTFILE, '>', 'test.txt' or die 'Cannot open test.txt';
print $OUTFILE "A line of text\n";
print $OUTFILE "A line of text\n";

close $OUTFILE or die 'cannot close test.txt';

-----------

No, you didn't. Your script doesn't use \x0A, so it says nothing about
whether \n and \x0A are the same or not.

Change the second print into:

print $OUTFILE "A line of text\x0A";

Both lines in test.txt end in \x0D\x0A

With the second line changed, still both lines end with 0D 0A in the
file. This is an indication (but no proof) that "\n" and "\x0A" are
indeed the same.

But now I see my mistake. If I binmode $OUTFILE, then both lines end in
\x0A.

The assumption I was making is consistent with the documentation:

The operating system, device drivers, C libraries, and Perl run-time
system all work together to let the programmer treat a single character
(\n ) as the line terminator, irrespective of the external representation.

which suggests that "\n" does not have a specific ASCII value.

This is true (on MacOS classic "\n" was "\x0D" and on EBCDIC based
systems it's "\x15", IIRC), but on Windows "\n" is always "\x0A".

And in any case it is always a single character, regardless of the
convention for text files on the OS.

So the bottom line is that on a Windows system, the value of "\n"
depends on whether you are working in text mode or binary mode.

No, it doesn't. On Windows "\n" is always the single character with the
code 10 decimal (or 0x0A hexadecimal).

The difference between text mode and binary mode is that text mode
converts to the text file convention of the local OS: For Windows that
means that when you print the single character "\n", the :crlf layer
sends two bytes (0x0D 0x0A) to the file. That's a relatively simple
conversion; there are more complicated ones. For example, some
OSs had text files with fixed length, space padded lines. On such a
system,

print $OUTFILE "A line of text\n";

causes (for example) 80 bytes to be written to the file: "A line of
text" followed by 66 spaces. But that doesn't mean that the value of
"\n" is 66 spaces.

hp
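
The text-mode versus binary-mode distinction explained above can be checked with a small script. This is only a sketch, assuming a Perl built with PerlIO layers and an ASCII-based platform; the helper name bytes_written is made up for illustration. The :crlf layer is requested explicitly so that the result should be the same on Unix and on Windows.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the given PerlIO layer
# and return the raw bytes that ended up in the file.
sub bytes_written {
    my ($layer, $text) = @_;

    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh or die "Cannot close $path: $!";

    open my $out, ">$layer", $path or die "Cannot open $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";

    # Read the file back without any translation.
    open my $in, '<:raw', $path or die "Cannot reopen $path: $!";
    my $bytes = do { local $/; <$in> };
    close $in or die "Cannot close $path: $!";

    return $bytes;
}

my $text = "A line of text\n";
my $raw  = bytes_written(':raw',  $text);
my $crlf = bytes_written(':crlf', $text);

printf "\"\\n\" is %d character(s) long\n", length "\n";   # 1
printf ":raw  wrote %d bytes\n", length $raw;              # 15
printf ":crlf wrote %d bytes\n", length $crlf;             # 16
```

The string handed to print is identical in both calls; only the layer on the handle decides whether one byte or two reach the file for each "\n".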
 
Keith Thompson

Scott Bryce said:
Not on a Windows system.

Have you tried it?

perl -e "if (qq(\x0a) eq qq(\n)) { print qq(yes\n) } else { print qq(no\n) }"

I don't have access to a Windows system at the moment; I'll try it
tomorrow.
 
sln

(e-mail address removed) wrote:
[...]
It really doesn't matter what the :crlf layer does in unix.
It only matters how the console device interprets CR's as a tty.
You can have a thousand CR's and one LF before you print and it will
still print on the next line (Mac may be different).

I think you're mixing up two different things here, one is
text file IO (and the way Perl implements it on different
platforms) and the other is console IO, which would only
apply when piping/directing to or from a perl script.

How could I mix that up? In my example I put the handle in binary
mode. The i/o layers don't care about the handle, do they?
Nobody's going to seek the un-seekable. Likewise, the stream gets
a control character that happens to be 0x0a, a line-feed, apparently
free of i/o layer interaction.
Didn't you get that sense?

And here you're mixing up Perl's behaviour for input line separators
and output line separators too. Using the "-i" switch, there's no
console involved, Perl simply opens a file and sequentially inserts
what gets passed to print. If the file has been opened with a :crlf
layer, this just means that any newlines encountered in the process
are replaced by a sequence of carriage return plus newline.

And where did you get the notion I was talking about Perl's notion
of line separators or anything related to anything but BINARY output
on a handle?
I know the layers of Perl better than you do by your response.

What a console shows you when you cat the file contents is a very
different thing.
I find this statement incredible.
Again, what's that got to do with a console? You're seriously getting
off the track.

-Chris

Chris, you should re-read what I wrote. I wasn't writing about Perl
at all. It was all about the device, it had nothing whatsoever to do with
Perl in the slightest degree!

The output stream to the device was binary, had nothing to do with perlio
layers other than putting the stream in binary before delivering the data
which was 0d0d0d0a0a0d0d0d and other combinations. Devices themselves
act independently on binary control codes, and CR, LF, FF's are control codes
devices promote to a wide range of different, sometimes visual results, if
that's their nature.

-sln
 
sln

In addition, I was giving examples suggesting that Perl did nothing wrong,
even if the filehandle (apparently not) is not open for binary output.
The suggestion is that the device could be reacting (normally) to embedded
CR's from a cross-platform, or who knows, some binary interaction, by the
user.

Or his device is in a mode that translates control characters differently.

Obviously, the way to debug is to inspect the file, inspect the device mode,
then make a determination. The way NOT to debug is suspecting Perl as a culprit
in something that would have shown up hundreds of thousands of times before
this.

But, apparently, getting the newbies all riled up on a wild goose chase
is seemingly what's happening.

-sln
 
boman

Chris's suggestion worked:

perl -pi.orig -e "binmode(ARGVOUT); s|foo|bar|g" myfile.txt

The original line endings are Unix, and when the file is run thru this
oneliner on Windows, the line endings remain as Unix.

According to perl -v on my system, I'm running v5.8.8 built for
MSWin32-x86-multi-thread.

I appreciate the amount of attention this little issue received, I
certainly have learned a lot from this discussion.

My sincere thanks to you all.

best,
Bo
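
For anyone who wants to try the accepted one-liner without touching a real file, here is a rough script form of it. It is a sketch under stated assumptions: the sample contents and the temporary directory are invented for the demonstration, and setting $^I and @ARGV by hand stands in for the -i.orig switch and the file name on the command line.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway directory so the demonstration leaves nothing behind.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/myfile.txt";

# Create a sample file with Unix (LF-only) line endings.
open my $out, '>:raw', $path or die "Cannot create $path: $!";
print {$out} "foo one\x0Afoo two\x0A";
close $out or die "Cannot close $path: $!";

# Script form of:
#   perl -pi.orig -e "binmode(ARGVOUT); s|foo|bar|g" myfile.txt
{
    local ($^I, @ARGV) = ('.orig', $path);
    while (<>) {
        binmode ARGVOUT;    # no :crlf translation on the in-place output
        s|foo|bar|g;
        print;              # goes to ARGVOUT while editing in place
    }
}

# Read the edited file back without translation and look for CR bytes.
open my $in, '<:raw', $path or die "Cannot reopen $path: $!";
my $result = do { local $/; <$in> };
close $in or die "Cannot close $path: $!";

print $result =~ /\x0D/ ? "CR found\n" : "LF only\n";
```

On Unix the binmode call changes nothing, since no :crlf layer is active there by default; on Windows it is what keeps the output LF-only. The input handle is still read in text mode, so this addresses the output side only.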
 
