\r\n or \n notepad editor end line ???

ajikoe

Hello,

I use the Windows Notepad editor to write text.

For example I write (in d:\myfile.txt):
Helo
World

If I open it with python:
FName = open(r'd:\myfile.txt', 'r')
h = FName.readlines()
print h

I get h : ['Helo\n', 'World']

I thought Notepad uses \r\n to end the line.

What's wrong with it?

pujo
 
Max M

Python tries to be clever. Open it in binary mode to avoid it:

FName = open(r'd:\myfile.txt', 'rb')
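
A minimal sketch of what binary mode shows, written for Python 3 rather than the Python 2 used in this thread. The temporary file is a hypothetical stand-in for d:\myfile.txt, filled with the bytes Notepad would be expected to write (CRLF between the lines, nothing after the last one):

```python
import os
import tempfile

# Hypothetical stand-in for d:\myfile.txt, created with Windows-style bytes.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "wb") as f:
    f.write(b"Helo\r\nWorld")

# Binary mode returns the bytes exactly as stored, \r included.
with open(path, "rb") as f:
    raw_lines = f.readlines()

print(raw_lines)
```

The \r that text mode hides is visible here because no newline translation is applied to a file opened with 'rb'.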


--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
 
Bill Mill

'testing\n1\n2\n3'

Peace
Bill Mill
bill.mill at gmail.com
 
ajikoe

Hello thanks everyone,

Does that mean that on Windows we should use 'wb' to write and 'rb' to read?
Am I right?

pujo
 
Simon Brunning

It means in windows we should use 'wb' to write and 'rb' to read ?
Am I right?

It depends what you are trying to do with the file. If you are
processing it as a text file, open it as a text file, and all will be
well:

my_file = open('my_file.txt')
for line in my_file:
    # whatever...
 
Peter Hansen

It means in windows we should use 'wb' to write and 'rb' to read ?
Am I right?

There is a conceptual difference between "text" files and other files
(which are lumped under the label "binary").

Binary files have any kind of data in them (bytes from 0 to 255) and no
inherent concept of "lines", and thus no line ending sequences.

Text files generally have no control characters except those used to
terminate the lines (lines are by definition sequences of bytes followed
by the line ending sequence). The line ending sequence is
platform-dependent: Linux and most other things use just \n (LineFeed),
while Windows/DOS uses \r\n (CarriageReturn + LineFeed) and the old
MacOS used just \r (CarriageReturn).

Since the specific line ending used on your platform is rarely important
to you, so long as you are compatible with other applications that use
"text" files (such as Notepad), you should use just "r" and "w" to read
and write files that you consider "text" (and note: it's just a
convention, in your mind... and at some times a given file might be
treated the other way, quite legitimately). Python (actually the
underlying libraries, I believe) will convert the platform-specific line
endings to \n when reading text files, and will convert \n to the proper
line ending sequence for your platform when writing.
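
The conversion described above can be checked with a short sketch. It assumes Python 3, where the translation is done by Python's own io layer rather than the C library; the file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "text_mode.txt")

# Text mode: each "\n" written is translated to os.linesep on disk.
with open(path, "w") as f:
    f.write("Helo\nWorld\n")

# Binary mode shows what actually landed on disk.
with open(path, "rb") as f:
    on_disk = f.read()

# Text mode again: the platform line endings come back as "\n".
with open(path, "r") as f:
    in_memory = f.read()

print(repr(on_disk))
print(repr(in_memory))
```

On Windows the first print should show \r\n pairs and on Linux plain \n, while the second print is the same on both.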

If you don't want this conversion, which is unlikely if this is really
just a text file, then and only then do you want to use "rb" and "wb".

So the answer to the question "What should I be using?" depends entirely
on you: if you were interested in seeing the raw bytes that Notepad
wrote, then use "rb". If you want to work with that file as a *text*
file, then use just "r".

Note also the existence of the "U" modifier. Opening a file with "rU"
will read any of the aforementioned line-ending sequences and convert
them to just \n, allowing you to work with text files created on other
platforms. (I don't believe there's a "wU" and conceptually it's sort
of meaningless anyway, so you would just use "w" to write the file out
again.)
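
A sketch of the same idea for Python 3, where the situation differs from the Python 2 discussed here: the "U" flag was removed in Python 3.11, and plain text mode (newline=None, the default) already applies universal newlines when reading. The file name and contents are made up for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")

# One file containing all three line-ending conventions.
with open(path, "wb") as f:
    f.write(b"dos\r\nold mac\runix\n")

# Default text mode in Python 3 converts \r\n, \r and \n to "\n".
with open(path, "r") as f:
    lines = f.readlines()

print(lines)
```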

-Peter
 
ajikoe

Hello All,

Thanks for the response.

I use MySQL and recently found something strange while loading a text file into
my database table using LINES TERMINATED BY '\r\n'.

I found that MySQL thinks I have '\r\r\n'. This happened because
in one part of my code I use 'w' to write each string element + '\r\n'. Now I
understand why this happened.

Actually I prefer to use 'w' and 'r' because the line terminator
will always be '\n'.
By changing my code to always use 'w' and write each string element + '\n'
instead of + '\r\n', and letting my MySQL code use LINES TERMINATED BY '\r\n', I
think this solves the problem. :)
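
The doubled \r can be reproduced on any platform with a small sketch. It assumes Python 3, whose open() accepts newline="\r\n" to force the translation that Windows text mode performs; the file name and row text are hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "rows.txt")

# Writing an explicit "\r\n" through a translating text-mode file:
# the "\n" is expanded to "\r\n", so the stray "\r" is kept in front of it.
with open(path, "w", newline="\r\n") as f:
    f.write("row1" + "\r\n")
with open(path, "rb") as f:
    doubled = f.read()

# Writing only "\n" and letting text mode supply the "\r".
with open(path, "w", newline="\r\n") as f:
    f.write("row1" + "\n")
with open(path, "rb") as f:
    fixed = f.read()

print(repr(doubled))
print(repr(fixed))
```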

pujo
 
Greg Ewing

Peter said:
(I don't believe there's a "wU" and conceptually it's sort
of meaningless anyway,

If we ever get quantum computers, presumably "wU" will
write the newlines in all possible formats simultaneously...
 
Fredrik Lundh

It means in windows we should use 'wb' to write and 'rb' to read ?
Am I right?

no.

you should use "wb" to write *binary* files, and "rb" to read *binary*
files.

if you're working with *text* files (that is, files that contain lines of text
separated by line separators), you should use "w" and "r" instead, and
treat a single "\n" as the line separator.

</F>
 
Steven D'Aprano

no.

you should use "wb" to write *binary* files, and "rb" to read *binary*
files.

if you're working with *text* files (that is, files that contain lines of text
separated by line separators), you should use "w" and "r" instead, and
treat a single "\n" as the line separator.

I get nervous when I read instructions like this. It sounds too much like
voodoo: "Do this, because it works, never mind how or under what
circumstances, just obey or the Things From The Dungeon Dimensions will
suck out your brain!!!"

Sorry Fredrik :)

When you read a Windows text file using "r" mode, what happens to the \r
immediately before the newline? Do you have to handle it yourself? Or will
Python cleverly suppress it so you don't have to worry about it?

And when you write a text file under Python using "w" mode, will the
people who come along afterwards to edit the file in Notepad curse your
name? Notepad expects \r\n EOL characters, and gets cranky if the \r is
missing.

How does this behaviour differ from "universal newlines"?
 
Peter Hansen

Steven said:
When you read a Windows text file using "r" mode, what happens to the \r
immediately before the newline? Do you have to handle it yourself? Or will
Python cleverly suppress it so you don't have to worry about it?

And when you write a text file under Python using "w" mode, will the
people who come along afterwards to edit the file in Notepad curse your
name? Notepad expects \r\n EOL characters, and gets cranky if the \r is
missing.

This is Python. Fire up the interactive interpreter and try it out! It
will take all of two or three minutes...

How does this behaviour differ from "universal newlines"?

If you open a text file created with a different line-ending convention
than that used on your own platform, you may get "interesting" results.
If you use "rU" instead, you will receive only \n line endings and not
have anything to worry about. (For example, reading a Windows text file
on Linux will give you lines that have \r\n endings in them... not what
you really want. Using "rU" will give you just \n line endings whether
you are on Linux or Windows.)

-Peter
 
Steven D'Aprano

This is Python. Fire up the interactive interpreter and try it out! It
will take all of two or three minutes...

Which I would have done, except see comment below:

If you open a text file created with a different line-ending convention
than that used on your own platform, you may get "interesting" results.

I'm using Linux, not Windows. Of course I *could* try to fake reading and
writing Windows files from Linux, but that would require me actually
understanding how Python deals with line endings across the platforms in
order to generate them in the first place. But if I understood it, I
wouldn't have needed to ask the question.

And I would still be none the wiser about Python's behaviour when running
under Windows -- that's hard to fake on a Linux box.

If you use "rU" instead, you will receive only \n line endings and not
have anything to worry about. (For example, reading a Windows text file
on Linux will give you lines that have \r\n endings in them... not what
you really want. Using "rU" will give you just \n line endings whether
you are on Linux or Windows.)

So going back to the original question... if I open in "r" mode a text
file which was created under Windows, I will get \r characters in the
text and have to deal with them regardless of what platform I am running
Python under. Correct?

If I use "rU" mode Python suppresses the \r characters. When I write a
string back, I'm responsible for making sure the EOL markers are correct
for whatever platform I expect the file to be read under. Unless Python
can somehow magically read my mind and know that even though I'm writing
the file under Linux it will be read later under Windows. Am I close?


(Ew, I hate cross-platform issues. They make my head hurt.)
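
Faking a Windows file on another platform only needs binary mode. The sketch below assumes Python 3: passing newline="" to open() turns translation off, which roughly corresponds to what Python 2's plain "r" returned on Linux, while the default text mode behaves like "rU". The file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "from_windows.txt")

# Create a Windows-style file by writing the bytes directly.
with open(path, "wb") as f:
    f.write(b"Helo\r\nWorld")

# newline="" leaves line endings untranslated, so the \r is visible.
with open(path, "r", newline="") as f:
    untranslated = f.readlines()

# Default text mode converts \r\n to \n on every platform.
with open(path, "r") as f:
    translated = f.readlines()

print(untranslated)
print(translated)
```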
 
Terry Hancock

I get nervous when I read instructions like this. It sounds too much like
voodoo: "Do this, because it works, never mind how or under what
circumstances, just obey or the Things From The Dungeon Dimensions will
suck out your brain!!!"

Actually, it's very simple. It just means that '\n' in memory maps to '\r\n' on
disk, and vice versa, so long as you remain on Windows (or MS-DOS, for
that matter).

When you read a Windows text file using "r" mode, what happens to the \r
immediately before the newline? Do you have to handle it yourself? Or will
Python cleverly suppress it so you don't have to worry about it?

'\n' in memory, '\r\n' on disk.

Do the same thing on a Linux or Unix system, and it's

'\n' in memory, '\n' on disk

and on the Mac:

'\n' in memory, '\r' on disk

And when you write a text file under Python using "w" mode, will the
people who come along afterwards to edit the file in Notepad curse your
name? Notepad expects \r\n EOL characters, and gets cranky if the \r is
missing.

No, all will be well. So long as you use 'r' (or 'rt' to be explicit) and 'w'
(or 'wt').

Only use 'rb' and 'wb' if you want to make sure that what is on disk is
literally the same as what is in memory. If you write text files this way
in Python, you will get '\n' line endings.
 
John Machin

Steven said:
I get nervous when I read instructions like this. It sounds too much like
voodoo: "Do this, because it works, never mind how or under what
circumstances, just obey or the Things From The Dungeon Dimensions will
suck out your brain!!!"

Sorry Fredrik :)

Many people don't appear to want to know why; they only want a solution
to what they perceive to be their current problem.

When you read a Windows text file using "r" mode, what happens to the \r
immediately before the newline?

The thing to which you refer is not a "newline". It is an ASCII LF
character. The CR and the LF together are the physical representation
(in a Windows text file) of the logical "newline" concept.

Internally, LF is used (irrespective of platform) to represent that concept.

Do you have to handle it yourself?

No.

Or will
Python cleverly suppress it so you don't have to worry about it?

Suppressed: no, it's a transformation from a physical line termination
representation to a logical one. Cleverly: matter of opinion. By Python:
In general, no -- the transformation is handled by the underlying C
run-time library.

And when you write a text file under Python using "w" mode, will the
people who come along afterwards to edit the file in Notepad curse your
name?

If they do, it will not be because other than CRLF has been written as a
line terminator.

Notepad expects \r\n EOL characters, and gets cranky if the \r is
missing.

AFAIR, it performs well enough for a text editor presented with a file
consisting of one long unterminated line with occasional embedded
meaningless-to-the-editor control characters. You can scroll it, edit
it, write it out again ... any crankiness is likely to be between the
keyboard and the chair :)

How does this behaviour differ from "universal newlines"?

Ordinary behaviour in text mode:

Win: \r\n -> newline -> \r\n
Mac OS before X (classic): \r -> newline -> \r
other box: \n -> newline -> \n

Note: TFM does appear a little light on in this area. I suppose not all
users of Python have acquired this knowledge by osmosis through decades
of usage of C on multiple platforms :)

"Universal newlines":
On *any* box: \r\n or \n or \r (even a mixture) -> \n on reading
On writing, behaviour is "ordinary" i.e. the line terminator is what is
expected by the current platform

"Universal newlines" (if used) solves problems like where an other-boxer
FTPs a Windows text file in binary mode and then posts laments about all
those ^M things on the vi screen and :1,$s/^M//g doesn't work :)
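
For a file that has already been transferred with the wrong endings, the reading-side conversion can also be done by hand on the raw bytes. This is a sketch only; normalize_newlines is a hypothetical helper, not something from the standard library.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert \\r\\n and lone \\r line endings to \\n."""
    # Order matters: collapse CRLF first, otherwise its \r would be
    # turned into a second \n and every line would gain a blank one.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"a\r\nb\rc\n"
cleaned = normalize_newlines(mixed)
print(repr(cleaned))
```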

HTH,
John
 
Peter Hansen

Steven said:
So going back to the original question... if I open in "r" mode a text
file which was created under Windows, I will get \r characters in the
text and have to deal with them regardless of what platform I am running
Python under. Correct?

Almost, but the way you phrased that makes it incorrect in at least one
case. I suspect you are clear on the actual situation now, but just to
correct your wording: you will get \r characters and have to deal with
them on any platform which does *not* use \r\n for its line endings.
(Under Windows, of course, you won't have to deal with them at all.)

If I use "rU" mode Python suppresses the \r characters. When I write a
string back, I'm responsible for making sure the EOL markers are correct
for whatever platform I expect the file to be read under. Unless Python
can somehow magically read my mind and know that even though I'm writing
the file under Linux it will be read later under Windows. Am I close?

Python can't magically read your mind, although if you manage to steal
the time machine (in the future) you could always just come back and fix
the files as they are written, and you (in present time) would *think*
that Python had magically read your mind...

A few more trinkets: some programs on Windows will work just fine even
if they encounter files with only \n line endings. Decent text editors
will either adapt or let you specify what line ending should be
expected. Trying always to work with only \n line endings in files you
write will help us move towards the day when we can obliterate the
abomination that is \r\n so try it out and use "rU" to read, "w" to
write on Linux if that works in your case.
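
If the aim is to write \n endings regardless of platform, one option in Python 3 (an assumption here, since the thread discusses Python 2) is the newline argument of open(). The file name below is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "unix_endings.txt")

# newline="\n" means "\n" is written as-is, even on Windows.
with open(path, "w", newline="\n") as f:
    f.write("Helo\nWorld\n")

with open(path, "rb") as f:
    on_disk = f.read()

print(repr(on_disk))
```

Whether a file with bare \n endings is acceptable depends on the programs that will read it later, as noted above.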

-Peter
 
Fredrik Lundh

John said:
Many people don't appear to want to know why; they only want a solution
to what they perceive to be their current problem.

and many people can identify a short HOWTO when they see it, and look
things up in the documentation when they want the full story. reposting
the documentation in response to every question is, in most cases, a
waste of time and bandwidth...

</F>
 
