python tutorial

Carl Banks

I was just looking at the python tutorial, and I noticed these lines:

http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-...

"Windows makes a distinction between text and binary files;
"the end-of-line characters in text files are automatically altered
"slightly when data is read or written.

I don't see any obvious way at docs.python.org to get that corrected. Is
there some standard procedure?

What's wrong with it?


Carl Banks
 
Dave Angel

steve said:
I was just looking at the python tutorial, and I noticed these lines:

http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files

"Windows makes a distinction between text and binary files;
"the end-of-line characters in text files are automatically altered
"slightly when data is read or written.

I don't see any obvious way at docs.python.org to get that corrected. Is
there some standard procedure?

Steve
It's not clear from your question what you want corrected. Are you
saying that the tutorial leaves out some detail? Or are you upset that
reading a file gives you "automatically altered" data?

If it's the former, just look up the function in the reference
documentation (e.g. the chm file in a Windows installation).

The way to control the behavior is with the 'mode' parameter to open().
If mode has a 'b' in it, the file is considered binary, which means no
translation is done. If the mode has a 'U' in it, or neither 'b' nor
'U', then some translation is done. The purpose of the translation is
to let the program always use \n to mean end of line, so that code stays
portable between the various operating system conventions. Windows text
files typically end each line with \r\n, classic Mac OS (before OS X)
used just \r, and Unix and Linux use \n.
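
A minimal sketch of the binary vs. text distinction described above. It assumes Python 3, where text mode reads with universal newlines by default; in the Python 2.6 discussed in this thread, text mode was handled by the platform's C stdio and the 'U' flag was needed for universal newlines. The helper name read_both_ways is hypothetical.

```python
import os
import tempfile

def read_both_ways(raw: bytes):
    """Store raw bytes in a temp file, then read them back in binary
    mode and in text mode (Python 3's default universal newlines)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "mixed.txt")
        with open(path, "wb") as f:      # 'b': bytes go to disk untouched
            f.write(raw)
        with open(path, "rb") as f:      # binary read: no translation
            as_bytes = f.read()
        with open(path, "r", encoding="ascii") as f:  # text read: \r\n and \r become \n
            as_text = f.read()
    return as_bytes, as_text

# Windows (\r\n), classic Mac OS (\r) and Unix (\n) endings in one file
raw, text = read_both_ways(b"one\r\ntwo\rthree\n")
print(repr(raw))   # b'one\r\ntwo\rthree\n'
print(repr(text))  # 'one\ntwo\nthree\n'
```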

One reason a programmer has to be aware of it is that he/she may be
reading or writing a file from a different operating environment, for
example, a script that'll be uploaded to a web server running a
different OS.
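
For the web-server case, one option in Python 3 is to pass newline="\n" when writing, so the file gets Unix line endings even when it is generated on Windows. A sketch under that assumption; write_unix_text and upload_me.py are hypothetical names.

```python
import os
import tempfile

def write_unix_text(path: str, lines) -> None:
    # newline="\n" stops Python 3 from translating '\n' into os.linesep,
    # so the file gets LF endings on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(line + "\n")

with tempfile.TemporaryDirectory() as d:
    script = os.path.join(d, "upload_me.py")
    write_unix_text(script, ["#!/usr/bin/env python3", "print('hello')"])
    with open(script, "rb") as f:
        data = f.read()

print(b"\r" in data)  # False on every platform
```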
 
steve

What's wrong with it?


Carl Banks

1) Windows does not make a distinction between text and binary files.

2) end-of-line characters in text files are not automatically altered by
Windows.

(david)
 
Robert Kern

1) Windows does not make a distinction between text and binary files.

2) end-of-line characters in text files are not automatically altered by
Windows.

The Windows implementation of the C standard makes the distinction. E.g. using
stdio to write out "foo\nbar\n" in a file opened in text mode will result in
"foo\r\nbar\r\n" in the file. Reading such a file in text mode will result in
"foo\nbar\n" in memory. Reading such a file in binary mode will result in
"foo\r\nbar\r\n". In your bug report, you point out several proprietary APIs
that do not make such a distinction, but that does not remove the
implementations of the standard APIs that do make such a distinction.

http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx

Perhaps it's a bit dodgy to blame "Windows" per se rather than its C runtime,
but I think it's a reasonable statement on the whole.
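
The foo/bar example above can be reproduced without a Windows machine by asking Python to use the Windows convention explicitly. This sketch assumes Python 3, where text-mode translation is done by Python's own io layer rather than by the C runtime; newline="\r\n" here stands in for what the Microsoft C runtime does by default in text mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "foo.txt")

    # Text-mode write using the Windows convention: each '\n' becomes '\r\n'
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        f.write("foo\nbar\n")

    # Binary read: the bytes actually stored in the file
    with open(path, "rb") as f:
        on_disk = f.read()

    # Text-mode read: '\r\n' is translated back to '\n'
    with open(path, "r", encoding="ascii") as f:
        in_memory = f.read()

print(repr(on_disk))    # b'foo\r\nbar\r\n'
print(repr(in_memory))  # 'foo\nbar\n'
```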

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Steven D'Aprano

1) Windows does not make a distinction between text and binary files.

Of course it does.


Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 
steve

Robert Kern said:
The Windows implementation of the C standard makes the distinction. E.g.
using stdio to write out "foo\nbar\n" in a file opened in text mode will
result in "foo\r\nbar\r\n" in the file. Reading such a file in text mode
will result in "foo\nbar\n" in memory. Reading such a file in binary mode
will result in "foo\r\nbar\r\n". In your bug report, you point out several
proprietary APIs that do not make such a distinction, but that does not
remove the implementations of the standard APIs that do make such a
distinction.

http://msdn.microsoft.com/en-us/library/yeby3zcb.aspx

Perhaps it's a bit dodgy to blame "Windows" per se rather than its C
runtime, but I think it's a reasonable statement on the whole.


Which is where I came in: I was looking for simple file IO in the tutorial.
The tutorial tells me something false about Windows, rather than something
true about Python.

I'm looking at a statement that is clearly false (for anyone who knows
anything about Windows file systems and Windows file io), which leaves the
Python behaviour completely undefined (for anyone who knows nothing about
Python).

I understand that many of you don't really have any understanding of
Windows, much less any background with Windows, and I'm here to help. That
part was simple.

The next part is where I can't help: What is the behaviour of Python?

I'm sure you don't think that tutorial is only for readers who can guess
that they have to extrapolate from the behaviour of the Visual C library in
order to work out what Python does.


Steve
 
steve

Steven D'Aprano said:
1) Windows does not make a distinction between text and binary files.

Of course it does.


Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

Ok, Python makes a distinction between text and binary files.

Steve.
 
Steven D'Aprano

....
Ok, Python makes a distinction between text and binary files.

Microsoft have reported a bug where cmd.exe fails to recognise EOF in a
text file:

http://support.microsoft.com/kb/156258

The behaviour of reading past the 0x1A (Ctrl-Z) character is considered a bug,
which says that cmd.exe at least (and by extension Windows apps in
general) are expected to stop reading at 0x1A for text files.


Technically, the Windows file systems record the length of text files, so
an explicit EOF character is redundant; nevertheless, the behaviour of
stopping the read at 0x1A is expected. Whether you want to claim it is
"Windows" or "the Windows shell" or something else is a fine distinction
that makes little difference in practice.
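
Microsoft's C runtime documentation for fopen describes text mode as treating Ctrl-Z as end-of-file on input. Python 3's own text layer does not do this. The sketch below emulates the DOS-style behaviour by hand; read_dos_text is a hypothetical helper, not part of any library, and it assumes ASCII content.

```python
import io

CTRL_Z = b"\x1a"  # 0x1A (Ctrl-Z), the old CP/M and DOS end-of-file marker

def read_dos_text(data: bytes, encoding: str = "ascii") -> str:
    # Hypothetical helper: mimic a DOS-style text-mode reader by
    # discarding everything from the first 0x1A onwards and turning
    # CR LF pairs into LF.
    cut = data.find(CTRL_Z)
    if cut != -1:
        data = data[:cut]
    return data.decode(encoding).replace("\r\n", "\n")

raw = b"hello\r\nworld\r\n\x1aleftover junk"
print(repr(read_dos_text(raw)))  # 'hello\nworld\n'

# Python 3's text layer, by contrast, passes 0x1A through untouched
plain = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii").read()
print("\x1a" in plain)  # True
```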

Anyway, here's Raymond Chen of Microsoft explaining more:

http://blogs.msdn.com/oldnewthing/archive/2004/03/16/90448.aspx
 
steve

Steven D'Aprano said:
Microsoft have reported a bug where cmd.exe fails to recognise EOF in a
text file:

http://support.microsoft.com/kb/156258

The behaviour of reading past the 0x1A (Ctrl-Z) character is considered a bug,
which says that cmd.exe at least (and by extension Windows apps in
general) are expected to stop reading at 0x1A for text files.


Technically, the Windows file systems record the length of text files, so
an explicit EOF character is redundant; nevertheless, the behaviour of
stopping the read at 0x1A is expected. Whether you want to claim it is
"Windows" or "the Windows shell" or something else is a fine distinction
that makes little difference in practice.

Anyway, here's Raymond Chen of Microsoft explaining more:

http://blogs.msdn.com/oldnewthing/archive/2004/03/16/90448.aspx

If you're pleased to be learning something about Windows, then
I'm pleased for you.

The reason that I didn't give a full discussion about the history of
DOS and Microsoft C was that I didn't think it was relevant to a
Python newsgroup.

My bad. I didn't think anyone would care about the behaviour of
copy vs xcopy in DOS 6 and earlier.

I'd like to see the Tutorial corrected so that it gives some useful
information about the behaviour of Python. As part of that, I'd like
to see it corrected so that it doesn't include patently false information,
but only because the patently false information about Windows
obscures the message about Python.

Believe me, I really don't care what myths you believe about
Windows, or why you believe them. I've got a full and interesting
life of my own.

I'm only interested in getting the Python tutorial corrected so that it
gives some sensible information to someone who hasn't already had
the advantage of learning what the popular myths represent to the
Python community.

So far I've been pointed to a discussion of C, a discussion of DOS,
and a discussion of Windows NT 4.
Great. Glad to see that you know how to use the Internet.

I'll give you that if you already have a meaning to assign to those
meaningless words, you know more Python than I do.

And I'll give you that if you already have a meaning to assign to
those meaningless words, you know more Visual C than I do.

Is that all there is? You're going to leave the tutorial as it is because
you can mount an obscure justification that makes sense to
someone who already knows what it means?
Tell me it isn't so :~(
 
D'Arcy J.M. Cain

Technically, the Windows file systems record the length of text files, so
an explicit EOF character is redundant; nevertheless, the behaviour of
stopping the read at 0x1A is expected. Whether you want to claim it is

I really loved CP/M in its day but isn't it time we let go?
 
Ethan Furman

steve said:
Which is where I came in: I was looking for simple file IO in the tutorial.
The tutorial tells me something false about Windows, rather than something
true about Python.

I'm looking at a statement that is clearly false (for anyone who knows
anything about Windows file systems and Windows file io), which leaves the
Python behaviour completely undefined (for anyone who knows nothing about
Python).

I understand that many of you don't really have any understanding of
Windows, much less any background with Windows, and I'm here to help. That
part was simple.

I will freely admit to having no idea of just how many Pythonistas have
good Windows experience/background, but how about you give us the
benefit of the doubt and tell us exactly which languages/routines you
play with *in windows* that fail to make a distinction between text and
binary?
 
Robert Kern

Which is where I came in: I was looking for simple file IO in the tutorial.
The tutorial tells me something false about Windows, rather than something
true about Python.

I don't think it's false. I think it's a fair statement given the Windows
implementation of the C standard library. Such things are frequently considered
to be part of the OS. This isn't just some random API; it's the implementation
of the C standard.

I'm looking at a statement that is clearly false (for anyone who knows
anything about Windows file systems and Windows file io), which leaves the
Python behaviour completely undefined (for anyone who knows nothing about
Python).

I understand that many of you don't really have any understanding of
Windows, much less any background with Windows, and I'm here to help. That
part was simple.

The next part is where I can't help: What is the behaviour of Python?

The full technical description is where it belongs, in the reference manual
rather than a tutorial:

http://docs.python.org/library/functions.html#open

I'm sure you don't think that tutorial is only for readers who can guess
that they have to extrapolate from the behaviour of the Visual C library in
order to work out what Python does.

All that tutorial-level documentation needs to say is what it already says: when a
file is opened in text mode, the actual bytes written to a file for a newline
may differ depending on the platform. The reason it does not explain
the precise behavior on each and every platform is that it *is* undefined.
Python 2.x does whatever the C standard library implementation of stdio does.
It mentions Windows as a particularly common example of a difference between
text mode and binary mode.
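
The point that the bytes written for a newline depend on the platform can be checked directly. This is a sketch assuming Python 3, where the default is to write '\n' as os.linesep (in 2.x the translation was left to the C library, as described above); bytes_on_disk is a hypothetical helper name.

```python
import os
import tempfile

def bytes_on_disk(text: str, newline=None) -> bytes:
    # Write `text` in text mode with the given newline setting and
    # return the raw bytes that ended up in the file.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.txt")
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()

# Default (newline=None): '\n' is written as os.linesep,
# i.e. b'\r\n' on Windows and b'\n' on Unix-like systems.
default = bytes_on_disk("foo\nbar\n")
print(default == "foo\nbar\n".replace("\n", os.linesep).encode("ascii"))  # True

# newline="": no translation at all, on every platform.
print(bytes_on_disk("foo\nbar\n", newline=""))  # b'foo\nbar\n'
```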

Lie Ryan

Ben said:
You started out asking how to *interpret* it, which is fine for this
forum; but discussing it here isn't going to lead automatically to any
*midification* to a document developed within the core of Python.

I definitely want to see how python doc be midified, last time I checked
MIDI cannot play spoken words, don't know whether there is
text-to-speech sound font though ;)
 
Piet van Oostrum

Peter Bell said:
PB> That says that Windows NT 3.5 and NT 4 couldn't make
PB> a distinction between text and binary files. I don't think
PB> that advances your case.

And that was a bug apparently (euphemistically called a `problem').

PB> If they had changed the Windows behaviour, yes, but
PB> Windows 7 seems to be compatible with NT 3.5 rather
PB> than with DOS.

If that is true then they may still be `researching this problem'. :=(
 
Terry Reedy

'Windows', in its broad sense of the Windows system, includes the standards
and protocols mandated by its maker, Microsoft Corporation, and
implemented in its C compiler, which it uses to compile the software
that others interact with. I am pretty sure that WinXP Notepad *still*
requires \r\n in text files, even though WordPad does not. Don't know
about Haste (la Vista) and the upcoming Win7.

It is a common metaphor in English to ascribe agency to products and
blame them for the sins (or virtues) of their maker.

'Unix' and 'Linux' are also used in the double meaning of OS core and OS
system that includes core, language tools, and standard utilities.

I agree. There are much worse sins in the docs to be fixed.

Hmmm. "Bill Gates, his successors, and minions, still require, after 28
years, that we jump through artificial hoops, confuse ourselves, and
waste effort, by differentiating text and binary files and fiddling with
line endings."

More accurate, perhaps, but probably less wise than the current text.

Terry Jan Reedy
 
