Is there something special about the '\x1a' symbol?

sim.sim · Jan 10, 2009

Hi all!

I've run into some inconsistent Python behavior. I tried to write a string containing the '\x1a' character to a file, and on FreeBSD it gives the expected result:

open("test", "w").write('before\x1aafter')
open('test').read()
'before\x1aafter'

but on my WinXP box it gives something strange:

open("test", "w").write('before\x1aafter')
open('test').read()
'before'

So on Windows I can write every character, but I can't read them all back.
I've tested it with Python 2.6, 2.5 and 2.2 on WinXP SP2.

Why does this happen, and is it possible to fix it?

Marc 'BlackJack' Rintsch · Jan 10, 2009

'\x1a' is treated as an "end of file" marker in text-mode files on Windows. So if you want all of the data, unaltered, open the file in binary mode ('rb' and 'wb').
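A minimal sketch of Marc's advice, written for Python 3 (where binary-mode files take and return bytes; in Python 2.x the same 'wb'/'rb' mode strings work with plain str). The helper name roundtrip_binary is hypothetical, and tempfile is used only so the example doesn't clobber a real file:

```python
import os
import tempfile

def roundtrip_binary(data: bytes) -> bytes:
    """Write data in binary mode and read it back (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # 'b' mode: no newline translation and no special meaning for Ctrl-Z.
        with open(path, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

result = roundtrip_binary(b"before\x1aafter")
print(result)  # b'before\x1aafter'
```

Because neither open() call uses text mode, the bytes come back exactly as written, on Windows as well as on FreeBSD.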

Ciao,
Marc 'BlackJack' Rintsch

Mel · Jan 10, 2009

'\x1a' is the End-of-file mark that Windows inherited from MS-DOS and CP/M.
The underlying Windows libraries honour it for files opened in text mode.

open('test', 'rb').read()

will read the whole file.
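To make Mel's point concrete, here is a rough model (an assumption about what the Windows C runtime does in text mode, not its actual code) of a text-mode read applied to the bytes on disk: stop at the first Ctrl-Z and turn CR LF pairs into LF. The function simulate_text_mode_read is hypothetical and runs on any OS:

```python
CTRL_Z = b"\x1a"

def simulate_text_mode_read(raw: bytes) -> bytes:
    """Rough model of a Windows C runtime text-mode read (hypothetical)."""
    # Everything from the first Ctrl-Z onwards is treated as past end-of-file.
    end = raw.find(CTRL_Z)
    if end != -1:
        raw = raw[:end]
    # CR LF pairs on disk become a single LF in memory.
    return raw.replace(b"\r\n", b"\n")

on_disk = b"before\x1aafter"
print(simulate_text_mode_read(on_disk))  # b'before'  (like open('test'))
print(on_disk)                           # b'before\x1aafter'  (like 'rb')
```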

Mel.

John Machin · Jan 10, 2009

You've already got two good answers, but this might add a little more
explanation:

You will be aware that in the Windows Command Prompt, to exit the interactive
mode of Python (among other programs), you need to type Ctrl-Z ...

| C:\junk>python
| Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
| Type "help", "copyright", "credits" or "license" for more information.
| >>> problem = '\x1a'
| >>> ord(problem)
| 26
| >>> # What is the 26th letter of the English/ASCII alphabet?
| ...
| >>> ^Z
|
| C:\junk>
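The same connection can be checked without a console. The conventional ASCII mapping is that Ctrl plus a letter keeps only the low five bits of the letter's code, so Ctrl-Z gives 26, i.e. '\x1a'. The helper ctrl() below is a hypothetical illustration, not a standard function:

```python
def ctrl(letter: str) -> str:
    """Control character conventionally produced by Ctrl+<letter> (hypothetical)."""
    # Keeping only the low five bits maps 'Z' (0x5A) to 0x1A.
    return chr(ord(letter.upper()) & 0x1F)

print(ord("\x1a"))              # 26
print(ctrl("Z") == "\x1a")      # True
print(ord("Z") - ord("A") + 1)  # 26, because Z is the 26th letter
```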

HTH,
John

sim.sim · Jan 12, 2009

Hi John,

I agree - those two answers are really good. Thanks to Mel and Marc.
I'm sorry if my stupid question annoyed you.

John Machin · Jan 12, 2009

I didn't think your question was stupid. What was stupid was (a) CP/M recording
file size as a number of 128-byte sectors, forcing the use of an in-band
EOF marker for text files, and (b) MS continuing to regard Ctrl-Z as EOF
decades after people stopped writing Ctrl-Z at the end of text files.
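A small sketch of why CP/M needed that in-band marker, under the assumption that files are stored as whole 128-byte records and the unused tail of the last record is filled with Ctrl-Z (a common convention; some programs wrote only one). cpm_store and cpm_load are hypothetical names:

```python
RECORD = 128      # CP/M tracked file size in 128-byte records
CTRL_Z = b"\x1a"

def cpm_store(text: bytes) -> bytes:
    """Pad text to a whole number of records with Ctrl-Z (hypothetical)."""
    # The directory only knows how many records are used, so the
    # unused tail of the last record has to be marked somehow.
    padding = -len(text) % RECORD
    return text + CTRL_Z * padding

def cpm_load(stored: bytes) -> bytes:
    """Recover the text by stopping at the first Ctrl-Z (hypothetical)."""
    end = stored.find(CTRL_Z)
    return stored if end == -1 else stored[:end]

stored = cpm_store(b"hello, world\r\n")
print(len(stored))       # 128
print(cpm_load(stored))  # b'hello, world\r\n'
```

It also shows the cost of an in-band marker: cpm_load(cpm_store(b"before\x1aafter")) comes back as b'before', which is the same truncation seen on Windows above.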

And I wasn't annoyed either ... I was merely adding the information
that Ctrl-Z and '\x1a' were the same thing; many people don't make the
connection.

Cheers,
John

Gabriel Genellina · Jan 13, 2009

John Machin said:
I didn't think your question was stupid. What was stupid was (a) CP/M recording
file size as a number of 128-byte sectors, forcing the use of an in-band
EOF marker for text files, and (b) MS continuing to regard Ctrl-Z as EOF
decades after people stopped writing Ctrl-Z at the end of text files.

This is called "backwards compatibility", and it's a good thing.

Consider the Atucha II nuclear plant, started in 1980, based on a design
from 1965, and still unfinished. People need access to the complete
design, plans, specifications and CAD drawings decades after they were
originally written.
I actually do use (and maintain! -- ugh!) some DOS programs. Some people
would have a hard time if they could not read their old data with new
programs.
Even Python still has a "print" statement decades after anyone last used a
teletype terminal...

sim.sim · Jan 13, 2009

Ah John, thank you for your explanations!
My first impression was that your comments didn't relate to my question,
but I've found new things where I used to think there was nothing.

Now I'm curious: what reasons are there to use open(.., 'r') instead of
open(.., 'rb')? open(.., 'r') can lead to confusing situations like this
one; are there scenarios where open(.., 'rb') would confuse us?

John Machin · Jan 13, 2009

Some general rules: if you regard a file as text, open it with "rt" --
the "t" is redundant, but it assures you and anyone else who reads your
code that you've actually thought about it. Otherwise you regard the file
as binary, and open it with "rb". The distinction was always important on
Windows (because of the special handling of newlines and '\x1a') but
largely unimportant on *x boxes. With Python 3.0, it is important for all
users to specify the mode they really need: 'b' files read and write
bytes objects, whereas 't' files read and write str objects, apply the
newline etc. changes, and need an encoding to decode the raw bytes into
str (Unicode) objects -- and you can't use bytes objects directly with a
't' file, nor str objects directly with a 'b' file.
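A Python 3 sketch of these rules, using a temporary file. One hedge: as far as I know, Python 3's io layer does its own newline translation instead of relying on the C runtime's text mode, so '\x1a' should come back intact even in 't' mode; treat that as an assumption to check on your own Windows setup. The raw bytes may end in '\r\n' on Windows and '\n' elsewhere:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
mixed_error = None
try:
    # 't' mode: str in, str out; the encoding converts between str and bytes.
    with open(path, "wt", encoding="utf-8") as f:
        f.write("before\x1aafter\n")
    with open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    # 'b' mode: bytes in, bytes out, no newline translation on read.
    with open(path, "rb") as f:
        raw = f.read()
    # Mixing the types is rejected: a 'b' file will not accept a str.
    try:
        with open(path, "ab") as f:
            f.write("a str")
    except TypeError as exc:
        mixed_error = exc
finally:
    os.remove(path)

print(type(text).__name__, type(raw).__name__)  # str bytes
print(repr(text))
print(raw)
print(mixed_error)
```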

HTH,
John

Terry Reedy · Jan 14, 2009

Gabriel said:
This is called "backwards compatibility" and it's a good thing

But it can be available without being the default, or the only, behavior.

Gabriel Genellina · Jan 14, 2009

Terry Reedy said:
But it can be available without being the default, or the only, behavior.

Sure. And it isn't - there are many flags to open and fopen to choose
from...
The C89 standard (the C dialect CPython is written in) guarantees *only*
that printable characters, tab, and newline are preserved in a text file;
everything else may or may not appear when it is read again. Even
whitespace at the end of a line may be dropped. Binary files are more
predictable...
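These conditions can be written as a checker. This sketch paraphrases the C standard's text-stream rules as I understand them: only printing characters plus tab and newline, no space immediately before a newline, and a final newline (that third rule comes from the standard, not from the post above). The name survives_text_stream is hypothetical, and "printing characters" is taken to mean ASCII 0x20 to 0x7E, as in the C locale:

```python
# Printing characters in the C locale: ASCII 0x20 (space) through 0x7E (~).
PRINTING = {chr(c) for c in range(0x20, 0x7F)}
ALLOWED = PRINTING | {"\t", "\n"}

def survives_text_stream(data: str) -> bool:
    """True if C guarantees data reads back unchanged from a text stream."""
    if any(ch not in ALLOWED for ch in data):
        return False  # e.g. '\x1a' or '\r' may be altered or lost
    if " \n" in data:
        return False  # spaces just before a newline may be dropped
    return data.endswith("\n")  # the last character must be a newline

print(survives_text_stream("hello\tworld\n"))      # True
print(survives_text_stream("before\x1aafter\n"))   # False
```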

Delphi recognizes the EOF marker when reading a text file only inside the
file's last 128-byte block -- this mimics the original CP/M behavior
rather closely. I thought the MSC runtime did the same, but no, the EOF
marker is recognized anywhere. And Python inherits that (at least in 2.6
-- I've not tested with 3.0).
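The two policies can be contrasted in a few lines. This is a sketch under an assumption: that "the file's last 128-byte block" means the final block when the file is split into aligned 128-byte chunks, which I haven't verified against Delphi's runtime. Both function names are hypothetical:

```python
BLOCK = 128
CTRL_Z = b"\x1a"

def eof_anywhere(raw: bytes) -> bytes:
    """MSC-runtime style: the first Ctrl-Z anywhere ends the text."""
    end = raw.find(CTRL_Z)
    return raw if end == -1 else raw[:end]

def eof_in_last_block(raw: bytes) -> bytes:
    """CP/M-like style: a Ctrl-Z only counts inside the final 128-byte block."""
    # Offset of the last aligned 128-byte block (0 for files of <= 128 bytes).
    last_block_start = max(len(raw) - 1, 0) // BLOCK * BLOCK
    end = raw.find(CTRL_Z, last_block_start)
    return raw if end == -1 else raw[:end]

short = b"before\x1aafter"
long_file = short + b"x" * 200
print(eof_anywhere(short), eof_in_last_block(short))  # b'before' b'before'
print(len(eof_anywhere(long_file)), len(eof_in_last_block(long_file)))  # 6 212
```

For a short file like the 12-byte test in the original question, both policies stop at the marker, since the whole file is its own last block; the difference only appears once the Ctrl-Z sits before the start of the final block.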
