Is there something special about the '\x1a' symbol?


sim.sim

Hi all!

I've run into some platform-dependent Python behavior: I tried to write
a string containing the '\x1a' character to a file. On FreeBSD it gives
the expected result:

>>> open("test", "w").write('before\x1aafter')
>>> open('test').read()
'before\x1aafter'

but on my WinXP box I get something strange:

>>> open("test", "w").write('before\x1aafter')
>>> open('test').read()
'before'

I can write every character, but I can't read them all back.
I've tested it with Python 2.6, 2.5 and 2.2 on WinXP SP2.

Why does this happen, and is it possible to fix it?
 

Marc 'BlackJack' Rintsch

In text mode, Windows treats '\x1a' as an "end of file" character. So if
you want all the data, unaltered, open the file in binary mode ('rb' and
'wb').
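A minimal Python 3 sketch of the binary-mode round trip described above. The helper name `roundtrip_binary` and the use of a temporary file are illustrative assumptions; note that in Python 3, 'wb'/'rb' files take and return `bytes`, so the payload is a bytes literal.

```python
import os
import tempfile

def roundtrip_binary(payload: bytes) -> bytes:
    """Write payload in binary mode and read it back (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)  # reopen the file by name below
    try:
        # 'wb': no newline translation, no special meaning for b'\x1a'
        with open(path, "wb") as f:
            f.write(payload)
        # 'rb': returns every byte on disk, including b'\x1a' and what follows it
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

print(roundtrip_binary(b"before\x1aafter"))  # b'before\x1aafter'
```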

Ciao,
Marc 'BlackJack' Rintsch
 

Mel

'\x1a' is the End-of-file mark that Windows inherited from MS-DOS and CP/M.
The underlying Windows libraries honour it for files opened in text mode.

open('test', 'rb').read()

will read the whole file.
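If you read with 'rb' but still want text, one option is to do the decoding and the newline translation yourself. A rough sketch: the helper name is hypothetical, and the latin-1 default is an assumption chosen because it maps every byte to exactly one character, so decoding never fails.

```python
def text_from_bytes(raw: bytes, encoding: str = "latin-1") -> str:
    """Decode bytes read with 'rb', translating Windows line endings as
    text mode would, but keeping any Ctrl-Z (0x1A) characters instead of
    treating the first one as end of file (hypothetical helper)."""
    text = raw.decode(encoding)
    # Text mode on Windows turns '\r\n' into '\n'; do the same by hand
    return text.replace("\r\n", "\n")

raw = b"line one\r\nbefore\x1aafter\r\n"  # e.g. what open(path, 'rb').read() returned
print(repr(text_from_bytes(raw)))        # 'line one\nbefore\x1aafter\n'
```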

Mel.
 

John Machin

You've already got two good answers, but this might add a little more
explanation:

You will be aware that in the Windows Command Prompt, to exit the
interactive mode of Python (among others), you need to type Ctrl-Z ...

| C:\junk>python
| Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
| Type "help", "copyright", "credits" or "license" for more information.
| >>> problem = '\x1a'
| >>> ord(problem)
| 26
| >>> # What is the 26th letter of the English/ASCII alphabet?
| ...
| >>> ^Z
|
| C:\junk>
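The connection can be checked directly. On ASCII terminals, holding Ctrl clears the upper bits of the letter's code, so Ctrl-Z is 0x5A & 0x1F = 0x1A = 26. A small sketch; the helper `ctrl` is hypothetical and models the classic ASCII convention, not any Windows API.

```python
import string

def ctrl(letter: str) -> str:
    """Character sent by Ctrl+<letter> under the ASCII convention:
    keep only the low 5 bits of the upper-case letter's code."""
    return chr(ord(letter.upper()) & 0x1F)

print(ctrl("Z") == "\x1a")                    # True
print(string.ascii_uppercase.index("Z") + 1)  # 26
```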

HTH,
John
 

sim.sim

Hi John,

I agree - those two answers are really good. Thanks to Mel and Marc.
I'm sorry if my stupid question annoyed you.
 

John Machin

I didn't think your question was stupid. Stupid was (a) CP/M recording
file size as a number of 128-byte sectors, forcing the use of an in-band
EOF marker for text files, and (b) MS continuing to regard Ctrl-Z as an
EOF marker decades after people stopped writing Ctrl-Z at the end of
text files.

And I wasn't annoyed either ... I was merely adding the information
that Ctrl-Z and '\x1a' were the same thing; many people don't make the
connection.
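To see why an in-band marker was needed: if the directory only records how many 128-byte sectors (records) a file occupies, a reader cannot tell where the text ends inside the last one. A rough model with hypothetical helper names: text is padded with Ctrl-Z up to a record boundary, and the reader stops at the first Ctrl-Z. Real CP/M details (sector I/O, file control blocks) are not modeled. The last line shows the original poster's symptom falling out of the same rule.

```python
RECORD = 128       # file length was known only as a count of 128-byte records
EOF_MARK = b"\x1a"  # Ctrl-Z, the in-band end-of-text marker

def cpm_store(text: bytes) -> bytes:
    """Pad text to a whole number of records with Ctrl-Z (hypothetical helper)."""
    padding = (-len(text)) % RECORD
    return text + EOF_MARK * padding

def cpm_load(stored: bytes) -> bytes:
    """Recover the text by stopping at the first Ctrl-Z (hypothetical helper)."""
    end = stored.find(EOF_MARK)
    return stored if end == -1 else stored[:end]

stored = cpm_store(b"hello\r\n")
print(len(stored), cpm_load(stored))          # 128 b'hello\r\n'
print(cpm_load(cpm_store(b"before\x1aafter")))  # b'before'
```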

Cheers,
John
 

Gabriel Genellina

John Machin said:
I didn't think your question was stupid. Stupid was (a) CP/M recording
file size as number of 128-byte sectors, forcing the use of an in-band
EOF marker for text files (b) MS continuing to regard Ctrl-Z as an EOF
decades after people stopped writing Ctrl-Z at the end of text files.

This is called "backwards compatibility" and it's a good thing :)
Consider the Atucha II nuclear plant, started in 1980, based on a design
from 1965, and still unfinished. People require access to the complete
design, plans, specifications, CAD drawings... decades after they were
initially written.
I actually do use (and maintain! -- ugh!) some DOS programs. Some people
would have a hard time if they could not read their old data with new
programs.
Even Python has a "print" statement, decades after anyone last used a
teletype terminal...
 

sim.sim

Ah John, thank you for your explanations!
My first impression was that your comments didn't relate to my
question, but I've found new things where I used to think there was
nothing.

Now I'm curious: what reasons are there to use open(.., 'r') instead
of open(.., 'rb')? Using open(.., 'r') can lead to confusing situations
like this one; are there scenarios where open(.., 'rb') might confuse
us?
 

John Machin

Some general rules: if you regard a file as text, open it with "rt" --
the "t" is redundant, but it assures you and anyone else who reads your
code that you've actually thought about it. Otherwise you regard the
file as binary, and open it with "rb". The distinction has always been
important on Windows (because of the special handling of newlines and
'\x1a') but largely unimportant on *x boxes.

With Python 3.0, it is important for all users to specify the mode they
really need: 'b' files read and write bytes objects, whereas 't' files
read and write str objects, apply the newline translation etc., and need
an encoding to decode the raw bytes into str (Unicode) objects. You
can't use bytes objects directly with a 't' file, nor str objects
directly with a 'b' file.
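The Python 3 split can be seen in a few lines: text mode takes and returns `str` and handles encoding and newline translation, binary mode deals only in `bytes`, and passing a `str` to a binary file raises `TypeError`. A minimal sketch; the temporary file and the UTF-8 encoding are assumptions chosen for the example.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)  # reopen by name in each mode below
mixed_error = None
try:
    # 't' mode: accepts str, encodes it, and writes '\n' as os.linesep
    with open(path, "wt", encoding="utf-8") as f:
        f.write("caf\u00e9\n")
    # 'b' mode: returns the raw bytes stored on disk
    with open(path, "rb") as f:
        raw = f.read()
    # 't' mode: decodes the bytes and translates line endings back to '\n'
    with open(path, "rt", encoding="utf-8") as f:
        text = f.read()
    # Passing a str to a 'b' file is an error in Python 3
    try:
        with open(path, "ab") as f:
            f.write("oops")
    except TypeError as exc:
        mixed_error = exc
finally:
    os.remove(path)

print(type(raw).__name__, type(text).__name__)  # bytes str
print(text == "caf\u00e9\n")                     # True
print(mixed_error is not None)                  # True
```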

HTH,
John
 

Gabriel Genellina

But it does not have to be the default or only behavior to be available.

Sure. And it isn't - there are many flags to open and fopen to choose
from...
The C89 standard (the dialect of C that CPython is written in) guarantees
*only* that printable characters, tab, and newline are preserved in a text
file; everything else may or may not appear when it is read back. Even
whitespace at the end of a line may be dropped. Binary files are more
predictable...
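Read literally, that guarantee can be written as a predicate over the bytes written. A rough sketch assuming ASCII in the "C" locale; the function name is hypothetical. Besides the conditions above, the standard's text also requires the last character to be a new-line, so the sketch checks that too (treating empty data as trivially safe is a choice made here).

```python
def c89_text_safe(data: bytes) -> bool:
    """Rough check of the C89 conditions under which a text stream must
    read back exactly what was written (ASCII assumed; hypothetical helper)."""
    printable = set(range(0x20, 0x7F))  # printing characters, including space
    allowed = printable | {0x09, 0x0A}  # plus horizontal tab and new-line
    if any(b not in allowed for b in data):
        return False                    # e.g. b'\x1a' is not covered
    if b" \n" in data:
        return False                    # space immediately before a new-line
    return not data or data.endswith(b"\n")  # last character must be a new-line

print(c89_text_safe(b"hello\tworld\n"))      # True
print(c89_text_safe(b"before\x1aafter\n"))   # False
```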

Delphi recognizes the EOF marker when reading a text file only inside the
file's last 128-byte block -- this mimics the original CP/M behavior
rather closely. I thought the MSC runtime did the same, but no, the EOF
marker is recognized anywhere. And Python inherits that (at least in 2.6
-- I've not tested with 3.0).
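The two behaviours can be modeled side by side. A sketch with hypothetical helper names; interpreting Delphi's "last 128-byte block" as the final 128 bytes of the file is an assumption, and neither function calls any real runtime.

```python
EOF_MARK = b"\x1a"
BLOCK = 128

def msc_style(data: bytes) -> bytes:
    """Model of the MSC runtime as described: a Ctrl-Z anywhere ends the text."""
    end = data.find(EOF_MARK)
    return data if end == -1 else data[:end]

def delphi_style(data: bytes) -> bytes:
    """Model of Delphi as described: Ctrl-Z counts only in the final block
    (assumed here to mean the last 128 bytes of the file)."""
    tail_start = max(len(data) - BLOCK, 0)
    end = data.find(EOF_MARK, tail_start)
    return data if end == -1 else data[:end]

early = b"before\x1a" + b"x" * 200  # Ctrl-Z far from the end of the file
print(msc_style(early))            # b'before'
print(delphi_style(early) == early)  # True
```

For a short file such as b'before\x1aafter' the two models agree, which matches what the original poster saw.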
 
