Reading and writing files in binary mode...


nicolasg

Hi,

I'm trying to open a file (any file) in binary mode and save it inside
a new text file.
After that I want to read the source from the text file and save it
back to disk in its original form. The problem is that the binary
source I extract from the text file seems to be different from the
source I saved. Here is my code:
1)
handle=file('image.gif','rb')
source=handle.read()
handle.close()

If I save the file directly, everything works:
2A)
handle=file('imageDuplicated.gif','wb')
handle.write(source)
handle.close()

The file imageDuplicated.gif will be exactly the same as the original
image.gif.
But if I save the source to a text file, I have a problem:
2B)
handle=file('text.txt','w')
handle.write(source)
handle.close()

handle=file('text.txt','r')
source2=handle.read()
handle.close()

handle=file('imageDuplicated.gif','wb')
handle.write(source2)
handle.close()

The files are completely different and I can't even display the image
from imageDuplicated.gif.

Something changes when I save the source in the text file, because in
2B) source == source2 returns False.
I suspect that the encoding may be causing a conflict, but I don't know
how to handle it...
Any help is welcome, thanks.
 

Diez B. Roggisch

Now why do you think there is a distinction between binary and text
files? Precisely because of what you observe: a text file undergoes an
automatic line-ending conversion. On Windows, that means each newline
gets translated to a DOS newline (actually two characters, '\r\n') when
it is written, and that corrupts a binary file.
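To make the effect concrete, here is a small sketch that imitates the translation in plain code rather than relying on the platform. It assumes a current Python 3 interpreter, and the helper name simulate_text_mode_write and the sample bytes are made up for illustration; it is not what the interpreter does internally, only the visible result on Windows.

```python
def simulate_text_mode_write(data):
    """Imitate the LF -> CR LF translation applied on write in Windows text mode."""
    return data.replace(b"\n", b"\r\n")


# Hypothetical binary payload that happens to contain newline bytes.
original = b"GIF89a\n\x00\r\n\xff"
on_disk = simulate_text_mode_write(original)

# Every LF byte gained a CR in front of it, so the data grew and no longer matches.
print(len(original), len(on_disk))
print(on_disk == original)
```

Each byte with value 0x0A is treated as a line ending, whether or not it was ever meant as one, which is why image data does not survive.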

http://zephyrfalcon.org/labs/python_pitfalls.html

Solution: only use binary files, and do the newline-translation yourself
if needed.
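As a rough sketch of "doing the translation yourself", the snippet below writes text lines to a file opened in binary mode and appends the line ending explicitly. It assumes Python 3; the function name write_lines_binary, the file name and the choice of '\r\n' are illustrative assumptions only.

```python
import os
import tempfile


def write_lines_binary(path, lines, newline=b"\r\n"):
    """Write text lines in binary mode, adding the chosen line ending by hand."""
    with open(path, "wb") as handle:
        for line in lines:
            handle.write(line.encode("ascii") + newline)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")
    write_lines_binary(path, ["first", "second"])
    # Read back in binary mode so nothing is translated behind our back.
    with open(path, "rb") as handle:
        raw = handle.read()

print(repr(raw))
```

Because the file is opened with "wb" and "rb", the bytes on disk are exactly the bytes the program wrote, on any platform.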

Diez
 

nicolasg

The problem is that I can't use only binary files...
How can I do the newline translation myself? When I check the text, I
find that the difference between binary and text is a '\r' instead of
a '\n'. I can't change every '\n', because that would also change the
real '\n' ones...
 

Dennis Lee Bieber

If you think the file is text, there are only three common line
endings: (TRS80) <cr>, (Unix) <lf>, (MS-DOS) <cr><lf>... I suppose you
might find some utility that wants to use <lf><cr>, but I'd say that
utility should be trashed.
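For a file that really is text, the three conventions can be collapsed to one with two replacements. This is a minimal sketch assuming Python 3 and a made-up helper name, normalize_newlines; it is only safe on genuine text, since applying it to binary data would destroy bytes that merely look like line endings.

```python
def normalize_newlines(data):
    """Convert <cr><lf> and bare <cr> line endings to <lf>."""
    # Handle the two-byte sequence first so its <cr> is not converted twice.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


mixed = b"dos line\r\nold cr line\runix line\n"
print(normalize_newlines(mixed))
```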

By definition, those sequences are used as control codes in a text
file and are, in a way, not considered part of the data. (VAX FORTRAN's
default file mode doesn't store them at all, instead using some sort of
bitmap at the start of the record to indicate "first physical chunk of
record", "last physical chunk of record", "first and last chunk of
record", and "intermediate chunk of record": short lines fit in one
chunk so first&last applies, while very long lines are split into
multiple chunks and the middle ones have no flag bits set.)

Line endings of text files get changed as the files are copied from
system to system (that's what FTP's text mode is responsible for); I
believe SMTP/NNTP physically uses <cr><lf> protocol, but the clients
convert between native format on receipt/transmission.

Binary files, OTOH, are defined to have NO special meaning to any
characters; they are a simple stream from beginning to end. The only way
to store binary data into a text file is to encode it into a plain text
format: Base64 (MIME), Quoted-Printable, UUencode/UUdecode. The
receiving end then has to know about the format, and decode the text
format back to binary.
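A minimal sketch of the Base64 route, assuming Python 3 and its standard base64 module. The function names and the temporary file are illustrative assumptions; the point is only that the text file holds plain ASCII, so newline translation cannot damage the payload.

```python
import base64
import os
import tempfile


def save_binary_as_text(data, text_path):
    """Encode bytes as Base64 and store them in an ordinary text file."""
    encoded = base64.encodebytes(data).decode("ascii")
    with open(text_path, "w") as handle:
        handle.write(encoded)


def load_binary_from_text(text_path):
    """Read the Base64 text back and decode it to the original bytes."""
    with open(text_path, "r") as handle:
        encoded = handle.read()
    return base64.decodebytes(encoded.encode("ascii"))


# Stand-in for the contents of a binary file: every possible byte value.
source = bytes(range(256)) * 4
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text.txt")
    save_binary_as_text(source, text_path)
    source2 = load_binary_from_text(text_path)

print(source == source2)
```

The encoded text is about a third larger than the binary data, which is the price of restricting it to printable characters.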
 

hdante

Hi,

I'm sorry, but you have a conceptual error there. Text files differ
from binary files because they are not considered raw. When you say
that this or that file is a text file, Python and/or the operating
system takes the liberty to insert and/or remove semantics from the
file, according to some formatting agreed on or imposed by them. Text
files are formatted files. Binary files are raw files. You can't expect
by any means that Python will respect your raw data when writing text
files.

The solution to the question "how can I write a binary file into a
text file" requires that you convert the binary file to a format
suitable for textual access. For example, you can "uuencode" the binary
file inside your text file. In simple terms:

mytext = serialize(binary_file.read())
text_file.write(mytext)
...
mydata = deserialize(text_file.read())

The functions "serialize" and "deserialize" are responsible for
converting the binary data to/from some textual representation.
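One possible pair of such functions is sketched below. Note that it swaps in hexadecimal encoding rather than uuencode, purely because it is short and easy to inspect; it assumes Python 3 and the standard binascii module, and the names serialize and deserialize simply mirror the placeholders above.

```python
import binascii


def serialize(data):
    """Turn raw bytes into a string of hexadecimal digits."""
    return binascii.hexlify(data).decode("ascii")


def deserialize(text):
    """Turn hexadecimal text back into bytes, ignoring any whitespace."""
    compact = "".join(text.split())
    return binascii.unhexlify(compact.encode("ascii"))


binary_data = b"\x00\n\r\n\x1a\xff"
mytext = serialize(binary_data)
mydata = deserialize(mytext)
print(mytext)
print(mydata == binary_data)
```

Hex doubles the size of the data; Base64 or uuencode are more compact if that matters.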

HOWEVER, why would you want to put binary data into a text file? Is
this some information that will be used by your application? Or will
you transfer it to some other person in a portable way? Maybe you
should leave those files alone and not try to merge them. If it is a
complex structure, you should put it into a database instead of doing
those strange things. In the worst case, you could just write a text
file, write a binary file and concatenate them later. See if this
really is a requirement for your project.
 
