Writing to Binary Files

doug meyer

I'm having some serious issues writing to a binary file. Can anyone
give me a hand?
And could I be having this problem because I'm trying to do this on a
Linux machine?
Thanks!
 
Alex Young

doug said:
> I'm having some serious issues writing to a binary file. Can anyone
> give me a hand?

Can you show us some code?

> And could I be having this problem because I'm trying to do this on a
> Linux machine?

Unlikely... I do this fairly regularly.
 
Kyle Schmitt

Well, try to tell us what you are doing?

If the answer to your question is as simple as

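# File.open does not expand "~", so expand the path first
textFile = File.open(File.expand_path("~/foo.text"), "w")
binaryFile = File.open(File.expand_path("~/bar.dump"), "wb")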

Then everyone will tell you to read up ;), but since it's probably
more complex, tell us what the issue is.
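
For anyone following along, here is a minimal sketch of what writing and re-reading a small binary record could look like. The scratch directory, the file name, and the record layout (one unsigned byte followed by a big-endian 16-bit integer) are assumptions made up for illustration, not anything taken from doug's program.

```ruby
require "tmpdir"
require "fileutils"

# Scratch directory so the example does not touch real files (assumption).
dir = Dir.mktmpdir
binary_path = File.join(dir, "bar.dump")

# Hypothetical record: one unsigned byte ("C") and one big-endian
# 16-bit unsigned integer ("n").
record = [7, 258].pack("Cn")

# "wb" writes the bytes exactly as given on every platform.
File.open(binary_path, "wb") { |f| f.write(record) }

# Read the file back in binary mode and decode with the same template.
bytes  = File.open(binary_path, "rb") { |f| f.read }
values = bytes.unpack("Cn")

p bytes.bytes  # [7, 1, 2]
p values       # [7, 258]

FileUtils.remove_entry(dir)
```

On Linux the "b" makes no difference to the bytes written, but keeping it documents the intent and keeps the script portable.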

(and if it's writing Marshal.dump stuff, your problem isn't writing,
it's reading)
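
If the Marshal case is the one that applies, a round trip might look roughly like the sketch below. The sample hash and the temporary path are hypothetical; the point is only that both the write and the read use binary mode.

```ruby
require "tmpdir"
require "fileutils"

dir  = Dir.mktmpdir
path = File.join(dir, "bar.dump")

# Hypothetical data to serialize.
original = { "name" => "doug", "sizes" => [1, 2, 3] }

# Marshal.dump returns a binary String, so write it with "wb".
File.open(path, "wb") { |f| f.write(Marshal.dump(original)) }

# Read with "rb" as well: in text mode on Windows, bytes that happen to
# look like line endings could be translated and corrupt the dump.
restored = File.open(path, "rb") { |f| Marshal.load(f.read) }

puts restored == original  # true

FileUtils.remove_entry(dir)
```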
--Kyle
 
Bertram Scharpf

Hi,

On Wednesday, 22 Aug 2007, 06:35:32 +0900, doug meyer wrote:
> I'm having some serious issues writing to a binary file. Can anyone
> give me a hand?
> And could I be having this problem because I'm trying to do this on a
> Linux machine?

I just repeat what I wrote this afternoon
(<http://www.ruby-forum.com/topic/122170#544613>).

The distinction between "text" and "binary" is the archetypal
misdesign in DOS and Windows. It means nothing more than
that in "text" mode line ends are translated from "\n" to
"\r\n", which is of no use except to disturb file positions and
string lengths. The only purpose of this is to deter
programmers from doing anything in a non-Microsoft way.
Anywhere else you don't need to care.

Sorry for the flame but that's the way it is.
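
To make the effect on positions and lengths concrete, here is a small sketch. It does not open any files; a plain gsub stands in for the translation that text mode performs on write under Windows, which is an assumption made so the example behaves the same on every platform.

```ruby
unix_text = "one\ntwo\nthree\n"

# Stand-in for the "\n" -> "\r\n" translation of Windows text mode
# (assumption: gsub models it closely enough for illustration).
dos_text = unix_text.gsub("\n", "\r\n")

# The translated copy is one byte longer per line.
puts unix_text.bytesize  # 14
puts dos_text.bytesize   # 17

# The offset of the same word differs between the two forms.
puts unix_text.index("three")  # 8
puts dos_text.index("three")   # 10
```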

Bertram
 
Michael T. Richter

On Wed, 2007-22-08 at 08:13 +0900, Bertram Scharpf wrote:
> The distinction between "text" and "binary" is the archetypal
> misdesign in DOS and Windows.

And this explains the distinction between opening binary vs. opening
text in UNIX APIs since *LONG* before MS-DOS how?

> It means nothing more than
> that in "text" mode line ends are translated from "\n" to
> "\r\n", which is of no use except to disturb file positions and
> string lengths. The only purpose of this is to deter
> programmers from doing anything in a non-Microsoft way.
> Anywhere else you don't need to care.
>
> Sorry for the flame but that's the way it is.


It would help if you actually said things the way they were. This "text
mode" vs. "binary mode" thing is a UNIX "innovation" (one of many which
has plagued the computing world since UNIX's misdesign). Let me
introduce to you what "the way it is" really is....

Way back in the bad old days, people talked to computers on teletype
machines: combination printer/keyboard. We didn't have these fancy,
schmancy glass-screened terminals all over the place. On these
terminals "carriage return" meant "move the printer head to the far
left". "Line feed" meant "scroll the paper down one line". These were
completely separate actions requiring completely separate control codes.
("\n" is the "line feed" or "newline". "\r" is the "carriage return".)

Most systems of the day wrote everything in a single format. There was
no binary/text distinction. Each line was ended by a carriage return
and a line feed. (I still have some of these systems up and running on
my laptop thanks to good old SIMH.) When you printed these files,
whatever their contents were was run straight to the teletype and
printed out verbatim. That meant each line ended with "\r\n".

UNIX, of course, being the half-bastard-child of real operating systems
(MULTICS and ITS) that it was, had to do things differently. To save on
space (!) its creators, in their nigh-infinite wisdom and judgement,
plagued the world with the notion of only using "\n" to terminate text
lines in text files. (Apparently saving one byte out of every line was
important! Never mind that OSes on smaller machines than ever ran UNIX
had no problem with that "wasted" carriage return....) Of course this
meant that you couldn't just copy the bits of a document directly to the
teletype. Oh, no. You had to open the file in a special text mode so
the OS would convert things behind the scenes for you, switching every
"\n" into a "\r\n" before sending it off to the teletype. This was
perceived (incorrectly) as a Great Innovation.

Later, as the UNIX infection set in, "smart" terminals (teletypes and
glass screen) started to, if set appropriately, automatically convert
line feeds into carriage return/line feed combinations. This was a
feature added to make up for a misfeature in UNIX systems, though, not
something that was really necessary. (Indeed it breaks the definition
of a line feed according to the ASCII definition thereof.)

MS-DOS arrived on the scene from a different direction. It came from
the CP/M side of things which was itself heavily influenced by IBM's
operating systems (scaled down, of course, to the teensy CPU that ran
it). CP/M? Used the more traditional (at the time) CR/LF combinations
found in pretty much every operating system of the day other than UNIX.
MS-DOS was a hack off of a CP/M clone for the new 8086 processor and, as
such, inherited CP/M's approach to text files (and command line
switches) which itself was inherited from IBM's (and others') various
operating systems.

So the system that had to do it different? Wasn't Microsoft's. Nor
even IBM's. UNIX was the one that had to be different from everybody
else. And it is UNIX that is to blame for this artificial text/binary
file distinction.

--
Michael T. Richter
Our outrage at China notwithstanding, we should remember that before
1891 the copyrights of foreigners were not protected in the United
States. (Lawrence Lessig)

 
Bertram Scharpf

Hi,

On Wednesday, 22 Aug 2007, 08:51:17 +0900, Michael T. Richter wrote:
> And this explains the distinction between opening binary vs. opening
> text in UNIX APIs since *LONG* before MS-DOS how?
>
> MS-DOS was a hack off of a CP/M clone for the new 8086 processor and, as
> such, inherited CP/M's approach to text files (and command line
> switches) which itself was inherited from IBM's (and others') various
> operating systems.
>
> So the system that had to do it different? Wasn't Microsoft's. Nor
> even IBM's. UNIX was the one that had to be different from everybody
> else. And it is UNIX that is to blame for this artificial text/binary
> file distinction.

You're right. I almost forgot that. It's not fair to imply
MS invented anything.

Bertram
 
Michael W. Ryder

Bertram said:
> Hi,
>
> On Wednesday, 22 Aug 2007, 06:35:32 +0900, doug meyer wrote:
>
> I just repeat what I wrote this afternoon
> (<http://www.ruby-forum.com/topic/122170#544613>).
>
> The distinction between "text" and "binary" is the archetypal
> misdesign in DOS and Windows. It means nothing more than
> that in "text" mode line ends are translated from "\n" to
> "\r\n", which is of no use except to disturb file positions and
> string lengths. The only purpose of this is to deter
> programmers from doing anything in a non-Microsoft way.
> Anywhere else you don't need to care.
>
> Sorry for the flame but that's the way it is.
>
> Bertram

So where do files like database or spreadsheet files fit? They
obviously aren't text files; sending one to a printer will only produce
garbage or a locked printer. Binary is a useful distinction to denote
that a file is not "ready" for use without intervention, either through
the OS or a program. Whether Unix treats text differently than anyone
else is not a concern, as a simple test will show whether a file is a
Unix "text" file or some other kind of text file. I have zero problems
moving text files back and forth between Unix and Windows programs. I
can even move my "binary" database files between the two without any
problems. I can't read the database files on either of them without a
program, but that is why they are treated differently.
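
One way such a "simple test" could be written is sketched below. The method name and the set of return values are assumptions chosen for illustration; it only counts the line-ending byte sequences in whatever string it is given.

```ruby
# Classify a chunk of bytes by the line endings it contains.
# Returns :crlf, :lf, :cr, :mixed, or :none (hypothetical labels).
def line_ending_style(bytes)
  crlf = bytes.scan("\r\n").size
  lf   = bytes.count("\n") - crlf   # bare LF, not part of a CR/LF pair
  cr   = bytes.count("\r") - crlf   # bare CR, not part of a CR/LF pair

  kinds = { crlf: crlf, lf: lf, cr: cr }.select { |_, n| n > 0 }.keys
  return :none if kinds.empty?
  kinds.size == 1 ? kinds.first : :mixed
end

puts line_ending_style("a\nb\n")      # lf
puts line_ending_style("a\r\nb\r\n")  # crlf
puts line_ending_style("a\rb\r")      # cr
```

For a real file, the first few kilobytes read in binary mode (for example with File.binread and a length argument) would usually be enough input.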
 
Bill Kelly

From: Michael T. Richter
> UNIX, of course, being the half-bastard-child of real operating systems
> (MULTICS and ITS) that it was, had to do things differently. To save on
> space (!) its creators, in their nigh-infinite wisdom and judgement, plagued
> the world with the notion of only using "\n" to terminate text lines in text
> files. (Apparently saving one byte out of every line was important! Never
> mind that OSes on smaller machines than ever ran UNIX had no problem with
> that "wasted" carriage return....) Of course this meant that you couldn't
> just copy the bits of a document directly to the teletype. Oh, no. You had
> to open the file in a special text mode so the OS would convert things
> behind the scenes for you, switching every "\n" into a "\r\n" before sending
> it off to the teletype. This was perceived (incorrectly) as a Great
> Innovation.

It's sounding like it really was a Great Innovation in those days to separate the
model from the view. In retrospect it's taken for granted as good design.
The specific print head movement characteristics of a particular piece of display
hardware have no business polluting the internal representation of a portable
text file format.

> Later, as the UNIX infection set in, "smart" terminals (teletypes and glass
> screen) started to, if set appropriately, automatically convert line feeds
> into carriage return/line feed combinations. This was a feature added to
> make up for a misfeature in UNIX systems, though, not something that was
> really necessary.

A misfeature would be taking completely independent teletype carriage return
and line feed output control bytes, and perverting them into an atomic LINE ENDING
SEQUENCE PAIR.

A teletype doesn't need the carriage return and linefeed characters to
follow one another back-to-back. They're just independent ways to move
the print head. As I recall, I used to send carriage returns independently from
linefeeds to our teletype whenever I pleased, if I wanted to write over the same
line twice, for instance... for bold-face, underline, overstrike, whatever.

Taking two independent print head control characters, and artificially gluing them
together into an atomic line-ending marker, just adds noise to what should have
been a portable, device-independent text file format.

Thankfully the Unix guys got it right.


Regards,

Bill
 
John Joyce

> It's sounding like it really was a Great Innovation in those days to
> separate the model from the view. In retrospect it's taken for granted
> as good design. The specific print head movement characteristics of a
> particular piece of display hardware have no business polluting the
> internal representation of a portable text file format.

Portable? In those days, networking and portability were not real
concerns on the minds of anyone.
Back then, people really believed their code would disappear in a few
years, to be replaced by something else!

All in all, it should be evident that there are two (3 if you
count the old non-unix mac os) line ending paradigms to care about.
Not too bad! Consider how many other things are splintered more!
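
Handling all three conventions in code can be quite short. The sketch below converts everything to "\n"; the method name and the sample strings are made up for illustration.

```ruby
# Convert DOS ("\r\n") and old Mac OS ("\r") line endings to Unix ("\n").
# The optional "\n" in the pattern keeps a CR/LF pair from becoming two
# newlines.
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

samples = ["a\r\nb\r\n", "a\rb\r", "a\nb\n"]
normalized = samples.map { |s| normalize_newlines(s) }

p normalized.uniq  # ["a\nb\n"]
```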

This is not a matter of pointing fingers or saying my OS is better
than yours; this is simply a matter of doing what needs to be done.

Even now, a "simple" text file could have any kind of crazy internal
formatting that is meaningful only to some particular program.
The history is interesting, but hardly important to writing the code
for now.
 
Bill Kelly

John Joyce said:
> Portable? In those days, networking and portability were not real
> concerns on the minds of anyone.
> Back then, people really believed their code would disappear in a few
> years, to be replaced by something else!

Well I tried. :) I started as a youngster in the late '70s
but we DID have a real teletype printer. Complete with margin
bell at the end of the line. A fast touch typist could probably
have out-paced the thing.

> All in all, it should be evident that there are two (3 if you
> count the old non-unix mac os) line ending paradigms to care about.
> Not too bad! Consider how many other things are splintered more!

Indeed, but I'd call unix \n or old-mac \r format equally reasonable.
It's the unnecessarily redundant \r\n that seems clunky to me.

> This is not a matter of pointing fingers or saying my OS is better
> than yours; this is simply a matter of doing what needs to be done.

Oh, I wasn't intending to get into the OS vs. OS finger-pointing.
It's like the Atari ST vs. the Amiga. One of them totally sucked!

<grin>

Regards,

Bill
 
Robert Klemme

2007/8/22, Bertram Scharpf said:
> Hi,
>
> On Wednesday, 22 Aug 2007, 06:35:32 +0900, doug meyer wrote:
>
> I just repeat what I wrote this afternoon
> (<http://www.ruby-forum.com/topic/122170#544613>).
>
> The distinction between "text" and "binary" is the archetypal
> misdesign in DOS and Windows. It means nothing more than
> that in "text" mode line ends are translated from "\n" to
> "\r\n", which is of no use except to disturb file positions and
> string lengths. The only purpose of this is to deter
> programmers from doing anything in a non-Microsoft way.
> Anywhere else you don't need to care.
>
> Sorry for the flame but that's the way it is.

I prefer to just take it as given that different operating systems
treat line endings differently and go from there. I can't remember
this ever causing an issue for me: I open binary files with "rb" and
text files with "r" on all platforms. I even find that this makes the
code easier to understand (it serves as documentation). And I cannot
remember a single case where someone processed a text file and needed
exact file positions; line numbers are typically more interesting.
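
A minimal sketch of that habit, assuming two throwaway files created just for the example (the names and contents are hypothetical): the text file is handled by line number, the binary file by raw bytes.

```ruby
require "tmpdir"
require "fileutils"

dir = Dir.mktmpdir
text_path   = File.join(dir, "notes.txt")
binary_path = File.join(dir, "data.bin")

# Create sample input (hypothetical contents).
File.open(text_path, "w")    { |f| f.write("alpha\nbeta\ngamma\n") }
File.open(binary_path, "wb") { |f| f.write([0, 13, 10, 255].pack("C*")) }

# Text: open with "r" and think in line numbers, counted from 1.
hits = []
File.open(text_path, "r") do |f|
  f.each_line.with_index(1) do |line, number|
    hits << number if line.chomp == "beta"
  end
end

# Binary: open with "rb" and think in bytes; 13 and 10 are kept as data.
raw = File.open(binary_path, "rb") { |f| f.read }

p hits       # [2]
p raw.bytes  # [0, 13, 10, 255]

FileUtils.remove_entry(dir)
```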

Relax :)

robert
 
