Confused about file operations


siliconwafer

Hi All,
If I open a binary file in text mode and use text functions to
read it, will I be reading numbers as characters or as actual
values?
What if I open a text file and read it using binary read functions?
-Siliconwafer
 

Gordon Burditt

> If I open a binary file in text mode and use text functions to
> read it, will I be reading numbers as characters or as actual
> values?
> What if I open a text file and read it using binary read functions?

The difference between a binary and a text file on most implementations
is either (a) none at all, or (b) the line ending (\n vs. \r\n vs.
\n\r vs. something else). This doesn't rule out things like binary
line numbers and line lengths preceding the line with no end-of-line
terminator, but these are very rare.

What is a "text function"? There is no such division ("fread and
fwrite are for binary, printf, fgets, fputs, scanf, etc. are for
text" is a myth).

The practical effect is likely that you will see stray \r characters
before or after \n characters if you read a text file as binary. If
you read a binary file as text, some of the \r characters might
vanish. You might also find that a binary file read as text is somewhat
shorter than expected if the binary file contains a text "end of file
character" which is interpreted as such.
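
A rough sketch of how you could check this on your own system (the
function names write_sample and count_chars are made up for this
example, they are not standard): write a few known bytes in binary
mode, then count what getc() hands back in each mode. On systems where
the two modes are identical both counts will match; where "\r\n" is
the text line ending, the text-mode count will probably be smaller.

```c
#include <stdio.h>

/* Write the five bytes 'a' '\r' '\n' 'b' '\n' to `path` in binary mode,
   so exactly those bytes end up in the file. Returns 0 on success. */
int write_sample(const char *path)
{
    static const char bytes[] = { 'a', '\r', '\n', 'b', '\n' };
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(bytes, 1, sizeof bytes, fp);
    if (fclose(fp) != 0 || written != sizeof bytes)
        return -1;
    return 0;
}

/* Count how many characters getc() delivers when `path` is opened with
   the given mode ("r" for text, "rb" for binary). Returns -1 on error. */
long count_chars(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long n = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Call write_sample() once, then compare count_chars(path, "rb") with
count_chars(path, "r"). The binary count should be 5; whatever the
text count turns out to be is what your implementation does.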

Gordon L. Burditt
 

SM Ryan

# Hi All,
# If I open a binary file in text mode and use text functions to
# read it, will I be reading numbers as characters or as actual
# values?
# What if I open a text file and read it using binary read functions?

There are no text functions and binary functions. The functions work
either way, but they may give slightly different results depending on
whether the stream is text or binary. That might include things like
text line definitions, end-of-file markers, file positions in fseek and
ftell, etc. What the differences really are depends on the system.

On Unices, text and binary modes are the same thing.
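
To illustrate the fseek/ftell point, here is a small sketch (the names
reread_second_line and LINE_BUF are invented for the example). It
treats the value from ftell() on a text stream as an opaque bookmark
and only hands it back to fseek() with SEEK_SET, which is the portable
way to use it. It assumes every line fits in LINE_BUF characters.

```c
#include <stdio.h>
#include <string.h>

#define LINE_BUF 128   /* assumption: every line fits in this buffer */

/* Read line 1, bookmark the position with ftell(), read line 2, seek
   back to the bookmark and read line 2 again into `again`.
   Returns 0 if both reads of line 2 agree, -1 on any failure. */
int reread_second_line(const char *path, char again[LINE_BUF])
{
    char first[LINE_BUF], second[LINE_BUF];
    long pos;
    int ok = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(first, sizeof first, fp) != NULL &&
        (pos = ftell(fp)) != -1L &&          /* bookmark, not a byte count */
        fgets(second, sizeof second, fp) != NULL &&
        fseek(fp, pos, SEEK_SET) == 0 &&     /* only valid with SEEK_SET */
        fgets(again, LINE_BUF, fp) != NULL)
        ok = (strcmp(second, again) == 0);
    fclose(fp);
    return ok ? 0 : -1;
}
```

What the sketch deliberately avoids is arithmetic on pos (for example
pos + 10), since in text mode that number need not be a byte offset.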
 

Peter Shaggy Haywood

Groovy hepcat Gordon Burditt was jivin' on Tue, 30 Aug 2005 18:33:01
-0000 in comp.lang.c.
Re: confused abt file operations's a cool scene! Dig it!
> The difference between a binary and text file on most implementations

Just to clarify (for the OP's sake), all files are binary files. But
some store values represented as text while others store raw binary
values.
However, a file stream may be opened in either binary or text mode.
This may (depending on the system) cause some translation of the bytes
read in or written out. For example, on a particular system a newline
sequence may be stored as a carriage return character followed by a
line feed character but is translated to a single newline character
when read in.

> is either (a) none at all, or (b) the line ending (\n vs. \r\n vs.
> \n\r vs. something else). This doesn't rule out things like binary
> line numbers and line lengths preceding the line with no end-of-line
> terminator, but these are very rare.

There may be other translations too. For example, one well-known
system truncates a text stream after a certain control character is
written to it. (You allude to this below, Gordon.)

> What is a "text function"? There is no such division ("fread and
> fwrite are for binary, printf, fgets, fputs, scanf, etc. are for
> text" is a myth).

Partially true, but some functions naturally lend themselves to
reading/writing text while others are naturally more suited to
reading/writing binary values. Actually, some functions require their
input to be text or they write text. You wouldn't read a raw binary
value with fscanf(), for example, because it parses formatted text.
You could read text with fread(), but it won't parse it and extract
values from it (as fscanf() will).
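
A small sketch of that difference, with invented helper names
(read_parsed and read_raw): both look at the same file, but fscanf()
hands back a converted int while fread() hands back the bytes exactly
as stored and leaves any interpretation to you.

```c
#include <stdio.h>

/* Parse a decimal integer from the start of the file with fscanf().
   Returns 0 on success, -1 on failure. */
int read_parsed(const char *path, int *out)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%d", out) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/* Read up to `size` bytes with fread(), with no interpretation at all.
   Returns the number of bytes actually read (0 on error). */
size_t read_raw(const char *path, unsigned char *buf, size_t size)
{
    FILE *fp = fopen(path, "rb");
    size_t n;

    if (fp == NULL)
        return 0;
    n = fread(buf, 1, size, fp);
    fclose(fp);
    return n;
}
```

For a file containing the characters 1234 and a newline, read_parsed()
yields the int 1234, while read_raw() yields five bytes starting with
the character '1'.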

> The practical effect is likely that you will see stray \r characters
> before or after \n characters if you read a text file as binary. If
> you read a binary file as text, some of the \r characters might
> vanish. You might see a binary file read as text somewhat shorter
> than expected if the binary file contains a text "end of file
> character" which is interpreted as such.

Right.

--

Dig the even newer still, yet more improved, sig!

http://alphalink.com.au/~phaywood/
"Ain't I'm a dog?" - Ronny Self, Ain't I'm a Dog, written by G. Sherry & W. Walker.
I know it's not "technically correct" English; but since when was rock & roll "technically correct"?
 

siliconwafer

Hi Again,
So what about numbers written to / read from text or binary files?
I read that in text files numbers are stored as characters or strings
rather than as their actual values, so the number 1234 will occupy
5 bytes (4 + '\0'). In binary files, numbers are stored as actual
values, i.e. 1234 is stored as a 2-byte integer, and so on.
Was the book correct?
 

Peter Nilsson

siliconwafer said:
> Hi Again,
> So what about numbers written to / read from text or binary files?

If you're replying to something specific, please include the context
of what you are replying to.

> I read that in text files numbers are stored as characters or
> strings rather than as their actual values, so the number 1234 will
> occupy 5 bytes (4 + '\0'). In binary files, numbers are stored
> as actual values, i.e. 1234 is stored as a 2-byte integer, and so
> on. Was the book correct?

With text files, you can portably output only 'printable' characters
(plus a few control characters such as newline and tab). With
binary files you can output whatever byte values you want.

How you encode your integer as output is up to you. For instance,
you can print the number 42 as "42", or "2A" if you want to output
hexadecimal.

Because binary streams are not limited to printable characters
alone, you have an additional option of copying the integer
representation itself.
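
As a sketch of those three options (the encode_* names are invented
for the example), here is the same integer rendered as decimal text,
as hexadecimal text, and as a copy of its in-memory representation:

```c
#include <stdio.h>
#include <string.h>

/* Format value as decimal text, e.g. 42 becomes "42".
   Returns the text length, or 0 if buf is too small. */
size_t encode_decimal(int value, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%d", value);
    return (n < 0 || (size_t)n >= size) ? 0 : (size_t)n;
}

/* Format value as upper-case hexadecimal text, e.g. 42 becomes "2A".
   Returns the text length, or 0 if buf is too small. */
size_t encode_hex(unsigned int value, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%X", value);
    return (n < 0 || (size_t)n >= size) ? 0 : (size_t)n;
}

/* Copy the object representation of value; the byte order and the
   number of bytes depend on the system. Returns the number of bytes
   copied, or 0 if buf is too small. */
size_t encode_raw(int value, unsigned char *buf, size_t size)
{
    if (size < sizeof value)
        return 0;
    memcpy(buf, &value, sizeof value);
    return sizeof value;
}
```

The first two results are safe to write to a text stream. The third
can contain any byte values, so it belongs on a binary stream.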
 

Keith Thompson

siliconwafer said:
> Hi Again,
> So what about numbers written to / read from text or binary files?
> I read that in text files numbers are stored as characters or strings
> rather than as their actual values, so the number 1234 will occupy
> 5 bytes (4 + '\0'). In binary files, numbers are stored as actual
> values, i.e. 1234 is stored as a 2-byte integer, and so on.
> Was the book correct?

Since I've been telling people to search the newsgroup for the phrase
"Context, dammit!", I should occasionally post the explanation along
with the phrase.

Don't assume that your readers have easy access to the article to
which you're replying. It's important to provide some context,
generally relevant quotes from the previous article, so each article
can be understood on its own.

If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.

As for your question, the way numbers are stored in a file depends on
how you write them. If you use

int n = 1234;
fprintf(myfile, "%d\n", n);

you'll get the 5 characters '1', '2', '3', '4', '\n' written to the
file. If you use

fwrite(&n, sizeof n, 1, myfile);

you'll get a binary representation of the value 1234, typically 2 or 4
bytes; the order in which the bytes are written is system-specific.

Either fprintf() or fwrite() can be used with either binary or text
files; both are defined to write sequences of bytes. It just usually
makes more sense to use fprintf() with text files and fwrite() with
binary files.
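
Wrapped up as functions you can compile and try (the function names
and the way the file is opened are assumptions made for this sketch,
not part of the snippets above):

```c
#include <stdio.h>

/* Write n as decimal text followed by a newline. Returns the number of
   characters fprintf() reports having written, or -1 on error. */
int write_int_text(const char *path, int n)
{
    FILE *fp = fopen(path, "w");
    int count;

    if (fp == NULL)
        return -1;
    count = fprintf(fp, "%d\n", n);
    if (fclose(fp) != 0)
        return -1;
    return count;
}

/* Write the object representation of n. Returns the number of bytes
   written (sizeof n on success, 0 on error). */
size_t write_int_binary(const char *path, int n)
{
    FILE *fp = fopen(path, "wb");
    size_t written;

    if (fp == NULL)
        return 0;
    written = fwrite(&n, 1, sizeof n, fp);
    if (fclose(fp) != 0)
        return 0;
    return written;
}

/* Read back an int written by write_int_binary() on the same system.
   Returns 0 on success, -1 on failure. */
int read_int_binary(const char *path, int *out)
{
    FILE *fp = fopen(path, "rb");
    size_t got;

    if (fp == NULL)
        return -1;
    got = fread(out, sizeof *out, 1, fp);
    fclose(fp);
    return got == 1 ? 0 : -1;
}
```

For 1234, write_int_text() reports 5 characters and write_int_binary()
reports sizeof(int) bytes. The binary file is only guaranteed to read
back correctly on a system with the same int size and byte order.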
 

SM Ryan

# Hi Again,
# So what about numbers written to / read from text or binary files?
# I read that in text files numbers are stored as characters or strings
# rather than as their actual values, so the number 1234 will occupy
# 5 bytes (4 + '\0'). In binary files, numbers are stored as actual
# values, i.e. 1234 is stored as a 2-byte integer, and so on.
# Was the book correct?

You can write an arbitrary byte string to text or binary files, and you
can write ASCII (or EBCDIC or UTF-16) strings to text or binary files.
Depending on your system, it might end up converting some bytes in text
mode. There is nothing that will reach out and smack your C program,
though it might smack some of the bytes going out. It really is system
specific.

If you stick to text mode with isprint() characters together with "\n"
and "\t", and don't assume fseek/ftell values are byte offsets, you
should be compatible with all other plain-text processing programs on
your machine, and with any other system that has the same character set
and character encoding. There will also be character set converters
available.
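
A minimal sketch of that rule as a check you could run over data before
writing it to a text stream (is_portable_text is an invented name, and
the result depends on the current locale because isprint() does):

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if every one of the first `len` characters of s is either a
   printing character, a newline or a tab; otherwise return 0. */
int is_portable_text(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        /* isprint() needs a value representable as unsigned char */
        unsigned char c = (unsigned char)s[i];
        if (!isprint(c) && c != '\n' && c != '\t')
            return 0;
    }
    return 1;
}
```

In the default "C" locale this accepts ordinary lines of text and
rejects things like a stray '\r' or other control characters.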
 

Gordon Burditt

>> If I open a binary file in text mode and use text functions to

> Just to clarify (for the OP's sake), all files are binary files. But

What prohibits an implementation from refusing to open files in a mode
(text vs. binary) other than the one they were created with? (I don't
know of any, but the DeathStation 9000 seems like a likely candidate.)

An implementation could also store binary and text files differently,
e.g. a binary file is on a data track with a block size of 2048, and
text files are on an audio track with a block size of 2352, so you
can't read them correctly in the wrong mode.

> some store values represented as text while others store raw binary
> values.
> However, a file stream may be opened in either binary or text mode.
> This may (depending on the system) cause some translation of the bytes
> read in or written out. For example, on a particular system a newline
> sequence may be stored as a carriage return character followed by a
> line feed character but is translated to a single newline character
> when read in.


> There may be other translations too. For example, one well-known
> system truncates a text stream after a certain control character is
> written to it. (You allude to this below, Gordon.)


> Partially true, but some functions naturally lend themselves to
> reading/writing text while others are naturally more suited to
> reading/writing binary values.

This much I'll agree to.

> Actually, some functions require their
> input to be text or they write text.
> You wouldn't read a raw binary
> value with fscanf(), for example, because it parses formatted text.

This doesn't mean there is no formatted text within a binary file.
Assuming for the moment that the binary file is a SQL database or
something similar, even though the rest of the file is heavily
binary, you can still seek to the VARCHAR field and try to parse
it if it's in a known format, say, a serial number with pieces that
mean something (year of manufacture, country code, engine type,
etc. which are encoded into automobile VINs). An automobile VIN
would probably be stored as text, as it's not all numeric, and
storing individual pieces as raw binary does not necessarily save
any space and complicates rather than speeds lookups.
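
A sketch of that idea, using an entirely made-up layout rather than
real VIN or database rules: a 16-byte text field holding something
like "2005-AU-0042" (year, two-letter code, unit number) sits at a
known offset inside an otherwise binary file. The name parse_serial_at,
the field width and the format are all assumptions for illustration.

```c
#include <stdio.h>

#define FIELD_LEN 16   /* hypothetical fixed-width text field */

/* Seek to `offset` in a binary file, read FIELD_LEN bytes and parse
   them as "YYYY-CC-NNNN". Returns 0 on success, -1 on failure. */
int parse_serial_at(const char *path, long offset,
                    int *year, char country[3], int *unit)
{
    char field[FIELD_LEN + 1];
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    if (fseek(fp, offset, SEEK_SET) != 0 ||
        fread(field, 1, FIELD_LEN, fp) != FIELD_LEN) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    field[FIELD_LEN] = '\0';   /* make it a string before parsing */

    /* %2s stores at most two characters plus the terminating '\0' */
    return sscanf(field, "%4d-%2s-%4d", year, country, unit) == 3 ? 0 : -1;
}
```

The stream is opened in binary mode so that the offset really is a
byte offset, and the formatted-text parsing is done on the bytes once
they are in memory.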

> You could read text with fread(), but it won't parse it and extract
> values from it (as fscanf() will).

Correct, but if the lines are known to be fixed-length, it's a
perfectly good way of reading them. It's also a perfectly good way
of dealing with the file if you're not trying to interpret the
content (e.g. a file copy, just checksumming the bytes, compressing
the file, or packing a bunch of files into a larger one).
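
For instance, a copy and a toy checksum might look like the sketch
below (copy_file and byte_sum are invented names, and a plain sum of
byte values is only an illustration, not a robust checksum). Neither
function cares whether the content is "text" or "binary".

```c
#include <stdio.h>

/* Copy src to dst byte for byte. Returns the number of bytes copied,
   or -1 on error. */
long copy_file(const char *src, const char *dst)
{
    unsigned char buf[4096];
    size_t n;
    long total = 0;
    FILE *in, *out;

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            total = -1;
            break;
        }
        total += (long)n;
    }
    if (ferror(in))
        total = -1;
    fclose(in);
    if (fclose(out) != 0)
        total = -1;
    return total;
}

/* Add up the values of all bytes in the file. Returns 0 on success and
   stores the sum in *sum, or returns -1 on error. */
int byte_sum(const char *path, unsigned long *sum)
{
    unsigned char buf[4096];
    size_t n, i;
    int status = 0;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    *sum = 0;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        for (i = 0; i < n; i++)
            *sum += buf[i];
    if (ferror(fp))
        status = -1;
    fclose(fp);
    return status;
}
```

Both open the files in binary mode so that no translation happens on
the way through.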
 
