Happy Christmas


Paul Hsieh

[...] I understand that the Committee had an overall policy to
include only those new functions in the Standard library that would
be impossible or difficult to replicate in user code.

Excuse me, what? You mean like the SafeStr library that Microsoft
tricked the committee into adopting? Every function in there is
trivial, with basically no opportunity for compiler implementation
improvements.

It's obvious that the C standards committee is also trying to set a
direction for the language irrespective of the difficulty of the
functions it adds. (They also don't have any idea what they are
doing, but that's a different story.)
 

vippstar

... snip ...



Yes there is. Just open the file in text mode (i.e. without the
"rb") and wait until you detect a '\n' in the stream. That is
exactly where a line separator occurred.
That's a where, not a what.
 

Joe Wright

CBFalconer said:
Yes there is. Just open the file in text mode (i.e. without the
"rb") and wait until you detect a '\n' in the stream. That is
exactly where a line separator occurred.
Not here. Opening a text file with CR line terminations in "r" mode is
fine but the CR bytes (0x0d) are never seen. I get one long line.
 

vippstar

Not here. Opening a text file with CR line terminations in "r" mode is
fine but the CR bytes (0x0d) are never seen. I get one long line.
He is not talking about CR (which is ASCII) or 0x0D (which is just a
value), but about '\n'.
 

Flash Gordon

jacob navia wrote, On 25/12/07 11:00:
No, because in text mode the standard doesn't guarantee that ftell and
fseek will work correctly!

You have it wrong. The standard guarantees that in text mode they will
work correctly on any conforming implementation, it's just that the
value returned by ftell does not have to be a byte count. In binary
mode, on the other hand, an fseek SEEK_END does not have to be
meaningfully supported. I suggest you go back and read questions 12.40
and 19.12 of the comp.lang.c FAQ or read the descriptions of the
functions in the standard.


Ignoring this problem?
        goto ioerror;
    result[l] = 0;
    fclose(f);
    if (strchr(mode, 'b') == NULL) {
        char *src = result, *dst = result;
        while ((src - result) < l) {

Hmm. You have an initialisation, a test, and an increment, wouldn't a
for loop have been more natural?
            if (*src != '\r')
                *dst++ = *src;
            src++;

Since you have opened the file in binary mode and MacOS 9.x and
earlier use '\r' as the line terminator you have just converted the
file to one long line on some systems.

Yes, it will not work on the DS9000 or Mac OS 9.x

Or under VMS.
Easy to change though.

Each time you say that I'm sure that someone can come up with another
example where you will have to change it.
There is no portable way to know what line separator the system uses.

That is part of the reason for having text mode, so that you the
programmer do not have to worry about how lines are indicated.

Oh, and under Windows opening in binary mode means you will not
correctly handle ^Z terminated text files so by trying to do it yourself
instead of letting the library handle it you have got it wrong for
Windows as well.

Your function will also have problems if someone tries to use it on stdin.
 

Joe Wright

He is not talking about CR (which is ASCII) or 0x0D (which is just a
value), but about '\n'.

OK, vippstar, whoever you are. You weren't listening. Among the three
systems, Apple, PC, and Unix, there are three line endings. Apple used a
single CR, Unix a single LF, and the PC two bytes, CRLF.

Jacob said "There is no portable way to know what line separator the
system uses". Chuck said "Just open the file in text mode (i.e. without
the "rb") and wait until you detect a '\n' in the stream".

My response to Chuck was that Hell will freeze over before a single CR
in a text stream will be converted to LF for me to see (on my system here).

Do all of us the favor of assuming we know LF is '\n' and CR is '\r'.
Now that you understand my reply to Chuck, please feel free to comment
on it.

Before you attempt to correct me again, "CR is ascii but 0x0d is just a
value" won't fly. You'll have to do better.

Merry Christmas
 

CBFalconer

jacob said:
So what? You think the function should discover the line separator
each time it is called?

My reply simply denied your erroneous underlined statement above.
 

CBFalconer

Joe said:
Not here. Opening a text file with CR line terminations in "r" mode
is fine but the CR bytes (0x0d) are never seen. I get one long line.

Then (surprise) '\r' is not used as a line termination. Your file
contains one long line. Notice that I never mentioned '\r' above,
but I did mention '\n'. So does the C standard.
 

vippstar

OK, vippstar, whoever you are. You weren't listening. Among the three
systems, Apple, PC, and Unix, there are three line endings. Apple used a
single CR, Unix a single LF, and the PC two bytes, CRLF.
You are not listening.
What three systems? There are no systems in C.
Do all of us the favor of assuming we know LF is '\n' and CR is '\r'.
Now that you understand my reply to Chuck, please feel free to comment
on it.
I understood it before as well, and I don't feel like reposting my
comment.
Before you attempt to correct me again, "CR is ascii but 0x0d is just a
value" won't fly. You'll have to do better.
With all due respect, I have a feeling that nothing will do.
 

Keith Thompson

Joe Wright said:
OK, vippstar, whoever you are. You weren't listening. Among the three
systems, Apple, PC, and Unix, there are three line endings. Apple used a
single CR, Unix a single LF, and the PC two bytes, CRLF.

Jacob said "There is no portable way to know what line separator the
system uses". Chuck said "Just open the file in text mode
(i.e. without the "rb") and wait until you detect a '\n' in the
stream".

My response to Chuck was that Hell will freeze over before a single CR
in a text stream will be converted to LF for me to see (on my system
here).

Agreed, what Chuck suggested will not reliably detect what line
terminator a system uses. You might never see a '\n' when reading a
text file in binary mode (if the system doesn't use '\n' as its line
terminator, or as part of it).

If you write a small file in text mode, then read the same file in
binary mode, then you *might* be able to determine how the system
represents line endings. At least it should work for Unix (LF), MacOS
<= 9 (CR), and DOS/Windows (CRLF). But there are stranger systems
than any of those out there, for example, some that don't use a
character sequence to terminate a line.

But the point is (or should be) that most of the time *it doesn't
matter* how the system represents line endings. If you want to read a
text file, use text mode and let the implementation take care of it
for you. If you want to read a "foreign" text file, you need to know
how it's represented; you might be able to get away with guessing, but
it's better to know by other means. (How do you tell the difference
between a Windows text file and a Unix text file where each line
happens to end with a carriage-return?)
Do all of us the favor of assuming we know LF is '\n' and CR is
'\r'. Now that you understand my reply to Chuck, please feel free to
comment on it.

LF isn't *necessarily* '\n'. '\n', or new-line, is a character that
the C implementation uses internally to represent a line terminator.
Whatever representation the OS uses is translated to '\n' on text-mode
input. On old MacOS, for example, it would have been sensible for
'\n' to be the ASCII CR character; I don't know whether it was
actually done that way.

It may be the case that every non-EBCDIC implementation uses the ASCII
LF character for '\n', but the standard doesn't guarantee this.

[...]
 

jaysome

[snip]
LF isn't *necessarily* '\n'. '\n', or new-line, is a character that
the C implementation uses internally to represent a line terminator.
Whatever representation the OS uses is translated to '\n' on text-mode
input. On old MacOS, for example, it would have been sensible for
'\n' to be the ASCII CR character; I don't know whether it was
actually done that way.

It was done that way.

/* start program */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define THE_FILE "foo.txt"

int main(void)
{
    int ch;
    FILE *fp;
    int status;

    /* write a text file */
    fp = fopen(THE_FILE, "w");
    if ( !fp )
    {
        printf("Error creating %s\n", THE_FILE);
        return EXIT_FAILURE;
    }

    fprintf(fp, "Hello world!\n");
    status = fclose(fp);
    assert(status == 0);

    /* read it in binary */
    fp = fopen(THE_FILE, "rb");
    if ( !fp )
    {
        printf("Error opening %s\n", THE_FILE);
        return EXIT_FAILURE;
    }

    /* print the contents */
    do
    {
        ch = fgetc(fp);
        printf("%02X\n", (unsigned)ch);
    } while ( ch != EOF );

    printf("\n");
    status = fclose(fp);
    assert(status == 0);
    return EXIT_SUCCESS;
}
/* end program */

Output with Metrowerks CodeWarrior 5.1.1 Build 1108 on Mac OS 9.2.2:

48
65
6C
6C
6F
20
77
6F
72
6C
64
21
0D
FFFFFFFF

And for comparison...

Output with Microsoft Visual C++ 6.0 SP5 on Windows Vista Ultimate:

48
65
6C
6C
6F
20
77
6F
72
6C
64
21
0D
0A
FFFFFFFF

Output with gcc on Ubuntu Linux:

48
65
6C
6C
6F
20
77
6F
72
6C
64
21
0A
FFFFFFFF
 

Richard Heathfield

Joe Wright said:

Among the three
systems, Apple, PC and Unix there are three line endings. Apple used a
single CR, Unix a single LF and the PC two bytes, CRLF.

And other systems (such as OS390) use other techniques, which don't
necessarily incorporate in-stream markers.
 

Richard Heathfield

(e-mail address removed) said:

Why do you never free() what you allocate?

The leak was a good spot. But did you also spot that the code isn't even
guaranteed to compile, even after the typo in the #include is corrected?

The Standard makes no mention of ENOMEM, and no implementation is required
to provide it. Thus, on some systems, the above line will fail to compile.
 

Keith Thompson

jaysome said:
[snip]
LF isn't *necessarily* '\n'. '\n', or new-line, is a character that
the C implementation uses internally to represent a line terminator.
Whatever representation the OS uses is translated to '\n' on text-mode
input. On old MacOS, for example, it would have been sensible for
'\n' to be the ASCII CR character; I don't know whether it was
actually done that way.

It was done that way.

/* start program */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define THE_FILE "foo.txt"

int main(void)
{
    int ch;
    FILE *fp;
    int status;

    /* write a text file */
    fp = fopen(THE_FILE, "w");
    if ( !fp )
    {
        printf("Error creating %s\n", THE_FILE);
        return EXIT_FAILURE;
    }

    fprintf(fp, "Hello world!\n");
    status = fclose(fp);
    assert(status == 0);

    /* read it in binary */
    fp = fopen(THE_FILE, "rb");
    if ( !fp )
    {
        printf("Error opening %s\n", THE_FILE);
        return EXIT_FAILURE;
    }

    /* print the contents */
    do
    {
        ch = fgetc(fp);
        printf("%02X\n", (unsigned)ch);
    } while ( ch != EOF );

    printf("\n");
    status = fclose(fp);
    assert(status == 0);
    return EXIT_SUCCESS;
}
/* end program */

Output with Metrowerks CodeWarrior 5.1.1 Build 1108 on Mac OS 9.2.2:

48
65
6C
6C
6F
20
77
6F
72
6C
64
21
0D
FFFFFFFF
[...]

That demonstrates that writing a new-line to a text file stores a 0x0D
byte (ASCII CR) in that file, as seen when reading it as binary. It
doesn't address my speculation above, which is that a C implementation
on MacOS might use 0x0D as the *internal* representation for '\n'.

This program should answer that question by showing the values of
both '\n' and '\r':

#include <stdio.h>

int main(void)
{
    printf("'\\n' = 0x%02x", (unsigned)'\n');
    switch ('\n') {
        case 0x0a: printf(" = ASCII LF"); break;
        case 0x0d: printf(" = ASCII CR"); break;
    }
    printf("\n");

    printf("'\\r' = 0x%02x", (unsigned)'\r');
    switch ('\r') {
        case 0x0a: printf(" = ASCII LF"); break;
        case 0x0d: printf(" = ASCII CR"); break;
    }
    printf("\n");
    return 0;
}

If '\n' is ASCII LF, then the system has to do conversions on input
and output, similar to what's done on Windows except that it's a
one-to-one conversion. If '\n' is ASCII CR, then no such conversions
are needed -- but then the value of '\n' is inconsistent with most
other systems.
 

jaysome

[snip]
That demonstrates that writing a new-line to a text file stores a 0x0D
byte (ASCII CR) in that file, as seen when reading it as binary. It
doesn't address my speculation above, which is that a C implementation
on MacOS might use 0x0D as the *internal* representation for '\n'.

This program should answer that question by showing the values of
both '\n' and '\r':

#include <stdio.h>

int main(void)
{
    printf("'\\n' = 0x%02x", (unsigned)'\n');
    switch ('\n') {
        case 0x0a: printf(" = ASCII LF"); break;
        case 0x0d: printf(" = ASCII CR"); break;
    }
    printf("\n");

    printf("'\\r' = 0x%02x", (unsigned)'\r');
    switch ('\r') {
        case 0x0a: printf(" = ASCII LF"); break;
        case 0x0d: printf(" = ASCII CR"); break;
    }
    printf("\n");
    return 0;
}

Output with Metrowerks CodeWarrior 5.1.1 Build 1108 on Mac OS 9.2.2
and Microsoft Visual C++ 6.0 SP5 on Windows Vista Ultimate:

'\n' = 0x0a = ASCII LF
'\r' = 0x0d = ASCII CR
If '\n' is ASCII LF, then the system has to do conversions on input
and output, similar to what's done on Windows except that it's a
one-to-one conversion. If '\n' is ASCII CR, then no such conversions
are needed -- but then the value of '\n' is inconsistent with most
other systems.

The compilers I tested on Windows, Mac OS 9, and Linux all use the same
internal representation for '\n' and '\r'. Windows and Mac OS 9
perform conversions on input and output.

With the Microchip C30 compiler (dsPIC33 target), both of our programs
yield the same results as Linux. And with TI Code Composer Studio and
a C5501 target (CHAR_BIT = 16, sizeof(int) = 1, big-endian), your
program yields the same results as the rest, and my program yields the
same results as Windows (except for the EOF output (FFFF), of course).

Best regards
 

Joe Wright

You are not listening.
What three systems? There are no systems in C.
I understood it before as well, and I don't feel like reposting my
comment.
With all the respect, I have a feeling that nothing will do.

I was not using my best set of manners. I beg your pardon.
 

Bart C

Take, for instance: we have all at some point needed to read an entire
file into RAM to process it. Why isn't there
char *strfromfile(const char *fname, const char *mode)
....

Someone mentioned this is the sort of thing one knocks up in a few minutes.
In fact I had a look and I have one written in about a dozen lines (about a
third of this one anyway), although with less elaborate error handling and
none of that stuff about text mode.

(I *never* use text mode, in fact I hardly remembered it existed until
reading your post; I use binary. What's the point?)
Why are C interfaces so low level?

Maybe it doesn't matter because people build their own libraries around
them. In all my code I use malloc() for example exactly twice (and one of
those I could lose).

You're asking then why C doesn't have a standardised higher level library. I
don't know. But it's certainly easy enough to build an extra layer of
functionality.

Bart
 

Keith Thompson

Bart C said:
(I *never* use text mode, in fact I hardly remembered it existed until
reading your post; I use binary. What's the point?)
[...]

The point is to be able to read and write text files properly. Text
mode causes an end-of-line in an external file to be translated to a
single '\n' character on reading, and vice versa on writing. Without
this translation, you'd have to deal with, for example, an extra '\r'
character on each line, depending on what operating system you're
using.

If you use an OS such as Unix or Linux that uses a single '\n'
character to represent end-of-line in external files, then text and
binary mode will likely work the same way -- but then your software
might fail if you try to run it on a different system. Since text
mode costs you nothing (and saves having to type the extra 'b' in the
mode string), why not use it?
 

Bart C

Keith Thompson said:
Bart C said:
(I *never* use text mode, in fact I hardly remembered it existed until
reading your post; I use binary. What's the point?) [...]

If you use an OS such as Unix or Linux that uses a single '\n'
character to represent end-of-line in external files, then text and
binary mode will likely work the same way -- but then your software
might fail if you try to run it on a different system. Since text
mode costs you nothing (and saves having to type the extra 'b' in the
mode string), why not use it?

I'm used to text files using one of CR, LF or CRLF to terminate lines. I
have routines that read a line at a time and take care of this. And binary
mode means a 'getfilesize()' function is more accurate.
The point is to be able to read and write text files properly. Text
mode causes an end-of-line in an external file to be translated to a

Writing the EOL chars properly for that OS so that other software can deal
with it might be a valid point. Although I would prefer my own control as to
what exactly is output.

Bart
 

army1987

Flash said:
jacob navia wrote, On 24/12/07 23:37:
Wouldn't it be more sensible to use the mode the user passes in?
For files opened as text, the value returned by ftell can be totally
unrelated to the amount of memory needed.
 
