Has thought been given to a cleaned up C? Possibly called C+.

  • Thread starter Casey Hawthorne

Keith Thompson

Bartc said:
Even considering only systems using CR, CR/LF, or LF, C's text mode can go
wrong: reading in a file created on a computer with a different newline
sequence, or writing a text file on this computer and reading it on one
using a different sequence.

So translate the file before trying to process it as text.

And then there are hybrid files which are mainly binary data, but also
contain embedded text that can include newline characters. Which means that
binary data that looks like CR/LF gets converted to LF (and the entire file
shrinks in size by one byte), or vice versa.

Such files are binary, and should be read in binary mode, which
always requires knowing the exact format of the file. If the format
specifies how the embedded text is represented, use that format.
If it doesn't, you're in trouble anyway.

I suspect people who advocate text mode tend to use machines with a single
character newline, and simply don't see the problems it creates when newline
is multiple characters.

I'm quite aware of the problems; I deal with Unix-format
vs. Windows-format text files all the time (including ASCII, Latin-1,
Windows-1252, UTF-8, and both flavors of UTF-16). Some tools I use
can deal with the differences. For others, I use conversion tools.
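
A minimal sketch of the kind of conversion step described above, assuming the file has already been read into memory through a binary stream. `normalize_newlines` is a hypothetical helper, and treating a lone CR as a line ending is an assumption that would be wrong for data in which CR carries some other meaning.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrite buf in place so that CR LF pairs and lone CRs both become a
   single LF. Returns the new length, which is never larger than len.
   Hypothetical helper: the handling of lone CRs is an assumption. */
size_t normalize_newlines(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    assert(buf != NULL || len == 0);
    while (in < len) {
        char c = buf[in++];
        if (c == '\r') {
            /* Swallow the LF of a CR LF pair so it is not counted twice. */
            if (in < len && buf[in] == '\n')
                in++;
            c = '\n';
        }
        buf[out++] = c;
    }
    return out;
}
```

Because the output never grows, the conversion can be done in place, and the caller only needs the returned length to know where the converted text ends.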
 

Keith Thompson

Bartc said:
Many here seem to be confusing forward declarations with external
declarations. I think Richard D was just questioning the necessity of a
function declaration, when the function *definition* exists elsewhere in the
file.

That's why I asked Richard D to explain just what *he's* proposing.

[...]
 

Walter Banks

Ian said:
In other words a step backwards; requiring a two pass compilation.


That might take a while!

One of our customers had a huge amount of legacy code
that was originally compiled to asm and then linked. It had both
forward references and circular links, in many cases in the same
source file.

Some of this code had forward references to variable declarations.

Not that it really mattered: their IDE tool set would only
create a linker script with the application and library files
in alphabetical order.

The kicker was they seriously needed dead function removal. As
much as 60% of the code in many applications was written in
assembler, much of it dead code. Extracting function information
out of asm code is at best a heuristic guess, and it required
a whole new level of structure identification.

Our linkers have a strategy pass, and we used that mechanism to
gather the information we needed to create function prototypes,
against which we could validate calls and aggressively map dead code.

It can be done, but it isn't in the spirit of C.

Regards

Walter
 

jacob navia

Ian Collins wrote:
That might take a while!

Please press Ctrl-C :)
There is an infinite loop in there.

What about typos?

Suppose:

int LongNameFunctionThatMakesCodeEasyToRead(int param)
{ ..... }

int someFunction(void)
{
int a = LongNameFunctionThatMakesCodeEasytoRead(56);
}

This mistake is not detected until link time. If we are building a
library it will be detected by the user of the library when he/she
calls "someFunction".
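
A sketch of the same example with a declaration in scope, for comparison. The function bodies are placeholders that are not in the original post. Under C90 rules a call to an undeclared name is implicitly declared, which is why the typo only surfaces at link time; C99 removed implicit declarations, so a conforming C99 or later compiler has to issue a diagnostic at compile time, and GCC can turn it into a hard error with `-Werror=implicit-function-declaration`.

```c
/* The definition doubles as the declaration and is visible before the call. */
int LongNameFunctionThatMakesCodeEasyToRead(int param)
{
    return param + 1;   /* placeholder body, not from the original post */
}

int someFunction(void)
{
    /* Misspelling the name as ...EasytoRead here would refer to an
       undeclared identifier. C99 and later require a diagnostic for
       that; C90 would implicitly declare it and fail at link time. */
    int a = LongNameFunctionThatMakesCodeEasyToRead(56);
    return a;
}
```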
 

Nick

Mutual recursion is simple unless you are fixated on one pass
compilers. Forward function declarations (signatures) are needed
for separate compilation.

So I think we are all agreed that removing the need for forward
declarations for static functions is very easy (just a second pass of
the source file). For separate compilation they are needed in the
header file, or some sort of compiler magic to generate a header file is
needed.

I'd quite like to see the first one, I don't think the second is worth
the effort.
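
For reference, a small sketch of the static case being discussed: two mutually recursive functions in one file, where current C needs one forward declaration. The function names are illustrative only.

```c
#include <stdbool.h>

/* Forward declaration: needed today because is_even() calls is_odd()
   before the compiler has seen its definition. */
static bool is_odd(unsigned n);

static bool is_even(unsigned n)
{
    return n == 0 ? true : is_odd(n - 1);
}

static bool is_odd(unsigned n)
{
    return n == 0 ? false : is_even(n - 1);
}
```

A compiler that made a second pass over the source file could resolve `is_odd` without the first line, since both definitions live in the same translation unit.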
 

bartc

I have used a real live system
(I believe the O/S in question is still in use and still supported
today) on which puts("Hello") produced

'\005' '\000' 'H' 'e' 'l' 'l' 'o' '\000'

If you moved those eight bytes verbatim to a Unix or Windows
system and tried to read them with a text stream, you'd get junk
at best.

Which is my point.

If you read them with a binary stream, you'd get the
eight bytes -- but then it would be *your* problem to know the
text-file conventions of the foreign system.

At least now the opportunity is there to fix the file.

(See the newline?
No? Too bad: It's there, sort of. See the two NUL's? Yes? Too
bad: They're not there, sort of.)

I'd need a bigger sample to work out what's happening.

This whole thing has been explained to you several times, and
you're still grasping the wrong end of the stick. C's distinction
between text and binary streams doesn't *create* incompatibilities,
it gives you a fighting chance to *solve* incompatibilities that
arise outside C's sphere of influence.

OK, but I think I'll carry on using binary mode, even when using printf()
(was it you who gave me that fix to set stdout to binary? It's still working
fine, thanks.)

(I don't need text mode: I tend to do simple text file i/o a line at a time,
so newline is something to be added at the end of output, and something to
be discarded on input. So I don't even need to know what a newline is except
inside these routines.

And when reading text files entirely into memory, I read as binary, and my
code can cope with any of the common cr, lf or cr,lf terminations (when it
even matters). I'm not even sure I can do this in text mode, because the
file size reported is wrong, for a start.

BTW how do you do random access in text mode?)
 

Ben Bacarisse

bartc said:
(I don't need text mode: I tend to do simple text file i/o a line at a time,
so newline is something to be added at the end of output, and something to
be discarded on input. So I don't even need to know what a newline is except
inside these routines.

No one is suggesting you must use text streams, but you were
suggesting that they were broken in some way. I think everyone agrees
that the world would be a better place if the distinction between text
and binary mode were not needed, but that is not the world we live in.

And when reading text files entirely into memory, I read as binary, and my
code can cope with any of the common cr, lf or cr,lf terminations (when it
even matters). I'm not even sure I can do this in text mode, because the
file size reported is wrong, for a start.

How can you tell the difference between a \r that is (or is part of) a
line ending and one that is there in its own right? You have, in
effect, invented your own text mode (built on top of binary streams)
with its own rules about what gets added and what gets stripped out.

BTW how do you do random access in text mode?)

It depends on what you mean. The answer might be fsetpos and
fgetpos.
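
A minimal sketch of what that might look like: remember a position in a text stream with fgetpos() and return to it with fsetpos(). `text_seek_demo` and the scratch file name passed to it are hypothetical. On a text stream, fseek() is only portable with an offset of zero or a value previously returned by ftell(), which is why the opaque `fpos_t` route is shown here.

```c
#include <stdio.h>
#include <string.h>

/* Writes three lines to a scratch text file, records where the second
   line starts, reads on to the end, then jumps back and re-reads it.
   Returns 0 if the re-read line matches, 1 if not, -1 if the file
   could not be opened. */
int text_seek_demo(const char *path)
{
    char first[32], second[32], again[32];
    fpos_t mark;
    int same = 0;
    FILE *fp = fopen(path, "w+");       /* "w+" without 'b': a text stream */

    if (fp == NULL)
        return -1;
    fputs("alpha\nbeta\ngamma\n", fp);
    rewind(fp);                         /* reposition before switching to input */

    if (fgets(first, sizeof first, fp) != NULL &&
        fgetpos(fp, &mark) == 0 &&                  /* start of line 2 */
        fgets(second, sizeof second, fp) != NULL &&
        fgets(again, sizeof again, fp) != NULL &&   /* consume line 3 */
        fsetpos(fp, &mark) == 0 &&                  /* jump back */
        fgets(again, sizeof again, fp) != NULL)
        same = strcmp(second, again) == 0;

    fclose(fp);
    remove(path);
    return same ? 0 : 1;
}
```

This gives "go back to somewhere I have already been", not "go to byte N", which is about as much random access as a text stream promises.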
 

Jasen Betts

How is fgets misused?

If you don't check for '\n' at the end of the input, and handle long
lines appropriately...



James Kuyper

Andrew said:
How is fgets misused?

I think his point is that ANY function can be misused, by giving it the
wrong arguments, or by writing code around the function call that is
based upon a mistaken concept of how it works, or by using it when you
actually should be using some other function, and that strncpy() has
nothing wrong with it that doesn't fall into one of those categories.

Sure, you can pass a pointer to strncpy() that points at a buffer that
is actually smaller than the length that you pass to strncpy(), but you
can do the same thing with fgets(). You can also pass strncpy() a
pointer to an input source that is insufficiently long, and not
null-terminated, but it's equally true that you can pass an invalid
stream pointer to fgets().

With strncpy(), whether or not it creates a null-terminated string (NTS)
depends upon whether or not it fills the buffer before reading a '\0',
and if you need an NTS you either need to use a different function, or
add code to make sure that it is null terminated. On those rare
occasions when I use strncpy(), despite needing a null-terminated
string, I'll pass it a length 1 byte shorter than the actual length of
the buffer, and make sure that the last character in the buffer is '\0'.

Now, fgets() always null terminates the buffer it fills, but it has a
similar problem: whether or not there's a newline at the end of the
string depends upon whether or not it fills the buffer before reading in
a '\n'; how you should deal with that depends upon whether or not you
want it to have a '\n' at the end. You also have to keep in mind that if
the '\n' is missing, the next "line" that you read in from that stream
will actually be the rest of the same line.
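
The two idioms described above can be sketched as follows. `copy_terminated` and `read_line` are hypothetical helper names, and the return-value convention of `read_line` is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy src into dst (capacity size) and always terminate, truncating
   if needed: pass strncpy() one byte less than the buffer, then set
   the last byte to '\0'. */
void copy_terminated(char *dst, size_t size, const char *src)
{
    assert(dst != NULL && src != NULL && size > 0);
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Read one line with fgets() and strip the trailing '\n' if present.
   Returns 1 if a newline was read and removed, 0 if no newline was
   stored (the buffer filled first, or the final line lacks one), and
   -1 on end of file or error. */
int read_line(FILE *fp, char *buf, int size)
{
    size_t len;

    assert(fp != NULL && buf != NULL && size > 1);
    if (fgets(buf, size, fp) == NULL)
        return -1;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    return 0;   /* the rest of this line, if any, is still unread */
}
```

A caller that gets 0 back can keep calling `read_line` to collect or discard the remainder of the long line before treating the next result as a new line.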
 

James Kuyper

Seebs wrote:
....
In general, strncpy() should never be used unless you're working with
early UNIX inodes.

Or have some other reason why you need to null-fill an array.

There's a situation that comes up frequently in the programs I'm
responsible for, for which strncpy() seems like exactly the right fit.
Let me describe the relevant constraints:

The program is writing into a fixed-length field in a file; it's not
within my authority to change that fact, nor would I be inclined to do
so if I did have that authority. The source it is writing from is a
string (usually a unix path) that will normally be shorter than that
length, but has, for practical purposes, no upper limit on its length -
no acceptably small fixed size could be specified that would be
guaranteed to be long enough. If the source string is too long to fit,
it's not an error serious enough to justify refusing to create the file.
However, getting as much as possible of the source string into the
output file is more important than ensuring that it's null terminated.

Finally, though there's no specific requirement concerning this, I've
found it's easier to perform a binary comparison of two files created by
different runs of the program if the parts of the fixed-length field
after the null termination have fixed contents that will compare equal
in the two files. Those fixed contents don't have to be '\0', but since
strncpy() provides null-filling as part of the package, I don't see any
point in filling them with anything other than '\0'.

What function would you recommend I use to prepare the output buffer for
fwrite(), other than strncpy()?
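
A sketch of the use case as described, under stated assumptions: the field width `PATH_FIELD_LEN` and both function names are hypothetical, and the real file format is not shown in the post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { PATH_FIELD_LEN = 16 };   /* hypothetical field width */

/* Fill a fixed-width field from src. Short strings are padded with
   '\0' to the full width; over-long strings are truncated and the
   field is then NOT null-terminated, which this format tolerates. */
void fill_path_field(char field[PATH_FIELD_LEN], const char *src)
{
    assert(field != NULL && src != NULL);
    strncpy(field, src, PATH_FIELD_LEN);
}

/* Write the field as raw bytes. Returns 0 on success, -1 on a short write. */
int write_path_field(FILE *out, const char *src)
{
    char field[PATH_FIELD_LEN];

    assert(out != NULL);
    fill_path_field(field, src);
    return fwrite(field, 1, sizeof field, out) == sizeof field ? 0 : -1;
}
```

Because every byte of the field is written on each call, two runs given the same path produce identical field contents, which is the property wanted for binary comparison.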
 

James Kuyper

Nick said:
well, yes but this is a cleanup and simplification

And the fact that it will break existing code is why simplification of
an existing language by removal of features is generally not a practical
option.

and what do you need that for? Backward compatibility with K&R C? C++
seems to manage to drop void in these circumstances without a problem.

C++ was a new language, and backwards compatibility is less of an issue
for a brand new language than it is for a new version of a
well-established language. Despite that fact, Stroustrup compromised his
design somewhat, in many different ways, for the sake of minimizing
backwards incompatibility with C. This just wasn't one of the cases
where he was willing to compromise.

without a return type it's a call.

So, with your proposed change to the language, it would not be possible
to declare a void function at block scope?
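
To make the question concrete, here is a small sketch in current C of a void function declared at block scope. `note_call` and `run_twice` are hypothetical names. Inside the block, the `void` return type is the only thing that distinguishes the declaration from the call statements below it.

```c
static int calls = 0;

void note_call(void)
{
    calls++;
}

int run_twice(void)
{
    void note_call(void);   /* block-scope declaration of a void function */

    note_call();            /* expression statement: a call */
    note_call();
    return calls;
}
```

If `void` were dropped and a missing return type meant "call", the first line of the block could no longer be written as a declaration.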
 

Dag-Erling Smørgrav

Richard Delorme said:
- void can be removed from the language. So instead of declaring
void f(void);
we can simply write :
f();
The generic pointer type (void *), could then be replaced by (char*)
without much harm.

Vade retro, satana!

- auto can be removed from the language.

I'd love to see it reused to mean "the type of the expression which you
assign to the variable", except "auto foo = bar" would be legal in both
old-C and new-C but have different semantics.

- register can be removed from the language.

ACK

- restrict can be removed. It is mostly here to facilitate some
optimizations by the compiler by preventing aliases. I think this is
not the duty of the programmer to facilitate optimization, but rather
the burden of the compiler.

It allows the programmer to provide information to the compiler which
the compiler can not obtain in any other way.

Obviously the standard library could be improved, at least by removing
dangerous functions like gets() or stupid functions like strncpy and
making all functions thread safe. It might also be made simpler by
removing useless types like size_t.

Why is size_t useless?

DES
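
A minimal sketch of the restrict point, with `add_arrays` as a hypothetical function. When this function is compiled on its own, the compiler generally cannot see its callers, so it has no way to establish that `out` never overlaps the inputs. The qualifier is the programmer's promise that it does not. The sketch also uses `size_t`, the unsigned type that `sizeof` yields, for the element count.

```c
#include <assert.h>
#include <stddef.h>

/* out[i] = a[i] + b[i]. The restrict qualifiers promise that the
   object written through out is not also accessed through a or b,
   so loads from a and b need not be repeated after each store. */
void add_arrays(size_t n, float *restrict out,
                const float *restrict a, const float *restrict b)
{
    size_t i;

    assert(out != NULL && a != NULL && b != NULL);
    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Calling it with overlapping `out` and `a` would break the promise and make the behaviour undefined, which is the cost of handing that information to the compiler.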
 

Seebs

What function would you recommend I use to prepare the output buffer for
fwrite(), other than strncpy()?

Okay, I am convinced. There does still exist a use for strncpy. And a
pretty well considered one.

What it isn't, though, is a "safe replacement for strcpy". :) By contrast,
strlcpy sort of is. (Not totally, but enough better that I'd use it by
preference if it were in the spec.)

-s
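
For readers who have not met it, a sketch of the BSD strlcpy() behaviour under a hypothetical name, since the function is not part of standard C. It is an illustration of the semantics, not the BSD source.

```c
#include <assert.h>
#include <string.h>

/* Sketch of BSD strlcpy() semantics under a hypothetical name.
   Copies at most size-1 bytes, always terminates when size > 0, and
   returns strlen(src) so the caller can detect truncation. */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen;

    assert(src != NULL && (dst != NULL || size == 0));
    srclen = strlen(src);
    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

A return value greater than or equal to `size` means the copy was truncated. Unlike strncpy(), the destination is never left unterminated and is never null-filled.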
 

Richard Delorme

On 16/03/2010 20:10, Keith Thompson wrote:
In your hypothetical C without "function forward declarations", how
would this work? How would the compiler figure out that square()
takes a float argument and returns a float result?

I'm not saying it can't be done. I'm asking how you suggest doing it.

The problem is for the compiler to know the type of a function without
asking the programmer to explicitly write a function declaration. If the
function is defined in the same compilation unit I guess there is not
much of a problem. When using several compilation units, we need to tell the
compiler how to find the information by itself. We can imagine
several ways to achieve this:
- In the source file, use a new instruction that indicates where to
find the function type. For example:

#interface "square.c"

int main()
{
/*... code using the function square... */
}

So #interface will open the square.c file and decipher the function type
from its definition.

- In the command line invoking the compiler, we can tell the compiler
to look for function types in a file through a command line option.
Similar to the -llibrary_name used during linkage, we can use a
-ifile_name telling the compiler where to find the information.

- We can let the compiler do all the work by itself. It means
delaying part of the compilation until linkage, once all
information has been gathered.

- We can also introduce the notion of a project, which is a set of
source files & libraries necessary to build an application (or a
library). One of the first steps for the compiler will be to read all the
files of the project to build an interface file (a kind of precompiled
header) usable by all the files of the project.

So there are many ways to accomplish such a task. The only important
thing, IMHO, is to ease the task of the programmer (the language
user), whatever the consequences for the complexity of the compiler.
 

Keith Thompson

Richard Delorme said:
On 16/03/2010 20:10, Keith Thompson wrote:

The problem is for the compiler to know the type of a function without
asking the programmer to explicitly write a function declaration. If
the function is defined in the same compilation unit I guess there is
not much of a problem. When using several compilation units, we need to tell
the compiler how to find the information by itself. We can imagine
several ways to achieve this:
- In the source file, use a new instruction that indicates where to
find the function type. For example:

#interface "square.c"

int main()
{
/*... code using the function square... */
}

So #interface will open the square.c file and decipher the function
type from its definition.

Currently, I can have a file "square.c" that defines a number
of functions, and another file "square.h" that provides visible
declarations for *some* of them. Given your #interface proposal,
how do I specify that some functions in "square.c" are intended to
be used by client code, and some are internal?

Is #interface supposed to replace #include? If so, what about
declarations for things other than functions (constants, typedefs,
etc.)?

- In the command line invoking the compiler, we can tell the compiler
to look for function types in a file through a command line
option. Similar to the -llibrary_name used during linkage, we can use
a -ifile_name telling the compiler where to find the information.

Ok, though this means that some of the information about the program
is in the compiler command line rather than in the source, with a
syntax that can vary wildly from one compiler to another. Admittedly
this is already somewhat true for existing options that specify
libraries.

- We can let the compiler do all the work by itself. It means
delaying part of the compilation until linkage, once all
information has been gathered.

If I call square(), how does the compiler know where to find it?

- We can also introduce the notion of a project, which is a set of
source files & libraries necessary to build an application (or a
library). One of the first steps for the compiler will be to read all
the files of the project to build an interface file (a kind of
precompiled header) usable by all the files of the project.

Do you propose imposing this on all implementations?

So there are many ways to accomplish such a task. The only important
thing, IMHO, is to ease the task of the programmer (the
language user), whatever the consequences for the complexity of the
compiler.

All this to avoid having to write a function declaration? I thought
you were trying to simplify things.
 

Willem

Keith Thompson wrote:
) Currently, I can have a file "square.c" that defines a number
) of functions, and another file "square.h" that provides visible
) declarations for *some* of them. Given your #interface proposal,
) how do I specify that some functions in "square.c" are intended to
) be used by client code, and some are internal?

Err... You could use the 'static' keyword ?

) Is #interface supposed to replace #include? If so, what about
) declarations for things other than functions (constants, typedefs,
) etc.)?

It could pick up all constants, typedefs, variables and macros that
are not declared 'static' ?
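
A sketch of how that would look in a hypothetical square.c. The helper `multiply` is invented for the example; `square` taking and returning float follows the earlier description in the thread.

```c
/* Sketch of a hypothetical square.c. */

/* Internal helper: 'static' gives it internal linkage, so it is
   invisible to other translation units. */
static float multiply(float a, float b)
{
    return a * b;
}

/* External linkage by default: this is the function client code would
   call, and what a header or an #interface-like mechanism would expose. */
float square(float x)
{
    return multiply(x, x);
}
```

Under this reading, everything without `static` forms the module's interface, which matches how linkage already works in C today.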


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 

bartc

Keith said:
Currently, I can have a file "square.c" that defines a number
of functions, and another file "square.h" that provides visible
declarations for *some* of them. Given your #interface proposal,
how do I specify that some functions in "square.c" are intended to
be used by client code, and some are internal?

Global and Private attributes in square.c?

Anyway aren't all functions in square.c currently already Global?

Is #interface supposed to replace #include?

I would keep both. #include just imports some text. #interface might be
higher level than that (and may not need the #), or it might just #include
some automatically created header.

If so, what about
declarations for things other than functions (constants, typedefs,
etc.)?

Why not? For typedefs, structs, enums and variables, the definition is just
duplicated (although variable initialisers don't need to be). These entities
are generally defined in one place (a shared header), but if the mechanism
is there to do this automatically, why not use it?

#defines might be a problem however: how do you apply a global attribute to
something which is just a preprocessor artifact? (This highlights why C
could do with a 'cleanup'.)

Ok, though this means that some of the information about the program
is in the compiler command line rather than in the source, with a
syntax that can vary wildly from one compiler to another. Admittedly
this is already somewhat true for existing options that specify
libraries.

Apparently some applications are already quite fragile if the exact
compiler/linker options are not used.

All this to avoid having to write a function declaration? I thought
you were trying to simplify things.

No, to avoid writing, and maintaining identical twin versions of, a hundred
declarations, at a cost of specifying one import or interface module.

If you wanted to extend square.c, just define a new function and declare as
global or exported. Then it will be instantly available to all modules that
make use of square.c.

However, if you change a function signature in square.c, it may not be as
obvious that you need to recompile everything as it would be with square.h.
 

Richard Delorme

On 17/03/2010 18:44, Keith Thompson wrote:
Richard Delorme writes:
All this to avoid having to write a function declaration? I thought
you were trying to simplify things.

From the programmer's point of view this is a simplification. What I
would appreciate is transferring some complexity from the programmer to
the compiler. This is exactly the opposite of what restrict is doing in
current implementations.
 

Richard Delorme

On 17/03/2010 19:15, Willem wrote:
Keith Thompson wrote:
) Currently, I can have a file "square.c" that defines a number
) of functions, and another file "square.h" that provides visible
) declarations for *some* of them. Given your #interface proposal,
) how do I specify that some functions in "square.c" are intended to
) be used by client code, and some are internal?

Err... You could use the 'static' keyword ?

) Is #interface supposed to replace #include? If so, what about
) declarations for things other than functions (constants, typedefs,
) etc.)?

It could pick up all constants, typedefs, variables and macros that
are not declared 'static' ?

Yes, this is exactly my opinion.
 

Richard Delorme

On 16/03/2010 08:15, jacob navia wrote:
James Kuyper wrote:
Contrary to what Mr Delorme writes, I have (in this forum and in
comp.std.c) argued extensively AGAINST some "features" of the C library,
specifically gets() and asctime().

I am sorry if I gave that impression. I just saw a big debate in this
thread about adding operator overloading and generic containers to the
standard, and I thought they were your ideas. I am sorry if I attributed
such proposals to you against your wishes.
 
