Listing the most dangerous parts of C

Flash Gordon

Robert said:
On Wed, 10 May 2006 16:25:42 +0000,


I know that C99 had to be designed in such a way that all C89 code would
still compile without error,

No it wasn't. They got rid of implicit int which breaks any code that
uses it.
but gets() really should have been banned.
That would have nudged code maintainers towards removing gets() from
existing code, a trivial task.

Here I agree with you. They should have removed gets from the standard.
Or even made a diagnostic required for calling an external function
named gets ;-)
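
To illustrate why replacing gets() in existing code is usually a small change, here is a minimal sketch of a bounded stand-in built on fgets(). The name read_line is hypothetical (it is not a standard function), and the sketch assumes that dropping the trailing newline, as gets() did, is the wanted behaviour.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* read_line: hypothetical bounded stand-in for gets().
 * Reads at most size-1 characters from fp into buf and removes the
 * trailing newline if one was read.
 * Returns buf, or NULL on end of file or read error. */
char *read_line(char *buf, size_t size, FILE *fp)
{
    int n;

    assert(buf != NULL && size > 0 && fp != NULL);
    n = size > (size_t)INT_MAX ? INT_MAX : (int)size;  /* fgets takes an int */
    if (fgets(buf, n, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';  /* strip newline; no-op if absent */
    return buf;
}
```

A line longer than the buffer comes back in pieces instead of overflowing it; the caller sees the remainder on the next call.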
OTOH all the "should haves" with respect to compatibility with old code
are moot anyway. Since every compiler in existence (and even new ones)
will have a "C89" compatibility mode, people maintaining pre-C99 code
will continue to use it (and I will continue to use it whenever I can
because I think much of C99 stinks).

Yes, compilers will continue to have a C89 mode for a long time after
they implement C99 (if they ever do since some are not even trying). In
my case I need to maintain C89 compatibility because there is a high
likelihood of having to port everything (including the non-portable
code) to an implementation that has a stated intention of *never*
implementing C99. It's a shame because there are a few things from C99
that I might otherwise like to use.
 
Jordan Abel

On 11 May 2006 03:58:29 GMT,


Can you think of a meaningful piece of code that would break if
strncpy() were made safe, and where strncpy could not trivially be
replaced by memcpy?

Doesn't matter if it can be replaced by memcpy - it's still a silent
change.
 
Christopher Benson-Manica

Richard Heathfield said:
(For those who don't know - in Java, *everything* is a pointer.)

I was obviously included in the "those who don't know" category :)
 
Richard Bos

qed said:
You are recommending strncpy and strncat. These are slow functions that
occasionally leave off the terminating '\0'.

Once again you demonstrate your exquisite detailed knowledge of C string
handling.

strncpy() behaves as you say; strncat() does not, and is a very useful
function.

Richard
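
The difference between the two functions can be shown with a short sketch. The helper names below are made up for illustration; only strncpy(), strncat() and memchr() are standard.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the first n bytes of buf contain a '\0', else 0. */
int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Copy src into dst (capacity cap) with strncpy().
 * If strlen(src) >= cap, dst is left WITHOUT a terminating '\0'. */
void copy_with_strncpy(char *dst, size_t cap, const char *src)
{
    strncpy(dst, src, cap);
}

/* Same job with strncat(): start from an empty string and append at
 * most cap-1 characters. strncat() always writes a terminating '\0'. */
void copy_with_strncat(char *dst, size_t cap, const char *src)
{
    assert(cap > 0);
    dst[0] = '\0';
    strncat(dst, src, cap - 1);
}
```

With a 4-byte destination and the source "abcdef", the strncpy() version leaves four characters and no terminator, while the strncat() version leaves the string "abc".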
 
Robert Latest

On Thu, 11 May 2006 00:04:15 GMT,
in Msg. said:
You are recommending strncpy and strncat. These are slow functions that
occasionally leave off the terminating '\0'.

Apart from the fact that strncat never leaves off the trailing zero,
what do you know about the execution speed of these functions?
fgets is not an ideal substitute for gets as explained here:
http://www.pobox.com/~qed/userInput.html

That "explanation" lumps together several pathological cases of
fgets()'s shortcomings that never occur simultaneously.
The complex number type from C99

is bullshit, like many C99 "features".

robert
 
Robert Latest

On Thu, 11 May 2006 10:09:38 +0100,
in Msg. said:
It's a shame because there are a few things from C99
that I might otherwise like to use.

Like what? I'm genuinely interested. So far I haven't met anybody who
found much of C99 useful.

robert
 
Flash Gordon

Robert said:
On Thu, 11 May 2006 10:09:38 +0100,


Like what? I'm genuinely interested. So far I haven't met anybody who
found much of C99 useful.

Off the top of my head:

compound literals -
passing a constant to a function expecting a struct

stdint.h -
I'm doing database stuff with an old fashioned database where the
data has to be compatible across multiple machines. So defined
width integer types would be useful, as would the fast types.
Still have to handle endianness, of course.

snprintf -
The MS implementation of _snprintf is *not* the same

Increase in the minimum number of significant characters in
identifiers (I rely on more than the C89 minimum anyway)

long long (a 64 bit or wider integer type would be of use)
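
A rough sketch of three of the items listed above: a compound literal passed where a struct is expected, snprintf() with truncation detection, and a fixed-width integer stored in a defined byte order. All function and type names here are invented for the example, and big-endian is only an assumed on-disk order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct point { int x, y; };

/* Takes a struct by value. */
int manhattan(struct point p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}

/* C99 compound literal: no named temporary is needed for the argument. */
int origin_distance_example(void)
{
    return manhattan((struct point){ .x = 3, .y = -4 });
}

/* C99 snprintf() returns the length the full output would have had,
 * so truncation is detectable. Returns 1 if the text fit, else 0. */
int format_id(char *buf, size_t cap, long long id)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, cap, "id=%lld", id);
    return n >= 0 && (size_t)n < cap;
}

/* Store a 32-bit value in big-endian order, whatever the host is. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Load a 32-bit value that was stored in big-endian order. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

The byte-order helpers are the part that <stdint.h> does not do for you: the type fixes the width, but the layout on disk still has to be chosen explicitly.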
 
CBFalconer

qed said:
This makes no sense. If you need ungetc semantics, you can create
your own file stream wrapper. Chances are you will allow more
than a single character of unreading, and you will not support
things like fgetpos() (or fseek, though I don't recommend the use
of that function in general) in your wrapper.

If you don't recognize the need for one char lookahead, your
education is sadly lacking. Other languages use other means.

....snip ...
[...] The only case known to me where the one
char limitation creates a problem is in parsing floats with an
invalid exponential part.

I don't even know what you are talking about. Parsing of files
should not rely on the ability to scan backwards and forwards
through a file, but if you really wanted to do that, why wouldn't
you use fgetpos/fsetpos instead?

Reinforcing my comment on education. get/set doesn't work too well
on interactive input files, for example. Go and read up on
lexers. Maybe someone else will explain it to you.
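
As a sketch of the one-character lookahead being argued about: a lexer reading a number only learns that the number is over by reading one character too many, and ungetc() hands that character back. The function name read_number is hypothetical, and overflow is deliberately ignored to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Reads an unsigned decimal number from fp.
 * The character that ends the number must be read to know the number
 * is over; ungetc() puts it back so the next token starts with it.
 * Returns 1 and stores the value on success, 0 if no digit was found.
 * Assumption: the value fits in unsigned long (no overflow check). */
int read_number(FILE *fp, unsigned long *out)
{
    int c, seen = 0;
    unsigned long value = 0;

    assert(fp != NULL && out != NULL);
    while ((c = getc(fp)) != EOF && isdigit(c)) {
        value = value * 10 + (unsigned long)(c - '0');
        seen = 1;
    }
    if (c != EOF)
        ungetc(c, fp);  /* one character of pushback is guaranteed */
    if (seen)
        *out = value;
    return seen;
}
```

On the input "123+45" the first call yields 123 and leaves '+' as the next character to be read. This works on interactive streams too, since it never seeks.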

--
Some informative links:
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html
 
Rod Pemberton

jacob navia said:
Rod Pemberton a écrit :

????

Think about 1) and 3). If pointers in main are eliminated, then all memory
allocations would need to be static or dynamic.
lcc-win32: done.
References are pointers associated with an object permanently.


lcc-win32: done.
The gc is standard in the normal distribution.


?????
Why?

Much easier to program. Sorry, this comes from my PL/1 experience. Many C
compilers convert the code to pass by reference for assembly anyway...
Stack allocation is ok if used correctly. Making all objects heap based
would slow down everything without a lot of gain in security.

That depends on how it is implemented. Dual stacks, ala FORTH, have minimal
overhead on certain systems, including IA-32.
????

With the above improvements, C can be much easier and safer to program.

Go for it!


Rod Pemberton
 
Rod Pemberton

qed said:
Huh? I probably need some elaboration here. Do you literally mean that
you shouldn't have local variables in main that are passed by reference
to other functions? Or ... what *do* you mean?

With both static and dynamic memory allocation, pointers, for the most part,
wouldn't be needed in main. Think about 1) with 3).
Huh? You mean that building linked lists and similar data structures
should take an additional operation (first assign the new node malloc to
a variable) or do you mean that making such things should be impossible?

One problem with compilers is optimization. Just how does the compiler
optimize and eliminate code if a pointer is accessing a portion of it? The
compiler needs to know not only the type of data the pointer points to, but
which variable(s) the pointer may point to.
Ok, but then you are no longer programming in C. C is a language that
allows you to understand the *performance* of your application very
clearly. Java is a language that allows you to understand what it *does*
very clearly. The two languages are differentiated by this difference
in philosophy. Making C more like Java is just as easily achieved by
throwing it out and starting with Java.

However, I understand the motivation. Why not instead ask for *more*
from the C universe? There are many ways of *extending* the whole
malloc/etc paradigm to make it safer, *faster* and more powerful.

....


Ok, no. Adding refs (the & thingy from C++) I agree with, because
passing pointers tends to be more dangerous in general (more likely to pass
in NULL, or garbage/uninitialized pointers) for some cases. However to
be general, you must support *both* semantics, and C does this by
allowing you to pass a pointer in lieu of references. But don't take
away call by value from the language.

Sorry, I should've been more clear. This came from my PL/1 programming
experience where I found it much easier to program. PL/1 uses pass by
reference. It also has support for pass by value. But, in the three
million line program I worked on, I only used pass by value once and only
saw it one other time. Many compilers must convert to pass by reference
anyway for the underlying assembly language.
You mean don't throw growable data types into non-growable arrays on a
stack? I agree. Bstrlib makes exactly this distinction (there is no
sane way to put a non-constant bstring on the stack). Perhaps only
allowing bounds protected enforced types into auto variables.

I was thinking more like dual stacks, ala FORTH. This would have only a
slight overhead in certain environments. It wouldn't prevent data overflow,
but would prevent corruption of the flow control.
Because "C++++" would have been too tacky. You could instead look at
languages like Python, Lua, and Java and ask yourself, what would it
take to design a language that was as easy to use as Python/Lua, and
safe and predictable as Java, with the speed of C?

I'm unfamiliar with Lua, and only aware of Python's existence. Java, from
what I've seen, although safe, seems too restricted.


Rod Pemberton
 
Juuso Hukkanen

I am looking for a wish list of things which should be removed from
C (C99) because of a feature's bad security track record <OT>or
multithreading unsafety. I need this list for a project intending to
build another (easiest & most powerful) programming language, which
has a two-page definition document stating: "... includes the C
programming language (C99), except its famous
'avoid-using-these' functions". </OT>

If you would not want to remove a whole function but only the use of
it with certain arguments / parameters, what would those combinations
be like? (Like scanf with %s or %[ arguments )

There are probably official lists of what not to use
(a million times better than this one):
http://tele3d.com/wiki/index.php/Parts_of_C99_which_are_NOT_included_in_t3d

Please, do not circumvent the question by saying all functions except
gets() are safe if used properly. That would be like teaching that
"the ideology of the Soviet Union was right; it was the Soviet people's
fault that the system didn't work."

One very popular wish list is Misra C. (Actually two, since there's a
revision out too.) It endeavors to tame C by outlawing all sorts of
usages that some people think *might* be misused.

70 pages full of recommendations. That does not leave programmers much
room for making disastrous errors - boring. Besides, they charge $50
for their can't-touch-this list.
Another is Microsoft's secure/safer/bounded C, a version of which is
now shipping with VC++ V8. It supplies alternatives to many functions
that can be better bounds checked to avoid storage overwrites. This
work is based on Microsoft's massive bug hunt stimulated by all the
viral attacks on Microsoft software largely written in C.

Good stuff. Summarizing: MS is advancing C. They identify C's
null-terminated strings as the biggest source of problems and suggest a
whole pattern of safer string functions. Notably, MS is also
suggesting a successor for strlen, which has not been suggested
earlier in this thread. In 9/2005 MS submitted a draft of these _s
safer functions to the C working group:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1135.pdf
A quick scan suggests that at least 68 'safer functions' are
proposed.

The list of suggested safer functions is below. In effect it also
gives a list of C's problem functions, defines and macros:

asctime_s
bsearch_s
ctime_s
fopen_s
fprintf_s
freopen_s
fscanf_s
fwprintf_s
fwscanf_s
getenv_s
gets_s
gmtime_s
L_tmpnam_s
localtime_s
mbsrtowcs_s
mbstowcs_s
memcpy_s
memmove_s
printf_s
qsort_s
scanf_s
snprintf_s
snwprintf_s
sprintf_s
sscanf_s
strcat_s
strcpy_s
strerror_s
strerrorlen_s
strncat_s
strncpy_s
strnlen_s
strtok_s
swprintf_s
swscanf_s
TMP_MAX_S
tmpfile_s
tmpnam_s
wcrtomb_s
wcrtoms_s
wcscat_s
wcscpy_s
wcsncat_s
wcsncpy_s
wcsnlen_s
wcsrtombs
wcsrtombs_s
wcstok_s
wcstombs_s
wctomb_s
vfprintf_s
vfscanf_s
vfwprintf_s
vfwscanf_s
wmemcpy_s
wmemmove_s
vprintf_s
wprintf_s
vscanf_s
wscanf_s
vsnprintf_s
vsnwprintf_s
vsprintf_s
vsscanf_s
vswprintf_s
vswscanf_s
vwprintf_s
vwscanf_s
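
To give a feel for the pattern behind the _s names listed above, here is a minimal sketch of a copy that takes the destination size and fails instead of overflowing or silently truncating. It is NOT the proposed strcpy_s() or strnlen_s(): the names bounded_len and checked_copy, their return conventions, and the use of assert() for a bad destination are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* bounded_len: length of s, never looking past max bytes.
 * Returns max if no '\0' is found within the first max bytes,
 * and 0 if s is NULL. */
size_t bounded_len(const char *s, size_t max)
{
    const char *end;

    if (s == NULL)
        return 0;
    end = memchr(s, '\0', max);
    return end ? (size_t)(end - s) : max;
}

/* checked_copy: copy src into dst only if it fits completely,
 * terminator included. Returns 0 on success. On failure dst is set
 * to the empty string and a nonzero value is returned. */
int checked_copy(char *dst, size_t cap, const char *src)
{
    size_t len;

    assert(dst != NULL && cap > 0);  /* treated as a programming error here */
    if (src == NULL) {
        dst[0] = '\0';
        return 1;
    }
    len = bounded_len(src, cap);
    if (len == cap) {                /* src plus its '\0' does not fit */
        dst[0] = '\0';
        return 1;
    }
    memcpy(dst, src, len + 1);
    return 0;
}
```

The design choice worth noting is that the failure case leaves a valid empty string behind, so a caller who ignores the return value still never sees an unterminated buffer.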
Neither is anywhere near perfect, nor universally accepted. Both are
places to start.

hmm, I'll try to spot the worst ones


Thank you
Juuso Hukkanen
(to reply by e-mail set addresses month and year to correct)
www.tele3d.com
 
Chris Hills

Richard Heathfield said:
Juuso Hukkanen said:


Absolutely. Just remove C99. Nobody will notice anyway.

Richard, we agree... But I promise not to let it happen again :)
 
Rod Pemberton

CBFalconer said:
If you don't recognize the need for one char lookahead, your
education is sadly lacking. Other languages use other means.

C grammars aren't just written in LALR(1). There are LL grammars for C too
(see ANTLR).

As for your comment that "ungetc() is ...", it's total hogwash. Lookahead
isn't a requirement. One can easily write lexers and parsers without
ungetc(). You just need to save the previously read characters (i.e.,
lookbehind) instead of using ungetc().
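
One way to read the "save the previously read characters" suggestion is a thin wrapper that keeps its own pushback slot, so the stream itself is never touched by ungetc(). This is only a sketch of that reading; struct reader and its functions are hypothetical names, and a single saved character is an assumption (the slot could just as well be a small buffer).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reader that remembers one character itself instead of
 * relying on ungetc(). */
struct reader {
    FILE *fp;
    int saved;      /* character handed back by the caller */
    int has_saved;  /* nonzero while saved holds a pending character */
};

void reader_init(struct reader *r, FILE *fp)
{
    assert(r != NULL && fp != NULL);
    r->fp = fp;
    r->saved = 0;
    r->has_saved = 0;
}

/* Returns the pending character if there is one, else reads the stream. */
int reader_get(struct reader *r)
{
    if (r->has_saved) {
        r->has_saved = 0;
        return r->saved;
    }
    return getc(r->fp);
}

/* Hands back the most recently read character (at most one at a time). */
void reader_unget(struct reader *r, int c)
{
    assert(!r->has_saved);
    r->saved = c;
    r->has_saved = 1;
}
```

Whether this counts as avoiding lookahead or as reimplementing ungetc() in user code is exactly the point under dispute in this thread; the sketch only shows that the mechanism is small.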
...snip ...
[...] The only case known to me where the one
char limitation creates a problem is in parsing floats with an
invalid exponential part.

I don't even know what you are talking about. Parsing of files
should not rely on the ability to scan backwards and forwards
through a file, but if you really wanted to do that, why wouldn't
you use fgetpos/fsetpos instead?

Reinforcing my comment on education. get/set doesn't work too well
on interactive input files, for example. Go and read up on
lexers. Maybe someone else will explain it to you.

Your education may be up to date, but your experience is lacking.


Rod Pemberton
 
Juuso Hukkanen

You are recommending strncpy and strncat. These are slow functions that
occasionally leave off the terminating '\0'.

Thanks, strncpy removed & replaced with a memcpy recommendation
(thinking later of strlcpy(); obviously it will one day be part of
the C standard).
(But limited -- obviously I would recommend removing all of str* and adding in Bstrlib
as an alternative, but I've said this before, and this is likely beyond what you are
looking at/for.)

Originally I planned to use your Bstring library in making t3d's
string type. However, the t3d arrays would need to allow UTF-32
(wbyte) arrays and an automatic garbage collector, so I designed a new
safe string system.
fgets is not an ideal substitute for gets as explained here:
http://www.pobox.com/~qed/userInput.html (though obviously gets must be
removed.) So I would also recommend removing fgets if you have a
replacement for it (such as getInputFrag, or perhaps just fgetstr)

Even Microsoft is suggesting a safer replacement version for fgets()
-->fgets_s(); in Microsoft's "safe C" proposal sent to C working
group. So apparently fgets() has also been collecting bugs / exploits.
However, I don't yet dare to write off fgets(), out of respect for
using standard tools where possible.
I am not sure why you want to get rid of srand() or rand(). It's true
they suck as PRNGs, and race conditions mess them up in ways that can be
worse than you think (and RAND_MAX is generally pathetically small), but

Well, you said it: they are not particularly good as PRNGs and are
apparently classed as unsafe for multithreading.
the real world. Again, if you had a *substitute*, that would be fine.

I think the snippet collection at http://c.snippets.org/browser.php#12
provides good public domain alternatives for C99-conforming, portable
t3d rand functions like:

t3d_create_RANDOM_NUMBER
t3d_create_RANDOM_NUMBER_MINXXX_MAXXXX
t3d_create_Rint_RANDOM_NUMBER_MINXXX_MAXXXX
t3d_create_int_int_Rint_RANDOM_NUMBER_n_MIN_MAX
(The right answer here is to demand that the standard change how it works --
however a quick perusal of their guiding principles, indicates there is
no mechanism by which you could reasonably do this.)

C is like a bacterium perfectly adapted to its environment; major changes
would likely affect it and the C community negatively somehow, so the
standard resists change. The truth is that if the people
participating in this thread were forced at gunpoint to develop
C into the world's best programming language…
--> Out would go: the null-terminated string, and with them str*
functions, all the illogicality in C's return values and parameter
orders, all the multithreading unsafe elements
--> In would come: dynamic memory management, Bstrings, pthreads, GUI
support and lots of easy to use functions which would be utilizable in
those environments which support those features.

OK, let's be more realistic: suppose the C standard committee members
were taken to the US concentration camp at Guantanamo and tortured
there until they produced a more secure C. In a matter of 0.15
seconds gets() would be history, and in less than 15 seconds all str*
functions would be retired, while Bstrings and a garbage
collector were happily welcomed into the language.

So who do we need to write to about the C standard committee (except nice
Mr. P.J. Plauger), which is keeping the insecure stuff in C to
facilitate terrorist acts?
Ok, as for other things that should obviously be removed: ftell() and
fseek(). Use fgetpos() and fsetpos() as the alternatives.

Good point. The C.L.C FAQ says:
12.25: What's the difference between fgetpos/fsetpos and ftell/fseek?
What are fgetpos() and fsetpos() good for?

A: ftell() and fseek() use type long int to represent offsets
(positions) in a file, and may therefore be limited to offsets
of about 2 billion (2**31-1). The newer fgetpos() and fsetpos()
functions, [~can use any size]

--> ftell and fseek --> gone fishing
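
A small sketch of the fgetpos()/fsetpos() pair in use: remember a position in an opaque fpos_t, read ahead, then return to it. The helper name peek_char is made up for the example. Note that this only works on streams that support positioning, such as regular files; it is not a substitute for pushback on interactive input.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the next character of fp without consuming it, by saving
 * the position with fgetpos() and restoring it with fsetpos().
 * fpos_t is opaque, so no offset ever has to fit in a long.
 * Returns EOF at end of file or if positioning fails. */
int peek_char(FILE *fp)
{
    fpos_t pos;
    int c;

    assert(fp != NULL);
    if (fgetpos(fp, &pos) != 0)
        return EOF;
    c = getc(fp);
    if (fsetpos(fp, &pos) != 0)  /* also clears the end-of-file indicator */
        return EOF;
    return c;
}
```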
I would get rid of ungetc just on principle (can't unread at the
beginning of a file, may screw up fgetpos(), only does a single
character -- it's just super lame, and throws a monkey wrench into too
many other functions.)

The C standardization working group (WG14) writes in its latest
working paper that

Major changes from the previous edition include:
....
- deprecate ungetc at the beginning of a binary file
....
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf
Of course I think "register" and "inline" are functional placebos in
modern C compilers. They are also deceptively named (both should be
replaced by a single adjective "nonaddressable" or something like that.)

I don't yet dare to touch C's register, inline or pointer
properties - I believe most C purists love that warm feeling of just
knowing that those things are there if needed.
C also accepts things like 3[a] as equivalent to a[3], when there
doesn't seem to be a really good reason to do this. This appears to be
strictly for the obfuscated C code competition.
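
The reason 3[a] is accepted falls out of how subscripting is defined: a[i] means *(a + i), and since the addition commutes, i[a] names the same element. A minimal sketch (the function name is invented for the example):

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a[i], i[a] and *(a + i) all designate the same object.
 * By the definition of the subscript operator this is always true. */
int same_element(const int *a, size_t i)
{
    assert(a != NULL);
    return &a[i] == &i[a] && &a[i] == a + i;
}
```

So the odd spelling is a side effect of the definition rather than a separately designed feature.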

The obfuscated C competition winners probably sit on the standard
committee designing themselves some new tricks. How else can you
explain things like trigraph sequences, which require
repeated question marks to be translated into something else, making
the code look totally cryptic?

Trigraph sequences
All occurrences in a source file of the following sequences of three
characters (called trigraph sequences) are replaced with the
corresponding single character.
??= #
??( [
??/ \
??) ]
??' ^
??< {
??! |
??> }
??- ~

The Rationale says:
The sequence ?? cannot occur in a valid pre-C89 program except in
strings, character constants, comments, or header names. The character
escape sequence '\?' (see §6.4.4.4) was introduced to allow two
adjacent question marks in such contexts to be represented as ?\?, a
form distinct from the escape digraph. The Committee makes no
claims that a program written using trigraphs looks attractive.
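
In practice the \? escape quoted above is what you reach for when a string literal really has to contain two question marks in a row. A small sketch, with invented function names; the literal is written so that it means the same thing whether or not the compiler has trigraph replacement enabled:

```c
#include <assert.h>
#include <string.h>

/* A literal ending in two question marks. Writing the second one as
 * \? keeps the two from ever being adjacent in the source text, so
 * no trigraph can be formed with whatever follows. */
const char *query(void)
{
    return "really?\?";
}

/* Returns 1 if s ends in two question marks, else 0. */
int ends_with_two_question_marks(const char *s)
{
    size_t n;

    assert(s != NULL);
    n = strlen(s);
    return n >= 2 && s[n - 1] == '?' && s[n - 2] == '?';
}
```

The escape costs one extra character in the source and nothing in the resulting string, which is eight characters long here.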

Yes, I can imagine the committee members laughed their heads off while
designing the trigraphs :)

Thank you very much
Juuso Hukkanen
(to reply by e-mail set addresses month and year to correct)
"t3d programming language" and the structure of t3d function prototype
are trademarks of Juuso Hukkanen. (Currently discussing the
transfer of those to a major charity organization).
 
Walter Banks

Significant features of C99 that I have found particularly useful
- Size specific data types
- Booleans
- ellipsis in macro argument lists
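
A brief sketch of the three features just listed, with made-up names (FORMAT_INTO and bit_is_set are not from any library). It assumes the first macro argument is a real array, since the macro uses sizeof on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Ellipsis in a macro argument list (C99): forwards the format string
 * and any further arguments to snprintf(). buf must be an array, not
 * a pointer, because sizeof (buf) supplies the capacity. */
#define FORMAT_INTO(buf, ...) snprintf((buf), sizeof (buf), __VA_ARGS__)

/* Size-specific type for the flag byte, Boolean type for the answer. */
bool bit_is_set(uint8_t flags, unsigned bit)
{
    assert(bit < 8);
    return (flags >> bit) & 1u;
}
```

For example, FORMAT_INTO(buf, "%s=%d", "x", 5) expands to a bounded snprintf() call on buf, and bit_is_set(0x04, 2) is true.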

w..
 
Keith Thompson

Rod Pemberton said:
Think about 1) and 3). If pointers in main are eliminated, then all memory
allocations would need to be static or dynamic.

Um, no.

Suppose I rename my main() function to my_main(), and add a new main()
function that does nothing but call my_main(). You can tell me I
can't use pointers in main(), but there's nothing to stop me from
using pointers in my_main().

I suspect that "eliminate pointers in main" isn't really what you
mean, but I have no idea what you do mean, or why.
 
RSoIsCaIrLiIoA

Once again you demonstrate your exquisite detailed knowledge of C string
handling.

strncpy() behaves as you say; strncat() does not, and is a very useful
function.

Richard

For me, copy and cat are loops, or a job for something snprintf-like.
 
Rod Pemberton

Keith Thompson said:

First, reread 3):
Suppose I rename my main() function to my_main(), and add a new main()
function that does nothing but call my_main(). You can tell me I
can't use pointers in main(), but there's nothing to stop me from
using pointers in my_main().

True, but totally irrelevant to my statements. It seems you misread "memory
allocations" as "memory references."

If you take into account:

A) part 3) "eliminate malloc, add dynamic allocation and garbage collection"
B) and last part of 1) "all memory allocations would need to be static or
dynamic."

you'll see I wasn't talking about eliminating pointers in other functions.
I was talking about forcing static and dynamic memory allocations.
I suspect that "eliminate pointers in main" isn't really what you
mean, but I have no idea what you do mean, or why.

(Wow. You usually have a slightly but not much higher comprehension. Was
that a low IQ day for you? )


Rod Pemberton
 
Keith Thompson

"Static or dynamic" as opposed to what?

C has three storage durations: static, automatic, and allocated. Are
you proposing to eliminate one of these? If so, which one (and why)?
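
For readers who want the three storage durations side by side, here is a minimal sketch. The function names are invented for the example; the terminology (static, automatic, allocated) is the standard's.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Static storage duration: count exists for the whole program run
 * and keeps its value between calls. */
int next_id(void)
{
    static int count = 0;
    return ++count;
}

/* Automatic storage duration: tmp exists only while this call runs,
 * so its value may be returned but its address may not. */
int square(int x)
{
    int tmp = x * x;
    return tmp;
}

/* Allocated storage duration: the copy lives until the caller passes
 * it to free(). Returns NULL if malloc() fails. */
char *duplicate(const char *s)
{
    size_t n;
    char *copy;

    assert(s != NULL);
    n = strlen(s) + 1;
    copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}
```

Only the third kind involves malloc(), which is why "dynamic allocation" in the usual C sense already names something the language has.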
First, reread 3):

"Dynamic allocation" is exactly what malloc() does (unless "dynamic
allocation" means something different to you than it does to the rest
of us). If you were proposing keeping malloc(), eliminating free(),
and adding garbage collection, you might have a coherent idea. Not
necessarily a good one, but at least a coherent one.
True, but totally irrelevant to my statements. It seems you misread "memory
allocations" as "memory references."

No, I didn't.
If you take into account:

A) part 3) "eliminate malloc, add dynamic allocation and garbage collection"
B) and last part of 1) "all memory allocations would need to be static or
dynamic."

you'll see I wasn't talking about eliminating pointers in other functions.
I was talking about forcing static and dynamic memory allocations.

So what's special about main()?
(Wow. You usually have a slightly but not much higher comprehension. Was
that a low IQ day for you? )

And I am sick and tired of your stupid personal insults.

I suspect the reason I can't figure out what you're talking about has
something to do with your use of the phrase "dynamic allocation". You
can either explain yourself clearly, or not. If I never understand
what you're talking about, it will be no great loss.
 
