Segfault City


Richard Heathfield

(e-mail address removed) said:
[X-posted, and followups set]

I found something interesting on the Web today, purely by chance.

It would be funny if it weren't so sad. Or sad if it weren't so funny.
I'm not sure which.

http://www.developerdotstar.com/community/node/291

Something I wondered about was why the code tests numbers up through
half the highest value (call it N), rather than up through sqrt(N),
which is clearly (?) sufficient. Most discussions I've read of this
algorithm stop at sqrt(N), though perhaps the original algorithm
didn't?

I saw this recommendation in 1974 (in Wirth) and it makes perfect
sense...if you have an efficient way to get square root.

Look up sqrt() in your C book. That's how my code gets the square root. And
my program works significantly faster than yours, so I think it's safe to
say that introducing the sqrt() call was not too terribly inefficient.

Or you could just write your own integer-based Newton-Raphson search if you
prefer.
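For readers who want to try that route, here is a minimal sketch of an integer-only Newton-Raphson square root. The name isqrt is illustrative, not a function from either program under discussion:

```c
#include <assert.h>

/* Integer square root by Newton-Raphson iteration: returns
   floor(sqrt(n)) using only integer arithmetic.  Each iterate is an
   upper bound on the root and strictly decreases until convergence. */
unsigned long isqrt(unsigned long n)
{
    unsigned long x, y;
    if (n < 2)
        return n;
    x = n;                  /* initial guess: any upper bound works */
    y = (x + n / x) / 2;
    while (y < x)
    {
        x = y;
        y = (x + n / x) / 2;
    }
    return x;
}
```

For trial division, the limit isqrt(n) need only be computed once, before the loop.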

Dividing by 2 is implemented by a fast shift in modern compilers which
"loses" the remainder but for the test used, all you need is the value
of the floor.

Yes, your super-efficient division by 2 is considerably faster than my
lumbering sqrt call. Undoubtedly this explains the performance difference
between our two programs (although not perhaps in the way you'd like us to
think).
 

Dave Vandervies

Richard Heathfield said:
Certainly possible, but by no means certain. Here is a legal implementation
of isdigit(), which might reasonably be used in a library targeted at an
ASCII platform:

/* somewhere in the library code */
const int __is_digit[UCHAR_MAX + 1] =
{
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1  /* '0' to '9'; the rest default to 0 */
};

/* in ctype.h */
extern const int __is_digit[UCHAR_MAX + 1];

#define isdigit(c) (((c) == EOF) ? EOF : __is_digit[c])

Wouldn't you want to give back a 0 if c is EOF?

If EOF is -1, you could also change the table slightly and implement
isdigit as
--------
#define isdigit(c) ((__is_digit+1)[c])
--------
which is a bit shorter, and may or may not be easier to read. You could
do that with more negative values for EOF, too, but depending on just
how negative they were it might not be worth the space.
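A sketch of that shifted table, assuming EOF is -1, 8-bit bytes, and ASCII. The names digit_table and my_isdigit are illustrative, not from any real library, and the C99 designated initializers are used purely to keep the demonstration of the indexing trick short:

```c
#include <stdio.h>   /* for EOF */

/* Slot 0 of the table is the entry for EOF; the entry for a
   character c lives at slot c + 1, so indexing through
   (digit_table + 1) handles EOF and all unsigned char values
   without a branch. */
static const int digit_table[257] = {
    [0] = 0,                   /* EOF (-1) maps here after the +1 offset */
    ['0' + 1] = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1   /* '0' to '9' */
};

#define my_isdigit(c) ((digit_table + 1)[c])
```

Every slot not named in the initializer is implicitly zero, so all other characters correctly report "not a digit".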


dave
 

Andrew Poelstra

Richard joins the sad legion of people who mistake the contents of
their minds, their precise understanding of the Concept at a particular
time, with Absolute Truth.

Wow, do you ever remind me of James Harris.
 

Keith Thompson

Chris Smith said:
Please point out a C implementation that's in any significant amount of
use (i.e., not a toy implementation built by a student for a compilers
class, or something like that) that fails to implement isdigit.

A conforming freestanding implementation is not required to provide
the <ctype.h> header. But then, the program under discussion freely
makes use of printf(), which is also not required for a freestanding
implementation.

A better question might be:

Please point out a C implementation that's in any significant
amount of use (i.e., not a toy implementation built by a student
for a compilers class, or something like that) that implements
printf but fails to implement isdigit.
 

Richard Heathfield

Malcolm said:

What was this priceless pearl?

I did find it eventually, as it happens. So - here ya go:

"In passing, I've noted how difficult it is for some US companies who
allegedly deal internationally to handle the fact that there are Other
Countries."
- Terry Pratchett
 

Richard Heathfield

Malcolm said:

/*
convert character to numerical value
Params: ch - digit in character encoding
Returns: numerical value of the digit. behaviour if passed non-digit is
undefined
*/
int chartonumber(char ch)
{
    assert(isdigit(ch));
    return ch - '0';
}

Nice, easy to see. And if someone passes a non-digit to such a function,
it deserves to kick.
And no, it isn't quite conforming, but it's conforming enough. Anyone care
to point out the technical difficulty?

My pleasure.

Firstly, it won't kick if NDEBUG is defined. Secondly, if ch is not
representable as an unsigned char, the behaviour is undefined.
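Both problems can be addressed at once. A sketch, keeping Malcolm's function name but changing the contract on non-digits to return -1 rather than invoke undefined behaviour (that -1 convention is an assumption of this sketch, not Malcolm's original):

```c
#include <assert.h>
#include <ctype.h>

/* Cast through unsigned char before calling isdigit(), so the call is
   well-defined even when plain char is signed; keep a runtime check
   that survives NDEBUG alongside the assert. */
int chartonumber(char ch)
{
    assert(isdigit((unsigned char)ch));   /* compiled out if NDEBUG is defined */
    if (!isdigit((unsigned char)ch))      /* still present in release builds */
        return -1;
    return ch - '0';
}
```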
 

Malcolm

Chris Smith said:
Please point out a C implementation that's in any significant amount of
use (i.e., not a toy implementation built by a student for a compilers
class, or something like that) that fails to implement isdigit.
Embedded compilers quite often ship without any sort of library
whatsoever. This includes even the trivial functions.
However, it is no problem to implement them.

I can think of two good reasons for not including them. Firstly, simplicity.
It is a lot easier to say "no library" than give a list of functions which
are implemented or maybe, like printf(), partly implemented. The second is
that if you provide a library you also have to build a linker which only
links in functions actually called. It is a lot easier to tell the
programmer to write his own strlen() and isdigit().
 

pete

Richard Heathfield wrote:
A good engineer doesn't blame the wheel
when the real problem is the nut behind it.

I usually describe it as a "loose nut",
especially when describing automotive problems.

"The problem with your car is that
there's a loose nut behind the wheel."
 

Keith Thompson

[X-posted, and followups set]

I found something interesting on the Web today, purely by chance.

It would be funny if it weren't so sad. Or sad if it weren't so funny. I'm
not sure which.

http://www.developerdotstar.com/community/node/291

Something I wondered about was why the code tests numbers up through
half the highest value (call it N), rather than up through sqrt(N),
which is clearly (?) sufficient. Most discussions I've read of this
algorithm stop at sqrt(N), though perhaps the original algorithm
didn't?

I saw this recommendation in 1974 (in Wirth) and it makes perfect
sense...if you have an efficient way to get square root.

Wirth's discussion, however, emphasized that you add "efficiencies"
sensibly depending on resources.

If the number to be tested for primehood is a power of two, getting
square root is easy, but, it will be already excluded because a power
of two is divisible by 2 and nonprime.

Dividing by 2 is implemented by a fast shift in modern compilers which
"loses" the remainder but for the test used, all you need is the value
of the floor.

So, for big primes, the numbers run up past square root to one half.

You only have to compute the square root once, using the universally
available sqrt() function.

For N = 1000000, this (possibly moderately expensive) computation lets
you reduce the number of iterations by a factor of 500. A single
computation for the sake of saving that much computation is almost
certainly worthwhile for large N, and almost certainly lost in the
noise for small N. (If you're doing a large number of successive
small-N computations, it might make sense to use sqrt(N) for large N,
and N/2 for small N; measurement can determine a reasonable cutoff.
If you're not running the program many times, this probably isn't
worth the effort.)
 

Keith Thompson

Richard's strategy is to post a lie that then spreads confusion, and
causes his butt buddies, and people frightened of him, to repeat
character assassinations. It is Swift Boating.

Here is Richard's lie:


</heathfield>

As you can see, it's not a straightforward test for numerics. It uses a
system library function that may or may not be available depending on
the C environment dishonestly and converts to numerics.
[snip]

Others have already addressed this multiple times, and spinoza1111
claims he's no longer following this thread, but I'll jump in anyway.

Richard has not lied. If he had, the quoted code and associated
statement would certainly not demonstrate it. Richard's code is, as
he said, quite straightforward. It uses the isdigit() function, which
is absolutely guaranteed to be part of any conforming hosted
implementation. It has been part of the standard library since the
ANSI C standard was issued in 1989; for that matter, ANSI based it on
existing practice, so it's very likely to be available even in
pre-ANSI implementations (which are becoming quite rare these days).

It's true that conforming implementations of the 1999 ISO C standard
are still far from universal. Conforming implementations of the 1990
ISO C standard, equivalent to the 1989 ANSI standard, however, are now
practically universal. It's the C89/C90 standard that defines the
isdigit() function (based on existing practice) and, as discussed
elsethread, concatenation of adjacent string literals; it would be
difficult to find a compiler for a hosted implementation that doesn't
support both. (Possibly many freestanding implementations also
support isdigit(), but this isn't guaranteed, and I'm not familiar
with freestanding (typically embedded) implementations anyway.)

There is no reason to avoid using isdigit() in code intended for a
hosted implementation. The program uses printf(); using isdigit() is
exactly as safe and sensible.

Both Richard's code and spinoza1111's code (which I commented on
earlier) happen to be correct; they both work, they both yield the
same result for the same arguments. I find Richard's code much more
readable and straightforward than spinoza1111's. Richard never said
that spinoza1111's code (for that particular function) *wasn't*
correct.

Richard is owed an apology, but I seriously doubt that he'll ever get
it.
 

John Smith

John Smith wrote:


But I made an error if I did strcat without memory available. I need to
investigate the site code. Right now my higher priority is responding
to a clear lie on REH's part.

I spoke too soon. It does segfault if run with the switch that
makes your error manifest. How is the user supposed to know about
the switch without reading the source code?

And FYI, it's only necessary to trial divide up to sqrt(n), not n/2.
So, thanks for your effort in making a fair check. But I may have made
an error.



OK, so it's ponderous. This is an aesthetic judgement based on a
decadent C culture of programmers with cushy jobs taking far too long
to develop embedded systems, in which brevity is an actual attempt to
conceal, as in the case of:

This is the kind of silly drivel you're famous for. Go away.
 

blmblm

[X-posted, and followups set]

I found something interesting on the Web today, purely by chance.

It would be funny if it weren't so sad. Or sad if it weren't so funny. I'm
not sure which.

http://www.developerdotstar.com/community/node/291

Something I wondered about was why the code tests numbers up through
half the highest value (call it N), rather than up through sqrt(N),
which is clearly (?) sufficient. Most discussions I've read of this
algorithm stop at sqrt(N), though perhaps the original algorithm
didn't?

I saw this recommendation in 1974 (in Wirth) and it makes perfect
sense...if you have an efficient way to get square root.

Wirth's discussion, however, emphasized that you add "efficiencies"
sensibly depending on resources.

There may be no point in following up here, since Mr. Nilges says
he's left the building. In case he changes his mind, though:

The cost of computing the largest number to use as a possible
factor is a one-time cost, which almost certainly will not be
different for different values of N. You do save something by
computing N/2 versus sqrt(N), but it's a one-time saving.

(Well, I suppose it's a one-time cost/saving if you only do
it once. You have in the past sometimes recomputed such values
on each trip through a loop.)

The cost of using N/2 rather than sqrt(N), however, increases as
N increases. I'm not sure I care to think through right now
exactly what the order of magnitude of both algorithms is, but
I'm pretty sure the one that uses N/2 is bigger. (It's not
O(N) versus O(sqrt(N)), but I think it's something along these
lines.)

I'm fairly sure Wirth would analyze the problem along the same
lines.
If the number to be tested for primehood is a power of two, getting
square root is easy, but, it will be already excluded because a power
of two is divisible by 2 and nonprime.

Dividing by 2 is implemented by a fast shift in modern compilers which
"loses" the remainder but for the test used, all you need is the value
of the floor.

So, for big primes, the numbers run up past square root to one half.

Exactly. Which for big primes matters a lot more ....
 

blmblm

[X-posted, and followups set]

I found something interesting on the Web today, purely by chance.

It would be funny if it weren't so sad. Or sad if it weren't so funny. I'm
not sure which.

http://www.developerdotstar.com/community/node/291

[ snip ]
[ .... ] I'm not sure
what to make of string2number, but perhaps there's a good reason
I'm not thinking of to not use one of the library functions (e.g.,
atoi or strtol) that would seem to accomplish the same thing.

Because in fact few actually existing C environments conform to the
most current Standard I decided to roll my own. I was at the time
engaged in a global computing project for people who can't afford clean
water, let alone rush out and buy a conforming implementation.

There may be no point in following up here, since Mr. Nilges says
he's left the building. In case he changes his mind, though:

And yet you have no qualms about using other library functions --
strlen, strcmp, strcat, and printf being the ones I noticed in a
quick (re)scan through the code.

Seems inconsistent, but perhaps you aren't bothered by that.
 

lovecreatesbeauty

Richard said:
Malcolm said:
My pleasure.

Richard, I will ask for your help with the following code snippet. It
will be my pleasure again to read anyone's comments on it.

int ctoi(char c)
{
    int i = -1;
    if (c >= '0' && c <= '9')
    {
        i = c - '0';
    }
    return i;
}
Firstly, it won't kick if NDEBUG is defined. Secondly, if ch is not
representable as an unsigned char, the behaviour is undefined.

I remember someone once said that there is no release or debug version
in the C language. The C language (preprocessor) has macros and
functions. A particular user (library) function is not part of the C
language's features. I do not think there is any difference between
assert() and a simple if statement.

lovecreatesbeauty
 

Chris Smith

Keith Thompson said:
A conforming freestanding implementation is not required to provide
the <ctype.h> header.

Indeed, I was thinking of hosted implementations. I think it was pretty
well assumed by context... but you're right; I should be more careful
with trolls.
 

Richard Heathfield

lovecreatesbeauty said:

I remember someone once said that there is no release or debug version
in the C language.

Well, they were right. But perhaps they forgot to mention NDEBUG to you. If
NDEBUG is defined, assert is preprocessed out of existence.
The C language (preprocessor) has macros and functions. A
particular user (library) function is not part of the C language's
features. I do not think there is any difference between assert()
and a simple if statement.

You're wrong.
 

Malcolm

lovecreatesbeauty said:
Richard, I will ask for your help on following my code snippet. It will
be my pleasure again to read anyone's comments on it.

int ctoi(char c)
{
    int i = -1;
    if (c >= '0' && c <= '9')
    {
        i = c - '0';
    }
    return i;
}


I remember someone once said that there is no release or debug version
in the C language. The C language (preprocessor) has macros and
functions. A particular user (library) function is not part of the C
language's features. I do not think there is any difference between
assert() and a simple if statement.
To ask for the integer value of '~' or 'x' or any non-digit is nonsense.
The question is, what do we want the function to do if that happens?

It depends what you are doing. If you are writing a video game, there
is an argument for returning zero and hoping for the best - the worst
thing that can happen is that the game crashes anyway; probably you'll
just get a tiny error on screen that most players won't notice.
If you are doing accounts software, on the other hand, the last thing you
want is for a wrong value to get into your database. Better far to trigger
an error.
If you are writing software for a life support machine?

assert() is the C way of checking parameters. It always triggers an error
under debug conditions, which is what you want.

Here's how Edward Nilges uses his function


// -----------------------------------------------------------------------
// Convert a string containing an unsigned number to its value: return
// true (-1) on success or false (0) on failure
//
// This function returns false when the number's value overflows, but in
// this case, its reference parameter intValue is still returned. In
// this case the value will be negative.
//
int string2Number( char *strInstring, int *intValue )
{
    int intIndex1;
    int intCharValue;
    for ( intIndex1 = 0, *intValue = 0;
          intIndex1 < strlen( strInstring );
          intIndex1++ )
    {
        intCharValue = char2Number(strInstring[intIndex1]);
        if ( intCharValue < 0 ) return 0;
        *intValue = *intValue * 10 + intCharValue;
        if ( *intValue < 0 ) return 0;
    }
    return -1;
}

Here's how the higher-level function is called

if ( intArgCount < 3
     ||
     ! string2Number(strArgValues[1], &intStart)
     ||
     ! string2Number(strArgValues[2], &intFinish) )
{
    printf( "Syntax: %s\n", SYNTAX_REMINDER );
    return -1;
}

It is not terrible code, but the difficulties in reading it mount as error
returns are propagated upwards. One function returns a negative number on
error, another returns zero. Fortunately the program is short enough for
this not to really matter.
Most importantly, every call has an error check anyway. So why not make the
check validate the parameters rather than test the return?
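Malcolm's suggestion might be sketched like this. The names are illustrative, not from the original program, and overflow checking is omitted to keep the sketch short:

```c
#include <ctype.h>

/* Validate the string up front, so the conversion itself needs no
   error path at all. */
static int all_digits(const char *s)
{
    if (*s == '\0')                       /* empty string is not a number */
        return 0;
    while (*s != '\0')
        if (!isdigit((unsigned char)*s++))
            return 0;
    return 1;
}

static int to_number(const char *s)       /* caller guarantees all_digits(s) */
{
    int n = 0;
    while (*s != '\0')
        n = n * 10 + (*s++ - '0');
    return n;
}
```

The call site then reads: if (all_digits(arg)) value = to_number(arg); else report the error - one check, done once, before conversion.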
 

Richard Heathfield

Malcolm said:

To ask for the integer value of '~' or 'x' or any non-digit is nonsense.
The question is, what do we want the function to do if that happens?

It depends what you are doing. If you are writing a video game, there
is an argument for returning zero and hoping for the best -

No, there isn't. If you are writing a video game, there is no excuse
whatsoever for a non-digit character ever reaching such a function, because
the game-supplied data is tightly controlled and should have been validated
before shipping, and user-supplied data can be restricted at the input
stage so that only appropriate data can be entered. Video games have
considerable freedom in that regard, since they have access to more
sophisticated data capture techniques than are available in standard C.
the worst thing that
can happen is that the game crashes anyway,

No, that's just the start of your problems. Here's one possible case that is
worse: the player, who happens to be highly influential in the marketplace
(perhaps he's a well-known and much admired reviewer) says "it crashed!
What a lame game - I'm never buying from /that/ company again, and I'm
going to tell everyone else /why/, too".
probably you'll just get a
tiny error on screen that most players won't notice.

You'd be amazed at what players notice - especially since many modern video
games are so fast-paced that players /have/ to be able to notice tiny
things very quickly if they are to do well at the game. For example, in a
driving game my son plays, he can spot instantly whether a car in the
distance is a police car, without the benefit of sirens and lights!, by the
presence or absence of a *single pixel* representing the (unlit) roof lamp.
He would be very likely to spot an anomaly caused by a bug in the program
(and often does).
If you are doing accounts software, on the other hand, the last thing you
want is for a wrong value to get into your database. Better far to trigger
an error.

Normal practice in interactive programs is to alert the user and allow
re-entry. In batch mode, perhaps a bank's overnight run, the error is
logged. If processing can continue, it does. Otherwise, a message is
written on the operator's console, and operators ignore such messages at
their peril.
If you are writing software for a life support machine?

You get it right, or someone dies.
assert() is the C way of checking parameters.

No, it's the C way of allowing the programmer to test his assumptions, and
(perhaps) prove them to be false.
It always triggers an error under debug conditions, which is what you
want.

Well, nearly. It always triggers an error when NDEBUG is not defined and the
assertion condition fails, provided no undefined behaviour has got in the
way beforehand.
Here's how Edward Nilges uses his function

*intValue = *intValue * 10 + intCharValue;
if ( *intValue < 0 ) return 0;
}
return -1;
}
It is not terrible code,

Overflow is not terrible? :)
but the difficulties in reading it mount as error
returns are propagated upwards. One function returns a negative number on
error, another returns zero. Fortunately the program is short enough for
this not to really matter.

Yes. In a larger program, consistency would greatly help the maintainer.
Most importantly, every call has an error check anyway. So why not make
the check validate the parameters rather than test the return?

Yes, that would work - e.g.

if (isdigit((unsigned char)ch))
{
    n = RidiculousFunctionToSubtractAZeroDigitFrom(ch);
    DoSomethingWith(n);
}
else
{
    /* handle the error */
}

But a far better approach can be found on K&R2 page 252.
 

Richard Heathfield

Malcolm said:
[...] if you provide a library you also have to
build a linker which only links in functions actually called. It is a lot
easier to tell the programmer to write his own strlen() and isdigit().

Even in /that/ eventuality, it still makes much more sense to write your own
character classification functions, and then *use* them, than it does to
sprinkle character classification code all over your program.
 
