code portability


Ben Pfaff

Harald van Dijk said:
!(x=func(y)) doesn't test func's return value. It tests func's return
value converted to the type of x. If x is narrower than func(), even
nonzero return values may cause the expression to evaluate to 1.

True. I missed that. But the point stands: you should be
assigning it to the proper type. Again, this is important
regardless of the width of the type in question. I don't care if
"int" is 16 or 32 bits as long as "int" is the type that func()
returns.
 

lawrence.jones

Keith Thompson said:
That's a problem with the design of <stdint.h>; the naming scheme
assumes that exact-width types are more important than types with *at
least* a specified size or range.

That's the problem with codifying existing practice rather than making
things up out of whole cloth. The original designers were only
interested in the exact-width types. By the time it got to the
standards committee, we were pretty much stuck with those names; it
would have been antisocial to change them.

-Larry Jones

Oh, now YOU'RE going to start in on me TOO, huh? -- Calvin
 

lawrence.jones

jacob navia said:
but then... I would have been incompatible
with both gcc AND MSVC.

You say that like it's a *bad* thing.... ;-)

-Larry Jones

Oh, what the heck. I'll do it. -- Calvin
 

Keith Thompson

lawrence.jones said:
That's the problem with codifying existing practice rather than making
things up out of whole cloth. The original designers were only
interested in the exact-width types. By the time it got to the
standards committee, we were pretty much stuck with those names; it
would have been antisocial to change them.

I understand and agree; the committee didn't have the option of
designing the ideal language from scratch. (And if they had, there's
no guarantee anyone else would agree with their design decisions.)

"To summarize the summary of the summary: People are a problem."
-- Douglas Adams
 

Dik T. Winter

Keith Thompson said:
> I understand and agree; the committee didn't have the option of
> designing the ideal language from scratch. (And if they had, there's
> no guarantee anyone else would agree with their design decisions.)

There are a few languages designed from scratch. And indeed, in all
cases not everyone agreed with the design decisions. That is why
we have Pascal (an offspring of the Algol 68 design effort). There
is also reluctance toward such languages, because the objection will be
"too difficult to implement", and so you will not see many compilers.
Algol 60 found much opposition from the US because it would be too
difficult to implement (that is a reason why Knuth designed a subset
of the language that threw away some of the major features). For the
same reason, full-featured Pascal also met much reluctance; it was
only when subsets were implemented that it found its place. (Pascal
level 1, 2 and 3, if I remember correctly.)

I still have a report in my possession, written in the early sixties,
describing the implementation of variable-length arrays on the stack
for Algol 60. Still, I worked early on with full implementations
of Algol 60, Algol 68 and Pascal, while people were still muttering
that they were too difficult to implement. CDC's effort on a full
implementation of Algol 68 is especially worth mentioning (although the
US branch never acknowledged that they had a compiler). The
compiler was written by some nine workers without any prior
experience of writing compilers. I think that it took about a year to
complete (I was present at some progress meetings). Also consider the
very first Pascal compiler, written as (I think) a Ph.D. thesis by Urs
Ammann. Too difficult to implement? Rather, unwillingness to implement.
 

websnarf

Keith said:
Malcolm said:
Eigenvector wrote:
[...] I fully realize that
independent developers may or may not conform to standards, but again is
it at least encouraged?

Not really. By its very nature C encourages non-portable programming.
In general, I try to write code portably, but the only thing keeping me
honest is actually compiling my stuff with multiple compilers to see
what happens.
Yes. There is a tension between efficiency and portability. In Java they
resolved it by compromising efficiency, in C we have to be careful to make
our portable code genuinely portable, which is why the topic is so often
discussed.
There is also the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.

I rarely find it useful to assume ASCII.

Who cares what *YOU* find useful or not.

I would like to auto-initialize an array that maps from unsigned char
-> parse-state, which makes, say, letters to one value, numbers to
another, etc. The reason I want to auto-initialize this rather than
calling some init() routine that sets them up is because I want to
support correct multithreading, and my inner loops that use such an
array are going so fast, that auto-first-time checking actually is
unacceptable overhead.

If I can't assume ASCII, then this solution has simply been taken away
from me. Compare this with the Lua language, which allows unordered
specific index auto-initialization.
[...] It's usually just as easy to
write code that depends only on the guarantees in the standard, and
will just work regardless of the character set. It would be
marginally more convenient to be able to assume that the character
codes for the letters are contiguous, but that's easy enough to work
around.

Yeah, well obviously you don't work in environments where performance
and portability matters.
As for two's complement, I typically don't care about that either.
Numbers are numbers. If I need to do bit-twiddling, I use unsigned.

And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).
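A minimal sketch of constructing such a ring portably, relying only on the
standard's guarantee that unsigned arithmetic wraps modulo 2**N for the
type's width (the helper names and the choice n = 12 are arbitrary):

#include <stdio.h>

#define RING_BITS 12
#define RING_MASK ((1u << RING_BITS) - 1u)

/* Z/2**12: do the arithmetic in unsigned, which wraps by definition,
   then reduce with a mask. Works whatever the signed representation is. */
static unsigned ring_add(unsigned a, unsigned b) { return (a + b) & RING_MASK; }
static unsigned ring_mul(unsigned a, unsigned b) { return (a * b) & RING_MASK; }

int main(void)
{
    printf("%u\n", ring_add(4000u, 300u)); /* (4000 + 300) mod 4096 = 204 */
    printf("%u\n", ring_mul(4000u, 300u)); /* (4000 * 300) mod 4096 = 3968 */
    return 0;
}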
 

Keith Thompson

Keith said:
Malcolm said:
Eigenvector wrote:
[...] I fully realize that
independent developers may or may not conform to standards, but again is
it at least encouraged?

Not really. By its very nature C encourages non-portable programming.
In general, I try to write code portably, but the only thing keeping me
honest is actually compiling my stuff with multiple compilers to see
what happens.

Yes. There is a tension between efficiency and portability. In Java they
resolved it by compromising efficiency, in C we have to be careful to make
our portable code genuinely portable, which is why the topic is so often
discussed.
There is also the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.

I rarely find it useful to assume ASCII.

Who cares what *YOU* find useful or not.

Gosh, I don't know. Do you care? Because, as you know, your opinion
matters a great deal to me. It's probably because of your charming
manner.
I would like to auto-initialize an array that maps from unsigned char
-> parse-state, which makes, say, letters to one value, numbers to
another, etc. The reason I want to auto-initialize this rather than
calling some init() routine that sets them up is because I want to
support correct multithreading, and my inner loops that use such an
array are going so fast, that auto-first-time checking actually is
unacceptable overhead.

If I can't assume ASCII, then this solution has simply been taken away
from me. Compare this with the Lua language, which allows unordered
specific index auto-initialization.

I can think of several ways to do this. You can use some automated
process to generate the C code for you during the build process,
perhaps with a build-time option to select ASCII or some other
character set. Or you can explicitly invoke an initialization routine
exactly once as your program is starting up and save the expense of
checking on each use. Or (and this may or may not be available to
you), you can use C99, which lets you do just what you want. Here's a
brief outline of what I think you're trying to do:
========================================================================
#include <stdio.h> #include <limits.h>

enum CHAR_CLASS { OTHER=0, UPPER, LOWER, DIGIT };

static const enum CHAR_CLASS cclass[UCHAR_MAX + 1] =
{ ['a'] = LOWER,
  ['b'] = LOWER,
  /* ... */
  ['A'] = UPPER,
  ['B'] = UPPER,
  /* ... */
  ['0'] = DIGIT,
  ['1'] = DIGIT,
  /* ... */
};

int main(void)
{
    for (int c = 0; c <= UCHAR_MAX; c ++) {
        if (cclass[c] != OTHER) {
            printf("'%c' => %d\n", c, cclass[c]);
        }
    }

    return 0;
}
========================================================================

This works with gcc 3.4.5 and 4.1.1 with "-std=c99".

Or you can just (drum roll please) assume ASCII. If you'll look very
closely at what I wrote above:

| I rarely find it useful to assume ASCII.

you'll see the word "rarely", not "never". If assuming ASCII, and
therefore making your code non-portable to non-ASCII platforms, makes
it significantly faster, that's great. I might consider adding a
check at program startup, something like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}
or I might not bother; I'd at least document the assumption somewhere
in the code. (No, you can't reliably test this in the preprocessor;
see C99 6.10.1p3.)

The fact that you've managed to cite a single application where
assuming ASCII happens to be useful does not refute anything I've
said.

Write portable code if you can. If you need to write non-portable
code, keep it as isolated as you can (but you may *sometimes* find
that a portable implementation would have worked just as well in the
first place).
Yeah, well obviously you don't work in environments where performance
and portability matters.

Obviously you have no clue about the environments in which I work.
And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).

It's been a while since my last abstract algebra class, but isn't a
"ring module 2**n" simply the set of integers from 0 to 2**n-1? And
isn't that precisely what C's *unsigned* integer types are?

If you want a particular behavior for *signed* integers on overflow, C
doesn't guarantee this; overflow of a signed integer invokes undefined
behavior. Very commonly it happens to do the obvious 2's-complement
wraparound you're probably thinking of. If you need to write code
that depends on that, go ahead. Be aware that it's not 100% portable,
but it may well be portable enough for your purposes.

In case I've misunderstood your point, I'll expand on what I wrote
above.

I *typically* don't care about two's-complement. If I actually need
to write code that depends on two's-complement, I'll do so. All else
being equal, portable code, particularly code that depends only on
guarantees made by the C standard, is better than non-portable code --
and, personally, I usually find it easier to write and understand.
(Don't bother posting counterexamples; I wrote "usually".) In those
cases where all else *isn't* equal, there's a tradeoff between
portability and performance and convenience, and whatever other
desirable attributes you want to think about.
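A minimal sketch of that distinction, assuming nothing beyond a hosted
implementation:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned u = UINT_MAX;

    /* Guaranteed: unsigned arithmetic is reduced modulo UINT_MAX + 1,
       so this wraps to 0 on every conforming implementation. */
    printf("%u\n", u + 1u);   /* prints 0 */

    /* Not guaranteed: signed overflow is undefined behavior. On many
       2's-complement systems it happens to wrap, but nothing in the
       standard requires it, so don't build proofs on it. */
    /* int i = INT_MAX; i = i + 1; */

    return 0;
}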
 

Keith Thompson

Keith Thompson said:
#include <stdio.h> #include <limits.h>

enum CHAR_CLASS { OTHER=0, UPPER, LOWER, DIGIT };

static const enum CHAR_CLASS cclass[UCHAR_MAX + 1] =
{ ['a'] = LOWER,
  ['b'] = LOWER,
  /* ... */
  ['A'] = UPPER,
  ['B'] = UPPER,
  /* ... */
  ['0'] = DIGIT,
  ['1'] = DIGIT,
  /* ... */
};

int main(void)
{
    for (int c = 0; c <= UCHAR_MAX; c ++) {
        if (cclass[c] != OTHER) {
            printf("'%c' => %d\n", c, cclass[c]);
        }
    }

    return 0;
}

Of course the two #include directives need to be on separate lines; I
accidentally joined them when I reformatted the previous paragraph
before posting.

BTW, the output is:

'0' => 3
'1' => 3
'A' => 1
'B' => 1
'a' => 2
'b' => 2
 

Ben Pfaff

Keith Thompson said:
I might consider adding a check at program startup, something
like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}

Is there some reason that this can't be done at compile time:
#if 'A' != 65
#error Needs ASCII character set
#endif
 

Keith Thompson

Ben Pfaff said:
Is there some reason that this can't be done at compile time:
#if 'A' != 65
#error Needs ASCII character set
#endif

Yes.

(Barely resisting the temptation to leave it at that...)

N1124 6.10.1p3:

This includes interpreting character constants, which may involve
converting escape sequences into execution character set
members. Whether the numeric value for these character constants
matches the value obtained when an identical character constant
occurs in an expression (other than within a #if or #elif
directive) is implementation-defined.

Footnote:

Thus, the constant expression in the following #if directive and
if statement is not guaranteed to evaluate to the same value in
these two contexts.

#if 'z' - 'a' == 25

if ('z' - 'a' == 25)

I did refer to this upthread, just after the portion you quoted:

| or I might not bother; I'd at least document the assumption somewhere
| in the code. (No, you can't reliably test this in the preprocessor;
| see C99 6.10.1p3.)
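What you can still do is put the test in an ordinary constant expression,
which is evaluated with execution-character-set values, and use the old
negative-array-size trick to make it a translation-time check. A sketch
(the typedef name is arbitrary, and it's as incomplete a check as the
runtime one above):

typedef char ascii_assumed[('A' == 65 && 'z' - 'a' == 25) ? 1 : -1];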
 

websnarf

Keith said:
Keith said:
Eigenvector wrote:
[...] I fully realize that
independent developers may or may not conform to standards, but again is
it at least encouraged?

Not really. By its very nature C encourages non-portable programming.
In general, I try to write code portably, but the only thing keeping me
honest is actually compiling my stuff with multiple compilers to see
what happens.

Yes. There is a tension between efficiency and portability. In Java they
resolved it by compromising efficiency, in C we have to be careful to make
our portable code genuinely portable, which is why the topic is so often
discussed.
There is also the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.

I rarely find it useful to assume ASCII.

Who cares what *YOU* find useful or not.

Gosh, I don't know. Do you care? Because, as you know, your opinion
matters a great deal to me. It's probably because of your charming
manner.

Its just so typical of you to answer generic questions with what
happens to suit you. As if you represent the only kind of C programmer
that there is, or should be.
I can think of several ways to do this. You can use some automated
process to generate the C code for you during the build process,
perhaps with a build-time option to select ASCII or some other
character set.

The subject for this thread is "code portability". So of course, I
assume you have a way of doing this portably.
[...] Or you can explicitly invoke an initialization routine
exactly once as your program is starting up and save the expense of
checking on each use.

Ok, read carefully, I just told you I can't do that. If I am willing
to sacrifice performance (remember we're setting up a constant
addressed look-up table so we're expecting a throughput of 1/3 of a
single clock (or even 1/4 of a clock on these new Intel Core CPUs) for
this operation) why would I bother doing this through a look up table
in the first place?
[...] Or (and this may or may not be available to
you), you can use C99,

Again, the subject of this thread is "code portability". Using C99 is
diametrically opposite to this goal.
[...] This works with gcc 3.4.5 and 4.1.1 with "-std=c99".

The irony of this statement is just unbelievable ... . Two versions of
gcc counts as portability?
Or you can just (drum roll please) assume ASCII. If you'll look very
closely at what I wrote above:

| I rarely find it useful to assume ASCII.

you'll see the word "rarely", not "never".

That's nice, but you've removed the context. This is not a response to
the generic question posed. This is just a statement about *your*
predilections. The fact is that *I* rarely find it useful as well,
because I don't write a lot of code that does parsing. But that is
completely irrelevant, which is why, of course, I refrained from making
such ridiculous non sequitur statements. *rarely* is not the only word
you wrote there, you also wrote the word *I*.
[...] If assuming ASCII, and
therefore making your code non-portable to non-ASCII platforms, makes
it significantly faster, that's great. I might consider adding a
check at program startup, something like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}
or I might not bother; I'd at least document the assumption somewhere
in the code. (No, you can't reliably test this in the preprocessor;
see C99 6.10.1p3.)

The fact that you've managed to cite a single application where
assuming ASCII happens to be useful does not refute anything I've
said.

This is a *single* application? I am talking about a technique, not an
application. The fact is, this comes up for a wide variety of string
parsing scenarios, where speed (or in fact *simplicity*) might be a
concern. We're talking about ASCII here -- where else would such a
concern apply?
Write portable code if you can. If you need to write non-portable
code, keep it as isolated as you can (but you may *sometimes* find
that a portable implementation would have worked just as well in the
first place).

Now why couldn't you have posted this more reasoned position instead of
the drivel that you did in the first place?
Obviously you have no clue about the environments in which I work.

Ok, well then maybe you are just bad at your job, or maybe you have
long term memory problems like the guy from the movie Memento.
It's been a while since my last abstract algebra class, but isn't a
"ring module 2**n" simply the set of integers from 0 to 2**n-1?

No, that would be a list or a set.

Your bizarre relationship with the definition of technical words is a
real curiosity. How can you pretend to be a computer programmer, and
be so far removed from standard nomenclature? It would be ok if you
just mixed up a few words or something I wouldn't make a big deal about
it. But you appear to not know the concepts on the other side of these
words.
[...] And isn't that precisely what C's *unsigned* integer types are?

First of all no, and second of all if it was, then it wouldn't be a
ring.

A Ring is a set with a 0, a + operator and a * operator. And the point
is that its completely *closed* under these operations. In typical 2s
complement implementations, I know that integers (signed or not) are
rings. In 1s complement machines -- I have no idea; I don't have
access to such a machine (I never have in the past, and I almost
certainly never will in the future), and just don't have familiarity
with 1s complement. It doesn't have the natural wrapping properties
that 2s complement has, so my intuition is that its *not* a ring, but I
just don't know.

The reason why this is important is for verification purposes. Suppose
I write the following:

x = (y << 7) - (y << 2);

Well, that should be the same as x = y * 124. How do I know this?
Because I know that y << 7 is the same as y * 128, and y << 2 is the
same as y * 4. After that, there is a concern that one of the operands of
the subtract might wrap around, while the other one doesn't. Or both
might. Because of that, direct verification of this fact might lead
you to believe that you need to look at these as separate cases and
very carefully examine the bits to make sure that the results are still
correct. But we don't have to. If we *know* that the expression is
equivalent to y*128 - y*4, then because 2s complement integers form an
actual ring, then we are allowed to rely on ordinary algebra without
concern. Wrap around doesn't matter -- its always correct.
Verification of just straight *algebra* is unnecessary, we can just
rely on mathematics.
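A quick spot-check of that identity in *unsigned* arithmetic, where the
standard does guarantee the mod-2**N ring laws (the sample values are
arbitrary; the same identity for plain int relies on behavior the standard
leaves undefined):

#include <stdio.h>

int main(void)
{
    /* (y << 7) - (y << 2) == y * 124 should hold even when the shifts
       wrap, because every operation is reduced modulo UINT_MAX + 1. */
    unsigned samples[] = { 0u, 1u, 123u, 40000u, 0x7FFFFFFFu, 0xFFFFFFFFu };
    unsigned failures = 0;
    unsigned i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        unsigned y = samples[i];
        if (((y << 7) - (y << 2)) != y * 124u)
            failures++;
    }
    printf("%u failures\n", failures);   /* prints 0 */
    return 0;
}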
 

Chris Torek

websnarf said:
A Ring is a set with a 0, a + operator and a * operator. And the point
is that its completely *closed* under these operations.

This is ... hardly a thorough definition. You need to add
commutativity (for +) and distribution (of * over +), in particular.
In typical 2s complement implementations, I know that integers
(signed or not) are rings. In 1s complement machines -- I have
no idea ...

And that is where you have missed Keith Thompson's point -- because
even on ones' complement machines, *unsigned* integers (in C) are
still rings. So use "unsigned"; they give you the very property
you want. They *guarantee* it.
 

Flash Gordon

Keith said:
Keith Thompson wrote:
Eigenvector wrote:
[...] I fully realize that
independent developers may or may not conform to standards, but again is
it at least encouraged?
Not really. By its very nature C encourages non-portable programming.
In general, I try to write code portably, but the only thing keeping me
honest is actually compiling my stuff with multiple compilers to see
what happens.
Yes. There is a tension between efficiency and portability. In Java they
resolved it by compromising efficiency, in C we have to be careful to make
our portable code genuinely portable, which is why the topic is so often
discussed.
There is also the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.
I rarely find it useful to assume ASCII.
Who cares what *YOU* find useful or not.
Gosh, I don't know. Do you care? Because, as you know, your opinion
matters a great deal to me. It's probably because of your charming
manner.

Its just so typical of you to answer generic questions with what
happens to suit you. As if you represent the only kind of C programmer
that there is, or should be.

Unless someone knows *every* domain, which no one does, all they
can do is talk about the areas they do know. Therefore *any* response to a
generic question will be based on what the person answering it has come
across.

Keith rarely finds it useful to assume ASCII, it appears you regularly
find it useful to assume ASCII. Neither shows what the situation is
across all domains.
The subject for this thread is "code portability". So of course, I
assume you have a way of doing this portably.

I'm sure Keith can.
[...] Or you can explicitly invoke an initialization routine
exactly once as your program is starting up and save the expense of
checking on each use.

Ok, read carefully, I just told you I can't do that. If I am willing
to sacrifice performance (remember we're setting up a constant
addressed look-up table so we're expecting a throughput of 1/3 of a
single clock (or even 1/4 of a clock on these new Intel Core CPUs) for
this operation) why would I bother doing this through a look up table
in the first place?

int main(void)
{
    do_init();
    /* Throw off as many off topic threads as you want */
    /* rest of program */
}

Calling do_init has a major impact on the performance of the program?
[...] Or (and this may or may not be available to
you), you can use C99,

Again, the subject of this thread is "code portability". Using C99 is
diametrically opposite to this goal.

Keith noted that C99 might not be available to you. However, if it is
available on all platforms of interest then it might be portable enough.
[...] This works with gcc 3.4.5 and 4.1.1 with "-std=c99".

The irony of this statement is just unbelievable ... . Two versions of
gcc counts as portability?

If it is valid C99, and I have no reason to believe it isn't, there are
other compilers it will work on.
That's nice, but you've removed the context. This is not a response to
the generic question posed. This is just a statement about *your*
predilections. The fact is that *I* rarely find it useful as well,
because I don't write a lot of code that does parsing. But that is
completely irrelevant, which is why, of course, I refrained from making
such ridiculous non sequitur statements. *rarely* is not the only word
you wrote there, you also wrote the word *I*.

Which means that what Keith wrote is perfectly clear. You (and probably
Keith) do not know whether for the majority of programs it is useful to
assume ASCII or not; all you know is the domains you know about.
[...] If assuming ASCII, and
therefore making your code non-portable to non-ASCII platforms, makes
it significantly faster, that's great. I might consider adding a
check at program startup, something like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}
or I might not bother; I'd at least document the assumption somewhere
in the code. (No, you can't reliably test this in the preprocessor;
see C99 6.10.1p3.)

The fact that you've managed to cite a single application where
assuming ASCII happens to be useful does not refute anything I've
said.

This is a *single* application? I am talking about a technique, not an
application. The fact is, this comes up for a wide variety of string
parsing scenarios, where speed (or in fact *simplicity*) might be a
concern. We're talking about ASCII here -- where else would such a
concern apply?

So if I come up with a technique for two things covering two wide varieties
of scenarios where assuming ASCII provides no benefit, will that prove
that generally you don't need to assume ASCII?

Ok, well then maybe you are just bad at your job, or maybe you have
long term memory problems like the guy from the movie Memento.

Or maybe Keith is good at his job and does things where it is rarely
useful to assume ASCII?
No, that would be a list or a set.

Your bizarre relationship with the definition of technical words is a
real curiosity. How can you pretend to be a computer programmer, and
be so far removed from standard nomenclature? It would be ok if you
just mixed up a few words or something I wouldn't make a big deal about
it. But you appear to not know the concepts on the other side of these
words.

There are large fields of computing where algebra is not required.
Certainly large fields where rings are not required.
[...] And isn't that precisely what C's *unsigned* integer types are?

First of all no, and second of all if it was, then it wouldn't be a
ring.

A Ring is a set with a 0, a + operator and a * operator. And the point
is that its completely *closed* under these operations.

Which unsigned integer types are.
In typical 2s
complement implementations, I know that integers (signed or not) are
rings.

You obviously know very little about how unsigned integers are defined in
C. They are the same *whatever* representation is used for signed integers.
In 1s complement machines -- I have no idea;

Had you bothered to look you would know that the signed integer
representation does not affect the unsigned integer representation.
Keith *explicitly* stated *unsigned*.
I don't have
access to such a machine (I never have in the past, and I almost
certainly never will in the future), and just don't have familiarity
with 1s complement. It doesn't have the natural wrapping properties
that 2s complement has, so my intuition is that its *not* a ring, but I
just don't know.

Signed integers are not defined as being a ring, *whatever*
representation is used. I've used processors that use 2s complement
where they saturate (limit) on overflow of addition/subtraction instead
of wrapping. There are times in signal processing where this is a *very*
useful property.
The reason why this is important is for verification purposes. Suppose
I write the following:

x = (y << 7) - (y << 2);

Well, that should be the same as x = y * 124. How do I know this?
Because I know that y << 7 is the same as y * 128, and y << 2 is the
same as y * 4. After that, there is a concern that one of the operands of

If you understood unsigned integers in C you would understand that it
applies whatever the signed representation is. I would still use
multiplication rather than a shift/subtract when I want multiplication
and let the compiler sort out the optimisation. After all, that is what
the optimisation phase is for. In any case, on some processors it would
be *faster* to multiply because they have single cycle hardware multipliers.
the subtract might wrap around, while the other one doesn't. Or both
might. Because of that, direct verification of this fact might lead
you to believe that you need to look at these as separate cases and
very carefully examine the bits to make sure that the results are still
correct. But we don't have to. If we *know* that the expression is
equivalent to y*128 - y*4, then because 2s complement integers form an
actual ring, then we are allowed to rely on ordinary algebra without
concern. Wrap around doesn't matter -- its always correct.
Verification of just straight *algebra* is unnecessary, we can just
rely on mathematics.

As Keith said, you get these guarantees on unsigned integers. So if you
need a ring, use unsigned integers, since they are guaranteed to behave
as one by the C standard.
 

websnarf

Chris said:
This is ... hardly a thorough definition.

I didn't claim it was. This isn't a classroom; thoroughness is not the
same as correctness.
[...] You need to add commutativity (for +) and distribution (of * over +), in
particular.
In typical 2s complement implementations, I know that integers
(signed or not) are rings. In 1s complement machines -- I have
no idea ...

And that is where you have missed Keith Thompson's point -- because
even on ones' complement machines, *unsigned* integers (in C) are
still rings. So use "unsigned"; they give you the very property
you want. They *guarantee* it.

And now you are starting to make Keith-style mistakes. What if I need
to do algebra on signed integers? I need the "ring properties" for
proofs of correctness -- this is not a useful end in and of itself. If I
cannot apply these properties to signed integers, then I cannot do
algebra on signed integers without great difficulty.

Compare this to the situation in 2s complement. Suppose its
*difficult* to prove something on signed integers, but easy to prove it
for unsigned. But if it turns out you can "lift" from signed to
unsigned through casting and your theorem still makes sense, then you
likely can just apply the proof through this mechanism.

What Keith said is tantamount to saying "don't use negative numbers, if
you plan on doing sound arithmetic". This is kind of useless.
 

Keith Thompson

Keith said:
Keith Thompson wrote:
Eigenvector wrote:
[...] I fully realize that independent developers may or may
not conform to standards, but again is it at least
encouraged?

Not really. By its very nature C encourages non-portable
programming. In general, I try to write code portably, but
the only thing keeping me honest is actually compiling my
stuff with multiple compilers to see what happens.

Yes. There is a tension between efficiency and portability. In
Java they resolved it by compromising efficiency, in C we have
to be careful to make our portable code genuinely portable,
which is why the topic is so often discussed. There is also
the problem of "good enough" portability, for instance
assuming ASCII and two's complement integers.

I rarely find it useful to assume ASCII.

Who cares what *YOU* find useful or not.

Gosh, I don't know. Do you care? Because, as you know, your opinion
matters a great deal to me. It's probably because of your charming
manner.

Its just so typical of you to answer generic questions with what
happens to suit you. As if you represent the only kind of C programmer
that there is, or should be.

I never said or implied that.
The subject for this thread is "code portability". So of course, I
assume you have a way of doing this portably.

No, I don't.
[...] Or you can explicitly invoke an initialization routine
exactly once as your program is starting up and save the expense of
checking on each use.

Ok, read carefully, I just told you I can't do that. If I am willing
to sacrifice performance (remember we're setting up a constant
addressed look-up table so we're expecting a throughput of 1/3 of a
single clock (or even 1/4 of a clock on these new Intel Core CPUs) for
this operation) why would I bother doing this through a look up table
in the first place?

You said you wanted "an array that maps from unsigned char ->
parse-state, which makes, say, letters to one value, numbers to
another, etc.". I took that to be a description of a lookup table.
If you were referring to something else, I suggest you write more
clearly.
[...] Or (and this may or may not be available to
you), you can use C99,

Again, the subject of this thread is "code portability". Using C99 is
diametrically opposite to this goal.
[...] This works with gcc 3.4.5 and 4.1.1 with "-std=c99".

The irony of this statement is just unbelievable ... . Two versions of
gcc counts as portability?

No, of course not. You described a problem; I suggested some
solutions, and I clearly stated that some of them are not portable.
The fact that the subject of this thread happens to be "code
portability" does not mean that I am obligated to discuss only
portable solutions. In fact, I am discussing portable
vs. non-portable code.
That's nice, but you've removed the context. This is not a response to
the generic question posed. This is just a statement about *your*
predilections. The fact is that *I* rarely find it useful as well,
because I don't write a lot of code that does parsing. But that is
completely irrelevant, which is why, of course, I refrained from making
such ridiculous non sequitur statements. *rarely* is not the only word
you wrote there, you also wrote the word *I*.

Yes, I wrote the word "I" because "I" was talking about my own
experience. If it's not useful to you, that's too bad.
[...] If assuming ASCII, and
therefore making your code non-portable to non-ASCII platforms, makes
it significantly faster, that's great. I might consider adding a
check at program startup, something like
if ('A' != 65) {
/* yes, it's an incomplete check */
fprintf(stderr, "This program won't work on a non-ASCII system\n");
exit(EXIT_FAILURE);
}
or I might not bother; I'd at least document the assumption somewhere
in the code. (No, you can't reliably test this in the preprocessor;
see C99 6.10.1p3.)

The fact that you've managed to cite a single application where
assuming ASCII happens to be useful does not refute anything I've
said.

This is a *single* application? I am talking about a technique, not an
application. The fact is, this comes up for a wide variety of string
parsing scenarios, where speed (or in fact *simplicity*) might be a
concern. We're talking about ASCII here -- where else would such a
concern apply?
Write portable code if you can. If you need to write non-portable
code, keep it as isolated as you can (but you may *sometimes* find
that a portable implementation would have worked just as well in the
first place).

Now why couldn't you have posted this more reasoned position instead of
the drivel that you did in the first place?

It's what I've been saying all along. Pay attention.
Ok, well then maybe you are just bad at your job, or maybe you have
long term memory problems like the guy from the movie Memento.

The subject of this thread is "code portability", not "gratuitous
insults".
It's been a while since my last abstract algebra class, but isn't a
"ring module 2**n" simply the set of integers from 0 to 2**n-1?

No, that would be a list or a set.

Your bizarre relationship with the definition of technical words is a
real curiosity. How can you pretend to be a computer programmer, and
be so far removed from standard nomenclature? It would be ok if you
just mixed up a few words or something I wouldn't make a big deal about
it. But you appear to not know the concepts on the other side of these
words.
[...] And isn't that precisely what C's *unsigned* integer types are?

First of all no, and second of all if it was, then it wouldn't be a
ring.

A Ring is a set with a 0, a + operator and a * operator. And the point
is that its completely *closed* under these operations. In typical 2s
complement implementations, I know that integers (signed or not) are
rings. In 1s complement machines -- I have no idea; I don't have
access to such a machine (I never have in the past, and I almost
certainly never will in the future), and just don't have familiarity
with 1s complement. It doesn't have the natural wrapping properties
that 2s complement has, so my intuition is that its *not* a ring, but I
just don't know.

A ring is not just a set, nor is it just a set with a 0, a + operator,
and a * operator. There are several other properties it has to have.
You flame me for an incomplete definition, then offer another
incomplete definition yourself.

I believe that unsigned int satisfies those properties, but signed int
may or may not; for example, the standard makes no guarantee that any
signed type is closed under addition. It's probably true that signed
integers on most 2's-complement systems (which are almost all existing
systems) also happen to satisfy those properties.
The reason why this is important is for verification purposes. Suppose
I write the following:

x = (y << 7) - (y << 2);

Well, that should be the same as x = y * 124. How do I know this?
Because I know that y << 7 is the same as y * 128, and y << 2 is the
same as y * 4. After that, there is a concern that one of the operands
of the subtract might wrap around, while the other one doesn't. Or
both might. Because of that, direct verification of this fact might
lead you to believe that you need to look at these as separate cases
and very carefully examine the bits to make sure that the results
are still correct. But we don't have to. If we *know* that the
expression is equivalent to y*128 - y*4, then because 2s complement
integers form an actual ring, then we are allowed to rely on ordinary
algebra without concern. Wrap around doesn't matter -- its always
correct. Verification of just straight *algebra* is unnecessary, we
can just rely on mathematics.

If you *know* that 2's-complement integers form a ring, then you are
depending on properties not guaranteed by the C standard. (You are,
of course, free to do so.)

Incidentally, you might find that it's possible to have a technical
discussion without being a hypocritical jerk. Try it.
 

Walter Roberson

websnarf said:
I didn't claim it was. This isn't a classroom; thoroughness is not the
same as correctness.

You gave a definition for ring, but there are sets that match your
definition that are NOT rings, because your definition was incomplete
even for common types of rings.
http://mathworld.wolfram.com/Ring.html

It is not clear to me how someone can complain about someone
else's "bizarre relationship to technical terms" and then themselves
misuse a technical term that they themselves have indicated is important
to part of their discussion.
 

Walter Roberson

websnarf said:
And if you need a correctly functioning ring modulo 2**n? If you can
assume 2s complement then you've *got one*. Otherwise, you get to
construct one somehow (not sure how hard this is, I have never ever
been exposed to a system that didn't *ONLY* support 2s complement).

Caution: on most 2s complement machines, the *signed* integers do
not form a ring. In cases where INT_MIN is (-INT_MAX - 1)
(e.g., INT_MIN is -32768 for an INT_MAX of 32767) then there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.

This is not an issue for *unsigned* integers: operations on the
unsigned integers are defined such that the additive inverse of
the maximum unsigned integer is always 1 [if I recall correctly.]
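A minimal illustration of that guarantee, assuming nothing beyond a hosted
implementation:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned x = UINT_MAX;

    /* 0u - x is reduced modulo UINT_MAX + 1, so every unsigned value
       has an additive inverse; for UINT_MAX that inverse is 1. */
    printf("%u\n", 0u - x);         /* prints 1 */
    printf("%u\n", x + (0u - x));   /* prints 0 */
    return 0;
}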
 

websnarf

Walter said:
Caution: on most 2s complement machines, the *signed* integers do
not form a ring. In cases where INT_MIN is (-INT_MAX - 1)
(e.g., INT_MIN is -32768 for an INT_MAX of 32767) then there
is no "additive inverse" for INT_MIN -- no element in the set
such that INT_MIN plus the element is 0.

What do you mean? The additive inverse of INT_MIN is INT_MIN.
 
