stack addressing & type nomenclature

bill

I recently worked with a piece of code where dereferencing the pointer
was too slow, and I was able to achieve a nearly 2x speed-up by
replacing a local array of size 8 with 8 local variables. (*x requires
2 fetches, x requires 1, so it's easy to explain it, I was just
surprised that I actually encountered a situation where it makes sense
to do this optimization.) Now, I want to test a similar situation, but
the array that will be replaced is substantially larger, and I'd like
to make the code "cleaner" by doing something like the following:

u_int32_t x0, x1, x2, ...., xN;
u_int32_t *p;
int i;
p = &xN;

for (i = 0; i < N+1; i++) {
    *p++ = initialize(i);
}

Really, the only advantage I get is that I don't have N+1 lines of
initialization in the source. This works in the one case I've
tested...is there any chance that I can rely on this? I'm not worried
about portability, but I am wondering if there's a chance that, for
instance, increasing N might cause the implementation to not put the
local variables in contiguous memory on the stack. Would it be safer
to declare a struct, or should I just avoid this 'trick' altogether?
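For comparison, the portable shape of this is just the array itself; a minimal sketch, in which `initialize()` and `COUNT` are hypothetical stand-ins for the poster's actual code:

```c
#include <stdint.h>

#define COUNT 32                 /* hypothetical; "N+1" in the post */

/* Hypothetical stand-in for the poster's initialize() */
static uint32_t initialize(int i)
{
    return (uint32_t)i * 2u + 1u;
}

/* The loop stays one line per element, and the compiler is free to
 * unroll it or keep elements in registers -- no layout assumptions. */
static void init_all(uint32_t x[COUNT])
{
    int i;
    for (i = 0; i < COUNT; i++)
        x[i] = initialize(i);
}
```

Unlike the scalar-variables trick, this makes no assumption about where anything lives on the stack.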

Also, on a slightly related note, what's the deal with linux defining
"u_int32_t" instead of "uint32_t"? There's a comment before the
declarations of u_int{8,16,32}_t in /usr/include/sys/types.h (on Fedora
Core 3) that reads "But these were defined by ISO C without the first
`_'." Does this mean that ISO C wants the "_" to not be there, but the
linux implementation decided to add it? Which is the proper
nomenclature? My syntax highlighting (vim) recognizes uint32_t and not
u_int32_t, but gcc doesn't like uint32_t. Does it matter? And if so,
which is correct? I stated above that I don't care about portability,
but I want to do things properly at least!
 
Eric Sosman

bill said:
I recently worked with a piece of code where dereferencing the pointer
was too slow, and I was able to achieve a nearly 2x speed-up by
replacing a local array of size 8 with 8 local variables. (*x requires
2 fetches, x requires 1, so it's easy to explain it, I was just
surprised that I actually encountered a situation where it makes sense
to do this optimization.)

It's certainly surprising. "Startling" might be a better
term ... Was the code, by any chance, compiled without any
optimization at all?
Now, I want to test a similar situation, but
the array that will be replaced is substantially larger, and I'd like
to make the code "cleaner" by doing something like the following:

u_int32_t x0, x1, x2, ...., xN;
u_int32_t *p;
int i;
p = &xN;

for (i = 0; i < N+1; i++) {
    *p++ = initialize(i);
}

Really, the only advantage I get is that I don't have N+1 lines of
initialization in the source. This works in the one case I've
tested...is there any chance that I can rely on this?


No. There is no guarantee that the variables are
arranged in memory in the order you require. Some of them
might not even reside in memory at all, if the compiler
decides it can hold a few of them in registers instead.
(The fact that you take the address of xN doesn't mean
that all of x0,x1,... share anything with it; as far as
the compiler knows they are unrelated variables.)

As for "cleaner" -- well, this is obviously some strange
usage of the word 'clean' of which I wasn't previously aware.
I'm not worried
about portability,

... clearly ...
but I am wondering if there's a chance that, for
instance, increasing N might cause the implementation to not put the
local variables in contiguous memory on the stack. Would it be safer
to declare a struct, or should I just avoid this 'trick' altogether?

Do whatever you like. Play your electric guitar in the
shower with the water running, go skateboarding on the
Interstate highway, ask your wife if she's been putting on
weight lately. It's your choice -- personally, I'd lump your
trick with the rest of these hazardous activities and avoid
them all, but you might not be a "Safety First" sort of guy.

Sticking the variables in a struct will certainly keep them
together, and if the variables are all the same type it will
probably even work as expected (by coincidence: the C language
permits padding after any struct element, but compilers have
little incentive to bloat the data without a pressing need).
Also, sticking all the variables in a struct makes it less
likely that the compiler will be able to "promote" some of them
to registers; the optimization may not be particularly effective
in the presence of a large number of variables, but it's a shame
to discourage the optimizer.
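Eric's padding caveat can at least be made checkable; a sketch (the struct and its members are hypothetical) that refuses to rely on contiguity unless sizeof proves there is no padding:

```c
#include <stddef.h>
#include <stdint.h>

struct regs {
    uint32_t x0, x1, x2, x3;   /* ... through xN in the real code */
};

/* Returns nonzero only when the members are packed with no padding,
 * the precondition for walking them through a uint32_t pointer.
 * (Even then, C gives no blanket blessing for the walk; this merely
 * rules out the padding failure mode described above.) */
static int regs_are_contiguous(void)
{
    return sizeof(struct regs) == 4 * sizeof(uint32_t)
        && offsetof(struct regs, x1) == sizeof(uint32_t);
}
```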

If I were facing your situation, I think I'd go back and
re-examine the experiments that led you to abandon arrays in
the first place. Compilers have been optimizing array accesses
for lo! these many years, and it's hardly a black art any more.
Consider also that by abandoning arrays you're bloating the
executable code, which has its own performance penalties.
Go back and re-measure, and if the performance is really as
bad as you say, complain to your compiler vendor.
Also, on a slightly related note, what's the deal with linux defining
"u_int32_t" instead of "uint32_t"? There's a comment before the
declarations of u_int{8,16,32}_t in /usr/include/sys/type.h (on Fedora
Core 3) that reads "But these were defined by ISO C without the first
`_'." Does this mean that ISO C wants the "_" to not be there, but the
linux implementation decided to add it? Which is the proper
nomenclature? My syntax highlighting (vim) recognizes uint32_t and not
u_int32_t, but gcc doesn't like uint32_t. Does it matter? And if so,
which is correct? I stated above that I don't care about portability,
but I want to do things properly at least!

The question of why Linux -- or Solaris, or AIX, or VMS --
does something This Way rather than That Way is better asked
on a newsgroup devoted to the O/S in question.

The "C99" Standard provides the <stdint.h> header that
defines various kinds of width-dependent integers. If your
system has a 32-bit unsigned integer type, <stdint.h> will declare uint32_t.
This type is optional (a 36-bit system need not provide it,
for example), but <stdint.h> also defines some macros that go
along with each of its types, so you can use the preprocessor
to test for their existence. Also, uint_least32_t and
uint_fast32_t are guaranteed to be defined.
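The "macros that go along with each of its types" make that existence test a preprocessor one-liner; a small sketch:

```c
#include <stdint.h>

/* <stdint.h> defines UINT32_MAX exactly when it defines uint32_t,
 * so the optional type's presence can be tested at compile time. */
static int have_exact_uint32(void)
{
#ifdef UINT32_MAX
    return 1;
#else
    return 0;
#endif
}
```

The least-width and fast-width types need no such guard; uint_least32_t and uint_fast32_t are always present in C99.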

... but all this is required by the "C99" version of the
Standard, and was not present in the older "C90" version that
many compilers still follow. You'll need to check whether the
<stdint.h> header exists on your system -- if it doesn't and
you've got a C90 compiler, a reasonable work-around is to whip
up your own "mystdint.h" header that looks something like

#if __STDC_VERSION__ >= 199901L
#include <stdint.h>
#else
/* System-dependent definitions; need to be
* re-examined when porting.
*/
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
...
#endif

A little work with the macros in <limits.h> can provide some
sanity-checking in the "system dependent" part.
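One way to do that sanity-checking, sketched with my_-prefixed names so it cannot collide with a real <stdint.h> (the particular typedef choices are the system-dependent assumptions being checked):

```c
#include <limits.h>

/* Verify each guess against <limits.h> so a wrong assumption fails
 * at compile time rather than at run time. */
#if UCHAR_MAX != 0xFF
#error "unsigned char is not 8 bits; choose another uint8_t"
#endif
typedef unsigned char my_uint8_t;

#if USHRT_MAX != 0xFFFF
#error "unsigned short is not 16 bits; choose another uint16_t"
#endif
typedef unsigned short my_uint16_t;

#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int my_uint32_t;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long my_uint32_t;
#else
#error "no exactly-32-bit unsigned type found"
#endif
```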
 
Tim Prince

bill said:
u_int32_t x0, x1, x2, ...., xN;
u_int32_t *p;
Also, on a slightly related note, what's the deal with linux defining
"u_int32_t" instead of "uint32_t"? There's a comment before the
declarations of u_int{8,16,32}_t in /usr/include/sys/type.h (on Fedora
Core 3) that reads "But these were defined by ISO C without the first
`_'." Does this mean that ISO C wants the "_" to not be there, but the
linux implementation decided to add it? Which is the proper
nomenclature? My syntax highlighting (vim) recognizes uint32_t and not
u_int32_t, but gcc doesn't like uint32_t. Does it matter? And if so,
which is correct? I stated above that I don't care about portability,
but I want to do things properly at least
Is the following correct in implying that u_int32_t is an earlier usage,
ratified by some standards bodies, before the C standard addressed the issue?
http://lists.freedesktop.org/pipermail/release-wranglers/2004-August/000926.html

When you say "gcc doesn't like uint32_t" I assume you refer to some
particular implementation of #include files. Does it make a difference
whether you set -std=c99 ?
 
Michael Mair

bill said:
I recently worked with a piece of code where dereferencing the pointer
was too slow, and I was able to achieve a nearly 2x speed-up by
replacing a local array of size 8 with 8 local variables. (*x requires
2 fetches, x requires 1, so it's easy to explain it, I was just
surprised that I actually encountered a situation where it makes sense
to do this optimization.)

<OT>
Strange at best; I can imagine this working on embedded systems but not
on your average PC with a decent implementation.
Did you give the compiler a fighting chance to optimise?
Now, I want to test a similar situation, but
the array that will be replaced is substantially larger, and I'd like
to make the code "cleaner" by doing something like the following:

u_int32_t x0, x1, x2, ...., xN;
u_int32_t *p;
int i;
p = &xN;

for (i = 0; i < N+1; i++) {
    *p++ = initialize(i);
}

Really, the only advantage I get is that I don't have N+1 lines of
initialization in the source. This works in the one case I've
tested...is there any chance that I can rely on this?


No. The next version of the compiler may order the variables
differently in memory, which means that you would walk right into
memory you do not own - or do not want overwritten by this little
trick.
If you can prove that using *p++ or p[i] is not faster than the
above, and if you can prove that code size does not matter, then
use some advanced preprocessor to generate single assignments.
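That preprocessor approach can be sketched with an X-macro; the variable list and initialize() here are hypothetical:

```c
#include <stdint.h>

/* The list of variables lives in one place; each X(name, index)
 * entry expands differently depending on what X is defined as. */
#define VAR_LIST(X) X(x0, 0) X(x1, 1) X(x2, 2) X(x3, 3)

static uint32_t initialize(int i)     /* hypothetical stand-in */
{
    return (uint32_t)(100 + i);
}

static uint32_t demo(void)
{
#define DECLARE(name, idx) uint32_t name;
    VAR_LIST(DECLARE)                 /* uint32_t x0; uint32_t x1; ... */
#undef DECLARE

#define ASSIGN(name, idx) name = initialize(idx);
    VAR_LIST(ASSIGN)                  /* the N+1 single assignments */
#undef ASSIGN

    return x0 + x1 + x2 + x3;
}
```

Adding a variable means adding one entry to VAR_LIST; the declarations and assignments are generated, and no pointer walk over the stack is needed.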
I'm not worried
about portability, but I am wondering if there's a chance that, for
instance, increasing N might cause the implementation to not put the
local variables in contiguous memory on the stack. Would it be safer
to declare a struct, or should I just avoid this 'trick' altogether?

You should avoid it altogether, portability or not.

However, asking this question in a newsgroup for your compiler and
platform may yield different results -- and make more sense as in
this newsgroup standard C (C89, C99, and sometimes K&R C) is discussed.

Also, on a slightly related note, what's the deal with linux defining
"u_int32_t" instead of "uint32_t"? There's a comment before the
declarations of u_int{8,16,32}_t in /usr/include/sys/type.h (on Fedora
Core 3) that reads "But these were defined by ISO C without the first
`_'." Does this mean that ISO C wants the "_" to not be there, but the
linux implementation decided to add it? Which is the proper
nomenclature? My syntax highlighting (vim) recognizes uint32_t and not
u_int32_t, but gcc doesn't like uint32_t. Does it matter? And if so,
which is correct? I stated above that I don't care about portability,
but I want to do things properly at least!

With C99, we have the header <stdint.h>. If you include it, you
have access to int_leastN_t, int_fastN_t for N=8,16,32,64 at
least, and intmax_t. If your implementation provides the according
types, you also have intN_t.

BTW: "Portability be damned" and "doing things properly" do not go
together very well.
What does work out is keeping the lid on non-portable assumptions
by having them hidden away in a couple of interface/low level modules.
Then the change from 16-bit to 32-bit to 64-bit systems, or the porting
to another unixoid, does not bring so many nasty surprises.
Sprinkling your code liberally with whatever fancy struck you at the
moment for no particular reason leads to having to write all the stuff
anew (which may not be detrimental in this case).


Cheers
Michael
 
bill

I whole-heartedly agree that my initial thought was a Bad Idea (TM).
Please note that I did include double-quotes around the word "cleaner".
I suppose I meant "requires less typing on my part". The u_int32_t vs
uint32_t has brought up a concern, though. Apparently, my problem was
directly #include-ing <sys/types.h> rather than <inttypes.h> or
<stdint.h>. This implies to me that it is generally a bad idea to
directly include anything in <sys/...>. Is that correct? Does this
mean that any code which includes a file from sys is inherently
non-portable?

To clarify my position on portability: I do care about portability in
the sense that I believe in doing things correctly, but I'm usually
under a lot of pressure to "make it work, now, on that box". It's
very frustrating; I want to do things correctly, but I usually don't
know how. I know just enough to realize that most of the code in my
organization is horrible, but I certainly don't have the time to fix
anything. I'm smiling at the thought that I even suggested the code
above. The funny/sad part is, I can see someone coming across it in a
few years and, rather than cursing my name for eternity for writing it,
actually thinking it's cute and using the technique. Bad code seems to
proliferate more quickly in some environments; it's very confusing why
that is.

In any case, I will re-do the profiling with more aggressive
optimizations on the compiler and see what happens. I'm hoping you're
right and that it will be unnecessary for me to get away from the array.
 
Flash Gordon

bill said:
I recently worked with a piece of code where dereferencing the pointer
was too slow, and I was able to achieve a nearly 2x speed-up by
replacing a local array of size 8 with 8 local variables. (*x requires
2 fetches, x requires 1, so it's easy to explain it, I was just
surprised that I actually encountered a situation where it makes sense
to do this optimization.)

That is highly implementation specific.
Now, I want to test a similar situation, but
the array that will be replaced is substantially larger, and I'd like
to make the code "cleaner" by doing something like the following:

u_int32_t x0, x1, x2, ...., xN;
u_int32_t *p;
int i;
p = &xN;

for (i = 0; i < N+1; i++) {
    *p++ = initialize(i);
}

Really, the only advantage I get is that I don't have N+1 lines of
initialization in the source. This works in the one case I've
tested...is there any chance that I can rely on this? I'm not worried
about portability, but I am wondering if there's a chance that, for
instance, increasing N might cause the implementation to not put the
local variables in contiguous memory on the stack. Would it be safer
to declare a struct, or should I just avoid this 'trick' altogether?


Avoid it completely. The compiler might decide to reorder the
variables if you change the switches or upgrade the compiler.
Also, on a slightly related note, what's the deal with linux defining
"u_int32_t" instead of "uint32_t"? There's a comment before the
declarations of u_int{8,16,32}_t in /usr/include/sys/type.h (on Fedora
Core 3) that reads "But these were defined by ISO C without the first
`_'."

<snip>

<sys/types.h> is not part of standard C. Asking about it on a Linux or
POSIX group would be better. The C99 header you are probably thinking of
is <stdint.h>.
 
Giannis Papadopoulos

bill said:
I whole-heartedly agree that my initial thought was a Bad Idea (TM).
Please note that I did include double-quotes around the word "cleaner".
I suppose I meant "requires less typing on my part". The u_int32_t vs
uint32_t has brought up a concern, though. Apparently, my problem was
directly #include-ing <sys/types.h> rather than <inttypes.h> or
<stdint.h>. This implies to me that it is generally a bad idea to
directly include anything in <sys/...>. Is that correct? Does this
mean that any code which includes a file from sys is inherently
non-portable?

The only truly portable programs are the ones that
1) do not use compiler specific tricks and declarations
2) do not rely upon tricks that work on specific hardware
3) use only C's standard libraries
4) do not invoke undefined behavior (have I missed anything?)

By including any <sys/...> header (which is not part of the ISO C
(C89, C99, whatever) library), you make your program able to compile
only on some un*x systems and Cygwin.
To clarify my position on portability: I do care about portability in
the sense that I believe in doing things correctly, but I'm usually
under a lot of pressure to "make it work, now, on that box". It's
very frustrating; I want to do things correctly, but I usually don't
know how. I know just enough to realize that most of the code in my
organization is horrible, but I certainly don't have the time to fix
anything. I'm smiling at the thought that I even suggested the code
above. The funny/sad part is, I can see someone coming across it in a
few years and, rather than cursing my name for eternity for writing it,
actually thinking it's cute and using the technique. Bad code seems to
proliferate more quickly in some environments; it's very confusing why
that is.

You are half-way there... You care about portability and you know that
you don't know much about it. All you need is to read-read-read. And ask
when you face a dead-end. And if you ask something that is not so right,
nobody will ever accuse you (of course, making the same wrong question
over and over without trying to understand the answers, may lead to your
trollification)...

In a strange way, bad code propagates more easily than good code... I
do not know why this happens, though...
In any case, I will re-do the profiling with more aggressive
optimizations on the compiler and see what happens. I'm hoping you're
right and that it will be unnecessary for me to get away from the array.

Looking at your problem, the array is the only way...

The program below compiles fine. It only works on gcc when no
optimizations are enabled. However, even with -Os or -O1 in gcc 3.3.5, I
get a very nice and informative "Segmentation Fault".

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    uint32_t x0, x1, x2, x3;
    uint32_t *p;
    int i;
    p = &x3;

    fprintf(stderr, "%p %p %p %p\n", (void*)&x0, (void*)&x1, (void*)&x2,
            (void*)&x3);

    for (i = 0; i < 4; i++) {
        *p++ = 0;
    }

    return EXIT_SUCCESS;
}


--
one's freedom stops where others' begin

Giannis Papadopoulos
http://dop.users.uth.gr/
University of Thessaly
Computer & Communications Engineering dept.
 
Keith Thompson

bill said:
I whole-heartedly agree that my initial thought was a Bad Idea (TM).
Please note that I did include double-quotes around the word "cleaner".
I suppose I meant "requires less typing on my part". The u_int32_t vs
uint32_t has brought up a concern, though. Apparently, my problem was
directly #include-ing <sys/types.h> rather than <inttypes.h> or
<stdint.h>. This implies to me that it is generally a bad idea to
directly include anything in <sys/...>. Is that correct? Does this
mean that any code which includes a file from sys is inherently
non-portable?

As far as the C standard is concerned, #include'ing anything other
than one of the 24 standard headers (there are fewer in C90) is
non-portable. It doesn't matter whether its name starts with "sys/".

There may be guidelines for your particular system, and perhaps even a
secondary standard that might provide some additional guidance.

<OT>
If you're using a Unix-like system, "man getuid", for example,
probably advises you to include <unistd.h> and <sys/types.h>.
Neither of these is portable to non-Unix-like systems.
</OT>
 
Hans

bill said:
I recently worked with a piece of code where dereferencing the pointer
was too slow, and I was able to achieve a nearly 2x speed-up by
replacing a local array of size 8 with 8 local variables. (*x requires
2 fetches, x requires 1, so it's easy to explain it, I was just
surprised that I actually encountered a situation where it makes sense
to do this optimization.)
[snip]
I am a bit curious about how accessing through a pointer can be that
much slower. How have you accessed the data? With most CPU
architectures I have seen, once you have the pointer in a CPU register,
it is about the fastest access method available.

Do you have some mixed C/assembly listings available? That might help
us pinpoint the reason for the performance hit.
 
Gordon Burditt

I whole-heartedly agree that my initial thought was a Bad Idea (TM).
Please note that I did include double-quotes around the word "cleaner".
I suppose I meant "requires less typing on my part". The u_int32_t vs
uint32_t has brought up a concern, though. Apparently, my problem was
directly #include-ing <sys/types.h> rather than <inttypes.h> or
<stdint.h>. This implies to me that it is generally a bad idea to
directly include anything in <sys/...>. Is that correct? Does this

You should only include nonstandard headers (which includes anything
in <sys/...>) if you need to do something unportable anyway. That's
not a sin, but you should realize what you're doing. If the whole
purpose of the program is to manipulate password files, it's not
too surprising if you need to use OS-specific routines for accessing
password files and include the header files for them. Trying to
write your OWN password file manipulation routines may be less
portable than using the ones supplied by the system (the implementation
may vary a lot but the OS-supplied interface is consistent).
Including <sys/types.h> rather than <inttypes.h> because you like
the spelling of the typedef better is probably a gratuitous
unportability you should avoid.

Realize that including those headers may generate conflicts with
your own code that won't be a problem in purely portable code: for
instance, although i and p are commonly used variable names
<sys/proc.h> might provide typedefs for them, which will break your
code if you use these as variables.

mean that any code which includes a file from sys is inherently
non-portable?

If you don't supply the included header as part of your program,
but expect it to exist, and standard C doesn't guarantee that it
exists, it's unportable. On some platforms it will fail to compile
on the basis of a nonexistent header. Sometimes you bite the bullet
and say: if you expect to build this program, you need to install
the PNG library, version X.Y or greater, and its associated headers
before compiling this program. Other times you just limit the
program to, say, POSIX systems and expect the OS to supply the
headers.
To clarify my position on portability: I do care about portability in
the sense that I believe in doing things correctly, but I'm usually
under a lot of pressure to "make it work, now, on that box". It's
very frustrating; I want to do things correctly, but I usually don't
know how. I know just enough to realize that most of the code in my
organization is horrible, but I certainly don't have the time to fix
anything. I'm smiling at the thought that I even suggested the code
above. The funny/sad part is, I can see someone coming across it in a
few years and, rather than cursing my name for eternity for writing it,
actually thinking it's cute and using the technique. Bad code seems to
proliferate more quickly in some environments; it's very confusing why
that is.

It takes time to learn how to do things portably. You should avoid
gratuitous unportability that's easily avoided. You shouldn't try
to re-write your entire environment (you're screwed next time there
is an OS or compiler upgrade that depends on changes in those
non-standard header files). Sometimes the whole objective (e.g.
"Play a sound file") is inherently unportable (it can't be written
in portable C) and you need to use whatever unportable features are
needed to do that.
In any case, I will re-do the profiling with more aggressive
optimizations on the compiler and see what happens. I'm hoping you're
right and that it will be unnecessary for me to get away from the array.

Gordon L. Burditt
 
Gordon Burditt

The only truly portable programs are the ones that
1) do not use compiler specific tricks and declarations
2) do not rely upon tricks that work on specific hardware
3) use only C's standard libraries
4) do not invoke undefined behavior (have I missed anything?)

I would like to suggest that the following assumptions do not violate
any of the above but are still unportable (they generally come under
OS-specific, or individual-system-specific, assumptions that aren't
specific to hardware type):

1. The file /etc/passwd exists, can be read, and has a specific format.
2. Temporary files with the name /tmp/%s.tmp, where %s is an 8-character
alphanumeric string, can be created on this system.
3. The environment variable $HOME has a value which is related to
some kind of directory.
4. The file /usr/home/author/src/adventuregames/dungeonmaps/myhighschool.txt
exists and can be read.
5. The program "/usr/local/bin/md5sum" exists and can be invoked with
the system() function. Also, it takes a file name as an argument.

Gordon L. Burditt
 
Giannis Papadopoulos

Gordon said:
I would like to suggest that the following assumptions are unportable
but do not violate any of the above but are still unportable (generally
come under OS-specific, or individual system-specific assumptions that
aren't specific to hardware type):

1. The file /etc/passwd exists, can be read, and has a specific format.
2. Temporary files with the name /tmp/%s.tmp, where %s is an 8-character
alphanumeric string, can be created on this system.
3. The environment variable $HOME has a value which is related to
some kind of directory.
4. The file /usr/home/author/src/adventuregames/dungeonmaps/myhighschool.txt
exists and can be read.
5. The program "/usr/local/bin/md5sum" exists and can be invoked with
the system() function. Also, it takes a file name as an argument.

Gordon L. Burditt

So,

5) do not depend on the fact that certain files, external programs or
environment variables exist

However, if you try to load a file that you did not provide, I don't
think it is a portability issue.

Anything else?

--
one's freedom stops where others' begin

Giannis Papadopoulos
http://dop.users.uth.gr/
University of Thessaly
Computer & Communications Engineering dept.
 
