low-level pointer vs. array question

C

copx

Unforuntately, I know next to nothing about ASM and compiler construction,
and while I was aware of the syntactic differences between pointers and
arrays, I was not aware of this:

http://udrepper.livejournal.com/13851.html
(Yes, it is livejournal, but it is the journal of the glibc maintainer)

I am not sure I really understand this low-level difference between pointers
and arrays, and I do not have the time to learn ASM and compiler
construction (I am just a hobby coder). So I would like to ask you something
about this: one construct I use very often in my programs is an array of
constant strings, and after reading this I wonder whether:

const char foo_array[N_ENTRIES][ENTRY_SIZE] = {
"foo",
"bar"
};

translates to more efficient assembly than this:

const char * foo_array[N_ENTRIES] = {
"foo",
"bar"
};

One could argue that this is a platform specific question, but I would
really like to know the answer to this, and the people most likely to know
how to write efficient C code / how C compilers work are most likely to be
found here ;)

copx
 
R

Richard Heathfield

copx said:
Unforuntately, I know next to nothing about ASM and compiler
construction, and while I was aware of the syntactic differences
between pointers and arrays, I was not aware of this:

http://udrepper.livejournal.com/13851.html
(Yes, it is livejournal, but it is the journal of the glibc
maintainer)

I am not sure I really understand this low-level difference between
pointers and arrays,

Here's a pointer: ------>

Here's an array: +---+---+---+---+---+---+---+---+---+
| | | | | | | | | |
+---+---+---+---+---+---+---+---+---+

Do they not look completely different?
and I do not have the time to learn ASM and
compiler construction (I am just a hobby coder). So I would like to
ask you something about this: one construct I use very often in my
programs is an array of constant strings, and after reading this I
wonder whether:

const char foo_array[N_ENTRIES][ENTRY_SIZE] = {
"foo",
"bar"
};

translates to more efficient assembly than this:

const char * foo_array[N_ENTRIES] = {
"foo",
"bar"
};

They are different constructs with different meanings, so the question
of efficiency doesn't really arise (quick - which is more efficient, a
microwave oven or a telephone?).

The first gives you string storage the contents of which you can modify
at your leisure. The second merely gives you a handful of pointers,
which currently point at string literals that you must not change.
 
M

Mike Wahler

copx said:
Unforuntately, I know next to nothing about ASM and compiler construction,
and while I was aware of the syntactic differences between pointers and
arrays, I was not aware of this:

http://udrepper.livejournal.com/13851.html
(Yes, it is livejournal, but it is the journal of the glibc maintainer)

I don't have time to investigate that article right now,
so I can't comment on its validity. I can recommend this
one: http://pweb.netcom.com/~tjensen/ptr/pointers.htm
Also Google for Chris Torek's article "C for smarties".
IMO Mr. Torek is deservedly considered one of comp.lang.c's
'resident gurus'. I have learned a considerable amount about
C from reading his posts.

My remarks about your questions follow below.
I am not sure I really understand this low-level difference between
pointers and arrays,

At any level, 'high', 'low' or in between, arrays
and pointers are different entities. They differ
both syntactically and semantically (although in
some cases, they can behave the same -- e.g.
via the automatic conversion of an array's name
to a pointer when passed as an argument to a function).

and I do not have the time to learn ASM and compiler construction (I am
just a hobby coder).

There is no need to learn that in order to learn to use
C effectively.
So I would like to ask you something about this: one construct I use very
often in my programs is an array of constant strings, and after reading
this I wonder whether:

const char foo_array[N_ENTRIES][ENTRY_SIZE] = {
"foo",
"bar"
};

translates to more efficient assembly than this:

const char * foo_array[N_ENTRIES] = {
"foo",
"bar"
};

One could argue that this is a platform specific question,

There's no 'argument', it is indeed a platform (and compiler)
specific question. (Ideally) a compiler will be written to
exploit the strengths (and overcome the weaknesses) of a given
target platform.

but I would really like to know the answer to this, and the people most
likely to know how to write efficient C code / how C compilers work are
most likely to be found here ;)

This advice has been given here many times, but I'll say it again:
write your code to be as clear, readable, and maintainable as possible.
Let the compiler writers do what they're good at: optimizing code.
Become good at expressing solutions clearly in the source code.
Only consider optimization if and when profiling proves there's
a performance issue.

-Mike
 
C

copx

[snip]
Here's a pointer: ------>

Here's an array: +---+---+---+---+---+---+---+---+---+
| | | | | | | | | |
+---+---+---+---+---+---+---+---+---+

Do they not look completely different?

They do, but I already knew about this.
and I do not have the time to learn ASM and
compiler construction (I am just a hobby coder). So I would like to
ask you something about this: one construct I use very often in my
programs is an array of constant strings, and after reading this I
wonder whether:

const char foo_array[N_ENTRIES][ENTRY_SIZE] = {
"foo",
"bar"
};

translates to more efficient assembly than this:

const char * foo_array[N_ENTRIES] = {
"foo",
"bar"
};

They are different constructs with different meanings, so the question
of efficiency doesn't really arise (quick - which is more efficient, a
microwave oven or a telephone?).

Of course they are different constructs, but I can use them for the same
purposes. In my case index number -> string / string -> index number
conversions. The constructs do exactly the same thing at a higher level
"give me a bunch of strings which I can address using index numbers" (what I
actually want), now I want to know which construct is more efficient.
The first gives you string storage the contents of which you can modify
at your leisure. The second merely gives you a handful of pointers,
which currently point at string literals that you must not change.

Thanks, but I knew that, too. Well, except that you can modify a const char
array "at your leisure". My compiler would certainly complain about that. Of
course, you can cast the const modifier away, but that would be rather ugly
if you ask me.

copx
 
C

copx

Mike Wahler said:
I don't have time to investigate that article right now,
so I can't comment on its validity. I can recommend this
one: http://pweb.netcom.com/~tjensen/ptr/pointers.htm
Also Google for Chris Torek's article "C for smarties".
IMO Mr. Torek is deservedly considered one of comp.lang.c's
'resident gurus'. I have learned a considerable amount about
C from reading his posts.
[snip]

Thanks, Chapter 6 answered my question, because Mr. Torek did not shy away
from explaining what the statements actually translate to. So now I know
that the array version is indeed superiour for my purposes. It allocates
memory only for the strings while the pointer version additionally allocates
memory for pointer variables. Also accessing an individual string is faster,
because in the array version foo_array[SOME_INDEX] translates to a constant
memory address, while in the pointer version the program has to load a
pointer variable to get the address of the string.
 
E

Eric Sosman

copx wrote On 04/25/07 12:00,:
I don't have time to investigate that article right now,
so I can't comment on its validity. I can recommend this
one: http://pweb.netcom.com/~tjensen/ptr/pointers.htm
Also Google for Chris Torek's article "C for smarties".
IMO Mr. Torek is deservedly considered one of comp.lang.c's
'resident gurus'. I have learned a considerable amount about
C from reading his posts.

[snip]

Thanks, Chapter 6 answered my question, because Mr. Torek did not shy away
from explaining what the statements actually translate to. So now I know
that the array version is indeed superiour for my purposes. It allocates
memory only for the strings while the pointer version additionally allocates
memory for pointer variables. Also accessing an individual string is faster,
because in the array version foo_array[SOME_INDEX] translates to a constant
memory address, while in the pointer version the program has to load a
pointer variable to get the address of the string.

Note that "it allocates memory only for the strings"
is not necessarily the case. The context, you recall, is
const char foo_array[N_ENTRIES][ENTRY_SIZE] = {
"foo",
"bar"
};
versus

const char * foo_array[N_ENTRIES] = {
"foo",
"bar"
};

.... so "only for the strings" holds only if ENTRY_SIZE==4.
If your strings are of various lengths:

const char states[50][ENTRY_SIZE] = {
"Alabama",
"Alaska",
"Arizona",
...
"West Virginia",
"Wisconsin",
"Wyoming",
};

.... you will need to make ENTRY_SIZE at least one greater
than the length of the longest string, hence at least 14
(three U.S. states have thirteen-letter names). The other
forty-seven array elements will hold strings plus extra
zero bytes: "Iowa" and "Ohio" are five-byte strings, but
each will be accompanied by nine extra zeroes. All told,
you will use 50 * ENTRY_SIZE >= 700 bytes on the two-
dimensional array.

How about the array of pointers? The fifty state names
and their '\0' terminators amount to 467 bytes, and the
pointers will take another 50 * sizeof(char*) bytes. If
sizeof(char*) == 4 (as on most 32-bit machines), the total
will be 667 bytes. That's at least 33 bytes *less* than
the "superiour" two-dimensional array! The "only for the
strings" solution actually uses five percent *more* memory
than strings-and-pointers!

Finally, the remarks about access speed must be taken
with a grain of salt, or perhaps an entire heap of salt.
For one thing, conclusions of this sort are highly context-
dependent: optimizing compilers will play all manner of
strange games to exploit patterns in the code, especially
in and around loops. That pointer load you're so worried
about may be completely free, thanks to a prefetch issued
on the preceding loop iteration. Another point is that
you should estimate the potential savings (even if you
can actually achieve them, which isn't certain) in light
of whatever you're going to do with the string once you
know where it is. In a context like

printf ("The states of the U.S.A. are:\n");
for (i = 0; i < 50; ++i)
printf ("\t%s\n", states);

.... saving or wasting one memory reference per array access
will make no difference large enough to measure, even with
very sensitive and expensive test equipment. You might
improve your car's fuel economy by driving naked so the
vehicle needn't carry the weight of your clothing, but the
improvement seems hardly worth while.
 
K

Keith Thompson

copx said:
Unforuntately, I know next to nothing about ASM and compiler construction,
and while I was aware of the syntactic differences between pointers and
arrays, I was not aware of this:

http://udrepper.livejournal.com/13851.html
(Yes, it is livejournal, but it is the journal of the glibc maintainer)
[snip]

The code in the livejournal entry was:

const char *_pcre_ucp_names =
"Any\0"
"Arabic\0"
"Armenian\0"
...
"Zs";

vs.

const char _pcre_ucp_names[] =
"Any\0"
"Arabic\0"
"Armenian\0"
...
"Zs";

It doesn't require any knowledge of assembly language or compiler
construction to realize that the both declarations result in the
creation of a character array, but the first additionally creates a
pointer object. You'll get an implicit pointer *value* when you refer
to the array name (because an array expression is implicitly converted
to a pointer to its first element in most, but not all, contexts), but
there is no pointer object. All this follows directly from the
semantics of standard C.

As for speed, the C standard says nothing about that. It's likely
that access is going to be faster for the second one (since the array
has a constant address in the second case, but an address stored in an
object in the first case). But a clever optimizing compiler *could*
generate exactly the same code for both. And even without clever
optimization, it's not inconceivable that using an address stored in a
(const) object could be faster than using a constant address.

If you want to know which is faster, or otherwise better, *on your
system*, measure it and/or examine the generated code -- but remember
that there's no guarantee that your results will apply to any other
platform.

Section 6 of the comp.lang.c FAQ, <http://www.c-faq.com>, has a lot of
good information about pointers and arrays.
 
K

Keith Thompson

Mike Wahler said:
At any level, 'high', 'low' or in between, arrays
and pointers are different entities. They differ
both syntactically and semantically (although in
some cases, they can behave the same -- e.g.
via the automatic conversion of an array's name
to a pointer when passed as an argument to a function).
[...]

To be clear, an array name is implicitly converted to a pointer to the
array's first element in *most* contexts; there's nothing special
about function arguments. For example:

char arr[10];
char *ptr;
ptr = arr; /* arr is implicitly converted to char* */

The exceptions are when the array expression is the operand of a unary
"&" or "sizeof" operator, or when it's a string literal in an
initializer used to initialize an array.

Separately, an array declaration in a function parameter declaration
is implicitly *translated* to a pointer parameter:

void foo(char arr[]);
/* really means void foo(char *arr); */

This translation takes place during compilation; it's not a
conversion, and it's not directly related to the implicit conversion
mentioned above.
 
B

Bond

Unforuntately, I know next to nothing about ASM and compiler construction,
and while I was aware of the syntactic differences between pointers and
arrays, I was not aware of this:

http://udrepper.livejournal.com/13851.html
(Yes, it is livejournal, but it is the journal of the glibc maintainer)

I am not sure I really understand this low-level difference between pointers
and arrays, and I do not have the time to learn ASM and compiler
construction (I am just a hobby coder). So I would like to ask you something
about this: one construct I use very often in my programs is an array of
constant strings, and after reading this I wonder whether:

const char foo_array[N_ENTRIES][ENTRY_SIZE] = {
"foo",
"bar"

};

translates to more efficient assembly than this:

const char * foo_array[N_ENTRIES] = {
"foo",
"bar"

};

One could argue that this is a platform specific question, but I would
really like to know the answer to this, and the people most likely to know
how to write efficient C code / how C compilers work are most likely to be
found here ;)

copx

i m a hobby coder like u too.but hav a bit gud hand at array n
pointers.
array n pointers r more or less same infact array acts as a pointer.in
ur example they both more or less work the same.
both definitions are for a 2-d array.
 
F

Flash Gordon

Bond wrote, On 26/04/07 21:33:

i m a hobby coder like u too.but hav a bit gud hand at array n

Please do not use contractions like "u" for "you". They make it
needlessly hard for others to read your posts, especially for those for
whom English is not a first language or find reading harder for other
reasons such as dyslexia.
pointers.
array n pointers r more or less same

No, they are VERY different things.
> infact array acts as a pointer.in
ur example they both more or less work the same.
both definitions are for a 2-d array.

No, an array of pointers is VERY different from an array of arrays.

As someone else suggested, I think in this very thread, a signpost
pointing at a city is very different from the city itself.
 
K

Keith Thompson

Bond said:
i m a hobby coder like u too.but hav a bit gud hand at array n
pointers.
array n pointers r more or less same infact array acts as a pointer.in
ur example they both more or less work the same.
both definitions are for a 2-d array.

You're very wrong. Read section 6 of the comp.lang.c FAQ,
<http://www.c-faq.com>, for more information.

And please spell out your words. If you can spell "definitions", you
can spell "you", "good", "and", and "are". This isn't a chat room.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,808
Messages
2,569,684
Members
45,442
Latest member
JoesphSkil

Latest Threads

Top