Pointer Declaration/Array definition

ur8x

Why does this declaration give undefined result:

file1: extern char * p;
file2: char p[10];

Let's assume p has been initialized, now accessing p...
 
Ivan Vecerina

|
| Why does this declaration give undefined result:
|
| file1: extern char * p;
Allocates memory to store a pointer, which may later be changed
to refer to any memory location.
| file2: char p[10];
Allocates memory for 10 characters at a fixed address.
When the variable p is used, the array is implicitly
converted to a pointer to the first element of the array.
|
| Let's assume p has been initialized, now accessing p...

What is supposed to happen if code that includes
file1 contains a statement such as:
p = NULL;


hth,
Ivan
 
CBFalconer

> Why does this declaration give undefined result:
>
> file1: extern char * p;
> file2: char p[10];
>
> Let's assume p has been initialized, now accessing p...


file1 thinks that p is a pointer to a char. file2 thinks that p
is an array of 10 chars. This is why the "extern char *p;" should
be in a header file that is included in both file1 and file2, and
then the compiler would complain. This follows the simple
principle that header files are used to export things other
modules need to know about.
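
A minimal sketch of the header approach described above, collapsed into a single translation unit so it compiles as is. The header name "p.h" and the helper first_after_init() are illustrative assumptions, not something from the thread.

```c
#include <assert.h>
#include <string.h>

/* --- what a shared header (hypothetically "p.h") would contain --- */
extern char p[10];          /* declaration: p is an array of 10 chars */
/* Declaring "extern char *p;" here instead would make the compiler
   reject the definition below with a conflicting-types error. */

/* --- what file2 would contain, after including the header --- */
char p[10];                 /* definition: matches the declaration above */

/* --- what file1 would contain, after including the same header --- */
char first_after_init(void)
{
    assert(sizeof p == 10);     /* file1 sees the full array type */
    strcpy(p, "ABCDEFG");       /* 8 bytes with the terminator, fits in 10 */
    return p[0];                /* plain array access, no pointer is loaded */
}
```

Because both files see the same declaration, a mismatch is caught at compile time instead of showing up as undefined behaviour at run time.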
 
Jens.Toerring

> Why does this declaration give undefined result:
> file1: extern char * p;
> file2: char p[10];

Other people have already explained why this won't work: a char
array and a char pointer are very different things that have
little in common. I guess your confusion comes from the fact
that under certain conditions the name of an array is treated
as if it were a pointer to (the first element of) the
array, e.g. in

char p[ ] = "hello word";
char *pp = p;

But this only happens when the array is used in "value context",
i.e. if it is used as if it had a value. Then, and only then, it
is taken to mean (often called "it decays into") the address of
the first element of the array.

But in

extern char *p;

'p' isn't used in "value context" (the compiler doesn't even know
that an array of chars named 'p' was defined somewhere else, since
that's in a different source file), so the "decay to pointer" rule
doesn't come into play.
Regards, Jens
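
A small sketch of the distinction described above, assuming a short example string; the function names are made up for illustration. In the initialization of a pointer the array name is used for its value and becomes the address of the first element, while under sizeof it is not, so the result is the size of the whole array.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the array name, used in value context, equals &p[0]. */
int decays_to_first_element(void)
{
    char p[] = "hello";
    char *pp = p;               /* value context: p becomes &p[0] */
    assert(*pp == 'h');         /* pp really points at the first char */
    return pp == &p[0];
}

/* sizeof is not a value context: the operand keeps its array type. */
size_t array_size_no_decay(void)
{
    char p[] = "hello";         /* 5 characters plus the terminating '\0' */
    return sizeof p;
}
```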
 
ur8x

Ivan Vecerina said:
> |
> | Why does this declaration give undefined result:
> |
> | file1: extern char * p;
> Allocates memory to store a pointer, which may later be changed
> to refer to any memory location.
> | file2: char p[10];
> Allocates memory for 10 characters at a fixed address.
> When the variable p is used, the array is implicitly
> converted to a pointer to the first element of the array.
> |
> | Let's assume p has been initialized, now accessing p...


Yes, well if p is NULL, accessing p wouldn't make sense.
But let's say p[] has been initialized. If the array is
implicitly converted to a pointer to the first element, shouldn't
pointer arithmetic get us to p + i * sizeof(char)?
 
ur8x

Ok, here is what I want to know: what exactly happens when
p is used, as far as accessing and dereferencing go, that makes
the code wrong? (Yes, I know it should not work, I just want
to know why.)

Thanks.


(e-mail address removed) wrote:
> > Why does this declaration give undefined result:
> > file1: extern char * p;
> > file2: char p[10];
> Other people already explained why this won't work, i.e. because
> a char array and a char pointer are very different things, having
> not much in common. I guess your confusion is coming from the
> fact that under certain conditions the name of an array is dealt
> with as if it would be a pointer to (the first element of) the
> array, e.g. in
> char p[ ] = "hello word";
> char *pp = p;
> But this only happens when the array is used in "value context",
> i.e. if it is used as if it had a value. Then, and only then, it
> is taken to mean (often called "it decays into") the address of
> the first element of the array.
 
Jens.Toerring

(e-mail address removed)-berlin.de wrote:

Please be so kind not to top-post.
> Ok, here is what I want to know: what exactly happens when
> p is used, as far as accessing and dereferencing go, that makes
> the code wrong? (Yes, I know it should not work, I just want
> to know why.)


In the process of compiling and linking the symbol 'p' will
get replaced by a certain memory address. The code in file2
knows that at this address there's a string, e.g. "ABCDEFG".
But the code in file1 assumes that at that address a pointer
to char is stored. Since you have "ABCDEFG" at that address
the code in file1 will interpret the value stored there as
an address like 0x41424344 (assuming you have 4-byte-wide
addresses on a big-endian machine and the ASCII charset, so
0x41 == 'A' etc.). But that's of course no address but just
the bit pattern of the start of the string. If you then use
'p[i]' it tries to dereference that address (0x41424344 + i),
an address to which you probably have no access, and thus
you are likely to get a segmentation fault.
Regards, Jens
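
The misreading described above can be sketched without actually dereferencing anything. The helper names below are hypothetical, and the sketch assumes an implementation that provides uintptr_t and whose pointers are no wider than the 10-byte array. It copies the leading bytes of the array into an integer of pointer width, which is the bit pattern file1 would treat as "the pointer".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the first sizeof(uintptr_t) bytes of an array into an integer
   wide enough to hold a pointer value. Nothing is dereferenced. */
uintptr_t bytes_seen_as_pointer(const char *storage, size_t storage_size)
{
    uintptr_t bits = 0;
    assert(storage_size >= sizeof bits);   /* sketch assumes enough bytes */
    memcpy(&bits, storage, sizeof bits);   /* reinterpret the raw bytes */
    return bits;
}

/* Builds the array from the example above and returns the misread bits. */
uintptr_t misread_example(void)
{
    char p[10] = "ABCDEFG";                /* what file2 defines; rest is 0 */
    return bytes_seen_as_pointer(p, sizeof p);
}
```

With 4-byte big-endian addresses and ASCII the result would be 0x41424344, as in the explanation above; with 8-byte little-endian addresses the same bytes would read as 0x0047464544434241. Either way it is character data, not a usable address.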
 
Ivan Vecerina

| > | > |
| > | Why does this declaration give undefined result:
| > |
| > | file1: extern char * p;
| > Allocates memory to store a pointer, which may later be changed
| > to refer to any memory location.

| > | file2: char p[10];
| > Allocates memory for 10 characters at a fixed address.
| > When the variable p is used, the array is implicitly
| > converted to a pointer to the first element of the array.
| > |
| > | Let's assume p has been initialized, now accessing p...
|
| Yes, well if p is NULL, accessing p wouldn't make sense.
| But let's say p[] has been initialized, if the array is
| implicitly converted to a point to the first element, shouldn't
| pointer arithmetic get us to p + i * sizeof(char)?

This "implicit conversion" is performed by the compiler, when
it knows that an array is being used as if it were a pointer.
But the generated code and memory layout is very different.

The assembly pseudocode for p[1] looks like:
- if p is an array:
    1) load the address of p in register A
    2) increment register A
    3) read the byte at address A
- if p is a pointer:
    1) load the address of p in register A
    2) load the pointer at address A into register B
    3) increment register B
    4) read the byte at address B

The memory layout is what my previous comments were trying
to explain (left quoted above).
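
The two access paths can be sketched in C within a single file, where both declarations are correct. The names arr, ptr and the helper functions are illustrative assumptions. The last two functions show the layout difference: the address of an array is the address of its data, while the address of a pointer is the address of a separate object that merely holds the data's address.

```c
#include <assert.h>

static char arr[10] = "ABCDEFG";   /* the bytes live in arr itself */
static char *ptr = arr;            /* a separate object holding an address */

/* Array case: the address of arr is already the address of the data. */
char second_via_array(void)
{
    return arr[1];
}

/* Pointer case: the stored address is loaded first, then offset and read. */
char second_via_pointer(void)
{
    assert(ptr != 0);              /* must hold a valid address first */
    return ptr[1];
}

/* Each returns 1 if the object's own address is where its data starts. */
int array_is_its_own_data(void)   { return (void *)&arr == (void *)&arr[0]; }
int pointer_is_its_own_data(void) { return (void *)&ptr == (void *)&ptr[0]; }
```

Both reads yield the same character here because each object is accessed with its true type. The mismatched extern declaration makes the compiler apply the pointer steps to storage laid out as an array.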
 
ur8x

> get replaced by a certain memory address. The code in file2
> knows that at this address there's a string, e.g. "ABCDEFG".
> But the code in file1 assumes that at that address a pointer
> to char is stored. Since you have "ABCDEFG" at that address
> the code in file1 will interpret the value stored there as
> an address like 0x41424344 (assuming you have 4-byte-wide
> addresses on a big-endian machine and the ASCII charset, so
> 0x41 == 'A' etc.). But that's of course no address but just
> the bit pattern of the start of the string. If you then use
> 'p[i]' it tries to dereference that address (0x41424344 + i),
> an address to which you probably have no access, and thus
> you are likely to get a segmentation fault.


Excellent, so p is treated as if it were holding the address
of the actual data intended to be read. Thanks.

P.S. Sorry about the top-posting, I just switched my default
editor to emacs.
 
ur8x

Ivan Vecerina said:
> This "implicit conversion" is performed by the compiler, when
> it knows that an array is being used as if it were a pointer.
> But the generated code and memory layout is very different.
> The assembly pseudocode for p[1] looks like:
> - if p is an array:
>     1) load the address of p in register A
>     2) increment register A
>     3) read the byte at address A
> - if p is a pointer:
>     1) load the address of p in register A
>     2) load the pointer at address A into register B
>     3) increment register B
>     4) read the byte at address B
> The memory layout is what my previous comments were trying
> to explain (left quoted above).

Thanks. Referring to some other posts, is this "implicit
conversion" also known as the "decaying convention"?
 
Chris Torek

> Ok, here is what I want to know: what exactly happens when
> p is used, as far as accessing and dereferencing go, that makes
> the code wrong? (Yes, I know it should not work, I just want
> to know why.)


In some cases, a picture is worth a thousand words. (Be sure to
view this in a fixed-width font.)

void f(void) {
    char a[6] = { '1', '2', '3' };
    char *p;
    ...
}

+-----------------------------------+
| '1' | '2' | '3' |  0  |  0  |  0  |
+-----------------------------------+


+-------------------+          /------------->
| <garbage address> |---------/
+-------------------+

The larger box represents "a", which is made up of six bytes (each
char in C is a "C byte", always). The six bytes have known values
because we initialized "a".

The smaller box represents p, the pointer. We did not initialize
it, so (assuming these are inside a function, as in the example
code) it is full of trash. If viewed as a pointer, the result is
unpredictable -- in this case I have drawn it as a "wild pointer"
pointing off into the weeds somewhere.

Now, if we set p to point to the first element of "a":

p = &a[0];

we get a new picture:

+-----------------------------------+
| '1' | '2' | '3' |  0  |  0  |  0  |
+-----------------------------------+
   ^
   |
   +--------------------+
                        |
+-------------------+   |
| <valid address>  -|---+
+-------------------+

Now p contains an arrow pointing to &a[0].

When you write a[i], the compiler says to itself: "aha, `a', that
is declared as an array, and you want to do something with the
`value' of `a' -- index it like an array, in this case -- so I will
construct a pointer pointing to &a[0] and use that."

This special rule about arrays is a quirk of C. Many other languages
are very different in their treatment of arrays. There is no
fundamental reason the C language *has* to work this way; it just
does. That means that you simply have to memorize this rule. It
is a thing you have to know about C that has no reason other than
"the guy who wrote the language decided to do it that way" -- rather
like the syntax for declarations.

On the other hand, when you write p[i], the compiler takes the
pointer value p already has -- here, pointing to &a[0] -- and
follows the arrow and then "moves right" according to the number
in "i". Moreover, if you have the variable "p", you can set it
to point to some place other than &a[0]:

p = &a[2];

makes p point to the '3', and p[1] is the first 0 (or '\0' -- same
thing) byte, while p[-2] and p[-1] now exist, naming the '1' and
'2' in a[0] and a[1] respectively. This is because the compiler
generates code that follows the arrow and then "moves right" as
requested, and you have already moved right -- which lets you move
left again, if you want to.
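
The indexing described above can be checked with a short sketch. The function name and the bounds check are illustrative additions, and the array is made static only so the sketch is self-contained.

```c
#include <assert.h>

/* Returns p[offset] after pointing p at a[2], mirroring the example above. */
char read_relative_to_third(int offset)
{
    static char a[6] = { '1', '2', '3' };  /* remaining elements are zero */
    char *p = &a[2];                       /* p points at the '3' */
    assert(offset >= -2 && offset <= 3);   /* stay inside a[0]..a[5] */
    return p[offset];                      /* follow the arrow, then move */
}
```

Offsets -2 and -1 reach the '1' and '2', offset 0 is the '3', and offset 1 is the first zero byte, as described above.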

The difference between using a pointer ("p") and using the array
name ("a"), then, is that when you use the array name, the compiler
has to take an extra step to *construct* the pointer it needs, just
so that it can then follow the pointer. Curiously, this extra work
*can* (not necessarily "does", just "can") result in faster machine
code. The reason is that the compiler is allowed to know a lot
more about the pointer it constructed here, *because it constructed
it*. It is not some unknown pointer taken in off the street, with
a mysterious and shady background. The constructed pointer has a
solid pedigree. Of course, given a local variable like "p", a
smart compiler can probably look around and figure out whether "p"
has a similar pedigree -- so on *good* compilers, there tends to
be little if any performance difference. On not-so-good compilers,
it is hard to tell which will be faster -- the array, because the
compiler knows about the pointer it makes, or the pointer, because
the compiler does not have to do the extra "make a pointer" step.
Or perhaps neither will be faster there, either.

The moral of the "performance story" above, as it were, is: use
whichever one is clearer to the human programmer. On a good compiler
it will make no real difference, and on a bad one, you cannot predict
what kind of difference it will make.

For more on The Rule about arrays and pointers in C, see also
<http://web.torek.net/torek/c/pa.html>.
 
Ivan Vecerina

| > This "implicit conversion" is performed by the compiler, when
| > it knows that an array is being used as if it were a pointer.
| > But the generated code and memory layout is very different.
....
| Thanks. Referring to some other posts, does this "implicit
| conversion" also known as "decaying convention?"

Some like to say that arrays "decay" into pointers,
which illustrates the fact that the conversion is
not (easily) reversed. But I've also seen it used
to designate the fact that function parameters declared
as having an array type are actually treated as pointers.
E.g.:
int f( char param[16] );
is interpreted by the compiler as:
int f( char *param );
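
One way to see that such a parameter really is a pointer, as a sketch with made-up function names: the parameter can be incremented, which would not compile for a true array.

```c
#include <assert.h>
#include <stddef.h>

/* Declared with array syntax, but param is really a char *: incrementing
   it is legal here, whereas a true array cannot be modified this way. */
size_t count_chars(char param[16])
{
    size_t n = 0;
    while (*param++ != '\0')
        n++;
    return n;
}

size_t count_demo(void)
{
    char buf[16] = "abc";
    assert(sizeof buf == 16);  /* buf itself is a real 16-byte array */
    return count_chars(buf);   /* buf is converted to &buf[0] at the call */
}
```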


hth
 
Dan Pop

> Why does this declaration give undefined result:
>
> file1: extern char * p;
> file2: char p[10];

Why did you expect anything else? It's the same as:

file1: extern double c;
file2: char c;

All declarations of the same object must match its definition.

If you think that there is anything special about pointers and arrays
in this context, read the FAQ.

Dan
 
