Neatest way to get the end pointer?

  • Thread starter Tomás Ó hÉilidhe

Peter Nilsson

Keith Thompson said:
...The standard's rather long-winded explanation of this
is in C99 6.5.6p8 ...

I have thought about this, and clearly the standard talks
about arrays. If we have foo[N], then foo + N is a valid
pointer that cannot be dereferenced.
However, in foo = &bar; foo+1 is *not* a valid pointer
because &bar is a pointer, not an array.

The cited paragraph is specifically about adding an integer
to a pointer. Arrays decaying to pointers is in 6.3.2.1p3.
 

vippstar

I have thought about this, and clearly the standard talks
about arrays. If we have foo[N], then foo + N is a valid
pointer that cannot be dereferenced.
However, in foo = &bar; foo+1 is *not* a valid pointer
because &bar is a pointer, not an array.

The cited paragraph is specifically about adding an integer
to a pointer. Arrays decaying to pointers is in 6.3.2.1p3.
I don't think you understand what I am trying to say.
--
char foo[N]; /* array of N length */
char c;
char *bar = foo + N; /* valid */
bar = &c + 1; /* invalid */
--
I understand the matter with arrays and the pointer one past the last
element, but I believe that does not apply to pointers.
Furthermore, I think you don't understand arrays since you claimed
my_array to be int[] while it is int[X], and &my_array to be int (*)[]
while it is int (*)[X].
 

William Ahern

On Feb 6, 2:09 am, William Ahern wrote:
I don't think that can happen.

It does on most of my machines.
As I said, NULL does not need to be cast here.
You don't need the comma operator either. The expression '(x, y, z)'
has the type of z and the value of z.

Indeed, exactly as 6.5.17 states. But, what is the type of an unadorned `0'?
(Note, as I mentioned earlier, on *BSD NULL is defined as simply `0'.) Is it
a pointer or an integer? In a comma expression, it's an integer, because
there's no other context to suggest otherwise. Why? I assume because the
language is too rigid. 6.5.17 must be satisfied before 6.8.6.4(3)--which
specifies how expression types are coerced when used with return.

Try it in your compiler. GCC, for instance, exhibits this behavior.

cc test.c -o test
test.c: In function ‘foo’:
test.c:4: warning: return makes pointer from integer without a cast

This is one of the primary reasons I don't use NULL. Maybe it's foolhardy
(likely, even), but if you truly grok how type coercion works in C, then the
visual syntactical benefit of using NULL loses much of its value. It's sort
of like seeing the red headed woman in a screen of falling green glyphs.
(Obligatory The Matrix reference.)
Likewise, you have to cast it when passing as a vararg.
You have to cast 0 to (char *)0 too.
execl("/bin/true", "true", (char *)NULL);

Usually I just use an unadorned `0' in my code. I do this in part because,
like the OP, I don't feel like including <stddef.h> or <stdio.h> or any
other header that I don't need.
That in my opinion is very bad practice for a project.
I comment my include statements to describe
what I'm [trying] to import; it got really old doing:

#include <stddef.h> /* NULL */

But, to each his own. I don't know very many people who do this. I'm okay
being alone ;)
Other programmers in your project might not be okay with that thought,
ask them first.
As for me, I wouldn't. Consider
--
char *p;
/* ... */
if(p == 0) /* did you really mean p == 0 or *p == 0 ? */
if(p == NULL) /* clearly meant p == NULL */

(1) Though you snipped the imaginary intermediate code, I strive to make all
of my function definitions fit within an 80x24 terminal window. So the
declaration is usually as easy to spot as it is here.

(2) How does that compare to this?

if(!p) ...

Granted, I do indeed use the former, but my point is that very likely you
(or most other people) would find it less objectionable. But I don't expend
any effort to ease a programmer's cognitive load at _that_ particular level
of examination. That is, at the level where he's either unsure of the type
of that object, or what it's being used for. If somebody is doing as
brain-dead an operation as, say, search+replace, I don't want him anywhere
near my code.

Instead, I use 8-space hard tabs and lots of whitespace. Mostly K&R style.
That makes actually _reading_ my code (as opposed to _looking_ at it) easier,
IMO. Even there I'm usually in the minority, though.
 

Walter Roberson

char foo[N]; /* array of N length */
char c;
char *bar = foo + N; /* valid */
bar = &c + 1; /* invalid */
I understand the matter with arrays and the pointer one past the last
element, but I believe that does not apply to pointers.

Incorrect. For the purposes of pointer arithmetic, the standard says a
pointer to an object that is not an array element behaves the same as a
pointer to the first element of an array of length 1 (C99 6.5.6p7).
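As a minimal sketch of that rule (the helper names sum_range and sum_single are hypothetical, not from the thread), a function that walks a half-open range [begin, end) can be handed a single int as &x and &x + 1, since for pointer arithmetic x behaves like an int[1]:

```c
/* Hypothetical helper: sums the ints in the half-open range
   [begin, end). The end pointer is compared against but never
   dereferenced. */
static long sum_range(const int *begin, const int *end)
{
    long total = 0;
    const int *p;

    for (p = begin; p != end; p++)
        total += *p;
    return total;
}

/* A lone int behaves like an int[1] for pointer arithmetic, so
   &x + 1 is a valid one-past-the-end pointer. */
static long sum_single(int x)
{
    return sum_range(&x, &x + 1);
}
```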
 

Jack Klein

I commonly use pointers to iterate thru an array. For example:

int my_array[X];

int *p = my_array;
int const *const pend = my_array + sizeof my_array/sizeof*my_array;

Why not:

int const *const pend = my_array + X;

....???

Whether the value that defines the size, which you have tokenized here
as 'X', is a macro or an enumeration constant, and whether it is
defined immediately above the definition of the array or in an
included header or file, it is certainly available for initializing a
pointer in the source line following the definition of the array.

And this, of course, is bullet-proof, even if you decide you need an
array twice as large, or only half the size.

It even works for C99 VLAs (extraneous const keywords omitted for
brevity):

void some_func(int X)
{
int my_array [X];
int *pend = my_array + X;
/* ... */
}
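A runnable version of the VLA case might look like the following sketch. The function names are hypothetical, and it assumes a compiler with C99 variable length arrays (they became optional in C11). It also shows that the sizeof-based length calculation works on a VLA, because sizeof is computed at run time there:

```c
#include <stddef.h>

/* Fills a VLA of n ints (n must be positive) with value, then sums it,
   using my_array + n as the end pointer. Requires VLA support. */
static long fill_and_sum_vla(int n, int value)
{
    int my_array[n];
    int *p = my_array;
    int *const pend = my_array + n;   /* one past the last element */
    long total = 0;

    while (p != pend)
        *p++ = value;
    for (p = my_array; p != pend; p++)
        total += *p;
    return total;
}

/* For a VLA, sizeof is computed at run time, so the usual
   sizeof-based length calculation still gives n. */
static size_t vla_len(int n)
{
    int my_array[n];
    return sizeof my_array / sizeof *my_array;
}
```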

int const *const pend = *(&my_array+1);

I had to look closely at this to understand what you are doing. At
first glance, it appeared you were assigning the address of an array
of X ints to a pointer to int.

Obviously at least some other posters thought you were dereferencing
the one-past pointer. That may point to their lack of understanding
the calculation of * and & in this type of expression. But even
though I do understand it, it took a moment to think through whether
or not it applies in the unusual syntax.

If you can't use my suggestion above (and I don't see why not, unless
you are using hard coded magic numbers for the array size, and think
you will forget to change it in the pointer initialization after you
change it in the array definition), then why not the, to me, cleaner:

int *pend = (int *)(&my_array + 1);

....which does not even appear to dereference the pointer, and should
not confuse anybody?

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
 

Keith Thompson

William Ahern said:
I don't use NULL 'cause I don't know to what it's defined. Usually the NULL
macro expands to `0' or `(void *)0'. The former, typical on *BSD, needs to
be cast when used in a comma expression and the return type of your
function is a pointer; otherwise it might be evaluated as an int, which
might not be wide enough if (sizeof (int) != sizeof (void *)).

if (some_test_fails)
return (errno = ESOMETHING), (void *)NULL;

Is there something wrong with:

if (some_test_fails) {
errno = ESOMETHING;
return NULL;
}

? The comma operator is rarely necessary outside macro definitions.
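To compare the two styles side by side, here is a hedged sketch; first_negative_comma and first_negative_block are hypothetical, and ERANGE merely stands in for the placeholder ESOMETHING (standard C only guarantees EDOM, EILSEQ and ERANGE). The comma form needs the cast because a comma expression is never a null pointer constant, even when its right operand is 0:

```c
#include <errno.h>
#include <stddef.h>

/* Both functions return a pointer to the first negative element of
   a[0..n-1]; on failure they set errno and return a null pointer. */

/* Comma-operator style: without the cast, a NULL defined as 0 would
   make the returned expression an int, not a pointer. */
static int *first_negative_comma(int *a, size_t n)
{
    int *p;

    for (p = a; p != a + n; p++)
        if (*p < 0)
            return p;
    return (errno = ERANGE), (int *)NULL;
}

/* Block style: the plain NULL is converted to int * with no cast. */
static int *first_negative_block(int *a, size_t n)
{
    int *p;

    for (p = a; p != a + n; p++)
        if (*p < 0)
            return p;
    errno = ERANGE;
    return NULL;
}
```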
Likewise, you have to cast it when passing as a vararg.

execl("/bin/true", "true", (char *)NULL);

Yes, you do. You also have to cast 0:

execl("/bin/true", "true", (char*)0);

(execl isn't defined by standard C, of course; let's assume that it's
variadic and that its last argument should be a null pointer of type
char*.)

I know that NULL is defined as a null pointer constant. That's all I
need to know. That tells me I can use an unadorned NULL in most
contexts, but I need to cast it if it's an argument whose type is not
specified by the called function's prototype (this almost always means
it's a variadic function).
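A small sketch of why the cast matters: the hypothetical variadic function below (total_length is not from the thread) reads char * arguments until it sees a null pointer. Since the variadic part has no parameter types, a bare 0, or a NULL defined as 0, would be passed as an int, so callers cast the terminator:

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variadic function: returns the combined length of its
   string arguments. The list must end with a null pointer of type
   char *, e.g. (char *)NULL or (char *)0. */
static size_t total_length(const char *first, ...)
{
    va_list ap;
    const char *s;
    size_t total = 0;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        total += strlen(s);
    va_end(ap);
    return total;
}
```

The first argument is covered by the prototype, so a plain NULL is fine there; only the trailing variadic terminator needs the cast.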

Of course, using 0 rather than NULL is perfectly legal. I personally
dislike it on stylistic grounds (I find NULL more explicit), but any C
programmer needs to be able to read and understand code written in
either style.

[snip]
 

Keith Thompson

Peter Nilsson said:
Ah, I was a bit confused I guess.
However, OP _does_ dereference it. Not in that snippet,
but in 4)

Again, it is not dereferenced...

The type of my_array is int[]
The type of &my_array is int (*)[]
The type of &my_array + 1 is int (*)[]
The type of *(&my_array + 1) is int []

The last expression will decay to an int * when used in the
assignment to pend. At no stage is that pointer dereferenced.

I'm not convinced (which is not to say that you're wrong).

The declaration was

int my_array[X];

so (&my_array + 1) is of type int (*)[X] (let's assume X is constant;
VLAs make my head hurt).

Then *(&my_array + 1) is of type int[X]. This is an expression of
array type, and it's the value of an array object that does not exist.
It then immediately decays to a value of type int*, pointing just past
the end of my_array. But I think that the intermediate expression
*before* the array-to-pointer conversion invokes UB. (Though I'd be
surprised if any implementation did anything other than what was
intended; after all, there's no need to load the value of the
nonexistent array object.)
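One way to see the types Keith describes without evaluating anything is sizeof, which does not evaluate a non-VLA operand. This sketch assumes X is 5 and a file-scope my_array; the function names are hypothetical:

```c
#define X 5   /* array length assumed for this sketch */

static int my_array[X];

/* sizeof does not evaluate a non-VLA operand, so nothing below is
   actually dereferenced; only the types are inspected. */
static int types_match(void)
{
    /* *(&my_array + 1) has type int[X], the same size as my_array,
       and its element type is int. */
    return sizeof *(&my_array + 1) == sizeof my_array
        && sizeof **(&my_array + 1) == sizeof (int);
}

/* The UB-free end pointer: one past the last element of my_array. */
static int *end_pointer(void)
{
    return my_array + X;
}
```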

Since it's easy enough to compute the length of the array, you can
achieve the desired result with
my_array + X
or (as others have mentioned in this thread)
my_array + sizeof my_array / sizeof *my_array
with the length calculation encapsulated in a macro definition if you
like.
 

Keith Thompson

...
int *p = my_array;
int const *const pend = my_array + sizeof my_array/
sizeof*my_array;
do *p++ = 42;
while (pend != p);
At no stage is pend dereferenced; and the loop exits on
p == pend, so the address is not dereferenced by p
either.
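A self-contained version of that loop, as a sketch: X is assumed to be 8, and the wrapper function fill_42 is hypothetical. The end pointer is only compared against, never dereferenced:

```c
#include <stddef.h>

#define X 8   /* array length assumed for this sketch */

static int my_array[X];

/* Sets every element of my_array to 42 and returns how many elements
   were written. pend is compared against but never dereferenced. */
static size_t fill_42(void)
{
    int *p = my_array;
    int const *const pend = my_array + sizeof my_array / sizeof *my_array;
    size_t count = 0;

    do {
        *p++ = 42;
        count++;
    } while (pend != p);   /* exits when p reaches the end pointer */
    return count;
}
```

The do/while form is safe here only because a C array always has at least one element.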
Ah, I was a bit confused I guess.
However, OP _does_ dereference it. Not in that snippet,
but in 4)
4) *(&my_array+ 1) decays to the address of the first
element in the non-existent array after the current one,
which is also the "pend" address for the array that
actually exists.

Again, it is not dereferenced...

The type of my_array is int[]
The type of &my_array is int (*)[]
The type of &my_array + 1 is int (*)[]
The type of *(&my_array + 1) is int []

The last expression will decay to an int * when used in the
assignment to pend. At no stage is that pointer dereferenced.
What I meant is that (&my_array + 1) is dereferenced, which would be
invalid, and if my reply to Mr. Thompson is correct, then even
computing &my_array+1 is invalid.
Remember: we are no longer talking about arrays but pointers; the
pointer-after-the-last-element rule does not apply.

No, just computing &my_array+1 is valid; only dereferencing it would
be invalid.

Consider this declaration:

my_type my_obj;

where my_type is some arbitrary type. Then &my_obj is the address of
my_obj (obviously), and &my_obj + 1 points just past the end of
my_obj; it's the address of the non-existent object of type my_type
that immediately follows my_obj in memory.

For purposes of pointer arithmetic, an object of type my_type is
treated as a one-element array of type my_type[1].

All the above is equally valid even if my_type happens to be an array
type. The object my_array (of type int[X]) is treated as a
single-element array of type int[1][X].
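The one-element-array rule can be checked with pointer subtraction, which is defined here because both operands point into (or one past) the same object. This is a sketch; struct point stands in for my_type, and X is assumed to be 4:

```c
#include <stddef.h>

#define X 4   /* array length assumed for this sketch */

struct point { int x, y; };   /* hypothetical stand-in for my_type */

static struct point my_obj;
static int my_array[X];

/* Each object behaves like a one-element array of its own type, so
   &obj + 1 may be computed and subtracted (but not dereferenced). */
static ptrdiff_t obj_span(void)
{
    return (&my_obj + 1) - &my_obj;       /* one struct point */
}

static ptrdiff_t array_span(void)
{
    return (&my_array + 1) - &my_array;   /* one element of type int[X] */
}

static ptrdiff_t int_span(void)
{
    return (my_array + X) - my_array;     /* X elements of type int */
}
```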
 

William Ahern

Is there something wrong with:
if (some_test_fails) {
errno = ESOMETHING;
return NULL;
}

Twice the vertical space. Though, lately I've been experimenting with

if (some_test_fails)
{ errno = ESOMETHING; return NULL; }

At least as often, however, I would jump to an error handling label. This
would be preferred if the function were complex enough:

if (0 == (p = malloc(n)))
goto sysfail;
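For readers unfamiliar with the error-label style, a minimal sketch follows. dup_pair is hypothetical; only the `if (0 == (p = malloc(n))) goto sysfail;` pattern comes from the post:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function: copies two strings into freshly allocated
   buffers. Returns 0 on success or -1 on failure; every failure path
   jumps to a single cleanup label. */
static int dup_pair(const char *a, const char *b, char **out_a, char **out_b)
{
    char *pa = 0, *pb = 0;

    if (0 == (pa = malloc(strlen(a) + 1)))
        goto sysfail;
    if (0 == (pb = malloc(strlen(b) + 1)))
        goto sysfail;

    strcpy(pa, a);
    strcpy(pb, b);
    *out_a = pa;
    *out_b = pb;
    return 0;

sysfail:
    free(pa);   /* free(0) does nothing, so this is safe on either path */
    free(pb);
    return -1;
}
```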

Setting errno in the comma expression, combined with the return statement,
says something about what state I'm returning to the caller in a way that
using a multi-statement block wouldn't do so... succinctly.
? The comma operator is rarely necessary outside macro definitions.


Yes, you do. You also have to cast 0:

Yes. And so, how often do you see people casting NULL as opposed to casting
0 in such a context? Maybe it's a case of the tail wagging the dog, but you
have to appreciate the consistency at some level. Since, as I stated
earlier, I don't like to include headers just for the NULL definition, it's
not too far-fetched to keep this style. A significant proportion of my
source code has no need to include headers at all.

All things being equal I would prefer to use NULL, if only because it's
commonplace. But, all things aren't equal.
 

Tomás Ó hÉilidhe

William Ahern:
I comment my include statements to describe what I'm [trying] to
import; it got really old doing:

#include <stddef.h> /* NULL */


I thought I was the only one that did that :-D It makes it a hell of a lot
easier to strip out unneeded header files, and also just to keep track of
why you included it in the first place.
 

Tomás Ó hÉilidhe

Tomás Ó hÉilidhe:
Anyway, just wondering what people think of the alternative. Saves me
that little rush of pissed-off-ness every time I've to write out the
tedious sizeof thing.


A few people have asked me why I just don't write:

int const *const pend = my_array + X;

The reason for this is:

1) I can't see the declaration or definition for my_array (and I don't
want to see it). It could be defined in another source file. I don't
know what macro is used for its length.

2) The macro that decides the array's length might change. For instance,
today, the code might be:

int my_array[AMOUNT_VOWELS_IN_ALPHABET];

while tomorrow it might be:

int my_array[AMOUNT_VOWELS_IN_ALPHABET + 1];

....while next week it might be:

int my_array[AMOUNT_CONSONANTS_IN_ALPHABET];


But anyway... back to my proposed alternative.


Originally, I had: (void*)(&my_array+1)

but still that was a bit of a pain. So I switched to:

*(&my_array+1)

I'd like to hear what people think in regard to there being UB when I
dereference at the end. Strictly speaking, there probably is, but
realistically speaking, I think it's another thing to be added to the list
of "UB" that we ignore. Like for instance, here's a UB that I ignore:


union { int s; unsigned us; } x;

x.s = -27;

printf("%u",x.us); /* Let's see what the bit pattern is */


Anyone with me on this one?
 

Army1987

Tomás Ó hÉilidhe said:
I'd like to hear what people think in regard to there being UB when I
dereference at the end. Strictly speaking, there probably is, but
realistically speaking, I think it's another thing to be added to the list
of "UB" that we ignore. Like for instance, here's a UB that I ignore:


union { int s; unsigned us; } x;

x.s = -27;

printf("%u",x.us); /* Let's see what the bit pattern is */
Knowing that x.s isn't going to be accessed anymore, an implementation can
optimize the assignment away. I don't think this is very common, probably
because there is code depending on type punning like that, but that would
not be pointless to do. (And, just in case, I'd prefer to keep distinct
objects and use memcpy.)
 

Army1987

vippstar said:
Here is an example of what i am trying to say
--
int * foo;
int bar;
int baz[N];
foo = baz + N; /* valid */
foo = &bar + 1; /* invalid */
foo = &bar; /* valid */
foo++; /* invalid */

6.5.6 Additive operators
[...]
7 For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
 

Army1987

Yes. And so, how often do you see people casting NULL as opposed to casting
0 in such a context?
Yeah, I don't like unadorned 0 as a null pointer, and I prefer NULL, but I
found (type *)0 OK, and (type *)NULL somewhat redundant.

(And I *hate* '\0' as a null pointer. Code such as ptr == '\0' always makes
me wonder whether that's a typo for *ptr == '\0'.)
 

pete

Tomás Ó hÉilidhe said:
Tomás Ó hÉilidhe:
Anyway, just wondering what people think of the alternative. Saves me
that little rush of pissed-off-ness every time I've to write out the
tedious sizeof thing.

A few people have asked me why I just don't write:

int const *const pend = my_array + X;

The reason for this is:

1) I can't see the declaration or definition for my_array (and I don't
want to see it). It could be defined in another source file. I don't
know what macro is used for its length.

2) The macro that decides the array's length might change. For instance,
today, the code might be:

int my_array[AMOUNT_VOWELS_IN_ALPHABET];

while tomorrow it might be:

int my_array[AMOUNT_VOWELS_IN_ALPHABET + 1];

...while next week it might be:

int my_array[AMOUNT_CONSONANTS_IN_ALPHABET];

.... or it might even be:

int my_array[] = {b,c,d,f,g,h,j,k,l,m,n,p,q,r,s,t,v,w,x,z};

Where's their 'X' now?
Hah!

I use an NMEMB() macro.

http://www.mindspring.com/~pfilandr/C/lists_and_files/string_sort.c

#define NUMBERS \
{"one","two","three","four","five",\
"six","seven","eight","nine","ten"}

#define NMEMB(A) (sizeof (A) / sizeof *(A))

char *numbers[] = NUMBERS;
char **const after = numbers + NMEMB(numbers);


http://www.mindspring.com/~pfilandr/C/lists_and_files/Lf_sort.c

#define NUMBERS \
{15,14,13,7,20,9,8,12,11,6}

#define NMEMB(A) (sizeof (A) / sizeof *(A))

long double numbers[] = NUMBERS;
long double *const after = numbers + NMEMB(numbers);
 

pete

Keith said:
Peter Nilsson said:
...
 int *p = my_array;
 int const *const pend = my_array + sizeof my_array/
                   sizeof*my_array;
 do *p++ = 42;
 while (pend != p);

At no stage is pend dereferenced; and the loop exits on
p == pend, so the address is not dereferenced by p
either.

Ah, I was a bit confused I guess.
However, OP _does_ dereference it. Not in that snippet,
but in 4)

4) *(&my_array+ 1) decays to the address of the first
element in the non-existent array after the current one,
which is also the "pend" address for the array that
actually exists.

Again, it is not dereferenced...

The type of my_array is int[]
The type of &my_array is int (*)[]
The type of &my_array + 1 is int (*)[]
The type of *(&my_array + 1) is int []

The last expression will decay to an int * when used in the
assignment to pend. At no stage is that pointer dereferenced.

I'm not convinced (which is not to say that you're wrong).

The declaration was

int my_array[X];

so (&my_array + 1) is of type int (*)[X] (let's assume X is constant;
VLAs make my head hurt).

Then *(&my_array + 1) is of type int[X]. This is an expression of
array type, and it's the value of an array object that does not exist.
It then immediately decays to a value of type int*, pointing just past
the end of my_array. But I think that the intermediate expression
*before* the array-to-pointer conversion invokes UB.

It seems funny that we think that we have such a good idea
of what the value of (*(&my_array + 1)) *should* be.

I'm not able to come up with how it might not be UB.
But I'm still thinking about it.
 

pete

Army1987 said:
Yeah, I don't like unadorned 0 as a null pointer,
and I prefer NULL, but I
found (type *)0 OK, and (type *)NULL somewhat redundant.

(And I *hate* '\0' as a null pointer.
Code such as ptr == '\0' always makes
me wonder whether that's a typo for *ptr == '\0'.)

It *should* make you wonder.
NULL makes the code easier to read.
That's a strong reason for using it.
 

Keith Thompson

Tomás Ó hÉilidhe said:
Tomás Ó hÉilidhe:

A few people have asked me why I just don't write:

int const *const pend = my_array + X;

The reason for this is:

1) I can't see the declaration or definition for my_array (and I don't
want to see it). It could be defined in another source file. I don't
know what macro is used for its length.

2) The macro that decides the array's length might change. For instance,
today, the code might be:
[snip]

So use a macro that gives you the length of an array object,
regardless of how it's declared, such as:

#define ARRAY_LEN(a) (sizeof (a) / sizeof *(a))

Then you can write:

int const *const pend = my_array + ARRAY_LEN(my_array);

and not have to worry about UB (as long as you don't try to
dereference pend, of course).
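As a sketch of that suggestion applied to an array whose length comes from its initializer (primes and sum_primes are hypothetical), ARRAY_LEN picks up the length automatically, so there is no X to keep in sync:

```c
#include <stddef.h>

#define ARRAY_LEN(a) (sizeof (a) / sizeof *(a))

/* Length deduced from the initializer. */
static const int primes[] = { 2, 3, 5, 7, 11, 13 };

/* Sums primes[] using an end pointer built with ARRAY_LEN. Note that
   ARRAY_LEN only works on an actual array; applied to a pointer
   (e.g. an array parameter) it silently gives the wrong answer. */
static int sum_primes(void)
{
    int const *p = primes;
    int const *const pend = primes + ARRAY_LEN(primes);
    int total = 0;

    while (p != pend)
        total += *p++;
    return total;
}
```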
But anyway... back to my proposed alternative.


Originally, I had: (void*)(&my_array+1)

but still that was a bit of a pain. So I switched to:

*(&my_array+1)

I'd like to hear what people think in regard to there being UB when I
dereference at the end. Strictly speaking, there probably is, but
realistically speaking, I think it's another thing to be added to the list
of "UB" that we ignore.

In my opinion, there's an instance of undefined behavior here, but one
that *probably* won't cause any real problems unless the
compiler goes out of its way to cause problems. But the real risk, I
think, is that an optimizing compiler could rearrange the code
*assuming* that there's no UB.

In effect, in every C translation unit you write, you're implicitly
making a promise to the compiler that there is no undefined behavior.
If you break that promise, the compiler may get its revenge -- not out
of malice, but just because it innocently assumes that there is no
undefined behavior.
Like for instance, here's a UB that I ignore:


union { int s; unsigned us; } x;

x.s = -27;

printf("%u",x.us); /* Let's see what the bit pattern is */

Here you might be on somewhat safer ground. The use of unions for
type-punning is so widespread (even though the standard doesn't
support it except perhaps for character arrays) that compiler writers
will probably go out of their way to avoid breaking it. But in theory
an optimizing compiler could observe that you store a value to x.s but
never use that value, and eliminate the assignment.

Note that you can achieve the same effect with pointer conversions:

int s = -27;
printf("s as unsigned = %u\n", *(unsigned*)&s);

or with memcpy (left as an exercise).
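One possible answer to the exercise, as a sketch (int_bits is a hypothetical name): copy the bytes with memcpy, which avoids both the union and the pointer cast. It relies on int and unsigned having the same size, which the standard guarantees for corresponding signed and unsigned types:

```c
#include <string.h>

/* Returns the bits of s reinterpreted as an unsigned int by copying
   the object representation with memcpy. */
static unsigned int_bits(int s)
{
    unsigned us;

    memcpy(&us, &s, sizeof us);
    return us;
}
```

On a two's complement machine, int_bits(-27) equals (unsigned)-27; for non-negative values the result always equals the original value.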
 

Keith Thompson

pete said:
Army1987 said:
It *should* make you wonder.
NULL makes the code easier to read.
That's a strong reason for using it.

I've also seen code that uses NULL rather than '\]0' for the null
character, such as:

char empty_string[1];
empty_string[0] = NULL; /* bad */

Apart from being misleading, it won't work if NULL happens to be
defined as ((void*)0).

(If you're really perverse, you can write:
empty_string[0] = EXIT_SUCCESS;
or
empty_string[0] = EOF+1;
both of which are horrendously ugly and not guaranteed to work, but
slightly more likely to work than NULL -- and more likely to fail
silently.)

This is not an argument either way about using NULL where it's
appropriate, of course.
 

Keith Thompson

Keith Thompson said:
I've also seen code that uses NULL rather than '\]0' for the null
character, such as:
[...]

Of course I meant NULL rather than '\0'.
 
