Testing if a pointer is valid


Keith Thompson

Kleuskes & Moos said:
(e-mail address removed) (Richard Harter) writes: [...]
Any language in which it is possible for a pointer to be invalid but
which provides no way to test for validity is fundamentally flawed.

What existing language has pointers and is not "fundamentally flawed"?
Indeed. Testing a pointer by simple inspection can tell you whether it
is null or not. If it is null, it is not valid. If not, you simply can't
tell.

I actually consider NULL a valid value for a pointer, and of course
checking for NULL pointers, if called for, is not what I had in mind.

The phrase "valid pointer" is context-dependent. NULL is not valid
as the operand of the unary "*" operator; it's perfectly valid as
the argument of free().
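
For instance, the same null pointer is fine in one context and undefined
behaviour in another:

    #include <stdlib.h>

    int main(void)
    {
        int *p = NULL;

        free(p);     /* valid: free(NULL) is defined to do nothing */
        /* *p = 42;     invalid: NULL is not a valid operand of unary "*" */
        return 0;
    }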
 

Keith Thompson

BartC said:
The body of these functions is so trivial that one wonders why they aren't
written in-line anyway, or at least wrapped in a macro.

Writing them in-line would be more error-prone. Making them functions
lets you give each operation a name.

How would a macro be better than an inline function?

[...]
 

Kleuskes & Moos

Nice strawman. I make no such insistence, just as I don't insist that
all software I use be written by somebody named Fred.


So? I would be highly surprised if they did.




In general, users of libraries are unreliable. Maybe your users write
error free code and never have dangling pointers. If so, I am happy for
you.

No they aren't, and neither am I. I think, however, I can assume both know
how to use a debugger. Besides, I don't usually write libraries.
The issue is, what can you do about being passed bad pointers, if
anything. The trouble I have with your post is that you present a false
dichotomy - either you accept no responsibility for handling bad input
pointers (crap passing users are fair game) or else you go into paranoid
mode. ("You" being a library author, not you personally.)

I don't see much in the middle, except replacing pointers with something
more robust, like your solution below. The only trick I sometimes employ
is crude range-checking, which depends on knowing where in memory a
pointer should (or rather, should not) point, and/or checking alignment if
required; but these checks do not end up in production code.
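
Such a check might look something like this rough, debug-only sketch; the
region bounds and the alignment requirement are assumptions about the
platform, not something portable C can give you:

    #include <stdint.h>

    /* Hypothetical bounds of the arena where valid node pointers are
       expected to live (platform- and application-specific). */
    extern uintptr_t arena_lo, arena_hi;

    /* Crude sanity check: is p inside the expected region and aligned
       as a node must be?  Passing proves nothing; failing proves the
       pointer is bad. */
    static int node_ptr_plausible(const void *p)
    {
        uintptr_t u = (uintptr_t)p;

        return u != 0
            && u >= arena_lo && u < arena_hi
            && u % sizeof(void *) == 0;   /* assumed alignment of a node */
    }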
What you do or even can do depends upon the specific situation. We all
(I hope) understand that C pointers are inherently unsafe and there is
no general way to check for pointer validity. That said, sometimes
there are things that can be done. One can at least ask the question,
"is there anything meaningful I can do".

Not much, I'm afraid.
And sometimes "paranoid defensive programming" is right and proper.

Aye. But that's usually restricted to having proper asserts in place
and various other sanity checks. Don't get me wrong, I'm all for that,
if it's possible to use them.
As it happens I am working with some software in which the possibility of
stale *pointers* is part of the spec. The answer is not to use raw C
pointers but rather to use "safe references" that can be and are checked
for validity.

Thus avoiding the problem. Clever. But it doesn't really address being
passed bad pointers.
Having done a bit of real-time programming myself, I quite understand.
:)


Eventually, of course, you *should* be able to tell. I notice, though,
that I know of no large software product that is known to be error-free.

There's a bet on TeX, IIRC. Problem is, by the time the software has matured,
the product is obsolete.
Finding the dangling pointer after the rocket has crashed is not the
happiest of outcomes.

Nope. And unless you're talking manned missions, there are worse possible
outcomes, but the solutions suggested in this thread do not strike me as
being "the way to go".
Likewise.

 

jacob navia

On 19/09/11 19:42, Kenneth Brody wrote:
Personally, I prefer debugging a program that crashes at the actual
point of failure.

Well, it depends where the point of failure is.

Suppose your library is expecting a pointer to "a" and is passed a
pointer to "b", a similar structure but with a slightly different layout.

1) The program crashes deep inside the library when processing the
pointer data. You have a stack trace that tells you:

0x488776AAD()
0x446655FF4()
MyFunction() myfunction.c 476
main() main.c 876

Now what?


2) When you pass a bad pointer the library opens a dialog box:
"Invalid argument to function Library_input(), argument two"

You see?

A crash isn't always a good thing.
 

jacob navia

On 19/09/11 09:14, Kleuskes & Moos wrote:
If you're writing software (and that's the only way I can think of to pass
bogus pointers to a library), you should be able to tell whether or not a
pointer is valid at a given point. If you're not, *something* is
seriously wrong.


But you can write in your code

LibraryFunction(handle1,pdata);

instead of the correct

LibraryFunction(pdata,handle1);

you see, because you are (like many other beings)
a HUMAN, not a machine.


And now you do not get any warning because both are
pointers to opaque types, i.e. eventually void *.

Or, you have two similarly named variables and instead of
writing h1 you wrote h2...

etc, the possibilities (as you very well know) are endless.

Shit happens...
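
To make the first slip concrete: once the handles are typedef'd to void *,
both argument orders are equally well-formed, so the compiler has nothing
to complain about (the names below are hypothetical):

    typedef void *LibHandle;   /* opaque handle type */
    typedef void *LibData;     /* opaque data pointer type */

    void LibraryFunction(LibData pdata, LibHandle handle);

    void caller(LibHandle handle1, LibData pdata)
    {
        LibraryFunction(pdata, handle1);   /* correct order */
        LibraryFunction(handle1, pdata);   /* swapped, but compiles
                                              silently: both are void * */
    }

Declaring the handles as pointers to two distinct incomplete struct types
instead of void * would let the compiler catch the swap.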

In a debug setting, a clear error message from the library would be
very welcome. You are spared starting the debugger, driving the program
to the crash point and debugging!!!
 

August Karlstrom

Oberon comes to mind.

Never mind. I misread Richard's statement as something along the lines
of "any language with non-safe pointers is fundamentally flawed." The
best solution is of course if the language can prevent invalid pointers
in the first place. In Oberon the only invalid pointers are
uninitialized local pointer variables but a compiler will typically
issue a warning if such a variable is used before being initialized.


August
 

BartC

Keith Thompson said:
Writing them in-line would be more error-prone. Making them functions
lets you give each operation a name.

How would a macro be better than an inline function?

Looking at the original functions, they are just applying some
abstraction/hiding, so are doing exactly what macros are for.

Written as functions, they are not guaranteed to be inline. And some
overzealous maintainer might come along and decide the pointer arguments
need to be validated...
 

ImpalerCore

It's pretty easy to come up with examples of functions that are
trivial if you don't check the arguments and nontrivial if you
do.  An excerpt from a library of mine is below.  Each of these
functions is only one or two machine instructions when inlined.
If the functions were modified to check for null pointers and for
invalid pointers, then they would at least double in code size
(making them less suitable as inline functions) and presumably in
execution time also.

Elsewhere in this thread, Jacob Navia suggested using 64-bit
magic numbers to mark memory regions of a particular type.  That
would also increase the size of this data structure by 40% with
the C implementation that I most commonly use.  Perhaps not
fatal, but definitely significant.

    /* A node in the range set. */
    struct range_set_node
      {
        struct bt_node bt_node;             /* Binary tree node. */
        unsigned long int start;            /* Start of region. */
        unsigned long int end;              /* One past end of region. */
      };

    /* Returns the position of the first 1-bit in NODE. */
    static inline unsigned long int
    range_set_node_get_start (const struct range_set_node *node)
    {
      return node->start;
    }

    /* Returns one past the position of the last 1-bit in NODE. */
    static inline unsigned long int
    range_set_node_get_end (const struct range_set_node *node)
    {
      return node->end;
    }

    /* Returns the number of contiguous 1-bits in NODE. */
    static inline unsigned long int
    range_set_node_get_width (const struct range_set_node *node)
    {
      return node->end - node->start;
    }

The only pointer check I consider useful at the library level is
whether NULL pointers are semantically valid as arguments to function
parameters. In your functions defined above, passing a NULL pointer
in as 'node' will introduce a memory access violation. One could
consider inserting a check at the module (library) level to verify
that the pointer argument is semantically valid.

\code
/* Returns the position of the first 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_start (const struct range_set_node *node)
{
  check_pointer_semantics (node != NULL);
  return node->start;
}
\endcode

Unfortunately, when one inserts checking at the library level, one
enforces behavior on the client that is not satisfactory in all
scenarios. In development and debugging, 'check_pointer_semantics'
should behave like an assert. In production, crashing the application
because a NULL pointer was passed to 'range_set_node_get_start' may not
be the preferred option. It might be nicer to save the current program's
progress if it's a long algorithm, or to inform the user in a friendlier
fashion.

The best option I can come up with is to abstract the outcome of a
precondition violation. The check of a precondition can be split into
three phases: detection, reporting, and response. The detection is the
evaluation of the precondition expression 'node != NULL'. The reporting
step may be a display of the file and line of the error, like
'Precondition violation: node != NULL, file lib_source.c, line 123'.
The response can be to abort like 'assert', or to early-return with an
error value, or to invoke a global procedure to save and shut down, or
to pop up a dialog with an option to report the error. The standard
library 'assert' can be represented as a combination of the three
phases, where the report is to print file and line information, and the
response is to 'abort'. The advantage is that when the constraint is
enforced at the library level, every call to 'range_set_node_get_start'
is validated regardless of the knowledge and competency of the
developers calling it.

The key to acceptance is providing an architecture that allows the
client (the application developer using the module) to sculpt the
report and response to these precondition violations in a manner that
is convenient. To help satisfy these requirements, I use macros that
reference a global function table that contains function pointers to
'report' and 'response' functions.

\code snippet
struct c_constraint_violation_ftable
{
  void (*report)( const char*, const char*, int );
  void (*response)( void );
};

static struct c_constraint_violation_ftable
gc_private_constraint_violation_ftable =
{
  gc_default_report_handler,  /* report */
  NULL                        /* response */
};

void c_constraint_violation_set_report_handler( void (*report_fn)
  ( const char*, const char*, int ) );
void c_constraint_violation_set_response_handler( void (*response_fn)
  ( void ) );
\endcode

The report handler 'gc_default_report_handler' would look like the
following.

\code snippet
void gc_default_report_handler( const char* expr, const char* file,
                                int line )
{
  fprintf( stderr, "%s, file %s, line %d\n", expr, file, line );
  fflush( stderr );
}
\endcode

The response and report handlers are called from a generic constraint
violation handler.

\code
void gc_constraint_violation( const char* expr, const char* file,
                              int line )
{
  if ( gc_private_constraint_violation_ftable.report ) {
    (*gc_private_constraint_violation_ftable.report)( expr, file, line );
  }
  if ( gc_private_constraint_violation_ftable.response ) {
    (*gc_private_constraint_violation_ftable.response)();
  }
}
\endcode

Finally, I have macros to evaluate the constraint and add any
additional response that cannot be encapsulated in the 'response'
function pointer.

\code snippet
#define c_return_value_if_fail( expr, val ) \
  do \
  { \
    if ( expr ) {} \
    else \
    { \
      gc_constraint_violation( "Constraint violation: " #expr, \
                               __FILE__, __LINE__ ); \
      return (val); \
    } \
  } while (0)
\endcode

There is also a 'c_return_if_fail' that replaces the 'return (val);'
statement with 'return;', and a 'c_assert_if_fail' that has no return at
all, so it acts as a pass-through for cases when an 'early-exit' response
isn't required (an example would be a constraint that informs the user
when strlcat truncates its result; truncation is likely a serious error,
but not fatal to the program, since truncation is defined as valid
semantics for the function).
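
Those two variants would follow the same pattern as
'c_return_value_if_fail' above; a sketch (the exact definitions in the
library may differ):

\code snippet
#define c_return_if_fail( expr ) \
  do \
  { \
    if ( expr ) {} \
    else \
    { \
      gc_constraint_violation( "Constraint violation: " #expr, \
                               __FILE__, __LINE__ ); \
      return; \
    } \
  } while (0)

#define c_assert_if_fail( expr ) \
  do \
  { \
    if ( expr ) {} \
    else \
    { \
      gc_constraint_violation( "Constraint violation: " #expr, \
                               __FILE__, __LINE__ ); \
    } \
  } while (0)
\endcode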

Just as 'assert' is disabled by NDEBUG, one can define a preprocessor
symbol to disable evaluation of these macros at library level, or at
source level with a granularity that depends on how many source files
one uses to implement a module. If 'range_set_node_get_start',
'range_set_node_get_end', and 'range_set_node_get_width' are all
defined in different source .c files, one can use the build system to
control which functions get compiled with or without constraint
checking (in my case, by defining C_NO_CONSTRAINTS). It's even
possible to do it at run-time if the 'report' or 'response' function
pointers check a global status value.
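
The compile-time switch might be arranged something like this (a sketch;
the real library may organize it differently):

\code snippet
#ifdef C_NO_CONSTRAINTS
#define c_return_value_if_fail( expr, val )  ((void)0)
#define c_return_if_fail( expr )             ((void)0)
#define c_assert_if_fail( expr )             ((void)0)
#else
/* ... the full definitions shown above ... */
#endif
\endcode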

The result is that more of the library's assumptions can be enforced
within the framework of a single architecture, with more flexibility
than 'assert' or manual defensive programming techniques offer. If one
wants to use this system in a design-by-contract style, simply wire the
'response' function pointer to 'abort', which ensures that a failed
'c_return_value_if_fail' stops the program in its tracks. Or one can
place a breakpoint in an empty response function to stop at each
constraint violation while still keeping the defensive style of
'c_return_value_if_fail' for production environments.

For example, here is the version of strlcat that I use; it validates
that the source and destination strings reference memory, and warns if
truncation occurs.

\code snippet
size_t c_strlcat( char* dst, const char* src, size_t dst_size )
{
  size_t dst_length;
  char* d = dst;
  const char* s = src;
  size_t n = dst_size;
  size_t dlen;

  c_return_value_if_fail( dst != NULL, 0 );
  c_return_value_if_fail( src != NULL, 0 );

  /* Find the end of dst and adjust bytes left, but don't go past the
     end of the buffer. */
  while ( n-- != 0 && *d != '\0' ) {
    ++d;
  }
  dlen = d - dst;
  n = dst_size - dlen;

  if ( n == 0 ) {
    return dlen + strlen( s );
  }

  while ( *s != '\0' )
  {
    if ( n != 1 )
    {
      *d++ = *s;
      --n;
    }
    ++s;
  }
  *d = '\0';

  /* count does not include the NUL character */
  dst_length = dlen + (s - src);

  c_assert_if_fail( dst_length < dst_size );

  return dst_length;
}
\endcode

If I get a constraint violation, I get a message like the following.

\result
Constraint violation: dst_length < dst_size, file strops.c, line 326
\end result

If one wants to annotate these constraints, simply add a '&&
"constraint annotation"' to the expression.

\code snippet
c_assert_if_fail( dst_length < dst_size && "buffer truncation" );
\endcode

The point is that it's possible to create an architecture that
enforces many of the module's constraints in a *consistent* manner
that is useful (since, as the library writer, you're the one with the
best knowledge and the most control). Even if it doesn't satisfy all
the people all the time, one can still write wrappers around these
functions and hard-define C_NO_CONSTRAINTS to reduce the error
checking to nothing, which gives people the freedom to completely
customize their error-handling behavior without any constraint
checking from the library. In the ideal case, it should reduce the
support needed to help the general user base interface with the
module.

So for your functions, I would have no problem writing them like the
following (the value of '0' in the examples could be a named constant
if more appropriate).

\code snippet
/* A node in the range set. */
struct range_set_node
{
  struct bt_node bt_node;     /* Binary tree node. */
  unsigned long int start;    /* Start of region. */
  unsigned long int end;      /* One past end of region. */
};

/* Returns the position of the first 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_start (const struct range_set_node *node)
{
  c_return_value_if_fail( node != NULL, 0 );
  return node->start;
}

/* Returns one past the position of the last 1-bit in NODE. */
static inline unsigned long int
range_set_node_get_end (const struct range_set_node *node)
{
  c_return_value_if_fail( node != NULL, 0 );
  return node->end;
}

/* Returns the number of contiguous 1-bits in NODE. */
static inline unsigned long int
range_set_node_get_width (const struct range_set_node *node)
{
  c_return_value_if_fail( node != NULL, 0 );
  return node->end - node->start;
}
\endcode

Even though there may be some conflict where the value '0' is a valid
semantic value, the report portion of a constraint violation is often
enough to prompt the developer to quickly fix the error and move on,
or at least to recognize that something wrong may be going on and the
result may be invalid. It's more informative to the developer and the
user than just crashing the program. And if one wants blistering
speed, define C_NO_CONSTRAINTS to disable constraint checking and
throw caution to the wind. It's also better than pure defensive
programming, especially in scenarios where an error return value shares
the same domain as valid return values. For example, one can return 0
as a result of a NULL pointer constraint violation detected in
strlcat, but that doesn't distinguish it from the valid value of 0
returned when copying an empty string "".

It's important to recognize that these macros can often verify only a
partial view of a module's preconditions. That's why it's important
to document explicit preconditions (depending on the formality of
the code) as well as any constraints one can verify in code. Even if
one cannot validate all properties of the preconditions of a module,
checking a subset of those conditions can still be very powerful.

Most of the time, I place constraint checking on all arguments, when
applicable, of public module functions, but not of private functions.
However, the more complicated a private function is, the more likely I
am to add constraints to it (typically of the 'c_assert_if_fail'
variety).

I've been "simulated annealing" on this for quite a while, but I
recognize that there is no one best solution. And I'm sure there are
rocks that one could throw at this setup (like the lack of __func__ to
report function names; it's still in the annealing phase). All I can
say is that, from my personal experience, this scheme's value exceeds
that of 'assert' and the GLib g_return[_val]_if_fail macros. It
allows one to switch quite easily from the defensive programming style
to the contract programming style with a minimal change to the
'response' function pointer. It's as close to the best of both worlds
as I have come up with.

In Navia's case, for any pointer validation beyond semantic checking
for NULL, I would highly recommend not placing it within any library
API other than an allocator that centers on creating "heavy
pointers". If one wants to decorate pointers to dynamically allocated
memory blocks with their allocated size, to track how much allocated
memory the application currently uses, or with fence-checking magic
numbers, to try to detect memory corruption, determine ALIGN_MAX so the
decorations can be properly aligned and stacked in front of the pointer,
and use custom allocator routines. Or rely on a memory debugger
framework like valgrind, dmalloc, ... life's too short to reinvent this
wheel.
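
A rough sketch of such a decorated allocator, under the assumption that
the two header fields happen to satisfy the strictest alignment (a real
version would pad the header up to ALIGN_MAX); all names here are
illustrative, not from an existing library:

\code snippet
#include <stdint.h>
#include <stdlib.h>

#define HEAVY_MAGIC UINT64_C( 0x5AFEB10C5AFEB10C )

/* Header stacked in front of every allocation. */
struct heavy_header
{
  uint64_t magic;   /* fence / type tag */
  size_t   size;    /* requested size, for usage accounting */
};

void* heavy_malloc( size_t size )
{
  struct heavy_header* h = malloc( sizeof *h + size );
  if ( h == NULL )
    return NULL;
  h->magic = HEAVY_MAGIC;
  h->size = size;
  return h + 1;   /* hand out the decorated block */
}

/* Returns nonzero if p looks like it came from heavy_malloc.  Note that
   the header read can itself fault if p is a wild pointer. */
int heavy_check( const void* p )
{
  const struct heavy_header* h;
  if ( p == NULL )
    return 0;
  h = (const struct heavy_header*)p - 1;
  return h->magic == HEAVY_MAGIC;
}

void heavy_free( void* p )
{
  if ( p != NULL )
  {
    struct heavy_header* h = (struct heavy_header*)p - 1;
    h->magic = 0;   /* poison, so stale pointers fail the check */
    free( h );
  }
}
\endcode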

Best regards,
John D.
 

ImpalerCore

Writing them in-line would be more error-prone.  Making them functions
lets you give each operation a name.

Right, encapsulating the operation as a function assigns a semantic
name to an underlying operation. It is a textbook case of information
hiding. The value of the routine is that if the internal structure
name changes for any reason, the change only needs to be made in the
access routines, not everywhere in the code. It can also improve
readability in the general case, where you might otherwise find
'struct->what->does-this->mean' scattered through the code. The
compiler is typically clever enough to optimize the function call away.
How would a macro be better than an inline function?

[...]

I find a macro to be more beneficial when one wants compile-time
granularity to enable or disable the functionality. For access routines
to a data structure, definitely not.

Best regards,
John D.
 

Ian Collins

On 19/09/11 09:14, Kleuskes & Moos wrote:

But you can write in your code

LibraryFunction(handle1,pdata);

instead of the correct

LibraryFunction(pdata,handle1);

you see, because you are (like many other beings)
a HUMAN, not a machine.


And now you do not get any warning because both are
pointers to opaque types, i.e. eventually void *.

Or, you have two similarly named variables and instead of
writing h1 you wrote h2...

etc, the possibilities (as you very well know) are endless.

Shit happens...

In a debug setting, a clear error message from the library would be
very welcome. You are spared starting the debugger, driving the program
to the crash point and debugging!!!

But the examples you used are precisely the common case where run time
pointer validity checking achieves nothing: both pointers are valid.
 

Ian Collins

On 19/09/11 19:42, Kenneth Brody wrote:

Well, it depends where the point of failure is.

Suppose your library is expecting a pointer to "a" and is passed a
pointer to "b", a similar structure but with a slightly different layout.

How do your checking functions help in this case? They can't.
1) The program crashes deep inside the library when processing the
pointer data. You have a stack trace that tells you:

0x488776AAD()
0x446655FF4()
MyFunction() myfunction.c 476
main() main.c 876

Now what?


2) When you pass a bad pointer the library opens a dialog box:
"Invalid argument to function Library_input(), argument two"

You see?

Again, your example uses an incorrect, but otherwise good, pointer.
 

Ben Pfaff

BartC said:
Looking at the original functions, they are just applying some
abstraction/hiding, so are doing exactly what macros are for.

I would not describe abstraction or hiding as one of the main
reasons for macros, so I'd not use macros in this case.

I don't think that the "get_width" function could be portably
defined as a macro without expanding the node expression twice.

One of the reasons that I defined these as functions is because I
wasn't sure at first that I was going to use this
representation. For example, I could have used start and width
instead of start and end, or I could have used a different
representation that only defined start and end implicitly.
 

jacob navia

On 19/09/11 22:09, Ian Collins wrote:
Again, your example uses an incorrect, but otherwise good, pointer.

Ian, the context is the proposed solution with
a 64-bit magic number. You read 8 bytes and test for your
magic number. If you crash or the magic number is not there,
then it is a bad pointer, you see?
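
In code, the scheme amounts to something like this sketch (the struct,
field, and constant names are mine, for illustration):

    #include <stdint.h>

    #define WIDGET_MAGIC UINT64_C(0x57494447455400AA)  /* arbitrary 64-bit tag */

    struct widget
    {
        uint64_t magic;   /* must be the first 8 bytes of the object */
        /* ... the real fields ... */
    };

    /* Returns nonzero if P carries the expected tag.  The read itself
       may crash if P is wild, which this scheme also counts as
       detection. */
    int widget_check(const struct widget *p)
    {
        return p != NULL && p->magic == WIDGET_MAGIC;
    }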

I proposed that a few messages above, maybe you missed it.

jacob
 

Ike Naar

Ian, the context is the proposed solution with
a 64 bit magic number. You read 8 bytes and test for your
magic number. If you crash or the magic number is not there
then it is a bad pointer you see?

But if it does not crash and the magic number is there, you
still cannot be sure that the pointer was good.
 

jacob navia

On 19/09/11 22:50, Ike Naar wrote:
But if it does not crash and the magic number is there, you
still cannot be sure that the pointer was good.


Well, the chances of hitting a 64-bit number by chance, for a well-chosen
magic number, are 1 in 2^64...

Quite low really

:)
 

ImpalerCore

Thanks for your very informative message John

Excellent.

No problem. I've been chewing on this bone for quite a while. I'm
glad it sounds somewhat reasonable to at least one person :)

John D.
 

Ian Collins

On 19/09/11 22:09, Ian Collins wrote:

Ian, the context is the proposed solution with
a 64 bit magic number. You read 8 bytes and test for your
magic number. If you crash or the magic number is not there
then it is a bad pointer you see?

I proposed that a few messages above, maybe you missed it.

It was on a different branch.

Even that scheme will still miss valid, but wrong (pointing to wrong
objects of same type) pointers.

That's probably why most code simply checks for null and then proceeds.
Further checking tends to take you down a very steep slope of
diminishing returns.
 

Nobody

If you don't mind I would prefer not to use libraries you write.

I guess you don't use anyone's libraries then. Because, other than
checking for NULL pointers, I've never encountered a library which
attempts to validate the pointers which it is passed.
 

Ben Bacarisse

jacob navia said:
On 19/09/11 22:50, Ike Naar wrote:


Well, the chances of hitting a 64-bit number by chance for a well-chosen
magic number are 1 in 2^64...

I doubt that very much. Computers are predictable beasts, and you will
have these numbers dotted around in memory. They could get copied by
buggy code or they may simply be left lying around in deceptive places:

mytype *f(void)
{
  mytype t;
  /* ... */
  return &t;   /* dangling: t's storage is gone once f returns */
}

A function that uses the return from f may well find the magic number
still in place.

I am not saying the technique has no merit (I happen to have been
looking at some code that does exactly this a couple of days ago) but
the probability of false positives is very unlikely to be as low as you
hope.
 
