"null Considered Harmful"

Discussion in 'C Programming' started by Lynn McGuire, Dec 11, 2013.

  1. You still have to take them into account though. You can't avoid the
    fact that malloc's parameter is a size_t so it may be better to include
    a test for size < 0.
     
    Ben Bacarisse, Dec 12, 2013
    #21
    1. Advertisements

  2. One advantage of a signed type is that, in situation of being fed random
    garbage values, you'll hit a negative which can be flagged as an error,
    pretty quickly.
     
    Malcolm McLean, Dec 12, 2013
    #22
    1. Advertisements

  3. Lynn McGuire

    BGB Guest

    people also often regularly rant about for loops, variable assignment,
    ability to call stuff recursively, ...


    some stuff I have done in my own code is also seen as "pure evil" by
    some people (some parts of infrastructure are built on ugly hacks with
    pointers and function pointers), ...

    but, if all this stuff were taken away, you would be left with languages
    which can't really do much of anything.


    though, some things I wouldn't mind as much, like having a
    general-purpose bounds checking mechanism, mostly as array-overruns can
    often be difficult to track down and result in far reaching bugs.

    MSVC now (added fairly recently) does provide a special case
    bounds-checking mechanism, but I am annoyed at how stupid it can be,
    like it is either fully on or fully off, apparently not sufficiently
    smart as to figure out that indexing an array with a constant index will
    either always pass or always fail, and so does not need run-time bounds
    checks.

    luckily, it is easy to side-step: use a pointer to an array, no bounds
    checking:
    int foo[32];
    int *pfoo;

    foo[8]=3; //uses bounds checking at runtime...
    pfoo=foo;
    pfoo[8]=3; //no bounds checking...

    in some ways, a totally amazing piece of engineering there...
     
    BGB, Dec 12, 2013
    #23
  4. П> I think better solution is to use exceptions. You can't ignore
    П> an exception, whilst return code is very easy to ignore.

    Exceptions add overhead to the runtime, which makes them only a better
    solution in situations where that overhead is acceptable.

    This is why I'm glad they exist in Java and Perl and Javascript, and
    glad they don't exist in C.

    Charlton
     
    Charlton Wilbur, Dec 12, 2013
    #24
  5. Lynn McGuire

    BGB Guest

    hmm...

    void *newb_malloc(int size)
    {
    void *ptr;

    if(size == 0)
    { size = 1; }
    else if(size<0)
    {
    newb_logandtrap("Can't allocate negative sizes\n");
    }else if(size>(1<<24))
    {
    newb_logandtrap("Not allowed to allocate overly large memory\n");
    }else if(size>(1<<22))
    {
    newb_logandtrap("Not good to allocate large memory\n");
    }else ...

    ptr=malloc(size);
    if(!ptr)
    {
    newb_logandtrap("Out of memory\n");
    }
    return(ptr);
    }


    where logandtrap could do something like record information about the
    event into a log file and trap to the debugger or similar (such as via
    raising an exception or similar).

    maybe the library could also be set up such that the default handler
    displays an error box:
    "Something Happened"
    Abort, Retry, Ignore, Fail.
    or:
    OK, Cancel.


    better yet if a person could get a little animation and some annoying
    sound effects for their box, or make it rapidly shake in place, ...
    because, after all, it apparently works so well for website ads.
     
    BGB, Dec 12, 2013
    #25
  6. Lynn McGuire

    James Kuyper Guest

    I don't know anything about the details - but it is commonly claimed in
    C++ forums, that there are well-known techniques, in widespread common
    usage, for implementing C++ exceptions so that they incur no runtime
    overhead - the cost of using those techniques is that actually throwing
    an exception is slower than it would be if using the more naive
    implementations that do incur runtime overhead.
     
    James Kuyper, Dec 12, 2013
    #26
  7. Lynn McGuire

    Les Cargill Guest


    Why is checking all the return codes such an unreasonable burden
    anyway?
     
    Les Cargill, Dec 12, 2013
    #27
  8. Lynn McGuire

    Les Cargill Guest

    Well... that doesn't matter to "standards" enforcers who embrace a
    Mandarin mentality.
    I have never understood how this is such a burden. Just either
    globalize any such constraints or pass them as parameters.
    That sounds like a good thing.
    Doctor, doctor it hurts when I do that....
     
    Les Cargill, Dec 12, 2013
    #28
  9. Lynn McGuire

    James Kuyper Guest

    I know what you mean. We have coding guidelines that call for checking
    all error return codes, and I'm a believer in that idea, but it often
    ends up creating code that looks like it's main purpose is the creation
    of error messages. I once has someone submit a test driver for a routine
    which triggered every single error condition, but which failed to check
    whether the routine handled a successful call correctly.
     
    James Kuyper, Dec 12, 2013
    #29
  10. Lynn McGuire

    BGB Guest

    the issue is that lots of code either does not check bounds, or stuff
    goes out of bounds due to "simple coding errors", like accidentally
    allocating either the wrong amount of memory, or calculating the array
    index wrong.


    to some extent it is possible to use a "tripwire detector" or similar
    built into the memory manager, which can detect when code has ran off
    the end of an array (either when freeing memory, or periodically based
    on a timer or similar and doing a scan over the heap to make sure
    everything is still in-order), generally because memory objects may have
    "tripwires" before and after them, and in most cases, an overrun will
    mangle them, and when detected it is possible to analyze them and write
    the results out to a log or similar (the memory manager also keeps track
    of things like where memory allocation and free calls come from, and so
    can basically say where the offending memory-object is from, ...).


    but, it isn't really possible to catch the code which does so in the act
    (like with an explicit trap), and sometimes there might still be memory
    corruptions due to code which seemingly escapes the tripwire detector
    (writes to addresses not directly adjacent to the memory object in
    question).

    likewise, writes to random (garbage) addresses (such as from
    accidentally using uninitialized variables) may often not trap if the
    application uses most of the available address space (it only reliably
    traps if the address points at an address range owned by the OS).


    a few times I had thought it might be nice if, when an uninitialized
    variable is used by code, besides just generating a warning or similar,
    if it could also set the variable initially to a "known bad" address.

    like, to better deal with cases where the programmer has not gone and
    faithfully fixed up every "use of uninitialized variable 'foo'" or
    "'foo' must return a value" style warning.

    well, though in general I try to go and clean them up, but this is not
    always reliably done.

    bounds-checking, in general, yes.

    not being smart enough to optimize away some constant-index checks, not
    always so good.

    it can matter for code which tends to be CPU bound...

    a particular case where it came up was within a video codec, where these
    constant-index array accesses ate lots of clock cycles.

    code within video codecs tends to be fairly performance-sensitive, as
    wasting too many cycles in a codec can very easily lead to performance
    problems... (more so when a person is trying to squeeze speeds of like
    300+ megapixels per second out of the thing, like for multiple
    concurrent streams or for 4K+ or similar...).


    variable indexes could theoretically still be checked, as then it is
    doing its job, rather than just being stupid and wasting clock cycles.

    in a few cases this has led mostly to just using lots of variables
    instead of arrays, like:
    int i0, i1, i2, i3, ...;
    i8=3;

    but, then gets kind of annoying as the compiler's optimizer sometimes
    goes and rewrites things in stupid ways.

    like sometimes, one will store something to a variable, and the
    optimizer is like "hey, I will go inline/repeat this expression every
    time that variable is used", which is sometimes not good:
    say, the programmer had a specific reason for factoring out the
    expression into its own variable (say, because calculating something
    once and using a variable can often be cheaper than recalculating it a
    bunch of times in a row, ...).


    other times, it is like, one writes:
    memcpy(dst, src, 8);
    and the compiler is like, "I will helpfully rewrite this to":
    mov edi, [...]
    mov esi, [...]
    mov ecx, 8
    rep movsb
    (*)

    and the programmer is left having to manually intervene, as it is
    generally preferable to have the compiler generate something like:
    mov ecx, [...]
    mov edx, [...]
    mov [...], ecx
    mov [...], edx
    or, better yet:
    movq xmm0, [...]
    movq [...], xmm0


    *: actually, "rep movsb" and friends can go "surprisingly fast" for
    larger memory copies (several KiB or more), where they can outperform
    more generic loops, but this doesn't mean it is the best choice for
    small fixed-size copies.


    so, sometimes, one is left to wonder if MSVC's code-generator is
    brain-damaged, or contains a "cleverness seemingly beyond the
    understanding of mortal men".
     
    BGB, Dec 12, 2013
    #30
  11. LC> Why is checking all the return codes such an unreasonable burden
    LC> anyway?

    It isn't. But if the function in question has 10 lines of logic, and
    200 lines of checking for all the possible error conditions, your code
    will be unreadable.

    Exceptions means that you can write the 10 lines of logic clearly,
    provide a way to handle the exceptions that you can handle, and ignore
    all the ones you can't, for them to be caught at a higher level.

    There are other virtues besides performance; simple, readable code that
    focuses the maintainer's attention on the common expected cases, in
    this day where my *phone* has a dual-core 64 bit processor, is likely to
    save more money than code that shaves 3 cycles off each loop iteration.

    Charlton
     
    Charlton Wilbur, Dec 12, 2013
    #31
  12. Lynn McGuire

    Les Cargill Guest

    I can sympathize somewhat, but it's also possible to do the error
    checking in a called function which hides it all.
     
    Les Cargill, Dec 13, 2013
    #32
  13. Lynn McGuire

    Les Cargill Guest

    So I have habits that *preclude* that sort of thing. It's too detailed,
    but enforce your constraints with the furniture provided by the language
    system.
    *Shudder*. -Wall.
    Ah, the 0xDEADBEEF thing...
    Agreed. Although you really should fix those before it leaves
    your hands.
    I don't know why not.
    That's probably true.

    <snip>
     
    Les Cargill, Dec 13, 2013
    #33
  14. Lynn McGuire

    Eric Sosman Guest

    In my experience, that would be the exception (sorry) and
    not the rule.

    The hard part about dealing with a failure is not checking
    a result code, but figuring out what to do when the result says
    R-O-N-G. The decision about how to respond typically involves
    a large amount of the caller's context -- in fact, a large amount
    of the caller's caller's caller's caller's context (which is why
    strategies like "call abort() on any failure" are stupid, and
    "call a caller-supplied callback, passing it not enough information"
    are even stupider).

    Checking for errors is usually[*] pretty simple:

    char *p = malloc(n);
    if (p == NULL) { ... }

    It's what goes in the "..." that's difficult.

    [*] But not always. Can you spot the gaffe in

    FILE *stream = ...;
    int ch;
    while ((ch = getc(stream)) != EOF ) {
    ...
    }

    ? Hint: The GNU coding standards forbid you from caring about
    it, even if you do spot it.
     
    Eric Sosman, Dec 13, 2013
    #34
  15. Lynn McGuire

    Les Cargill Guest

    Har! :)

    I'd rather see code that handles all the errors cases properly
    and clearly and up front than try to mix error evaluation
    and "real work".

    I've worked on a lot of systems where you do *all* the error checking up
    front, then a "commit" at the end. You have to manage partial updates,
    that sort of thing.

    If the "commit" fails, that's a BIG error, and we either crash or
    get adult help, or erase as much of the "transaction" as we can. That's
    usually a "somebody unplugged is in the 100 microseconds since we did
    all the error checking" thing.

    I also save off the previous values of things we changed and keep
    a list of things we added, in order to restore to the previous state.

    These are habits arrived at from old '90s CASE tools and
    writing things like SNMP agents.
    Fair enough. See below (1) for my high-torque tool of choice for
    this.
    Ha! That old warhorse. :) You have to break it up or at least call
    feof() *AND* check ch. If GNU does not like "while (1)" with
    "break;"s, then they are missing a good thing. The preference for
    using only predicate trees troubles me - I think it's a good
    way to write unreadable code.

    I haven't used getc() or the like in a very long time. My dominant mode
    is to use fread() or read() and unmarshal from a buffer. It's a bit more
    code, but the library functions are just a little too weird. And I try
    to never use a blocking or buffered channel if I can avoid it.

    (1)
    Past a certain level of complexity ( say, an if() {} else if () else {}
    tree about six long ), I will go to a state machine. I work very hard
    at *not* using nested ifs if I can. I prefer "cascaded" ones.

    that at least puts all the invariants in a list-like structure for you,
    and allows you to manage time-order dependencies between invariants.

    In a way, this makes it worse ( it's less like natural language ) but
    it's habit now. It's a bit of a trade towards being correct the first
    time rather than having nicer, more friendly code later ( there are
    soldier words when I have to read them again, too ) but by my lights,
    it's usually worth it.

    Also also - for *scripting* languages with an eval(), having "arrays
    of text" indexed by strings ( or numbers ) is a heck of a great way to
    write state machines.
     
    Les Cargill, Dec 13, 2013
    #35
  16. Lynn McGuire

    BGB Guest

    tends to happen sometimes.

    the main issue mostly is the hassle of tracking down these sorts of
    issues in a largish codebase.

    "-Wall", specifically, is a GCC feature.

    there is "/Wall /WX" though, which basically can do similar to "-Wall
    -Werror" though.

    0xDEADBEEF and 0xBAADF00D and similar are useful, but pop up in fairly
    specific cases (malloc, ...).

    more often local variables just contain whatever was previously in this
    location on the stack, which isn't as useful.

    effort mostly, sometimes when going and throwing something together or
    working on something else, not all warnings end being up addressed
    immediately, though every so often, an effort is made to go clean up all
    the warnings within a given library or similar.

    yeah.


    I am probably one of those strange people, who primarily develops things
    like codecs mostly as plain C code (mostly ASM-free) without relying on
    multithreading or GPGPU stuff.

    but, granted, it also requires a bit of careful coding (and a little bit
    of "magic") to get good performance out of these sorts of things.


    most of my DCT-style codecs seems to cap out at around just short of 100
    Mpix/sec (for comparison, XviD reaches 105 Mpix/sec on my PC with it
    having a lot of ASM thrown in the mix).

    I can reach 300-500 Mpix/sec, but generally this is with VQ-style codecs
    targeting special image representations (such as DXT or BC7) rather than
    generic RGBA (*).

    *: it is harder than it may seem to get pixel arrays filled all that
    quickly. a direct comparison between XviD and a VQ codec is cheating
    though IMO, and XviD seems pretty hard to beat as far as the speed of
    DCT-based codecs goes AFAICT.

    FWIW, 62 Mpix/sec is sufficient for 1080p30 (1080p60 needs 125 though,
    and 2160p30 needs 249).


    but, faster is better, and leaves more time for the CPU to do other things.

    likewise, VQ allows higher speeds generally at the cost of worse
    compression (the goal being to get high speeds with at least passable
    compression).
     
    BGB, Dec 13, 2013
    #36
  17. Basically I agree with you.
    But oddly enough, both very high end and very low end hardware tend to have
    precious cycles. On very low end hardware, obviously, the device might not
    really be powerful enough to do what you want. On very high end hardware,
    the reason for using the hardware at all is usually that you have a "heroic"
    computation to do. You generally have to book very limited processor time,so
    shaving cycles is a great advantage.
     
    Malcolm McLean, Dec 13, 2013
    #37
  18. Lynn McGuire

    BGB Guest

    usually, in my case, it amounts to one of:
    critical problem, blow up and trap to debugger;
    non-critical problem: maybe log the event, try to handle gracefully
    (usually means passing an error status back to the caller and otherwise
    having the call behaving as a no-op).

    eventually, either it will reach a point where the error is either
    ignorable, becomes part of the normal program logic, or can otherwise be
    dealt with as appropriate.


    OTOH, simply exiting on error is usually stupid, as then one ends up
    with apps which just dump the user back out on the desktop or similar
    for no obvious reason, which is actually one of the few options actually
    worse than a "This program has performed an illegal operation and must
    be shut down." box, as at least in the case of the box, you know that it
    was because of a program crash...

     
    BGB, Dec 13, 2013
    #38
  19. Lynn McGuire

    BartC Guest

    Yes, what about script languages; you can write, for example:

    a = b+c

    which, where b and c are large data structures, might involve allocating
    memory. But what happens when the allocation fails? How do you even test for
    it? These languages might offer exceptions, but do you really want to test
    every single operation and assignment in hundreds of lines of code?

    Implemented in C, however, the suggestion seems to be that you *have* to do
    just that: have checks scattered all over the actual code, rather than have
    one check in a wrapper around malloc() for example. Or to have such checks
    separated out and hidden away from the 'user' code.

    (Since, as was mentioned, the hard bit is not checking these things,
    although that will destroy the structure and readability of any code, as
    well as introducing more scope for errors; it's what on earth do you do when
    there is an error.)
     
    BartC, Dec 13, 2013
    #39
  20. Lynn McGuire

    BartC Guest

    If we're talking about memory allocation failures, then it will usually be
    for one of two reasons: there's either a bug in the software that makes it
    use up all the memory, or (more rarely) the machine has actually run out of
    memory because the program needs too much of it. (Or, sometimes, it might be
    other programs that are hogging most of the memory).

    In any case, usually there's not much that can be done other than report the
    fact and then quit, if this was a routine allocation that was expected to
    work. It's too bad if this has happened to a customer, then you need to
    establish the reason.

    In some cases an out-of-memory condition (ie. an attempt to allocate memory,
    that failed) *can* be handled by the software, but the code will know about
    it and will use a different allocator that doesn't just quit. So in an
    interactive app, for example, it can report that a particular operation
    failed, but then carries on and the user can try something else.
     
    BartC, Dec 13, 2013
    #40
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.