Non-constant constant strings

Discussion in 'C Programming' started by Rick C. Hodgin, Jan 19, 2014.

  1. Yeah. Don't use it for that. But if you're writing an algorithm that
    operates on some data in some way, then you can test that algorithm
    immediately, which is what I'm talking about.

    Best regards,
    Rick C. Hodgin
     
    Rick C. Hodgin, Feb 5, 2014

  2. Rick C. Hodgin

    David Brown Guest

    I believe various C-like interpreters have been made over the years, but
    none have been very popular - basically, if you first decide to use an
    interpreted model (and thus get easy equivalents to
    "edit-and-continue"), there are better languages for the job. C or
    C-like interpreters would only be useful for specifically checking
    something you write in C. And in such cases, you can often transplant
    the code into a minimal test program, taking a fraction of a second to
    compile and run.

    But I agree with you that features such as bounds checking could often
    be useful in debugging C code. There are quite a lot of related tools,
    such as "valgrind", instrumentation functions, libraries with extra
    checks, etc. But there is always scope for more. For example, gcc has
    a "-fbounds-check" option, but it is only supported for Java and Fortran
    (Ada has it automatically, I think). Supporting it for C (or C++, where
    exceptions give an obvious choice of how to react to a bounds error)
    would be helpful for some types of debugging.

    Yes, because some bugs are only noticeable when you are optimising. For
    example, if you've got code that has aliasing issues, it might run as
    you wanted when optimising is disabled, but fail when the compiler takes
    advantage of type-based aliasing analysis to produce faster code. It is
    for this sort of reason that I normally use a single set of options for
    my compiling - with fairly strong optimisation as well as debug
    information - rather than having separate "debug" and "release"
    configurations. Occasionally I need to reduce optimisation while
    identifying weird bugs, but mostly it is constant throughout development.
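
    To make the aliasing point concrete, here is a minimal C sketch (the
    name float_bits is hypothetical). Reading a float through a uint32_t
    pointer, as in *(uint32_t *)&f, breaks the effective-type rules, so
    gcc's -fstrict-aliasing (enabled at -O2 and above) is entitled to assume
    it never happens. Copying the bytes with memcpy is the well-defined
    alternative. The sketch assumes float and uint32_t have the same size.

```c
#include <stdint.h>
#include <string.h>

/* This sketch assumes float is 32 bits wide, like uint32_t. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Returns the object representation of f as an integer.
 *
 * The tempting version, *(uint32_t *)&f, accesses a float object through
 * a uint32_t lvalue. That is undefined behaviour, and an optimiser using
 * type-based alias analysis may reorder or drop such accesses, so it can
 * appear to work at -O0 and break at -O2. memcpy copies the bytes
 * instead, which is well defined and usually compiles to a single move. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

    On an IEEE 754 platform, float_bits(1.0f) yields 0x3F800000 at any
    optimisation level.
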

    I sometimes do things like manually manipulate the program counter from
    within the debugger, to get this effect. Most of my work is embedded
    systems, and in some systems the "compile, download and run" cycle is
    long (the compile is typically quick, but downloading or reprogramming
    flash can be slow on some microcontrollers). Edit-and-continue might be
    useful in such systems - unfortunately, it would be completely
    impractical to implement.

    Fair enough point.
     
    David Brown, Feb 6, 2014

  3. Rick C. Hodgin

    BartC Guest

    That's not surprising, as I'm not sure whether 'bounds' checking could
    even be meaningful in C.

    With statically allocated arrays, it seems straightforward - until you pass
    the array to a function. Then, and in all other cases of indexing, you only
    have a pointer.

    Then you need a lot of things going on behind the scenes in order to somehow
    have pointers carry bounds with them. But it is not always obvious what
    the bounds are supposed to be, if there are any at all (for pointers to
    individual elements, for example). The flexibility to do anything
    with pointers makes it harder to apply bounds checking, except perhaps for
    hard memory limits.

    It would definitely help, and be better than a crash, or (worse) no crash,
    as you quietly read or overwrite something else.

    Is there any option on these to make use of external ram to run the program
    from? (I started off on this sort of thing at a time when reprogramming
    meant an eprom erase/program cycle, or, if there was ram, a formal
    compile might have meant several minutes of floppy disks grinding away. By
    using ram, and my own tools and hardware setup (we couldn't afford an ICE
    anyway), the development cycle was just as fast as I could type.)
     
    BartC, Feb 6, 2014
  4. Rick C. Hodgin

    David Brown Guest

    Indeed - bounds checking would only be meaningful in some situations.
    (Newer versions of gcc do compile-time checking for particularly obvious
    cases.)

    The most practical "bounds checking in C" is probably to use C++
    container classes.

    Special debugging libraries and tools can help (I believe they use
    memory mapping tricks, such as making the space beyond a malloc'ed area
    trigger processor exceptions).
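
    A rough POSIX sketch of that trick, similar in spirit to what Electric
    Fence does (guard_alloc and guard_free are hypothetical names, not a
    library API). It assumes mmap with MAP_ANONYMOUS, which Linux, the BSDs
    and macOS provide. The buffer is placed so that it ends exactly at a
    page boundary and the page after it is made inaccessible, so the first
    access past the end faults instead of silently corrupting memory. The
    cost is at least one extra page per allocation, and the returned
    pointer is only as aligned as n allows.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of whole pages needed to hold n bytes. */
static size_t pages_for(size_t n, size_t page)
{
    return (n + page - 1) / page;
}

/* Allocates n bytes that end right before a PROT_NONE guard page,
 * so touching buffer[n] faults immediately. */
static void *guard_alloc(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data = pages_for(n, page) * page;
    unsigned char *region = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return NULL;
    /* Revoke all access to the page that follows the data pages. */
    if (mprotect(region + data, page, PROT_NONE) != 0) {
        munmap(region, data + page);
        return NULL;
    }
    return region + data - n;
}

/* Releases a buffer from guard_alloc; n must match the original request. */
static void guard_free(void *p, size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data = pages_for(n, page) * page;
    unsigned char *region = (unsigned char *)p + n - data;
    munmap(region, data + page);
}
```

    For example, after unsigned char *b = guard_alloc(100), b[99] is usable
    but b[100] lands on the guard page and raises a segmentation fault.
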

    That depends entirely on the microcontroller. For some, such as the
    AVR, you cannot execute code from ram. On others, such as ARM Cortex-M
    devices (which are extremely popular these days), you can run fine from
    ram. But your ram is usually much smaller than your flash, and there
    can also be significant timing differences, so it is not always easy.
    When I can, I do most of the development on a bigger device with more
    ram, and run the program from ram - only at later stages do I use the
    final target chip and flash. Putting the program in ram also usually
    means you can be flexible about breakpoints - with code in flash, you
    are limited to the on-chip debugger's "hard" breakpoints.
     
    David Brown, Feb 6, 2014
  5. Rick C. Hodgin

    David Brown Guest

    The standard defines bounds for a pointer, but I feel there is no way to
    enforce it completely (even if the pointer carries around a "min" and
    "max" bound). For example, a pointer can be converted to an
    appropriately sized integer type, then converted back again (someone
    will no doubt quote chapter and verse if I get this wrong). As long as
    you don't change that integer, the result is guaranteed correct behaviour.
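
    For reference, the guarantee being described (C99 7.18.1.4) is stated
    for void *: converting a valid void * to intptr_t or uintptr_t and back
    yields a pointer that compares equal to the original. A minimal sketch
    with a hypothetical helper name; intptr_t is optional, hence the
    INTPTR_MAX check.

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef INTPTR_MAX /* defined only when the implementation provides intptr_t */

/* Converts p to intptr_t and back, reporting whether the round trip
 * produced a pointer that compares equal to the original. The standard
 * guarantees this for any valid void *, as long as the integer is not
 * modified in between. */
static bool roundtrips_through_intptr(void *p)
{
    intptr_t i = (intptr_t)p;
    void *q = (void *)i;
    return q == p;
}

#endif
```
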

    Maybe I'm wrong, and the "fat pointer" could cover all cases - but as
    you say, it would be quite inefficient for most uses.
     
    David Brown, Feb 6, 2014
  6. Rick C. Hodgin

    ais523 Guest

    You could encode the bounds of the fat pointer in the integer, too. It'd
    have to be a pretty large integer type, but there's no reason why an
    implementation can't have a really large intptr_t just to be able to
    hold a fat pointer. (Also, IIRC it's possible to have an implementation
    with no intptr_t, but that would be less useful than an implementation
    that did have one.)
     
    ais523, Feb 6, 2014
  7. Rick C. Hodgin

    David Brown Guest

    I suppose that would work, unless there are restrictions on intptr_t
    such as requiring it to be synonymous with an existing integer type (to
    store a fat pointer, it would really need to be a "long long long" !).
     
    David Brown, Feb 6, 2014
  8. Rick C. Hodgin

    BartC Guest

    If I have a four-element array A, how does it know whether &A[1] is:

    - a pointer into A[0] ... A[3] which is the whole original array

    - a pointer into the slice A[1] ... A[3], a slice of the original (I can
    pass any slice to a function, complete with the length I want, but how will
    it know my intentions)

    - a pointer only to the element A[1]

    With bounds errors occurring when I attempt arithmetic on the pointer (or
    maybe when I try to dereference the result).

    In languages where arrays and pointers are distinct, and where arrays might
    have stored bounds anyway, it's a lot easier. It can also be more efficient
    (compare a potential index against a limit) compared with C where you might
    have a lower and upper limit, when a pointer points into the middle of an
    array.
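
    A small sketch of the efficiency point (function names are
    hypothetical). With a stored length, a single unsigned comparison
    rejects both too-large and negative indexes, because a negative index
    converted to size_t becomes a very large value. A pointer into the
    middle of an array needs both a lower and an upper limit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Array with a stored length: one comparison covers both ends, since a
 * negative index converted to size_t wraps to a huge value. */
static bool index_in_bounds(size_t idx, size_t len)
{
    return idx < len;
}

/* Pointer into the middle of an object: two comparisons are needed.
 * lo is the first element and hi is one past the last. Relational
 * comparison is only defined when all three point into (or one past
 * the end of) the same array. */
static bool ptr_in_bounds(const int *p, const int *lo, const int *hi)
{
    return p >= lo && p < hi;
}
```
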
     
    BartC, Feb 6, 2014
  9. Rick C. Hodgin

    Tim Rentsch Guest

    As a point of information, gets() was listed as both deprecated
    and obsolescent in n1256.
     
    Tim Rentsch, Feb 6, 2014
  10. Rick C. Hodgin

    James Kuyper Guest

    On 02/06/2014 11:04 AM, David Brown wrote:
    ....
    The only requirements on intptr_t and uintptr_t are that, if an
    implementation chooses to provide them (which it need not), they must be
    sufficiently large [un]signed integer types. They can be either standard or extended
    integer types. I expect they would usually be implemented as typedefs
    for types that can be named by other means, but that's not actually
    required - for instance, <stdint.h> could contain a #pragma that turns
    on recognition of those names as built-in types just like "int" or "double".

    Of course, it wouldn't be an obstacle even if intptr_t and uintptr_t were
    required to be typedefs for standard types. The standard imposes only a very low
    minimum on the amount of memory an implementation is required to
    support, and no upper limit on the size of [unsigned] long long, so an
    implementation could always choose to make [unsigned] long long large
    enough to store a fat pointer.
     
    James Kuyper, Feb 6, 2014
  11. (snip on bounds checking in C)
    In the 80286 and OS/2 1.0 days (when most people were running MS-DOS
    on their 80286 machines) I was debugging a program that used lots of
    arrays, and had a tendency to go outside them.

    Instead of malloc(), I called the OS/2 segment allocator to allocate
    a segment of the requested length, put a zero offset onto it, and used
    it as a C pointer. This was in the large memory model, where pointers have
    a segment selector and offset. The segment selector is an index into
    a table managed by OS/2 for allocated segments. When you load a segment
    register with a selector, the hardware loads a segment descriptor
    register with the appropriate descriptor, which includes the length.
    The hardware then checks that the offset is within the segment on every
    memory access. Segments can be up to 65536 bytes long (it might be
    65535; I might have forgotten). For 2D arrays, I would allocate arrays
    of pointers, which conveniently never needed to be larger than that.

    The overhead in this process is in the loading of segment descriptors,
    but OS/2 had to do it anyway. A segment descriptor cache in hardware
    would have made it faster, but Intel never did that.

    In this system, int was 16 bits, and all array arithmetic was done
    with 16 bits, so bounds checking would fail if you managed to wrap.
    If you access element 8192 of a (double) array, for example, it would
    not be detected. That was rare enough. With 16 bit int, you could wrap
    even before the subscript calculation, a general hazard on 16 bit
    machines.
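
    The wraparound can be reproduced on any modern machine by doing the
    offset arithmetic modulo 65536, as a 16-bit int would (offset16 is a
    hypothetical helper, and 8-byte doubles are assumed). Element 8192 of a
    double array starts at byte 65536, which wraps to offset 0 and so looks
    like a perfectly valid access to element 0.

```c
#include <stdint.h>

/* Byte offset of element idx in an array whose elements are elem_size
 * bytes, truncated to 16 bits the way a 16-bit int computation would be.
 * The multiplication is done in 32 bits, then reduced modulo 65536. */
static uint16_t offset16(uint16_t idx, uint16_t elem_size)
{
    return (uint16_t)((uint32_t)idx * elem_size);
}
```

    A segment-limit check on offset 0 passes, so the out-of-range access
    goes unnoticed.
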

    Note that you can add to or subtract from pointers all you want, pass them
    to called functions, and the system still knows where the bounds are.

    -- glen
     
    glen herrmannsfeldt, Feb 6, 2014
  12. It could be an extended integer type. For a 64-bit system, a fat
    pointer would likely be 256 bits.

    Note that if you provide a 256-bit integer type, then intmax_t and
    uintmax_t have to be names for that type (unless there's something even
    bigger), which could affect performance for code that uses intmax_t.
    But fat pointers themselves are going to affect performance anyway.
     
    Keith Thompson, Feb 6, 2014
  13. So it is. I'm sure I knew that at one time, but I had forgotten.

    The change was introduced by Technical Corrigendum #3, published in
    2007, in response to Defect Report #332. (The proposal in the DR was to
    allow it to copy at most BUFSIZ-1 characters, discarding any excess
    characters; the committee chose to go further than that.)

    http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_332.htm
     
    Keith Thompson, Feb 6, 2014
  14. Suppose you have fat pointers. Each allocated object (created either by
    an object definition or by an allocation call) has a base address and a
    size. A pointer contains the base memory address of the pointed-to
    object, the total size of the object, and the byte offset within that
    object to which the pointer points. (Or, equivalently, you could store
    the pointed-to address directly, with a possibly negative offset for the
    enclosing object).

    Suppose A has 4 elements of 4 bytes each. Then &A[1] would yield a fat
    pointer containing the address of A, the size of A (16 bytes), and an
    offset (4 bytes).

    (You could also store sizes and offsets as element counts rather than
    byte counts.)

    Pointer arithmetic would check against the bounds information stored in
    the fat pointer. For example, you could subtract up to 4 bytes from the
    fat pointer resulting from evaluating &A[1]; subtracting more than that
    would be an error.
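
    A minimal sketch of the fat pointer described above, storing base, size
    and byte offset (the fat_ptr type and fat_* functions are hypothetical).
    Arithmetic is refused unless the result stays within [0, size], where
    size is the one-past-the-end position C already permits, and an access
    of n bytes is allowed only if all n bytes lie inside the object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: enclosing object plus position within it. */
typedef struct {
    unsigned char *base; /* start of the enclosing object */
    size_t size;         /* total size of the object in bytes */
    size_t offset;       /* byte offset of the pointed-to location */
} fat_ptr;

static fat_ptr fat_make(void *object, size_t size, size_t offset)
{
    fat_ptr p = { (unsigned char *)object, size, offset };
    return p;
}

/* Checked pointer arithmetic by delta bytes. Returns false, leaving p
 * unchanged, if the result would fall outside [0, size]. */
static bool fat_add(fat_ptr *p, ptrdiff_t delta)
{
    if (delta < 0) {
        /* Negate in size_t arithmetic so PTRDIFF_MIN cannot overflow. */
        size_t back = -(size_t)delta;
        if (back > p->offset)
            return false;
        p->offset -= back;
    } else {
        size_t fwd = (size_t)delta;
        if (fwd > p->size - p->offset)
            return false;
        p->offset += fwd;
    }
    return true;
}

/* True if n bytes starting at p lie entirely inside the object. */
static bool fat_deref_ok(const fat_ptr *p, size_t n)
{
    return n <= p->size - p->offset;
}

/* The ordinary C pointer that p denotes. */
static void *fat_raw(const fat_ptr *p)
{
    return p->base + p->offset;
}
```

    With A as in the example (four 4-byte elements), fat_make(A, 16, 4)
    represents &A[1]; fat_add by -4 succeeds and by -5 is rejected.
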

    [...]
     
    Keith Thompson, Feb 6, 2014
  15. Rick C. Hodgin

    David Brown Guest

    I believe there are modern systems that do pretty much the same thing,
    such as Electric Fence.
     
    David Brown, Feb 7, 2014
  16. Rick C. Hodgin

    David Brown Guest

    <http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging>
    <https://code.google.com/p/address-sanitizer/wiki/ComparisonOfMemoryTools>

    It seems there are tools doing this sort of thing, and there are
    slowdowns involved.
     
    David Brown, Feb 7, 2014
  17. On Mon, 3 Feb 2014 21:43:10 +0000 (UTC), glen herrmannsfeldt
    I'd say rather that most interpreters need to keep >and use< full type
    and location information -- traditionally the 'symbol table' --
    whereas compilers can and traditionally did discard it, and then add
    some to most of it back in as 'debug symbols', which often work for
    the simple cases but not quite all the cases I need.

    IME Windows CMD seems to re-read .bat files nearly always. In
    particular, every time I have started running a .bat and then looked in
    my notepad window, realized 'oh that's not quite right' and saved a
    change, the running CMD promptly died or went nuts.

    OTOH if you try to do anything seriously complicated with .bat you
    probably deserve what you get.

    I don't see that he expects that. He expects the toolchain to keep
    track of where they are, and that's exactly what the (gigantic) .pdb
    does. The compiler can put them wherever it thinks best as long as the
    debugger can find them (using the .pdb) and the recompiled code can
    use the same (ditto). If the compiler's choices/guesses don't pan out,
    that's the 1% or 10% or whatever times he must restart. This is the

    Nit: separate (virtual) memory regions, which you usually want anyway
    just for simplicity; but rarely if ever address spaces.

    Catching a return to a patched routine is easy, it's the same as
    catching a return to a routine at all, which is pretty common. It does
    depend on your stack not getting clobbered, but all symbolic debugging
    and even most machine-level debugging depends on that.

    Like apparently quite a few others, I don't find edit-and-continue
    very valuable, certainly not enough to tie myself down to MSVS. But it
    is there, it is free-beer, it apparently does mostly work, and it's no
    skin off my nose if he or anyone else likes it.
     
    David Thompson, Feb 9, 2014
  18. Rick C. Hodgin

    Seebs Guest

    I'm not entirely sure this is right. The famous example is two-dimensional
    arrays, where there have been official responses that, yes, a compiler is
    allowed to bounds-check the bounds of a sub-array.

    -s
     
    Seebs, Feb 14, 2014
  19. Allowed, and, at least for C89 the sub-array bounds are known.

    But more often it is an array of pointers to 1D arrays, in which case
    that problem doesn't come up.
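
    For contrast, a sketch of the array-of-pointers layout (alloc_rows and
    free_rows are hypothetical helpers). With a true 2D array such as
    int a[4][5], C99 Annex J.2 lists a[1][7] as undefined even though the
    storage is there, because the subscript exceeds the bounds of the
    sub-array a[1]. When each row is allocated separately, every row is its
    own object, so a checker only needs the bounds of one 1D allocation.

```c
#include <stdlib.h>

/* Allocates a rows x cols matrix as an array of row pointers, each row a
 * separate malloc'd object. Returns NULL on failure, freeing any partial
 * allocation. */
static double **alloc_rows(size_t rows, size_t cols)
{
    double **m = malloc(rows * sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++) {
        m[i] = malloc(cols * sizeof *m[i]);
        if (m[i] == NULL) {
            while (i > 0)
                free(m[--i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Frees every row, then the array of row pointers. */
static void free_rows(double **m, size_t rows)
{
    if (m == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(m[i]);
    free(m);
}
```
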

    -- glen
     
    glen herrmannsfeldt, Feb 14, 2014
  20. in <>:
    #> If a, b, and c are of some built-in type, then `a = b + c;` cannot
    #> involve any function calls in either C or C++.
    #
    # Sure it can. It could involve a call to a function called __addquad().

    I remember Dan Pop once told me about a C implementation for a Z80 (an
    8-bit CPU). On this CPU, 16-bit addition uses two of three register pairs
    ("ADC HL,BC") and I'm confident it would call a function saving and
    restoring the registers clobbered.
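
    To illustrate the kind of helper such an implementation might call for
    a built-in operation, here is a sketch of a 32-bit addition built only
    from 16-bit additions and an explicit carry (add32_via16 is a
    hypothetical name; it is not meant to reproduce __addquad() or any real
    runtime library routine).

```c
#include <stdint.h>

/* Adds two 32-bit values using only 16-bit additions plus a carry, the
 * way a runtime routine for a narrow CPU might when the compiler turns
 * a plain `a = b + c;` into a call. */
static uint32_t add32_via16(uint32_t b, uint32_t c)
{
    uint16_t b_lo = (uint16_t)b, c_lo = (uint16_t)c;
    uint16_t b_hi = (uint16_t)(b >> 16), c_hi = (uint16_t)(c >> 16);

    uint16_t lo = (uint16_t)(b_lo + c_lo);
    /* Unsigned addition wrapped iff the result is smaller than an operand. */
    uint16_t carry = lo < b_lo;
    uint16_t hi = (uint16_t)(b_hi + c_hi + carry);

    return ((uint32_t)hi << 16) | lo;
}
```
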

    Regards,

    Jens
     
    Jens Schweikhardt, Feb 22, 2014