Non-constant constant strings


Rick C. Hodgin

People doing weather forecasting have even more strict requirements.
If it takes 10 days to predict tomorrow's weather, it is useless.

Yeah. Don't use it for that. But if you're writing an algorithm that
performs some computation on some data, then you can do immediate testing
on that algorithm, which is what I'm talking about.

Best regards,
Rick C. Hodgin
 

David Brown

(snip on edit-and-continue)

Seems to me that one could write a C interpreter, or something close
to one, that would allow for the usual abilities of interpreted
languages. Among other things that could make C easier/faster to
debug would be full bounds checking.

I believe various C-like interpreters have been made over the years, but
none have been very popular - basically, if you first decide to use an
interpreted model (and thus get easy equivalents to
"edit-and-continue"), there are better languages for the job. C or
C-like interpreters would only be useful for specifically checking
something you write in C. And in such cases, you can often transplant
the code into a minimal test program, taking a fraction of a second to
compile and run.

But I agree with you that features such as bounds checking could often
be useful in debugging C code. There are quite a lot of related tools,
such as "valgrind", instrumentation functions, libraries with extra
checks, etc. But there is always scope for more. For example, gcc has
a "-fbounds-check" option, but it is only supported for Java and Fortran
(Ada has it automatically, I think). Supporting it for C (or C++, where
exceptions give an obvious choice of how to react to a bounds error)
would be helpful for some types of debugging.
Well, even more, does that change the bugs that need to be found?

Yes, because some bugs are only noticeable when you are optimising. For
example, if you've got code that has aliasing issues, it might run as
you wanted when optimising is disabled, but fail when the compiler takes
advantage of type-based aliasing analysis to produce faster code. It is
for this sort of reason that I normally use a single set of options for
my compiling - with fairly strong optimisation as well as debug
information - rather than having separate "debug" and "release"
configurations. Occasionally I need to reduce optimisation while
identifying weird bugs, but mostly it is constant throughout development.
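
To make that concrete, here is a minimal sketch (not from the thread) of
the classic sort of strict-aliasing bug. With optimisation off it will
usually print 0.000000, but with type-based aliasing analysis enabled
(say, gcc at -O2) the compiler may assume the store through "i" cannot
touch "*f" and print 1.000000 instead; both are allowed, since the
behaviour is undefined. The names are just illustrative, and the usual
32-bit float/uint32_t layout is assumed:

#include <stdio.h>
#include <stdint.h>

float alias_bug(float *f, uint32_t *i)
{
    *f = 1.0f;
    *i = 0;        /* aliases *f in the caller, but the compiler may assume it does not */
    return *f;
}

int main(void)
{
    float x;
    printf("%f\n", alias_bug(&x, (uint32_t *)&x));   /* undefined behaviour */
    return 0;
}
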
If it takes one second to start the program from the beginning and
get to the point of the bug, is it really worth edit-and-continue?

For me, when I am debugging I want to be thinking as much as I can
about the program I am working on. What it could possibly be doing
wrong, and how to fix that. If I also have to think about other causes
for bugs, such as having the wrong pointer due to edit-and-continue,
then I can't completely think about the program.

Agreed.

As I wrote, I did years ago use a BASIC system that allowed for
edit and continue. (You could even continue at a different place.)
But later, due to changes in ORVYL, that feature was removed.

I sometimes do things like manually manipulate the program counter from
within the debugger, to get this effect. Most of my work is embedded
systems, and in some systems the "compile, download and run" cycle is
long (the compile is typically quick, but downloading or reprogramming
flash can be slow on some microcontrollers). edit-and-continue might be
useful in such systems - unfortunately, it would be completely
impractical to implement.
They might not be so rare, but in many cases other ways have been
found to get around them.

Fair enough point.
 

BartC

David Brown said:
On 05/02/14 18:31, glen herrmannsfeldt wrote:
But I agree with you that features such as bounds checking could often
be useful in debugging C code. There are quite a lot of related tools,
such as "valgrind", instrumentation functions, libraries with extra
checks, etc. But there is always scope for more. For example, gcc has
a "-fbounds-check" option, but it is only supported for Java and Fortran
(Ada has it automatically, I think).

That's not surprising, as I'm not sure whether 'bounds' checking could
even be meaningful in C.

With statically allocated arrays, it seems straightforward - until you pass
the array to a function. Then, and in all other cases of indexing, you only
have a pointer.

Then you need a lot of things going on behind the scenes in order to somehow
have pointers carry bounds with them. But it is not always that obvious what
the bounds are supposed to be, if there are any at all (for pointers to
individual elements for example). The flexibility to be able to do anything
with pointers makes it harder to apply bounds checking, except perhaps for
hard memory limits.
Supporting it for C (or C++, where
exceptions give an obvious choice of how to react to a bounds error)
would be helpful for some types of debugging.

It would definitely help, and be better than a crash, or (worse) no crash,
as you quietly read or overwrite something else.
Most of my work is embedded
systems, and in some systems the "compile, download and run" cycle is
long (the compile is typically quick, but downloading or reprogramming
flash can be slow on some microcontrollers). edit-and-continue might be
useful in such systems - unfortunately, it would be completely
impractical to implement.

Is there any option on these to make use of external ram to run the program
from? (I started off on this sort of thing, at a time when a reprogramming
meant an eprom erase/program cycle, or if there was ram, then a formal
compile might have meant several minutes of floppy disks grinding away. By
using ram, and my own tools and hardware setup (we couldn't afford ICE
anyway), the development cycle was just as fast as I could type.)
 

David Brown

That's not surprising, as I'm not sure whether 'bounds' checking could
even be meaningful in C.

With statically allocated arrays, it seems straightforward - until you
pass the array to a function. Then, and in all other cases of indexing,
you only have a pointer.

Indeed - bounds checking would only be meaningful in some situations.
(Newer versions of gcc do compile-time checking for particularly obvious
cases.)

The most practical "bounds checking in C" is probably to use C++
container classes.
Then you need a lot of things going on behind the scenes in order to
somehow have pointers carry bounds with them. But it is not always that
obvious what the bounds are supposed to be, if there are any at all (for
pointers to individual elements for example). The flexibility to be able
to do anything with pointers makes it harder to apply bounds checking,
except perhaps for hard memory limits.

Special debugging libraries and tools can help (I believe they use
memory mapping tricks, such as making the space beyond a malloc'ed area
trigger processor exceptions).
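
A rough sketch of that guard-page trick (assuming POSIX mmap/mprotect and
a Linux/BSD-style MAP_ANONYMOUS; the function name is made up, error
handling and the matching free are omitted, and real tools take more care
over alignment):

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

void *guarded_alloc(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = (size + page - 1) & ~(page - 1);

    /* data pages plus one extra page that will act as the guard */
    uint8_t *base = mmap(NULL, rounded + page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* make the page just past the usable area inaccessible */
    mprotect(base + rounded, page, PROT_NONE);

    /* hand back a pointer whose last byte sits right before the guard page,
       so reading or writing one byte past the end faults immediately */
    return base + rounded - size;
}
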
It would definitely help, and be better than a crash, or (worse) no
crash, as you quietly read or overwrite something else.


Is there any option on these to make use of external ram to run the
program from? (I started off on this sort of thing, at a time when a
reprogramming meant an eprom erase/program cycle, or if there was ram,
then a formal compile might have meant several minutes of floppy disks
grinding away. By using ram, and my own tools and hardware setup (we
couldn't afford ICE anyway), the development cycle was just as fast as I
could type.)

That depends entirely on the microcontroller. For some, such as the
AVR, you cannot execute code from ram. On others, such as ARM Cortex-M
devices (which are extremely popular these days), you can run fine from
ram. But your ram is usually much smaller than your flash, and there
can also be significant timing differences, so it is not always easy.
When I can, I do most of the development on a bigger device with more
ram, and run the program from ram - only at later stages do I use the
final target chip and flash. Putting the program in ram also usually
means you can be flexible about breakpoints - with code in flash, you
are limited to the on-chip debugger's "hard" breakpoints.
 

David Brown

Actually, the standard does define quite well what the 'bounds' are for any
pointer. You could make pointers be "fat pointers", where every
pointer includes a pointer to an "array description", which gives the
bounds for that array. (Pointers to non-arrays could create "arrays" of
one element for their bounds, or a special value to indicate this, and a
second special value if the pointer is incremented to point to the "one
past the end of an array" value.)

This is very much against the "style" of C, as it adds significant
"weight" to the code, and some things that are normally cheap become
expensive (in particular, some pointer casts may need lookups), but it
would be allowed, and possibly useful in a special "debug" mode.

The standard defines bounds for a pointer, but I feel there is no way to
enforce it completely (even if the pointer carries around a "min" and
"max" bound). For example, a pointer can be converted to an
appropriately sized integer type, then converted back again (someone
will no doubt quote chapter and verse if I get this wrong). As long as
you don't change that integer, the result is guaranteed correct behaviour.

Maybe I'm wrong, and the "fat pointer" could cover all cases - but as
you say, it would be quite inefficient for most uses.
 

ais523

David said:
The standard defines bounds for a pointer, but I feel there is no way to
enforce it completely (even if the pointer carries around a "min" and
"max" bound). For example, a pointer can be converted to an
appropriately sized integer type, then converted back again (someone
will no doubt quote chapter and verse if I get this wrong). As long as
you don't change that integer, the result is guaranteed correct behaviour.

Maybe I'm wrong, and the "fat pointer" could cover all cases - but as
you say, it would be quite inefficient for most uses.

You could encode the bounds of the fat pointer in the integer, too. It'd
have to be a pretty large integer type, but there's no reason why an
implementation can't have a really large intptr_t just to be able to
hold a fat pointer. (Also, IIRC it's possible to have an implementation
with no intptr_t, but that would be less useful than an implementation
that did have one.)
 

David Brown

You could encode the bounds of the fat pointer in the integer, too. It'd
have to be a pretty large integer type, but there's no reason why an
implementation can't have a really large intptr_t just to be able to
hold a fat pointer. (Also, IIRC it's possible to have an implementation
with no intptr_t, but that would be less useful than an implementation
that did have one.)

I suppose that would work, unless there are restrictions on intptr_t
such as requiring it to be synonymous with an existing integer type (to
store a fat pointer, it would really need to be a "long long long" !).
 

BartC

Richard Damon said:
Actually, the standard does define quite well what the 'bounds' are for any
pointer.

If I have a four-element array A, how does it know whether &A[1] is:

- a pointer into A[0] ... A[3] which is the whole original array

- a pointer into the slice A[1] ... A[3], a slice of the original (I can
pass any slice to a function, complete with the length I want, but how will
it know my intentions)

- a pointer only to the element A[1]

With bounds errors occurring when I attempt arithmetic on the pointer (or
maybe when I try to dereference the result).

In languages where arrays and pointers are distinct, and where arrays might
have stored bounds anyway, it's a lot easier. It can also be more efficient
(compare a potential index against a limit) compared with C where you might
have a lower and upper limit, when a pointer points into the middle of an
array.
 

Tim Rentsch

Keith Thompson said:
They didn't merely deprecate gets(); that would mean declaring
that its use is discouraged and it may be removed in a future
standard. They removed it completely (without going through an
initial step of marking it as obsolescent).

As a point of information, gets() was listed as both deprecated
and obsolescent in n1256.
 

James Kuyper

On 02/06/2014 11:04 AM, David Brown wrote:
....
I suppose that would work, unless there are restrictions on intptr_t
such as requiring it to be synonymous with an existing integer type (to
store a fat pointer, it would really need to be a "long long long" !).

The only requirements on intptr_t and uintptr_t are that, if an
implementation chooses to provide them (which it need not), they must be
sufficiently large signed and unsigned integer types, respectively. They
can be either standard or extended
integer types. I expect they would usually be implemented as typedefs
for types that can be named by other means, but that's not actually
required - for instance, <stdint.h> could contain a #pragma that turns
on recognition of those names as built-in types just like "int" or "double".

Of course, it wouldn't be an obstacle even if intptr_t and uintptr_t were
required to be typedefs for standard types. The standard imposes only a
very low
minimum on the amount of memory an implementation is required to
support, and no upper limit on the size of [unsigned] long long, so an
implementation could always choose to make [unsigned] long long large
enough to store a fat pointer.
 

glen herrmannsfeldt

BartC said:
"David Brown" <[email protected]> wrote in message

(snip on bounds checking in C)
That's not surprising, as I'm not sure whether 'bounds' checking
could even be meaningful in C.
With statically allocated arrays, it seems straightforward - until
you pass the array to a function. Then, and in all other cases of
indexing, you only have a pointer.
Then you need a lot of things going on behind the scenes in
order to somehow have pointers carry bounds with them.

In the 80286 and OS/2 1.0 days (when most people were running MS-DOS
on their 80286 machines) I was debugging a program that used lots of
arrays, and had a tendency to go outside them.

Instead of malloc(), I called the OS/2 segment allocator to allocate
a segment of the requested length, put a zero offset onto it, and used
it as a C pointer. This is in large memory model where pointers have
a segment selector and offset. The segment selector is an index into
a table managed by OS/2 for allocated segments. When you load a segment
register with a selector, the hardware loads a segment descriptor
register with the appropriate descriptor, which includes the length.
The hardware then checks that the offset is within the segment on every
memory access. Segments can be up to 65536 bytes long (or it might be
65535; I might have forgotten). For 2D arrays, I would allocate arrays
of pointers, which conveniently never needed to be larger than that.

The overhead on this process is in the loading of segment descriptors,
but OS/2 had to do it anyway. A segment descriptor cache in hardware
would have made it faster, but Intel never did that.

In this system, int was 16 bits, and all array arithmetic was done
with 16 bits, so bounds checking would fail if you managed to wrap.
If you access element 8192 of a (double) array, for example, it would
not be detected. That was rare enough. With 16-bit int, you could wrap
even before the subscript calculation, a general hazard on 16-bit
machines.

Note that you can add to or subtract from pointers all you want, pass them
to called functions, and the system still knows where the bounds are.

-- glen
 

Keith Thompson

David Brown said:
I suppose that would work, unless there are restrictions on intptr_t
such as requiring it to be synonymous with an existing integer type (to
store a fat pointer, it would really need to be a "long long long" !).

It could be an extended integer type. For a 64-bit system, a fat
pointer would likely be 256 bits.

Note that if you provide a 256-bit integer type, then intmax_t and
uintmax_t have to be names for that type (unless there's something even
bigger), which could affect performance for code that uses intmax_t.
But fat pointers themselves are going to affect performance anyway.
 

Keith Thompson

Tim Rentsch said:
As a point of information, gets() was listed as both deprecated
and obsolescent in n1256.

So it is. I'm sure I knew that at one time, but I had forgotten.

The change was introduced by Technical Corrigendum #3, published in
2007, in response to Defect Report #332. (The proposal in the DR was to
allow it to copy at most BUFSIZ-1 characters, discarding any excess
characters; the committee chose to go further than that.)

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_332.htm
 

Keith Thompson

BartC said:
Richard Damon said:
Actually, the standard does define quite well what the 'bounds' are for any
pointer.

If I have a four-element array A, how does it know whether &A[1] is:

- a pointer into A[0] ... A[3] which is the whole original array

- a pointer into the slice A[1] ... A[3], a slice of the original (I can
pass any slice to a function, complete with the length I want, but how will
it know my intentions)

- a pointer only to the element A[1]

Suppose you have fat pointers. Each allocated object (created either by
an object definition or by an allocation call) has a base address and a
size. A pointer contains the base memory address of the pointed-to
object, the total size of the object, and the byte offset within that
object to which the pointer points. (Or, equivalently, you could store
the pointed-to address directly, with a possibly negative offset for the
enclosing object).

Suppose A has 4 elements of 4 bytes each. Then &A[1] would yield a fat
pointer containing the address of A, the size of A (16 bytes), and an
offset (4 bytes).

(You could also store sizes and offsets as element counts rather than
byte counts.)
With bounds errors occurring when I attempt arithmetic on the pointer (or
maybe when I try to dereference the result).

Pointer arithmetic would check against the bounds information stored in
the fat pointer. For example, you could subtract up to 4 bytes from the
fat pointer resulting from evaluating &A[1]; subtracting more than that
would be an error.

[...]
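
For what it's worth, a minimal sketch of that scheme (byte-based offsets,
made-up names and abort-on-error are all just illustrative choices; a
dereference would additionally need to check offset plus element size
against the total size, which is not shown):

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    char  *base;    /* start of the enclosing object               */
    size_t size;    /* total size of that object, in bytes         */
    size_t offset;  /* current position within the object, bytes   */
} fatptr;

/* pointer arithmetic with a bounds check (delta in bytes); pointing one
   past the end (offset == size) is allowed, just as in C itself */
static fatptr fat_add(fatptr p, long delta)
{
    long newoff = (long)p.offset + delta;
    if (newoff < 0 || (size_t)newoff > p.size) {
        fprintf(stderr, "bounds error\n");
        abort();
    }
    p.offset = (size_t)newoff;
    return p;
}

int main(void)
{
    int A[4];
    /* &A[1] as a fat pointer: base of A, total size of A, offset of one element */
    fatptr p = { (char *)A, sizeof A, sizeof A[0] };

    p = fat_add(p, -(long)sizeof A[0]);     /* back to &A[0]: fine       */
    p = fat_add(p, 5 * (long)sizeof A[0]);  /* well past the end: aborts */
    return 0;
}
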
 

David Brown

(snip on bounds checking in C)

In the 80286 and OS/2 1.0 days (when most people were running MS-DOS
on their 80286 machines) I was debugging a program that used lots of
arrays, and had a tendency to go outside them.

Instead of malloc(), I called the OS/2 segment allocator to allocate
a segment of the requested length, put a zero offset onto it, and used
it as a C pointer. This is in large memory model where pointers have
a segment selector and offset. The segment selector is an index into
a table managed by OS/2 for allocated segments. When you load a segment
register with a selector, the hardware loads a segment descriptor
register with the appropriate descriptor, which includes the length.
The hardware then checks that the offset is within the segment on every
memory access. Segments can be up to 65536 bytes long (or it might be
65535; I might have forgotten). For 2D arrays, I would allocate arrays
of pointers, which conveniently never needed to be larger than that.

The overhead on this process is in the loading of segment descriptors,
but OS/2 had to do it anyway. A segment descriptor cache in hardware
would have made it faster, but Intel never did that.

In this system, int was 16 bits, and all array arithmetic was done
with 16 bits, so bounds checking would fail if you managed to wrap.
If you access element 8192 of a (double) array, for example, it would
not be detected. That was rare enough. With 16-bit int, you could wrap
even before the subscript calculation, a general hazard on 16-bit
machines.

Note that you can add to or subtract from pointers all you want, pass them
to called functions, and the system still knows where the bounds are.

-- glen

I believe there are modern systems that do pretty much the same thing,
such as "electric fence".
 

David Brown

Actually, the standard does define quite well what the 'bounds' are for any
pointer.

If I have a four-element array A, how does it know whether &A[1] is:

- a pointer into A[0] ... A[3] which is the whole original array

- a pointer into the slice A[1] ... A[3], a slice of the original (I can
pass any slice to a function, complete with the length I want, but how
will it know my intentions)

- a pointer only to the element A[1]

With bounds errors occurring when I attempt arithmetic on the pointer (or
maybe when I try to dereference the result).

In languages where arrays and pointers are distinct, and where arrays might
have stored bounds anyway, it's a lot easier. It can also be more efficient
(compare a potential index against a limit) compared with C where you might
have a lower and upper limit, when a pointer points into the middle of an
array.

The bounds that the C compiler is allowed to enforce are those of the whole array,
since the standard defines what

ptr = &A[1];

ptr[-1], ptr[0], ptr[1] and ptr[2] must evaluate to.

You may want to call it a slice, and you can, but unless you do something
to make accessing outside that slice undefined behavior, the standard
doesn't allow the implementation to "reject" such accesses (only an access
that is itself undefined behavior leaves the compiler free to trap it with
such a check).

As I said, bounds checking, while allowed, mostly goes against the
purpose of the language, but might be reasonable in a heavy debugging mode.

<http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging>
<https://code.google.com/p/address-sanitizer/wiki/ComparisonOfMemoryTools>

It seems there are tools doing this sort of thing, and there are
slowdowns involved.
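
A trivial example of the kind of bug those tools catch at run time
(assuming a compiler with the AddressSanitizer support described on that
page, e.g. gcc or clang with -fsanitize=address):

#include <stdlib.h>

int main(void)
{
    int *a = malloc(4 * sizeof *a);
    if (a == NULL)
        return 1;
    a[4] = 42;   /* one element past the end: undefined behaviour, normally
                    silent, but reported as a heap-buffer-overflow here */
    free(a);
    return 0;
}
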
 

David Thompson

On Mon, 3 Feb 2014 21:43:10 +0000 (UTC), glen herrmannsfeldt wrote:
I once used a BASIC interpreter that allowed edit and continue.
(But for some changes it wouldn't allow you to continue.)

It seems to me it makes more sense for an interpreter, where
variables can exist separate from compilation.
I'd say rather that most interpreters need to keep >and use< full type
and location information -- traditionally the 'symbol table' --
whereas compilers can and traditionally did discard it, and then add
some to most of it back in as 'debug symbols', which often work for
the simple cases but not quite all the cases I need.
I once edited a csh script while it was running, and noticed
(while not planning for) the change to take effect immediately.
Seems that sometimes csh will reload from disk on loop iterations,
though I don't know that you can predict that.
IME Windows CMD seems to re-read .bat files nearly always. In
particular every time I have started running a .bat and then looked in
my notepad window and realized 'oh that's not quite right' and saved a
change, the running CMD promptly died or went nuts.

OTOH if you try to do anything seriously complicated with .bat you
probably deserve what you get.
Well, one could implement C as an interpreter. Many interpreted
languages are difficult to compile, though.

I notice that Rick seems to expect variables to exist in memory
in the order they are declared. I don't believe that C requires
that, and don't know how many compilers do that. Also, you probably

I don't see that he expects that. He expects the toolchain to keep
track of where they are, and that's exactly what the (gigantic) .pdb
does. The compiler can put them wherever it thinks best as long as the
debugger can find them (using the .pdb) and the recompiled code can
use the same (ditto). If the compiler's choices/guesses don't pan out,
that's the 1% or 10% or whatever times he must restart. This is the
same as any compiler does except much more complicated.
glen herrmannsfeldt said:
want a system that keeps code and data in separate address spaces.
Locations in code could move around, such that function returns
could easily fail.
Nit: separate (virtual) memory regions, which you usually want anyway
just for simplicity; but rarely if ever address spaces.

Catching a return to a patched routine is easy, it's the same as
catching a return to a routine at all, which is pretty common. It does
depend on your stack not getting clobbered, but all symbolic debugging
and even most machine-level debugging depends on that.

Like apparently quite a few others, I don't find edit-and-continue
very valuable, certainly not enough to tie myself down to MSVS. But it
is there, it is free-beer, it apparently does mostly work, and it's no
skin off my nose if he or anyone else likes it.
 

Seebs

The bounds that the C compiler is allowed to enforce are those of the whole array,

I'm not entirely sure this is right. The famous example is two-dimensional
arrays, where there's been official responses that, yes, a compiler is
allowed to bounds-check the bounds of a sub-array.

-s
 

glen herrmannsfeldt

I'm not entirely sure this is right. The famous example is
two-dimensional arrays, where there's been official responses that,
yes, a compiler is allowed to bounds-check the bounds of a sub-array.

Allowed, and, at least for C89, the sub-array bounds are known.

But more often it is an array of pointers to 1D arrays, in which case
that problem doesn't come up.

-- glen
 

Jens Schweikhardt

in <[email protected]>:
#> If a, b, and c are of some built-in type, then `a = b + c;` cannot
#> involve any function calls in either C or C++.
#
# Sure it can. It could involve a call to a function called __addquad().

I remember Dan Pop once told me about a C implementation for a Z80 (an
8-bit CPU). On this CPU, 16-bit addition uses two of three register pairs
("ADC HL,BC") and I'm confident it would call a function saving and
restoring the registers clobbered.

Regards,

Jens
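
To make that concrete (illustrative only - whether a call is emitted, and
what the helper is named, the __addquad() mentioned above or anything
else, is entirely up to the implementation):

#include <stdint.h>

uint64_t add64(uint64_t b, uint64_t c)
{
    /* on an 8-bit CPU such as the Z80, a compiler may well emit a call to
       a runtime helper here rather than an inline chain of ADC instructions */
    return b + c;
}
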
 
