What is wrong with this c-program?

Discussion in 'C Programming' started by Albert van der Horst, Dec 25, 2013.

  1. As I'm sure you know, the ISO C standard uses the term "byte" to refer
    to the size of type char, which is CHAR_BIT bits. CHAR_BIT is required
    to be *at least* 8, but may be larger. I can't think of anything in the
    standard that even implies that 8 is the preferred value.

    And yes, I understand that there are real-world systems with CHAR_BIT >
    8 (DSPs, mostly), though I haven't used a C compiler on any of them.

    Even if CHAR_BIT were required to be exactly 8, I'd still prefer to
    refer to CHAR_BIT rather than using the constant 8, since the macro name
    makes it clearer just what I mean.

    But if I have a need to write code that won't work unless CHAR_BIT==8,
    I'll probably take a moment to ensure that it won't *compile* unless
    CHAR_BIT==8. (Unless I'm working on existing code that has such
    assumptions scattered through it; in that case, I probably won't bother.)
    Keith Thompson, Dec 30, 2013
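
A compile-time guard of the kind described above can be sketched with a preprocessor check. This is only an illustration: the helper name `storage_bits` is hypothetical and exists just to show CHAR_BIT used in place of a literal 8.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* SIZE_MAX */

/* Refuse to compile on implementations where a byte is not 8 bits. */
#if CHAR_BIT != 8
#error "this code assumes CHAR_BIT == 8"
#endif

/* Hypothetical helper: number of bits occupied by nbytes bytes of storage.
   Written with CHAR_BIT rather than a literal 8 so the intent is explicit. */
size_t storage_bits(size_t nbytes)
{
    assert(nbytes <= SIZE_MAX / CHAR_BIT);   /* the product must not wrap */
    return nbytes * CHAR_BIT;
}
```

On a system with 16-bit chars the translation unit is rejected with the #error message instead of silently misbehaving at run time.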

  2. David Brown

    I am an embedded programmer myself - with 8-bit, 16-bit and 32-bit
    microcontrollers. I have used DSP's a little, but DSP programming is a
    niche area. I don't have any hard numbers, but I think the percentage
    of C code written for DSP's is very low, and getting lower as ordinary
    microcontrollers replace them. There are lots of DSP devices sold - but
    fewer people programming them.

    Anyway, these devices do not have 16-bit "bytes" as such. Many types
    have 16-bit or 32-bit "chars" - but they are not (normally) referred to
    as "bytes". The exception, of course, is the C standards which define
    "byte" to be the smallest addressable unit. (Some microcontrollers and
    DSP's allow direct addressing of bits, but that uses compiler-specific
    extensions to C.) I don't have any DSP datasheets or toolchain manuals
    handy, so I am relying on memory here, but with the devices I used,
    groups of 16 bits were never referred to as "bytes".

    The word "byte" has several definitions, such as the one in the C (and
    C++) standards, the one in the IEEE (I've forgotten the number)
    standard, and the ones used by various computer manufacturers over the
    decades. But the de facto definition used in almost all current
    contexts is 8 bits. That is why I say other uses are old-fashioned. The
    C standards are written in a rather specific style, with their own
    specific definitions of terms that are often consistent with historical
    usage rather than current usage. (Compare that to the Java standard,
    which I believe defines a "byte" as 8 bits.)
    Fair enough - the context of C standards is a clear exception where
    "byte" can mean more than 8 bits, and obviously that is a common case
    here (although it is "esoteric" outside the world of c.l.c. and similar

    But even here, how often does the issue of "char" possibly having more
    than 8 bits come up when it makes a real-world, practical difference?
    It is almost invariably when someone writes code that assumes chars are
    8 bits, or assumes the existence of uint8_t, and then one of the
    standards experts points out that this might not always be true. (I
    don't mean this comment in a bad way - it is a useful thing to be
    informed of these details.) And almost invariably, the code the OP is
    writing will never be used on a system without 8-bit chars, and the OP
    knows this.

    And even in the context of machines with more than 8-bit chars, how
    often are these referred to as "bytes" without a great deal of
    qualification or context to avoid confusion?

    Maybe I expressed myself a bit too strongly, but I would certainly be
    surprised to read any reference to "byte" here that did not refer to
    8-bit bytes, unless there was context to qualify it.
    David Brown, Dec 30, 2013

  3. David Brown

    From the N1570 draft of C11 standard (since I have it to hand):

    addressable unit of data storage large enough to hold any member of the
    basic character set of the execution environment

    I don't know /exactly/ what that means in a non-hosted environment
    without any character set (as would be the case for most DSP's), but I
    take it to mean the smallest directly addressable unit of storage of
    size at least 8-bit (by the later definition of CHAR_BIT).

    And I'll agree that these standards are current documents, though their
    definition of "byte" is for consistency with historical versions of the
    standards rather than for consistency with modern usage.
    Yes, that's true - and I have used a C compiler for a couple of them.
    But they don't refer to a 16-bit "char" as a "byte" (except as implied
    by the C standards), precisely to avoid confusion. There are more than
    enough sources of confusion when you have to work with such systems...
    Fair enough - clarity is important.
    Absolutely. Usually I do that by using "uint8_t" (and occasionally
    "int8_t") types - code that is dependent on CHAR_BIT == 8 will typically
    have use of such types. And on the few occasions when I have been
    unable to avoid working on DSP's with 16-bit (or even 32-bit) chars, I
    have gone through all the code carefully to make sure it will work.
    David Brown, Dec 30, 2013
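
The point about uint8_t can be illustrated with a short sketch. uint8_t is an optional type, and it can only exist where CHAR_BIT is 8, so code that uses it already refuses to build on 16-bit-char DSPs. The helper name `load_be32` is hypothetical, and the explicit UINT8_MAX check is an assumption about how one might make the failure message friendlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UINT8_MAX is defined only when the implementation provides uint8_t. */
#ifndef UINT8_MAX
#error "no uint8_t: this code needs 8-bit bytes"
#endif

/* Hypothetical helper: read a 32-bit big-endian value from four octets. */
uint32_t load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```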
  4. James Kuyper

    I don't follow that - it would only make sense if the de-facto usage had
    already long since replaced the standard-defined meaning. The fact is
    that both meanings have been in existence and in use at the same time
    for a very long time. The de-facto usage is unambiguously the more
    common one, but that's not because it has replaced the standard one. The
    standard definition has never been widely used, but the same type of
    people who used to use it in the appropriate contexts have continued to
    use it in those contexts.
    James Kuyper, Dec 30, 2013
  5. James Kuyper

    On 12/30/2013 06:35 AM, David Brown wrote:
    The environment might not have any character set, but the C standard
    requires that a conforming implementation for that environment support
    the basic character set as defined in section 5.2.1. The first use of
    "basic character set" in that section is in italics, an ISO convention
    indicating that it constitutes the definition of that term:
    A freestanding implementation is not required to support <stdio.h>,
    which makes that character set relatively unimportant, but it still must
    be defined: each of those characters must be assigned a unique
    encoding, character literals must have those values, and string
    literals require the existence of arrays of char whose elements have
    those values. There are, if I counted correctly, 92 members of the basic
    character set, so it requires more than 7 bits to give each one a unique
    encoding, so the implication is that the addressable storage unit must
    be at least 8 bits. This is also implied by the requirements that SCHAR_MIN
    <= -127, SCHAR_MAX >= 127, and UCHAR_MAX >= 255 (
    James Kuyper, Dec 30, 2013
  6. 92 distinct values can be represented in just 7 bits, so the definition
    of the basic character set only implies CHAR_BIT >= 7. The actual
    requirement that CHAR_BIT >= 8 is stated explicitly elsewhere in the
    Keith Thompson, Dec 30, 2013
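
The arithmetic behind that correction can be checked with a small sketch. The function name `bits_needed` is hypothetical; it returns the smallest b such that 2^b is at least n, which is the number of bits needed to give n items distinct encodings.

```c
#include <assert.h>

/* Smallest number of bits that can give each of n items a distinct
   encoding: the bit width of the largest code, n - 1. */
unsigned bits_needed(unsigned long n)
{
    unsigned bits = 0;
    unsigned long largest;

    assert(n > 0);
    largest = n - 1;          /* codes run from 0 to n - 1 */
    while (largest != 0) {
        largest >>= 1;
        bits++;
    }
    return bits;
}
```

For 92 items this gives 7, since 2^6 = 64 is too small and 2^7 = 128 is enough.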
  7. If you call printf, you need to call it correctly. Some errors, such as
    passing something of the wrong type as the first argument, are
    compile-time errors; others, such as passing a later argument with the
    wrong format, needn't be caught by the compiler. The latter class of
    errors result in *undefined behavior*. A segmentation fault is one
    possible result.

    If you want to print a long int value, you need to use the "%ld" format
    (or something like it); "%d" requires an int argument.

    Without looking at the code, I don't know whether that's what's causing
    the symptom you're seeing, but you should certainly fix that problem.
    You should also invoke gcc with options to enable more warnings, such as
    "gcc -std=c99 -pedantic -Wall -Wextra -O3". (You might vary the
    "-std=c99 -pedantic" options depending on what dialect of C you're
    trying to use.)
    Keith Thompson, Dec 30, 2013
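
As a minimal sketch of the format-specifier point, the following formats a long with "%ld". The helper name `format_long` is hypothetical; snprintf is used instead of printf only so the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format a long into buf.
   "%ld" matches a long argument; "%d" would require an int, and passing
   a long to it is undefined behavior. */
int format_long(char *buf, size_t size, long value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%ld", value);
}
```

With gcc, -Wall enables -Wformat, which reports mismatches of this kind when the format string is a literal.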
  8. James Kuyper

    You're right, of course - I wasn't thinking hard enough about what I was
    Specifically,, where CHAR_BIT is described as "number of
    bits for smallest object that is not a bit-field (byte)".

    It seems a little odd that "byte" is defined as being "large enough to
    hold any member of the basic character set", when that requirement is
    NOT the one that determines its minimum size. I had incorrectly
    remembered the size as being determined by that requirement, which would
    have rendered the requirement that CHAR_BIT >= 8 redundant.
    James Kuyper, Dec 30, 2013
  9. James Kuyper

    On 12/30/2013 01:27 PM, Martin Ambuhl wrote:
    Yes, but it's substantially less likely to malfunction in that
    particular fashion when only one format specifier is present, and it
    specifies a type that is probably no larger than the type of the only
    argument. Malfunctions of other types are still quite likely, of course.
    James Kuyper, Dec 30, 2013
  10. James Kuyper

    That should have said 6 and 7 bits respectively. I should also have
    mentioned that CHAR_BIT is explicitly required to be at least 8, which
    is therefore the tighter constraint.
    That's why I said "if I counted correctly". I knew that 92 didn't sound
    right. The basic character set includes

    26 upper case letters
    26 lower case letters
    10 decimal digits
    29 graphic characters
     1 space character
     4 control characters
    --
    96 characters in total

    My count included the null character, which is only in the basic
    execution character set (5.2.1p2), and didn't include the last two

    The basic execution character set, the one referenced by the definition
    of "byte", includes the null character and the three additional control
    characters mentioned in the sentence you cited, bringing the total to an
    even 100 characters.
    James Kuyper, Dec 31, 2013
  11. Kiki will be by any minute now to point out that that compiler was not

    And is thus OT in this newsgroup. Shame on you!
    Kenny McCormack, Dec 31, 2013
  12. There are places in Fortran IV and Fortran 66 where variables are
    allowed, but not constants. (The I/O list of WRITE statements,
    for one.) That sometimes results in such variables.

    One I remember from a C compiler was failing to compile ++
    applied to a double variable. Presumably rare enough that it
    wasn't caught in testing, but I had to change it so my program
    would compile.

    -- glen
    glen herrmannsfeldt, Dec 31, 2013
  13. Jorgen Grahn

    That matches my recollection (I've been programming Texas DSPs at two
    quite different workplaces, around 1997 and in 2003 or so). IIRC you
    tended to talk about "words" instead -- another rather fuzzy term
    which in my mind translates to "the width which is a rough best match
    for registers and memory accesses".

    Jorgen Grahn, Jan 1, 2014
  14. Your problem lies here. The limit of the loop should be the truncated
    (integer) square root of n, not n itself.

    See https://en.wikipedia.org/wiki/Sieve_of_eratosthenes#Implementation
    Although by no means wrong, this is an unusual choice of parameter
    names for main(). By convention, the argument-count is usually
    named 'argc' and the argument-values pointer 'argv'.

    guinness.tony, Jan 9, 2014
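
The loop-limit advice can be sketched as follows. This is not the original poster's program, which is not reproduced in this thread; the function name `count_primes` and the heap-allocated array are assumptions made for illustration. The outer loop stops at the integer square root of n, and the array has n + 1 elements so that index n is valid.

```c
#include <assert.h>
#include <stdlib.h>

/* Count the primes <= n with a sieve of Eratosthenes.
   Returns -1 if memory cannot be allocated. */
long count_primes(long n)
{
    unsigned char *composite;
    long i, j, count = 0;

    assert(n >= 0);   /* precondition: non-negative limit */
    if (n < 2)
        return 0;

    composite = calloc((size_t)n + 1, 1);   /* indices 0..n, all zero */
    if (composite == NULL)
        return -1;

    /* i <= n / i is the same test as i*i <= n, but cannot overflow. */
    for (i = 2; i <= n / i; i++) {
        if (composite[i])
            continue;
        for (j = i * i; j <= n; j += i)
            composite[j] = 1;
    }

    for (i = 2; i <= n; i++)
        if (!composite[i])
            count++;

    free(composite);
    return count;
}
```

Because the outer loop never lets i exceed the square root of n, i * i in the inner loop stays within n and cannot overflow.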
  15. Jorgen Grahn

    It's not wrong, but it's so close to wrong that it might as well have been.
    (And I think I commented on it last year, elsewhere in the thread.
    I expect it to be fixed already.)

    Jorgen Grahn, Jan 9, 2014
  16. Ken Brody

    It's a little wrong to say a tomato is a vegetable. It's very wrong to say
    it's a suspension bridge.
    Ken Brody, Jan 10, 2014
  17. Kaz Kylheku


    apt-get install valgrind
    You're joking, right?

    How about making effective use of readily available tools.

    Compiler's opinion: plenty wrong here!

    $ gcc -Wall -ansi -pedantic -W -g sieve.c -o sieve
    sieve.c: In function ‘main’:
    sieve.c:41:5: warning: implicit declaration of function ‘atoi’ [-Wimplicit-function-declaration]
    sieve.c:42:5: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long int’ [-Wformat]
    sieve.c:40:9: warning: unused variable ‘i’ [-Wunused-variable]
    sieve.c:38:15: warning: unused parameter ‘argv’ [-Wunused-parameter]
    sieve.c:45:1: warning: control reaches end of non-void function [-Wreturn-type]

    Validation with valgrind:

    $ valgrind ./sieve 1231234
    [.. snip ...]
    ==31965== Invalid write of size 1
    ==31965== at 0x8048519: fill_primes (sieve.c:33)
    ==31965== by 0x8048579: main (sieve.c:43)
    ==31965== Address 0x881002e9 is not stack'd, malloc'd or (recently) free'd

    The offending line 33 is

    for (j=i*i; j<=n; j+=i) composite[j] = 1;

    suggesting you're blowing past the end of the static array.
    Kaz Kylheku, Jan 10, 2014
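
Two of the warnings above concern atoi being used without a declaration and a long being handled as an int. One possible way to address both, sketched here with a hypothetical helper `parse_limit`, is to include <stdlib.h> and parse the argument with strtol, which unlike atoi lets the caller detect bad input. Rejecting negative values is an assumption about what a sieve limit should allow.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>   /* strtol; this header also declares atoi */

/* Hypothetical helper: parse a non-negative decimal long from s.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_limit(const char *s, long *out)
{
    char *end;
    long value;

    assert(s != NULL && out != NULL);
    errno = 0;
    value = strtol(s, &end, 10);
    /* Reject empty input, trailing junk, out-of-range and negative values. */
    if (end == s || *end != '\0' || errno == ERANGE || value < 0)
        return 0;
    *out = value;
    return 1;
}
```

The caller can then check the parsed limit against the size of the array before running the sieve.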
  18. When I saw this subject line the first thing I thought was:

    "Probably a newb question/program".

    The second thing was more funny:

    "Everything because it was written in C :)"

    Skybuck Flying, Jan 18, 2014