Unaligned access

Discussion in 'C Programming' started by aleksa, May 5, 2010.

  1. aleksa

    aleksa Guest

    I'm relatively new to C.

    How should I define a ptr to monochrome bitmap?
    I currently have char*.

    So far, I'm only allocating (w/o OS functions) memory for a
    series of monochrome bitmaps. Every bitmap is different in size.

    int size; // bitmap size (in bytes)
    char* pbitmap; // ptr to current bitmap

    size = GetBitmapSize(...);

    // align next bitmap address on 16 bytes
    pbitmap += (size + 16) & ~15;

    When I had void* as ptr, this aligning thing was troublesome,
    so I switched to char*.
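
    A note on the rounding in the snippet above: adding 16 before masking
    always moves past the next boundary, so a size that is already a multiple
    of 16 costs one extra 16-byte slot. Adding 15 rounds up without that
    waste. A minimal sketch, assuming sizes are held in size_t (round_up_16
    is a hypothetical helper name, not something from the thread):

```c
#include <stddef.h>

/* Round size up to the next multiple of 16 (hypothetical helper).
   Assumes size + 15 does not wrap around the maximum size_t value. */
static size_t round_up_16(size_t size)
{
    return (size + 15u) & ~(size_t)15;
}
```

    With this, pbitmap += round_up_16(size); keeps pbitmap 16-byte aligned
    as long as it started out aligned.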

    Later, I will create functions PGet and PSet which will
    operate on bits (pixels) within selected byte.
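
    A sketch of what PGet and PSet could look like for a 1-bit-per-pixel
    bitmap. The signatures, the stride parameter (bytes per row) and the
    MSB-first bit order are assumptions for illustration; the post does not
    specify any of them:

```c
#include <stddef.h>

/* Return pixel (x, y) as 0 or 1. Eight pixels are packed per byte and the
   leftmost pixel is the most significant bit (an assumed layout). */
static int PGet(const unsigned char *bitmap, size_t stride, size_t x, size_t y)
{
    unsigned char byte = bitmap[y * stride + x / 8];
    return (byte >> (7 - x % 8)) & 1;
}

/* Set pixel (x, y) to 1 if on is nonzero, otherwise clear it to 0. */
static void PSet(unsigned char *bitmap, size_t stride, size_t x, size_t y, int on)
{
    unsigned char mask = (unsigned char)(0x80u >> (x % 8));
    unsigned char *p = &bitmap[y * stride + x / 8];
    if (on)
        *p |= mask;
    else
        *p &= (unsigned char)~mask;
}
```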

    Now, suppose the code will run on CPU that
    doesn't have byte memory access.

    Is it my job or compiler's job to correctly access bytes?

    The same thing is reading an ascii file. Do I have to make
    any special precautions or just use char* and read through?
    aleksa, May 5, 2010

  2. Use unsigned char for a pointer to arbitrary byte data. This is one of C's
    little quirks. Plain char can be either signed or unsigned, and if
    signed can contain trap representations. Also, unsigned char documents
    your intention.
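
    To illustrate the point: a byte such as 0xFF reads back as 255 through
    unsigned char, whereas through plain char it may read as -1 on
    implementations where char is signed. A minimal sketch (byte_sum is a
    hypothetical helper, not from the thread):

```c
#include <stddef.h>

/* Sum the bytes of a buffer. Because the elements are unsigned char,
   every value added is non-negative, whatever the signedness of plain char. */
static unsigned long byte_sum(const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```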

    By definition processors can access individual bytes. However on some
    machines the hardware bytes are 32 bits whilst C chars are 8 bits. The
    compiler handles this transparently by bit twiddling. This is
    efficient in memory use, inefficient in processor usage. Whilst such
    designs are rare, often it will be faster to handle image data as 32
    bit pixels rather than as separate 8-bit channels. However these days
    framebuffer operations are seldom the bottleneck.
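
    In standard C terms a byte is whatever a char occupies: sizeof(char) is 1
    by definition, and CHAR_BIT from <limits.h> gives its width in bits (at
    least 8). Code that depends on 8-bit bytes can check this at compile
    time. A small sketch; the function name is an assumption:

```c
#include <limits.h>

/* Refuse to compile where a char is not exactly 8 bits wide. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit bytes"
#endif

/* Number of bits in an object that is n bytes long on this implementation. */
static unsigned long bits_in(unsigned long n)
{
    return n * CHAR_BIT;
}
```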

    Images tend to have sizes not known at compile time. Allocate a flat buffer
    using malloc and calculate x y offsets yourself. Don't try to use a 2d
    array.
    Malcolm McLean, May 5, 2010

  3. Ok, so we are now way, way, out into undefined
    At this point, pbitmap is not defined. I assume
    that at some point you do something (e.g. assign
    an integer) to define it. Be aware that this
    may not work, and even if it does you probably
    want a different integer for each platform and perhaps
    for different implementations on the same platform.

    pbitmap = malloc(size);

    will always work, but then you lose control over where
    pbitmap is. If you want portability, define a macro

    ASSIGN_BITMAP(pbitmap, size)

    and change it for each implementation used, with the
    above malloc as the default.

    Not surprising: adding an integer to a void* has no meaning
    in C. Adding an integer to a char* may have meaning and on
    many platforms it will do exactly what you expect.
    (I am not sure what will happen if (size + 16) will not fit
    in an int, but I think that ~15 will not be what you want)

    If x is a char* then it is the compiler's job to get the
    "correct" byte if you ask for x[47]. However, bear in mind that the
    compiler may not do what you want. For one thing, the compiler may not
    use 8 bit bytes. So what the compiler gets may not be
    the 8 bits at bit offset 8*47 from x.
    If you fopen the file and read character by character, you will
    get the values you expect. However, the processor may not do this
    in the way you expect.
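
    For the file case, reading character by character with the standard
    library might look like the sketch below. The function name is
    hypothetical. Note that fgetc returns an int so that EOF can be told
    apart from every valid byte value:

```c
#include <stdio.h>

/* Count newline characters in a stream that was opened with fopen. */
static long count_lines(FILE *fp)
{
    long lines = 0;
    int c;                      /* int, not char: must be able to hold EOF */
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}
```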

    - William Hughes
    William Hughes, May 5, 2010
  4. aleksa

    aleksa Guest

    "The compiler handles this transparently by bit twiddling."
    In other words, it will *always* work, regardless of CPU used, right?

    Currently, I plan this to execute only on
    x86 and ARM9 and both can access bytes.

    I just wanted to be sure, in case I choose some ARM7 in the future.
    That ARM7 probably won't work on bitmaps (too slow), but will most
    likely read some ascii files and I wouldn't want to change the sources later.
    aleksa, May 5, 2010
  5. Chad

    Chad Guest

    How do you lose control over where bitmap is? I mean, it's pointing at
    some 'valid' area of memory, isn't it?
    Chad, May 5, 2010
  6. void setpixel(unsigned char *buff, int width, int height, int x, int y,
                  unsigned char red, unsigned char green, unsigned char blue)
    {
        /* 3 bytes per pixel (8-bit R, G, B), rows stored one after another */
        unsigned char *pixel = buff + ((y * width) + x) * 3;
        pixel[0] = red;
        pixel[1] = green;
        pixel[2] = blue;
    }

    will always work, as long as buff points to a big enough area of
    memory, and other functions treat the image buffer as an array of 8
    bit rgbs.
    (There will be a chorus of demands to use size_ts instead of ints,
    which are correct. The total size of the buffer mustn't overflow the
    width of an int).
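
    As a sketch of the size_t variant mentioned here, the offset can be
    computed in size_t so that the multiplication is not limited to the
    range of int. The helper name and the bounds check are assumptions; the
    3-bytes-per-pixel layout follows setpixel:

```c
#include <stddef.h>

/* Byte offset of pixel (x, y) in a 24-bit RGB buffer, or (size_t)-1 if
   (x, y) lies outside the image. Assumes width * height * 3 fits in size_t. */
static size_t rgb_offset(size_t width, size_t height, size_t x, size_t y)
{
    if (x >= width || y >= height)
        return (size_t)-1;
    return (y * width + x) * 3;
}
```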

    You don't need to worry about buff's alignment.

    However other systems may well be faster. An obvious problem with the
    above is that it takes too many parameters; a compiler may not always
    optimise the call out.
    Malcolm McLean, May 5, 2010
  7. Yes, you know the memory is valid, but you have no idea where it
    is or what it is. It might be a little old man in China
    with a brush and pad who communicates by mail.
    A more reasonable example is a machine with some fast
    memory that is never saved to disk, and some slower memory
    that can be paged to disk. If you need your bitmaps to be
    in the fast memory, malloc may not cut it.

    - William Hughes
    William Hughes, May 5, 2010
  8. aleksa

    aleksa Guest

    At this point, pbitmap is not defined.

    And it really isn't, I didn't get to that point yet.
    How can I assign an integer to a pointer, isn't that invalid?

    This is what I have planned:

    1. pbitmap is (will be) initialized from a void* that
    points to free memory.

    2. store (somewhere) current pbitmap.

    3. GetBitmapSize (in bytes) and adjust pbitmap.

    4. get next bitmap and goto 2.
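
    The plan above can be sketched as a simple bump allocator. Everything
    here is an assumption for illustration: the Arena type, the function
    names, and the use of uintptr_t (an optional type from <stdint.h>,
    available on common x86 and ARM toolchains) to align the start address:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned char *next;   /* where the next bitmap will be placed */
    unsigned char *end;    /* one past the last usable byte */
} Arena;

/* Step 1: start from a void* to free memory, aligned up to 16 bytes.
   Assumes the low bits of the converted pointer reflect its alignment. */
static void arena_init(Arena *a, void *mem, size_t len)
{
    unsigned char *base = mem;
    size_t skip = (size_t)(-(uintptr_t)mem & 15u);  /* bytes to next boundary */
    if (skip > len)
        skip = len;
    a->next = base + skip;
    a->end = base + len;
}

/* Steps 2-3: return the current position and advance by the padded size.
   Returns NULL when the remaining space is too small. */
static unsigned char *arena_take(Arena *a, size_t size)
{
    size_t avail = (size_t)(a->end - a->next);
    if (size > avail)
        return NULL;
    size_t padded = (size + 15u) & ~(size_t)15;
    if (padded < size || padded > avail)
        padded = avail;         /* last block: no room left for padding */
    unsigned char *p = a->next;
    a->next += padded;
    return p;
}
```

    Step 4 is then just a loop that calls arena_take once per bitmap and
    stores each returned pointer.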

    Actually, I have this already working, but written in ASM.

    Now I'm converting it to C, and my first problem was void* or char*
    as a ptr to bitmap. It seems that it must be char* although that
    is a bit confusing since I'm not really accessing ascii characters.
    I don't understand this.

    There are platforms that will behave differently?
    Are x86 and ARM in that platform-list?
    int is 32-bits, why wouldn't it fit? I don't plan to use 16 bitters
    anymore :)

    ~15 is to align the memory ptr to 16 bytes, and the code generated
    for x86 is correct (checked).

    How else can I align my ptr?
    I don't have any OS. My ASM code received the file with RS-232.

    C code will do the same, and then scan with char*.
    Any problems with that on x86, ARM?

    I'm switching from x86 ASM to C and my experience so far is:

    - the sources are more readable and shorter.

    - generated code is 5% faster than my hand-written ASM.
    (I've tested the speed on one project only, easy-writing in both

    - portability... that's why I even started learning C and now I hear
    that bytes can be longer than 8 bits... How long is a nibble, then?

    - and yeah, C can be very frustrating at times, and always wants
    to be smarter than me...
    aleksa, May 5, 2010
  9. aleksa

    aleksa Guest

    void setpixel(unsigned char *buff, int width, int height, int x, int
    Thanks, pretty straightforward, but I'll have to convert it to
    monochrome BMP.
    My English is poor here..
    Do you say that people would say size_t is a better choice here,
    but *you* stick with ints (as in, sorry folks, ints are correct)?

    Why would size_t be better? From what I read here:
    size_t should not be used here..

    Also, width, height, x and y are always positive values,
    so int could/should be unsigned int (from my POV).

    I've seen examples where always-positive variables are
    not defined as unsigned int, only int, so I'm doing the
    same, even though I don't understand why.
    aleksa, May 5, 2010
  10. Yes, I think that's what Malcolm is saying.

    Incidentally, I had to go back to the parent article to confirm that
    Malcolm was the one who wrote it. Please leave attribution lines in
    place for any quoted text.
    Speaking of unattributed quotations, that web page appears to be a
    copy of a thread from comp.lang.c, published with no indication of
    where it came from and with a strong implication that the articles
    were posted by bytes.com's "community of C / C++ experts". So this
    article will probably appear on bytes.com as well.

    To anyone reading this on bytes.com: I am not a member of this
    "community", and bytes.com does not have my permission to re-post
    this article or falsely claim credit for it.

    Anyway, the thread you refer to includes several different opinions on
    the topic. The statement that you should use int rather than size_t
    appears to be from Malcolm, the same person you were replying to,
    so I wouldn't say it's supporting evidence.

    My own opinion is that size_t is the appropriate type to use for
    an array index. It's ok to use int if you're reasonably sure
    that indices cannot exceed INT_MAX, but it can be difficult to be
    sure of that as the program is modified in the future. size_t,
    on the other hand, is very nearly guaranteed to be big enough
    (the rationale for that has been discussed here at length before).
    unsigned int typically (almost always) can represent a wider range
    of positive values than int. For example, on a 16-bit system
    INT_MAX is typically 32767 and UINT_MAX is typically 65535.
    So that might be a slight advantage for unsigned int over int.
    But size_t (another unsigned type) has the further advantage that
    it's guaranteed to be able to represent any valid array index.

    There are some pitfalls in using unsigned types (unsigned int,
    unsigned long, size_t, etc.). Integer types, whether signed
    or unsigned, represent a finite subrange of the infinite set of
    mathematical integers. If you stay well within that subrange, you
    can safely pretend that you're dealing with mathematical integers.
    As you approach the endpoints of the range, you can run into cases
    where the results of a calculation don't match the mathematical
    result, and may not even be well defined. For signed types, those
    endpoints are at large negative and positive values that you're often
    not likely to reach. For unsigned types, one of the endpoints is at
    0, well within the range of values you're likely to be dealing with.

    An example:

    unsigned int count = 10;
    while (count >= 0) {
        /* ... */
        count--;
    }

    This is an infinite loop, because the condition "count >= 0" is always
    true for an unsigned type.
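
    One common way to count down with an unsigned counter is to test before
    decrementing, so the loop ends when the counter reaches 0 instead of
    wrapping around. A minimal sketch (the function is hypothetical):

```c
#include <stddef.h>

/* Copy the first n bytes of src into dst in reverse order,
   visiting the source indices n-1 down to 0. */
static void reverse_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t out = 0;
    for (size_t i = n; i-- > 0; )   /* i takes the values n-1, ..., 1, 0 */
        dst[out++] = src[i];
}
```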

    Using int for array indices is quite common, and I'm not saying
    that it's wrong. But I do think that size_t is the safest and most
    sensible type to use for array indices. You just have to keep in
    mind that it's an unsigned type and watch out for any pitfalls.
    Obviously not everyone agrees.
    Keith Thompson, May 5, 2010
  11. aleksa

    aleksa Guest

    aleksa, May 5, 2010
  12. aleksa

    aleksa Guest

    Hmm, after reading again, I now too think that size_t
    should be used, since it is unsigned and "width, height,
    x and y are always positive values".

    At first, I just stopped at:
    "`size_t' is a type suitable for representing the amount
    of memory a data object requires, expressed in units of `char'."
    aleksa, May 5, 2010
  13. Note that the same rationale would justify using unsigned int rather
    than size_t.

    The reason size_t is better is this:

    The size in bytes of any object can be expressed as a value within the
    range of size_t. For a declared object, ``sizeof obj'' yields a size_t
    result; for an allocated object, malloc()'s argument is of type size_t.

    Nitpick, feel free to ignore: {
    There have been lengthy discussions here questioning this
    assumption. An implementation might permit a declared object to
    be bigger than size_t bytes, and ``sizeof obj'' could overflow
    like any other operator. calloc() takes two size_t arguments,
    and could in theory create an object bigger than SIZE_MAX bytes.
    Some implementation-specific mechanism could be used to create or
    access such objects. In practice, though, I know of no systems
    where this is an issue. My own argument is that if objects bigger
    than size_t bytes are possible, the implementation just needs to
    choose a bigger type for size_t. And if size_t isn't guaranteed
    to be big enough, no other type is either -- not even uintmax_t.
    }

    Since array elements are always at least 1 byte (C has no arrays of
    bits or of bit fields), it follows that size_t is also guaranteed
    to be big enough to hold any valid array index.
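
    A small sketch of size_t used consistently for element counts and
    indices. The macro and function names are assumptions for illustration:

```c
#include <stddef.h>

/* Element count of a true array (not a pointer); the result has type size_t
   because it is built from sizeof. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Index of the largest element, using size_t for every index. Requires n > 0. */
static size_t index_of_max(const int *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] > v[best])
            best = i;
    return best;
}
```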

    Some have argued that this is ugly, because the name "size_t" implies a
    size in *bytes*, and perhaps because the "_t" suffix is just clutter.
    I disagree.
    Keith Thompson, May 5, 2010
  14. bart.c

    bart.c Guest

    For working with images, it's unlikely that width, height, x or y will need
    more than 16 bits, so that int will always be enough.

    (But calculations with these to work out an offset within the entire image
    will usually need more.)

    Using size_t for x,y might also be problematic, if x,y can ever be negative.
    For example, to draw geometric elements with some points to the left or
    above your image (if (0,0) is the top left). Negative coordinates allow
    clipping to be applied; always-positive coordinates make this harder.
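
    A sketch of the clipping point: with signed coordinates, a drawing
    routine can accept points outside the image and simply skip them. The
    function name and the one-byte-per-pixel layout are assumptions for
    illustration:

```c
#include <stddef.h>

/* Plot into a width x height, 1-byte-per-pixel buffer, ignoring points that
   fall outside it. Returns 1 if the pixel was written, 0 if it was clipped. */
static int plot_clipped(unsigned char *buf, int width, int height,
                        int x, int y, unsigned char value)
{
    if (x < 0 || y < 0 || x >= width || y >= height)
        return 0;
    buf[(size_t)y * (size_t)width + (size_t)x] = value;
    return 1;
}
```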
    bart.c, May 5, 2010

  15. Highly unlikely. Why, such an image would probably be several
    gigabytes in size. Such images are very unlikely until the mid

    On the other hand it is unlikely you would try to manipulate
    such an image using an implementation with 16 bit ints.

    - William Hughes
    William Hughes, May 5, 2010
  16. bart.c

    bart.c Guest

    I was arguing against using size_t. My point was int would normally suffice,
    probably even when ints were 16-bits.
    16-bits unsigned allows addressing of up to 4000 Mpixel images. Of course in
    the 1990's we were all dealing with much bigger images than that...

    However 32-bit signed x,y do make more sense (than both 16-bits and size_t)
    allowing for degenerate image sizes, and virtual coordinates.
    bart.c, May 5, 2010