Good book

Discussion in 'C Programming' started by Bill Cunningham, May 13, 2014.

  1. Clark Smith Guest

    That's going to depend on the actual platform and compiler. It
    may be true for x86, but for IA64 in particular, it is not, even when
    using the compilers supplied by Intel and HP.
    Not necessarily - see above.
     
    Clark Smith, May 15, 2014
    #21

  2. Ian Collins Guest

    Just as long as you keep that knowledge up to date. Stale knowledge is
    a dangerous thing when it comes to processor architecture!
     
    Ian Collins, May 15, 2014
    #22

  3. No, because if you try to optimise a function by rewriting it in
    assembler, the first thing you will do is test it for correctness,
    and the second thing you will do is test it for execution speed.
    So the worst likely result is that you waste a couple of days'
    work. Not good, but a wasted day or two happens all the time
    in a software development environment; it's not going to make
    or break the success of the project, normally.
     
    Malcolm McLean, May 15, 2014
    #23
  4. David Brown Guest

    If you have a chip in which "((y << 8) | (y << 2)) + x" is faster than
    "y * 260 + x", and your compiler doesn't optimise "y * 260 + x" into the
    shifts and adds, then you need a better compiler.

    And note that your compiler is likely to get the optimisation right -
    and use a "+" instead of an "|", while your manual "optimisation" is
    incorrect but is likely to pass simple code tests.

    Using "y * 260 + x" is also vastly clearer and more flexible than an
    explicit shift-and-add version.

    All in all, using such shift-and-add expressions when you really want a
    multiply is pretty much always a bad idea in many ways. There was a
    time when such things were useful (I've used them myself), but not now.
     
    David Brown, May 15, 2014
    #24
  5. I did very brief "coding" with BASIC for a bit. But couldn't really put
    much together. There's a difference in "coding" and "programming".

    B
     
    Bill Cunningham, May 16, 2014
    #25
  6. Say the data is actually sales of 257 varieties of baked bean by
    300 shops. So a 300*257 2D array, hard-coded (our products rarely
    change, we've been using "257 varieties" as a slogan for donkey's
    years). We can pad the array out to 260 and put null values in the empty
    slots.
    To give a slightly more realistic example, you often used to
    pad a sprite to 8x8, even if a row or column was all transparent.
     
    Malcolm McLean, May 16, 2014
    #26
  7. David Brown Guest

    I'm a little confused by what you are saying here, but I believe you are
    saying that a knowledge of assembly is important for efficient IA64
    programming, and that re-writing IA64 code in hand-written assembly can
    often be necessary to get good code. Is that right?

    I don't have any direct experience with the IA64, but from what I have
    read, you are correct. The theory behind the VLIW architecture was
    that the compiler would generate good static scheduling of multiple
    parallel instructions. In practice, compilers have never managed it
    very well - so a good understanding of the assembly and architecture
    would help IA64 programmers, and I can well believe that hand-written
    assembly /could/ outperform C code. However, writing assembly for the
    IA64 of any significant size and complexity is going to be very hard,
    and suffer from the usual maintenance issues and portability issues
    (across different IA64 devices), meaning it is unlikely to be a good
    solution. The only sane solution, of course, is to drop the IA64
    architecture - the theory on which it is founded is simply wrong for
    most general programming. No compiler or human assembler can pick good
    static scheduling because there is no good static scheduling choice in
    most cases - it has to be done dynamically (as is done in most
    superscalar cpus).
     
    David Brown, May 16, 2014
    #27
  8. David Brown Guest

    It is irrelevant - you write your multiplication using the real numbers
    (preferably #define'd values, or static const values). As long as the
    compiler can get compile-time access to the fixed values, it should
    generate the fastest code.

    That's a different matter - and certainly in some cases it is worth
    adding padding to make alignments fit well or to make array elements fit
    within cache lines (for big cpus) or to let the compiler turn multiplies
    into shifts (for small cpus).

    But note that I said "to let the /compiler/ turn multiplies into shifts"
    - not "to let the /programmer/ turn multiplies into shifts". Work
    /with/ your compiler, not /against/ it.
     
    David Brown, May 16, 2014
    #28
  9. No, because it's not allowed to replace

    /* NBEANS = 257 */
    int sales[NSHOPS][NBEANS];

    with

    int sales_internal[NSHOPS][260];

    because 260 can be efficiently backcracked while 257 can't.
     
    Malcolm McLean, May 16, 2014
    #29
  10. David Brown Guest

    We will leave it up to the language lawyers for a ruling on whether or
    not the compiler is allowed to make this change (I think not, but am not
    sure). But I don't think any compiler /would/ make that change, even if
    it turns out to be legal.

    However, no one is asking a compiler to do this change - changing a
    multiply by 257 into a multiply by 260 would change the logic of the
    code. I am saying that you should let the compiler change the multiply
    by 257 into shifts-and-adds if it wants, you should never do it manually.

    That is a different thing from manually adding padding to a struct in
    order to give the compiler easier numbers to multiply - that can
    sometimes be worth doing. But you do it by defining NBEANS to 260 and
    using "x * NBEANS" in the code - not by writing shifts.

    Incidentally, for most targets it will be at least as efficient, and
    often more efficient, to multiply by 257 rather than 260. The fact that
    you regularly get your "micro-optimisation" examples wrong shows that this
    sort of thing is normally best left up to the compiler.
     
    David Brown, May 16, 2014
    #30
  11. Once you've worked out that 260 can be backcracked, 257 can't be, you can
    write C code with 260 hardcoded. But then you've got to look at the assembly
    to check that the compiler genuinely made the optimisation. Also, there
    are lots of numbers you could potentially pad to. Unless you've got some insight
    into the instruction set, it's hard to know which ones to choose. The average
    person is not going to think "260, that's only got two set bits".

    Then the data table holds sales of beans for a supermarket chain. Let's say each
    sale is a record, so typically customers will buy one or two cans of beans. So
    we'll be indexing into this table millions of times on a typical run. But the
    compiler has no way of knowing that. It can't distinguish a loop that will be
    run millions of times from one that will be run half a dozen times, because of
    course Ncustomers has to be soft-coded.
     
    Malcolm McLean, May 16, 2014
    #31
  12. Clark Smith Guest

    That is right.
    I have seen some spectacular examples, especially in the realm of
    cryptography, where carefully scheduled IA64 assembly language code
    outperforms its C counterpart by an order of magnitude.
    True. You can still get a bang for your buck by focussing on the
    right areas though. This is probably no different than other processors,
    in principle; it's just that under IA64 there are more opportunities for
    hand-coded assembly language code to shine vis-a-vis compiler-created
    assembly language code.
    Actually, I partially disagree. Experienced human assemblers can
    and do pick up good static scheduling for IA64, and it is certainly worth
    the development time in some cases, as mentioned above.

    I agree that current compilers most certainly don't do nearly as
    well. Current compiler technology does not seem to be sophisticated
    enough to do so, and it is not clear to me whether this is so because of
    a lack of investment and interest, or because it is just too difficult in
    general. The IA64 C compilers from Intel and HP generate far better IA64
    code than GCC, but they still make a suboptimal use of the architecture.
     
    Clark Smith, May 16, 2014
    #32
  13. David Brown Guest

    I am close to giving up and killfiling you for my own sanity.
    "backcrack" is clearly yet another of your private terms that you think
    are known to all speakers of the English language. I had assumed you
    were talking about the ease of transforming multiplies into
    shifts-and-adds, since that was what you were talking about a few posts
    earlier when you incorrectly claimed that it might be worth converting
    "y * 260 + x" into "((y << 8) | (y << 2)) + x". As I pointed out
    earlier, this is an incorrect transformation (unless you are sure that y
    is less than 64 and non-negative). The correct transformation would be
    "(y << 8) + (y << 2) + x". And any half-decent compiler will do such
    strength reduction as and when appropriate.

    /If/ that is what you mean by "backcrack", then y * 257 + x can also be
    "backcracked" into (y << 8) + y + x. This is simpler, and could be
    faster than the shift+add arrangement for 260.
    If you care about the speed here, your main task is to measure the
    timings. /How/ the compiler optimises the code is secondary to how fast
    the code is (compared to how fast you need it to be). Looking at the
    assembly is one way to get an idea if you don't want to measure the real
    times - it should not be hard in this particular case.
    The average person is not going to care - they will let the compiler
    deal with it. The more advanced programmer will know that 257 /also/
    has only two bits set, and the bit numbers are more amenable to optimal
    speed even on the simplest of processors. And a /really/ advanced
    programmer will take a step back and think whether this matters, and if
    so ask what the /real/ bottlenecks are - aligning to cache lines is
    likely to be more relevant than the multiplication on big systems.
     
    David Brown, May 16, 2014
    #33
  14. Ian Collins Guest

    Possibly because the underlying architecture shifts every 18 months or
    so and even Intel's compiler team can't keep up?
     
    Ian Collins, May 16, 2014
    #34
  15. David Brown Guest

    I have no problem believing that example - cryptography is a field that
    lends itself well to good assembly language implementations (that
    applies to most or all cpus, not just the IA64). It often needs lots of
    similar operations, and lots of maths operations that don't map well to
    C (such as multiplying with numbers and results of different sizes, or
    rotations of large numbers).

    There are other types of algorithm that can have similar effect,
    especially if the cpu supports particular instructions that don't fit
    well with C or the C compiler. DSP algorithms are often in this category.

    But these are the exceptions - "general" code is usually best written in C.
    That can be true for some types of code with a very predictable flow.
    But for a lot of general code, full of jumps and conditional branches,
    there is no way to get a good static scheduling.
     
    David Brown, May 16, 2014
    #35
