Bitshifts and "And" vs Floor-division and Modulo

Discussion in 'Python' started by jimbo1qaz, Sep 7, 2012.

  1. jimbo1qaz

    jimbo1qaz Guest

    Is it faster to use bitshifts or floor division? And which is better, & or %?
    All divisors and mods are powers of 2, so are binary operations faster? And are they considered bad style?
    jimbo1qaz, Sep 7, 2012

  2. Why don't you use the timeit module and find out for yourself?
    Mark Lawrence, Sep 7, 2012

  3. jimbo1qaz

    jimbo1qaz Guest

    How do I use it? timeit.timer is not defined.
    jimbo1qaz, Sep 7, 2012
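A minimal sketch of timeit usage (the public names are `timeit.Timer`, with a capital T, and the `timeit.timeit` convenience function; `timeit.timer` does not exist, hence the error above — the statements and counts here are illustrative):

```python
import timeit

# One-shot convenience function: returns the total seconds taken
# to run the statement 100,000 times.
t = timeit.timeit('456789 // 8', number=100_000)
print(t)

# Equivalent, using the Timer class explicitly:
t2 = timeit.Timer('456789 >> 3').timeit(number=100_000)
print(t2)
```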
  5. jimbo1qaz

    jimbo1qaz Guest

    OK, I decided to change my code. Which raises a similar question: which one is better for setting a bit of a byte, |= or +=, assuming each will only be run once? Intuitively, I think |=, but my timeit runs are inconclusive, mainly because I don't know how timeit works.
    jimbo1qaz, Sep 7, 2012
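For what it's worth, the two operators only agree when the target bit is still clear; a small sketch of the difference (values are illustrative):

```python
x = 0b0100
mask = 0b0001  # bit 0

# When bit 0 is clear, |= and += produce the same value:
assert (x | mask) == (x + mask) == 0b0101

# But if the bit is already set, += carries into higher bits,
# while |= is idempotent and leaves the value unchanged:
y = 0b0101
assert y | mask == 0b0101
assert y + mask == 0b0110  # not what "set bit 0" means
```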
  6. Dave Angel

    Dave Angel Guest

    Yes, and yes. Without doing any measurement, I'd expect that in
    CPython, it makes negligible performance difference for ordinary ints
    (under 2**31, more or less). Ordinary ints can be done with single
    instructions, and any such instruction would be a tiny fraction of the
    opcode overhead.

    One place where there might be a difference is for longs. The
    implementation of those has to be a loop, and eventually one operator
    might be faster than the other. At that point, maybe you'd want to measure.
    The better way is not the faster one, but rather is the one that more
    clearly expresses the original problem. If the problem is a modulo one,
    use % (or frequently divmod). If the problem is a bit shift/masking
    one, then use such operators.

    BTW, '/' on integers is redefined for Python 3.x to give float results,
    and not to truncate.
    Dave Angel, Sep 7, 2012
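A sketch of what "express the original problem" might look like in practice (the names and numbers below are illustrative, not from the thread):

```python
# If the problem is "split an offset into a word index and a remainder",
# divmod states that directly:
offset = 1234
word, rem = divmod(offset, 8)
assert (word, rem) == (154, 2)

# If the problem is "extract the low three bits of a flags field",
# masking states that directly:
flags = 0b10110
low_bits = flags & 0b111
assert low_bits == 0b110
```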
  7. Terry Reedy

    Terry Reedy Guest

    Yes, meaningless, yes, and no.
    I would do what seems sensible to you in the context.
    Terry Reedy, Sep 7, 2012
  8. Does it matter?

    If you ever find yourself writing an application where the difference
    between 0.0476 microseconds and 0.0473 microseconds matters to you,
    Python is probably the wrong language.

    py> from timeit import Timer
    py> min(Timer('456789 // 8').repeat(repeat=35))
    py> min(Timer('456789 >> 3').repeat(repeat=35))

    Depends what you want to do. If you want to perform bitwise-and, then I
    strongly recommend you use &, the bitwise-and operator. If you want to
    perform modulus, then the modulus operator % is usually better.

    As the MiddleMan says, "specificity is the soul of all good
    communication". Python has many binary operations, e.g.:

    + - * / // % == is < <= > >= ** & ^ |

    Some of them are faster than some other things, so would you like to be
    more specific?

    My *guess* is that you mean *bitwise* operators, compared to numeric
    operators like * and // (integer division). The runtime cost is mostly
    dominated by the object-oriented overhead -- Python is not C or assembly,
    and the integers are rich objects, not low-level bitfields, so the
    difference between division and bitshifting is much less than you might
    expect from assembly language.

    But, in principle at least, there *may* be some tiny advantage to the
    bitwise operators. I say "may" because the difference is so small that it
    is likely to be lost in the noise. I do not believe that there will be
    any real world applications where the difference between the two is
    significant enough to care about. But if you think different, feel free
    to use the profile module to profile your code and demonstrate that
    divisions are a significant bottleneck in your application.

    Absolutely. Using & when you mean to take the remainder is a dirty
    optimization hack only justified if you really, really, really need it.
    I'm pretty confident that you will never notice a speed-up of the order
    of 0.1 nanoseconds.
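For the record, the two spellings really are interchangeable for power-of-two divisors in Python, even for negative ints, because Python's // and % floor and Python ints behave like infinite two's complement:

```python
# n % 2**k == n & (2**k - 1) and n // 2**k == n >> k for any Python int n:
for n in (0, 1, 7, 8, 456789, -1, -3, -456789):
    assert n % 8 == n & 7
    assert n // 8 == n >> 3
```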

    "More computing sins are committed in the name of efficiency (without
    necessarily achieving it) than for any other single reason — including
    blind stupidity." — W.A. Wulf

    "We should forget about small efficiencies, say about 97% of the time:
    premature optimization is the root of all evil. Yet we should not pass up
    our opportunities in that critical 3%. A good programmer will not be
    lulled into complacency by such reasoning, he will be wise to look
    carefully at the critical code; but only after that code has been
    identified" — Donald Knuth

    "Bottlenecks occur in surprising places, so don't try to second guess and
    put in a speed hack until you have proven that's where the bottleneck
    is." — Rob Pike

    "The First Rule of Program Optimization: Don't do it. The Second Rule of
    Program Optimization (for experts only!): Don't do it yet." — Michael A.
    Jackson
    Steven D'Aprano, Sep 7, 2012
  9. Python (like most languages) doesn't have a "set this bit" operator, so
    the closest match is a bitwise-or. So to set a bit of a byte, the
    operation which most closely matches the programmer's intention is to use
    the bitwise operator.

    Even better would be to write a function called "setBit", and use that.
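A sketch of such a helper (the name "setBit" comes from the post; the snake_case spelling below is just a Python convention):

```python
def set_bit(value, bit):
    """Return value with the given bit set (bits numbered from 0)."""
    return value | (1 << bit)

assert set_bit(0b0100, 0) == 0b0101
assert set_bit(0b0101, 0) == 0b0101  # already set: unchanged (idempotent)
```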

    Timing results are usually inconclusive because the difference between
    the results is much smaller than the average random noise on any
    particular result.

    All modern computers, say for the last 20 or 30 years, have used
    multitasking operating systems. This means that at any time you could
    have dozens, even hundreds, of programs running at once, with the
    operating system switching between them faster than you can blink.

    In addition, the time taken by an operation can depend on dozens of
    external factors, such as whether the data is already in a CPU cache,
    whether branch prediction has pre-fetched the instructions needed,
    pipeline state, memory usage, latency when reading from disks, and many
    others.

    Consequently, timing results are very *noisy* -- the *exact* same
    operation can take different amounts of time from one run to the next.
    Sometimes *large* differences.

    So any time you time a piece of code, what you are *actually* getting is
    not the amount of time that code takes to execute, but something slightly
    more. (And, occasionally, something a lot more.) Note that it is always
    slightly more -- by definition, it will never be less.

    So if you want a better estimate of the actual time taken to execute the
    code, you should repeat the measurement as many times as you can bear,
    and pick the smallest value.

    *Not* the average, since the errors are always positive. An average just
    gives you the "true" time plus some unknown average error, which may not
    be small. The minimum gives you the "true" time plus some unknown but
    hopefully small error.

    The smaller the amount of time you measure, the more likely that it will
    be disrupted by some external factor. So timeit takes a code snippet and
    runs it many times (by default, one million times), and returns the total
    time used. Even if one or two of those runs were blown out significantly,
    the total probably won't be. (Unless of course your anti-virus decided to
    start running, and *everything* slows down for 10 minutes, or something
    like that.)

    But even that total time returned by timeit is almost certainly wrong. So
    you should call the repeat method, with as many iterations as you can
    bear to wait for, and take the minimum, which will still be wrong but it
    will be less wrong.
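Putting that advice together, the repeat-and-take-the-minimum pattern looks something like this (the statement and iteration counts are illustrative):

```python
from timeit import Timer

# Each element of timings is the total time for 100,000 executions;
# the minimum is the least-disturbed run, not the average.
timings = Timer('456789 % 8').repeat(repeat=7, number=100_000)
best = min(timings)
print(f'{best / 100_000 * 1e9:.1f} ns per statement (best of 7)')
```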

    And remember, the result you get is only valid for *your* computer,
    running the specific version of Python you have, under the specific
    operating system. On another computer with a different CPU or a different
    OS, the results may be *completely* different.

    Are you still sure you care about shaving off every last nanosecond?

    The internal details of how timeit works are complicated, but it is worth
    reading the comments and documentation, both in the Fine Manual and in
    the source code.
    Steven D'Aprano, Sep 7, 2012
  10. rusi

    rusi Guest

    On an 8086/8088, a MUL (multiply) instruction was of the order of 100
    clocks and a DIV nearly 200, compared to ADD, OR etc. which were
    something like 8 (IIRC -- this is decades-stale knowledge).
    On most modern processors (after the Pentium) the difference has
    mostly vanished. I can't find a good data sheet to quote, though -- one
    of the sad things about modern processors is that the clock counts
    which Intel politely published earlier are no longer given, presumably
    because cache (in)coherence, pipelining etc. are more likely to
    dominate the number of clocks than the specific instruction.

    This question is interesting to a programmer but meaningless at the
    Python level (as others have pointed out). If it still interests you,
    work at the C (or better still, assembly) level and use a finer-grained
    timing measure -- the finest being the RDTSC instruction.
    rusi, Sep 7, 2012
  11. Paul Rubin

    Paul Rubin Guest

    See:

    4. Instruction tables: Lists of instruction latencies, throughputs
    and micro-operation breakdowns for Intel, AMD and VIA CPUs

    Multiplication is now fast but DIV is still generally much slower.
    There are ways to make fast parallel dividers that I think nobody
    bothers with, because of chip area and because one can often optimize
    division out of algorithms, replacing most of it with multiplication.

    Worrying about this sort of micro-optimization in CPython is almost
    always misplaced, since the interpreter overhead generally swamps any
    slowness of the machine arithmetic.
    Paul Rubin, Sep 7, 2012
    I don't suppose there's much of a chance that the OP is running Python
    on a CPU that doesn't have an integer divide instruction? If that
    _were_ the case, the difference would be more noticeable, but would
    still probably not be worth worrying about unless a truly huge number
    of operations were being done in a very tight loop with no intervening
    I/O operations.
    Grant Edwards, Sep 7, 2012
  13. rusi

    rusi Guest

    Hey Thanks! Seems like a nice resource! How on earth does he come up
    with the data though, when Intel does not publish it?
    rusi, Sep 7, 2012
