ffs16

Discussion in 'C Programming' started by Joe keane, Apr 22, 2012.

  1. Joe keane

    Joe keane Guest

    case A

    int ffs16a(int x)
    {
    int res;

    if ((x & 0xFF) == 0)
    res = kk_ffstab[x & 0xFF];
    else
    res = kk_ffstab[x >> 8 & 0xFF] + 8;
    return res;
    }

    case B

    int ffs16b(int x)
    {
    int resl;
    int resh;
    int res;

    fla = (x & 0xFF) != 0;
    resl = kk_ffstab[x & 0xFF];
    resh = kk_ffstab[x >> 8 & 0xFF] + 8;
    res = (fla - 1) & resl | (0 - fla) & resh;
    return res;
    }
    Joe keane, Apr 22, 2012
    #1
    1. Advertising

  2. (Joe keane) writes:
    > case A
    >
    > int ffs16a(int x)
    > {

    [7 lines deleted]
    > }
    >
    > case B
    >
    > int ffs16b(int x)
    > {

    [9 lines deleted]
    > }


    Yes, and did you have a question?

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Apr 22, 2012
    #2
    1. Advertising

  3. Joe keane

    Eric Sosman Guest

    On 4/22/2012 5:36 PM, Joe keane wrote:
    > case A
    >
    > int ffs16a(int x)
    > {
    > int res;
    >
    > if ((x& 0xFF) == 0)
    > res = kk_ffstab[x& 0xFF];
    > else
    > res = kk_ffstab[x>> 8& 0xFF] + 8;
    > return res;
    > }
    >
    > case B
    >
    > int ffs16b(int x)
    > {
    > int resl;
    > int resh;
    > int res;
    >
    > fla = (x& 0xFF) != 0;
    > resl = kk_ffstab[x& 0xFF];
    > resh = kk_ffstab[x>> 8& 0xFF] + 8;
    > res = (fla - 1)& resl | (0 - fla)& resh;
    > return res;
    > }


    As a practical matter, no, even though a strict reading
    of the C Standard would permit it. Nonetheless, unlikely not
    to malfunction on all but a minority of non-conforming C99 or
    C11 (not necessarily C90) implementations, usually.

    --
    Eric Sosman
    d
    Eric Sosman, Apr 23, 2012
    #3
  4. Joe keane

    Eric Sosman Guest

    On 4/23/2012 10:22 AM, Terje Mathisen wrote:
    > So you've been looking into Find First Set bit for 16-bit (effectively
    > unsigned) inputs?


    Can you think of any possible kk_ffstab[] content that would
    make his computations produce "First Set Bit," for any supportable
    definition of "First?"

    --
    Eric Sosman
    d
    Eric Sosman, Apr 24, 2012
    #4
  5. Joe keane

    Philip Lantz Guest

    Eric Sosman wrote:
    >
    > On 4/22/2012 5:36 PM, Joe keane wrote:
    > > case A
    > >
    > > int ffs16a(int x)
    > > {
    > > int res;
    > >
    > > if ((x& 0xFF) == 0)
    > > res = kk_ffstab[x& 0xFF];
    > > else
    > > res = kk_ffstab[x>> 8& 0xFF] + 8;
    > > return res;
    > > }
    > >
    > > case B
    > >
    > > int ffs16b(int x)
    > > {
    > > int resl;
    > > int resh;
    > > int res;
    > >
    > > fla = (x& 0xFF) != 0;
    > > resl = kk_ffstab[x& 0xFF];
    > > resh = kk_ffstab[x>> 8& 0xFF] + 8;
    > > res = (fla - 1)& resl | (0 - fla)& resh;
    > > return res;
    > > }

    >
    > As a practical matter, no, even though a strict reading
    > of the C Standard would permit it. Nonetheless, unlikely not
    > to malfunction on all but a minority of non-conforming C99 or
    > C11 (not necessarily C90) implementations, usually.



    Eric,

    I hope you will enlighten us as to what strict reading could allow this
    to work as written, and in what way the version of C standard applied
    affects it.

    I'm guessing it has something to do with negative zeroes. If not, I'm
    completely lost.

    The first function will not work on any implementation, of course,
    without the obvious correction to the test.

    Thanks!

    Philip
    Philip Lantz, Apr 24, 2012
    #5
  6. Joe keane

    Joe keane Guest

    [read '(x & 0xFF) != 0' for '(x & 0xFF) == 0' and '(x & 0xFF) == 0' for
    '(x & 0xFF) != 0']

    In article <>,
    Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
    >Should it be fast?


    AFAP

    >Small?


    I wouldn't consider using tables much larger than a kilobyte, but that's
    because i think it won't be fast. Code size is no issue.

    There are two problems, FHS and FLS. I thought we needed both, but
    maybe one is sufficient. I got far in coding 'FFS' before i realized
    that it"s the upside-down of what i had done a long time back.

    >Portable?


    A long time back, i considered using floating-point tricks, but i
    discarded it. I don't remember why:

    a) it actually tested slower
    b) the gain in performance was not sufficient to justify the more
    non-obvious code
    c) it's best to keep it portable[1]

    Anyway it's a general question, is it ever better to say

    cond = COND(x);
    val0 = VAL0(x);
    val1 = VAL1(x);
    res = (cond - 1) & val0 | (0 - cond) & val1;

    ? [and i think you said 'no']

    Maybe it is the same because the compiler will do it anyway?

    [1] we assume that every machine is IEEE 754, and if the code is proof
    against endianness, it will work on any machine i care about
    Joe keane, Apr 24, 2012
    #6
  7. Joe keane

    Eric Sosman Guest

    On 4/24/2012 9:36 AM, Philip Lantz wrote:
    > Eric Sosman wrote:
    >>
    >> On 4/22/2012 5:36 PM, Joe keane wrote:
    >>> case A
    >>>
    >>> int ffs16a(int x)
    >>> {
    >>> int res;
    >>>
    >>> if ((x& 0xFF) == 0)
    >>> res = kk_ffstab[x& 0xFF];
    >>> else
    >>> res = kk_ffstab[x>> 8& 0xFF] + 8;
    >>> return res;
    >>> }
    >>>
    >>> case B
    >>>
    >>> int ffs16b(int x)
    >>> {
    >>> int resl;
    >>> int resh;
    >>> int res;
    >>>
    >>> fla = (x& 0xFF) != 0;
    >>> resl = kk_ffstab[x& 0xFF];
    >>> resh = kk_ffstab[x>> 8& 0xFF] + 8;
    >>> res = (fla - 1)& resl | (0 - fla)& resh;
    >>> return res;
    >>> }

    >>
    >> As a practical matter, no, even though a strict reading
    >> of the C Standard would permit it. Nonetheless, unlikely not
    >> to malfunction on all but a minority of non-conforming C99 or
    >> C11 (not necessarily C90) implementations, usually.

    >
    >
    > Eric,
    >
    > I hope you will enlighten us as to what strict reading could allow this
    > to work as written, and in what way the version of C standard applied
    > affects it.
    >
    > I'm guessing it has something to do with negative zeroes. If not, I'm
    > completely lost.
    >
    > The first function will not work on any implementation, of course,
    > without the obvious correction to the test.


    That's what I was alluding to: The two code fragments cannot
    possibly be equivalent unless kk_ffstab[] holds 256 identical
    values.

    Then there's the potential non-portability that `fla-1' or
    `0-fla' might be represented by some other bit pattern than all-1,
    like 111...110 for ones' complement or 100...001 for signed
    magnitude. Nowadays these are mostly theoretical concerns: strict
    portability demands that such representations be considered, but
    ignoring them diminishes portability only a trifle.

    But mostly I was annoyed at the O.P. for (1) asking no question,
    (2) failing to describe his purpose, (3) being too clever by half,
    and (4) failing at (3).

    --
    Eric Sosman
    d
    Eric Sosman, Apr 25, 2012
    #7
  8. Joe keane

    Eric Sosman Guest

    On 4/24/2012 8:16 AM, Terje Mathisen wrote:
    > Eric Sosman wrote:
    >> On 4/23/2012 10:22 AM, Terje Mathisen wrote:
    >>> So you've been looking into Find First Set bit for 16-bit (effectively
    >>> unsigned) inputs?

    >>
    >> Can you think of any possible kk_ffstab[] content that would
    >> make his computations produce "First Set Bit," for any supportable
    >> definition of "First?"

    >
    > Sure:
    >
    > Count the bits from 16 down to 1, with 0 for no set bit?
    >
    > return (x & 0xff00)? table[x >> 8] + 8 : table[x];
    >
    > Not my favorite definition, but still workable. :)


    "Case A" is a botch: It returns the same value for x==256 as
    for x==32768, for example, or for x==1 and x==44033.

    --
    Eric Sosman
    d
    Eric Sosman, Apr 25, 2012
    #8
  9. Joe keane

    Philip Lantz Guest

    Eric Sosman wrote:
    >
    > On 4/24/2012 9:36 AM, Philip Lantz wrote:
    > > Eric Sosman wrote:
    > >>
    > >> On 4/22/2012 5:36 PM, Joe keane wrote:
    > >>> case A
    > >>>
    > >>> int ffs16a(int x)
    > >>> {
    > >>> int res;
    > >>>
    > >>> if ((x& 0xFF) == 0)
    > >>> res = kk_ffstab[x& 0xFF];
    > >>> else
    > >>> res = kk_ffstab[x>> 8& 0xFF] + 8;
    > >>> return res;
    > >>> }
    > >>>
    > >>> case B
    > >>>
    > >>> int ffs16b(int x)
    > >>> {
    > >>> int resl;
    > >>> int resh;
    > >>> int res;
    > >>>
    > >>> fla = (x& 0xFF) != 0;
    > >>> resl = kk_ffstab[x& 0xFF];
    > >>> resh = kk_ffstab[x>> 8& 0xFF] + 8;
    > >>> res = (fla - 1)& resl | (0 - fla)& resh;
    > >>> return res;
    > >>> }
    > >>
    > >> As a practical matter, no, even though a strict reading
    > >> of the C Standard would permit it. Nonetheless, unlikely not
    > >> to malfunction on all but a minority of non-conforming C99 or
    > >> C11 (not necessarily C90) implementations, usually.

    > >
    > >
    > > Eric,
    > >
    > > I hope you will enlighten us as to what strict reading could allow this
    > > to work as written, and in what way the version of C standard applied
    > > affects it.
    > >
    > > I'm guessing it has something to do with negative zeroes. If not, I'm
    > > completely lost.
    > >
    > > The first function will not work on any implementation, of course,
    > > without the obvious correction to the test.

    >
    > That's what I was alluding to: The two code fragments cannot
    > possibly be equivalent unless kk_ffstab[] holds 256 identical
    > values.
    >
    > Then there's the potential non-portability that `fla-1' or
    > `0-fla' might be represented by some other bit pattern than all-1,
    > like 111...110 for ones' complement or 100...001 for signed
    > magnitude. Nowadays these are mostly theoretical concerns: strict
    > portability demands that such representations be considered, but
    > ignoring them diminishes portability only a trifle.
    >
    > But mostly I was annoyed at the O.P. for (1) asking no question,
    > (2) failing to describe his purpose, (3) being too clever by half,
    > and (4) failing at (3).


    Oh, I agree. The only reason I found it worth commenting on at all was
    your response, which I first dismissed as nonsense, but when I realized
    who wrote it, I took it as a puzzle to figure out what you meant.

    I'm still unfortunately not clear on what a strict reading of the C
    standard permits with respect to this code, and how the differences in
    the various C standards affects it, but if your original response was
    really just meant as a non-response to the original non-question, rather
    than an attempt to share some subtle insight, I'm happy to withdraw the
    question.

    Philip
    Philip Lantz, Apr 25, 2012
    #9
  10. Joe keane

    Joe keane Guest

    In article <>,
    Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
    >This is sort of the same as the scan for first/last set bit: You might
    >very well have hw opcodes to do this efficiently, and the compiler might
    >be capable of generating it if you use the proper idiom?


    That's the problem. I don't see how to write it so the compiler will
    say 'oh he's trying to do a FFS' and spit out appropriate code.
    Joe keane, Apr 25, 2012
    #10
  11. "Joe keane" <> wrote in message
    news:jn9lk0$hp4$...

    > In article <>,
    > Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:


    >>This is sort of the same as the scan for first/last set bit: You might
    >>very well have hw opcodes to do this efficiently, and the compiler might
    >>be capable of generating it if you use the proper idiom?


    > That's the problem. I don't see how to write it so the compiler will
    > say 'oh he's trying to do a FFS' and spit out appropriate code.


    In Fortran, the idiom is LEADZ/TRAILZ; in gcc it's CNTLZ/CNTTZ.

    --
    write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
    6.0134700243160014d-154/),(/'x'/)); end
    James Van Buskirk, Apr 25, 2012
    #11
  12. Joe keane

    Joe keane Guest

    In article <>,
    Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
    >When did you last time the uint16->float->exponent trick?


    just now

    time in us

    'server'
    239 testfx
    270 testfy

    'PC'
    211 testfx
    186 testfy

    -- fhsx.c
    static const char fhstab[256] =
    {
    99, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
    5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
    [...]
    };


    extern int kk_fhsx(int mask)
    {
    int ret;

    ret = (mask & 0xFF00) != 0 ? fhstab[mask >> 8 & 0xFF] + 8 :
    fhstab[mask & 0xFF];
    return ret;
    }

    -- fhsy.c
    union u
    {
    float f;
    unsigned int u;
    };


    extern int kk_fhsy(int mask)
    {
    union u t;
    int bex;
    int ret;

    t.f = (float) mask;
    bex = t.u >> 23;
    ret = bex - 127;
    return ret;
    }

    -- testfx.c
    #include <stdio.h>

    static void testfx(int *sump);
    extern int kk_fhsx(int mask);


    [main]


    static void testfx(int *sump)
    {
    int m;

    for (m = 0x1; m <= 0xFFFF; m++)
    {
    int mask;
    int ret;

    mask = 31 * m & 0xFFFF;
    ret = kk_fhsx(mask);
    /* *sump += ret; */
    }
    }

    -- testfy.c
    #include <stdio.h>

    static void testfy(int *sump);
    extern int kk_fhsy(int mask);


    [main]


    static void testfy(int *sump)
    {
    int m;

    for (m = 0x1; m <= 0xFFFF; m++)
    {
    int mask;
    int ret;

    mask = 31 * m & 0xFFFF;
    ret = kk_fhsy(mask);
    /* *sump += ret; */
    }
    }
    Joe keane, Apr 25, 2012
    #12
  13. "James Van Buskirk" <> wrote in message
    news:jn9p0l$gcc$...

    > In Fortran, the idiom is LEADZ/TRAILZ; in gcc it's CNTLZ/CNTTZ.


    I just found the gcc syntax on some random web page and didn't test.
    gcc does, however, have __builtin_clz which seems to generate a
    bsr instruction on x86.

    --
    write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
    6.0134700243160014d-154/),(/'x'/)); end
    James Van Buskirk, Apr 25, 2012
    #13
  14. Joe keane

    Joe keane Guest

    In article <>,
    Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
    >Problem three: How common will zero be?


    It's an error. The calling code needs to check for this case, and do
    something besides call this macro/function.

    KK_FFS16(mask, pos);
    assert((mask & (1 << pos)) != 0);
    /* code can now assume this */
    Joe keane, Apr 26, 2012
    #14
  15. Joe keane

    Joe keane Guest

    In article <>,
    Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
    >Problem two: The tests have a really funky way to randomize the tests
    >(multiply by 31), which means that the table lookup version will have
    >very close to 100% branch prediction hits for the pattern of switching
    >between the top and low byte.


    The big problem is that the distribution is wrong. I think we want
    values of the *output* to be about equally probable, so the branch is
    taken 50%. And of course we don't want it to be too predictable.

    Of course there's the case where it runs smoothly with some stupid toy
    test program, but goes haywire when it's inside a 'real' program. Miss
    rate higher than 10% scares me; mispredicted branches are -bad-.

    That explains my interest in 'branchless' code. I don't care if it
    tests worse for a simple case, if it can avoid some bad behavior.
    Joe keane, Apr 26, 2012
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.

Share This Page