case A int ffs16a(int x) { int res; if ((x & 0xFF) == 0) res = kk_ffstab[x & 0xFF]; else res = kk_ffstab[x >> 8 & 0xFF] + 8; return res; } case B int ffs16b(int x) { int resl; int resh; int res; fla = (x & 0xFF) != 0; resl = kk_ffstab[x & 0xFF]; resh = kk_ffstab[x >> 8 & 0xFF] + 8; res = (fla - 1) & resl | (0 - fla) & resh; return res; }

(Joe keane) writes: > case A > > int ffs16a(int x) > { [7 lines deleted] > } > > case B > > int ffs16b(int x) > { [9 lines deleted] > } Yes, and did you have a question? -- Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst> Will write code for food. "We must do something. This is something. Therefore, we must do this." -- Antony Jay and Jonathan Lynn, "Yes Minister"

On 4/22/2012 5:36 PM, Joe keane wrote: > case A > > int ffs16a(int x) > { > int res; > > if ((x& 0xFF) == 0) > res = kk_ffstab[x& 0xFF]; > else > res = kk_ffstab[x>> 8& 0xFF] + 8; > return res; > } > > case B > > int ffs16b(int x) > { > int resl; > int resh; > int res; > > fla = (x& 0xFF) != 0; > resl = kk_ffstab[x& 0xFF]; > resh = kk_ffstab[x>> 8& 0xFF] + 8; > res = (fla - 1)& resl | (0 - fla)& resh; > return res; > } As a practical matter, no, even though a strict reading of the C Standard would permit it. Nonetheless, unlikely not to malfunction on all but a minority of non-conforming C99 or C11 (not necessarily C90) implementations, usually. -- Eric Sosman d

On 4/23/2012 10:22 AM, Terje Mathisen wrote: > So you've been looking into Find First Set bit for 16-bit (effectively > unsigned) inputs? Can you think of any possible kk_ffstab[] content that would make his computations produce "First Set Bit," for any supportable definition of "First?" -- Eric Sosman d

Eric Sosman wrote: > > On 4/22/2012 5:36 PM, Joe keane wrote: > > case A > > > > int ffs16a(int x) > > { > > int res; > > > > if ((x& 0xFF) == 0) > > res = kk_ffstab[x& 0xFF]; > > else > > res = kk_ffstab[x>> 8& 0xFF] + 8; > > return res; > > } > > > > case B > > > > int ffs16b(int x) > > { > > int resl; > > int resh; > > int res; > > > > fla = (x& 0xFF) != 0; > > resl = kk_ffstab[x& 0xFF]; > > resh = kk_ffstab[x>> 8& 0xFF] + 8; > > res = (fla - 1)& resl | (0 - fla)& resh; > > return res; > > } > > As a practical matter, no, even though a strict reading > of the C Standard would permit it. Nonetheless, unlikely not > to malfunction on all but a minority of non-conforming C99 or > C11 (not necessarily C90) implementations, usually. Eric, I hope you will enlighten us as to what strict reading could allow this to work as written, and in what way the version of C standard applied affects it. I'm guessing it has something to do with negative zeroes. If not, I'm completely lost. The first function will not work on any implementation, of course, without the obvious correction to the test. Thanks! Philip

[read '(x & 0xFF) != 0' for '(x & 0xFF) == 0' and '(x & 0xFF) == 0' for '(x & 0xFF) != 0'] In article <>, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote: >Should it be fast? AFAP >Small? I wouldn't consider using tables much larger than a kilobyte, but that's because i think it won't be fast. Code size is no issue. There are two problems, FHS and FLS. I thought we needed both, but maybe one is sufficient. I got far in coding 'FFS' before i realized that it"s the upside-down of what i had done a long time back. >Portable? A long time back, i considered using floating-point tricks, but i discarded it. I don't remember why: a) it actually tested slower b) the gain in performance was not sufficient to justify the more non-obvious code c) it's best to keep it portable[1] Anyway it's a general question, is it ever better to say cond = COND(x); val0 = VAL0(x); val1 = VAL1(x); res = (cond - 1) & val0 | (0 - cond) & val1; ? [and i think you said 'no'] Maybe it is the same because the compiler will do it anyway? [1] we assume that every machine is IEEE 754, and if the code is proof against endianness, it will work on any machine i care about

On 4/24/2012 9:36 AM, Philip Lantz wrote: > Eric Sosman wrote: >> >> On 4/22/2012 5:36 PM, Joe keane wrote: >>> case A >>> >>> int ffs16a(int x) >>> { >>> int res; >>> >>> if ((x& 0xFF) == 0) >>> res = kk_ffstab[x& 0xFF]; >>> else >>> res = kk_ffstab[x>> 8& 0xFF] + 8; >>> return res; >>> } >>> >>> case B >>> >>> int ffs16b(int x) >>> { >>> int resl; >>> int resh; >>> int res; >>> >>> fla = (x& 0xFF) != 0; >>> resl = kk_ffstab[x& 0xFF]; >>> resh = kk_ffstab[x>> 8& 0xFF] + 8; >>> res = (fla - 1)& resl | (0 - fla)& resh; >>> return res; >>> } >> >> As a practical matter, no, even though a strict reading >> of the C Standard would permit it. Nonetheless, unlikely not >> to malfunction on all but a minority of non-conforming C99 or >> C11 (not necessarily C90) implementations, usually. > > > Eric, > > I hope you will enlighten us as to what strict reading could allow this > to work as written, and in what way the version of C standard applied > affects it. > > I'm guessing it has something to do with negative zeroes. If not, I'm > completely lost. > > The first function will not work on any implementation, of course, > without the obvious correction to the test. That's what I was alluding to: The two code fragments cannot possibly be equivalent unless kk_ffstab[] holds 256 identical values. Then there's the potential non-portability that `fla-1' or `0-fla' might be represented by some other bit pattern than all-1, like 111...110 for ones' complement or 100...001 for signed magnitude. Nowadays these are mostly theoretical concerns: strict portability demands that such representations be considered, but ignoring them diminishes portability only a trifle. But mostly I was annoyed at the O.P. for (1) asking no question, (2) failing to describe his purpose, (3) being too clever by half, and (4) failing at (3). -- Eric Sosman d

On 4/24/2012 8:16 AM, Terje Mathisen wrote: > Eric Sosman wrote: >> On 4/23/2012 10:22 AM, Terje Mathisen wrote: >>> So you've been looking into Find First Set bit for 16-bit (effectively >>> unsigned) inputs? >> >> Can you think of any possible kk_ffstab[] content that would >> make his computations produce "First Set Bit," for any supportable >> definition of "First?" > > Sure: > > Count the bits from 16 down to 1, with 0 for no set bit? > > return (x & 0xff00)? table[x >> 8] + 8 : table[x]; > > Not my favorite definition, but still workable. "Case A" is a botch: It returns the same value for x==256 as for x==32768, for example, or for x==1 and x==44033. -- Eric Sosman d

Eric Sosman wrote: > > On 4/24/2012 9:36 AM, Philip Lantz wrote: > > Eric Sosman wrote: > >> > >> On 4/22/2012 5:36 PM, Joe keane wrote: > >>> case A > >>> > >>> int ffs16a(int x) > >>> { > >>> int res; > >>> > >>> if ((x& 0xFF) == 0) > >>> res = kk_ffstab[x& 0xFF]; > >>> else > >>> res = kk_ffstab[x>> 8& 0xFF] + 8; > >>> return res; > >>> } > >>> > >>> case B > >>> > >>> int ffs16b(int x) > >>> { > >>> int resl; > >>> int resh; > >>> int res; > >>> > >>> fla = (x& 0xFF) != 0; > >>> resl = kk_ffstab[x& 0xFF]; > >>> resh = kk_ffstab[x>> 8& 0xFF] + 8; > >>> res = (fla - 1)& resl | (0 - fla)& resh; > >>> return res; > >>> } > >> > >> As a practical matter, no, even though a strict reading > >> of the C Standard would permit it. Nonetheless, unlikely not > >> to malfunction on all but a minority of non-conforming C99 or > >> C11 (not necessarily C90) implementations, usually. > > > > > > Eric, > > > > I hope you will enlighten us as to what strict reading could allow this > > to work as written, and in what way the version of C standard applied > > affects it. > > > > I'm guessing it has something to do with negative zeroes. If not, I'm > > completely lost. > > > > The first function will not work on any implementation, of course, > > without the obvious correction to the test. > > That's what I was alluding to: The two code fragments cannot > possibly be equivalent unless kk_ffstab[] holds 256 identical > values. > > Then there's the potential non-portability that `fla-1' or > `0-fla' might be represented by some other bit pattern than all-1, > like 111...110 for ones' complement or 100...001 for signed > magnitude. Nowadays these are mostly theoretical concerns: strict > portability demands that such representations be considered, but > ignoring them diminishes portability only a trifle. > > But mostly I was annoyed at the O.P. for (1) asking no question, > (2) failing to describe his purpose, (3) being too clever by half, > and (4) failing at (3). Oh, I agree. The only reason I found it worth commenting on at all was your response, which I first dismissed as nonsense, but when I realized who wrote it, I took it as a puzzle to figure out what you meant. I'm still unfortunately not clear on what a strict reading of the C standard permits with respect to this code, and how the differences in the various C standards affects it, but if your original response was really just meant as a non-response to the original non-question, rather than an attempt to share some subtle insight, I'm happy to withdraw the question. Philip

In article <>, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote: >This is sort of the same as the scan for first/last set bit: You might >very well have hw opcodes to do this efficiently, and the compiler might >be capable of generating it if you use the proper idiom? That's the problem. I don't see how to write it so the compiler will say 'oh he's trying to do a FFS' and spit out appropriate code.

"Joe keane" <> wrote in message news:jn9lk0$hp4$... > In article <>, > Terje Mathisen <"terje.mathisen at tmsw.no"> wrote: >>This is sort of the same as the scan for first/last set bit: You might >>very well have hw opcodes to do this efficiently, and the compiler might >>be capable of generating it if you use the proper idiom? > That's the problem. I don't see how to write it so the compiler will > say 'oh he's trying to do a FFS' and spit out appropriate code. In Fortran, the idiom is LEADZ/TRAILZ; in gcc it's CNTLZ/CNTTZ. -- write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, & 6.0134700243160014d-154/),(/'x'/)); end

In article <>, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote: >When did you last time the uint16->float->exponent trick? just now time in us 'server' 239 testfx 270 testfy 'PC' 211 testfx 186 testfy -- fhsx.c static const char fhstab[256] = { 99, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, [...] }; extern int kk_fhsx(int mask) { int ret; ret = (mask & 0xFF00) != 0 ? fhstab[mask >> 8 & 0xFF] + 8 : fhstab[mask & 0xFF]; return ret; } -- fhsy.c union u { float f; unsigned int u; }; extern int kk_fhsy(int mask) { union u t; int bex; int ret; t.f = (float) mask; bex = t.u >> 23; ret = bex - 127; return ret; } -- testfx.c #include <stdio.h> static void testfx(int *sump); extern int kk_fhsx(int mask); [main] static void testfx(int *sump) { int m; for (m = 0x1; m <= 0xFFFF; m++) { int mask; int ret; mask = 31 * m & 0xFFFF; ret = kk_fhsx(mask); /* *sump += ret; */ } } -- testfy.c #include <stdio.h> static void testfy(int *sump); extern int kk_fhsy(int mask); [main] static void testfy(int *sump) { int m; for (m = 0x1; m <= 0xFFFF; m++) { int mask; int ret; mask = 31 * m & 0xFFFF; ret = kk_fhsy(mask); /* *sump += ret; */ } }

"James Van Buskirk" <> wrote in message news:jn9p0l$gcc$... > In Fortran, the idiom is LEADZ/TRAILZ; in gcc it's CNTLZ/CNTTZ. I just found the gcc syntax on some random web page and didn't test. gcc does, however, have __builtin_clz which seems to generate a bsr instruction on x86. -- write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, & 6.0134700243160014d-154/),(/'x'/)); end

In article <>, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote: >Problem three: How common will zero be? It's an error. The calling code needs to check for this case, and do something besides call this macro/function. KK_FFS16(mask, pos); assert((mask & (1 << pos)) != 0); /* code can now assume this */

In article <>, Terje Mathisen <"terje.mathisen at tmsw.no"> wrote: >Problem two: The tests have a really funky way to randomize the tests >(multiply by 31), which means that the table lookup version will have >very close to 100% branch prediction hits for the pattern of switching >between the top and low byte. The big problem is that the distribution is wrong. I think we want values of the *output* to be about equally probable, so the branch is taken 50%. And of course we don't want it to be too predictable. Of course there's the case where it runs smoothly with some stupid toy test program, but goes haywire when it's inside a 'real' program. Miss rate higher than 10% scares me; mispredicted branches are -bad-. That explains my interest in 'branchless' code. I don't care if it tests worse for a simple case, if it can avoid some bad behavior.