When shorts are longer than longs !


Guest

no

I've always thought it a shame that C never got BCPL's SLCT and OF
operators.  SLCT len:shift:offset creates a value that can be stored,
passed about etc., but can then be used to access and/or set the
designated sub-portion of a word (BCPL is typeless) using the OF
operator.

A C version would, I think, be even more useful than the BCPL one
since the type of what would be the pointer argument to OF could be
used to make it more flexible.  Maybe it should be proposed! :)

you just can't beat CORAL-66

INTEGER joe;
INTEGER fred;

joe := 13721;
BITS[12,5]fred := BITS[12,0]joe;
 

BartC

no

I've always thought it a shame that C never got BCPL's SLCT and OF
operators. SLCT len:shift:offset creates a value that can be stored,
passed about etc., but can then be used to access and/or set the
designated sub-portion of a word (BCPL is typeless) using the OF
operator.

A C version would, I think, be even more useful than the BCPL one
since the type of what would be the pointer argument to OF could be
used to make it more flexible. Maybe it should be proposed! :)

you just can't beat CORAL-66

INTEGER joe;
INTEGER fred;

joe := 13721;
BITS[12,5]fred := BITS[12,0]joe;

Meaning what exactly? Copy 12 bits starting from bit 0 of joe to bit 5 of
fred?

I use a syntax like this:

fred.[5..16] := joe.[0..11]
 

Eric Sosman

because they have a list of portability problems as long as your
arm.

That seems a little strong. If you use a bit-field
simply as a struct element that can hold an integer value,
the only portability problems I can think of are the issue
of how wide a bit-field can be (a "portability problem"
shared with `int'), and the question of whether an unadorned
bit-field is or isn't signed (easily avoided by writing `signed'
or `unsigned' explicitly).

The much-bemoaned "portability problems" crop up when
you try to use bit-fields to match a chosen representation.
It can be done (usually), but the C source to do it relies on
details that may differ between compilers, and that means the
code isn't portable even though it may be efficacious. But
if you simply need a seven-bit signed integer and don't care
what it looks like in memory, there's no portability problem.
 

James Kuyper

Ian Collins wrote:
....
Why do some people get so hung up on bitfields? Everywhere I've used
them, the alternative was ugly code.

Bitfields are very convenient, when portability considerations don't
rule out their use. Unfortunately, portability considerations often rule
out their use, at least in my experience. Ugly code that actually works
everywhere it's supposed to work is better than pretty code that fails
catastrophically on some platforms that I need my code to be able to run on.
 

James Kuyper

Eric said:
That seems a little strong. If you use a bit-field
simply as a struct element that can hold an integer value,
the only portability problems I can think of are the issue
of how wide a bit-field can be (a "portability problem"
shared with `int'), and the question of whether an unadorned
bit-field is or isn't signed (easily avoided by writing `signed'
or `unsigned' explicitly).

Other portability problems include:
Not knowing which byte of the struct object actually contains the bit field.
Not knowing whether or not the bit-field is spread over multiple bytes
(not necessarily adjacent).
Not knowing which bits within that byte contain the bit-field.
The much-bemoaned "portability problems" crop up when
you try to use bit-fields to match a chosen representation.

In my experience, that is by far the most common reason for wanting to
use bit-fields (and being sorry that they are unsuitable for that task).
 

Eric Sosman

James said:
Other portability problems include:
Not knowing which byte of the struct object actually contains the bit
field.

... which only matters if you care about the representation
in addition to the value.
Not knowing whether or not the bit-field is spread over multiple bytes
(not necessarily adjacent).

... which only matters if you care about the representation
in addition to the value.
Not knowing which bits within that byte contain the bit-field.

... which only matters if you care about the representation
in addition to the value.
In my experience, that is by far the most common reason for wanting to
use bit-fields (and being sorry that they are unsuitable for that task).

They're suitable in the context of a particular compiler
and/or ABI, and quite often that's enough. Just don't get
fooled into thinking that this particular use is portable.

The use of bit-fields as a space-saver, although portable,
has its own drawbacks. You'll save a substantial amount only
if you have a fairly large number of bit-fields, and then you
run up against the fact that you can't form pointers to them
or make arrays of them. I mean, who wants to write a Gawdawful
big `switch' to access one of forty-two five-bit integers?
That's the main reason I find bit-fields impotent, and seldom
use them.
 

jameskuyper

Eric said:
... which only matters if you care about the representation
in addition to the value.

Which, in my experience, is usually the case when desiring to use bit-
fields.
They're suitable in the context of a particular compiler
and/or ABI, and quite often that's enough. Just don't get
fooled into thinking that this particular use is portable.

I almost never write code for which that exception would apply -
though I understand that many people do.
 

Ben Bacarisse

no

I've always thought it a shame that C never got BCPL's SLCT and OF
operators.  SLCT len:shift:offset creates a value that can be stored,
passed about etc., but can then be used to access and/or set the
designated sub-portion of a word (BCPL is typeless) using the OF
operator.

A C version would, I think, be even more useful than the BCPL one
since the type of what would be the pointer argument to OF could be
used to make it more flexible.  Maybe it should be proposed! :)

you just can't beat CORAL-66

INTEGER joe;
INTEGER fred;

joe := 13721;
BITS[12,5]fred := BITS[12,0]joe;

Can the "selector" -- the BITS[12,5] -- be passed to a function or
stored in a table? In the spirit of full disclosure, BCPL's SLCT
values must be what BCPL called manifest constants -- constant
expressions in effect -- so there are probably advantages and
disadvantages to both.
 

Antoninus Twink

I mean, who wants to write a Gawdawful big `switch' to access one of
forty-two five-bit integers?

That would certainly be a waste of time, since at most thirty-two of
them would be pairwise distinct.
 

Ian Collins

James said:
Ian Collins wrote:
....

Bitfields are very convenient, when portability considerations don't
rule out their use. Unfortunately, portability considerations often rule
out their use, at least in my experience.

I guess mine is different. Most places I've used bitfields are in
drivers, which tend not to be portable.
 

James Dow Allen

But if you feel like posting a better, faster version, fabulous! Do
it! That's precisely the kind of thing this group is for.

and in a follow up he wrote
Nevertheless! Thanks for playing. Seriously. Just because /I/ found
your code hard to read, that doesn't mean others will, and in any
case I found your willingness to share to be constructive and
helpful.

Like Richard, I would like to see examples of *good* code
in this ng. Don't know if my effort was "good" or "bad",
but in the latter case *someone* should post an improved version.


Recently we saw that c.l.c'ers have wide agreement on
one issue: Don't bother with bitfields!

But there *is* a need for "hand-rolled" bitfielding
macros or functions, i.e. getbits() and setbits().
These *should* be standard library functions.

In data compression, *many* statistical coding methods,
including Huffman, quasi-arithmetic, Golomb- and Elias-based
coding (but *not* arithmetic coding) use variable-length bit
strings, *cannot* use C bitfields in any straightforward
way even if they wanted to, and *must* use something like
getbits()/setbits().

As Mr. Sosman points out, bitfields are needed for certain
large memory-efficient tables, but even then "hand-rolled
bitfields" can be used, especially if one agrees that
tp[ix]->u_fb.ufb.foo_bar = FB_GREEN;
isn't really much more readable than
setfoobar(tp, ix, FB_GREEN);
(Here, setfoobar() might be a header-defined macro
which simply invokes the library setbits() routine.)

There are many expert programmers posting and lurking in
this newsgroup. I propose that this ng collectively
fashion model implementations of getbits() and setbits().
(Surely these routines are simple enough that any worry
of invention confidentiality needn't apply.)

Goals should include portability, standard compliance, speed,
and readability, probably in that order. The definition of
these routines is *very* straightforward, for example:

unsigned long getbits(unsigned long *p, int bcnt, int boff)

getbits returns, with zero-padding on the left, the
bcnt bits beginning boff bits from *p. bcnt can equal,
but not exceed the number of bits in 'unsigned long'.

Write your own implementation, or start with mine, identifying
its compliance and speed flaws. "Brownie points" to all who
contribute, and gold stars to all whose source-code fragment,
however small, ends up in the model implementation !

After my implementation (slightly improved compared to last
week's version) I'll make some brief comments on its (lack of)
"readability."

* * * * * * * * * * * * * *

/* Next two normally arrive via #includes */
#define CHAR_BIT 8
typedef unsigned long Bgroup;

#define BITS_BG (CHAR_BIT * sizeof(Bgroup))
#define B_One ((Bgroup)1)

#define SET_LBITS(p, k, v) \
    (*(p) = *(p) & ((B_One << BITS_BG - (k)) - 1) \
        | (v) << BITS_BG - (k))

#define SET_GBITS(p, k, n, v) \
    (*(p) = *(p) & (~((B_One << BITS_BG - (n)) \
            - (B_One << BITS_BG - (n) - (k)))) \
        | (v) << BITS_BG - (n) - (k))

/* set |bcnt| bits to |datum|, offset |boff| bits from *|p| */
void setbits(Bgroup *p, int bcnt, int boff, Bgroup datum)
{
    int rbit;

    p += boff / BITS_BG;
    boff %= BITS_BG;
    rbit = bcnt + boff;
    if (rbit > BITS_BG) {
        /* Set p[0] rightside and p[1] leftside */
        SET_GBITS(p, BITS_BG - boff, boff,
                  datum >> rbit - BITS_BG);
        SET_LBITS(p + 1, rbit - BITS_BG, datum);
    } else if (boff) {
        /* Set p[0] */
        SET_GBITS(p, bcnt, boff, datum);
    } else {
        /* Set p[0] leftside */
        SET_LBITS(p, bcnt, datum);
    }
}

/* return the |bcnt| bits at offset |boff| bits from *|p| */
Bgroup getbits(Bgroup *p, int bcnt, int boff)
{
    int rbit;
    Bgroup x;

    p += boff / BITS_BG;
    rbit = bcnt + boff % BITS_BG;
    if (rbit > BITS_BG) {
        x = p[0] << rbit - BITS_BG
          | p[1] >> 2 * BITS_BG - rbit;
    } else {
        x = p[0] >> BITS_BG - rbit;
    }
    if (bcnt < BITS_BG) {
        x &= (B_One << bcnt) - 1;
    }
    return x;
}


* * * * * * * * * * * * * * *

I found your code very hard going. No comments to speak of, cryptic
identifiers, overly-complex expressions.

I agree the code seems poor for readability, and *specific
changes* to address that are certainly of interest; however,
I doubt the code is really atypical for macros of this ilk.

I will certainly concede that *readability* is
*not* a general virtue of James Allen coding! :)
I won't bore you by making excuses, e.g.
I *did* make some effort to write getbits()/setbits()
"properly." I *hope* the posted code works with
Big-Endian, 1's-complement, etc.
but I tested it *only* on one Intel-based laptop.

(I do assume that the number 2*BITS_BG fits in 'int',
so the code may fail if, e.g., Bgroup is 64 bits and
int just 7 bits+sign.)

BTW, why don't good compilers offer, to aid
verification of portability, command-line options to
change sizeof(int), Endianness, etc.?
Isn't this asking for trouble if unsigned long contains padding
bits?

Well, now you've got me wondering! If it fails
for me, I'll venture it fails for many others
who have used similar idioms.

Call me simple-minded but I learn best by starting
with specific examples. (See quote from the great
George Boole in .sig below.) Can you give a specific
example of the possible problem in that #define ?
Perhaps there's a good
idea in there somewhere - I hope and trust that there is - but I
couldn't find it.

Don't know about any "good idea." The code is nothing more nor
less than what it claims to be, and I'd venture that many of
the programmers posting or lurking here would code up something
very similar. The functions need shifts, masks, bitwise booleans,
along with simple arithmetic to construct shift counts, and my
code does that (and, I hope, does it in a proper order :).


I considered, but quickly rejected, a faster but unportable
version of these routines: The bits which are get or set
may cross a 32-bit boundary, but that time-consuming
inconvenience could be avoided on many architectures by
replacing
Uint32 *p;
p += boff / BITS_BG;
boff %= BITS_BG;
with
Uint32 *p;
/* next is noncompliant and nonportable */
p = (Uint16 *)p + boff / (BITS_BG / 2);
boff %= BITS_BG / 2;

* * * * * * * * * * * * * *

Although readability is generally a weak point in
James Allen code, I will argue that my code is
sometimes misjudged:

A 10-line routine which takes as long to understand
as a 40-line routine is *not* a readability loss, if
the 10-line routine replaces the 40-line routine, but
often rough readability assessments are made without
making the relevant comparison.
I must try to avoid blowing my own horn, but consider
http://james.fabpedigree.com/wnim.htm
as a specific example of this last point. That program
may help show why I find programming to be fun.
I threw it together in less than two hours in response
to a rec.puzzles puzzle. c.l.c'ers will denounce this
program gleefully, but "proof of the pudding" is that
no one else solved the rec.puzzles puzzle, and no one
has ever bothered to improve on my 2-hour programming
effort.

James Dow Allen
 

Guest

you just can't beat CORAL-66
INTEGER joe;
INTEGER fred;
joe := 13721;
BITS[12,5]fred := BITS[12,0]joe;

Meaning what exactly? Copy 12 bits starting from bit 0 of joe to bit 5 of
fred?

yes. I used this as a reference
www.xgc.com/manuals/xgc-c66-rm/c723.html
(it being so long since I wrote Coral I wasn't certain of the syntax)

section 6.1.1.2.2. Part-Words
....
BITS[ Totalbits , Bitposition]
....

As I think it's a fairly neat syntax, I'd copy it as closely
as I can if I implement setbits()/getbits() in C

I use a syntax like this:

fred.[5..16] := joe.[0..11]

ah, but mine was based on a real (though old) language. Coral
still runs on real live applications.
 

Guest

you just can't beat CORAL-66
INTEGER joe;
INTEGER fred;
joe := 13721;
BITS[12,5]fred := BITS[12,0]joe;

Can the "selector" -- the BITS[12,5] -- be passed to function or
stored in a table?

Do you mean like some sort of function or operator pointer?
So it can be retrieved and applied later?
I don't think so. It's not really in Coral's spirit.
It was used in embedded systems and they weren't
trying to be *that* clever.

 In the spirit of full disclosure, BCPL's SLCT
values must be what BCPL called manifest constants -- constant
expressions in effect -- so there are probably advantages and
disadvantages to both.

Coral's had to be hard coded values.
 

BartC

(e-mail address removed) wrote:
you just can't beat CORAL-66
INTEGER joe;
INTEGER fred;
joe := 13721;
BITS[12,5]fred := BITS[12,0]joe;

Meaning what exactly? Copy 12 bits starting from bit 0 of joe to bit 5 of
fred?

yes. I used this as a reference
www.xgc.com/manuals/xgc-c66-rm/c723.html
(it being so long since I wrote Coral I wasn't certain of the syntax)

section 6.1.1.2.2. Part-Words
...
BITS[ Totalbits , Bitposition]
...

As I think it's a fairly neat syntax, I'd copy it as closely
as I can if I implement setbits()/getbits() in C

I use a syntax like this:

fred.[5..16] := joe.[0..11]

ah, but mine was based on a real (though old) language. Coral
still runs on real live applications.

My syntax is also based on a real (though not public) language of mine.
Can't quite remember where I pinched it from though.
 

James Dow Allen

Integer types other than unsigned chars are allowed to contain unused
padding bits, that do not contribute to the value (*).

Thank you very much for your reply! This is an area of C
about which I knew *nothing*. :-(
(It may have been mentioned in this ng in threads I've read,
but without the clear examples that work for me.)

To see if I understand, let me present a specific hypothetical
example; tell me if I'm on the right track.

A machine (which machine is it, by the way?) has 9-bit bytes,
4-byte integers, 9-bit arithmetic on chars, but only
32-bit arithmetic on ints, 4 bits being completely
invisible (unless you access their storage using a *(char *)
construct). Is this about right?

On such a machine, it would be a dilemma whether to
waste 1/9 of the bits for a time efficiency, but suppose
I did. Then I would want something like
#define BITS_IN_INT 32
and code that line myself if standard headers didn't
provide it.
I seem to remember threads from way back trying to come up with portable
macros for determining the width of an unsigned type at compile time.

Doesn't this "prove" that BITS_IN_INT etc. should be added
to the appropriate standard headers?
For some reason shifting by the width of a type is undefined behaviour.

<useless anecdote>
In code like
int a, b = 32 + 7;
foo(a << b);
some machines will end up with (a << 7).
I remember this allows a tiny micro-optimization
in bit-fielding code similar to that under discussion!
</useless anecdote>

James
 

jameskuyper

Eric Sosman wrote:
....
- The segment-and-offset addressing of the 8086 and its
compatible followers permit multiple representations for
the same memory address. I don't know whether any C
implementations took advantage of this, but if they did
you could have two pointers that compared equal but
looked different to memcmp().

As I remember it, I've used one that did, but that was a long time
ago, so I could be mistaken. However, even if I'm remembering
incorrectly and it actually made no use of that option, I'm sure that
it did give the users all the tools they needed to write their own
code which could take advantage of that possibility. This included
extensions such as:

a) Macros for extracting the segment and offset from a pointer value,
and for constructing a pointer value from a segment and offset.
b) Keywords that could be used to declare a 2-byte pointer containing
only the offset, which always implicitly used the same segment, with
several different options for determining which segment value to use.

I remember thinking: "How important would code performance have to be,
to justify putting up with the hassle of making use of these
features?" I played around with them in order to learn how they
worked, but never used them in actual production code.
 

James Dow Allen

I seem to remember threads from way back trying to come up with portable
macros for determining the width of an unsigned type at compile time. I
can't remember if anyone came up with a method or not....

(I realize the #define in the Subject doesn't do the trick,
but why not a simple #define .... 32 in a header?)

This whole discussion startled me. And intrigued me enough
to peruse some fsf/gnu source codes: I'll report
on this briefly below.

I was *not* startled by my own ignorance: My code is
mostly "boudoir" code that doesn't need portability;
and I often practice "just in time" learning. The only
time I can recall typing "make config" was to install
an mp3 player on my laptop.

I was *not* startled by this "weird" provision about
padding bits in the C standard. I'm sure that C
compilers run on some perverse machines, and also that
the Standards Committee had some perverse members.
(The interesting mystery is how to apportion the blame
between these two sources of perversity!)

What *did* startle me is that the Professional C
Programming Community seems to have lived with this
confusion without any standard solution. Surely a
lot of code needs to know how many bits are in an int.
The earlier discussion was nine years ago -- surely
that's enough time to add a line such as
#define BITS_PER_ULONG (8 * sizeof(unsigned long))
or, perhaps better yet for pre-processor ease,
#define BITS_PER_ULONG 32 /* change as necessary */
to the header files c.l.c'ers have control over.

(BTW, while browsing some gnu headers and configs just
now, I *think* I saw a comment to the effect that
something like "8 * sizeof" worked on all machines tested.)

I see that there is a
#define __WORDSIZE 32
line in bits/wordsize.h in my gcc environment; it's used
to "#if" things like the "typedef ... uint64_t;".
The uses all seem to assume __WORDSIZE will be 32 or 64
although only tiffconf.h throws an error otherwise.

I checked the FSF/GNU source for their gmp library.
It has lines *similar* to the above definition:
#define SIZEOF_UNSIGNED_LONG $ac_cv_sizeof_unsigned_long
...
#define BITS_PER_ULONG (8 * SIZEOF_UNSIGNED_LONG)

I suppose "$ac_cv_sizeof_unsigned_long" is developed
in the obvious way, but was dissuaded from further
investigation by
% wc gmp-4.2.4/config*
829 3306 24298 config.guess
513 2216 15259 config.in
143 609 4081 config.sub
1526 4998 44914 configfsf.guess
1673 4386 33704 configfsf.sub
34722 126885 1087352 configure
3220 13273 112578 configure.in
42626 155673 1322186 total

I glanced at some other gnu source:

math/atest-sincos.c has
#define mpbpl (CHAR_BIT * sizeof (mp_limb_t))
#define SZ (FRAC / mpbpl + 1)
typedef mp_limb_t mp1[SZ], mp2[SZ * 2];
I don't know if *this* CHAR_BIT should be 8.

soft-fp/soft-fp.h
#define SI_BITS (__CHAR_BIT__ * (int)sizeof(SItype))
By this time I'd lost interest, but what is __CHAR_BIT__ ?
Is it somehow "better" than CHAR_BIT ?

Grepping more gnu source I notice *both* "8 * sizeof"
constructs and "CHAR_BIT * sizeof" constructs. Perhaps they
made the right choice in each case, and the codes all work
wonderfully when CHAR_BIT != 8.

Didn't see any more instances of __CHAR_BIT__ except in
some C++ headers. One header,
/usr/include/c++/4.1.2/bits/stl_bvector.h
included the code
namespace _GLIBCXX_STD
{
typedef unsigned long _Bit_type;
enum { _S_word_bit = int(CHAR_BIT * sizeof(_Bit_type)) };
...
The CHAR_BIT seems wrong here, for the reason under discussion;
don't know if adding the 4 extra underscores would have helped.


Summary.
I will continue to use something like
#define BITS_PER(x) (8*sizeof(x))
If I get paranoid I'll call a test routine early in
main(), something like

void test_bits_per(void)
{
    unsigned long x = 1, y = BITS_PER(x);

    if (x << y || ! (x << y - 1)) {
        fprintf(stderr, "Bits_per_ broken?\n");
        exit(6);
    }
}

It really is ironic that I, out of laziness, was already
using the above definition, and introduced a problem by
ignorantly changing "8" to "CHAR_BIT" to avoid objections
from the ng!

I've changed the Subject line, but please feel free to
pursue criticism of "Implementation of getbits()/setbits()"
now that we've agreed on
#define BITS_PER(x) (8*sizeof(x))
:) :) :)


James Dow Allen
 

Ben Bacarisse

Richard Heathfield said:
James Dow Allen said:


Yeah, it might be nice. I don't think you can do it at compile
time.

Already posted elsewhere, but it deserves wider circulation:

/* Number of bits in inttype_MAX, or in any (1<<b)-1 where 0 <= b < 3E+10 */
#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL *30 \
+ (m)%0x3fffffffL /((m)%31+1)/31%31*5 + 4-12/((m)%31+3))

Hallvard B Furuseth 30 Dec 2003, 18:38
comp.lang.c Message-ID: <[email protected]>

<snip>
 

Keith Thompson

James Dow Allen said:
soft-fp/soft-fp.h
#define SI_BITS (__CHAR_BIT__ * (int)sizeof(SItype))
By this time I'd lost interest, but what is __CHAR_BIT__ ?
Is it somehow "better" than CHAR_BIT ?

I can but speculate (well, I could look at those headers myself, but
I'm too lazy). CHAR_BIT is defined in <limits.h>, a standard
user-visible header. Probably this __CHAR_BIT__ macro is defined in
some lower-level system-specific header, one that's used *by* the
user-visible headers.
Grepping more gnu source I notice *both* "8 * sizeof"
constructs and "CHAR_BIT * sizeof" constructs. Perhaps they
made the right choice in each case, and the codes all work
wonderfully when CHAR_BIT != 8.

It's more likely that the code that uses "8 * sizeof ..." simply was
never intended to work on systems where CHAR_BIT != 8.

[...]
Summary.
I will continue to use something like
#define BITS_PER(x) (8*sizeof(x))
If I get paranoid I'll call a test routine early in
main(), something like

Sorry, I must have missed something. I'd definitely use CHAR_BIT, not
8, for two reasons. First, and perhaps *less* important, CHAR_BIT
isn't necessarily 8 on all systems (I say this may be less important
because I've never programmed for a system with CHAR_BIT != 8).
Second, it's clearer to use a symbolic name rather than a magic
number.

Why do you think 8 might be better than CHAR_BIT?

[...]

One perhaps interesting point: Nearly all current *hosted
implementations* that I know of have CHAR_BIT==8, and have no padding
bits in the predefined integer types. (I think some old Cray vector
systems may have had padding bits, but they never had C99
implementations.)

gcc-specific code needn't be portable; it can use whatever
gcc-specific assumptions it likes.
 
