Macro for supplying memset with an unsigned char

Martin Wells

I'm trying to come up with a fully-portable macro for supplying memset
with an unsigned char rather than an int. I'm going to think out loud
as I go along. . .

I'll take a sample system before I begin:

CHAR_BIT == 16
sizeof(short) == sizeof(int) == 1
Assume none of the integer types have padding bits
Sign-magnitude

Therefore we have:

UCHAR_MAX == 65535
INT_MIN == -32767
INT_MAX == 32767

Let's say we have an array of bytes and we want to set every byte to
65000. We CANNOT use:

memset(data, 65000, sizeof data);

because the conversion from unsigned integer types to signed integer
types "is implementation-defined or an implementation-defined signal
is raised" if the number is out of range.

Therefore we need to supply memset with an int value, which, when
converted to unsigned char, will yield the value we want.

The rules for converting from signed to unsigned are as follows:

| If the new type is unsigned, the value is converted
| by repeatedly adding or subtracting one more than
| the maximum value that can be represented in the
| new type until the value is in the range of the new type.

The addition method is easier to understand so we'll go with that one.
If we start off with a negative number like -1, then here's what will
happen:

char unsigned c = -1;

is equal to:

infinite_range_int x = -1; /* Let's pretend we have a signed
                              int type that can hold any number */

while (0 > x || UCHAR_MAX < x) x += UCHAR_MAX + (infinite_range_int)1;

char unsigned c = x;
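The pseudocode can be turned into a small C sketch by letting long long stand in for the imaginary infinite_range_int. The names sim_to_unsigned and sim_matches_real are made up for illustration. The max parameter plays the role of UCHAR_MAX, so a 16-bit byte can be modelled on an ordinary 8-bit machine. This only models the quoted rule. It is not a replacement for the real conversion.

```c
#include <limits.h>

/* Hypothetical helper: models "repeatedly add or subtract max + 1 until the
   value lies in [0, max]". long long stands in for infinite_range_int, so
   value and max must stay well inside its range. */
static long long sim_to_unsigned(long long value, long long max)
{
    long long const modulus = max + 1;

    while (0 > value || max < value) {
        if (value < 0)
            value += modulus;
        else
            value -= modulus;
    }
    return value;
}

/* Hypothetical check: compares the model with the real int -> unsigned char
   conversion done by this implementation. Returns 1 if they agree. */
static int sim_matches_real(int v)
{
    char unsigned const real = (char unsigned)v;

    return sim_to_unsigned(v, UCHAR_MAX) == real;
}
```

For example, sim_to_unsigned(-1, 65535) takes one pass through the loop and gives 65535, and sim_to_unsigned(-2, 65535) gives 65534. Both agree with the reasoning that follows.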

So on our own system, this is:

while (0 > x || 65535 < x) x += 65536;

Clearly, if x = -1, then it only takes one iteration of the loop to
yield 65535, i.e. UCHAR_MAX.

Therefore, if we want UCHAR_MAX-1, then we'd use (int)-2.
For UCHAR_MAX-2, we'd use (int)-3.

The entire set of data looks something like:

int       char unsigned
-1        65535
-2        65534
-3        65533
-4        65532
-5        65531
-6        65530
-7        65529
-8        65528
-9        65527
-10       65526
-11       65525
-12       65524
....
....
-32764    32772
-32765    32771
-32766    32770
-32767    32769
-32768    32768   <--
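The table follows a closed form. To reach a target byte value u above INT_MAX, the int has to be u - (UCHAR_MAX + 1). The sketch below computes this for a hypothetical system described only by its limits. The function name and parameters are assumptions and do not come from <limits.h>. It also shows where sign-magnitude runs out: u == 32768 would need -32768, which is below INT_MIN == -32767.

```c
/* Hypothetical helper: finds the int value that converts to the byte value u
   on a system with the given limits. Returns 1 and stores the value in *out
   if such an int exists, 0 otherwise. long long is assumed wide enough to
   hold every quantity involved. */
static int int_for_byte(long long u, long long uchar_max,
                        long long int_min, long long int_max,
                        long long *out)
{
    /* Small values are passed through; large ones wrap to a negative int. */
    long long const v = (u <= int_max) ? u : u - (uchar_max + 1);

    if (v < int_min)
        return 0; /* no int maps to u on this system */
    *out = v;
    return 1;
}
```

With the sample system's limits (65535, -32767, 32767), 65000 maps to -536 and 32768 has no int at all. Switching int_min to -32768 (two's complement) makes 32768 reachable.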

Now I've just realised a problem. An unsigned char can store 65536
different combinations (i.e. 0 through 65535), but an int can only
store 65535 different combinations (i.e. -32767 through 32767) if we're
using something other than two's complement. I don't know what I'll do
about that, but for now I'll try to continue with two's complement, the
only one of the three number systems that doesn't have this problem:

#if NUMBER_SYSTEM == TWOS_COMPLEMENT

#define UC_AS_INT(x) /* Whatever we're going to do */

#endif

My first thought is something like:

#define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )

#define UC_AS_INT_Internal(x) ( x > INT_MAX \
? -(int)(UCHAR_MAX - x) - 1 \
: (int)x )
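You can gain some confidence in UC_AS_INT_Internal without a 16-bit-char machine. One way is to replay its arithmetic with uint16_t standing in for unsigned char and int16_t standing in for int. int16_t is required to be two's complement, so this only models the two's complement case. The names below are hypothetical. The sketch also assumes an implementation that provides the optional exact-width types.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a CHAR_BIT == 16, two's complement target. */
#define SIM_UCHAR_MAX 65535L
#define SIM_INT_MAX   32767L

/* Same arithmetic as UC_AS_INT_Internal, done in long to avoid overflow,
   then narrowed to the simulated int type. */
static int16_t sim_uc_as_int(uint16_t x)
{
    long const v = (x > SIM_INT_MAX) ? -(SIM_UCHAR_MAX - x) - 1 : (long)x;

    return (int16_t)v; /* v always lies in [-32768, 32767] here */
}

/* Checks that every simulated byte value survives the round trip
   byte -> int -> byte, i.e. what memset would store. */
static int sim_round_trip_all(void)
{
    long u;

    for (u = 0; u <= SIM_UCHAR_MAX; ++u) {
        uint16_t const back = (uint16_t)sim_uc_as_int((uint16_t)u);
        if (back != (uint16_t)u)
            return 0;
    }
    return 1;
}
```

The x == 32768 case is the one that needs -32768. On a system where INT_MIN == -32767, that value does not exist, which is why the macro only suits two's complement.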

Anyway, it's Friday and I've stuff to do, but if anyone wants to finish
it off then feel free! :)

If we can't get all 65536 combinations out of one's complement or
sign-magnitude, then we can just have a macro that changes it to:

char unsigned *p = data;
char unsigned const *const pover = data + sizeof data;
while (pover != p) *p++ = c;

Martin

pete

Martin said:
I'm trying to come up with a fully-portable macro for supplying memset
with an unsigned char rather than an int. I'm going to think out loud
as I go along. . .

I'll take a sample system before I begin:

CHAR_BIT == 16
sizeof(short) == sizeof(int) == 1
Assume none of the integer types have padding bits
Sign-magnitude

Therefore we have:

UCHAR_MAX == 65535
INT_MIN == -32767
INT_MAX == 32767

Let's say we have an array of bytes and we want to set every byte to
65000. We CANNOT use:

memset(data, 65000, sizeof data);

because the conversion from unsigned integer types to signed integer
types "is implementation-defined or an implementation-defined signal
is raised" if the number is out of range.

Whether or not you can set an unsigned char to 65000
is implementation defined,
so there's nothing wrong
with an implementation defined way of doing it.

Martin Wells

pete:
Whether or not you can set an unsigned char to 65000
is implementation defined,
so there's nothing wrong
with an implementation defined way of doing it.

The reason I mentioned concrete figures like 65535 instead of
UCHAR_MAX is that I think people find it easier to understand and
grasp.

The point wasn't whether we could assign 65000 to an int, but rather
whether we could assign (UCHAR_MAX - some_small_number) to an int and
have the same results on every implementation conceivable.

For clarity, I'll rewrite my original post taking out the concrete
numbers. Remember again, that the code is being written in the context
of it being FULLY portable (e.g. 97-bit chars and sign-magnitude):


Let's say we have an array of bytes and we want to set every byte to
(UCHAR_MAX - 4). We CANNOT use:


memset(data, UCHAR_MAX - 4, sizeof data);


because the conversion from unsigned integer types to signed integer
types "is implementation-defined or an implementation-defined signal
is raised" if the number is out of range. (So in the context of fully
portable programming, the resultant int could have pretty much any
value because UCHAR_MAX might be bigger than INT_MAX).

Therefore we need to supply memset with an int value, which, when
converted to unsigned char, will yield the value we want.

The rules for converting from signed to unsigned are as follows:

| If the new type is unsigned, the value is converted
| by repeatedly adding or subtracting one more than
| the maximum value that can be represented in the
| new type until the value is in the range of the new type.

The addition method is easier to understand so we'll go with that one.
If we start off with a negative number like -1, then here's what will
happen:


char unsigned c = -1;


is equal to:


infinite_range_int x = -1; /* Let's pretend we have a signed
int type that can hold any number */


while (0 > x || UCHAR_MAX < x) x += UCHAR_MAX + (infinite_range_int)1;


char unsigned c = x;


So here are a few samples of what will happen on different systems:


while (0 > x || 255 < x) x += 256;
while (0 > x || 65535 < x) x += 65536;
while (0 > x || 4294967295 < x) x += 4294967296;
while (0 > x || 18446744073709551615 < x) x += 18446744073709551616;

If x = -1, then it only takes one iteration of the loop to
yield UCHAR_MAX on any implementation.

Therefore, if we want UCHAR_MAX-1, then we'd use (int)-2.
For UCHAR_MAX-2, we'd use (int)-3.


The entire set of data looks something like:


int       char unsigned
-1        UCHAR_MAX
-2        UCHAR_MAX-1
-3        UCHAR_MAX-2
-4        UCHAR_MAX-3
-5        UCHAR_MAX-4
-6        UCHAR_MAX-5
-7        UCHAR_MAX-6
-8        UCHAR_MAX-7
-9        UCHAR_MAX-8
-10       UCHAR_MAX-9
-11       UCHAR_MAX-10
-12       UCHAR_MAX-11
....
....


Now I've just realised a problem. Imagine a system where unsigned char
has the range 0 through 65535 and where int has -32767 through 32767.
The former has 65536 possible combinations while the latter only has
65535 combinations. We might have to resort to a loop if working with
something other than two's complement, but I'm not sure yet.

Anyway, here's the code I have at the moment; I robbed some of it from
old posts of yours, pete:

#define SIGNMAG 0
#define ONES 1
#define TWOS 2

#if -1 & 3 == 1
#define NUM_SYS SIGNMAG
#elif -1 & 3 == 2
#define NUM_SYS ONES
#else
#define NUM_SYS TWOS
#endif


#if NUM_SYS != TWOS /* ----------- */

#include <stddef.h>

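static void *uc_memset(void *const pv,char unsigned const val,size_t const len)
{
/* unsigned char, not plain char: storing val in a signed char could be an
   out-of-range conversion */
char unsigned *p = pv;
char unsigned const *const pover = p + len;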

while (pover != p) *p++ = val;

return pv;
}

#define UC_MEMSET(p,uc,len) (uc_memset(p,uc,len))

#else /* ------------ */

#include <string.h>

#define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )

#define UC_AS_INT_Internal(x) ( x > INT_MAX \
? -(int)(UCHAR_MAX - x) - 1 \
: (int)x )

#define UC_MEMSET(p,uc,len) (memset((p),UC_AS_INT((uc)),(len)))

#endif /* ----------- */



#include <limits.h>

int main(void)
{
char unsigned data[24];

UC_MEMSET(data, UCHAR_MAX, sizeof data);

return 0;
}

Feel free to make alterations if you see a better way of doing it!

Martin

Charlie Gordon

"Martin Wells" wrote in message (e-mail address removed)...
Now I've just realised a problem. Imagine a system where unsigned char
has the range 0 through 65535 and where int has -32767 through 32767.
The former has 65536 possible combinations while the latter only has
65535 combinations. We might have to resort to a loop if working with
something other than two's complement, but I'm not sure yet.

For this and other similar reasons, it would be difficult if not impossible
to implement a fully conformant hosted C environment on an architecture with
non twos-complement representation and sizeof(int) == 1 at the same time.

Luckily, non twos-complement architectures can only be found in museums
today.
Anyway here's the code I have at the moment, I robbed some of it from
old posts of yours pete:

#define SIGNMAG 0
#define ONES 1
#define TWOS 2

#if -1 & 3 == 1
#define NUM_SYS SIGNMAG
#elif -1 & 3 == 2
#define NUM_SYS ONES
#else
#define NUM_SYS TWOS
#endif

These tests are incorrect for two reasons:

* ``-1 & 3 == 1'' is interpreted as ``-1 & (3 == 1)'' which yields 0 for
all platforms.

* There is no guarantee that the preprocessing be performed with the same
representation as the target architecture. As a matter of fact, embedded
targets with unusual arithmetics are often targeted by cross compilers
running on different machines.
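With the parentheses the test was presumably meant to have, the low two bits of -1 do distinguish the three systems. They are 3 for two's complement, 2 for ones' complement (...110) and 1 for sign-magnitude (100...001). This sketch is only an illustration. Per the second point above, an #if result may still not describe the target's int, so the same check is also evaluated at run time. BUGGY_TEST and num_sys_runtime are hypothetical names.

```c
#define SIGNMAG 0
#define ONES    1
#define TWOS    2

/* Parenthesised version of the original test: & binds less tightly than ==. */
#if (-1 & 3) == 1
#define NUM_SYS SIGNMAG
#elif (-1 & 3) == 2
#define NUM_SYS ONES
#else
#define NUM_SYS TWOS
#endif

/* What the original "-1 & 3 == 1" actually computes: -1 & 0, always 0. */
#define BUGGY_TEST (-1 & (3 == 1))

/* Hypothetical run-time counterpart: inspects the low bits of an int -1
   as the compiler proper sees it, rather than the preprocessor. */
static int num_sys_runtime(void)
{
    int const low_bits = -1 & 3;

    if (low_bits == 1)
        return SIGNMAG;
    if (low_bits == 2)
        return ONES;
    return TWOS;
}
```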

It is a sad fact that integer representation cannot be adequately tested at
the preprocessing stages. sizeof(int) == 1 cannot be evaluated by the
preprocessor.

One can only test the macros from <limits.h>:

#if INT_MIN == -INT_MAX
/* we are targeting a non twos-complement architecture */
# if INT_MAX < UCHAR_MAX
/* Houston, we have a problem! */
# define MEMSET_IS_INADEQUATE 1
# endif
# define NUM_SYS ONES_OR_SIGNMAG
#else
# define NUM_SYS TWOS
#endif

Martin Wells

Chqrlie:
For this and other similar reasons, it would be difficult if not impossible
to implement a fully conformant hosted C environment on an architecture with
non twos-complement representation and sizeof(int) == 1 at the same time.

Luckily, non twos-complement architectures can only be found in museums
today.


Unless it's prevented by the "laws of mathematics" or something like
that, I allow for every possibility when writing portable code. (A
little ridiculous at times, I admit, but hey I don't make a sacrifice
unless it's a sacrifice worth making).

These tests are incorrect for two reasons:

* ``-1 & 3 == 1'' is interpreted as ``-1 & (3 == 1)'' which yields 0 for
all platforms.

Wups.


* There is no guarantee that the preprocessing be performed with the same
representation as the target architecture. As a matter of fact, embedded
targets with unusual arithmetics are often targeted by cross compilers
running on different machines.


Now I may be mistaken, but I think the requirement with C99 is that
the preprocessor int types be the same as the actual C int types
(including their use of number systems). Not sure if this applies to
C89.

It is a sad fact that integer representation cannot be adequately tested at
the preprocessing stages. sizeof(int) == 1 cannot be evaluated by the
preprocessor.

One can only test the macros from <limits.h>:

#if INT_MIN == -INT_MAX
/* we are targeting a non twos-complement architecture */
# if INT_MAX < UCHAR_MAX
/* Houston, we have a problem! */
# define MEMSET_IS_INADEQUATE 1
# endif
# define NUM_SYS ONES_OR_SIGNMAG
#else
# define NUM_SYS TWOS
#endif


Great idea! What about the following then:

#include <limits.h>

#if INT_MAX >= UCHAR_MAX

/* Normal memset will work just fine */

# include <string.h>

# define UC_MEMSET(p,uc,len) (memset((p),(char unsigned)(uc),(len)))

#elif INT_MIN != -INT_MAX

/* We've got two's complement, we can still use memset */

# include <string.h>

# define UC_AS_INT_Internal(x) ( x > INT_MAX \
? -(int)(UCHAR_MAX - x) - 1 \
: (int)x )

# define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )

# define UC_MEMSET(p,uc,len) (memset((p),UC_AS_INT((uc)),(len)))

#else

/* int hasn't got enough unique value combinations, we can't use
memset :( */

# include <stddef.h>

static void *uc_memset(void *const pv,char unsigned const val,size_t const len)
{
/* unsigned char, not plain char: storing val in a signed char could be an
   out-of-range conversion */
char unsigned *p = pv;
char unsigned const *const pover = p + len;

while (pover != p) *p++ = val;

return pv;
}

# define UC_MEMSET(p,uc,len) (uc_memset(p,uc,len))

#endif


int main(void)
{
char unsigned data[24];


UC_MEMSET(data, UCHAR_MAX, sizeof data);


return 0;
}


Martin

pete

#elif INT_MIN != -INT_MAX

/* We've got two's complement, we can still use memset */

The preprocessor directive is correct, but the comment is wrong.
What really matters is whether or not INT_MIN equals -INT_MAX.
INT_MIN is allowed to equal -INT_MAX on
implementations that use two's complement.

Harald van Dĳk

The preprocessor directive is correct, but the comment is wrong. What
really matters is whether or not INT_MIN equals -INT_MAX. INT_MIN is
allowed to equal -INT_MAX on implementations that use two's complement.

Right, but INT_MIN is not allowed to differ from -INT_MAX on
implementations that don't use two's complement. So if the #elif block is
entered, you know you're dealing with two's complement. That info is not
actually useful, for the reason you stated, but it's not wrong either.

Army1987

I'm trying to come up with a fully-portable macro for supplying memset
with an unsigned char rather than an int. I'm going to think out loud
as I go along. . .

If you want to set every byte of an object to a value (other than
0 or a character constant in the basic character set), you know
what that does to that object. And since that depends on the
implementation, why do you want to do it fully-portably?

Charlie Gordon

Martin Wells said:
Chqrlie:



Unless it's prevented by the "laws of mathematics" or something like
that, I allow for every possibility when writing portable code. (A
little ridiculous at times, I admit, but hey I don't make a sacrifice
unless it's a sacrifice worth making).

Well, there are more important battles to be fought than this one.
Now I may be mistaken, but I think the requirement with C99 is that
the preprocessor int types be the same as the actual C int types
(including their use of number systems). Not sure if this applies to
C89.

Chapter and verse?

6.10.1p4 says that, for the purpose of evaluating preprocessing constant
expressions (#if / #elif), preprocessing numbers act as if they have the same
representation as intmax_t (or uintmax_t for unsigned variants). They leave
it implementation-defined whether character constants convert to the same
numeric value in preprocessing constant expressions and in actual compilation.
Could it be possible that intmax_t uses twos-complement and int uses
sign/magnitude?

I think the Standard is not precise enough on this issue, and I don't even
have a copy of C89 to check if it applies there.
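This doesn't settle the representation question. It does show one observable consequence of the C99 rule: #if arithmetic is carried out in intmax_t rather than int, so the preprocessor can compute values that would overflow an int. Here is a small sketch, assuming a C99 implementation where INT_MAX is smaller than INTMAX_MAX. The macro name PP_WIDER_THAN_INT is made up.

```c
#include <limits.h>
#include <stdint.h>

/* In #if, INT_MAX + 1 is computed in intmax_t, so it does not overflow
   when intmax_t is wider than int. The same expression in ordinary code
   would overflow int (undefined behaviour), so it only appears here.
   The inner #if sits in a skipped group, and is therefore not evaluated,
   when int is as wide as intmax_t. */
#if INT_MAX < INTMAX_MAX
#  if INT_MAX + 1 > INT_MAX
#    define PP_WIDER_THAN_INT 1
#  else
#    define PP_WIDER_THAN_INT 0
#  endif
#else
#  define PP_WIDER_THAN_INT 0
#endif
```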

As for your ultimate proposal, I am still analysing it, but I don't think
you can refer to unsigned char as ``char unsigned''

Martin Wells

Army1987:
If you want to set every byte of an object to a value (other than
0 or a character constant in the basic character set), you know
what that does to that object. And since that depends on the
implementation, why do you want to do it fully-portably?

I'm writing portable code for an embedded system. The microcontroller
will output a byte value via ports consisting of individual pins which
will be either 5 volts or 0 volts to indicate binary 1 or 0. I want to
be easily able to set all ports to a given pattern (e.g. all zeros,
all ones, alternating ones and zeros, two zeros then a one, etc.).

Of course, the code that actually sets the pins values will be
microcontroller, library and compiler specific, but there's no reason
to deportify the guts of the program.

Martin

Martin Wells

Chqrlie:
As for your ultimate proposal, I am still analysing it, but I don't think
you can refer to unsigned char as ``char unsigned''


6.7.2:

Constraints
2 At least one type specifier shall be given in the declaration
specifiers in each declaration, and in the specifier-qualifier list in
each struct declaration and type name. Each list of type specifiers
shall be one of the following sets (delimited by commas, when there is
more than one set on a line); the type specifiers may occur in any
order, possibly intermixed with the other declaration specifiers.
- void
- char
- signed char
- unsigned char
- short, signed short, short int, or signed short int
- unsigned short, or unsigned short int
- int, signed, or signed int
- unsigned, or unsigned int
- long, signed long, long int, or signed long int
- unsigned long, or unsigned long int
- long long, signed long long, long long int, or signed long long int
- unsigned long long, or unsigned long long int
- float
- double
- long double
- _Bool
- float _Complex
- double _Complex
- long double _Complex
- struct or union specifier
- enum specifier
- typedef name


Martin

Martin Wells

Chqrlie:
6.7.2:

Constraints
2 At least one type specifier shall be given in the declaration
specifiers in each declaration, and in the specifier-qualifier list in
each struct declaration and type name. Each list of type specifiers
shall be one of the following sets (delimited by commas, when there is
more than one set on a line); the type specifiers may occur in any
order, possibly intermixed with the other declaration specifiers.
- void
- char
- signed char
- unsigned char
- short, signed short, short int, or signed short int
- unsigned short, or unsigned short int
- int, signed, or signed int
- unsigned, or unsigned int
- long, signed long, long int, or signed long int
- unsigned long, or unsigned long int
- long long, signed long long, long long int, or signed long long int
- unsigned long long, or unsigned long long int
- float
- double
- long double
- _Bool
- float _Complex
- double _Complex
- long double _Complex
- struct or union specifier
- enum specifier
- typedef name


Wups I probably should have read the list before I posted it :O

I know I've seen plenty of code that has "short unsigned", "long
unsigned", "char unsigned" in it, but I don't know what the Standard
has to say about it.

Martin

Harald van Dĳk


Charlie Gordon did not quote the below paragraph. You did.
6.7.2:

Constraints
2 At least one type specifier shall be given in the declaration
specifiers in each declaration, and in the specifier-qualifier list in
each struct declaration and type name. Each list of type specifiers
shall be one of the following sets (delimited by commas, when there is
more than one set on a line); the type specifiers may occur in any
order, possibly intermixed with the other declaration specifiers. -
[...]
Wups I probably should have read the list before I posted it :O

I know I've seen plenty of code that has "short unsigned", "long
unsigned", "char unsigned" in it, but I don't know what the Standard has
to say about it.

You quoted "the type specifiers may occur in any order, possibly
intermixed with the other declaration specifiers." I think that's pretty
explicit.

Martin Wells

Harald:
You quoted "the type specifiers may occur in any order, possibly
intermixed with the other declaration specifiers." I think that's pretty
explicit.

If you look at the list I posted though, you'll see that they have
"unsigned char" all together, rather than "char" and "unsigned" listed
separately.

I'm not trying to argue against "char unsigned" (after all, I use it
myself), but still I haven't read anything yet in the Standard that
solidly confirms that it's OK.

Martin

Harald van Dĳk

Harald:


If you look at the list I posted though, you'll see that they have
"unsigned char" all together, rather than "char" and "unsigned" listed
separately.

char is a type specifier. unsigned is a type specifier. unsigned char is
not a type specifier, it is two, which may occur in any order.
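Because each specifier is separate and order doesn't matter, "char unsigned" and "unsigned char" name the same type. The same goes for "int long unsigned" and "unsigned long". A quick way to confirm this is with a C11 compiler. That is an assumption here: _Generic and _Static_assert did not exist in C89 or C99. IS_UCHAR, IS_ULONG and sample_byte are hypothetical names.

```c
/* Each _Generic selection picks the branch whose type matches the
   controlling expression exactly; 1 means "same type". */
#define IS_UCHAR(e) _Generic((e), unsigned char: 1, default: 0)
#define IS_ULONG(e) _Generic((e), unsigned long: 1, default: 0)

_Static_assert(IS_UCHAR((char unsigned)0), "char unsigned is unsigned char");
_Static_assert(IS_ULONG((long unsigned int)0), "long unsigned int is unsigned long");
_Static_assert(IS_ULONG((int long unsigned)0), "int long unsigned is unsigned long");

/* Qualifiers and storage-class specifiers may be intermixed as well. */
static char const unsigned sample_byte = 42;
```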

Charlie Gordon

Martin Wells said:
Army1987:


I'm writing portable code for an embedded system. The microcontroller
will output a byte value via ports consisting of individual pins which
will be either 5 volts or 0 volts to indicate binary 1 or 0. I want to
be easily able to set all ports to a given pattern (e.g. all zeros,
all ones, alternating ones and zeros, two zeros then a one, etc.).

Of course, the code that actually sets the pins values will be
micrcontroller, library and compiler specific, but there's no reason
to deportify the guts of the program.

For the specific cases all bits 0 and all bits 1, the solution is simple:

memset(array, 0, sizeof array); /* all bits 0 */
memset(array, -1, sizeof array); /* all bits 1 */
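-1 works for "all bits 1" because memset converts its int argument to unsigned char. By the conversion rule quoted earlier in the thread, -1 becomes UCHAR_MAX on every implementation, and UCHAR_MAX has every bit of an unsigned char set. Here is a small check (fill_and_check is a hypothetical helper):

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills buf with memset(buf, c, len) and reports
   whether every byte ended up equal to expected. */
static int fill_and_check(char unsigned *buf, size_t len, int c,
                          char unsigned expected)
{
    size_t i;

    memset(buf, c, len);
    for (i = 0; i != len; ++i)
        if (buf[i] != expected)
            return 0;
    return 1;
}
```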

For arbitrary bit patterns, it may not be possible with memset on
architectures with non twos-complement arithmetics and sizeof(int) == 1.
But discussing these is a form of mental masturbation as they do not exist
in the real world. Most regulars here indulge in it almost daily, but only
in forums like this one, not in production code. Obfuscating calls to
memset to ensure portability to the DS9K is exactly that: obfuscation. It
makes your program harder to write, harder to read, more prone to bugs.

Martin Wells

Chqrlie:
For the specific cases all bits 0 and all bits 1, the solution is simple:

memset(array, 0, sizeof array); /* all bits 0 */
memset(array, -1, sizeof array); /* all bits 1 */

For arbitrary bit patterns, it may not be possible with memset on
architectures with non twos-complement arithmetics and sizeof(int) == 1.


The UC_MEMSET macro takes care of that by calling a function which has
a loop. If you ask me though, the C89 Standard is broken in that it
doesn't provide a UC_MEMSET itself. But then again, it makes more fun
for us to patch over the broken stuff :D

But discussing these is a form of mental masturbation as they do not exist
in the real world. Most regulars here indulge in it almost daily, but only
in forums like this one, not in production code.


Yes I can agree that if time is money, you're not going to be very
productive by accommodating sign-magnitude machines, but it still is a
bit of fun to make your code 100% portable to a certain standard. I'm
doing an embedded systems project at the moment, and most people would
start off non-portable and keep getting more and more non-portable.
Instead I've decided to go the portable route... and it's going well
so far :D

Obfuscating calls to
memset to ensure portability to the DS9K is exactly that: obfuscation. It
makes your program harder to write, harder to read, more prone to bugs.


Not if you hide the funky stuff in header files:

#include "broken_int_uc_fixes.h"

int main(void)
{
char unsigned whatever[24];

UC_MEMSET(whatever,UCHAR_MAX,sizeof whatever);

return 0;
}

Martin

Richard

Charlie Gordon said:
For the specific cases all bits 0 and all bits 1, the solution is simple:

memset(array, 0, sizeof array); /* all bits 0 */
memset(array, -1, sizeof array); /* all bits 1 */

For arbitrary bit patterns, it may not be possible with memset on
architectures with non twos-complement arithmetics and sizeof(int) == 1.
But discussing these is a form of mental masturbation as they do not exist
in the real world. Most regulars here indulge in it almost daily, but only
in forums like this one, not in production code. Obfuscating calls to
memset to ensure portability to the DS9K is exactly that: obfuscation. It
makes your program harder to write, harder to read, more prone to
bugs.

Well said.