Read hex string to a buf


cppbeginner

Hello,

Newbie question: I have a string of nibbles (say 2b11cc4f5db20cd5)
that can be put as

#define DATA 0x2b11cc4f5db20cd5 or
#define DATA "2b11cc4f5db20cd5"

Need to read this into a uint8 array buf such that each byte
represents a hex digit (formed out of 2 nibbles in DATA). What do you
think is the best way to do this?

Thanks for your help.
 

Ian Collins

Hello,

Newbie question: I have a string of nibbles (say 2b11cc4f5db20cd5)
that can be put as

#define DATA 0x2b11cc4f5db20cd5 or
#define DATA "2b11cc4f5db20cd5"

Need to read this into a uint8 array buf such that each byte
represents a hex digit (formed out of 2 nibbles in DATA). What do you
think is the best way to do this?
Unless your system has an integer type that can hold 0x2b11cc4f5db20cd5
or whatever the biggest value you have to process is, the string is your
only viable option.
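
A minimal sketch of that string-based route (not from the thread), using sscanf with a field width of 2 so each pair of hex digits becomes one byte; the "hh" length modifier needs C99, and the input is assumed to be well-formed hex of even length:

#include <stdio.h>
#include <string.h>

#define DATA "2b11cc4f5db20cd5"

int main(void)
{
        unsigned char buf[sizeof DATA / 2];     /* one byte per two hex digits */
        size_t i, n = strlen(DATA) / 2;

        for (i = 0; i < n; i++) {
                if (sscanf(DATA + 2 * i, "%2hhx", &buf[i]) != 1)
                        return 1;               /* not a valid pair of hex digits */
        }

        for (i = 0; i < n; i++)
                printf("buf[%lu] == 0x%02x\n", (unsigned long)i, (unsigned)buf[i]);
        return 0;
}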
 

Christopher Layne

Hello,

Newbie question: I have a string of nibbles (say 2b11cc4f5db20cd5)
that can be put as

#define DATA 0x2b11cc4f5db20cd5 or
#define DATA "2b11cc4f5db20cd5"

Need to read this into a uint8 array buf such that each byte
represents a hex digit (formed out of 2 nibbles in DATA). What do you
think is the best way to do this?

Thanks for your help.

I can think of about a 100 different ways to do it depending on how goofy you
want to get with strtol, snprintf, etc. If you don't want to use any library
functions for the actual hex work:

/*
* deadbeef.c: goofy code to convert hex bytes to unsigned char
* input should be an even number of characters or you'll fry your cpu cache.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
unsigned char lookup[2][256];

int main(int argc, char **argv)
{
size_t i, l, n;
char *q;
unsigned char *p;

if (argc < 2) return -1;
if ((l = strlen((q = argv[1]))) < minl || (l & 0x01))
return -1;

for (i = sizeof set; i--; ) {
lookup[0][(size_t)set[0][i]] = i;
lookup[0][(size_t)set[1][i]] = i;
lookup[1][i] = set[0][i];
}

if (memcmp(q, delim, minl) == 0) {
q += minl; l -= minl;
}

if ((p = malloc(sizeof *p * l / 2)) == NULL)
return -1;

for (i = l; i; ) {
n = lookup[0][(size_t)q[--i]] << nibl * 0;
n += lookup[0][(size_t)q[--i]] << nibl * 1;
p[i / 2] = n;
}

for (l /= 2; i < l; i++) {
n = p[i];
fprintf(stdout, "p[%2.2u] == 0x%c%c %3.3u\n",
i,
lookup[1][(n & 0xf0) >> nibl * 1],
lookup[1][(n & 0x0f) >> nibl * 0],
n);
}

free(p);

return 0;
}

$ cc -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
$ ./hex2dec 0xdeadbeef00aBadCafe00decafBAD000123456789abcdeffedcba9876543210
p[00] == 0xde 222
p[01] == 0xad 173
p[02] == 0xbe 190
p[03] == 0xef 239
p[04] == 0x00 000
p[05] == 0xab 171
p[06] == 0xad 173
p[07] == 0xca 202
p[08] == 0xfe 254
p[09] == 0x00 000
p[10] == 0xde 222
p[11] == 0xca 202
p[12] == 0xfb 251
p[13] == 0xad 173
p[14] == 0x00 000
p[15] == 0x01 001
p[16] == 0x23 035
p[17] == 0x45 069
p[18] == 0x67 103
p[19] == 0x89 137
p[20] == 0xab 171
p[21] == 0xcd 205
p[22] == 0xef 239
p[23] == 0xfe 254
p[24] == 0xdc 220
p[25] == 0xba 186
p[26] == 0x98 152
p[27] == 0x76 118
p[28] == 0x54 084
p[29] == 0x32 050
p[30] == 0x10 016
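
For comparison, a sketch of the strtol-on-two-character-chunks route Christopher alludes to above; it assumes well-formed input and does not bother with a "0x" prefix:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
        const char *q;
        unsigned char *p;
        size_t i, l;

        if (argc < 2) return EXIT_FAILURE;
        q = argv[1];
        l = strlen(q);
        if (l < 2 || (l & 1)) return EXIT_FAILURE;

        if ((p = malloc(l / 2)) == NULL) return EXIT_FAILURE;

        for (i = 0; i < l; i += 2) {
                char pair[3] = { q[i], q[i + 1], '\0' };
                char *end;

                p[i / 2] = (unsigned char)strtol(pair, &end, 16);
                if (*end != '\0') {             /* pair wasn't two hex digits */
                        free(p);
                        return EXIT_FAILURE;
                }
        }

        for (i = 0; i < l / 2; i++)
                printf("p[%lu] == 0x%02x\n", (unsigned long)i, (unsigned)p[i]);

        free(p);
        return EXIT_SUCCESS;
}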
 

Keith Thompson

Christopher Layne said:
I can think of about a 100 different ways to do it depending on how goofy you
want to get with strtol, snprintf, etc. If you don't want to use any library
functions for the actual hex work:

/*
* deadbeef.c: goofy code to convert hex bytes to unsigned char
* input should be an even number of characters or you'll fry your cpu cache.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };

You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".
unsigned char lookup[2][256];

int main(int argc, char **argv)
{
size_t i, l, n;
char *q;
unsigned char *p;

if (argc < 2) return -1;
if ((l = strlen((q = argv[1]))) < minl || (l & 0x01))
return -1;

for (i = sizeof set; i--; ) {
lookup[0][(size_t)set[0][i]] = i;
lookup[0][(size_t)set[1][i]] = i;

[...]

These casts (like most casts) are unnecessary. An array index can be
of any integer type.

(I haven't really looked at most of your program; a couple of things
just jumped out at me.)
 

Christopher Layne

Keith said:
The <unistd.h> header is non-portable, and your program doesn't use it
anyway.
Habit.
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };

You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}
These casts (like most casts) are unnecessary. An array index can be
of any integer type.

$ cc -g3 -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
cc1: warnings being treated as errors
hex2dec.c: In function 'main':
hex2dec.c:25: warning: array subscript has type 'char'
hex2dec.c:26: warning: array subscript has type 'char'
hex2dec.c:38: warning: array subscript has type 'char'
hex2dec.c:39: warning: array subscript has type 'char'
(I haven't really looked at most of your program; a couple of things
just jumped out at me.)

Good ole comp.lang.c. Quick to point out any error. Care to add anything else?
 

Keith Thompson

Christopher Layne said:
Keith Thompson wrote: [...]
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };

You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}

It doesn't make me either happy or unhappy.
$ cc -g3 -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
cc1: warnings being treated as errors
hex2dec.c: In function 'main':
hex2dec.c:25: warning: array subscript has type 'char'
hex2dec.c:26: warning: array subscript has type 'char'
hex2dec.c:38: warning: array subscript has type 'char'
hex2dec.c:39: warning: array subscript has type 'char'

Whoops, I should have paid more attention; I didn't realize the
subscripts were of type char (which can be signed on some
implementations). The values you're using happen to be guaranteed to
be positive, but the compiler understandably didn't figure that out.

I'm not sure size_t is the best type to convert to, but it's not
unreasonable. Or you might declare "set" as an array of unsigned
char.
Good ole comp.lang.c. Quick to point out any error. Care to add
anything else?

Not at the moment. (In my opinion, being "quick to point out any
error" is a good thing; do you disagree?)
 

Kenny McCormack

Christopher Layne said:
Keith Thompson wrote: [...]
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };

You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}

It doesn't make me either happy or unhappy.

There's a rumor floating around that KT was happy. Once.
Sometime back in the mid 60s. Nobody's sure what caused it.
Many don't believe it ever happened.
 

Flash Gordon

Christopher Layne wrote, On 31/01/07 08:01:
Keith said:
The <unistd.h> header is non-portable, and your program doesn't use it
anyway.
Habit.
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}

Consistency of operation is important.
$ cc -g3 -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
cc1: warnings being treated as errors
hex2dec.c: In function 'main':
hex2dec.c:25: warning: array subscript has type 'char'
hex2dec.c:26: warning: array subscript has type 'char'
hex2dec.c:38: warning: array subscript has type 'char'
hex2dec.c:39: warning: array subscript has type 'char'

So? Compilers are allowed to warn about perfectly good code. In my case,
for serious work, I have that warning disabled.
Good ole comp.lang.c. Quick to point out any error. Care to add anything else?

Maybe.
 

Flash Gordon

Christopher Layne wrote, On 31/01/07 04:32:
I can think of about a 100 different ways to do it depending on how goofy you
want to get with strtol, snprintf, etc. If you don't want to use any library
functions for the actual hex work:

/*
* deadbeef.c: goofy code to convert hex bytes to unsigned char
* input should be an even number of characters or you'll fry your cpu cache.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

Another poster pointed out that unistd.h is non-standard and not required.
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
unsigned char lookup[2][256];

Why are these globals? Not critical in a standalone program, but when it
gets put in to a library as this obviously would it starts becoming
annoying.
int main(int argc, char **argv)
{
size_t i, l, n;
char *q;
unsigned char *p;

if (argc < 2) return -1;
if ((l = strlen((q = argv[1]))) < minl || (l & 0x01))
return -1;

Non-standard return from main. If there are multiple different failure
codes there is reason for non-standard values, but that is not the case
here.
return EXIT_FAILURE;
for (i = sizeof set; i--; ) {

This you corrected in a subsequent post, but just in case that gets missed:
for (i = sizeof set[0]; i--; ) {
I would for clarity use:
for (i = sizeof set[0]; i > 0; i--) {
lookup[0][(size_t)set[0][i]] = i;
lookup[0][(size_t)set[1][i]] = i;
lookup[1][i] = set[0][i];
}

if (memcmp(q, delim, minl) == 0) {


This you changed to make it insensitive to case in response to another
poster.
q += minl; l -= minl;
}

if ((p = malloc(sizeof *p * l / 2)) == NULL)
return -1;

Again,
return EXIT_FAILURE;
for (i = l; i; ) {
n = lookup[0][(size_t)q[--i]] << nibl * 0;
n += lookup[0][(size_t)q[--i]] << nibl * 1;

If char is signed then yuck. Make q an unsigned char pointer and get rid
of the cast here. Admittedly that means you will need a cast further up,
but only the one.
p[i / 2] = n;

As a matter of style I do not see the point of n here.
}

for (l /= 2; i < l; i++) {
n = p[i];
fprintf(stdout, "p[%2.2u] == 0x%c%c %3.3u\n",
i,
lookup[1][(n & 0xf0) >> nibl * 1],
lookup[1][(n & 0x0f) >> nibl * 0],
n);


As a matter of style I do not see the point of n here either.
}

free(p);

return 0;
}

<snip>

Well, that seems to work, although it does have an undocumented
assumption of an 8 bit char and I think it could misbehave on a 1s
complement or sign-magnitude system. Fine for anything the OP is likely
to come across unless getting in to embedded systems though where you do
get 16, 24 and even 32 bit chars.
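
Pulling together the suggestions above (an unsigned char pointer taken from argv[1] with a single cast, EXIT_FAILURE on error, the corrected sizeof set[0] loop), a revised sketch of the table-driven version; it still assumes an 8-bit char and an even number of hex digits:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const unsigned char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
static unsigned char lookup[256];

int main(int argc, char **argv)
{
        size_t i, l;
        unsigned char *q, *p;

        if (argc < 2) return EXIT_FAILURE;
        q = (unsigned char *)argv[1];           /* the one remaining cast */
        l = strlen(argv[1]);
        if (l < 2 || (l & 1)) return EXIT_FAILURE;

        for (i = sizeof set[0]; i--; ) {
                lookup[set[0][i]] = i;
                lookup[set[1][i]] = i;
        }

        if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
                q += 2; l -= 2;
        }

        if ((p = malloc(l / 2)) == NULL) return EXIT_FAILURE;

        /* Note: like the original, non-hex digits are not rejected here. */
        for (i = 0; i < l; i += 2)
                p[i / 2] = (lookup[q[i]] << 4) | lookup[q[i + 1]];

        for (i = 0; i < l / 2; i++)
                printf("p[%lu] == 0x%02x\n", (unsigned long)i, (unsigned)p[i]);

        free(p);
        return EXIT_SUCCESS;
}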
 

Christopher Layne

Keith said:
Whoops, I should have paid more attention; I didn't realize the
subscripts were of type char (which can be signed on some
implementations). The values you're using happen to be guaranteed to
be positive, but the compiler understandably didn't figure that out.

I'm not sure size_t is the best type to convert to, but it's not
unreasonable. Or you might declare "set" as an array of unsigned
char.

Yep. It's of type char as I'm using it as a simple character hash. Anyways, I
agree that just using an array of unsigned char rather than char is cleaner
and achieves the same goal in the end. Unfortunately, it's a 'choose your
battle' as I'm using the same technique further down and changing 'q' to
unsigned char means it's now time to fight with casting argv[1]. But this
is only 1 cast vs 4, so comparatively a lesser deal.

In reference to size_t, I've always used this in situations related to array
indexing as I have always been under the impression size_t is guaranteed
to be able to represent an index value. Of course, things wouldn't go so well
if one of the characters within that above set were actually a negative
value - then again, casting to int wouldn't help either.

(gdb) p /d (size_t)(char)129
$1 = 4294967169
(gdb) p /d (int)(char)129
$2 = -127

Although I did have direct control over the characters I used in the set,
still a better decision to use unsigned char, agreed.
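
The same comparison as a small C program (the printed values assume an implementation where plain char is signed and 8 bits wide):

#include <stdio.h>

int main(void)
{
        char c = (char)0x81;    /* implementation-defined result where char is signed */

        printf("as size_t:        %lu\n", (unsigned long)(size_t)c);
        printf("as int:           %d\n", (int)c);
        printf("as unsigned char: %u\n", (unsigned)(unsigned char)c);
        return 0;
}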
 

Christopher Layne

Flash said:
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
unsigned char lookup[2][256];

Why are these globals? Not critical in a standalone program, but when it
gets put in to a library as this obviously would it starts becoming
annoying.

Non-standard return from main. If there are multiple different failure
codes there is reason for non-standard values, but that is not the case
here.
return EXIT_FAILURE;
Alrighty.

This you corrected in a subsequent post, but just in case that gets missed:
for (i = sizeof set[0]; i--; ) {
I would for clarity use:
for (i = sizeof set[0]; i > 0; i--) {

Unfortunately that is wrong and not the same effect as what I wrote.
p[i / 2] = n;

As a matter of style I do not see the point of n here.

Most likely because I was choosing to stick with an integer for integer
operations and then assign the result to the 1 byte character value. This
could be argued to be needless pre-opt and that n is superfluous.
Well, that seems to work, although it does have an undocumented
assumption of an 8 bit char and I think it could misbehave on a 1s
complement or sign-magnitude system. Fine for anything the OP is likely
to come across unless getting in to embedded systems though where you do
get 16, 24 and even 32 bit chars.

Yep. Definitely assumes an 8-bit char. What's your entirely portable
solution? :)
 

CBFalconer

Christopher said:
Keith Thompson wrote:
.... snip ...


$ cc -g3 -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
cc1: warnings being treated as errors
hex2dec.c: In function 'main':
hex2dec.c:25: warning: array subscript has type 'char'
hex2dec.c:26: warning: array subscript has type 'char'
hex2dec.c:38: warning: array subscript has type 'char'
hex2dec.c:39: warning: array subscript has type 'char'

Yup. It's talking about set, and pointing out your error. Try
defining set as either signed or unsigned char. As usual, a cast
is probably an error.
Good ole comp.lang.c. Quick to point out any error. Care to add
anything else?

Yup. Wrong specification for printing an uncast size_t in your
printf statement. Here you have a real reason for using a cast, at
least in C90.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 

Christopher Layne

CBFalconer said:
Yup. It's talking about set, and pointing out your error. Try
defining set as either signed or unsigned char. As usual, a cast
is probably an error.

Hah, you guys. We've hashed the angles about 4 different ways :).

Either way, something will be cast. If it's not the array indices, it's
strlen() and/or assignment from argv[1] to q.
Yup. Wrong specification for printing an uncast size_t in your
printf statement. Here you have a real reason for using a cast, at
least in C90.

You're right on that one - and I usually cast to unsigned long and use "%lu",
it just so coincidentally happened that I didn't use "%lu", and on my
host "%u" results in no warnings.
 

CBFalconer

Christopher said:
CBFalconer said:
Yup. It's talking about set, and pointing out your error. Try
defining set as either signed or unsigned char. As usual, a cast
is probably an error.

Hah, you guys. We've hashed the angles about 4 different ways :).

Either way, something will be cast. If it's not the array indices, it's
strlen() and/or assignment from argv[1] to q.
Yup. Wrong specification for printing an uncast size_t in your
printf statement. Here you have a real reason for using a cast,
at least in C90.

You're right on that one - and I usually cast to unsigned long and
use "%lu", it just so coincidentally happened that I didn't use
"%lu", and on my host "%u" results in no warnings.

Note that AFAICS all your errors are due to casts, the misuse of.
 

Richard

Flash Gordon said:
Christopher Layne wrote, On 31/01/07 08:01:
Keith said:
The <unistd.h> header is non-portable, and your program doesn't use it
anyway.
Habit.

const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}

Consistency of operation is important.

That is a program design issue and is therefore off topic in this
NG. blah blah blah.
 

Flash Gordon

Christopher Layne wrote, On 31/01/07 10:17:
Flash said:
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
unsigned char lookup[2][256];
Why are these globals? Not critical in a standalone program, but when it
gets put in to a library as this obviously would it starts becoming
annoying.

This is not library code. This is "<OP> how do I do X?" Here's some X example
code on how to do it.

It isn't now, but it is most likely that in any real use it would become
a library function (for some value of library).

This you corrected in a subsequent post, but just in case that gets missed:
for (i = sizeof set[0]; i--; ) {
I would for clarity use:
for (i = sizeof set[0]; i > 0; i--) {

Unfortunately that is wrong and not the same effect as what I wrote.

True, I did not fully engage my brain.
p[i / 2] = n;
As a matter of style I do not see the point of n here.

Most likely because I was choosing to stick with an integer for integer
operations and then assign the result to the 1 byte character value. This
could be argued to be needless pre-opt and that n is superfluous.

It would have been done as int arithmetic anyway. The type of the
variable you are assigning an expression to has no effect on how the
expression is evaluated, only on what (if any) conversion occurs during
the assignment.

As I said, I consider it a style issue nothing more. Had I not had other
comments I would not have mentioned it.
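
A quick demonstration of Flash's point: the addition is carried out in int, and the destination type only matters when the result is stored:

#include <stdio.h>

int main(void)
{
        unsigned char a = 200, b = 100;
        unsigned char c = a + b;        /* a + b is evaluated as int (300),
                                           then converted on assignment */

        printf("a + b evaluated: %d\n", a + b);                 /* 300 */
        printf("stored in unsigned char: %u\n", (unsigned)c);   /* 300 % 256 == 44 with 8-bit char */
        return 0;
}
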
Yep. Definitely assumes an 8-bit char. What's your entirely portable
solution? :)

Grab some code from a trusted source ;-)

Actually, I would want a tighter specification for what was required on
systems with larger char types. How many bits do you want packed in each
unsigned char, especially if its size is not a multiple of 4 bits?
 

Christopher Layne

Richard said:
**** off _again_, Kenny.

Richard

Haha, but I thought it was funny.

A lot of you guys don't like Kenny - but a lot of times Kenny is also right.
 
