Read hex string to a buf


cppbeginner

Hello,

Newbie question: I have a string of nibbles (say 2b11cc4f5db20cd5)
that can be put as

#define DATA 0x2b11cc4f5db20cd5 or
#define DATA "2b11cc4f5db20cd5"

Need to read this into a uint8 array buf such that each byte
represents a hex digit (formed out of 2 nibbles in DATA). What do you
think is the best way to do this?

Thanks for your help.
 

Ian Collins

Hello,

Newbie question: I have a string of nibbles (say 2b11cc4f5db20cd5)
that can be put as

#define DATA 0x2b11cc4f5db20cd5 or
#define DATA "2b11cc4f5db20cd5"

Need to read this into a uint8 array buf such that each byte
represents a hex digit (formed out of 2 nibbles in DATA). What do you
think is the best way to do this?
Unless your system has an integer type that can hold 0x2b11cc4f5db20cd5
or whatever the biggest value you have to process is, the string is your
only viable option.
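
A minimal sketch of that string-based route (not from the thread), using sscanf with a field width of 2 so each pair of hex digits becomes one byte; the "hh" length modifier needs C99, and the input is assumed to be well-formed hex of even length:

#include <stdio.h>
#include <string.h>

#define DATA "2b11cc4f5db20cd5"

int main(void)
{
        unsigned char buf[sizeof DATA / 2];     /* one byte per two hex digits */
        size_t i, n = strlen(DATA) / 2;

        for (i = 0; i < n; i++) {
                if (sscanf(DATA + 2 * i, "%2hhx", &buf[i]) != 1)
                        return 1;               /* not a valid pair of hex digits */
        }

        for (i = 0; i < n; i++)
                printf("buf[%lu] == 0x%02x\n", (unsigned long)i, (unsigned)buf[i]);
        return 0;
}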
 

Christopher Layne

Hello,

Newbie question: I have a string of nibbles (say 2b11cc4f5db20cd5)
that can be put as

#define DATA 0x2b11cc4f5db20cd5 or
#define DATA "2b11cc4f5db20cd5"

Need to read this into a uint8 array buf such that each byte
represents a hex digit (formed out of 2 nibbles in DATA). What do you
think is the best way to do this?

Thanks for your help.

I can think of about a 100 different ways to do it depending on how goofy you
want to get with strtol, snprintf, etc. If you don't want to use any library
functions for the actual hex work:

/*
* deadbeef.c: goofy code to convert hex bytes to unsigned char
* input should be an even number of characters or you'll fry your cpu cache.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
unsigned char lookup[2][256];

int main(int argc, char **argv)
{
size_t i, l, n;
char *q;
unsigned char *p;

if (argc < 2) return -1;
if ((l = strlen((q = argv[1]))) < minl || (l & 0x01))
return -1;

for (i = sizeof set; i--; ) {
lookup[0][(size_t)set[0][i]] = i;
lookup[0][(size_t)set[1][i]] = i;
lookup[1][i] = set[0][i];
}

if (memcmp(q, delim, minl) == 0) {
q += minl; l -= minl;
}

if ((p = malloc(sizeof *p * l / 2)) == NULL)
return -1;

for (i = l; i; ) {
n = lookup[0][(size_t)q[--i]] << nibl * 0;
n += lookup[0][(size_t)q[--i]] << nibl * 1;
p[i / 2] = n;
}

for (l /= 2; i < l; i++) {
n = p[i];
fprintf(stdout, "p[%2.2u] == 0x%c%c %3.3u\n",
i,
lookup[1][(n & 0xf0) >> nibl * 1],
lookup[1][(n & 0x0f) >> nibl * 0],
n);
}

free(p);

return 0;
}

$ cc -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
$ ./hex2dec 0xdeadbeef00aBadCafe00decafBAD000123456789abcdeffedcba9876543210
p[00] == 0xde 222
p[01] == 0xad 173
p[02] == 0xbe 190
p[03] == 0xef 239
p[04] == 0x00 000
p[05] == 0xab 171
p[06] == 0xad 173
p[07] == 0xca 202
p[08] == 0xfe 254
p[09] == 0x00 000
p[10] == 0xde 222
p[11] == 0xca 202
p[12] == 0xfb 251
p[13] == 0xad 173
p[14] == 0x00 000
p[15] == 0x01 001
p[16] == 0x23 035
p[17] == 0x45 069
p[18] == 0x67 103
p[19] == 0x89 137
p[20] == 0xab 171
p[21] == 0xcd 205
p[22] == 0xef 239
p[23] == 0xfe 254
p[24] == 0xdc 220
p[25] == 0xba 186
p[26] == 0x98 152
p[27] == 0x76 118
p[28] == 0x54 084
p[29] == 0x32 050
p[30] == 0x10 016
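
For comparison, a sketch of the strtol-on-two-character-chunks route Christopher alludes to above; it assumes well-formed input and does not bother with a "0x" prefix:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
        const char *q;
        unsigned char *p;
        size_t i, l;

        if (argc < 2) return EXIT_FAILURE;
        q = argv[1];
        l = strlen(q);
        if (l < 2 || (l & 1)) return EXIT_FAILURE;

        if ((p = malloc(l / 2)) == NULL) return EXIT_FAILURE;

        for (i = 0; i < l; i += 2) {
                char pair[3] = { q[i], q[i + 1], '\0' };
                char *end;

                p[i / 2] = (unsigned char)strtol(pair, &end, 16);
                if (*end != '\0') {             /* pair wasn't two hex digits */
                        free(p);
                        return EXIT_FAILURE;
                }
        }

        for (i = 0; i < l / 2; i++)
                printf("p[%lu] == 0x%02x\n", (unsigned long)i, (unsigned)p[i]);

        free(p);
        return EXIT_SUCCESS;
}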
 

Keith Thompson

Christopher Layne said:
I can think of about a 100 different ways to do it depending on how goofy you
want to get with strtol, snprintf, etc. If you don't want to use any library
functions for the actual hex work:

/*
* deadbeef.c: goofy code to convert hex bytes to unsigned char
* input should be an even number of characters or you'll fry your cpu cache.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };

You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".
unsigned char lookup[2][256];

int main(int argc, char **argv)
{
size_t i, l, n;
char *q;
unsigned char *p;

if (argc < 2) return -1;
if ((l = strlen((q = argv[1]))) < minl || (l & 0x01))
return -1;

for (i = sizeof set; i--; ) {
lookup[0][(size_t)set[0][i]] = i;
lookup[0][(size_t)set[1][i]] = i;

[...]

These casts (like most casts) are unnecessary. An array index can be
of any integer type.

(I haven't really looked at most of your program; a couple of things
just jumped out at me.)
 

Christopher Layne

Keith said:
The <unistd.h> header is non-portable, and your program doesn't use it
anyway.
Habit.
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };

You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}
These casts (like most casts) are unnecessary. An array index can be
of any integer type.

$ cc -g3 -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
cc1: warnings being treated as errors
hex2dec.c: In function 'main':
hex2dec.c:25: warning: array subscript has type 'char'
hex2dec.c:26: warning: array subscript has type 'char'
hex2dec.c:38: warning: array subscript has type 'char'
hex2dec.c:39: warning: array subscript has type 'char'
(I haven't really looked at most of your program; a couple of things
just jumped out at me.)

Good ole comp.lang.c. Quick to point out any error. Care to add anything else?
 

Keith Thompson

Christopher Layne said:
Keith Thompson wrote: [...]
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };

You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}

It doesn't make me either happy or unhappy.
$ cc -g3 -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
cc1: warnings being treated as errors
hex2dec.c: In function 'main':
hex2dec.c:25: warning: array subscript has type 'char'
hex2dec.c:26: warning: array subscript has type 'char'
hex2dec.c:38: warning: array subscript has type 'char'
hex2dec.c:39: warning: array subscript has type 'char'

Whoops, I should have paid more attention; I didn't realize the
subscripts were of type char (which can be signed on some
implementations). The values you're using happen to be guaranteed to
be positive, but the compiler understandably didn't figure that out.

I'm not sure size_t is the best type to convert to, but it's not
unreasonable. Or you might declare "set" as an array of unsigned
char.
Good ole comp.lang.c. Quick to point out any error. Care to add
anything else?

Not at the moment. (In my opinion, being "quick to point out any
error" is a good thing; do you disagree?)
 

Kenny McCormack

Christopher Layne said:
Keith Thompson wrote: [...]
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };

You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}

It doesn't make me either happy or unhappy.

There's a rumor floating around that KT was happy. Once.
Sometime back in the mid 60s. Nobody's sure what caused it.
Many don't believe it ever happened.
 

Flash Gordon

Christopher Layne wrote, On 31/01/07 08:01:
Keith said:
The <unistd.h> header is non-portable, and your program doesn't use it
anyway.
Habit.
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}

Consistency of operation is important.
$ cc -g3 -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
cc1: warnings being treated as errors
hex2dec.c: In function 'main':
hex2dec.c:25: warning: array subscript has type 'char'
hex2dec.c:26: warning: array subscript has type 'char'
hex2dec.c:38: warning: array subscript has type 'char'
hex2dec.c:39: warning: array subscript has type 'char'

So? Compilers are allowed to warn about perfectly good code. In my case,
for serious work, I have that warning disabled.
Good ole comp.lang.c. Quick to point out any error. Care to add anything else?

Maybe.
 

Flash Gordon

Christopher Layne wrote, On 31/01/07 04:32:
I can think of about a 100 different ways to do it depending on how goofy you
want to get with strtol, snprintf, etc. If you don't want to use any library
functions for the actual hex work:

/*
* deadbeef.c: goofy code to convert hex bytes to unsigned char
* input should be an even number of characters or you'll fry your cpu cache.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

Another poster pointed out that unistd.h is non-standard and not required.
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
unsigned char lookup[2][256];

Why are these globals? Not critical in a standalone program, but when it
gets put in to a library as this obviously would it starts becoming
annoying.
int main(int argc, char **argv)
{
size_t i, l, n;
char *q;
unsigned char *p;

if (argc < 2) return -1;
if ((l = strlen((q = argv[1]))) < minl || (l & 0x01))
return -1;

Non-standard return from main. If there are multiple different failure
codes there is reason for non-standard values, but that is not the case
here.
return EXIT_FAILURE;
for (i = sizeof set; i--; ) {

This you corrected in a subsequent post, but just in case that gets missed:
for (i = sizeof set[0]; i--; ) {
I would for clarity use:
for (i = sizeof set[0]; i > 0; i--) {
lookup[0][(size_t)set[0][i]] = i;
lookup[0][(size_t)set[1][i]] = i;
lookup[1][i] = set[0][i];
}

if (memcmp(q, delim, minl) == 0) {


This you changed to make it insensitive to case in response to another
poster.
q += minl; l -= minl;
}

if ((p = malloc(sizeof *p * l / 2)) == NULL)
return -1;

Again,
return EXIT_FAILURE;
for (i = l; i; ) {
n = lookup[0][(size_t)q[--i]] << nibl * 0;
n += lookup[0][(size_t)q[--i]] << nibl * 1;

If char is signed then yuck. Make q an unsigned char pointer and get rid
of the cast here. Admittedly that means you will need a cast further up,
but only the one.
p[i / 2] = n;

As a matter of style I do not see the point of n here.
}

for (l /= 2; i < l; i++) {
n = p[i];
fprintf(stdout, "p[%2.2u] == 0x%c%c %3.3u\n",
i,
lookup[1][(n & 0xf0) >> nibl * 1],
lookup[1][(n & 0x0f) >> nibl * 0],
n);


As a matter of style I do not see the point of n here either.
}

free(p);

return 0;
}

<snip>

Well, that seems to work, although it does have an undocumented
assumption of an 8 bit char and I think it could misbehave on a 1s
complement or sign-magnitude system. Fine for anything the OP is likely
to come across unless getting in to embedded systems though where you do
get 16, 24 and even 32 bit chars.
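
Pulling together the suggestions above (an unsigned char pointer taken from argv[1] with a single cast, EXIT_FAILURE on error, the corrected sizeof set[0] loop), a revised sketch of the table-driven version; it still assumes an 8-bit char and an even number of hex digits:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const unsigned char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
static unsigned char lookup[256];

int main(int argc, char **argv)
{
        size_t i, l;
        unsigned char *q, *p;

        if (argc < 2) return EXIT_FAILURE;
        q = (unsigned char *)argv[1];           /* the one remaining cast */
        l = strlen(argv[1]);
        if (l < 2 || (l & 1)) return EXIT_FAILURE;

        for (i = sizeof set[0]; i--; ) {
                lookup[set[0][i]] = i;
                lookup[set[1][i]] = i;
        }

        if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
                q += 2; l -= 2;
        }

        if ((p = malloc(l / 2)) == NULL) return EXIT_FAILURE;

        /* Note: like the original, non-hex digits are not rejected here. */
        for (i = 0; i < l; i += 2)
                p[i / 2] = (lookup[q[i]] << 4) | lookup[q[i + 1]];

        for (i = 0; i < l / 2; i++)
                printf("p[%lu] == 0x%02x\n", (unsigned long)i, (unsigned)p[i]);

        free(p);
        return EXIT_SUCCESS;
}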
 

Christopher Layne

Keith said:
Whoops, I should have paid more attention; I didn't realize the
subscripts were of type char (which can be signed on some
implementations). The values you're using happen to be guaranteed to
be positive, but the compiler understandably didn't figure that out.

I'm not sure size_t is the best type to convert to, but it's not
unreasonable. Or you might declare "set" as an array of unsigned
char.

Yep. It's of type char as I'm using it as a simple character hash. Anyways, I
agree that just using an array of unsigned char rather than char is cleaner
and achieves the same goal in the end. Unfortunately, it's a 'choose your
battle' as I'm using the same technique further down and changing 'q' to
unsigned char means it's now time to fight with casting argv[1]. But this
is only 1 cast vs 4, so comparatively a lesser deal.

In reference to size_t, I've always used this in situations related to array
indexing as I have always been under the impression size_t is guaranteed
to be able to represent an index value. Of course, things wouldn't go so well
if one of the characters within that above set were actually a negative
value - then again, casting to int wouldn't help either.

(gdb) p /d (size_t)(char)129
$1 = 4294967169
(gdb) p /d (int)(char)129
$2 = -127

Although I did have direct control over the characters I used in the set,
still a better decision to use unsigned char, agreed.
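
The same comparison as a small C program (the printed values assume an implementation where plain char is signed and 8 bits wide):

#include <stdio.h>

int main(void)
{
        char c = (char)0x81;    /* implementation-defined result where char is signed */

        printf("as size_t:        %lu\n", (unsigned long)(size_t)c);
        printf("as int:           %d\n", (int)c);
        printf("as unsigned char: %u\n", (unsigned)(unsigned char)c);
        return 0;
}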
 

Christopher Layne

Flash said:
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
unsigned char lookup[2][256];

Why are these globals? Not critical in a standalone program, but when it
gets put in to a library as this obviously would it starts becoming
annoying.

Non-standard return from main. If there are multiple different failure
codes there is reason for non-standard values, but that is not the case
here.
return EXIT_FAILURE;
Alrighty.

This you corrected in a subsequent post, but just in case that gets missed:
for (i = sizeof set[0]; i--; ) {
I would for clarity use:
for (i = sizeof set[0]; i > 0; i--) {

Unfortunately that is wrong and not the same effect as what I wrote.
p[i / 2] = n;

As a matter of style I do not see the point of n here.

Most likely because I was choosing to stick with an integer for integer
operations and then assign the result to the 1 byte character value. This
could be argued to be needless pre-opt and that n is superfluous.
Well, that seems to work, although it does have an undocumented
assumption of an 8 bit char and I think it could misbehave on a 1s
complement or sign-magnitude system. Fine for anything the OP is likely
to come across unless getting in to embedded systems though where you do
get 16, 24 and even 32 bit chars.

Yep. Definitely assumes an 8-bit char. What's your entirely portable
solution? :)
 

CBFalconer

Christopher said:
Keith Thompson wrote:
.... snip ...


$ cc -g3 -O3 -W -Wall -Werror -pedantic -o hex2dec hex2dec.c
cc1: warnings being treated as errors
hex2dec.c: In function 'main':
hex2dec.c:25: warning: array subscript has type 'char'
hex2dec.c:26: warning: array subscript has type 'char'
hex2dec.c:38: warning: array subscript has type 'char'
hex2dec.c:39: warning: array subscript has type 'char'

Yup. It's talking about set, and pointing out your error. Try
defining set as either signed or unsigned char. As usual, a cast
is probably an error.
Good ole comp.lang.c. Quick to point out any error. Care to add
anything else?

Yup. Wrong specification for printing an uncast size_t in your
printf statement. Here you have a real reason for using a cast, at
least in C90.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
 

Christopher Layne

CBFalconer said:
Yup. It's talking about set, and pointing out your error. Try
defining set as either signed or unsigned char. As usual, a cast
is probably an error.

Hah, you guys. We've hashed the angles about 4 different ways :).

Either way, something will be cast. If it's not the array indices, it's
strlen() and/or assignment from argv[1] to q.
Yup. Wrong specification for printing an uncast size_t in your
printf statement. Here you have a real reason for using a cast, at
least in C90.

You're right on that one - and I usually cast to unsigned long and use "%lu",
it just so coincidentally happened that I didn't use "%lu", and on my
host "%u" results in no warnings.
 

CBFalconer

Christopher said:
CBFalconer said:
Yup. It's talking about set, and pointing out your error. Try
defining set as either signed or unsigned char. As usual, a cast
is probably an error.

Hah, you guys. We've hashed the angles about 4 different ways :).

Either way, something will be cast. If it's not the array indices, it's
strlen() and/or assignment from argv[1] to q.
Yup. Wrong specification for printing an uncast size_t in your
printf statement. Here you have a real reason for using a cast,
at least in C90.

You're right on that one - and I usually cast to unsigned long and
use "%lu", it just so coincidentally happened that I didn't use
"%lu", and on my host "%u" results in no warnings.

Note that AFAICS all your errors are due to casts, the misuse of.
 

Richard

Flash Gordon said:
Christopher Layne wrote, On 31/01/07 08:01:
Keith said:
The <unistd.h> header is non-portable, and your program doesn't use it
anyway.
Habit.

const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
You accept hexadecimal digits in either upper or lower case, but you
don't do the same for "0x" vs. "0X".

Sigh.

/* Make Keith Thompson Happy */
if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X')) {
q += minl; l -= minl;
}

Consistency of operation is important.

That is a program design issue and is therefore off topic in this
NG. blah blah blah.
 

Flash Gordon

Christopher Layne wrote, On 31/01/07 10:17:
Flash said:
const size_t minl = 2;
const size_t nibl = 4;
const char delim[2] = "0x";
const char set[2][16] = { "0123456789abcdef", "0123456789ABCDEF" };
unsigned char lookup[2][256];
Why are these globals? Not critical in a standalone program, but when it
gets put in to a library as this obviously would it starts becoming
annoying.

This is not library code. This is "<OP> how do I do X?" Here's some X example
code on how to do it.

It isn't now, but it is most likely that in any real use it would become
a library function (for some value of library).

This you corrected in a subsequent post, but just in case that gets missed:
for (i = sizeof set[0]; i--; ) {
I would for clarity use:
for (i = sizeof set[0]; i > 0; i--) {

Unfortunately that is wrong and not the same effect as what I wrote.

True, I did not fully engage my brain.
p[i / 2] = n;
As a matter of style I do not see the point of n here.

Most likely because I was choosing to stick with an integer for integer
operations and then assign the result to the 1 byte character value. This
could be argued to be needless pre-opt and that n is superfluous.

It would have been done as int arithmetic anyway. The type of the
variable you are assigning an expression to has no effect on how the
expression is evaluated, only on what (if any) conversion occurs during
the assignment.

As I said, I consider it a style issue nothing more. Had I not had other
comments I would not have mentioned it.
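
A quick demonstration of Flash's point: the addition is carried out in int, and the destination type only matters when the result is stored:

#include <stdio.h>

int main(void)
{
        unsigned char a = 200, b = 100;
        unsigned char c = a + b;        /* a + b is evaluated as int (300),
                                           then converted on assignment */

        printf("a + b evaluated: %d\n", a + b);                 /* 300 */
        printf("stored in unsigned char: %u\n", (unsigned)c);   /* 300 % 256 == 44 with 8-bit char */
        return 0;
}
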
Yep. Definitely assumes an 8-bit char. What's your entirely portable
solution? :)

Grab some code from a trusted source ;-)

Actually, I would want a tighter specification for what was required on
systems with larger char types. How many bits do you want packed in each
unsigned char, especially if its size is not a multiple of 4 bits?
 

Christopher Layne

Richard said:
**** off _again_, Kenny.

Richard

Haha, but I thought it was funny.

A lot of you guys don't like Kenny - but a lot of times Kenny is also right.
 
