Convert HEX string to bin

The Real OS/2 Guy · Dec 1, 2003

Instead of all the complexity, and assuming that the OP will not
use the strto*() family for some reason, the digit conversions can
be done by:

#include <string.h>

/* Convert hex char to value, -1 for non-hex char */
int unhexify(char c)
{
    static const char hexchars[] = "0123456789abcdefABCDEF";
    char *p;
    int val;

    val = -1; /* default assume non-hex */
    if ((p = strchr(hexchars, c))) {
        val = p - hexchars;
        if (val > 15) val = val - 6;
    }
    return val;
} /* unhexify, untested */

Which I believe to be fully portable, including char coding. Now
the OP can convert his (terminated) input string with:

Yes, but the OP asked for a method that saves as much runtime as
possible, and a table lookup is relatively slow. So more complex code
is justified if it saves runtime.

char *p;
unsigned int v;
int d;

.....
p = &instring[0];
v = 0;
while ((d = unhexify(*p)) >= 0) {
    v = 16 * v + d;
    ++p;
}
/* desired result in v */

Calling a function for each char is another brake. I had not used
pointers because I'm not sure that the OP is ready to understand
pointer arithmetic yet, and any good compiler should generate the
same code anyway.

Besides that, the separate ++p can be another brake, as many real CPUs
can do *p++ in a single instruction instead of *p and, some time
later, p += 1.

Sidney Cadot · Dec 1, 2003

GIR said:
Okay here's a little explanation and background info.

This is for a little project which somebody dumped on my desk. The
idea is to have an 8051 derivative (Infineon 80c515A) process audio at
8-bit mono 8 kHz.

There is a preprocessor which converts the binary wavefile to hex and
strips the header/footer. Don't ask me why, it wasn't my idea. In my
opinion it would be much simpler if the preprocessor would just dump
everything to the serial port; now we go from bin=>hex - serial -
hex=>bin. The preprocessor is out of my range. In the docs it's
specified that the preprocessor will check the stream for errors and
such, so I don't have to worry about that.

But as I understand it, the preprocessor is sitting on the other side of
the serial line. So what should happen if an 'A' is converted to a '@' ?
It only takes one bit-fault.

What happens when the microcontroller receives the data over the
serial port? First of all an interrupt is generated, the data on the
serial line is placed in SBUF (a special function register) and the
processor jumps to the interrupt vector depending on its priority.

Why the 32k length of the buffer? I don't know why... The thing has
64k of memory so I just took half of that, remember this is just
version 0.000000001a. I was thinking of building a dynamic list, but
that seriously cuts in on the available memory. For instance:

struct mem_byte {
char mem;
char *next_byte;
};

So I need a byte for storing the info and I need a byte for storing
the address of the next byte. That's double...

Make that triple. Since you're working with 16-bit addressable memory,
the pointer will be at least 2 chars. And then there is the heap manager
overhead. It's a good thing you dumped this idea.

Anywayz, I'll just run over to the guyz who did the preprocessor
and "ask" (read: yell and order) them if it isn't better for them to
just dump binary data on the line.

Most certainly. Yell and order them to include a CRC as well, every 256
bytes or so, for starters. Make sure you get a quantitative handle on the
number of bit-faults you get, and think of how to handle them.
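
A minimal sketch of what such a per-block check could look like. The
thread does not say which CRC to use; the polynomial (0x1021), the
initial value (0xFFFF) and the name crc16_ccitt are assumptions chosen
for illustration. The sender would append the 16-bit result to each
block and the receiver would recompute and compare it.

```c
#include <stddef.h>

/*
 * Bitwise CRC-16 (polynomial 0x1021, initial value 0xFFFF, no final XOR).
 * These parameters are an assumption, not something the thread specifies.
 * Small and table-free, which suits a memory-constrained target.
 */
unsigned int crc16_ccitt(const unsigned char *data, size_t len)
{
    unsigned int crc = 0xFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; ++i) {
        /* Fold the next byte into the top 8 bits of the register. */
        crc ^= (unsigned int)data[i] << 8;
        for (bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000u)
                crc = ((crc << 1) ^ 0x1021u) & 0xFFFFu;
            else
                crc = (crc << 1) & 0xFFFFu;
        }
    }
    return crc;
}
```

With this, the 'A' to '@' single-bit fault mentioned above yields a
different CRC, so the receiver can at least detect and drop the block.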

Anywayz, tnx for your help. You guyz got a nice little group going
here

I'm quite sure you would be better off coding this little gadget in
assembly. It wouldn't be too hard.

Best regards,

Sidney

glen herrmannsfeldt · Dec 2, 2003

The Real OS/2 Guy wrote:

(snip)

The standard guarantees that '0' to '9' are contiguous, but there is no
guarantee that 'a' - 'f' or 'A' - 'F' have the same contiguity, so using
a switch for them makes it portable.
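
A minimal sketch of the approach described here: arithmetic for the
decimal digits, which the standard guarantees are contiguous, and a
switch for the letters, which it does not. The function name
hex_digit_value is hypothetical.

```c
/* Returns 0..15 for a hexadecimal digit, -1 for anything else. */
int hex_digit_value(int c)
{
    /* '0'..'9' are guaranteed to be contiguous and ascending. */
    if (c >= '0' && c <= '9')
        return c - '0';

    /* No such guarantee exists for letters, so name each one. */
    switch (c) {
    case 'a': case 'A': return 10;
    case 'b': case 'B': return 11;
    case 'c': case 'C': return 12;
    case 'd': case 'D': return 13;
    case 'e': case 'E': return 14;
    case 'f': case 'F': return 15;
    default:            return -1;
    }
}
```

This makes no assumption about the character set beyond what the
standard promises, and it also rejects the null character.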

Yes, the standard doesn't guarantee that.

It is, however, true in both ASCII and EBCDIC.

Do you know anyone using a different character set?

-- glen

Richard Heathfield · Dec 2, 2003

glen said:
Yes, the standard doesn't guarantee that.

It is, however, true in both ASCII and EBCDIC.

Do you know anyone using a different character set?

That isn't the right criterion to apply here. Rather, we should ask
ourselves whether there is anyone we /don't/ know who is using a different
character set, and whether we want our code to work on their machine as
well.

CBFalconer · Dec 2, 2003

The Real OS/2 Guy said:
Yes, but the OP asked for a method that saves as much runtime as
possible, and a table lookup is relatively slow. So more complex
code is justified if it saves runtime.

Possibly, but try it first. strchr may be very efficient.

char *p;
unsigned int v;
int d;

.....
p = &instring[0];
v = 0;
while ((d = unhexify(*p)) >= 0) {
    v = 16 * v + d;
    ++p;
}
/* desired result in v */

Calling a function for each char is another brake. I had not used
pointers because I'm not sure that the OP is ready to understand
pointer arithmetic yet, and any good compiler should generate the
same code anyway.

Besides that, the separate ++p can be another brake, as many real CPUs
can do *p++ in a single instruction instead of *p and, some time
later, p += 1.

That combination is a trap. It would allow the pointer to be
advanced past the known area of instring, which may or may not be
legitimate.

I rather doubt that speed is any great consideration for the OP -
he wants to minimize code size. I suspect that these will be used
in i/o, and thus limited by the i/o rates anyhow.

CBFalconer · Dec 2, 2003

glen said:
The Real OS/2 Guy wrote:

(snip)

Yes, the standard doesn't guarantee that.

It is, however, true in both ASCII and EBCDIC.

Do you know anyone using a different character set?

Yes

The Real OS/2 Guy · Dec 2, 2003

Okay here's a little explanation and background info.

This is for a little project which somebody dumped on my desk. The
idea is to have an 8051 derivative (Infineon 80c515A) process audio at
8-bit mono 8 kHz.

OK so far. But do you get an unordered stream of data, or is there
some more logic behind it?

Maybe you can get the stream organized into telegrams. That means you
first get the size (one or two bytes) and then the data block from
outside. Yes, if they can send you the data in binary, it saves both
ends some time (converting nibbles to hex chars and back) and reduces
the number of bytes to transfer.
Is the line trusted? If not, packing a CRC into each telegram (with
the possibility of correcting single-bit or multi-bit errors) would
make an untrusted line (nearly) trusted.

Maybe you should simply receive note by note, so your transfer
buffer can shrink significantly. Every byte you save gives you more
freedom to use the limited memory for other things.

Why the 32k length of the buffer? I don't know why... The thing has
64k of memory so I just took half of that, remember this is just
version 0.000000001a. I was thinking of building a dynamic list, but
that seriously cuts in on the available memory. For instance:

If it is possible to break the stream into small telegrams, you may
use a circular buffer, so that you work on one telegram while the
next ones are being received.
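
A minimal sketch of such a circular buffer, assuming a single producer
(for example the serial interrupt routine) and a single consumer. The
names ring, ring_put and ring_get and the size RING_SIZE are
hypothetical; on a real 8051 the indices would likely need to be
volatile and sized so that updates are atomic.

```c
#define RING_SIZE 256u  /* hypothetical capacity; one slot stays unused */

struct ring {
    unsigned char data[RING_SIZE];
    unsigned int head;  /* index of the next slot to write */
    unsigned int tail;  /* index of the next slot to read */
};

/* Store one byte; returns 1 on success, 0 if the buffer is full. */
int ring_put(struct ring *r, unsigned char byte)
{
    unsigned int next = (r->head + 1u) % RING_SIZE;

    if (next == r->tail)
        return 0;               /* full: writing would overrun the reader */
    r->data[r->head] = byte;
    r->head = next;
    return 1;
}

/* Fetch one byte; returns 1 on success, 0 if the buffer is empty. */
int ring_get(struct ring *r, unsigned char *byte)
{
    if (r->tail == r->head)
        return 0;               /* empty */
    *byte = r->data[r->tail];
    r->tail = (r->tail + 1u) % RING_SIZE;
    return 1;
}
```

Keeping one slot unused is a common way to tell "full" from "empty"
without a separate counter shared between producer and consumer.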

The Real OS/2 Guy · Dec 2, 2003

The Real OS/2 Guy wrote:

(snip)

Yes, the standard doesn't guarantee that.

It is, however, true in both ASCII and EBCDIC.

Do you know anyone using a different character set?

I don't know every kind of computer that has a C implementation. There
are too many different kinds of processors in the world. So programming
in a manner that requires something the standard does not guarantee is
erroneous whenever it is possible to do it in an ANSI-compliant way.

Dan Pop · Dec 2, 2003

In said:
The OP is of course presuming ascii... c-'0'-7 to get 'A' to 10 is the clue.

The trivial, assumption free approach is to define the hex digits
yourself:

int hex2dec(char c)
{
    char *hexdigs = "01234567890ABCDEF";
    char *p = strchr(hexdigs, toupper((unsigned char)c));

    if (p != NULL)
        return p - hexdigs;
    else
        return -1;
}

Trivia quiz: find the bug in this code.

Dan

Johan Aurer · Dec 2, 2003

The trivial, assumption free approach is to define the hex digits
yourself:

int hex2dec(char c)
{
    char *hexdigs = "01234567890ABCDEF";
    char *p = strchr(hexdigs, toupper((unsigned char)c));

    if (p != NULL)
        return p - hexdigs;
    else
        return -1;
}

Trivia quiz: find the bug in this code.

Which one? The c == 0 bug?

Dan Pop · Dec 2, 2003

In said:
That isn't the right criterion to apply here. Rather, we should ask
ourselves whether there is anyone we /don't/ know who is using a different
character set, and whether we want our code to work on their machine as
well.

As usual, in real world programming, when there is a tradeoff between
portability and performance, the good programmer makes the right choice:
if the portable code is fast enough, it is used, otherwise the fast code
is used and the assumptions it relies upon are clearly documented.
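
A minimal sketch of the "fast code with documented assumptions" side of
that tradeoff. The function name hex_value_fast is hypothetical, and
whether it is actually faster than the alternatives on a given target
would have to be measured.

```c
/*
 * ASSUMPTION (not guaranteed by the C standard): the letters 'A'..'F'
 * and 'a'..'f' are contiguous and ascending in the execution character
 * set. This holds for ASCII and EBCDIC. Only '0'..'9' are guaranteed.
 *
 * Returns 0..15 for a hexadecimal digit, -1 for anything else.
 */
int hex_value_fast(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;    /* relies on the documented assumption */
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;    /* relies on the documented assumption */
    return -1;
}
```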

Dan

Irrwahn Grausewitz · Dec 2, 2003

The trivial, assumption free approach is to define the hex digits
yourself:

int hex2dec(char c)
{
    char *hexdigs = "01234567890ABCDEF";
    char *p = strchr(hexdigs, toupper((unsigned char)c));

    if (p != NULL)
        return p - hexdigs;
    else
        return -1;
}

Trivia quiz: find the bug in this code.

You mean: the bugs (plural!).

1. failure to #include <string.h>

2. failure to #include <ctype.h>

3. "01234567890ABCDEF" should be "0123456789ABCDEF"

4. With 1.-3. corrected, hex2dec will return 16 if c equals zero,
because strchr will return a pointer to the terminating null
character of the digit string (the terminating null character is
considered to be part of the string).

What did I win?

Regards

Dan Pop · Dec 2, 2003

In said:
Instead of all the complexity, and assuming that the OP will not
use the strto*() family for some reason, the digit conversions can
be done by:

#include <string.h>

/* Convert hex char to value, -1 for non-hex char */
int unhexify(char c)
{
    static const char hexchars[] = "0123456789abcdefABCDEF";
    char *p;
    int val;

    val = -1; /* default assume non-hex */
    if ((p = strchr(hexchars, c))) {
        val = p - hexchars;
        if (val > 15) val = val - 6;
    }
    return val;
} /* unhexify, untested */

It has the same bug that I deliberately left unfixed in my version, posted
to the original thread ;-)

Dan

Richard Bos · Dec 2, 2003

Irrwahn Grausewitz said:
You mean: the bugs (plural!).

1. failure to #include <string.h>

2. failure to #include <ctype.h>

3. "01234567890ABCDEF" should be "0123456789ABCDEF"

4. With 1.-3. corrected, hex2dec will return 16 if c equals zero,
because strchr will return a pointer to the terminating null
character of the digit string (the terminating null character is
considered to be part of the string).

5. In some locales, passing an accented character, which is not a hex
digit, to toupper() may result in an upper-case unaccented character,
which is.

Richard
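
A minimal sketch of a strchr-based version that addresses the five
points listed in this subthread. The name hex2dec_fixed is hypothetical
and this is not any poster's actual code: it includes <string.h>, does
not call toupper() (so no <ctype.h> and no locale dependence), uses a
digit string without the stray '0', and rejects the null character
before calling strchr.

```c
#include <string.h>

/* Returns 0..15 for a hexadecimal digit, -1 for anything else. */
int hex2dec_fixed(char c)
{
    /* Both letter cases are listed, so toupper() is not needed. */
    static const char hexdigs[] = "0123456789ABCDEFabcdef";
    const char *p;
    int val;

    /* strchr would match the string's terminating null character. */
    if (c == '\0')
        return -1;

    p = strchr(hexdigs, c);
    if (p == NULL)
        return -1;

    val = (int)(p - hexdigs);
    /* Indices 16..21 are 'a'..'f'; map them onto 10..15. */
    return (val > 15) ? val - 6 : val;
}
```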

Dan Pop · Dec 2, 2003

In said:
Dan Pop wrote:

You mean: the bugs (plural!).

1. failure to #include <string.h>

2. failure to #include <ctype.h>

These are deliberate omissions. My code was not supposed to be a
complete translation unit.

3. "01234567890ABCDEF" should be "0123456789ABCDEF"

That's called a typo. I had to reread my string several times before
seeing it.

4. With 1.-3. corrected, hex2dec will return 16 if c equals zero,
because strchr will return a pointer to the terminating null
character of the digit string (the terminating null character is
considered to be part of the string).

What did I win?

Did I promise anything? ;-)

Dan

The Real OS/2 Guy · Dec 2, 2003

Possibly, but try it first. strchr may be very efficient.

Not as efficient as two ifs. Not even as efficient as two ifs and a
switch with just 6 cases.

char *p;
unsigned int v;
int d;

.....
p = &instring[0];
v = 0;
while ((d = unhexify(*p)) >= 0) {
    v = 16 * v + d;
    ++p;
}
/* desired result in v */

Calling a function for each char is another brake. I had not used
pointers because I'm not sure that the OP is ready to understand
pointer arithmetic yet, and any good compiler should generate the
same code anyway.

Besides that, the separate ++p can be another brake, as many real CPUs
can do *p++ in a single instruction instead of *p and, some time
later, p += 1.

That combination is a trap. It would allow the pointer to be
advanced past the known area of instring, which may or may not be
legitimate.

And that is really wrong, as the standard allows forming the address
of the element directly past the end of the array.

I rather doubt that speed is any great consideration for the OP -
he wants to minimize code size. I suspect that these will be used
in i/o, and thus limited by the i/o rates anyhow.

The OP required speed over size.

Irrwahn Grausewitz · Dec 2, 2003

These are deliberate omissions. My code was not supposed to be a
complete translation unit.

Fair enough.

That's called a typo. I had to reread my string several times before
seeing it.

That's the most dangerous kind of typo: the ones that aren't caught
by the compiler and are hard to find, even when you *know* they are
there (and yes, I know that you already knew that).

Did I promise anything? ;-)

Nope, that's why I asked... ;D

Regards

glen herrmannsfeldt · Dec 2, 2003

Richard said:
glen herrmannsfeldt wrote:

That isn't the right criterion to apply here. Rather, we should ask
ourselves whether there is anyone we /don't/ know who is using a different
character set, and whether we want our code to work on their machine as
well.

I think I agree with Dan's answer here. Program in the real world, and
document what you do. As EBCDIC does have other characters between 'A'
and 'Z', (only '\\' and '}' on the chart I have) you can't depend on
just testing 'A' and 'Z' in a portable program if your program might run
on EBCDIC systems.

On the other hand, all these programs do assume that an alphabet with
characters like 'A' exists? People in some countries may disagree with
that assumption.

There is a program I know, written using EBCDIC characters, that has
comments indicating that if the source is translated to a different
character set it will process input in that character set. I don't
believe that has ever been done, but it is nice that they documented it.

I expect it is extremely unlikely that anyone will come up with a
character code based on the roman alphabet which doesn't have the
letters 'A' to 'F' in ascending order. It seems more likely that they
will want to use it with a non roman alphabet.

Oh, in EBCDIC '0' (zero, not oh) is greater than 'Z'. I believe one of
the posted programs assumed it was not.

-- glen

GIR · Dec 2, 2003

OK so far. But do you get an unordered stream of data, or is there
some more logic behind it?

Maybe you can get the stream organized into telegrams. That means you
first get the size (one or two bytes) and then the data block from
outside. Yes, if they can send you the data in binary, it saves both
ends some time (converting nibbles to hex chars and back) and reduces
the number of bytes to transfer.
Is the line trusted? If not, packing a CRC into each telegram (with
the possibility of correcting single-bit or multi-bit errors) would
make an untrusted line (nearly) trusted.

Maybe you should simply receive note by note, so your transfer
buffer can shrink significantly. Every byte you save gives you more
freedom to use the limited memory for other things.

Well the logic behind converting the binary file to HEX was that they
could convert it to Intel HEX and have an "easy" way of implementing a
checksum and such. Having 1 byte out of order isn't a big deal,
remember this is going at 8 kHz, that's 8000 times per sec. Having
just 1 borked block wouldn't make any difference, not to the untrained
ear anywayz. The quality is so low you wouldn't know the difference
anyway.

Error checking and such isn't a big priority right now, my job is to
look at what they already did (which is craphola to say the least) and
optimize it in such a way that it's still usable and understandable to
them (so ASM is out). That means writing custom functions and methods
which can read/write to the serial interface without the use of the
standard functions.

I'm just going to stick to that job specification and let them handle
the rest. As you may have noticed, my job is not C

If it is possible to break the stream into small telegrams, you may
use a circular buffer, so that you work on one telegram while the
next ones are being received.

the buffer is already a circular buffer

I told them today that it would be easier for them to implement a
custom protocol. Like say 'C' followed by a string is a Command and
'D' followed by some binary data is Data.

I always wanted to say this and I quote: "Don't ask me I just work
here".

pete · Dec 2, 2003

glen said:
On the other hand, all these programs do assume that an alphabet with
characters like 'A' exists?
People in some countries may disagree with that assumption.

'A' exists in both the basic source and basic execution character sets.
