byte data manipulation


Niv (KP)

First, I haven't done any 'C' for a long time, (I'm VHDL conversant).

I need to read in a file that is all in hexadecimal, apart from a
colon line start, remove the preamble bytes & end checksum byte, and
then write the data bytes into a big array (2 megabytes total data).
The bytes are represented by two hex chars obviously. The preamble
contains info on the length of data in the line.

I then manipulate the array & write the data out again in a slightly
different format, calculating new preamble and checksums for the
output file.

I've done the whole job successfully in VHDL, but that needs a
simulator to run, and I've done it in Tcl, but it takes far too long
to run (hours!), so a simple exe file is needed really, but my 'C'
skills, little as they were, have all but gone!

Any help much appreciated.
 

James Kuyper

Niv (KP) wrote:
....
I need to read in a file that is all in hexadecimal,

Do you mean that it is a text file containing the data printed out as
hexadecimal numbers?
... apart from a
colon line start,

Does the file contain any other newline characters, or is it just one
long stream of hex digits?
... remove the preamble bytes & end checksum byte,

How do you know which bytes are the preamble bytes?
... and
then write the data bytes into a big array (2 megabytes total data).

What kind of array? If it's anything other than an array of unsigned
char, there will be additional questions that need to be answered.
The bytes are represented by two hex chars obviously. The preamble
contains info on the length of data in the line.

What format is that info in? Which line are you referring to? The only
line you've mentioned so far was the "colon line start".
I then manipulate the array & write the data out again in a slightly
different format, calculating new preamble and checksums for the
output file.

That paragraph covers what will probably be the most complicated part of
your program; but it's far too vague to be useful. What "manipulation"
are you performing on the array? The answer to that question will
determine what kind of array you want to use. What is the "slightly
different format"? How is the new preamble calculated? What algorithm
are you using for the checksum?
 

Paul

Niv said:
First, I haven't done any 'C' for a long time, (I'm VHDL conversant).

I need to read in a file that is all in hexadecimal, apart from a
colon line start, remove the preamble bytes & end checksum byte, and
then write the data bytes into a big array (2 megabytes total data).
The bytes are represented by two hex chars obviously. The preamble
contains info on the length of data in the line.

I then manipulate the array & write the data out again in a slightly
different format, calculating new preamble and checksums for the
output file.

I've done the whole job successfully in VHDL, but that needs a
simulator to run, and I've done it in Tcl, but it takes far too long
to run (hours!), so a simple exe file is needed really, but my 'C'
skills, little as they were, have all but gone!

Any help much appreciated.

When a data format is as old as the one you're referring to,
chances are a tool already exists. I suspect your format is
Intel HEX. The program below is named "srecord", but it
understands multiple formats, Intel HEX among them.

http://sourceforge.net/project/showfiles.php?group_id=72866&package_id=72815&release_id=662303

srecord-1.47.pdf
srecord-1.47.tar.gz

For example, something like this might work.

srec_cat infile -Intel -o outfile -binary

It is still going to be a challenging project, if, for
example, you want this to run under Windows. It might be
a quicker build, in a Linux/Unix environment. Putting together a
Windows environment using free tools, can take a while.

HTH,
Paul
 

Flash Gordon

Niv said:
First, I haven't done any 'C' for a long time, (I'm VHDL conversant).

I need to read in a file that is all in hexadecimal, apart from a
colon line start, remove the preamble bytes & end checksum byte, and
then write the data bytes into a big array (2 megabytes total data).

2meg might be larger than you can declare, so if you really need it all
in memory at the same time you might need to use malloc to grab the space.
The bytes are represented by two hex chars obviously. The preamble
contains info on the length of data in the line.

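First you need code to convert a hex digit to a number. A simple way
is to convert the incoming character to a consistent case and look it
up in a string of hex digits (this needs <string.h> and <ctype.h>):

const char *hexdigits = "0123456789abcdef";
/* strchr returns a pointer, not an int; also reject ch == 0, which
   would otherwise match the string terminator */
const char *pos = ch ? strchr(hexdigits, tolower((unsigned char)ch)) : NULL;
if (!pos) an error occurred
else nibble = pos - hexdigits;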

You can build up bytes as you did in vhdl with shifts.
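
As a rough sketch of the shift step: the high digit goes into the upper four bits and the low digit into the lower four. The helper names hex_nibble and hex_byte are made up for this illustration and are not from anyone's posted code.

```c
#include <ctype.h>
#include <string.h>

/* Return the value (0-15) of one hex digit, or -1 if ch is not a hex digit. */
static int hex_nibble(int ch)
{
    static const char hexdigits[] = "0123456789abcdef";
    const char *pos;

    if (ch == '\0')
        return -1;                      /* strchr would match the terminator */
    pos = strchr(hexdigits, tolower((unsigned char)ch));
    return pos ? (int)(pos - hexdigits) : -1;
}

/* Combine two hex digits into one byte value (0-255); -1 on a bad digit. */
static int hex_byte(const char *two_chars)
{
    int hi, lo;

    hi = hex_nibble(two_chars[0]);
    if (hi < 0)
        return -1;                      /* do not look past a bad first digit */
    lo = hex_nibble(two_chars[1]);
    if (lo < 0)
        return -1;
    return (hi << 4) | lo;              /* high digit in the upper four bits */
}
```

With these, hex_byte("20") gives 0x20, and anything that is not two hex digits gives -1.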

If I were you I would validate the checksum on the incoming data.
I then manipulate the array & write the data out again in a slightly
different format, calculating new preamble and checksums for the
output file.

That should be easy enough. There are format specifiers for specifying
hex output.
I've done the whole job successfully in VHDL, but that needs a
simulator to run, and I've done it in Tcl, but it takes far too long
to run (hours!), so a simple exe file is needed really, but my 'C'
skills, little as they were, have all but gone!

Any help much appreciated.

Write what you can and post it and I'm sure people will help. Sounds
like an easy enough task.
 

Niv (KP)

Niv (KP) wrote:

...


Do you mean that it is a text file containing the data printed out as
hexadecimal numbers?


Does the file contain any other newline characters, or is it just one
long stream of hex digits?


How do you know which bytes are the preamble bytes?


What kind of array? If it's anything other than an array of unsigned
char, there will be additional questions that need to be answered.


What format is that info in? Which line are you referring to? The only
line you've mentioned so far was the "colon line start".


That paragraph covers what will probably be the most complicated part of
your program; but it's far too vague to be useful. What "manipulation"
are you performing on the array? The answer to that question will
determine what kind of array you want to use. What is the "slightly
different format"? How is the new preamble calculated? What algorithm
are you using for the checksum?

Right, more details:

1. I want to create 2 arrays of 2Mbytes each.
2. Read in the input hex file, stripping off the preamble and checksum
bytes, and putting the bytes (2 chars) into array_1.
3. Copy array_1 data to array_2, but in a non-linear fashion, using
lfsr or some such.
4. Write out array_2 in a linear address fashion, adding the preamble
(easy) and calculating a new checksum for each line, where the new
output lines have 16 bytes (32 chars). The checksum is just the 8 bit
sum of the preceding values in the preamble and the 16 bytes,
discarding overflows.

Is that enough info?
 

Ben Bacarisse

Niv (KP) said:
Right, more details:

1. I want to create 2 arrays of 2Mbytes each.
2. Read in the input hex file, stripping off the preamble and checksum
bytes, and putting the bytes (2 chars) into array_1.
3. Copy array_1 data to array_2, but in a non-linear fashion, using
lfsr or some such.

Here you mean you don't access array_1 sequentially, yes? That
explains why you need the array.
4. Write out array_2 in a linear address fashion, adding the preamble
(easy) and calculating a new checksum for each line, where the new
output lines have 16 bytes (32 chars). The checksum is just the 8 bit
sum of the preceding values in the preamble and the 16 bytes,
discarding overflows.

Each line has 15 data bytes (30 hex chars) and one byte of checksum?

If so, you may not need array_2 at all. The output of the LFSR[1] can be
checksummed as you go and the output generated on the fly. Of course
there may be a reason to copy the data, but it's not clear from the
description so far.
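
A minimal sketch of walking the source array in LFSR order follows. The thread does not say which LFSR width or polynomial is used, so the Galois form, the tap mask and the names lfsr_next and scramble are all assumptions for illustration. The sketch is exercised here with a 4-bit register (mask 0xC, period 15) over a 16-byte array; a 2 MB image would need a 21-bit register with a suitable mask.

```c
#include <stddef.h>

/* One step of a Galois LFSR; taps is the feedback mask (an assumption,
   the real polynomial is not given in the thread). */
static unsigned long lfsr_next(unsigned long state, unsigned long taps)
{
    unsigned long lsb = state & 1UL;

    state >>= 1;
    if (lsb)
        state ^= taps;
    return state;
}

/* Fill dst[0..n-1] by reading src in LFSR order.
   Assumes n is a power of two and that taps gives a sequence of
   period n-1, so every index from 1 to n-1 is visited exactly once. */
static void scramble(unsigned char *dst, const unsigned char *src,
                     size_t n, unsigned long taps)
{
    unsigned long state = 1UL;
    size_t i;

    dst[0] = src[0];            /* an LFSR never reaches 0, so 0 maps to itself */
    for (i = 1; i < n; i++) {
        dst[i] = src[state];    /* could be written straight to the output instead */
        state = lfsr_next(state, taps);
    }
}
```

If the bytes are written out and checksummed inside the loop instead of being stored in dst, the second array is not needed.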
Is that enough info?

Except for what bit you are having trouble with. I can offer a few
tips. Use unsigned char for the array (it will avoid problems with
bit-operations if char is signed on your system). Allocate it using
malloc since the size is not known until run-time:

unsigned char *buffer = malloc(<size in bytes goes here>);

The simplest way to read two hex chars and put the result into a
single byte is:

fscanf(file, "%2hhx", buffer + pos);

and increment pos as you go. The result of both malloc and fscanf
should be checked. If malloc returns NULL, the allocation failed.
If fscanf does not return 1 it did not read a number.

[1] Linear feedback shift register.
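
Putting the two tips together, a hedged sketch might look like the following. The names IMAGE_SIZE, alloc_image and read_hex_bytes are invented for this example, and the 2 MB figure is taken from the original post. Note that %2hhx needs a C99 library, skips leading whitespace, and will also accept a lone hex digit.

```c
#include <stdio.h>
#include <stdlib.h>

#define IMAGE_SIZE (2UL * 1024UL * 1024UL)   /* 2 MB, per the original post */

/* Allocate the image buffer on the heap; the caller must check for NULL. */
static unsigned char *alloc_image(void)
{
    return malloc(IMAGE_SIZE);
}

/* Read up to max hex bytes (at most two digits each) from fp into buf.
   Stops when fscanf fails to convert, e.g. at a ':' or at end of file.
   Returns the number of bytes stored. */
static size_t read_hex_bytes(FILE *fp, unsigned char *buf, size_t max)
{
    size_t pos = 0;

    while (pos < max && fscanf(fp, "%2hhx", buf + pos) == 1)
        pos++;
    return pos;
}
```

The caller would check the result of alloc_image against NULL before use and compare the count returned by read_hex_bytes with the length byte of the record.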
 

Niv (KP)


The input file is in Intel format "hexout", as follows:

colon, 1 byte data length, 2 bytes address, 1 byte data type, <length>
data bytes, checksum byte; all in hex,

e.g. : 20 0010 00 <32 data bytes> <checksum> (all without spaces
of course).

I need to read in all the data bytes, up to 2 meg, into an array and
rearrange that array. The easiest way is to copy it to another array,
indexing the source in a non-linear fashion. Then I write out the full
2 meg of the new array, again in Intel format, with only 16 data bytes
per line. So I'll need to calculate the new checksum over the length
byte, address bytes, type byte and data bytes, where the csum is just
the hex sum of all these, ignoring any carry.

Like I said, I have a fully working version in VHDL, which takes about
30 seconds to run, but ties up a simulator, if one is available. So
I think an exe file is better. I can wrap the exe in a Tcl frame to
get the input and output files/destination etc., but my C knowledge is
virtually nil. I'm trying to read "teach yourself in 24 hrs" but I need a
quick answer really; I haven't got time to spend on learning all of C
right now, although I shall persevere with the learning!
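
For the output side, a sketch of formatting one data record could look like this. The function name format_data_record is made up here. One point worth checking against the format description: in Intel HEX the checksum byte is the two's complement of the 8-bit sum, so that all bytes of the record, checksum included, add up to zero modulo 256. The sketch follows that rule rather than storing the plain sum.

```c
#include <stdio.h>

/* Format one Intel HEX data record (type 00) into out as a string.
   out must have room for 11 + 2*len characters plus the terminator.
   Returns the number of characters written (not counting the terminator). */
static int format_data_record(char *out, unsigned addr,
                              const unsigned char *data, unsigned len)
{
    /* Sum of length byte, both address bytes and the type byte (0x00). */
    unsigned sum = len + ((addr >> 8) & 0xFFu) + (addr & 0xFFu) + 0x00u;
    int n = sprintf(out, ":%02X%04X00", len & 0xFFu, addr & 0xFFFFu);
    unsigned i;

    for (i = 0; i < len; i++) {
        sum += data[i];
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
    }
    /* Two's complement of the low byte of the sum. */
    n += sprintf(out + n, "%02X", (0x100u - (sum & 0xFFu)) & 0xFFu);
    return n;
}
```

For 16 data bytes per line, the caller would step addr by 16 and pass len as 16, writing each string followed by a newline.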
 

Ben Bacarisse

Niv (KP) said:
The input file is in intel format "hexout", as follows;

Ah. It is often best to include a link to a format description if it is a
well-known format.
colon, 1 byte data length, 2 bytes address, 1 byte data type, <length>
data bytes, checksum byte; all in hex,

e.g. : 20 0010 00 <32 data bytes> <checksum> (all without spaces
of course).

I need to read in all the data bytes, up to 2 meg,

Are you going to ignore the addressing and record types? It seems you
can't get 2M data without handling these.

In case you are wondering why people keep asking questions, it's
because no one can even outline a solution unless that data is
properly understood. If you need to handle the various address record
types it gets a little more complicated.

On the plus side, once everything is pinned down, it might be a matter
of minutes to write the code so someone might even do that for you.
Of course, there are lots of programs that read this format. Maybe
one of them is open source?
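
To make the record-type point concrete, here is a hedged sketch of parsing one line and applying it to a flat image. The struct and function names are invented for this example. It assumes the usual Intel HEX record types (00 data, 01 end of file, 02 extended segment address, 04 extended linear address), ignores the start-address types 03 and 05, and does not model segment wrap-around.

```c
#include <stdio.h>
#include <string.h>

struct hex_record {
    unsigned len, addr, type;
    unsigned char data[255];
};

/* Parse one line such as ":10010000...40".
   Returns 0 on success, -1 if malformed or the checksum does not match. */
static int parse_record(const char *line, struct hex_record *rec)
{
    size_t have = strlen(line);
    unsigned sum, byte, i;

    if (have < 11 || line[0] != ':')
        return -1;
    if (sscanf(line + 1, "%2x%4x%2x", &rec->len, &rec->addr, &rec->type) != 3)
        return -1;
    if (have < 11 + 2 * (size_t)rec->len)
        return -1;
    sum = rec->len + (rec->addr >> 8) + (rec->addr & 0xFFu) + rec->type;
    for (i = 0; i <= rec->len; i++) {       /* data bytes, then the checksum byte */
        if (sscanf(line + 9 + 2 * i, "%2x", &byte) != 1)
            return -1;
        if (i < rec->len)
            rec->data[i] = (unsigned char)byte;
        sum += byte;
    }
    return (sum & 0xFFu) == 0 ? 0 : -1;     /* a valid record sums to zero mod 256 */
}

/* Apply a parsed record to image. *base holds the upper address bits set
   by type 02/04 records. Returns 1 at the end-of-file record, 0 otherwise,
   -1 on error. */
static int apply_record(const struct hex_record *rec, unsigned long *base,
                        unsigned char *image, unsigned long image_size)
{
    switch (rec->type) {
    case 0x00:                               /* data */
        if (*base + rec->addr + rec->len > image_size)
            return -1;
        memcpy(image + *base + rec->addr, rec->data, rec->len);
        return 0;
    case 0x01:                               /* end of file */
        return 1;
    case 0x02:                               /* extended segment address */
        if (rec->len != 2)
            return -1;
        *base = (((unsigned long)rec->data[0] << 8) | rec->data[1]) << 4;
        return 0;
    case 0x04:                               /* extended linear address */
        if (rec->len != 2)
            return -1;
        *base = (((unsigned long)rec->data[0] << 8) | rec->data[1]) << 16;
        return 0;
    default:                                 /* 03, 05: start addresses, ignored */
        return 0;
    }
}
```

Since a 16-bit address field only reaches 64 KB, a 2 MB image relies on the type 02 or 04 records to set the upper address bits.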
 

Guest

On 25/04/09 11:59, Niv (KP) wrote:



If you know the preamble length is fixed, my suggestion would be to

open the original file for binary read and the new file for binary write

Why? This doesn't sound like a binary file.

fread() the preamble into an array and work out the length of the line

malloc() an array of chars large enough for the line you're reading
malloc() an array of chars large enough for the line you're writing

fread() the rest of the line

do your manipulation

fwrite() the output line to the new file

fread() the preamble of the next line

continue until fread()ing the preamble returns a short count (end of file).

there may be some gotchas about end-of-line markers, I leave the detail
to the implementer... :)

which would probably be avoided by using a text file.
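
A small sketch of the text-file approach: read one record per call with fgets and strip the line ending. The name read_record_line and the MAX_LINE value are assumptions for illustration. The longest possible record is 521 characters (a colon, 8 header digits, 510 data digits and 2 checksum digits), so 600 leaves room for the line ending and terminator.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 600    /* longest record is 521 characters, plus line ending */

/* Read the next line from fp into buf and drop any trailing CR/LF.
   Returns 1 if a line was read, 0 at end of file or on a read error. */
static int read_record_line(FILE *fp, char *buf, size_t size)
{
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';   /* cut at the first CR or LF, if any */
    return 1;
}
```

Stripping the carriage return explicitly means a file with DOS line endings still parses cleanly on a system that does not translate them.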
 

Flash Gordon

Jack said:
Two potential problems in the line above. First, since the OP is
using a specific format, Intel Hex format, the requirement is to check
for ASCII characters 0 through 9, and A, B, C, D, E, F. Regardless of
the source or execution character sets. The format specifically
defines ASCII.

The Intel format may well, but the OP did not specify that.
The second, as you might have noticed, is that the ASCII characters
for 10 through 15 are required by the standard to be upper case. Lower
case ones should be treated as an error.

Again, this was not specified by the OP.
But that is really related to the data format, not the C standard.

The format did sound familiar to me (having dealt with various formats
myself and done the odd conversion program), but I don't keep the
formats to hand.
 

kid joe

In C these days, I usually just define a local array of chars with a
length of about 1,000.

As for two arrays of 2 MB, on today's desktop platforms I would
probably define them at file scope.

Hi Jack,

I believe it is very risky to allocate such large arrays either statically
or on the stack. It's good practice to make big allocations on the heap
with malloc (or new in C++).

Cheers,
Joe


 

Barry Schwarz

Hi Jack,

I believe it is very risky to allocate such large arrays either statically
or on the stack. It's good practice to make big allocations on the heap
with malloc (or new in C++).

On those systems where it matters, file scope variables and others
declared static are usually not on the stack.
 

Nate Eldredge

Barry Schwarz said:
On those systems where it matters, file scope variables and others
declared static are usually not on the stack.

True. The risk in such cases is that the array may cause the executable
to be very large, especially if it is initialized with anything other
than all zeros.
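
For comparison, the file-scope alternative discussed above might be declared as in this sketch. The names are taken from the original post where possible (array_1, array_2); IMAGE_SIZE and image_size are made up. Whether zero-initialized arrays take up space in the executable depends on the toolchain; on typical desktop systems they do not, but that is an assumption, not something the C standard promises.

```c
#include <stddef.h>

#define IMAGE_SIZE (2UL * 1024UL * 1024UL)   /* 2 MB per array */

/* File-scope arrays with no initializer start out as all zeros
   and are not placed on the stack. */
static unsigned char array_1[IMAGE_SIZE];
static unsigned char array_2[IMAGE_SIZE];

/* Size of each array in bytes. */
static size_t image_size(void)
{
    return sizeof array_1;
}
```

Giving either array a non-zero initializer is the case where the executable is likely to grow by the size of the array.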
 

Flash Gordon

Joe said:
The OP was neither clear nor complete about the format he was dealing
with. Some of us, including Jack, determined it to be Intel Hex.
Agreed.

I pointed the OP to the Intel Specification and Jack explained certain
aspects of the format we and the OP might not know.
Agreed.

What's up Flash?

Nothing. I was just explaining why I answered as I did. I accept that
there are probably better answers when you have a full spec. Also I'm
sure I used to have third-party programs for loading and playing about
with these formats once, but they were probably not free.
 
