Conversion of 24bit binary to int

Idar

Is there an efficient/fast way in Python to convert binary data from a file
(24bit hex(int) big endian) to 32bit int (little endian)? I have seen
struct.unpack, but I am unsure how and what Python has to offer. Idar

The original data format is stored in blocks of 512 words
(1536B = 3 bytes/word) in the form below; the binary (hex) data
is big endian:
Ch1: 1536B (3B*512)
Ch2: 1536B (3B*512)
Ch3: 1536B (3B*512)
and so on

The equivalent C++ program looks like this:
for(i=0;i<nchn;i++)
{
    for(k=0;k<segl;k++)
    {
        ar24[k]=0;//output array=32 bit int array->Mt24 fmt
        pdt=(unsigned char *)(&ar24[k]);
        *pdt =*(a+2);
        *(pdt+1)=*(a+1);
        *(pdt+2)=*(a+0);
        a+=3;
        ar24[k]-=DownloadDataOffset;
        // printf("%d\n",ar24[k]);//this is the number on 32 bit format
    }
}
 
Peter Hansen

Idar said:
Is there an efficient/fast way in Python to convert binary data from a file
(24bit hex(int) big endian) to 32bit int (little endian)? I have seen
struct.unpack, but I am unsure how and what Python has to offer. Idar

I think the question is unclear. You say you've seen struct.unpack.
So what then? Don't you think struct.unpack will work? What do you
mean you are unsure how and what Python has to offer? The documentation
which is on the web site clearly explains how and what struct.unpack
has to offer...

Please clarify.

-Peter
 
Mike C. Fletcher

If I'm understanding correctly, hex has nothing to do with this and the
data is really binary, so what you're looking for is probably:
little-endian integer format
'\x02\x01\x00\x00'
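
A minimal sketch of this idea in Python 3, assuming each sample is an unsigned 24-bit big-endian value. The helper name be24_to_le32 is made up for illustration and is not from the original post. struct has no 3-byte format code, so the sample is padded to 4 bytes before unpacking and then repacked as a little-endian 32-bit integer.

```python
import struct

def be24_to_le32(sample):
    """Convert one 3-byte big-endian sample to 4 little-endian bytes.

    Hypothetical helper; assumes the 24-bit value is unsigned.
    """
    if len(sample) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pad on the most significant (left) side, then read as big-endian uint32.
    (value,) = struct.unpack(">I", b"\x00" + sample)
    # Write the same value back out as a little-endian uint32.
    return struct.pack("<I", value)

print(be24_to_le32(b"\x00\x01\x02"))  # b'\x02\x01\x00\x00'
```

Calling this once per sample is simple but slow for files of several megabytes, which is why the bulk approaches discussed in this thread matter.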

There are faster ways if you have a lot of such data (e.g. PIL would
likely have something to manipulate RGB to RGBA images), similarly, you
could use Numpy to add large numbers of rows simultaneously (all 512 if
I understand your description of the data correctly). Without knowing
what type of data is being loaded it's hard to give a better recommendation.

HTH,
Mike

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
 
Patrick Maupin

Idar said:
Is there an efficient/fast way in Python to convert binary data from a file
(24bit hex(int) big endian) to 32bit int (little endian)? I have seen
struct.unpack, but I am unsure how and what Python has to offer. Idar

As Peter mentions, you haven't _really_ given enough information
about what you need, but here is some code which will do what
I _think_ you said you want...

This code assumes that you have a string (named teststr here)
in the source format you describe. You can get a string
like this in several ways, e.g. by reading from a file object.

This code then reverses each group of 3 characters and appends a null
byte after each group.

The result is in a list, which can easily be converted back
to a string by ''.join() as shown in the test printout.

I would expect that either the array module or Numpy would
work faster with _exactly_ the same technique, but I'm
not bored enough to check that out right now.

If this isn't fast enough after using array or NumPy (or
after Alex, Tim, et al. get through with it), I would
highly recommend Pyrex -- you can do exactly the same
sorts of coercions you were doing in your C++ code.


teststr = ''.join([chr(i) for i in range(128,128+20*3)])

result = len(teststr) * 4 // 3 * [chr(0)]
for x in range(3):
    result[2-x::4] = teststr[x::3]

print repr(''.join(result))
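
For readers on Python 3, the same extended-slice technique might look roughly like this with bytes and bytearray. This is a sketch rather than Pat's original code, and the function name swap_and_pad is an assumption.

```python
def swap_and_pad(src):
    """Reverse each 3-byte group and append a zero pad byte.

    Illustrative helper, not from the original post.
    """
    if len(src) % 3:
        raise ValueError("input length must be a multiple of 3")
    # The output starts as all zeros, so every 4th byte is already the pad.
    dst = bytearray(len(src) // 3 * 4)
    for i in range(3):
        # Byte i of each input word becomes byte 2-i of each output word.
        dst[2 - i::4] = src[i::3]
    return bytes(dst)

print(swap_and_pad(b"ABCDEF"))  # b'CBA\x00FED\x00'
```

Only three slice assignments are executed regardless of input size, so the per-byte work happens inside the interpreter's C code rather than in a Python loop.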


Regards,
Pat
 
Idar

Peter Hansen said:
I think the question is unclear. You say you've seen struct.unpack.
So what then? Don't you think struct.unpack will work? What do you
mean you are unsure how and what Python has to offer? The documentation
which is on the web site clearly explains how and what struct.unpack
has to offer...

It is due to slack reading........

The doc says "Standard size and alignment are as follows: no alignment is
required for any type (so you have to use pad bytes)................"

It was unclear (at the time of reading) in the sense that I didn't see the
above text, there was no example of how to handle odd-byte/padding
conversion, and the test program crashed!

But if you know how to convert this format (the file is about 6MB)
efficiently, please do give me a hint. The data is stored binary with the
format:
Ch1: 1536B (512*3B)
...
Ch6 1536B (512*3B)
Then it is repeated again until end:
Ch1 1536B (512*3B)
...
Ch6 1536B (512*3B)
 
Idar

Mike C. Fletcher said:
If I'm understanding correctly, hex has nothing to do with this and the
data is really binary, so what you're looking for is probably:

Thanks for the hint!! And sorry - I meant binary!
little-endian integer format
'\x02\x01\x00\x00'

There are faster ways if you have a lot of such data (e.g. PIL would
likely have something to manipulate RGB to RGBA images), similarly, you
could use Numpy to add large numbers of rows simultaneously (all 512 if I
understand your description of the data correctly). Without knowing what
type of data is being loaded it's hard to give a better recommendation.

It is binary with no formatting characters to indicate start/end of each
block (fixed size).
A file is about 6MB (and about 300 of them again...),
Ch1: 1536B (512*3B) - the 3B are big endian (int)
...
Ch6: 1536B (512*3B)
And then it is repeated till the end:
Ch1: 1536B (512*3B)
...
Ch6: 1536B (512*3B)

ciao, idar
 
Idar

Thanks for the example!

The format is binary with no formatting characters to indicate start/end of
each block (fixed size).
A file is about 6MB (and about 300 of them again...), so

Ch1: 1536B (512*3B) - the 3B are big endian (int)
...
Ch6: 1536B (512*3B)
And then it is repeated till the end (say Y sets of Ch1 (the same for
Ch2,3,4,5,6)):
Ch1,Y: 1536B (512*3B)
...
Ch6,Y: 1536B (512*3B)

And ideally I would like to convert it to this format:
Ch1: Y*512*4B (normal int with little endian)
Ch2
Ch3
Ch4
Ch5
Ch6
And that is the end :)
Idar
 
Alex Martelli

Idar said:
Thanks for the example!

The format is binary with no formatting characters to indicate start/end of
each block (fixed size).
A file is about 6MB (and about 300 of them again...), so

Ch1: 1536B (512*3B) - the 3B are big endian (int)
..
Ch6: 1536B (512*3B)
And then it is repeated till the end (say Y sets of Ch1 (the same for
Ch2,3,4,5,6)):
Ch1,Y: 1536B (512*3B)
..
Ch6,Y: 1536B (512*3B)

And ideally I would like to convert it to this format:
Ch1: Y*512*4B (normal int with little endian)
Ch2
Ch3
Ch4
Ch5
Ch6
And that is the end :)

So, you don't really need to convert binary to int or anything, just
shuffle bytes around, right? Your file starts with (e.g.), using a
letter for each arbitrary binary byte:

A B C D E F G H I ...

and you want to output the bytes

C B A 0 F E D 0 I H G 0 ...

I.e, swap 3 bytes, insert a 0 byte for padding, and proceed (for all
Ch1, which is spread out in the original file -- then for all Ch2, and
so on). Each file fits comfortably in memory (6MB for input, becoming
8MB for output due to the padding). You can use two instances of
array.array('B'), with .read for input and .write for output (just
remember .read _appends_ to the array, so make a new empty one for
each file you're processing -- the _output_ array you can reuse).

It's LOTS of indexing and single-byte moving, so I doubt the Python
native performance will be great. Still, once you've implemented and
checked it out you can use psyco or pyrex to optimize it, if needed.

The primitive you need is typically "copy with swapping and padding
a block of 1536 input bytes [starting from index SI] to a block of
2048 output bytes" [starting from index SO -- the 0 bytes in the
output you'll leave untouched after at first preparing the output
array with OA = array.array('B', Y*2048*6*'\0') of course].
That's just (using predefined ranges for speed, no need to remake
them every time):

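r512 = xrange(512)

def doblock(SI, SO, IA, OA, r512=r512):
    ii = SI
    io = SO
    for i in r512:
        # IA[ii+2:ii-1:-1] would be empty when ii == 0 (the stop becomes -1),
        # so take the 3 bytes first and then reverse them.
        OA[io:io+3] = IA[ii:ii+3][::-1]
        ii += 3
        io += 4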

so basically it only remains to compute SI and SO appropriately
and loop ditto calling this primitive (or some speeded-up version
thereof) 6*Y times for all the blocks in the various channels.
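
One way the bookkeeping could go, sketched in Python 3 under the layout Idar described: input block b belongs to channel b % nchan and is that channel's (b // nchan)-th block. The names convert, nchan and words are assumptions for illustration, and the inner copy uses slice reversal on bytes rather than Alex's exact doblock.

```python
def convert(data, nchan=6, words=512):
    """De-interleave and byte-swap a whole file image (illustrative sketch).

    data -- bytes: repeated sets of nchan blocks, each words*3 bytes long.
    Returns bytes: all of channel 0's blocks, then channel 1's, and so on,
    with every word stored as 4 little-endian bytes.
    """
    src_block = words * 3
    dst_block = words * 4
    nblocks, leftover = divmod(len(data), src_block)
    if leftover or nblocks % nchan:
        raise ValueError("input is not a whole number of block sets")
    sets = nblocks // nchan               # "Y" in the thread
    out = bytearray(nblocks * dst_block)  # pad bytes are already zero
    for b in range(nblocks):
        chan, y = b % nchan, b // nchan
        si = b * src_block                  # SI: block start in the input
        so = (chan * sets + y) * dst_block  # SO: block start in the output
        for k in range(words):
            word = data[si + 3 * k: si + 3 * k + 3]
            out[so + 4 * k: so + 4 * k + 3] = word[::-1]
    return bytes(out)

# Tiny example: 2 channels, 1 word per block, 2 sets.
print(convert(b"abcABCdefDEF", nchan=2, words=1))
# b'cba\x00fed\x00CBA\x00FED\x00'
```

The inner loop is still one Python iteration per word, so this shows the indexing rather than a fast implementation.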


Alex
 
Patrick Maupin

Alex Martelli wrote:
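r512 = xrange(512)

def doblock(SI, SO, IA, OA, r512=r512):
    ii = SI
    io = SO
    for i in r512:
        # IA[ii+2:ii-1:-1] would be empty when ii == 0 (the stop becomes -1),
        # so take the 3 bytes first and then reverse them.
        OA[io:io+3] = IA[ii:ii+3][::-1]
        ii += 3
        io += 4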


It's my guess this would be faster using array.array
in combination with extended slicing, as per the list
example I gave in a previous message, even though I'm
still not bored enough to time it :) (The for loop
in my previous example only requires 3 iterations,
rather than 512 as in this example.)

Pat
 
Patrick Maupin

Idar said:
Thanks for the example!

The format is binary with no formatting characters to indicate start/end of
each block (fixed size).
A file is about 6MB (and about 300 of them again...), so

Ch1: 1536B (512*3B) - the 3B are big endian (int)
..
Ch6: 1536B (512*3B)
And then it is repeated till the end (say Y sets of Ch1 (the same for
Ch2,3,4,5,6)):
Ch1,Y: 1536B (512*3B)
..
Ch6,Y: 1536B (512*3B)

And ideally I would like to convert it to this format:
Ch1: Y*512*4B (normal int with little endian)
Ch2
Ch3
Ch4
Ch5
Ch6
And that is the end :)
Idar

OK, now that I have a beer and a specification, here is some code
which (I think) should do what (I think) you are asking for.
On my Athlon 2200+ (marketing number) computer, with the source
file cached by the OS, it operates at around 10 source megabytes/second.

(That should be about 3 minutes plus actual file I/O operations
for the 300 6MB files you describe.)

Verifying that it actually produces the data you expect is up to you :)

Regards,
Pat


import array

def mungeio(srcfile,dstfile, numchannels=6, blocksize=512):
    """
    This function converts 24 bit RGB into 32 bit BGR0,
    and simultaneously de-interleaves video from multiple
    sources. The parameters are:

    srcfile -- a file object opened with 'rb'
               (or similar object)
    dstfile -- a file object opened with 'wb'
               (or similar object)
    numchannels -- the number of interleaved video channels
    blocksize -- the number of pixels per channel on
                 each interleaved block (interleave factor)

    This function reads all the data from srcfile and writes
    it to dstfile. It is up to the caller to close both files.

    The function asserts that the amount of data to be read
    from the source file is an integral multiple of
    blocksize*numchannels*3.

    This function assumes that multiple copies of the data
    will easily fit into RAM, as the target file size is
    6MB for the source files and 8MB for the destination
    files. If this is not a good assumption, it should
    be rearchitected to output to one file per channel,
    and then stitch the output files together at the end.
    """

    srcblocksize = blocksize * 3
    dstblocksize = blocksize * 4

    def mungeblock(src,dstarray=array.array('B',dstblocksize*[0])):
        """
        This function accepts a string representing a single
        source block, and returns a string representing a
        single destination block.
        """
        srcarray = array.array('B',src)
        for i in range(3):
            dstarray[2-i::4] = srcarray[i::3]
        return dstarray.tostring()

    channellist = [[] for i in range(numchannels)]

    while 1:
        for channel in channellist:
            data = srcfile.read(srcblocksize)
            if len(data) != srcblocksize:
                break
            channel.append(mungeblock(data))
        else:
            continue # (with while statement)
        break # Propagate break from 'for' out of 'while'

    # Check that input file length is valid (no leftovers),
    # and then write the result.

    assert channel is channellist[0] and not len(data)
    dstfile.write(''.join(sum(channellist,[])))


def mungefile(srcname,dstname):
    """
    Actual I/O done in a separate function so it can
    be more easily unit-tested.
    """
    srcfile = open(srcname,'rb')
    dstfile = open(dstname,'wb')
    mungeio(srcfile,dstfile)
    srcfile.close()
    dstfile.close()
 
Patrick Maupin

I just realized that, according to your spec, it ought to be possible
to do the rgb -> bgr0 conversion on the entire file all at one go
(no nasty headers or block headers to get in the way:)

So I wrote a somewhat more comprehensible (for one thing, it gets rid
of that nasty sum() everybody's been complaining about :), somewhat more
memory-intensive version of the program. On my machine it executes at
approximately the same speed as the original one I wrote (10 source
megabytes/second), but this one might be more amenable to profiling
and further point optimizations if necessary.

The barebones (no comments or error-checking) functions are below.

Pat


import array

def RgbToBgr0(srcstring):
    srcarray = array.array('B',srcstring)
    dstarray = array.array('B',len(srcstring) * 4 // 3 * chr(0))
    for i in range(3):
        dstarray[2-i::4] = srcarray[i::3]
    return dstarray.tostring()

def deinterleave(srcstring,numchannels=6,pixelsperblock=512):
    bytesperblock = pixelsperblock*4
    totalblocks = len(srcstring) // bytesperblock
    blocknums = []
    for i in range(numchannels):
        blocknums.extend(range(i,totalblocks,numchannels))
    return ''.join([srcstring[i*bytesperblock:(i+1)*bytesperblock]
                    for i in blocknums])

def mungefile(srcname,dstname):
    srcfile = open(srcname,'rb')
    dstfile = open(dstname,'wb')
    dstfile.write(deinterleave(RgbToBgr0(srcfile.read())))
    srcfile.close()
    dstfile.close()
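
To see what order deinterleave visits the blocks in, here is a small Python 3 sketch of just the index computation. The helper name block_order is hypothetical and only isolates the blocknums logic from the function above.

```python
def block_order(totalblocks, numchannels):
    """Return input block indices in the order they are written out.

    Illustrative helper mirroring the blocknums loop in deinterleave.
    """
    order = []
    for chan in range(numchannels):
        # Every numchannels-th block, starting at this channel's first block.
        order.extend(range(chan, totalblocks, numchannels))
    return order

# 6 blocks interleaved across 2 channels: channel 0 first, then channel 1.
print(block_order(6, 2))  # [0, 2, 4, 1, 3, 5]
```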
 
Christos TZOTZIOY Georgiou

But if you know how to convert this format (the file is about 6MB)
efficiently, please do give me a hint. The data is stored binary with the
format:
Ch1: 1536B (512*3B)
..
Ch6 1536B (512*3B)
Then it is repeated again until end:
Ch1 1536B (512*3B)
..
Ch6 1536B (512*3B)

So it's some audio file with 6 channels, right? (I missed the first
post)

I would take every chunk of 512*3 bytes, and for every 3 bytes,
struct.unpack('>i', '\0'+_3_bytes)[0] is the 32bit value (the file data
is big endian, so the pad byte goes in front and the format says '>').

Hope this helps (no, really :)
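
Since the samples in the file are big endian, a per-chunk version in Python 3 could also use int.from_bytes instead of padding by hand. This is a sketch under the assumption that the values are unsigned, which matches the C++ code zero-filling the top byte. The function name block_to_ints is made up for illustration.

```python
import struct

def block_to_ints(chunk):
    """Decode a chunk of 3-byte big-endian samples into a list of ints.

    Hypothetical helper; treats every sample as unsigned.
    """
    if len(chunk) % 3:
        raise ValueError("chunk length must be a multiple of 3")
    return [int.from_bytes(chunk[i:i + 3], "big")
            for i in range(0, len(chunk), 3)]

values = block_to_ints(b"\x00\x01\x02\xff\xff\xff")
print(values)  # [258, 16777215]

# Repack the decoded values as little-endian 32-bit integers.
packed = struct.pack("<%dI" % len(values), *values)
print(packed)  # b'\x02\x01\x00\x00\xff\xff\xff\x00'
```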
 
