how to do reading of binary files?

J

jvdb

Hi all,

I need some help on the following issue. I can't seem to solve it.

I have a binary (pcl) file.
In this file i want to search for specific codes (like <0C>). I have
tried to solve it by reading the file character by character, but this
is very slow. Especially when it comes to files which are large
(>10MB) this is consuming quite some time.
Does anyone has a hint/clue/solution on this?

thanks already!

Jeroen
 
D

Diez B. Roggisch

jvdb said:
Hi all,

I need some help on the following issue. I can't seem to solve it.

I have a binary (pcl) file.
In this file i want to search for specific codes (like <0C>). I have
tried to solve it by reading the file character by character, but this
is very slow. Especially when it comes to files which are large
(>10MB) this is consuming quite some time.
Does anyone has a hint/clue/solution on this?

What has the searching to do with the reading? 10MB easily fit into the
main memory of a decent PC, so just do


contents = open("file").read() # yes I know I should close the file...

print contents.find('\x0c')

Diez
 
J

jvdb

jvdb schrieb: .......
What has the searching to do with the reading? 10MB easily fit into the
main memory of a decent PC, so just do

contents = open("file").read() # yes I know I should close the file...

print contents.find('\x0c')

Diez

True. But there is another issue attached to the one i wrote.
When i know how much this occurs, i know the amount of pages in the
file. After that i would like to be able to extract a given amount of
data:
file x contains 20 <0C>. then for example i would like to extract from
instance 5 to instance 12 from the file.
The reason why i want to do this: The 0C stands for a pagebreak in PCL
language. This way i would be absle to extract a certain amount of
pages from the file.
 
D

Diez B. Roggisch

jvdb said:
True. But there is another issue attached to the one i wrote.
When i know how much this occurs, i know the amount of pages in the
file. After that i would like to be able to extract a given amount of
data:
file x contains 20 <0C>. then for example i would like to extract from
instance 5 to instance 12 from the file.
The reason why i want to do this: The 0C stands for a pagebreak in PCL
language. This way i would be absle to extract a certain amount of
pages from the file.

And? Finding the respective indices by using

last_needle_position = 0
positions = []
while last_needle_position != -1:
last_needle_position = contents.find(needle, last_needle_position+1)
if last_needle_position != -1:
positions.append(last_needle_position)


will find all the pagepbreaks. then just slice contents appropriatly.
Did you read the python tutorial?

diez
 
M

Marc 'BlackJack' Rintsch

jvdb said:
True. But there is another issue attached to the one i wrote.
When i know how much this occurs, i know the amount of pages in the
file. After that i would like to be able to extract a given amount of
data:
file x contains 20 <0C>. then for example i would like to extract from
instance 5 to instance 12 from the file.
The reason why i want to do this: The 0C stands for a pagebreak in PCL
language. This way i would be absle to extract a certain amount of
pages from the file.

And? Finding the respective indices by using

last_needle_position = 0
positions = []
while last_needle_position != -1:
last_needle_position = contents.find(needle, last_needle_position+1)
if last_needle_position != -1:
positions.append(last_needle_position)


will find all the pagepbreaks. then just slice contents appropriatly.
Did you read the python tutorial?

Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
them again is enough, depending of the size of the files and memory of
course.

One problem I see is that '\x0c' may not always be the page end. It may
occur in "rastered image" data too I guess.

Ciao,
Marc 'BlackJack' Rintsch
 
J

jvdb

And? Finding the respective indices by using
last_needle_position = 0
positions = []
while last_needle_position != -1:
last_needle_position = contents.find(needle, last_needle_position+1)
if last_needle_position != -1:
positions.append(last_needle_position)
will find all the pagepbreaks. then just slice contents appropriatly.
Did you read the python tutorial?

Maybe splitting at '\x0c', selecting/slicing the wanted pages and joining
them again is enough, depending of the size of the files and memory of
course.

One problem I see is that '\x0c' may not always be the page end. It may
occur in "rastered image" data too I guess.

Ciao,
Marc 'BlackJack' Rintsch

Hi,

your last comment is also something i have noticed. There are a number
of occasions where this will happen. I also have to deal with this.
I will dive into this on monday, after this hot weekend.

cheers,
Jeroen
 
G

Grant Edwards

I have a binary (pcl) file.
In this file i want to search for specific codes (like <0C>). I have
tried to solve it by reading the file character by character, but this
is very slow. Especially when it comes to files which are large
(>10MB) this is consuming quite some time.
Does anyone has a hint/clue/solution on this?

I'd memmap the file.

http://docs.python.org/lib/module-mmap.html

If you prefer it to appear as an array of bytes instead of a
string, the various numeric/array packags can do that.

Numarray: http://stsdas.stsci.edu/numarray/numarray-1.5.html/module-numarray.memmap.html
Vmaps: http://snafu.freedom.org/Vmaps/Vmaps.html
Numpy: <documentation is not free>

Since I can't point you to Numpy docs, here's a link to a
newsgroup thread with an example for numpy:

http://groups.google.com/group/comp.lang.python/browse_frm/thread/c63c3e281df99897/2336baa98386d5e7
 
R

Roger Miller

What has the searching to do with the reading? 10MB easily fit into the
main memory of a decent PC, so just do

contents = open("file").read() # yes I know I should close the file...

print contents.find('\x0c')

Diez

Better make that 'open("file", "rb").
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,023
Latest member
websitedesig25

Latest Threads

Top