writing binary files

mermadak

I am trying to convert an ANSI encoded ASCII text file to a binary file. I
have looked at the b2a_qp( data[, quotetabs, istext, header]) function at
http://aspn.activestate.com/ASPN/docs/ActivePython/2.3/python/lib/module-binascii.html
but I am not sure if it will do what I need it to or how to set it up to take
the data.

Also, what really makes this an issue is that the data is coming off of a
DOS machine (so endianness is a concern here, right?) and is a rather large
text file with a ton of scientific data points (from 500 KB to 5 MB files).

Any help would be greatly appreciated.

Thanks,
Dennis Aust
 
John Bokma

mermadak said:
I am trying to convert an ANSI encoded ASCII text file to a binary
file. I have looked at the b2a_qp( data[, quotetabs, istext, header])
function at
http://aspn.activestate.com/ASPN/docs/ActivePython/2.3/python/lib/module-binascii.html
but I am not sure if it will do what I need it to or how to set it up to
take the data.

Python... hmmm....
Also, what really makes this an issue is that the data is coming off
of a DOS machine (so endianness is a concern here, right?) and is a
rather large text file with a ton of scientific data points (from
500 KB to 5 MB files).

So basically you want to convert numbers in a text file to some short
binary notation?

5MB... you are aware that the current year is 2005? :)
 
mermadak

John Bokma said:
So basically you want to convert numbers in a text file to some short
binary notation?

Exactly... any ideas?
5MB... you are aware that the current year is 2005? :)

Does that mean 5MB shouldn't be a problem???
I originally tried writing a program to simply manipulate these files in my
native programming languages of VB and C++ which would hang due to the size
of these files. I finally found a PERL script that would handle parsing this
much data.

Dennis Aust
 
John Bokma

mermadak said:
Exactly... any ideas?

Python or Perl, since your post referred to Python :-D
Does that mean 5MB shouldn't be a problem???

Yup, your computer probably has 100 times as much memory.
I originally tried writing a program to simply manipulate these files
in my native programming languages of VB and C++ which would hang due
to the size of these files.

If a C++ program would hang on 5 MB files, how can programs handle 10 MB
MP3 files, or 700 MB movies?
I finally found a PERL script that would

PERL is not an acronym :)
handle parsing this much data.

Again: 5 MB is not much. My best guess is that you should rethink your
algorithm(s).
 
mermadak

Python or Perl, since your post referred to Python :-D

Perl... preferably. My point there is that I am grasping at straws at this
point...

I looked at the pack function that was also recommended but I am not sure
how to use it. Could anyone possibly give me an example? Mainly it looks as
though my data can only contain strings, floating point decimals, or fixed
point decimals but not a combination thereof. My data is in ASCII format but
would it be considered string data even though a data string may look like
"2005-08-05, 13:36:06, 3236.453232, 11123.456, 0.0, 21, 224.332" for
purposes of conversion to raw binary format? Also, the function says it
calls for a TEMPLATE variable to be passed to it. Is this required? And this
looks as though it would require a template character to be passed for every
character in the file??? This seems like it will be very processor intensive
as well as nearly impractical from a code writing perspective, as I would
have to build an array of the TEMPLATE characters and then build a
comparison function to check which character matches the TEMPLATE
designation and then convert each character to binary at that point. Am I
way off base here? Just seems like there would be a more practical way to
achieve this.
Yup, your computer probably has 100 times as much memory.

True... but what does that have to do with process intensity and the
capabilities of the tools?
If a C++ program would hang on 5 MB files, how can programs handle 10 MB
MP3 files, or 700 MB movies?

I agree with your point. Admittedly it was probably due to poor programming.
I have only been coding for 3 years now and only part time at that. But I
would be glad to send you the programs I was working on and see if you can
make them work. ;-) Although, I did finally get that covered with Perl so it
is not much of a concern at the moment.
Again: 5 MB is not much. My best guess is that you should rethink your
algorithm(s).

Agreed, see above. Thank you for pointing out all of the obvious problems
here. Perhaps you would be so kind as to make some suggestions on how I
could actually accomplish this now?
 
John Bokma

mermadak said:
Perl... preferably. My point there is that I am grasping at straws at
this point...

I looked at the pack function that was also recommended but I am not
sure how to use it. Could anyone possibly give me an example? Mainly
it looks as though my data can only contain strings, floating point
decimals, or fixed point decimals but not a combination thereof. My
data is in ASCII format but would it be considered string data even
though a data string may look like "2005-08-05, 13:36:06, 3236.453232,
11123.456, 0.0, 21, 224.332" for purposes of conversion to raw binary
format?

A better question is: is compression really required? What is causing
the current problem(s)? I am sure it's not managing 5 MB of data, which
is close to nothing on a recent PC.
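
If the only goal is a smaller file, a general-purpose compressor may already be enough, and the data stays plain text once decompressed. A minimal sketch in Python using the standard `gzip` module; the sample line is the one quoted in this thread, repeated to stand in for a larger file (real data is less repetitive and will compress less well):

```python
import gzip

# Stand-in for a large data file: the sample line from this thread, repeated.
line = "2005-08-05, 13:36:06, 3236.453232, 11123.456, 0.0, 21, 224.332\n"
text = (line * 1000).encode("ascii")

compressed = gzip.compress(text)        # bytes in gzip format
restored = gzip.decompress(compressed)  # the original text, byte for byte

print(len(text), len(compressed))
```
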
Also, the function says it calls for a TEMPLATE variable to be
passed to it. Is this required?

The whole idea of pack is that it packs data according to a TEMPLATE, so
guess :)
And this looks as though it would
require a template character to be passed for every character in the
file???

More or less, yes.
This seems like it will be very processor intensive as well as
nearly impractical from a code writing perspective, as I would have to
build an array of the TEMPLATE characters and then build a comparison
function to check which character matches the TEMPLATE designation and
then convert each character to binary at that point. Am I way off base
here? Just seems like there would be a more practical way to achieve
this.

Yup: the most practical approach is to find the real bottleneck of your
problem. If you just require compression, use a compression solution.
Pack indeed needs to "know" what is in the string you want to be packed.
So if you want to pack a date followed by 3 floats on line 1 and 4
floats and a fixed number on line 2, you have to provide the correct
template to pack.
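
To illustrate what such a template looks like, here is a minimal sketch using Python's `struct` module, whose format strings play a role similar to Perl's pack TEMPLATE. The record layout (a 32-bit timestamp, three doubles, one integer, one double) is an assumption based on the sample line quoted above, not a known file format. Note that the template describes one field per value, is written once, and is reused for every line. The leading `<` fixes the byte order to little-endian, so the output does not depend on the machine that writes it.

```python
import calendar
import struct
from datetime import datetime

# Assumed layout for one line of data:
#   "<"  little-endian, no padding bytes
#   "I"  timestamp as unsigned 32-bit seconds since the Unix epoch
#   "3d" three 64-bit floats
#   "i"  one signed 32-bit integer
#   "d"  one 64-bit float
RECORD = struct.Struct("<I3did")

def pack_line(line):
    """Turn one comma-separated text line into a fixed-size binary record."""
    date, time, a, b, c, n, d = [field.strip() for field in line.split(",")]
    stamp = datetime.strptime(date + " " + time, "%Y-%m-%d %H:%M:%S")
    seconds = calendar.timegm(stamp.timetuple())  # treat the timestamp as UTC
    return RECORD.pack(seconds, float(a), float(b), float(c), int(n), float(d))

line = "2005-08-05, 13:36:06, 3236.453232, 11123.456, 0.0, 21, 224.332"
packed = pack_line(line)
print(len(packed))  # 40 bytes per record with this layout
```
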
True... but what does that have to do with process intensity and the
capabilities of the tools?

That there shouldn't be any problem reading 5 MB of data into memory and
using it.

Regarding pack: if your lines don't follow a fixed format (e.g. a date
followed by exactly 5 floats, and 2 fixed point numbers), you already have
to do some parsing in your program. You can use the same parsing set up
to compress/convert your data to binary. If you only want to use the
output in Perl, you might consider writing out the compact version using
Storable.
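
As a rough sketch of reusing the parsing step for conversion, the Python snippet below reads text lines one at a time and writes one packed row per line. It assumes, purely for illustration, that every line holds exactly four comma-separated numbers; the names `ROW` and `convert` are hypothetical.

```python
import io
import struct

# Assumed layout: four little-endian 64-bit floats, 32 bytes per row.
ROW = struct.Struct("<4d")

def convert(lines, out):
    """Parse each text line and append one packed row to the binary stream."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        values = [float(field) for field in line.split(",")]
        out.write(ROW.pack(*values))
        count += 1
    return count

sample = ["3236.453232, 11123.456, 0.0, 224.332\n", "1.5, 2.5, 3.5, 4.5\n"]
buffer = io.BytesIO()  # stands in for a file opened with mode "wb"
rows = convert(sample, buffer)
```

With real files, `lines` can be an open text file and `out` a file opened in binary mode (`"wb"`). Binary mode matters on DOS and Windows, where text mode would translate newline bytes and corrupt the packed data.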

If you have access to the program that creates those "big" files, and
it's written in Perl, you just have to tweak the output part, since that
part decides the structure of the output file. If it's not written in
Perl, you have to create a compatible binary output format (which is not
that hard). However, I recommend, especially if your files are around 5
MB, to stick with ASCII. It's human readable :)
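
A "compatible" binary format only means that the reader and the writer agree on the layout. A small Python sketch, assuming a hypothetical layout of two little-endian doubles per record, that reads packed records back and turns them into text again:

```python
import struct

# Assumed layout shared by writer and reader: two little-endian 64-bit floats.
ROW = struct.Struct("<2d")

# Pretend these bytes came from a binary file written elsewhere.
data = ROW.pack(1.5, 2.5) + ROW.pack(3.5, 4.5)

# iter_unpack walks the buffer one fixed-size record at a time.
rows = list(ROW.iter_unpack(data))

# Back to a human-readable form.
text = "\n".join(", ".join(repr(value) for value in row) for row in rows)
print(text)
```

If the reader used a different template (for example `>2d`, big-endian), the same bytes would decode to different numbers, which is why the layout has to be written down somewhere.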
I agree with your point. Admittedly it was probably due to poor
programming. I have only been coding for 3 years now and only part
time at that. But I would be glad to send you the programs I was
working on and see if you can make them work. ;-)

No problem. I do such things professionally (i.e. for money ;-) ). It
might save you a lot of time and trouble.
Although, I did finally
get that covered with Perl so it is not much of a concern at the moment.


Agreed, see above. Thank you for pointing out all of the obvious
problems here. Perhaps you would be so kind as to make some
suggestions on how I could actually accomplish this now?

If handling 5 MB of data is a problem for your program, why is it a
problem?
 
