How can I parse binary files?

Joel VanderWerf

Fabio said:
This time I'm trying to write a binary file.

Q1: why does the structure MRKMessageFlags not get the appropriate
values?

The problem is, unfortunately, not something that can easily be fixed in
bit-struct's nested fields. When you do assignments like the following:
msg.flags.flagSeen = 1
msg.flags.flagAnswered = 0
msg.flags.flagFlagged = 1

you are operating on a *copy* of the flags structure. The reason is that
msg.flags returns a copy of that structure. It doesn't return a
reference to a subfield of msg. It might be possible to return an object
that delegates back to the msg structure, but I think that might
actually be more confusing than the way it is now.

Your best bet is, as Daniel suggested:

flags = msg.flags
flags.flagSeen = 1
flags.flagAnswered = 0
flags.flagFlagged = 1
msg.flags = flags
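The copy-versus-reference behavior can be reproduced without bit-struct at all. Here is a minimal plain-Ruby sketch, assuming a one-byte flags field. `Message` and `Flags` are hypothetical stand-ins and are not how bit-struct is implemented internally. The getter decodes a fresh object from the packed bytes, so writing to that object never touches the message until the setter packs it back.

```ruby
# Hypothetical stand-ins for a nested BitStruct; NOT bit-struct's internals.
class Flags
  attr_accessor :flagSeen, :flagAnswered, :flagFlagged

  # Decode three one-bit flags from a single byte.
  def initialize(byte = 0)
    @flagSeen     = byte & 1
    @flagAnswered = (byte >> 1) & 1
    @flagFlagged  = (byte >> 2) & 1
  end

  # Encode the three flags back into a single byte.
  def to_byte
    @flagSeen | (@flagAnswered << 1) | (@flagFlagged << 2)
  end
end

class Message
  def initialize
    @buf = "\x00".b   # the message's packed storage: one byte of flags
  end

  # Returns a NEW Flags decoded from the buffer: a copy, not a view into it.
  def flags
    Flags.new(@buf.getbyte(0))
  end

  # Packs a Flags object back into the buffer.
  def flags=(f)
    @buf.setbyte(0, f.to_byte)
  end
end

msg = Message.new
msg.flags.flagSeen = 1   # sets the bit on a temporary copy, then discards it
p msg.flags.flagSeen     # prints 0

flags = msg.flags        # copy out ...
flags.flagSeen = 1
flags.flagFlagged = 1
msg.flags = flags        # ... and write back
p msg.flags.flagSeen     # prints 1
p msg.flags.flagFlagged  # prints 1
```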

I will try to add a warning about this in the docs.
 
Joel VanderWerf

ChrisH said:
BTW, I noticed that all the fields in the BitStruct had endianness
specified.
Is there a way to set the endianness for the whole structure? Would you
have a structure with mixed endianness?

Good point about setting the default endianness for all fields in a
BitStruct. That's probably the more useful case than field-by-field
settings. I'll do that in the next release...
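A class-wide default of this kind can be sketched by letting each field fall back to a class-level setting when it has no byte order of its own. This is a minimal plain-Ruby illustration, not bit-struct's API: `TinyStruct`, `default_endian`, `unsigned`, and the `endian:` option are hypothetical names. It relies on the `<` and `>` byte-order modifiers that Ruby's `pack`/`unpack` accept on integer directives. The last field shows that mixed endianness remains possible.

```ruby
# Hypothetical mini-DSL, NOT bit-struct's API: a class-wide default byte
# order that an individual field can still override.
class TinyStruct
  SIZE_DIRECTIVES = { 16 => "S", 32 => "L" }.freeze   # unsigned ints
  ORDER_SUFFIXES  = { big: ">", little: "<", native: "" }.freeze

  def self.fields
    @fields ||= []
  end

  # Set (or read) the default byte order for every field in this class.
  def self.default_endian(order = nil)
    @default_endian = order if order
    @default_endian || :native
  end

  # Declare an unsigned field; endian: nil means "use the class default".
  def self.unsigned(name, bits, endian: nil)
    fields << [name, bits, endian]
  end

  # Build a pack/unpack template such as "S>L>L<".
  def self.template
    fields.map do |_name, bits, endian|
      SIZE_DIRECTIVES.fetch(bits) + ORDER_SUFFIXES.fetch(endian || default_endian)
    end.join
  end

  # Decode a String into a Hash of field name => value.
  def self.parse(bytes)
    fields.map(&:first).zip(bytes.unpack(template)).to_h
  end
end

class Header < TinyStruct
  default_endian :big
  unsigned :version, 16                   # big-endian, from the default
  unsigned :length,  32                   # big-endian, from the default
  unsigned :crc,     32, endian: :little  # per-field override: mixed order
end

p Header.template   # prints "S>L>L<"
data = [1, 10, 0xdeadbeef].pack(Header.template)
p Header.parse(data)
```

Keeping the per-field `endian:` option alongside the default means the common case needs one line per class, while a mixed-endian format stays expressible.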
 
Joel VanderWerf

Daniel said:
Actually, going through this exercise has pointed up some features
that I would like to add to BitStruct, since in many cases it was almost,
but not quite, exactly what the poster wanted. It would
be nice to have an easy, obvious, and supported way to define extra
padding in a structure (as we needed to here). It would be nice to
have an easier, supported syntax for reading a structure from a binary
file. Finally, it would be nice to allow an easy way to define data
wrappers, as was done with the date property.

1. Padding.

You can use "char" fields, but that puts extra junk in the inspect
output. I'll add an "ignore" or "pad" field type in the next release,
which will behave like char but will not define accessors and will not
show up in inspect. (Pad fields will have to show up in #to_s output, in
order to preserve alignment, of course.)
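For comparison, Ruby's own `Array#pack` and `String#unpack` already treat padding this way. The `x` directive writes NUL bytes on pack and skips bytes on unpack, without producing a value. Here is a minimal sketch, assuming a made-up 8-byte layout. Whether a bit-struct pad field would be built on `x` is an assumption; the thread does not say so.

```ruby
# Hypothetical 8-byte record layout, chosen only for illustration:
#   bytes 0-1  id      (16-bit big-endian unsigned, "n")
#   bytes 2-3  padding (written as NULs, skipped on read, "x2")
#   bytes 4-7  size    (32-bit big-endian unsigned, "N")
LAYOUT = "n x2 N"   # whitespace in pack templates is ignored

record = [7, 1024].pack(LAYOUT)
p record.bytesize   # prints 8
p record.bytes      # prints [0, 7, 0, 0, 0, 0, 4, 0]

# On read, the pad bytes are skipped whatever they contain, and no
# value is returned for them.
id, size = "\x00\x07\xFF\xFF\x00\x00\x04\x00".b.unpack(LAYOUT)
p [id, size]        # prints [7, 1024]
```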

2. Reading BitStructs from a file.

I'm not sure there's an easier way to do it than in your code, Daniel,
but I'm open to suggestions. All of Ruby's IO (including sockets) uses
Strings, so we always have to read a String and then construct a
BitStruct from that string using BitStruct.new.
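The two-step pattern described above (read a String, then build a structure from it) looks like this in plain Ruby. This is a sketch under stated assumptions. The 8-byte header layout, `HEADER_LENGTH`, and `read_header` are hypothetical, and `String#unpack` stands in for the `BitStruct.new(string)` step.

```ruby
require "stringio"

# Hypothetical 8-byte header: 16-bit id, 2 pad bytes, 32-bit size (big-endian).
HEADER_TEMPLATE = "n x2 N"
HEADER_LENGTH   = 8   # the caller has to know this number

def read_header(io)
  # Step 1: read a String. IO#read returns nil at EOF and fewer bytes
  # than requested if the input is truncated.
  bytes = io.read(HEADER_LENGTH)
  raise EOFError, "truncated header" if bytes.nil? || bytes.bytesize < HEADER_LENGTH
  # Step 2: build the structure from that String.
  bytes.unpack(HEADER_TEMPLATE)
end

io = StringIO.new([7, 1024].pack(HEADER_TEMPLATE) + "body")
id, size = read_header(io)
p [id, size]   # prints [7, 1024]
p io.read      # prints "body"
```

With a real file, the same function would be called as `File.open(path, "rb") { |f| read_header(f) }`, where `path` is whatever file is being parsed; opening in binary mode avoids newline and encoding conversions.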

3. Defining data wrappers.

Hm.... I've needed that too. I'll think about it.
 
Daniel Martin

Joel VanderWerf said:
2. Reading BitStructs from a file.

I'm not sure there's an easier way to do it than in your code, Daniel,
but I'm open to suggestions. All of Ruby's IO (including sockets) uses
Strings, so we always have to read a String and then construct a
BitStruct from that string using BitStruct.new.

Well, the main issue is that I thought it was a bit clumsy to have to
both ask for the byte length and do the reading. Now, I'm not sure
this can be done for a structure with a "rest" parameter, but for
anything else it should be possible for the user of bit-struct to do
the read and structure init in one step, say:
mrk_head = MRKHeader.read(f)
or
mrk_head = MRKHeader.new
mrk_head << f
Actually, those two steps should be combinable into something like:
mrk_head = MRKHeader.new << f

The point is that the user of bit-struct should be able to avoid
knowing the actual byte length if at all possible.
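The proposal can be sketched in plain Ruby by deriving the byte length from the field layout, so the caller never sees it. Everything here is hypothetical: `FixedRecord`, `layout`, `byte_length`, `read`, and `<<` illustrate the idea and are not part of bit-struct. Deriving the length by packing zeros only works for all-numeric templates, an assumption also noted in the code.

```ruby
require "stringio"

# Hypothetical base class (NOT part of bit-struct): a subclass declares a
# pack template plus field names, and the byte length is derived from them.
class FixedRecord
  def self.layout(template = nil, *names)
    if template
      @template = template
      @names    = names
    end
    @template
  end

  def self.names
    @names
  end

  # Derive the length by packing zeros through the template. This assumes
  # every field is numeric; string directives like "a4" would need more care.
  def self.byte_length
    @byte_length ||= Array.new(names.size, 0).pack(layout).bytesize
  end

  # One-step form: Klass.read(io)
  def self.read(io)
    record = new
    record << io
  end

  def initialize
    @values = {}
  end

  # Reads exactly byte_length bytes and decodes them. Returning self makes
  # `Klass.new << io` evaluate to the populated record.
  def <<(io)
    len   = self.class.byte_length
    bytes = io.read(len)
    raise EOFError, "short read" if bytes.nil? || bytes.bytesize < len
    @values = self.class.names.zip(bytes.unpack(self.class.layout)).to_h
    self
  end

  def [](name)
    @values.fetch(name)
  end
end

# Hypothetical header: 16-bit id, 2 pad bytes, 32-bit size (big-endian).
class DemoHeader < FixedRecord
  layout "n x2 N", :id, :size
end

io     = StringIO.new([7, 1024].pack("n x2 N") + [8, 2048].pack("n x2 N"))
first  = DemoHeader.read(io)       # like MRKHeader.read(f)
second = DemoHeader.new << io      # like MRKHeader.new << f
p DemoHeader.byte_length           # prints 8
p [first[:id], second[:size]]      # prints [7, 2048]
```

Having `<<` return `self` is what lets the two proposed spellings share one implementation.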

This might even be possible to do with something that has a rest
parameter, so long as you have a maximum size limitation on "rest",
and then just use however many bytes you get back in one read call
(reasonable behavior for sockets).
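For a structure with a trailing variable-length part, the idea of capping "rest" and accepting whatever one read returns can be sketched with `IO#readpartial`. That method returns up to the requested number of bytes without waiting for all of them. This sketch assumes a hypothetical 4-byte type code and a 512-byte cap. A robust socket reader would also have to handle a chunk shorter than the fixed header, which this sketch simply treats as an error.

```ruby
require "stringio"

# Hypothetical message layout: a 4-byte big-endian type code followed by a
# variable-length "rest" of at most MAX_REST bytes.
HEAD_LENGTH = 4
MAX_REST    = 512

# One readpartial call returns whatever is available, up to the cap.
# Raises EOFError at end of stream.
def read_message(io)
  chunk = io.readpartial(HEAD_LENGTH + MAX_REST)
  # Simplification: a real socket reader would keep reading here instead.
  raise EOFError, "short header" if chunk.bytesize < HEAD_LENGTH
  type = chunk.unpack1("N")
  rest = chunk.byteslice(HEAD_LENGTH..-1)
  [type, rest]
end

io = StringIO.new([42].pack("N") + "hello")
type, rest = read_message(io)
p type   # prints 42
p rest   # prints "hello"
```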

3. Defining data wrappers.

Hm.... I've needed that too. I'll think about it.

I can think of at least two ways:

One is for most built-in field types to have overridable data_in and
data_out methods that take a single argument and, by default, just
return it. Someone who needs, say, a date stored as
seconds-since-1970 could then just subclass BitStruct::UnsignedField
and override data_in and data_out.

The other is to add extra data_in and data_out options to all fields,
whose values would be Procs.

Actually, it's probably feasible to implement both.
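Both wrapper ideas can coexist in one field class. In this plain-Ruby sketch, `UnsignedField`, `EpochTimeField`, and the `data_in:`/`data_out:` keyword options are hypothetical names, not bit-struct's actual classes or options. Storage is assumed to be a 32-bit big-endian unsigned integer.

```ruby
# Hypothetical field classes; NOT bit-struct's API.
class UnsignedField
  # Approach 2: optional Procs passed as options.
  def initialize(data_in: nil, data_out: nil)
    @data_in_proc  = data_in
    @data_out_proc = data_out
  end

  # Approach 1: overridable hooks. By default they apply the optional
  # Proc if one was given, otherwise they return the argument unchanged.
  def data_in(value)    # user-facing value -> stored integer
    @data_in_proc ? @data_in_proc.call(value) : value
  end

  def data_out(raw)     # stored integer -> user-facing value
    @data_out_proc ? @data_out_proc.call(raw) : raw
  end

  # Storage is assumed to be a 32-bit big-endian unsigned integer.
  def pack(value)
    [data_in(value)].pack("N")
  end

  def unpack(bytes)
    data_out(bytes.unpack1("N"))
  end
end

# Approach 1: subclass and override the hooks.
class EpochTimeField < UnsignedField
  def data_in(time)
    time.to_i           # whole seconds since 1970-01-01 UTC
  end

  def data_out(seconds)
    Time.at(seconds).utc
  end
end

# Approach 2: supply Procs instead of subclassing.
year_since_1900 = UnsignedField.new(
  data_in:  ->(year) { year - 1900 },
  data_out: ->(raw)  { raw + 1900 }
)

date = EpochTimeField.new
t = Time.utc(2006, 1, 2, 3, 4, 5)
p(date.unpack(date.pack(t)) == t)                      # prints true
p year_since_1900.unpack(year_since_1900.pack(2006))   # prints 2006
```

Making the default hooks consult the Procs is one way to support both approaches without duplicating the packing code: a subclass override simply takes precedence.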
 
