Reading binary data - C-like bit field idiom

stevemflanagan

I have written up a technique for reading binary data using Java.
Once the fields are defined, the actual reading can be automated. The
key is the use of annotations to add information to the fields of the
class.

It emulates some of the C idiom of reading data directly into structs
with #pragma pack(1), which I miss so much in Java.

So, if you do a lot of reading of different binary data blobs, this
technique might be useful. It can certainly be the basis for something
much more elegant than a bunch of getInt's, getByte's, etc.
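
As a rough illustration of the idea (this is not the code from the linked
write-up; the @Bits annotation, the Header class and StructReader are all
hypothetical names made up for this sketch), an annotation can carry each
field's position and bit width, and a small reflective reader can then fill
the object from a byte array. It assumes most-significant-bit-first packing
and int fields only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical annotation for this sketch; not the one from the linked article.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Bits {
    int order();  // position of the field in the packed layout
    int width();  // width in bits, 1 to 32
}

// Example layout: 4 + 4 + 8 bits, packed most significant bit first.
class Header {
    @Bits(order = 0, width = 4) int version;
    @Bits(order = 1, width = 4) int flags;
    @Bits(order = 2, width = 8) int length;
}

class StructReader {
    // Creates an instance of type and fills each @Bits field from data.
    static <T> T read(Class<T> type, byte[] data) {
        List<Field> fields = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(Bits.class)) {
                fields.add(f);
            }
        }
        fields.sort(Comparator.comparingInt((Field f) -> f.getAnnotation(Bits.class).order()));
        try {
            T target = type.getDeclaredConstructor().newInstance();
            int bitPos = 0; // index of the next unread bit in data
            for (Field f : fields) {
                int width = f.getAnnotation(Bits.class).width();
                int value = 0;
                for (int i = 0; i < width; i++, bitPos++) {
                    // Pick bit number bitPos, counting from the MSB of data[0].
                    int bit = (data[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
                    value = (value << 1) | bit;
                }
                f.setAccessible(true);
                f.setInt(target, value);
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot populate " + type.getName(), e);
        }
    }
}
```

With this sketch, StructReader.read(Header.class, new byte[] {0x4A, 0x10})
yields version 4, flags 10 and length 16.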

So have a look:

http://codify.flansite.com/2009/05/c-struct-like-parsing-of-binary-data-with-java/

This would have certainly helped me on past projects...

Cheers!
Steve
 
Joshua Cranmer

As far as I know, with the exclusion of compression formats, most binary
protocols have field sizes that are multiples of octets, so the ability
to do bit-level reading is probably more complexity than it is normally
worth.

Another note is that the field ordering is significant, which is
probably not a terribly big deal, but it can open up incompatibility
between JVMs if the order in which reflective calls return fields
changes. It's also rather error-intolerant, relying on the fact that
all fields are part of the structure.
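
One way around both concerns, sketched here with made-up names (@Slot,
Packet, FieldOrder), is to give every wire field an explicit position and to
skip fields that carry no annotation. The Javadoc for
Class.getDeclaredFields() states that the returned array is not in any
particular order, so sorting on an explicit index avoids depending on the
JVM. This is only an assumption about how the technique could be hardened,
not something taken from the linked article.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical annotation: the explicit position of a field on the wire.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Slot {
    int value();
}

class Packet {
    @Slot(1) int length;   // declared first, but second on the wire
    @Slot(0) int type;
    String cachedName;     // no @Slot: not part of the wire layout, so it is skipped
}

class FieldOrder {
    // Returns the annotated fields sorted by slot, rejecting duplicate slots.
    static List<Field> wireFields(Class<?> type) {
        TreeMap<Integer, Field> bySlot = new TreeMap<>();
        for (Field f : type.getDeclaredFields()) {
            Slot slot = f.getAnnotation(Slot.class);
            if (slot == null) {
                continue; // helper field, ignore it
            }
            Field previous = bySlot.put(slot.value(), f);
            if (previous != null) {
                throw new IllegalStateException("slot " + slot.value() + " used by both "
                        + previous.getName() + " and " + f.getName());
            }
        }
        return new ArrayList<>(bySlot.values());
    }
}
```

FieldOrder.wireFields(Packet.class) returns the fields type and length, in
that order, regardless of declaration order.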

All-in-all, I'm not sure how useful it is, considering that reading (in
particular) many binary protocols requires turning some references from
numerical offsets or whatnot into references to higher-order objects. At
the very least, many have dynamic-length field members which also render
the code as presented difficult to use.
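
To make the dynamic-length problem concrete, here is a minimal sketch of a
length-prefixed layout. The layout itself (a one-byte count followed by
entries of a two-byte big-endian length plus UTF-8 bytes) and the class name
LengthPrefixed are assumptions for illustration, not taken from any specific
protocol. The size of each entry is only known after part of it has been
read, which a fixed, annotation-declared width cannot express.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LengthPrefixed {
    // Hypothetical layout: u8 count, then count entries of (u16 length, UTF-8 bytes).
    static List<String> readStrings(ByteBuffer in) {
        int count = in.get() & 0xFF;
        List<String> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int length = in.getShort() & 0xFFFF; // ByteBuffer is big-endian by default
            if (length > in.remaining()) {
                throw new IllegalArgumentException("truncated entry " + i);
            }
            byte[] raw = new byte[length];
            in.get(raw);
            out.add(new String(raw, StandardCharsets.UTF_8));
        }
        return out;
    }
}
```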

The last binary protocol I had time to play around with was the Java
class file specification, which I would consider typical of a "modern"
binary protocol. The internal structure is a bit messy, as a lot of data
points are indexes into a common pool, which is a pool of
variable-length members.

Oh yeah, and I would like to hurt whoever decided that Long and Double
constant elements were two entries long instead of one. They make
writing that code annoying.
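
For anyone who has not hit this, a minimal sketch of walking the constant
pool shows where the extra index bump comes in. The class and method names
(ConstantPoolScan, scanTags) are made up, and the buffer is assumed to be
positioned at constant_pool_count, i.e. just after the magic number and the
minor and major version fields. The tag values and entry sizes follow the
constant pool section of the JVM specification.

```java
import java.nio.ByteBuffer;

class ConstantPoolScan {
    // Returns the tag stored at each constant-pool index; 0 marks unusable slots
    // (index 0 and the slot following every Long or Double entry).
    static int[] scanTags(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;   // constant_pool_count
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) {     // valid indexes are 1 .. count - 1
            int tag = in.get() & 0xFF;
            tags[i] = tag;
            switch (tag) {
                case 1:                       // Utf8: u2 length, then that many bytes
                    skip(in, in.getShort() & 0xFFFF);
                    break;
                case 3: case 4:               // Integer, Float
                    skip(in, 4);
                    break;
                case 5: case 6:               // Long, Double: 8 bytes of data...
                    skip(in, 8);
                    i++;                      // ...and they occupy two pool indexes
                    break;
                case 7: case 8: case 16: case 19: case 20:
                    skip(in, 2);              // Class, String, MethodType, Module, Package
                    break;
                case 15:                      // MethodHandle
                    skip(in, 3);
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(in, 4);              // member refs, NameAndType, Dynamic, InvokeDynamic
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag + " at index " + i);
            }
        }
        return tags;
    }

    private static void skip(ByteBuffer in, int n) {
        in.position(in.position() + n);
    }
}
```

Forgetting the i++ for tags 5 and 6 shifts every later index by one, which
is exactly the kind of bug that makes this code annoying to write.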


On the bright side, that's another innovative use of annotations. Much
better than my use--as a way to set default options--which unexpectedly
involved much pain with a class loader that I would rather not have had
to experience.
 
stoodle

I agree that bit fields are rarely used. But we did run into this
very problem. I have always missed the C idiom for doing this; it was
just so simple.

There are certainly devils in these details. Annotations could be
written for very specific things such as particular IEEE float
formats, UTF-8 vs. other Unicode encodings, etc. Here we are dealing
with the bit fields found in the IS-801 handset protocol.
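
As a small sketch of what such a format-specific annotation might look like
(the @Text annotation, Label class and TextDecoder are hypothetical names,
and this handles only fixed-length text), the annotation can name the
character encoding so the decoder does not have to guess:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical annotation describing a fixed-length text field.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Text {
    int bytes();                       // length on the wire, in bytes
    String charset() default "UTF-8";  // any name accepted by Charset.forName
}

class Label {
    @Text(bytes = 4, charset = "UTF-16BE") String name;
}

class TextDecoder {
    // Reads the named @Text field of target from the buffer's current position.
    static void decode(Object target, String fieldName, ByteBuffer in) {
        try {
            Field f = target.getClass().getDeclaredField(fieldName);
            Text spec = f.getAnnotation(Text.class);
            if (spec == null) {
                throw new IllegalArgumentException(fieldName + " has no @Text annotation");
            }
            byte[] raw = new byte[spec.bytes()];
            in.get(raw);
            f.setAccessible(true);
            f.set(target, new String(raw, Charset.forName(spec.charset())));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot decode " + fieldName, e);
        }
    }
}
```

Decoding the four bytes 00 48 00 69 into a Label this way sets name to "Hi".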

I would not recommend using arcane bit-twiddled protocols. But on
something like a handset, where every single cycle is golden, there is
a standard you must comply with, and the job has to be done now, they
become a necessary evil. At least the server side can be written with
higher abstractions.

I do not recommend them, but I do admire them. There's no school like
the old school, but I agree: I'm glad most of my time is not spent
dealing with them anymore.

Thanks for the thoughts,
Steve
 
