Binary file: SAT

  • Thread starter Alessandro Barracco

7stud --

Alessandro Barracco wrote in post #994136:
Hi all. I've never worked with binary files before, and I'm a bit
confused...

Both numbers and characters are stored as integers in a file (or anywhere
on a computer). One method of storing characters in a file is with the
ASCII encoding. For instance, in the ASCII encoding 'a' is stored as
the integer 97, taking up one byte total. Note that you could also
store the integer 97 in 4 bytes; the other three bytes would just be all
0's.

You may also want to store the count of the number of banks in New York,
which is 97. You could also store that in one byte. So the question
becomes: how do you know whether a 97 you read from the file is supposed
to be the count of banks or the letter 'a'? The answer is: you have to
know how the data in the file is supposed to be interpreted.

If the first byte in a file is supposed to be a number, then you read
in the integer as is; and if the second byte in the file is supposed to
be a letter, then you need to convert the integer to a letter. In other
words, you have to know what each byte in the file is supposed to
represent.
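A quick way to see this in Ruby (a minimal sketch; the byte value 0x61 is simply 97, the ASCII code for 'a'): the same single byte comes back as a number or as a letter depending only on the unpack directive you choose.

```ruby
# One byte with the value 97 (0x61), the ASCII code for 'a'.
byte = "\x61"

# Read it as an unsigned 8-bit integer, e.g. a count of banks.
count = byte.unpack("C").first

# Read the very same byte as a 1-byte string, i.e. a letter.
letter = byte.unpack("a").first

# The number 97 stored in 4 bytes, big-endian: three zero bytes, then 0x61.
four_bytes = [97].pack("N")

p count             # => 97
p letter            # => "a"
p four_bytes.bytes  # => [0, 0, 0, 97]
```

Nothing in the byte itself says which reading is right; that knowledge has to come from the file's specification.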
 

7stud --

7stud -- wrote in post #994163:
Once you are familiar with what each byte in your file represents, you
can use String#unpack to tell Ruby how many bytes each integer occupies,
and how to interpret the integer.

But I can't get a simple unpack() example to work, so what do I know:

str = "\x00\x00\x00\x61" #97 in hex, taking up 4 bytes

results = str.unpack("L")
p results

--output:--
[1627389952]
 

Roger Braun

Hi,

2011/4/21 7stud -- said:
7stud -- wrote in post #994163:
Once you are familiar with what each byte in your file represents, you
can use String#unpack to tell ruby how many bytes each integer occupies,
and how to interpret the integer.

But, I can't get a simple unpack() example to work, so what do I know:

str = "\x00\x00\x00\x61"  #97 in hex, taking up 4 bytes

results = str.unpack("L")
p results

--output:--
[1627389952]

It's the correct result. L uses your system's native byte order, which
seems to be little-endian. If you force big-endian by using N instead of
L, you will get your expected 97.

ruby-1.9.2-p180 :008 > str = "\x61\x00\x00\x00"
=> "a\u0000\u0000\u0000"
ruby-1.9.2-p180 :009 > results = str.unpack("L")
=> [97]

ruby-1.9.2-p180 :011 > str = "\x00\x00\x00\x61"
=> "\u0000\u0000\u0000a"
ruby-1.9.2-p180 :012 > results = str.unpack("N")
=> [97]
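The same point can be made explicit with the byte-order-specific directives. A minimal sketch (the `<`/`>` modifiers on `L` are available in current Ruby versions; `N` and `V` work in older ones too):

```ruby
big    = "\x00\x00\x00\x61"   # 97 stored most-significant byte first
little = "\x61\x00\x00\x00"   # 97 stored least-significant byte first

# N = 32-bit unsigned, big-endian ("network" order)
# V = 32-bit unsigned, little-endian ("VAX" order)
p big.unpack("N")      # => [97]
p little.unpack("V")   # => [97]

# L with an explicit modifier: > forces big-endian, < forces little-endian
p big.unpack("L>")     # => [97]
p little.unpack("L<")  # => [97]

# Plain L uses the machine's native order, so this result depends on the
# CPU: [97] on a big-endian machine, [1627389952] on a little-endian one.
p big.unpack("L")
```

When a file format fixes its byte order, spelling it out in the template keeps the reader from silently depending on the machine it runs on.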


--
Roger Braun
rbraun.net | humoralpathologie.de
 

Alessandro Barracco

Thank you all. I'm beginning to understand a bit...

These are the first 20 lines of the binary block in the file:
------------------------------------------------------------------------

1
mogoo mih m o
1
_ll P/:1 [:,681 ^ 336>1<: ^ \VL ]*63;:- _nk ^ \VL mogqoo QK _mk H:; ^ /- mo mmeogemi monn
1
n fqfffffffffffffffj:rooh n:rono
1
,27:>;:- {rn rn _nm mnmqoqoqjgmm |
1
=0;& {m rn {rn {l {rn {rn |
1
-:9@)+r:&:r>++-6= {rn rn {rn {rn {n {k {j |
1
3*2/ {i rn {rn {rn {h {n |
1
:&:@-:961:2:1+ {rn rn _j 8-6; n _l +-6 n _k ,*-9 o _l >;5 o _k 8->; o _f /0,+<7:<4 o _k ,+03 oqoohilhjnjnfgklfljfh _k 1+03 lo _k ;,63 o _g 93>+1:,, o _h /6'>-:> o _k 72>' o _i 8-6;>- o _j 28-6; looo _j *8-6; o _j )8-6; o _no :1;@96:3;, |
1
):-+:'@+:2/3>+: {rn rn l o n g |
1
-:9@)+r:&:r>++-6= {rn rn {rn {rn {l {k {j |

-------------------------------------------------------------------------


It consists of pairs of lines: the first is a code (always 1), the
second is the data. I think the latter is written according to the
SAT format (well, the SAB format, since it's binary...).
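One detail worth checking before treating these lines as SAB: the sample above contains only printable ASCII characters, which is unusual for binary data. A guess that seems to fit the posted lines (it is not stated in this thread or in the ACIS text quoted here, so treat it as an assumption to test against the rest of the file) is that each group 1 value is SAT text in which every non-space character with code c was replaced by the character with code 159 - c, and a literal caret was written as "^ ". A minimal Ruby sketch of that guess (`decode_acis_line` is a made-up name):

```ruby
# ASSUMPTION (a guess from the sample, not from the DXF/ACIS docs quoted
# here): every non-space character c was stored as (159 - c), and a
# literal caret "^" was escaped as "^ " (caret followed by a space).
# Input is assumed to be printable ASCII; 159 - c maps "!".."~" onto
# itself in reverse order.
def decode_acis_line(line)
  line.gsub("^ ", "^")                               # undo the caret escape first
      .each_char
      .map { |c| c == " " ? c : (159 - c.ord).chr }  # spaces are left alone
      .join
end

puts decode_acis_line("mogoo mih m o")               # first data line of the sample
puts decode_acis_line("P/:1 [:,681 ^ 336>1<:")      # a piece of the second data line
puts decode_acis_line("3*2/ {i rn {rn {rn {h {n |")  # the seventh data line
```

On those sample lines this prints "20800 267 2 0", "Open Design Alliance" and "lump $6 -1 $-1 $-1 $7 $1 #", which look like SAT header and entity text. Whether the rule holds for every line of a real file still needs checking.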

ACIS supports two kinds of save files, SAT and SAB, which stand for
“Standard ACIS Text” and “Standard ACIS Binary”, respectively. Although
one is ASCII text and the other is binary data, the model data
information stored in the two formats is identical.

A SAB file has a .sab file extension. A SAB file uses delimiters
between elements and binary tags, without additional formatting.
The binary formats supported are:

int      4-byte 2s complement (as long)
long     4-byte 2s complement
double   8-byte IEEE
char     1-byte ASCII

where “byte” is eight bits, and files are considered to be byte strings.
For multi-byte data items, byte order normally just matches that of the
processor being used, but a specific order may be imposed by compiling
with the preprocessor macro BIG_ENDIAN or LITTLE_ENDIAN defined.
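The four primitive types in that table map directly onto unpack directives. A minimal sketch, assuming the file was written on a little-endian machine (the spec text only says the order normally matches the processor that wrote it; for big-endian data use "l>" and "G" instead). The sample values are made up and packed first so the round trip is self-contained:

```ruby
# Directives for the SAB primitive types, little-endian assumed:
#   int / long : 4-byte two's complement -> "l<" (signed 32-bit, little-endian)
#   double     : 8-byte IEEE             -> "E"  (double, little-endian)
#   char       : 1 byte                  -> "a"  (1-byte string)
TEMPLATE = "l< l< E a"   # whitespace in pack templates is ignored

# Hypothetical values: an int, a long, a double and a char.
sample = [-2, 1234567, 3.25, "x"].pack(TEMPLATE)

values = sample.unpack(TEMPLATE)
p values            # => [-2, 1234567, 3.25, "x"]
p sample.bytesize   # => 17 (4 + 4 + 8 + 1)
```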

--
Posted via http://www.ruby-forum.com/.
 

7stud --

Alessandro Barracco wrote in post #994230:
Thanx you all. I'm beginning to understand a bit....

These are the first 20 lines of the binary-block in the file:

Binary files aren't human-readable, i.e. they look like nonsense.
It consists of pairs of lines: the first is a code (always 1), the
second is the data.

Do not think of binary files as containing lines. A binary file is a
long continuous sequence of bytes. And you have to know exactly what
each byte means to read the data. For instance, you have to know that
the first 4 bytes are the count of banks in New York, the next byte
is a letter, the next 2 bytes are the year, the next 2 bytes are
the month, etc.
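With String#unpack, a layout like that becomes a template string. A minimal sketch using the hypothetical layout from the paragraph above (4-byte count, 1-byte letter, 2-byte year, 2-byte month), assuming big-endian integers; the sample bytes are built with Array#pack so the example runs on its own:

```ruby
# Hypothetical record: 4-byte count, 1-byte letter, 2-byte year, 2-byte month.
# N = 32-bit unsigned big-endian, a = 1-byte string, n = 16-bit unsigned big-endian.
RECORD_TEMPLATE = "N a n n"   # whitespace in pack templates is ignored

# Build 9 bytes of sample data.
record = [97, "C", 2011, 4].pack(RECORD_TEMPLATE)

banks, letter, year, month = record.unpack(RECORD_TEMPLATE)
puts "banks=#{banks} letter=#{letter} year=#{year} month=#{month}"
# prints: banks=97 letter=C year=2011 month=4
```

Getting one width or one directive wrong shifts every field after it, which is why the layout has to be known exactly.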
 

7stud --

Suppose your file contains this data:

"\x00\x00\x00\x01"

Scenario 1:
The four bytes could represent the number of widgets sold (=1).

Scenario 2:
Or the first two bytes could represent the number of widgets sold (=0),
the third byte the number of widgets in inventory (=0), and the
fourth byte the number of widgets in transit to the factory (=1).

So unless you know what each byte in the file is supposed to represent,
you cannot read the file correctly. If someone hands you the file with
the above data in it, and says, "Here's your data. Get cracking!", and
the person walks out the door, how would you know if Scenario 1 or
Scenario 2 is the way the data is laid out?
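The two scenarios differ only in the unpack template; Ruby decodes the same 4 bytes either way without complaint, which is exactly why the file's specification is needed. A minimal sketch, assuming big-endian integers:

```ruby
data = "\x00\x00\x00\x01"

# Scenario 1: one 4-byte big-endian count of widgets sold.
sold = data.unpack("N")

# Scenario 2: 2-byte sold, 1-byte inventory, 1-byte in transit.
sold2, inventory, in_transit = data.unpack("n C C")

p sold                              # => [1]
p [sold2, inventory, in_transit]    # => [0, 0, 1]
```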
 

Alessandro Barracco

Do not think of binary files as containing lines. A binary file is a
long continuous sequence of integers contained in a varying number of
bytes.

That's OK, but the file I need to parse is a special text file (DXF
format) that consists of pairs of lines: the 1st is a code that specifies
an object property (the colour of a line, the center of a circle, the
height of a text, etc.), the 2nd is the value associated with it.
Well, there is a special object, the 3dsolid, that has 4 or 5 pairs
like the above, and a long series of pairs that have the 1st line always 1
and the 2nd one as binary data.

Group code   Description
8            Layer name
70           Modeler format version number (currently = 1)
...          ...
1            Proprietary data (multiple lines < 255 characters each)
3            Additional lines of proprietary data (if previous group 1
             string is greater than 255 characters) (optional)

For example, the following draws a line, in the layer "Walls", from the
point (16.5, 12.5, 0.0) to (46.5, 12.5, 0.0).

0
LINE
8
Walls
10
16.5
20
12.5
30
0.0
11
46.5
21
12.5
31
0.0
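In the ASCII form of DXF, reading those pairs is a matter of taking the lines two at a time. A minimal sketch (`dxf_pairs` is a made-up name; it ignores everything else a real DXF file contains, such as sections and handles, and it strips surrounding whitespace from values, which may not be wanted for text values):

```ruby
# Turn ASCII DXF text into [group_code, value] pairs.
# Group codes become integers; values are left as strings.
def dxf_pairs(text)
  text.each_line.map(&:strip).each_slice(2).map do |code, value|
    [Integer(code, 10), value]
  end
end

line_entity = <<DXF
0
LINE
8
Walls
10
16.5
20
12.5
30
0.0
11
46.5
21
12.5
31
0.0
DXF

pairs = dxf_pairs(line_entity)
p pairs.first(2)                     # => [[0, "LINE"], [8, "Walls"]]
p pairs.assoc(10)[1].to_f            # => 16.5 (group 10 = start point X)
```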


My task is to "understand" the "3dsolid" object, which also has the
"Proprietary data", i.e. the binary data. Searching on Google, I found that
this data is written according to the ACIS *.sab standard (the link in the
first post), so I think I can read that binary data... can't I?
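Going by the group code table above (group 1 for each proprietary-data line, group 3 for extra lines when a group 1 string is longer than 255 characters), the proprietary lines of a 3dsolid can be collected before any ACIS decoding is attempted. A minimal sketch that works on [group_code, value] pairs; `acis_lines` is a made-up name, and treating a group 3 value as a continuation of the preceding group 1 line is an assumption based on that table, not something verified against the full DXF reference:

```ruby
# Collect the proprietary (ACIS) data lines from [group_code, value]
# pairs belonging to one 3dsolid entity.
# ASSUMPTION: a group 3 value continues the preceding group 1 line.
def acis_lines(pairs)
  pairs.each_with_object([]) do |(code, value), lines|
    case code
    when 1 then lines << value.dup              # start a new data line
    when 3 then lines.last << value if lines.last  # extend the current one
    end
  end
end

# Hypothetical entity data, just to show the grouping.
pairs = [[0, "3DSOLID"], [8, "0"], [70, "1"],
         [1, "first line"], [1, "second "], [3, "line, continued"]]
p acis_lines(pairs)   # => ["first line", "second line, continued"]
```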
 

7stud --

Alessandro Barracco wrote in post #994473:
That's OK. but the file I need to parse is a special txt file (DXF
format) that consist of couple-of-line:

Binary files do not have lines. Until you can understand that, you
cannot proceed. Binary files consist of blocks of bytes. Each block
contains some data. Each block consists of a different number of bytes.
 

William Rutiser

Alessandro Barracco wrote in post #994473:
Binary files do not have lines. Until you can understand that, you
cannot proceed. Binary files consist of blocks of bytes. Each block
contains some data. Each block consists of a different number of bytes.
It's not too helpful to someone trying to deal with DXF files to make such
a strong distinction between binary and text files. I haven't worked
with them and hope I never have to. A quick look at the Wikipedia
article and the most recent AutoCAD spec suggests that the files may be
best thought of as a mixture of binary and ASCII data. The original DXF
files were text files where each pair of lines was a key/value pair, with
the value generally a decimal representation of a floating-point number.
There is now an optional file format that contains binary
representations of the numbers, to reduce precision losses caused by
repeated conversions and to save some space. Most of the 270-page
specification appears to describe the ASCII format, with the binary
format introduced on page 242.


You can get a recent DXF spec at:
http://images.autodesk.com/adsk/files/autocad_2012_pdf_dxf-reference_enu.pdf

This may give a helpful overview:
http://en.wikipedia.org/wiki/Dxf

Alessandro's problem is to read and parse a file that contains small fields to be interpreted as ASCII text, binary integers, floating-point numbers, etc. Just what comes next is determined by what came just before, with reference to a 270-page document which has a few examples in Visual Basic 6.

I would proceed as follows:

* Figure out which kinds of primitive data are expected in the files of interest.

* For each kind, write and test a function to read and convert one such item.

* Write a function to read the next entity record from the file. It's likely that this function should return a Ruby object that represents the particular kind of entity.
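The first two steps might look like this for the SAB primitive types: one small method per kind of item, each reading exactly the bytes it needs from an IO. A minimal sketch; the class name is made up, and little-endian byte order is an assumption you would confirm with a hex dump first. For a real file, pass `File.open(path, "rb")` instead of the StringIO:

```ruby
require "stringio"

# Hypothetical reader for little-endian primitive values.
class PrimitiveReader
  def initialize(io)
    @io = io
  end

  # 4-byte two's-complement integer.
  def read_int
    read_bytes(4).unpack("l<").first
  end

  # 8-byte IEEE double.
  def read_double
    read_bytes(8).unpack("E").first
  end

  # 1-byte character.
  def read_char
    read_bytes(1)
  end

  private

  # Read exactly n bytes, or fail loudly instead of returning short data.
  def read_bytes(n)
    bytes = @io.read(n)
    raise EOFError, "wanted #{n} bytes" if bytes.nil? || bytes.bytesize < n
    bytes
  end
end

io = StringIO.new([7, 2.5, "#"].pack("l< E a"))
reader = PrimitiveReader.new(io)
p reader.read_int     # => 7
p reader.read_double  # => 2.5
p reader.read_char    # => "#"
```

Keeping each reader this small makes it easy to test them one at a time against known bytes before writing the entity-level code.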

The ACIS spec says "The header is followed by a sequence of entity records.
Each entity record consists of a sequence number (optional), an entity type identifier,
the entity data, and a terminator."

So to read an entity record, first read the sequence number if present, then read the type identifier. The type identifier should be used to select an appropriate function to read the data part of the entity record. Then read the terminator unless it was already used to end the entity data.
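For the text (SAT) form, that structure can be sketched by splitting on whitespace and reading tokens up to a "#" terminator. A minimal sketch under several assumptions that go beyond the quoted sentence: the terminator is the token "#", an optional sequence number looks like "-0", and "$n" tokens are references to other records. The dispatch table holds a single hypothetical reader, and the record in the example is made up:

```ruby
# A parsed entity record; fields not handled by a reader stay as raw tokens.
EntityRecord = Struct.new(:sequence, :type, :data)

# Hypothetical dispatch table: type identifier -> lambda that interprets
# the data tokens. ASSUMPTION: "$n" tokens are record references.
DATA_READERS = {
  "body" => ->(tokens) { tokens.map { |t| t.start_with?("$") ? t[1..-1].to_i : t } }
}

# Read one record from an array of tokens, consuming through the "#".
def read_entity_record(tokens)
  sequence = nil
  if tokens.first.to_s.match?(/\A-\d+\z/)   # optional sequence number, e.g. "-0"
    sequence = tokens.shift.to_i.abs
  end
  type = tokens.shift                       # entity type identifier
  data = []
  data << tokens.shift until tokens.empty? || tokens.first == "#"
  tokens.shift                              # consume the "#" terminator
  reader = DATA_READERS.fetch(type, ->(t) { t })  # unknown types: keep raw tokens
  EntityRecord.new(sequence, type, reader.call(data))
end

tokens = "-0 body $1 -1 $-1 $2 $-1 $-1 #".split   # hypothetical record
record = read_entity_record(tokens)
p record.sequence  # => 0
p record.type      # => "body"
p record.data      # => [1, "-1", -1, 2, -1, -1]
```

The real per-type data layouts would have to come from the ACIS documentation; the point here is only the read-type-then-dispatch shape.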


Essential tools:

Something to examine and print pieces of the data in hexadecimal. Use this to explore the
data and resolve questions about byte order, number encoding, etc.

Ruby's Array#pack and String#unpack methods.

Possibly an assortment of colored pencils to mark up printed hex dumps of the data.

There may be some Ruby tools specifically intended for this kind of work.
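For the first tool, a few lines of Ruby are enough to print bytes in the familiar offset / hex / ASCII layout (on Unix-like systems, `xxd` or `hexdump -C` do the same job). A minimal sketch; `hex_dump` is a made-up name and the file name in the comment is hypothetical:

```ruby
# Return hex dump lines: offset, up to `width` bytes in hex, printable ASCII.
def hex_dump(data, width = 16)
  lines = []
  data.bytes.each_slice(width).with_index do |chunk, i|
    hex   = chunk.map { |b| format("%02x", b) }.join(" ")
    ascii = chunk.map { |b| (32..126).cover?(b) ? b.chr : "." }.join
    lines << format("%08x  %-#{width * 3 - 1}s  %s", i * width, hex, ascii)
  end
  lines
end

puts hex_dump("\x00\x00\x00\x61" + "body $2 #")
# For a real file, e.g. the first 256 bytes:
#   puts hex_dump(File.binread("drawing.dxf", 256))
```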



Caveat:
I may have written more than I know about some of the details but I think the general ideas are correct.


-- Bill
 
