Another topic on how to read a binary file.

Knitter

Hi,
I know this has been asked a few zillion times but I couldn't find a
good answer for my problem.

I have a binary file, the Ant Movie Catalog database, if anyone knows
the software. It is a file where information about movies is stored.
The software was created in Delphi, so the binary file contains Pascal
strings and integers.

I know the file format; for example, I know that strings are stored
as a 4-byte integer representing the string length, followed by
the actual string. What I'm failing to understand is how to read the
file.

I've been using a BufferedInputStream created from a FileInputStream.
If I use the read(byte[]) method, which fills the passed array with up to
array.length bytes, how can I transform that array of bytes into the
integer that I need?

I'm creating a simple test application to learn how to read the
binary file. I'm starting with the header, which is represented as:

strFileHeader35 = ' AMC_3.5 Ant Movie Catalog 3.5.x www.buypin.com
www.antp.be ';
OwnerName: string;
OwnerSite: string;
OwnerMail: string;
OwnerDescription: string;

So I thought of reading the 4 bytes that tell me how long each string
is, converting the array with the 4 bytes into the needed integer, and
then reading the string into another array with the size of the
integer I have found.

I'm stuck on how to correctly read the file and how to convert the
bytes into integers.

Am I going the wrong way?

Thanks.
 
Arne Vajhøj

Knitter said:
I have a binary file, the Ant Movie Catalog database, if anyone knows
the software. It is a file where information about movies is stored.
The software was created in Delphi, so the binary file contains Pascal
strings and integers.

I know the file format; for example, I know that strings are stored
as a 4-byte integer representing the string length, followed by
the actual string. What I'm failing to understand is how to read the
file.

I've been using a BufferedInputStream created from a FileInputStream.
If I use the read(byte[]) method, which fills the passed array with up to
array.length bytes, how can I transform that array of bytes into the
integer that I need?

So I thought of reading the 4 bytes that tell me how long each string
is, converting the array with the 4 bytes into the needed integer, and
then reading the string into another array with the size of the
integer I have found.

I'm stuck with how to correctly read the file, how to convert the
bytes into integers.

If you wrap your stream in a BinaryReader you will get some
convenient methods.

Arne
 
Joshua Cranmer

Knitter said:
Hi,
I know this has been asked a few zillion times but I couldn't find a
good answer for my problem.

I have a binary file, the Ant Movie Catalog database, if anyone knows
the software. It is a file where information about movies is stored.
The software was created in Delphi, so the binary file contains Pascal
strings and integers.

I know the file format; for example, I know that strings are stored
as a 4-byte integer representing the string length, followed by
the actual string. What I'm failing to understand is how to read the
file.

Little-endian, big-endian, or what other composition of integer? There
is a big difference between the various representations that can
horribly break a program.

I've been using a BufferedInputStream created from a FileInputStream.
If I use the read(byte[]) method, which fills the passed array with up to
array.length bytes, how can I transform that array of bytes into the
integer that I need?

Assuming a big-endian format (where the integer 0x0a0b0c0d is stored as
the bytes 0x0a, 0x0b, 0x0c, 0x0d), then java.io.DataInput can be used to
parse the data.

If it is little-endian, or another less-standard format, then some magic
will have to be used:

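// in is assumed to be a DataInputStream; readFully reads exactly 4 bytes
// or throws, whereas read(byte[]) may return fewer.
byte[] temp = new byte[4];
in.readFully(temp);
// Assemble least significant byte first; & 0xff undoes sign extension.
return ((temp[0] & 0xff)) | ((temp[1] & 0xff) << 8) |
((temp[2] & 0xff) << 16) | ((temp[3] & 0xff) << 24);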
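
As a minimal sketch of the two cases above, the fragment can be wrapped in a method that takes any DataInput, next to the big-endian read that DataInputStream.readInt() already provides. The names EndianRead, readIntLE and decode are made up for illustration, and readFully is used so that a short read fails loudly rather than leaving part of the array unfilled.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of any library.
final class EndianRead {

    // Reads 4 bytes and assembles them least significant byte first.
    static int readIntLE(DataInput in) throws IOException {
        byte[] temp = new byte[4];
        in.readFully(temp); // throws EOFException if fewer than 4 bytes remain
        return (temp[0] & 0xff) | ((temp[1] & 0xff) << 8)
             | ((temp[2] & 0xff) << 16) | ((temp[3] & 0xff) << 24);
    }

    // Convenience for in-memory data: decode the first 4 bytes in either order.
    static int decode(byte[] data, boolean littleEndian) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return littleEndian ? readIntLE(in) : in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {0x0a, 0x0b, 0x0c, 0x0d};
        System.out.printf("big-endian:    0x%08x%n", decode(bytes, false)); // 0x0a0b0c0d
        System.out.printf("little-endian: 0x%08x%n", decode(bytes, true));  // 0x0d0c0b0a
    }
}
```

The same four bytes give two very different numbers, which is why the byte order has to be known before any length field can be trusted.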
 
Gordon Beaton

I've been using a BufferedInputStream created from a FileInputStream.
If I use the read(byte[]) method, which fills the passed array with up to
array.length bytes, how can I transform that array of bytes into the
integer that I need?

If you have an array of 4 bytes, you can convert it to an integer by
combining the 4 values in the right order (which depends on the byte
order in the file):

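// Each byte is masked with 0xff because Java bytes are signed.
int n = (arr[0] & 0xff) | ((arr[1] & 0xff) << 8)
      | ((arr[2] & 0xff) << 16) | ((arr[3] & 0xff) << 24);

or

int n = (arr[3] & 0xff) | ((arr[2] & 0xff) << 8)
      | ((arr[1] & 0xff) << 16) | ((arr[0] & 0xff) << 24);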

DataInputStream has methods to do this for you (if your data order is
big endian).

/gordon

--
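
One detail worth spelling out: Java's byte is signed, so any byte value of 0x80 or above is sign-extended when it is widened to int, and combining such bytes without masking gives the wrong result. The following sketch (class and method names are illustrative only) compares the two approaches for the value 128 stored little-endian.

```java
// Illustrative only: shows why each byte is masked with 0xff before combining.
final class SignExtensionDemo {

    // Little-endian combine WITHOUT masking: wrong for bytes >= 0x80.
    static int unmasked(byte[] arr) {
        return arr[0] + (arr[1] << 8) + (arr[2] << 16) + (arr[3] << 24);
    }

    // Little-endian combine with masking.
    static int masked(byte[] arr) {
        return (arr[0] & 0xff) | ((arr[1] & 0xff) << 8)
             | ((arr[2] & 0xff) << 16) | ((arr[3] & 0xff) << 24);
    }

    public static void main(String[] args) {
        byte[] arr = {(byte) 0x80, 0x00, 0x00, 0x00}; // 128 stored little-endian
        System.out.println("unmasked: " + unmasked(arr)); // prints "unmasked: -128"
        System.out.println("masked:   " + masked(arr));   // prints "masked:   128"
    }
}
```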
 
Knitter

Isn't BinaryReader C#? I can't find that class in the Java API; maybe
I'm not looking in the correct package.
I'll look more closely into it, thanks.
 
Knitter

Thanks for all the replies. My previous post should have been the
third reply, as I was replying to Arne's post.

"Little-endian, big-endian": that is something I'll really have to
learn. I have never had to understand the issues in integer
representation.
I know that the file was created with Delphi 7 for the 32-bit
Windows platform, does that help? :)

Thanks again, I believe I can manage now. That byte-to-int conversion
helped, but I'm seeing my lack of knowledge getting in the way.

Best regards,

Sergio
 
Arne Vajhøj

Knitter said:
Isn't BinaryReader C#? I can't find that class in the Java API; maybe
I'm not looking in the correct package.
I'll look more closely into it, thanks.

Sorry.

Yes.

DataInputStream in Java !

Arne
 
Knitter

This is what the help on the Ant Movie Catalog file format states:

All types are Pascal types(...)

"Each "string" field is preceded by an integer (4 bytes, signed) that
gives the size of the vector (size = 0 if no vector, i.e. empty
string). So strings are string (char = 1 byte, unsigned) without
ending delimiter."

If I try to read 4 bytes, using a byte array of size 4, or if I try
to read an integer using DataInputStream's readInt() method, I
get a number that can't represent the length of any string present in
the catalog: I get "541150531". I haven't tried to create a byte array
with that size, but I think that is not a correct length for a string;
I'll most likely end up with an out-of-memory error :)

I really can't see how to read the file... thanks, I'll go and think
about this a bit...
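
The quoted format description translates fairly directly into code. Below is a rough sketch, assuming the length prefix is little-endian (the documentation quoted above does not say) and that the 1-byte characters can be decoded as ISO-8859-1. Both are assumptions, as are the names PascalStrings, readString and readFirstString and the MAX_LENGTH sanity limit. It only applies to fields that really are length-prefixed.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for "<4-byte length><bytes>" string fields.
final class PascalStrings {

    // Arbitrary guard so a misread length fails fast instead of exhausting memory.
    static final int MAX_LENGTH = 1 << 20;

    static String readString(DataInputStream in) throws IOException {
        // readInt() is big-endian; reverseBytes turns it into a little-endian read.
        int length = Integer.reverseBytes(in.readInt());
        if (length < 0 || length > MAX_LENGTH) {
            throw new IOException("Implausible string length: " + length);
        }
        byte[] data = new byte[length];
        in.readFully(data); // a length of 0 yields the empty string
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Convenience for trying the reader on in-memory bytes.
    static String readFirstString(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            return readString(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = {5, 0, 0, 0, 'H', 'e', 'l', 'l', 'o'};
        System.out.println(readFirstString(sample)); // prints "Hello"
    }
}
```

With a guard like MAX_LENGTH, a value such as 541150531 is rejected immediately, which makes it easier to notice that the read position or the byte order is off.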
 
Arne Vajhøj

Arne said:
Sorry.

Yes.

DataInputStream in Java !

Note that Java DataInputStream uses network order (big
endian) while .NET BinaryReader uses Little Endian.

Arne
 
Arne Vajhøj

Knitter said:
If I try to read 4 bytes, using a byte array of size 4, or if I try
to read an integer using DataInputStream's readInt() method, I
get a number that can't represent the length of any string present in
the catalog: I get "541150531". I haven't tried to create a byte array
with that size, but I think that is not a correct length for a string;
I'll most likely end up with an out-of-memory error :)

541150531 is 0x20414D43, which is " AMC" in ASCII.

Arne
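
A quick way to check what such a suspicious value contains is to split it back into its four bytes and print them as characters. The sketch below does that for the number from the post; the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: view a 32-bit value as the 4 ASCII bytes it was read from.
final class IntAsText {

    // Splits the value most significant byte first, matching DataInputStream.readInt().
    static String asAscii(int value) {
        byte[] bytes = {
            (byte) (value >>> 24), (byte) (value >>> 16),
            (byte) (value >>> 8), (byte) value
        };
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(541150531)); // prints 20414d43
        System.out.println("[" + asAscii(541150531) + "]"); // prints [ AMC]
    }
}
```

The result matches the first four characters of the header constant quoted in the first post, which suggests the read started at the fixed header text rather than at a length field.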
 
Knitter

Note that Java DataInputStream uses network order (big
endian) while .NET BinaryReader uses Little Endian.

Arne

.Net? I'm not using any .Net type. I'm trying to read Delphi types,
from Delphi 7, not Delphi.Net. So I'm trying to read Pascal types in
Java :)
 
Arne Vajhøj

Knitter said:
.Net? I'm not using any .Net type. I'm trying to read Delphi types,
from Delphi 7, not Delphi.Net. So I'm trying to read Pascal types in
Java :)

I know.

I brought .NET in and just wanted to clarify that there is a
small difference between .NET BinaryReader and Java DataInputStream.

My guess is that Delphi would save in little endian, but the
docs should really say so.

Arne
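
If the data does turn out to be little-endian, java.nio.ByteBuffer is another standard option, since its byte order can be set explicitly. A minimal sketch, with illustrative names and an in-memory array standing in for bytes already read from the file:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: decode the same 4 bytes under both byte orders.
final class ByteOrderDemo {

    static int decode(byte[] data, ByteOrder order) {
        return ByteBuffer.wrap(data).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] data = {0x05, 0x00, 0x00, 0x00};
        System.out.println(decode(data, ByteOrder.LITTLE_ENDIAN)); // prints 5
        System.out.println(decode(data, ByteOrder.BIG_ENDIAN));    // prints 83886080
    }
}
```

A length that only looks sensible in one of the two orders is a strong hint about which order the file uses.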
 
Knitter

541150531 is 0x20414D43, which is " AMC" in ASCII.

That number should represent the length of the string, not the string.
Either way that string does not exist in the catalog.
The format is: <length as a 4 byte integer><string as 1 char sequence,
no ending delimiter>.
 
Arne Vajhøj

Knitter said:
That number should represent the length of the string, not the string.
Either way that string does not exist in the catalog.
The format is: <length as a 4 byte integer><string as 1 char sequence,
no ending delimiter>.

I think you must have misread. Unless the length is supposed
to be in the hundreds of MB range.

And I find it suspicious that it matches ASCII text.

Arne
 
Mark Space

Gordon said:
If you have an array of 4 bytes, you can convert it to an integer by
combining the 4 values in the right order (which depends on the byte
order in the file):

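// Each byte is masked with 0xff because Java bytes are signed.
int n = (arr[0] & 0xff) | ((arr[1] & 0xff) << 8)
      | ((arr[2] & 0xff) << 16) | ((arr[3] & 0xff) << 24);

or

int n = (arr[3] & 0xff) | ((arr[2] & 0xff) << 8)
      | ((arr[1] & 0xff) << 16) | ((arr[0] & 0xff) << 24);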

DataInputStream has methods to do this for you (if your data order is
big endian).

Does the static method Integer.reverseBytes(int) do the same thing? It
seems like it should, although I didn't check it. DataInputStream seems
like the better choice for this case, just wanted to point out there was
another method...
 
Arne Vajhøj

Mark said:
Gordon said:
If you have an array of 4 bytes, you can convert it to an integer by
combining the 4 values in the right order (which depends on the byte
order in the file):

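// Each byte is masked with 0xff because Java bytes are signed.
int n = (arr[0] & 0xff) | ((arr[1] & 0xff) << 8)
      | ((arr[2] & 0xff) << 16) | ((arr[3] & 0xff) << 24);

or

int n = (arr[3] & 0xff) | ((arr[2] & 0xff) << 8)
      | ((arr[1] & 0xff) << 16) | ((arr[0] & 0xff) << 24);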

DataInputStream has methods to do this for you (if your data order is
big endian).

Does the static method Integer.reverseBytes(int) do the same thing? It
seems like it should, although I didn't check it. DataInputStream seems
like the better choice for this case, just wanted to point out there was
another method...

big endian : DataInputStream readInt

little endian : DataInputStream readInt + Integer reverseBytes

would do nicely.

Arne
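
As a small sketch of the little-endian recipe (class and method names are made up for the example), here the bytes 0x0d 0x0c 0x0b 0x0a are read as a little-endian integer:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: little-endian read built from readInt() and reverseBytes().
final class ReverseBytesRead {

    static int readLittleEndianInt(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int bigEndian = in.readInt();           // bytes interpreted in network order
            return Integer.reverseBytes(bigEndian); // swapped to the little-endian value
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x0d, 0x0c, 0x0b, 0x0a};
        System.out.printf("0x%08x%n", readLittleEndianInt(data)); // prints 0x0a0b0c0d
    }
}
```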
 
Knitter

I might have misread it, but I copied and pasted the help on the
file format a few posts above. I'll post it here again:

"Each "string" field is preceded by an integer (4 bytes, signed) that
gives the size of the vector (size = 0 if no vector, i.e. empty
string). So strings are string (char = 1 byte, unsigned) without
ending delimiter."

The above text is what is written in the Ant Movie Catalog file format
help. I may be reading it wrong though.
 
Mark Space

Mark said:
Does the static method Integer.reverseBytes(int) do the same thing? It
seems like it should, although I didn't check it. DataInputStream seems
like the better choice for this case, just wanted to point out there was
another method...

Answer: yes it does, although my quick test seems to have been bit by a
silent cast conversion. To the language designers: dear sweet Jesus and
God in Heaven, why?

package integertest;
import java.lang.Integer;

public class Main
{
    public static void main(String[] args)
    {
        System.out.println("128 reversed is " + Integer.reverseBytes(128));
        System.out.println("-1 reversed is " + Integer.reverseBytes(-1));
        System.out.println("-129 reversed is " + Integer.reverseBytes(-129));
        System.out.println("256 reversed is " + Integer.reverseBytes(256));
    }
}

compile:
run:
128 reversed is -2147483648
-1 reversed is -1
-129 reversed is 2147483647
256 reversed is 65536
BUILD SUCCESSFUL (total time: 0 seconds)
 
Mark Space

Mark said:
To the language designers: dear sweet Jesus and
God in Heaven, why?
256 reversed is 65536


Oops, math error on my part, I thought 256 reversed would result in a
different bit pattern. Wailing and moaning at Sun can now cease. Move
along, nothing to see here...
 
Arne Vajhøj

Mark said:
Answer: yes it does, although my quick test seems to have been bit by a
silent cast conversion. To the language designers: dear sweet Jesus and
God in Heaven, why?

package integertest;
import java.lang.Integer;

public class Main
{
    public static void main(String[] args)
    {
        System.out.println("128 reversed is " + Integer.reverseBytes(128));
        System.out.println("-1 reversed is " + Integer.reverseBytes(-1));
        System.out.println("-129 reversed is " + Integer.reverseBytes(-129));
        System.out.println("256 reversed is " + Integer.reverseBytes(256));
    }
}

compile:
run:
128 reversed is -2147483648
-1 reversed is -1
-129 reversed is 2147483647
256 reversed is 65536
BUILD SUCCESSFUL (total time: 0 seconds)

Which one is puzzling you ?

Arne
 