Read binary data file


Windsor.Locks

I am a C++ programmer, working on a Java program. I need to read a
binary file using Java.

Here is how I read it in C++:

struct SOME_DATA
{
    unsigned long data1;
    unsigned short data2;
    unsigned short data3;
    unsigned long data4;
};

struct SOME_DATA someData;

and read using

fread(&someData, 12, 1, inputFile);


Please give me some pointers: how do I read this using Java? Thanks.
BTW, those are not the variable names I use in my program.
 

Joshua Cranmer

> I am a C++ programmer, working on a Java program. I need to read a
> binary file using Java.

InputStream is = new FileInputStream("file/name.txt");
byte[] data = new byte[12];
is.read(data);

That reads up to 12 bytes into data; read() returns the number of bytes it
actually got, which can be fewer. Alternatively, you can grab
byte-by-byte or use only part of the buffer. See the JavaDocs for
java.io.InputStream for more information.
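Because InputStream.read(byte[]) may return fewer bytes than requested, a fixed-size record is usually read with DataInputStream.readFully(), which either fills the whole array or throws EOFException. A minimal sketch, assuming one record is exactly 12 bytes; the class and method names are made up for illustration, and an in-memory stream stands in for the file:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ReadRecord {
    static final int RECORD_SIZE = 12; // assumed size of one record on disk

    // Reads exactly RECORD_SIZE bytes. Unlike read(byte[]), readFully()
    // keeps reading until the array is full, or throws EOFException if
    // the stream ends first.
    static byte[] readRecord(DataInputStream in) throws IOException {
        byte[] data = new byte[RECORD_SIZE];
        in.readFully(data);
        return data;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for new FileInputStream("file/name.txt")
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(new byte[RECORD_SIZE]));
        byte[] record = readRecord(in);
        System.out.println("read " + record.length + " bytes");
    }
}
```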
 

shakah


It's never a good idea portability-wise to write structs in binary
format (e.g. how do you deal with packing, different CPU
architectures, etc.?), but ignoring that for now you could naively do
something like the following. Note that this only works on big-endian
machines, and is probably unreliable there anyway.

jc@soyuz:~/tmp/binrw$ cat main.cpp
#include <stdio.h>

int main(int /*argc*/, char **argv) {
    struct SOME_DATA {
        unsigned long data1 ;
        unsigned short data2 ;
        unsigned short data3 ;
        unsigned long data4 ;
    } ;

    SOME_DATA someData = { 1, 2, 3, 4 } ;

    FILE *fh = fopen(argv[1], "wb") ;
    fwrite(&someData, sizeof(someData), 1, fh) ;
    fclose(fh) ;

    return 0 ;
}
jc@soyuz:~/tmp/binrw$ g++ -W -Wall -pedantic -o test main.cpp
jc@soyuz:~/tmp/binrw$ ./test test2.file
jc@soyuz:~/tmp/binrw$ cat test.java
public class test {
    public static void main(String [] args)
            throws java.io.IOException {
        java.io.DataInputStream dis
            = new java.io.DataInputStream(
                new java.io.FileInputStream(
                    new java.io.File(
                        args[0]
                    )
                )
            ) ;
        System.out.println("data1: " + dis.readInt()) ;
        System.out.println("data2: " + dis.readShort()) ;
        System.out.println("data3: " + dis.readShort()) ;
        System.out.println("data4: " + dis.readInt()) ;
    }
}
jc@soyuz:~/tmp/binrw$ javac test.java
jc@soyuz:~/tmp/binrw$ java -classpath . test test2.file
data1: 1
data2: 2
data3: 3
data4: 4

For reference, duplicating the above on an Intel box yields:
jc@jc-ubuntu:~/tmp/binrw$ java test test.file
data1: 16777216
data2: 512
data3: 768
data4: 67108864
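The Intel numbers are simply the intended values with their bytes reversed: 16777216 is 0x01000000, 512 is 0x0200, 768 is 0x0300 and 67108864 is 0x04000000. A small check using the standard Integer.reverseBytes and Short.reverseBytes, assuming (as the 4-byte readInt results suggest) that unsigned long was 4 bytes on that box; the class name is hypothetical:

```java
public class SwapDemo {
    public static void main(String[] args) {
        // Values printed by the big-endian Java reader on the Intel box
        int data1 = 16777216;  // 0x01000000
        short data2 = 512;     // 0x0200
        short data3 = 768;     // 0x0300
        int data4 = 67108864;  // 0x04000000

        // Reversing the byte order recovers what the C++ program wrote
        System.out.println(Integer.reverseBytes(data1));
        System.out.println(Short.reverseBytes(data2));
        System.out.println(Short.reverseBytes(data3));
        System.out.println(Integer.reverseBytes(data4));
    }
}
```

This prints 1, 2, 3 and 4, which is why a little-endian-aware reader is needed for files written on Intel.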
 

Mike Schilling

> fread(&someData, 12, 1, inputFile);

Java doesn't allow you to read into (or write from) a structure this way.
Say you create a Java class:

class SomeData
{
    long data1;
    short data2;
    short data3;
    long data4;
}

Unlike in C or C++, there's really no defined order for the fields, and thus
no way to issue one read that fills all of them. You need to read into each
one individually. See java.io.DataInputStream for how to do this.
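A sketch of the field-by-field approach. Assumptions: the 12-byte layout from the original post (4, 2, 2 and 4 bytes, no padding), stored big-endian, which is the only order DataInputStream reads. The 4-byte fields are therefore held in int rather than the 8-byte long of the class above, the static read() method and the demo class are illustrative only, and unsigned handling is left out:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class SomeData {
    int data1;   // 4-byte field in the assumed layout
    short data2; // 2 bytes
    short data3; // 2 bytes
    int data4;   // 4 bytes

    // Fills the object one field at a time, in file order.
    static SomeData read(DataInputStream in) throws IOException {
        SomeData d = new SomeData();
        d.data1 = in.readInt();
        d.data2 = in.readShort();
        d.data3 = in.readShort();
        d.data4 = in.readInt();
        return d;
    }
}

public class SomeDataDemo {
    public static void main(String[] args) throws IOException {
        byte[] record = {0, 0, 0, 1, 0, 2, 0, 3, 0, 0, 0, 4}; // big-endian 1, 2, 3, 4
        SomeData d = SomeData.read(new DataInputStream(new ByteArrayInputStream(record)));
        System.out.println(d.data1 + " " + d.data2 + " " + d.data3 + " " + d.data4);
    }
}
```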
 

Windsor.Locks

> It's never a good idea portability-wise to write structs in binary
> format (e.g. how do you deal with packing, different CPU
> architectures, etc.?), but ignoring that for now you could naively do
> something like the following. Note that this only works on big-endian
> machines, and is probably unreliable there anyway.


Thanks for your reply. I do not have any say in the file format or how
the file is written. My requirement is to read this file and get the data
out of it. There is nothing more I can do.
 

Hunter Gratzner

> Here is how I read it in C++:
>
> struct SOME_DATA
> {
>     unsigned long data1;
>     unsigned short data2;
>     unsigned short data3;
>     unsigned long data4;
> };
>
> struct SOME_DATA someData;
>
> and read using
>
> fread(&someData, 12, 1, inputFile);

This is already a stupid idea in C++, since there is no guarantee that
sizeof(SOME_DATA) == 12. Since this is a Java group I'd like to
recommend that you consult some C++ resource regarding struct
alignment and padding, data type size, and (network) byte order.
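To make the sizeof point concrete: with g++ on 64-bit Linux (x86-64, LP64), unsigned long is 8 bytes and must start on an 8-byte boundary, so the compiler inserts 4 bytes of padding after data3 and sizeof(SOME_DATA) becomes 24, not 12. A sketch of reading one record written by such a build, assuming exactly that layout (offsets 0, 8, 10 and 16, little-endian); the class name is hypothetical, and the 8-byte fields are held in signed longs, which is exact only for values below 2^63:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Lp64Record {
    static final int SIZE = 24; // sizeof(SOME_DATA) under the assumed x86-64 layout

    // Returns {data1, data2, data3, data4} decoded from one 24-byte record.
    static long[] decode(byte[] record) {
        ByteBuffer bb = ByteBuffer.wrap(record).order(ByteOrder.LITTLE_ENDIAN);
        long data1 = bb.getLong(0);                       // offset 0, 8 bytes
        int data2 = Short.toUnsignedInt(bb.getShort(8));  // offset 8, 2 bytes
        int data3 = Short.toUnsignedInt(bb.getShort(10)); // offset 10, 2 bytes
        // offsets 12..15 are compiler padding and are ignored
        long data4 = bb.getLong(16);                      // offset 16, 8 bytes
        return new long[] {data1, data2, data3, data4};
    }

    public static void main(String[] args) {
        // Lay out {1, 2, 3, 4} the way such a build would write it
        ByteBuffer bb = ByteBuffer.allocate(SIZE).order(ByteOrder.LITTLE_ENDIAN);
        bb.putLong(0, 1).putShort(8, (short) 2).putShort(10, (short) 3).putLong(16, 4);
        long[] v = decode(bb.array());
        System.out.println(v[0] + " " + v[1] + " " + v[2] + " " + v[3]);
    }
}
```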

In Java (assuming you have fixed your C++ problem), one would read this,
e.g., with a DataInputStream:

import java.io.DataInputStream;
import java.io.IOException;
import java.math.BigInteger;

/*
 * Read data using network byte-order, aka big-endian
 * byte-order (MSB first), and no padding/alignment
 * between the data.
 */
class Data {
    /*
     * Note, Java has no unsigned data types.
     * Therefore in this example I store the unsigned short
     * in a (signed) int, and the unsigned long in a BigInteger.
     * Typically, in a carefully designed application this
     * can be avoided, but I do it here to avoid discussion
     * of using signed data types to handle unsigned types.
     */
    private BigInteger data1; // data format: unsigned 32-bit long
    private int data2;        // data format: unsigned 16-bit short
    private int data3;        // data format: unsigned 16-bit short
    private BigInteger data4; // data format: unsigned 32-bit long

    public void read(DataInputStream in) throws IOException {
        byte ulong2big[] = new byte[5];
        ulong2big[0] = 0; // ensure MSB is always zero, so
                          // we get an unsigned interpretation
                          // of the following 4 byte data
                          // when converting the array to a
                          // BigInteger

        // Read four bytes and convert them to a BigInteger.
        // readFully() throws EOFException instead of silently
        // returning fewer bytes. In carefully designed
        // applications a data1 = in.readInt() would do.
        in.readFully(ulong2big, 1, 4);
        data1 = new BigInteger(ulong2big);

        // Read the unsigned short into an int
        data2 = in.readUnsignedShort();
        // in.skipBytes(...) in case padding needs to be skipped

        data3 = in.readUnsignedShort();
        // in.skipBytes(...) in case padding needs to be skipped

        // Read four bytes and convert them to a BigInteger.
        // In carefully designed applications a
        // data4 = in.readInt() would do.
        in.readFully(ulong2big, 1, 4);
        data4 = new BigInteger(ulong2big);
    }
}
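If BigInteger feels heavy, the usual alternative is to widen each unsigned field into the next larger signed Java type. A sketch under the same assumptions as the class above (big-endian, 32/16/16/32-bit fields, no padding); the class name is hypothetical, and Integer.toUnsignedLong needs Java 8 or later (readInt() & 0xFFFFFFFFL is the older equivalent):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class UnsignedFields {
    // Reads one record, widening each unsigned field so no value turns negative.
    static long[] read(DataInputStream in) throws IOException {
        long data1 = Integer.toUnsignedLong(in.readInt()); // same as readInt() & 0xFFFFFFFFL
        int data2 = in.readUnsignedShort();
        int data3 = in.readUnsignedShort();
        long data4 = Integer.toUnsignedLong(in.readInt());
        return new long[] {data1, data2, data3, data4};
    }

    public static void main(String[] args) throws IOException {
        // All-ones fields would come out as -1 with plain readInt()/readShort()
        byte[] rec = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
                      (byte) 0xFF, (byte) 0xFF, 0, 3, 0, 0, 0, 4};
        long[] v = read(new DataInputStream(new ByteArrayInputStream(rec)));
        System.out.println(v[0] + " " + v[1] + " " + v[2] + " " + v[3]);
    }
}
```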
 

Hunter Gratzner

> Thanks for your reply. I do not have any say in the file format or how
> the file is written. My requirement is to read this file and get the data
> out of it. There is nothing more I can do.

Then the one "defining" this data format has no fucking clue. C/C++
structs have no well-defined binary layout, except the order of
elements. C/C++ integer data types have no well-defined binary
representation and no well-defined size, except a minimum value range.
 

~kurt

Hunter Gratzner said:
> Then the one "defining" this data format has no fucking clue. C/C++
> structs have no well-defined binary layout, except the order of
> elements. C/C++ integer data types have no well-defined binary
> representation and no well-defined size, except a minimum value range.

And you are missing the point. There are many legacy systems out there
that make plenty of assumptions, and have been working just fine for longer
than Java has even existed.

Instead of telling us the one defining the data format has no clue (which
you are wrong about), why don't you explain your solution to writing a binary
file in C/C++, FORTRAN, or whatever, that will solve all the academic issues
you have just brought up.

Reading binary files is almost always tricky, especially when you move from
one platform, OS, or language to the next. There is no way to circumvent
this. It is the price you pay to have the data in a binary format. Java
does, at least, make it portable across platforms and OSs - but not languages.
If you are reading a binary file created outside of Java, then you are going
to need to create a custom reader for this data. What is really annoying is
when you don't even know the endian or the size of the values (16 bit,
32 bit?) and need to experiment to get it right.
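When the byte order and field size are unknown, the experimenting can at least be made systematic: decode the first bytes every plausible way and compare against a value you know should be there. A sketch that only tries 16- and 32-bit unsigned values in both byte orders; the class name is hypothetical and the sample must hold at least 4 bytes (in practice, the start of the real file):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class GuessLayout {
    // Describes the first bytes of the sample as 16- and 32-bit unsigned
    // values in both byte orders, one line per order.
    static String describe(byte[] sample) {
        StringBuilder sb = new StringBuilder();
        for (ByteOrder order : new ByteOrder[] {ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN}) {
            ByteBuffer bb = ByteBuffer.wrap(sample).order(order);
            sb.append(order)
              .append(": u16=").append(Short.toUnsignedInt(bb.getShort(0)))
              .append(" u32=").append(Integer.toUnsignedLong(bb.getInt(0)))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {1, 0, 0, 0}; // e.g. the first four bytes of the file
        System.out.print(describe(sample));
    }
}
```

For the sample above, the little-endian line shows 1 for both widths while the big-endian line shows 256 and 16777216, so if the first field is known to be 1, little-endian is the likely order.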

I've had to do this numerous times myself. The worst was for one application
that was written in a version of FORTRAN that would put an arbitrarily sized
(arbitrary as far as I could tell) header after each record that was written
(turning off this header was a compile-time option which, I seem to remember,
would make accessing the file less efficient, or something). I wanted to
read it directly into Matlab - not an easy task.

- Kurt
 

Windsor.Locks


Thank you to all who tried to help. I got it working, and in the
interest of future programmers here is how I did it.

Of course, this is my crappy program with crappy variable names etc.,
which I am going to rewrite. Also, the arr2long function is from here:

http://www.captain.at/howto-java-convert-binary-data.php



import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class Convert {

    public static void main(String [] args) {

        int crap = 0, doublecrap = 0, counter = 0;

        String file = "/opt/workspace/blahblah/binary.file";

        // try-with-resources closes the stream when done
        try (DataInputStream dis = new DataInputStream(new FileInputStream(file))) {

            int numberBytes = 4;
            byte data1[] = new byte[numberBytes];
            byte data2[] = new byte[2];
            byte data3[] = new byte[2];
            byte data4[] = new byte[numberBytes];

            while (true) {
                // readFully() fills the whole array or throws EOFException;
                // plain read() may silently return fewer bytes.
                try {
                    dis.readFully(data1);
                } catch (EOFException eof) {
                    break; // clean end of file between records
                }
                dis.readFully(data2);
                dis.readFully(data3);
                dis.readFully(data4);

                // The file is little-endian, so decode by hand
                long stuff = arr2long(data1, 0);
                long stuff1 = arr2long(data4, 0);
                System.out.println(stuff + " : " + stuff1);
                counter++;
            }
        }
        catch (IOException ioex) {
            ioex.printStackTrace(); // don't swallow errors silently
        }
        finally {
            System.out.println("number of records read : " + counter);
        }
    }

    // Converts 4 little-endian bytes starting at arr[start] into an
    // unsigned 32-bit value held in a long.
    public static long arr2long (byte[] arr, int start) {
        int i = 0;
        int len = 4;
        int cnt = 0;
        byte[] tmp = new byte[len];
        for (i = start; i < (start + len); i++) {
            tmp[cnt] = arr[i];
            cnt++;
        }
        long accum = 0;
        i = 0;
        for ( int shiftBy = 0; shiftBy < 32; shiftBy += 8 ) {
            accum |= ( (long)( tmp[i] & 0xff ) ) << shiftBy;
            i++;
        }
        return accum;
    }
}
 

Lew

> fread(&someData, 12, 1, inputFile);
>
> Please give me some pointers: how do I read this using Java? Thanks.
> ....
> public class Convert {
>
> public static void main(String [] args) {
>
> int crap = 0, doublecrap = 0, counter = 0;
> etc.
> }
> }

java.nio.ByteOrder will help you if you use the java.nio package as Roedy
suggested.
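A sketch of that java.nio route: once a ByteBuffer is set to ByteOrder.LITTLE_ENDIAN, getInt() and getShort() decode Intel byte order directly, so no hand-written arr2long is needed. Assumptions: 12-byte records with no padding (32/16/16/32-bit unsigned fields), and a file small enough to load whole with Files.readAllBytes; the class name is hypothetical and a trailing partial record is ignored:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class NioConvert {
    static final int RECORD_SIZE = 12; // assumed: 4 + 2 + 2 + 4 bytes, no padding

    // Decodes every complete record in the buffer as little-endian
    // {data1, data2, data3, data4}.
    static long[][] decode(ByteBuffer bb) {
        bb.order(ByteOrder.LITTLE_ENDIAN);
        int n = bb.remaining() / RECORD_SIZE;
        long[][] out = new long[n][];
        for (int r = 0; r < n; r++) {
            long data1 = Integer.toUnsignedLong(bb.getInt());
            int data2 = Short.toUnsignedInt(bb.getShort());
            int data3 = Short.toUnsignedInt(bb.getShort());
            long data4 = Integer.toUnsignedLong(bb.getInt());
            out[r] = new long[] {data1, data2, data3, data4};
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0])); // whole file in memory
        long[][] records = decode(ByteBuffer.wrap(bytes));
        for (long[] rec : records) {
            System.out.println(rec[0] + " : " + rec[3]); // data1 : data4
        }
        System.out.println("number of records read : " + records.length);
    }
}
```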

Please do not embed TABs in Usenet posts; it really fubars the alignment.
 

Roedy Green

> Thank you to all who tried to help. I got it working, and in the
> interest of future programmers here is how I did it.

You are trying to read little-endian data. It is a lot easier with
LEDataStream.

float f = dis.readFloat();
double d = dis.readDouble();
int i = dis.readInt();
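LEDataStream is a separate library whose exact API is not reproduced here. To show the idea behind it, a hypothetical minimal stand-in (the class name LittleEndianReader is invented): it wraps a DataInputStream, which always reads big-endian, and swaps the bytes of each value with the standard reverseBytes methods:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical minimal little-endian reader; not the real LEDataStream.
public class LittleEndianReader {
    private final DataInputStream in;

    public LittleEndianReader(InputStream in) {
        this.in = new DataInputStream(in);
    }

    // Read big-endian, then reverse the byte order.
    public int readInt() throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    public short readShort() throws IOException {
        return Short.reverseBytes(in.readShort());
    }

    public float readFloat() throws IOException {
        return Float.intBitsToFloat(readInt()); // reuses the swapped int
    }

    public double readDouble() throws IOException {
        return Double.longBitsToDouble(Long.reverseBytes(in.readLong()));
    }

    public static void main(String[] args) throws IOException {
        byte[] rec = {1, 0, 0, 0, 2, 0, 3, 0, 4, 0, 0, 0}; // little-endian 1, 2, 3, 4
        LittleEndianReader le = new LittleEndianReader(new ByteArrayInputStream(rec));
        System.out.println(le.readInt() + " " + le.readShort() + " " + le.readShort() + " " + le.readInt());
    }
}
```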
 

Windsor.Locks

> You are trying to read little-endian data. It is a lot easier with
> LEDataStream.
>
> float f = dis.readFloat();
> double d = dis.readDouble();
> int i = dis.readInt();

Well, that actually does not work. See the reply above by "shakah".
 

Hunter Gratzner

> And you are missing the point.

No, I don't. A C struct is not a suitable, unambiguous format
specification, binary or otherwise. That's the whole point. Giving
someone just a C struct and telling him to implement it in Java is a
pointless stupid act. It indicates that the one giving this file
format "definition" has no fucking clue what he is doing.

> Instead of telling us the one defining the data format has no clue (which
> you are wrong about), why don't you explain your solution to writing a binary
> file in C/C++, FORTRAN, or whatever, that will solve all the academic issues
> you have just brought up.

I did that previously in this same thread, but you are apparently
more interested in picking a fight.

> Reading binary files is almost always tricky, especially when you move from
> one platform, OS, or language to the next. There is no way to circumvent
> this.

Sure there is: by having an unambiguous format specification. A C struct
is not an unambiguous format specification.

> It is the price you pay to have the data in a binary format.

No, it is the price to pay when some fuckwit thinks that writing C
structs 1:1 to memory is a good idea.

There is no difference between a binary and a text format if you need
to move between platforms. Either the format is unambiguously defined,
and then it's a straightforward job to implement it, or it isn't.

> What is really annoying is
> when you don't even know the endian or the size of the values (16 bit,
> 32 bit?) and need to experiment to get it right.

And why do you then think a C struct is a good definition of a binary
format?
 

Mike Schilling

Hunter Gratzner said:
> C/C++ integer data types have no well-defined binary
> representation and no well-defined size, except a minimum value range.

And the presence or absence of between-field padding isn't always
guaranteed. Still, if the files don't have to be cross-platform, reading
and writing structs will work just fine. Note: the *application* can be
portable across platforms, so long as the (for example) Solaris/Sparc
version won't have to read files written by the Windows/Intel version.
 

~kurt

Hunter Gratzner said:
> No, I don't. A C struct is not a suitable, unambiguous format
> specification, binary or otherwise. That's the whole point. Giving
> someone just a C struct and telling him to implement it in Java is a
> pointless stupid act. It indicates that the one giving this file
> format "definition" has no fucking clue what he is doing.

It is hardly pointless. Most of the time, there is no format specification
because binary data is often not written with the intention of being used
outside of the application that writes it. Only later does an outside user
need the data, and then one often has to reverse engineer a
solution. A C struct at least gives you an idea of what type of data is in
the file. Knowing what platform it was written on helps even more.

> I did that previously in this same thread, but you are apparently
> more interested in picking a fight.

I'm put off by your attitude that what the OP has to work with is due to
someone who has no clue. If you are saying a C structure makes a bad
ICD (interface control document), then I agree with you. But binary files
are often not written with portability in mind, and the implementation
details exist only in the code that reads/writes the data. There is nothing
wrong with that when the original intent of the data was for internal use
only - and that is often the case. Then, seeing how the data is read into a
C structure is invaluable.

The solution I saw you post was an example of how to read the data. I didn't
see anything but bitching regarding the data source.

> No, it is the price to pay when some fuckwit thinks that writing C
> structs 1:1 to memory is a good idea.

It is often the only reasonable idea, depending on the original intent of
the data. Like I said, I didn't see a better solution posted by you
on how to do this. Creating unnecessary ICDs is a bad thing.

> And why do you then think a C struct is a good definition of a binary
> format?

It works as well as anything else for many uses. If you write a specification
describing how many bytes a number is supposed to take up, and the endianness,
and the data is only to be used internally, then you are creating extra work
for yourself when you port the code to other platforms (of course, you want to
use sizeof when reading in the structure instead of hard-coding the size).

- Kurt
 

Charles


Dear Friends (when did you guys become my friends?)

Let's review what the OP stated

A struct is given in C++

Data needs to be read from a file in Java.

You have the following data types

unsigned long
unsigned short

As previously stated by other posters, the endianness of the platform
should affect how the output file is encoded. I assume this to be
true but have not verified it.

We assume all unsigned longs and unsigned shorts will ALWAYS have the
same byte size.

The complete struct is given as

unsigned long data1;
unsigned short data2;
unsigned short data3;
unsigned long data4;

Can we also assume that the data will always be sequenced as described
in the STRUCT?
I don't see any argument why the data will be out of sequence as
defined in the STRUCT.

Does the input file get modified when it is transported from one
operating system to another?
I assume NO. This is not verified.

Are there equivalents of unsigned long and unsigned short in Java?
Are they the same byte size?
Do they encode the data the same?
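Some of these questions have fixed answers on the Java side: the language defines short as 16 bits, int as 32 and long as 64 on every platform, it has no unsigned integer types apart from the 16-bit char, and DataInputStream always reads big-endian. So a 16-bit unsigned short fits in an int and a 32-bit unsigned long fits in a long, provided you know how many bytes the C compiler actually used. A trivial check (the class name is made up):

```java
public class JavaTypeSizes {
    public static void main(String[] args) {
        // The Java language fixes these sizes on every platform.
        System.out.println("short: " + Short.SIZE + " bits");
        System.out.println("char:  " + Character.SIZE + " bits (unsigned)");
        System.out.println("int:   " + Integer.SIZE + " bits");
        System.out.println("long:  " + Long.SIZE + " bits");
    }
}
```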

Try to read in Java and verify with known data. If you don't know any
of the data values this becomes a harder task.
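Verifying against known data can be automated: decode a record whose contents you know and see which byte order reproduces them. A sketch, assuming the 12-byte layout with no padding (32/16/16/32-bit unsigned fields); the class and method names are hypothetical. Pick known values whose bytes are not symmetric (all zeros would match both orders):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class VerifyKnownRecord {
    // Returns the byte order under which the record decodes to the expected
    // values, or null if neither order matches.
    static ByteOrder matchingOrder(byte[] record, long d1, int d2, int d3, long d4) {
        for (ByteOrder order : new ByteOrder[] {ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN}) {
            ByteBuffer bb = ByteBuffer.wrap(record).order(order);
            if (Integer.toUnsignedLong(bb.getInt(0)) == d1
                    && Short.toUnsignedInt(bb.getShort(4)) == d2
                    && Short.toUnsignedInt(bb.getShort(6)) == d3
                    && Integer.toUnsignedLong(bb.getInt(8)) == d4) {
                return order;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // What a 32-bit Intel build of shakah's test program would write for {1, 2, 3, 4}
        byte[] record = {1, 0, 0, 0, 2, 0, 3, 0, 4, 0, 0, 0};
        System.out.println(matchingOrder(record, 1, 2, 3, 4));
    }
}
```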
 
