How to read binary data

Dominik G

Hi,

I have to use Java to read a binary file. This looks easy (yes, I've
read the tutorial on Sun's website). The problem is that the files
were created by a C program. I don't know the exact file format, but I
have the C source code.

The first entry is grabbed like that:

fread(query_length, sizeof(long), 1, file)

Since query_length is a pointer to something of type long (four-byte
signed integer), I expect that this in Java should be:
int seqLen = myDataInputStream.readInt();

.... but unfortunately it does not work. I am getting 172777843 instead
of 56.

Does anybody have an idea what I am doing wrong? Maybe some ideas for
reverse engineering? As I said, I don't know the file format, but I
have a working C code (including source) that I may use to encode my
own data.

Best regards,
James
 
Eric Sosman

Dominik said:
Hi,

I have to use Java to read a binary file. This looks easy (yes, I've
read the tutorial on Sun's website). The problem is that the files
were created by a C program. I don't know the exact file format, but I
have the C source code.

The first entry is grabbed like that:

fread(query_length, sizeof(long), 1, file)

Since query_length is a pointer to something of type long (four-byte
signed integer), I expect that this in Java should be:
int seqLen = myDataInputStream.readInt();

... but unfortunately it does not work. I am getting 172777843 instead
of 56.

Does anybody have an idea what I am doing wrong? Maybe some ideas for
reverse engineering? As I said, I don't know the file format, but I
have a working C code (including source) that I may use to encode my
own data.

A mere endianness mismatch doesn't explain it: 56=38(16)
would yield 38000000(16)=939524096, not 172777843 on a "little-
endian"/"big-endian" clash. Even a "middle-endian" 00380000(16)
=3670016 doesn't fit. So I think you'll need to show us a short,
complete, program and not merely a one-line fragment.
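The arithmetic above can be checked with a few lines of Java. This is only a sketch for illustration: the class name is made up, and it assumes nothing about the actual file beyond the expected value 56. `Integer.reverseBytes` is the standard-library method that swaps the four bytes of an int.

```java
class EndianArithmetic {
    public static void main(String[] args) {
        int expected = 56; // 0x00000038

        // Full byte swap: what a big-endian reader would see if the
        // file held 56 in little-endian order.
        System.out.println(Integer.reverseBytes(expected)); // 939524096

        // The "middle-endian" arrangement 0x00380000.
        System.out.println(0x00380000); // 3670016
    }
}
```

Neither result is 172777843, which is why a plain byte-order clash does not account for the reported value.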
 
Joshua Cranmer

Dominik said:
The first entry is grabbed like that:

fread(query_length, sizeof(long), 1, file)

Since query_length is a pointer to something of type long (four-byte
signed integer), I expect that this in Java should be:
int seqLen = myDataInputStream.readInt();

... but unfortunately it does not work. I am getting 172777843 instead
of 56.

The first thing to look at is endianness issues. It is also possible
that you're missing some offset into the file.
 
Roedy Green

I have to use Java to read a binary file. This looks easy (yes, I've
read the tutorial on Sun's website). The problem is that the files
were created by a C program. I don't know the exact file format, but I
have the C source code.

There are two tools: DataInputStream for big-endian data and
LEDataInputStream for little-endian data.

http://mindprod.com/jgloss/endian.html

http://mindprod.com/products1.html#LEDATASTREAM

Use the File I/O amanuensis to generate sample code for either
approach:

http://mindprod.com/applet/quoter.html

--
Roedy Green Canadian Mind Products
http://mindprod.com

"Deer hunting would be fine sport, if only the deer had guns."
~ William S. Gilbert of Gilbert and Sullivan
 
Roedy Green

I have to use Java to read a binary file. This looks easy (yes, I've
read the tutorial on Sun's website). The problem is that the files
were created by a C program. I don't know the exact file format, but I
have the C source code.

Examine the file with a hex viewer to make sure it has the format you
think it does. Sometimes you might find short fields, pad fields, etc.
in a C struct.
 
Dominik G

So I think you'll need to show us a short,
OK, so I've pasted a Perl script below, since it is shorter and simpler
than the relevant C source. It uses Perl's unpack() function and looks
extremely simple. Here is an example input file:

http://www.mediafire.com/?tckevmw2oiv

(hopefully it works). The first two lines of the parsed result should
be:
Data length: 56
0.0526502029133332 0.0109779760611708 0.0173353929211734
0.0293806562130121 0.0361268426980054 0.0272151105379117
0.0119821633792096 0.0708810218509405 0.0335045170115268
0.167782421275881 0.202791113265232 0.0198617711540251
0.0205737644831438 0.0343759949939553 0.0297890505981974
0.0397544225699747 0.0432511571243995 0.0762415705584173
0.00757165579280845 0.0212506125299986
(the second line is a row of 20 doubles)

Thanks a lot for your help!
James

#!/usr/bin/perl

$DEBUG=1;

sub parse_file
{
    my $filename = "file.dat";
    my $buf;
    my $seqlen;
    my $seqstr;
    my $i;
    my $j;
    my @mapping = (0,4,3,6,13,7,8,9,11,10,12,2,14,5,1,15,16,19,17,18);
    my @w;
    my @output;

    open(INPUT, $filename) || die ("Couldn't open $filename for reading.\n");

    read(INPUT, $buf, 4) || die ("Couldn't read $filename!\n");
    $seqlen = unpack("i", $buf);

    (!$DEBUG) || print "Data length: $seqlen\n";

    read(INPUT, $buf, $seqlen) || die ("Premature end: $filename.\n");
    $seqstr = unpack("a$seqlen", $buf);

    for ($i = 0; $i < $seqlen; ++$i) {
        read(INPUT, $buf, 160) || die("Premature end: $filename, line: $i\n");
        @w = unpack("d20", $buf);

        for ($j = 0; $j < 20; ++$j) {
            $output[$i][$j] = $w[$mapping[$j]];
            (!$DEBUG) || print $output[$i][$j]," ";
        }
        (!$DEBUG) || print "\n";
    }

    return @output;
}

parse_file;
 
Joshua Cranmer

Dominik said:
OK, so I've pasted a Perl script below, since it is shorter and simpler
than the relevant C source. It uses Perl's unpack() function and looks
extremely simple. Here is an example input file:

Looking at the file in HD, the aforementioned number never shows up. The
most likely scenario is that you're not reading what you think you're
reading.

Oh, and DataInputStream won't help you here, as the endianness is
incorrect (the first 4 bytes are 0x38000000, which is how
DataInputStream would read it, instead of the correct 0x38).
 
Dominik G

OK, so now I do this:
for(int i = 0;i < 10;i++) {
ch = is.readByte();
System.out.print(ch + " ");
}
and the result is:
56 0 0 0 77 84 89 75 76 73 76 78 71 75 84 76 75 71 69 84
In particular my first four bytes contain the 56 I am looking for. But
what is the "official" way to get that value (with no hacking)? I may
hack on this first integer, i.e. to read bytes and convert them to int
by hand in such a way that gives me a proper result, but I am afraid
this doesn't work in the case of doubles which are later in the file.

Best,
James
 
Lew

Peter said:
AFAIK, Java has no built-in support for endian conversion,

<http://java.sun.com/javase/6/docs/api/java/nio/ByteOrder.html>
used by ByteBuffer, whose documentation says:
Primitive values are translated to (or from) sequences of bytes according to
the buffer's current byte order, which may be retrieved and modified via the
order methods. Specific byte orders are represented by instances of the
ByteOrder class. The initial order of a byte buffer is always BIG_ENDIAN.

See in particular
<http://java.sun.com/javase/6/docs/api/java/nio/ByteBuffer.html#order()>
<http://java.sun.com/javase/6/docs/api/java/nio/ByteBuffer.html#order(java.nio.ByteOrder)>
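
As a minimal sketch of what those order methods do, the following wraps the four bytes reported earlier in the thread (56 0 0 0) in a ByteBuffer and reads them with each byte order. The class and method names are made up for illustration; only `ByteBuffer.wrap`, `order` and `getInt` come from the standard library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    /** Interprets the first four bytes of raw as a little-endian int. */
    static int littleEndianInt(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] firstFour = {56, 0, 0, 0};

        // With the order switched to LITTLE_ENDIAN the expected length appears.
        System.out.println(littleEndianInt(firstFour)); // 56

        // A freshly wrapped buffer is BIG_ENDIAN, matching DataInputStream.
        System.out.println(ByteBuffer.wrap(firstFour).getInt()); // 939524096
    }
}
```

The same buffer offers `getDouble()`, so the doubles later in the file can be read the same way once the order is set.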
 
Knute Johnson

Dominik said:
OK, so now I do this:
for(int i = 0;i < 10;i++) {
ch = is.readByte();
System.out.print(ch + " ");
}
and the result is:
56 0 0 0 77 84 89 75 76 73 76 78 71 75 84 76 75 71 69 84
In particular my first four bytes contain the 56 I am looking for. But
what is the "official" way to get that value (with no hacking)? I may
hack on this first integer, i.e. to read bytes and convert them to int
by hand in such a way that gives me a proper result, but I am afraid
this doesn't work in the case of doubles which are later in the file.

Best,
James

Is it BCD? Does 56 01 00 00 represent 156?
 
Roedy Green

0.0526502029133332 0.0109779760611708 0.0173353929211734
0.0293806562130121 0.0361268426980054 0.0272151105379117
0.0119821633792096 0.0708810218509405 0.0335045170115268
0.167782421275881 0.202791113265232 0.0198617711540251
0.0205737644831438 0.0343759949939553 0.0297890505981974
0.0397544225699747 0.0432511571243995 0.0762415705584173
0.00757165579280845 0.0212506125299986
(the second line is a row of 20 doubles)

This is not hex/binary. These look like strings of ASCII digits,
three doubles per line. There are many ways to read them. One way is
to use readLine, see http://mindprod.com/applet/fileio.html for code,
then a regex split or parse with indexOf. Then convert each String to
a double.

See http://mindprod.com/applet/converter.html for how.

Some work studying the C program is in order here to find out just
what it is writing, and if it is packing in some way.

If you are stuck, some of us, myself included, will sort it out for
you for a fee.
 
Roedy Green

AFAIK, Java has no built-in support for endian conversion, but it
shouldn't be hard to handle this yourself.

There is some endian support in NIO, but it is much easier just to use
LEDataInputStream, which works identically to DataInputStream except that
it is little-endian.
 
John B. Matthews

Dominik G said:
OK, so now I do this:
for(int i = 0;i < 10;i++) {
ch = is.readByte();
System.out.print(ch + " ");
}
and the result is:
56 0 0 0 77 84 89 75 76 73 76 78 71 75 84 76 75 71 69 84
In particular my first four bytes contain the 56 I am looking for. But
what is the "official" way to get that value (with no hacking)? I may
hack on this first integer, i.e. to read bytes and convert them to int
by hand in such a way that gives me a proper result, but I am afraid
this doesn't work in the case of doubles which are later in the file.

You can get seqlen using readIntLittleEndian():

<http://mindprod.com/jgloss/endian.html#INT>

Skip that many bytes of seqstr
Read the data into a byte[][]
Unscramble the bytes using a mapping[]
Convert each double's byte[] to a long
Convert the long using Double.longBitsToDouble()
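
The last two steps can be sketched as follows. This is an illustration only: the class and method names are hypothetical, and it assumes each double is stored as 8 bytes in little-endian order, as the first four bytes of the sample file suggest.

```java
class ManualLittleEndian {
    /** Assembles 8 little-endian bytes starting at offset into a long. */
    static long toLong(byte[] b, int offset) {
        long bits = 0;
        // Start from the most significant byte, which is stored last.
        for (int i = 7; i >= 0; i--) {
            bits = (bits << 8) | (b[offset + i] & 0xFFL);
        }
        return bits;
    }

    /** Reinterprets those 64 bits as an IEEE 754 double. */
    static double toDouble(byte[] b, int offset) {
        return Double.longBitsToDouble(toLong(b, offset));
    }

    public static void main(String[] args) {
        // 1.0 has the bit pattern 0x3FF0000000000000; in little-endian
        // storage the 0x3F byte comes last.
        byte[] one = {0, 0, 0, 0, 0, 0, (byte) 0xF0, 0x3F};
        System.out.println(toDouble(one, 0)); // 1.0
    }
}
```

The mask `& 0xFFL` matters because Java bytes are signed; without it, any byte of 0x80 or above would sign-extend and corrupt the higher bits.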

Less flexibly, you can just exec the perl and parse stdout using
Double.valueOf() or Scanner:

<code>
import java.io.*;
/** @author John B. Matthews */
class ExecTest {

    public static void main (String[] args) {
        String s;
        try {
            Process p = Runtime.getRuntime().exec("./parse.pl");
            // read from the process's stdout
            BufferedReader stdout = new BufferedReader (
                new InputStreamReader(p.getInputStream()));
            while ((s = stdout.readLine()) != null) {
                // Parse using Double.valueOf() or Scanner
            }
            // read from the process's stderr
            BufferedReader stderr = new BufferedReader (
                new InputStreamReader(p.getErrorStream()));
            while ((s = stderr.readLine()) != null) {
                System.err.println(s);
            }
            p.getInputStream().close();
            p.getOutputStream().close();
            p.getErrorStream().close();
            System.err.println("Exit value: " + p.waitFor());
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
</code>
 
markspace

Dominik said:
.... but unfortunately it does not work. I am getting 172777843 instead
of 56.

I don't know where you are getting the value 172777843 from, perhaps you
are reading the wrong file, or the wrong offset in the file. What you
posted up thread reads fine for me, however. To swap endian for an int,
use Integer.reverseBytes(int):

<code>
DataInputStream din = new DataInputStream(
    new FileInputStream( "src/fubar/file.dat") );
for( int i, x=0; (i = din.readInt()) != -1 && x < 10; x++ ) {
    System.out.print( Integer.reverseBytes(i)+", ");
}
din.close();
</code>

<output>
56, 1264145485, 1313622348, 1280592711, 1413826379, 1095062612,
1094796374, 1262829908, 1363887702, 1145979225,
</output>

Note that the first value, 56, is correct. If 56 itself is byte swapped, I
get 939524096 decimal. I have no idea where your value came from.
Perhaps there is a problem in code you haven't shown us....

Are the remaining values doubles? floats? I confess I'm happily Perl
illiterate. Your code example did nothing for me.
 
Dominik G

Thanks a lot, now it works! I followed Lew's suggestions and used the
ByteOrder and ByteBuffer classes.

Best regards,
Dominik
 
Arne Vajhøj

Roedy said:
there is some endian support in NIO, but it is much easier just to use
LEDataInputStream which works identically to DataInputStream except it
is little endian.

It is bad advice to use a third party library when the functionality
is present in the standard Java library.

And it is not really easier.

Arne
 
Roedy Green

It is bad advice to use a third party library when the functionality
is present in the standard Java library.

That depends. If it is a one-shot internal conversion problem, you
are best to use LEDataStream, because you will likely get it coded 5
times faster with LEDataStream than with NIO, presuming you are
unfamiliar with NIO, as the O.P. appears to be. There is no learning
curve. Further, LEDataStream comes with source and is quite a small
class, so the biggest problem with relying on a third-party library
does not apply, namely having the third party discontinue it.
 
Nigel Wade

Roedy said:
That depends. If it is a one-shot internal conversion problem, you
are best to use LEDataStream, because you will likely get it coded 5
times faster with LEDataStream than with NIO, presuming you are
unfamiliar with NIO, as the O.P. appears to be. There is no learning
curve. Further, LEDataStream comes with source and is quite a small
class, so the biggest problem with relying on a third-party library
does not apply, namely having the third party discontinue it.

But when that third party code comes with a restrictive license it is better not
to restrict your own code to that other license, over which you have no
control.
 
Nigel Wade

Arne said:
It is bad advice to use a third party library when the functionality
is present in the standard Java library.

And it is not really easier.

Quite.

Read a block into a buffer using DataInputStream. Wrap that buffer into a
ByteBuffer. Set the ByteBuffer to LITTLE_ENDIAN. Read from the ByteBuffer.

Rinse and repeat...

It's hardly rocket science.
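
A sketch of those steps, applied to the layout the Perl script earlier in the thread implies: a 4-byte length, that many bytes of sequence text, then one row of 20 doubles per sequence position. The class and method names are hypothetical, and the code assumes the file was written on a little-endian machine with a 4-byte int, which matches the bytes 56 0 0 0 reported above.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ProfileReader {
    // Column permutation copied from the Perl script in this thread.
    static final int[] MAPPING =
        {0, 4, 3, 6, 13, 7, 8, 9, 11, 10, 12, 2, 14, 5, 1, 15, 16, 19, 17, 18};

    static double[][] parse(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source);

        // Read a block, wrap it, set the order, read from the buffer.
        byte[] header = new byte[4];
        in.readFully(header);
        int seqLen = ByteBuffer.wrap(header)
                               .order(ByteOrder.LITTLE_ENDIAN)
                               .getInt();

        // The sequence string itself is read only to move past it.
        byte[] seq = new byte[seqLen];
        in.readFully(seq);

        double[][] output = new double[seqLen][MAPPING.length];
        byte[] row = new byte[MAPPING.length * 8];
        for (int i = 0; i < seqLen; i++) {
            in.readFully(row);
            ByteBuffer buf = ByteBuffer.wrap(row).order(ByteOrder.LITTLE_ENDIAN);
            for (int j = 0; j < MAPPING.length; j++) {
                // Absolute get: byte index of the MAPPING[j]-th double.
                output[i][j] = buf.getDouble(MAPPING[j] * 8);
            }
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ProfileReader <file>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            double[][] rows = parse(in);
            System.out.println("Data length: " + rows.length);
        }
    }
}
```

`readFully` is used instead of `read` because a plain `read` may return fewer bytes than requested, which would silently misalign every value after it.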
 
