read bytes from file

David McDivitt · Sep 15, 2005

With Java I must read a file written in VB. The file contains the following
user defined type:

Public Type ProcRec
    LastScanTime As Date
    NextExecuteTime As Date
    NextExecuteTimeSet As Integer
    MinutesToRun As Long
    TaskID As Double
    ID As String * 4
    ProcScheduler As String * 1
    ProcName As String * 30
    Active As Integer
    PassArguments As Integer
    OwnSection As String * 4
    OwnUnit As String * 4
    Executable As String * 256
    ExecutableType As Integer
    TextOffset(60, 1) As Long
End Type

The fixed length data structure is written at the front of the file as a
header. The remainder of the file contains variable length text, with
offsets and lengths contained in the TextOffset array. TextOffset is zero
based having 61 x 2 elements.

I looked at the BufferedReader and FileInputStream classes. They will read
into a character array, but I have to define the character array size before
the read. I could make an oversized array to be sure it's big enough, but I
want a method that returns a character array exactly the size of the file.

If someone has code to convert the numeric values I'd appreciate it. Would
save a little time but I can do that. Thanks

David McDivitt · Sep 15, 2005

I can read the file with the following code. Ideas to convert the numeric
values would be appreciated. Thanks

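public Job getJob(String jobId) throws Exception {
    String jobFile = Variables.getJobPath() + jobId + ".desc";
    File file = new File(jobFile);
    // Allocate a buffer exactly as large as the file.
    byte[] b = new byte[(int) file.length()];
    DataInputStream in = new DataInputStream(new FileInputStream(file));
    in.readFully(b); // keeps reading until b is full, unlike a single read(b)
    in.close();
    // ... (remainder of the method not shown)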

christian.bongiorno · Sep 15, 2005

David,

First, shake that notion that a char is 8 bits right out of your
C/C++-oriented mind. A char in Java is 16 bits.

What you really want is to read the whole file as a byte[]. Here is
code to accomplish the job. The caveat is that it only supports files
smaller than 2 GB.

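import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

private static byte[] getFileContentsAsBytes(String file) throws IOException {
    File f = new File(file);
    // The int cast is why this only works for files under 2 GB.
    byte[] data = new byte[(int) f.length()];
    DataInputStream inStream = new DataInputStream(new FileInputStream(f));
    try {
        // readFully loops until the array is full; a bare read() may return early.
        inStream.readFully(data);
    } finally {
        inStream.close();
    }
    return data;
}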

Remember also that Java's I/O classes (DataInputStream and friends) read
multi-byte values big-endian, whereas something produced by VB (a
Microsoft Windows product running on Intel x86) is little-endian. I am
pretty sure the libraries give you an easy way to convert, but I forget
which and how.

Also, Reader classes are specifically meant for text characters. Caveat
emptor.

Christian Bongiorno
http://christian.bongiorno.org/resume.pdf

Roedy Green · Sep 15, 2005

David McDivitt said:
The fixed length data structure is written at the front of the file as a
header. The remainder of the file contains variable length text, with
offsets and lengths contained in the TextOffset array. TextOffset is zero
based having 61 x 2 elements.

Using old io, you could read the first section using DataInputStream,
or possibly LEDataInputStream or RandomAccessFile. See
http://mindprod.com/products1.html#LEDATASTREAM

You did not say if the offsets were big endian binary, little endian
binary or text.

Then you can either read the rest of the file with RandomAccessFile
into a byte[], which you then convert to a String with the
appropriate encoding. See http://mindprod.com/jgloss/encoding.html

Alternatively, you can read the entire file into RAM as a byte array,
extract pieces of it and convert them to the form you want.

For details of how to do any of that i/o see
http://mindprod.com/applets/fileio.html

I have no experience with this, but you can probably also handle it
with nio mapping the file into RAM and treating it like a big byte
array. The advantage of that approach is you won't read parts of the
file you don't actually need.
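
A minimal sketch of the nio mapping idea, assuming Java 1.4 or later. The class name MappedJobFile and its method are made up for illustration and are not from the thread. The file is mapped read-only and the buffer is switched to little-endian, so values can be read at absolute offsets without copying the whole file into a byte array first.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical helper: maps a whole file into memory for little-endian reads. */
final class MappedJobFile {
    private MappedJobFile() {}

    static MappedByteBuffer map(String path) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try {
            FileChannel ch = raf.getChannel();
            // Map the entire file read-only; pages are loaded on demand.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // VB writes its numbers little-endian; ByteBuffer defaults to big-endian.
            buf.order(ByteOrder.LITTLE_ENDIAN);
            return buf;
        } finally {
            // The mapping stays valid after the channel is closed.
            raf.close();
        }
    }
}
```

With the buffer in hand, a call such as buf.getInt(18) would read the 4-byte value that starts at byte 18 of the file.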

Roedy Green · Sep 15, 2005

David McDivitt said:
I can read the file with the following code. Ideas to convert the numeric
values would be appreciated. Thanks

There are 3 likely formats for the numbers.

If they are big-endian ints, you can extract the bytes and read them
with a DataInputStream attached to a ByteArrayInputStream.

For little-endian, ditto, except use LEDataInputStream.

For details of how, see http://mindprod.com/applets/fileio.html

If they are text chars extract the bytes, convert to string by the
procedure described in http://mindprod.com/jgloss/encoding.html

then convert to int by the process shown at
http://mindprod.com/applets/converter.html

David McDivitt · Sep 16, 2005

I read the whole file at one shot with FileInputStream into a byte array.
Then I convert the entire byte array to a string with
String textBlob = new String(b);

Integers are interpreted from the byte array, but text is obtained with
textBlob.substring.
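
One caveat with new String(b): it decodes with the platform default charset. If that default is a multi-byte encoding such as UTF-8, character positions in the String may no longer line up with byte offsets in the file. A hedged alternative is to decode each field straight from the byte array with an explicit single-byte charset. The helper below is a sketch; the class name FixedString is invented here, and the right charset depends on the code page VB used when writing the file, which the thread does not state.

```java
import java.nio.charset.Charset;

/** Hypothetical helper for VB fixed-length string fields (String * n). */
final class FixedString {
    private FixedString() {}

    /**
     * Decodes length bytes starting at offset and strips the padding.
     * trim() removes leading and trailing characters up to U+0020,
     * which covers both space padding and NUL padding.
     */
    static String read(byte[] b, int offset, int length, Charset charset) {
        return new String(b, offset, length, charset).trim();
    }
}
```

For example, FixedString.read(b, 35, 30, Charset.forName("ISO-8859-1")) would decode the 30-byte ProcName field, assuming ISO-8859-1 is close enough to the file's actual code page.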

Integers are written little-endian. You just step through the bytes and add
them up, multiplying the first byte by 256 to the zero power, the second by
256 to the first power, the third by 256 squared, and so on. Works real well.
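
The byte-summing approach can be sketched as follows. This is an illustration, not the poster's actual code; the class and method names are invented. One detail matters in Java: byte is signed, so each byte has to be masked with 0xFF before it is shifted, otherwise any byte of 0x80 or above is sign-extended and corrupts the result. Shifting left by 8 * i is the same as multiplying by 256 to the power i.

```java
/** Hypothetical helpers that assemble little-endian integers by hand. */
final class LittleEndianBytes {
    private LittleEndianBytes() {}

    /** Reads a 2-byte value (a VB6 Integer) starting at offset. */
    static short readInt16(byte[] b, int offset) {
        return (short) ((b[offset] & 0xFF) | ((b[offset + 1] & 0xFF) << 8));
    }

    /** Reads a 4-byte value (a VB6 Long) starting at offset. */
    static int readInt32(byte[] b, int offset) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            // Mask to 0..255 first, then move the byte to its place.
            value |= (b[offset + i] & 0xFF) << (8 * i);
        }
        return value;
    }

    /** Reads an 8-byte value, e.g. the raw bits of a Double or Date. */
    static long readInt64(byte[] b, int offset) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            // 0xFFL forces long arithmetic so shifts past 31 bits work.
            value |= (b[offset + i] & 0xFFL) << (8 * i);
        }
        return value;
    }
}
```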

I am not able to figure out date data type, though. Microsoft represents a
date with an eight byte IEEE 64 bit double value. Stuff to the left of the
decimal is date and stuff to the right is time. If I could reconstruct the
double value, changing to a java date would be easy. I cannot reconstruct
the double value, though.

There is the method Double.longBitsToDouble. I converted a string of bytes
into a long, just as any other long, then tried the double conversion. It is
assumed the bits would correspond to IEEE. Did not work. I would like to get
such a method to work rather than muck with individual bits.

/* Public Type ProcRec
     LastScanTime       As Date          000-007
     NextExecuteTime    As Date          008-015
     NextExecuteTimeSet As Integer       016-017
     MinutesToRun       As Long          018-021
     TaskID             As Double        022-029
     ID                 As String * 4    030-033
     ProcScheduler      As String * 1    034-034
     ProcName           As String * 30   035-064
     Active             As Integer       065-066
     PassArguments      As Integer       067-068
     OwnSection         As String * 4    069-072
     OwnUnit            As String * 4    073-076
     Executable         As String * 256  077-332
     ExecutableType     As Integer       333-334
     TextOffset(60, 1)  As Long          335-822
   End Type */

David McDivitt · Sep 16, 2005

I am posting my solution since I hate to find problems in newsgroups without
answers.

The file is read into a byte array. Beginning at the offset in the array
where the date value starts, the bytes are added together, each multiplied
by a power of 256: the first byte by 256 to the zero power (one), the second
by 256 to the first power (256), the third by 256 squared, and so on. After
eight bytes, a long has been extracted from the file. Double.longBitsToDouble
then turns that long into a double, since the data was originally saved in
IEEE 64-bit format. The result is the same double that VB holds internally
for its Date data type: the whole part counts days and the fraction is the
time of day. The reference point for Microsoft is December 30, 1899; for
Java it is January 1, 1970. That is a difference of 25569 days, which must
be subtracted from the extracted double. Multiplying the result by 86400000,
the number of milliseconds in a day, gives the Java equivalent of a
Microsoft VB6 Date value.
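
The steps described above can be sketched like this. The class name VbDate and its methods are invented for illustration. Two assumptions are worth stating: a VB6 Date carries no time zone, so the result treats the stored wall-clock time as if it were UTC, and the simple subtract-and-multiply formula is only meant for dates on or after December 30, 1899.

```java
import java.util.Date;

/** Hypothetical converter from a VB6 Date (stored as a double) to Java time. */
final class VbDate {
    static final double DAYS_1899_TO_1970 = 25569.0; // Dec 30, 1899 to Jan 1, 1970
    static final double MILLIS_PER_DAY = 86400000.0;

    private VbDate() {}

    /** Rebuilds a little-endian IEEE 754 double from 8 bytes. */
    static double readDoubleLE(byte[] b, int offset) {
        long bits = 0;
        for (int i = 0; i < 8; i++) {
            // Mask each signed byte to 0..255 before shifting it into place.
            bits |= (b[offset + i] & 0xFFL) << (8 * i);
        }
        return Double.longBitsToDouble(bits);
    }

    /** Converts days since Dec 30, 1899 into milliseconds since Jan 1, 1970. */
    static long toEpochMillis(double vbDate) {
        return Math.round((vbDate - DAYS_1899_TO_1970) * MILLIS_PER_DAY);
    }

    /** Reads the 8-byte Date field at offset and wraps it in java.util.Date. */
    static Date readDate(byte[] b, int offset) {
        return new Date(toEpochMillis(readDoubleLE(b, offset)));
    }
}
```

With the header layout given earlier in the thread, VbDate.readDate(b, 0) would read LastScanTime and VbDate.readDate(b, 8) would read NextExecuteTime.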

Nigel Wade · Sep 19, 2005

After you've read your file into the byte array, why not use ByteBuffer?
This allows you to access the data within the byte array, and read it as
big- or little-endian. You can get int, long, double etc. values from the
byte array.
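
A rough sketch of the ByteBuffer approach, using the byte offsets from the layout comment posted earlier in the thread. The class ProcRecHeader and its field names are invented here. The TextOffset values are read into a flat array in file order; how the 61 x 2 elements are arranged (VB normally varies the first index fastest) and which column holds offsets versus lengths are not stated in the thread, so that mapping is left to the caller to verify.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical holder for the numeric fields of the ProcRec header. */
final class ProcRecHeader {
    static final int HEADER_SIZE = 823;        // bytes 000-822
    static final int TEXT_OFFSET_START = 335;
    static final int TEXT_OFFSET_COUNT = 61 * 2;

    double lastScanTime;       // VB Date, kept as the raw double
    double nextExecuteTime;
    short nextExecuteTimeSet;  // VB6 Integer is 16 bits
    int minutesToRun;          // VB6 Long is 32 bits
    double taskId;
    short active;
    short passArguments;
    short executableType;
    int[] textOffsetRaw = new int[TEXT_OFFSET_COUNT]; // file order, unmapped

    static ProcRecHeader parse(byte[] file) {
        if (file.length < HEADER_SIZE) {
            throw new IllegalArgumentException("file shorter than header");
        }
        // wrap() does not copy; order() switches all multi-byte reads to little-endian.
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        ProcRecHeader h = new ProcRecHeader();
        h.lastScanTime = buf.getDouble(0);
        h.nextExecuteTime = buf.getDouble(8);
        h.nextExecuteTimeSet = buf.getShort(16);
        h.minutesToRun = buf.getInt(18);
        h.taskId = buf.getDouble(22);
        h.active = buf.getShort(65);
        h.passArguments = buf.getShort(67);
        h.executableType = buf.getShort(333);
        for (int i = 0; i < TEXT_OFFSET_COUNT; i++) {
            h.textOffsetRaw[i] = buf.getInt(TEXT_OFFSET_START + 4 * i);
        }
        return h;
    }
}
```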

Re-inventing wheels is not a productive way to spend your time.

Roedy Green · Sep 20, 2005

David McDivitt said:
There is the method Double.longBitsToDouble. I converted a string of bytes
into a long, just as any other long, then tried the double conversion.

Was it a little-endian long?

A date in a double seems odd; usually it is some sort of long.

Here is what I learned about the dates on Windows files when I wrote
the JNI code FileTimes. See
http://mindprod.com/products1.html#FILETIMES

Java timestamps use 64-bit milliseconds since 1970 GMT. Windows
timestamps use a 64-bit value representing the number of 100-nanosecond
intervals since January 1, 1601. The constant diffInMillis is the
difference between January 1, 1601 and January 1, 1970 in milliseconds.
This magic number came from com.mindprod.common11.TestDate. It was
computed according to the Gregorian calendar, with no correction for
the 1752 changeover, when Wednesday, September 2 was followed
immediately by Thursday, September 14, dropping 11 days. See also
http://gcc.gnu.org/ml/java-patches/2003-q1/msg00565.html

private static long diffInMillis = 11644473600000L;

long javaTime = ( windowsTime / 10000 ) - diffInMillis;

Roedy Green · Sep 20, 2005

Nigel Wade said:
After you've read your file into the byte array, why not use ByteBuffer?
This allows you to access the data within the byte array, and read it as
big- or little-endian. You can get int, long, double etc. values from the
byte array.

If you can't use Java 1.4+, you can use LEDataStream, which lets you
read little-endian binary data. Much of the logic you are describing
(flipping bytes around, using longBitsToDouble, etc.) is already part
of LEDataInputStream.

Your logic will be a lot cleaner than with all kinds of magic
offsets.
