Help with FileInputStream and DataInputStream - porting c++ fread function into Java

Discussion in 'Java' started by Patrick, Jul 13, 2004.

  1. Patrick

    Patrick Guest

    Hello all!
    I am porting an application from C++ to Java and have run into a
    problem using the DataInputStream reader object. The file I am trying
    to read in is anywhere from 20 to 60 MB and has a short (25 lines or
    so) ASCII text "header". The file structure is a double dimensioned
    array of objects. The ASCII header defines how many "columns" (the
    first array index) there will be in the file. After the ASCII header,
    the first value is an integer that contains the number of objects in
    the first column. You are intended to read this many objects in, and
    then the next number will be an integer containing the number of
    objects in the next column. And so on and so forth. Each "object"
    has, basically, three doubles, a long integer, and a 4 character
    array.
    My problem comes when reading the first binary number, an integer
    containing the number of objects in the first column. It reads
    without throwing an exception, but if I print this number to the
    console it ends up being 10 million something, when I know that it
    should be no more than 1000. My code is basically as follows

    File theFile = new File(filename);
    if (theFile.canRead()) {
        FileInputStream fis = new FileInputStream(theFile);
        BufferedReader fileReader =
            new BufferedReader(new InputStreamReader(fis));

        // use BufferedReader fileReader object to read in ASCII header
        // (snipped)
        // this part is working swimmingly

        DataInputStream dataReader = new DataInputStream(fis);
        // I assume that this dataReader is "pointing" to the same place in the
        // file that the BufferedReader ended on, not, say, at the beginning of
        // the file or something like that. If it is based on the same stream,
        // can't a stream just have one location? Maybe I am too used to C++

        // This is the first read, mentioned above. I can't figure out why
        // it's reading in 10156179 when it should be getting around 900-1000!
        try {
            numPoints = dataReader.readInt();
            // Isn't this the same as:
            //     fread(&numPoints, sizeof(int), 1, fp);
            // in C++, where we are reading 1 integer sized binary section
            // of the file, and storing it into the integer array numPoints?

            System.out.println(numPoints);
        } catch (EOFException e) {
            System.out.println("Fewer columns than expected (V2) ( + " + i +
                " < " + mParams.numColumns + ")");
            mParams.numColumns = i;
            break;
        }

        // Since I do this later:
        data = new Rtpi[numPoints];
        // and am trying to allocate 10 million of these objects, I eventually
        // run out of memory/Java VM heap space, and it throws a
        // java.lang.OutOfMemoryError. Not too surprising I guess

        // end of non-working code
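
On the question in the code comments about whether both readers share one position: the underlying stream does have a single position, but a BufferedReader (and the InputStreamReader beneath it) generally pulls a large block of bytes into its own buffer, so the stream's position can end up well past the last header line. The sketch below is only an illustration of that effect, using an in-memory stand-in for the file; the class name `HeaderThenBinary`, the one-line header, and the helper `readAsciiLine` are all hypothetical, and the count is written big-endian here purely to keep byte order out of the picture.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class HeaderThenBinary {
    /** Builds a tiny stand-in file: one ASCII header line, then one big-endian int. */
    static byte[] sampleFile() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeBytes("columns=1\n");
        out.writeInt(1000);
        return bytes.toByteArray();
    }

    /** Reads the header via BufferedReader, then reports how many bytes the raw stream still holds. */
    static int bytesLeftAfterBufferedHeader() throws IOException {
        InputStream raw = new ByteArrayInputStream(sampleFile());
        BufferedReader reader = new BufferedReader(new InputStreamReader(raw, "US-ASCII"));
        reader.readLine();
        // The reader has buffered ahead, so fewer than 4 bytes (typically none) remain for readInt().
        return raw.available();
    }

    /** Reads one '\n'-terminated ASCII line a byte at a time, consuming nothing beyond the newline. */
    static String readAsciiLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.append((char) b);
        }
        return line.toString();
    }

    /** Reads header and count through the same DataInputStream, so the position stays in step. */
    static int countAfterByteWiseHeader() throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(sampleFile()));
        readAsciiLine(in);
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bytes left after BufferedReader: " + bytesLeftAfterBufferedHeader());
        System.out.println("count after byte-wise header: " + countAfterByteWiseHeader());
    }
}
```

With a real file, wrapping the FileInputStream in a single BufferedInputStream and reading both the header and the binary data through it would keep the byte-at-a-time header loop from being slow. The helper assumes lines end in a plain '\n'; a header written with "\r\n" would need the trailing '\r' trimmed.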

    So my main problem/misunderstanding is on how to use the
    DataInputStream reader object. I have read through the Java API for
    this class, but don't really get it too much. Any and all help would
    be much appreciated. I desperately need this code to work for my
    M.Sc. Dissertation.

    TIA,

    -Patrick

    Please send any responses to me directly as well as to the newsgroup.
     
    Patrick, Jul 13, 2004
    #1

  2. [ invalid group comp.lang.java.developer removed ]

    On 12 Jul 2004 17:00:53 -0700, Patrick wrote:
    > My problem comes when reading the first binary number, an integer
    > containing the number of objects in the first column. It reads
    > without throwing an exception, but if I print this number to the
    > console it ends up being 10 million something, when I know that it
    > should be no more than 1000.


    When you read numbers in binary format, the reader and writer need to
    agree on the endianness of the representation (i.e. which byte comes
    first).

    Java's DataInputStream and DataOutputStream assume network byte order
    (big endian), and so should your C program. It should be using
    functions like htonl() and htons() to write in network byte order.

    If your C application writes values using a mechanism like this:

    foo_t foo;
    write(fd, &foo, sizeof(foo));

    and you run it on a little-endian platform, then you will see exactly
    the problem you've described. Note that such an application will fail
    to read its own data if it runs on a platform with a different byte
    order.

    If the C program is beyond your control, there are third party Java
    classes for reading in little endian, or you can roll your own by
    reading one byte at a time, then shifting and adding to recreate the
    original values.
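
A minimal sketch of the roll-your-own approach described above, assuming a 4-byte little-endian integer; the class and method names are made up for illustration. It reads one byte at a time and shifts each into place, and for contrast shows what DataInputStream.readInt() makes of the same four bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class LittleEndianInt {
    /** Reads a 32-bit little-endian integer: least significant byte first. */
    static int readIntLE(InputStream in) throws IOException {
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        // InputStream.read() returns -1 at end of stream; any -1 makes the OR negative.
        if ((b0 | b1 | b2 | b3) < 0) {
            throw new EOFException();
        }
        return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
    }

    public static void main(String[] args) throws IOException {
        // The value 1000 (0x000003E8) as a little-endian machine would store it.
        byte[] raw = {(byte) 0xE8, 0x03, 0x00, 0x00};

        System.out.println(readIntLE(new ByteArrayInputStream(raw)));  // prints 1000

        // DataInputStream treats the same bytes as big-endian 0xE8030000.
        System.out.println(new DataInputStream(new ByteArrayInputStream(raw)).readInt());  // prints -402456576
    }
}
```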

    /gordon

    --
    [ do not email me copies of your followups ]
    g o r d o n + n e w s @ b a l d e r 1 3 . s e
     
    Gordon Beaton, Jul 13, 2004
    #2

  3. Patrick

    Roedy Green Guest

    On 13 Jul 2004 08:03:05 +0200, Gordon Beaton wrote or quoted:

    >and you run it on a little-endian platform, then you will see exactly
    >the problem you've described. Note that such an application will fail
    >to read its own data if it runs on a platform with a different byte
    >order.

    >If the C program is beyond your control, there are third party Java
    >classes for reading in little endian, or you can roll your own by
    >reading one byte at a time, then shifting and adding to recreate the
    >original values.


    see http://mindprod.com/jgloss/ledatastream.html
    http://mindprod.com/jgloss/endian.html

    you can read more than one byte at a time, but you do have to
    rearrange a byte at a time.
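
One way to read a whole value at once and rearrange afterwards, sketched under the assumption that the JDK in use has Integer.reverseBytes and Long.reverseBytes (added in Java 5); this is not the LEDataInputStream code from the links above, and the class and method names here are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

class SwapAfterRead {
    /** Reads four bytes big-endian, then reverses them to get the little-endian value. */
    static int readIntLE(DataInputStream in) throws IOException {
        return Integer.reverseBytes(in.readInt());
    }

    /** Reads eight bytes as a long, reverses them, and reinterprets the bits as a double. */
    static double readDoubleLE(DataInputStream in) throws IOException {
        return Double.longBitsToDouble(Long.reverseBytes(in.readLong()));
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {
            (byte) 0xE8, 0x03, 0x00, 0x00,                      // int 1000, little-endian
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, (byte) 0xF8, 0x3F // double 1.5, little-endian
        };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
        System.out.println(readIntLE(in));     // prints 1000
        System.out.println(readDoubleLE(in));  // prints 1.5
    }
}
```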

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Jul 13, 2004
    #3
  4. Patrick

    Nigel Wade Guest

    On Tue, 13 Jul 2004 07:09:07 +0000, Roedy Green wrote:

    > On 13 Jul 2004 08:03:05 +0200, Gordon Beaton wrote or quoted:
    >
    >>and you run it on a little-endian platform, then you will see exactly
    >>the problem you've described. Note that such an application will fail
    >>to read its own data if it runs on a platform with a different byte
    >>order.
    >
    >>If the C program is beyond your control, there are third party Java
    >>classes for reading in little endian, or you can roll your own by
    >>reading one byte at a time, then shifting and adding to recreate the
    >>original values.

    >
    > see http://mindprod.com/jgloss/ledatastream.html
    > http://mindprod.com/jgloss/endian.html
    >
    > you can read more than one byte at a time, but you do have to
    > rearrange a byte at a time.


    You can use ByteBuffer. Load the data, set the byte order of
    the buffer and then read from it. It has methods to read all the primitive
    types.
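
A rough sketch of the ByteBuffer approach for one column of the file described in the original post. The record layout is an assumption, not something the thread confirms: three 8-byte doubles, a 4-byte integer (a C `long` on a 32-bit compiler), and four ASCII characters, with no padding. The names `ColumnParser`, `Point`, and the field names are invented for the example, and the count is not validated.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class ColumnParser {
    /** One record; the field names and the 32-byte unpadded layout are assumptions. */
    static final class Point {
        final double x, y, z;
        final int id;
        final String tag;

        Point(double x, double y, double z, int id, String tag) {
            this.x = x;
            this.y = y;
            this.z = z;
            this.id = id;
            this.tag = tag;
        }
    }

    /** Reads a little-endian count followed by that many records from the buffer. */
    static Point[] readColumn(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);  // every multi-byte get below now decodes little-endian
        int count = buf.getInt();
        Point[] column = new Point[count];
        byte[] tag = new byte[4];
        for (int i = 0; i < count; i++) {
            double x = buf.getDouble();
            double y = buf.getDouble();
            double z = buf.getDouble();
            int id = buf.getInt();
            buf.get(tag);
            column[i] = new Point(x, y, z, id, new String(tag, StandardCharsets.US_ASCII));
        }
        return column;
    }

    /** Builds a two-record column in memory so the sketch can run without a data file. */
    static ByteBuffer sampleColumn() {
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 * 32).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(2);
        buf.putDouble(1.0).putDouble(2.0).putDouble(3.0).putInt(7)
           .put("ABCD".getBytes(StandardCharsets.US_ASCII));
        buf.putDouble(4.0).putDouble(5.0).putDouble(6.0).putInt(8)
           .put("WXYZ".getBytes(StandardCharsets.US_ASCII));
        buf.flip();  // switch from writing to reading
        return buf;
    }

    public static void main(String[] args) {
        for (Point p : readColumn(sampleColumn())) {
            System.out.println(p.id + " " + p.tag + " " + p.x + " " + p.y + " " + p.z);
        }
    }
}
```

For a real file, the buffer could come from reading or mapping the file through a FileChannel and positioning it just past the ASCII header.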


    --
    Nigel Wade, System Administrator, Space Plasma Physics Group,
    University of Leicester, Leicester, LE1 7RH, UK
    Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555
     
    Nigel Wade, Jul 13, 2004
    #4
  5. Re: Help with FileInputStream and DataInputStream - porting c++ fread function into Java

    Patrick wrote:
    > Hello all!
    > I am porting an application from C++ to Java and have run into a
    > problem using the DataInputStream reader object. The file I am trying
    > to read in is anywhere from 20 to 60 MB and has a short (25 lines or
    > so) ASCII text "header". The file structure is a double dimensioned
    > array of objects. The ASCII header defines how many "columns" (the
    > first array index) there will be in the file. After the ASCII header,
    > the first value is an integer that contains the number of objects in
    > the first column. You are intended to read this many objects in, and
    > then the next number will be an integer containing the number of
    > objects in the next column. And so on and so forth. Each "object"
    > has, basically, three doubles, a long integer, and a 4 character
    > array.


    Patrick,

    You received many excellent responses concerning the endian-ness of the
    data. In addition to that, you should make sure that the length of each
    data type agrees as well. For example, Java uses 32-bit ints and 64-bit
    longs on every platform; does that agree with your C program?
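
To make the size question concrete, here is a small sketch that computes the on-disk size of one record under two guesses about the C side and uses it to sanity-check a count before allocating an array. Everything here is an assumption for illustration: the helper names, the idea that a C `long` is either 4 or 8 bytes, and the optional padding to a multiple of 8 that a compiler may add when a struct is written out whole.

```java
class RecordLayout {
    /**
     * Size in bytes of one record: three 8-byte doubles, a C long of the given
     * width, and four chars, optionally rounded up to a multiple of 8.
     */
    static int recordBytes(int cLongBytes, boolean padToEight) {
        int size = 3 * 8 + cLongBytes + 4;
        if (padToEight && size % 8 != 0) {
            size += 8 - size % 8;
        }
        return size;
    }

    /** A count is plausible only if that many records could fit in the bytes that remain. */
    static boolean isPlausibleCount(int count, int recordBytes, long bytesRemaining) {
        // Multiply as long so a huge bogus count cannot overflow int.
        return count >= 0 && (long) count * recordBytes <= bytesRemaining;
    }

    public static void main(String[] args) {
        System.out.println(recordBytes(4, false));  // prints 32
        System.out.println(recordBytes(8, true));   // prints 40

        long fileBytes = 60_000_000L;  // upper end of the file sizes mentioned in the thread
        System.out.println(isPlausibleCount(1000, 32, fileBytes));      // prints true
        System.out.println(isPlausibleCount(10156179, 32, fileBytes));  // prints false
    }
}
```

A check like this turns a misread count into an immediate, descriptive failure instead of an OutOfMemoryError later on.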

    HTH,
    Ray

    --
    XML is the programmer's duct tape.
     
    Raymond DeCampo, Jul 13, 2004
    #5
  6. Patrick

    Patrick Guest

    Hello again,
    I think I figured my problem out! Although the Java API states
    that DataInputStream "lets an application read primitive Java data
    types from an underlying input stream in a machine-independent way",
    that doesn't mean it can read data in any machine's native format.
    Rather, it messes with your mind for a while until you figure out that
    if you're reading in files from a machine built on a little-endian
    architecture (virtually all PCs - Intel and all compatibles) that
    weren't written by a Java DataOutputStream object, then you'll be
    pretty much screwed - DataInputStream itself has no convenient way
    to do this. They could definitely stand to clear this up in the API
    docs, and also include the following classes: LEDataInputStream,
    LEDataOutputStream. They are available here and are my lifesavers
    right now:

    http://mindprod.com/jgloss/endian.html

    Thanks to Roedy Green!


    -Patrick
     
    Patrick, Jul 13, 2004
    #6
  7. Patrick

    Roedy Green Guest

    Roedy Green, Jul 13, 2004
    #7
