converting char to int (reading from a binary file)

Discussion in 'C++' started by itdevries, May 16, 2008.

  1. itdevries

    itdevries Guest

    Hi,
    I'm trying to read some binary data from a file. I've read a few bytes
    of the data into a char array with ifstream, and I know that the first
    4 bytes in the array represent an integer. How do I go about
    converting those elements to an integer?
    regards, Igor
     
    itdevries, May 16, 2008
    #1

  2. itdevries

    sebastian Guest

    basically:

    int i = *( ( int * )ptr );
     
    sebastian, May 16, 2008
    #2

  3. itdevries

    itdevries Guest

    On May 16, 8:43 pm, Victor Bazarov <> wrote:
    > sebastian wrote:
    > > basically:

    >
    > > int i = *( ( int * )ptr )

    >
    > That is a very bad idea, 'ptr' may not be correctly aligned. It would
    > be much better to supply the address of 'i' to the procedure that reads
    > the bytes, something like
    >
    > int i = 0;
    > myfile.read(&i, sizeof(int));
    >
    > V
    > --
    > Please remove capital 'A's when replying by e-mail
    > I do not respond to top-posted replies, please don't ask


    thanks for your response. I'm not 100% sure I understand what you mean
    by correctly aligned, would you mind clarifying? I also can't get your
    code snippet to work; I get the following compile error:

    "Error 1 error C2664: 'std::basic_istream<_Elem,_Traits>::read' :
    cannot convert parameter 1 from 'int' to 'char *' "

    kind regards,
    Igor
     
    itdevries, May 16, 2008
    #3
  4. itdevries

    itdevries Guest

    On May 16, 8:36 pm, sebastian <> wrote:
    > basically:
    >
    > int i = *( ( int * )ptr )


    thanks for your response, it does the trick...
    Igor
     
    itdevries, May 16, 2008
    #4
  5. itdevries

    itdevries Guest

    On May 16, 9:50 pm, Victor Bazarov <> wrote:
    > itdevries wrote:
    > > On May 16, 8:43 pm, Victor Bazarov <> wrote:
    > >> sebastian wrote:
    > >>> basically:
    > >>> int i = *( ( int * )ptr )
    > >> That is a very bad idea, 'ptr' may not be correctly aligned. It would
    > >> be much better to supply the address of 'i' to the procedure that reads
    > >> the bytes, something like

    >
    > >> int i = 0;
    > >> myfile.read(&i, sizeof(int));

    >
    > >> V
    > >> --
    > >> Please remove capital 'A's when replying by e-mail
    > >> I do not respond to top-posted replies, please don't ask

    >
    > > thanks for your response. I'm not 100% sure I understand what you mean
    > > by correctly aligned, would you mind clarifying?

    >
    > On some hardware objects of certain sizes (like 'int') need to exist in
    > memory at addresses with certain properties, like divisible by the size
    > of the object, for example. In such systems a 'char' can lie on the odd
    > byte boundary, which may not necessarily be acceptable for an 'int' that
    > need an address divisible by, say, 4. Attempt to access the object (by
    > dereferencing the pointer formed by casting a pointer to char) can
    > trigger a hardware exception.
    >
    > > I also can't get your

    >
    > > code snippet to work; I get the following compile error:

    >
    > > "Error 1 error C2664: 'std::basic_istream<_Elem,_Traits>::read' :
    > > cannot convert parameter 1 from 'int' to 'char *' "

    >
    > You probably missed the '&'. Also, to convert a pointer to 'int' to a
    > pointer to 'char' you may need to use 'reinterpret_cast' (which I didn't
    > use).
    >
    > V
    > --
    > Please remove capital 'A's when replying by e-mail
    > I do not respond to top-posted replies, please don't ask


    Victor,
    Many thanks for taking the time to explain!
    I think I understand what you're saying. Do you know what the chances
    are of this happening on a win32 platform?
    regards,
    Igor
     
    itdevries, May 16, 2008
    #5
  6. itdevries

    Jim Langston Guest

    itdevries wrote:
    > On May 16, 8:36 pm, sebastian <> wrote:
    >> basically:
    >>
    >> int i = *( ( int * )ptr )

    >
    > thanks for your response, it does the trick...
    > Igor


    Be aware that, depending on your platform, this may always work, break
    only some of the time, or always break. It depends mainly on whether
    the system requires integers to be aligned on specific byte boundaries.
    I know that this works fine on Windows systems. I understand that a
    misaligned access will break on Sun systems.

    If this is platform-specific for you, you will never run it on another
    platform, and you're sure that your system won't break on misaligned
    integers, it should be fine to use. If you ever plan on running the code
    on another system, you'll need to do it another way.
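
    One possible "other way" for the alignment problem, as a minimal sketch:
    copy the bytes into an int with std::memcpy instead of dereferencing a
    cast pointer. The helper name read_native_int is made up for this
    example, and the sketch assumes the bytes in the buffer are in the same
    byte order and int size that the reading machine uses.

```cpp
#include <cstring>   // std::memcpy

// Hypothetical helper: build an int from the first sizeof(int) bytes of buf.
// ASSUMPTION: the bytes were written in this machine's own byte order.
// memcpy has no alignment requirement, so buf may start at any address.
int read_native_int(const char* buf)
{
    int value = 0;
    std::memcpy(&value, buf, sizeof value);
    return value;
}
```

    This only removes the alignment risk; size and byte-order differences
    between writer and reader are not handled.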


    --
    Jim Langston
     
    Jim Langston, May 16, 2008
    #6
  7. itdevries

    James Kanze Guest

    On 16 mai, 20:43, Victor Bazarov <> wrote:
    > sebastian wrote:
    > > basically:


    > > int i = *( ( int * )ptr )


    > That is a very bad idea, 'ptr' may not be correctly aligned.


    Not to mention issues of size and representation. (As an
    extreme case, I know of one machine which uses 6 byte signed
    magnitude ints.)

    The original poster didn't begin to give enough information with
    regards to the input format for us to say, but if it's a
    standard Internet protocol, then you read an int with something
    like:

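    #include <stdint.h>
    #include <istream>

    int32_t
    getInt( std::istream& source )
    {
        // Convert each byte to uint32_t before shifting: shifting an
        // int left by 24 overflows for byte values of 128 and above.
        uint32_t result = static_cast< uint32_t >( source.get() ) << 24 ;
        result |= static_cast< uint32_t >( source.get() ) << 16 ;
        result |= static_cast< uint32_t >( source.get() ) << 8 ;
        result |= static_cast< uint32_t >( source.get() ) ;
        return result ;
    }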

    Except that you'd add some error handling. (And of course, if
    you don't have int32_t and uint32_t---which are only present if
    the hardware supports them directly---then the conversion from
    unsigned to signed becomes more difficult as well.)

    > It would be much better to supply the address of 'i' to the
    > procedure that reads the bytes, something like


    > int i = 0;
    > myfile.read(&i, sizeof(int));


    That doesn't work any better, really.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 16, 2008
    #7
  8. itdevries

    James Kanze Guest

    On 16 mai, 23:16, "Jim Langston" <> wrote:
    > itdevries wrote:
    > > On May 16, 8:36 pm, sebastian <> wrote:
    > >> basically:


    > >> int i = *( ( int * )ptr )


    > > thanks for your response, it does the trick...


    > Be aware that depending on your OS this may break at times,
    > not at others, or always work. It depends on your OS mainly
    > and if it requires integers to be specifically byte aligned.
    > I know that this will work on Windows systems fine. I
    > understand that with wrong alignment it will break on Sun systems.


    > If this is platform specific for you and you will never run it
    > on another platform and you're sure that your system won't
    > break on byte misaligned integers it should be fine to use.
    > If you ever plan on running the code on another system then
    > you'll need to do it another way.


    It will also fail on an Intel if the ints are in the standard
    Internet format (most significant byte first, whereas Intel
    processors store the least significant byte first). In general,
    you can only count on it working if you are reading and writing
    from the same run of the same program---I've seen cases where just
    recompiling with a newer version of the compiler made it fail.
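
    To make the byte-order point concrete, here is a small sketch that
    decodes the same four bytes both ways. The function names are invented
    for this example. Because the value is assembled with shifts, the
    result does not depend on the byte order of the machine running the
    code.

```cpp
#include <cstdint>

// Decode 4 bytes stored most significant byte first ("Internet" order).
std::uint32_t decode_be32(const unsigned char* p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}

// Decode 4 bytes stored least significant byte first (Intel order).
std::uint32_t decode_le32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])        |
           (static_cast<std::uint32_t>(p[1]) << 8)  |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

    For the bytes 00 00 00 01, decode_be32 gives 1 while decode_le32 gives
    16777216, which is the kind of wrong answer the pointer cast would
    produce on an Intel machine for Internet-format data.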

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 16, 2008
    #8
  9. itdevries

    sebastian Guest

    > That doesn't work any better, really.

    That's because istream::read expects a pointer to char (you must cast
    it). But as others have already pointed out, there are many problems
    with these sorts of casts. There are serialization libraries available
    (such as Boost.Serialization) designed specifically for this purpose, in
    case you really want to get the job done right...
     
    sebastian, May 17, 2008
    #9
  10. itdevries

    Guest

    On May 16, 5:26 pm, sebastian <> wrote:
    > > That doesn't work any better, really.

    >
    > that's because istream::read expects a pointer to char (you must cast
    > it). but as others have already pointed out, there are many problems
    > with these sorts of casts. there are serialization libraries available
    > (such as boost::serialize) designed specifically for this purpose, in
    > case you really want to get the job done right...


    I agree Boost.Serialization will produce correct results in this case,
    but it may not produce those results efficiently:
    http://webEbenezer.net/comparison.html

    Brian Wood
    Ebenezer Enterprises
    www.webEbenezer.net
     
    , May 17, 2008
    #10
  11. itdevries

    James Kanze Guest

    On 17 mai, 01:01, "Victor Bazarov" <> wrote:
    > James Kanze wrote:
    > > On 16 mai, 20:43, Victor Bazarov <> wrote:
    > >> int i = 0;
    > >> myfile.read(&i, sizeof(int));


    > > That doesn't work any better, really.


    > Do tell.


    Do tell what? To begin with, it won't compile without a
    reinterpret_cast (which is a very good sign that something is
    wrong with it). And it still ignores all issues of size and
    representation.

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 17, 2008
    #11
  12. itdevries

    itdevries Guest

    On May 16, 11:31 pm, James Kanze <> wrote:
    > On 16 mai, 20:43, Victor Bazarov <> wrote:
    >
    > > sebastian wrote:
    > > > basically:
    > > > int i = *( ( int * )ptr )

    > > That is a very bad idea, 'ptr' may not be correctly aligned.

    >
    > Not to mention issues of size and representation. (As an
    > extreme case, I know of one machine which uses 6 byte signed
    > magnitude ints.)
    >
    > The original poster didn't begin to give enough information with
    > regards to the input format for us to say, but if it's a
    > standard Internet protocol, then you read an int with something
    > like:
    >
    > int32_t
    > getInt( std::istream& source )
    > {
    > uint32_t result = source.get() << 24 ;
    > result |= source.get() << 16 ;
    > result |= source.get() << 8 ;
    > result |= source.get() ;
    > return result ;
    > }
    >
    > Except that you'd add some error handling. (And of course, if
    > you don't have int32_t and uint32_t---which are only present if
    > the hardware supports them directly, then the conversion from
    > unsigned to signed becomes more difficult as well.)
    >
    > > It would be much better to supply the address of 'i' to the
    > > procedure that reads the bytes, something like
    > > int i = 0;
    > > myfile.read(&i, sizeof(int));

    >
    > That doesn't work any better, really.
    >
    > --
    > James Kanze (GABI Software) email:
    > Conseils en informatique orientée objet/
    > Beratung in objektorientierter Datenverarbeitung
    > 9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34


    Hi James,
    thanks for taking the time to respond, I really appreciate it.

    The code is intended to read data from a file generated by a Fortran
    program and will only ever be run on Windows machines. I don't need a
    mega portable/robust app, just a program that will extract data for a
    particular version of the file. If it runs on Windows 2k/XP/Vista for
    most processors then that's good enough for the time being. At this
    point I prefer not to make life too difficult for myself and would
    prefer to use the typecasting trick proposed by sebastian. Do you
    think that it's safe "enough"?

    As an added difficulty the Fortran file is "record" oriented, not
    "stream" oriented (I don't know if I'm using the right official
    terminology), which means that there's some peculiarity about how I
    have to read the data; some records contain only one int, others
    contain more. Since all the records are 4*4 bytes long, it means I
    have to skip around over empty records/control info to read
    everything.

    kind regards,
    Igor
     
    itdevries, May 17, 2008
    #12
  13. itdevries

    James Kanze Guest

    On 17 mai, 11:52, itdevries <> wrote:
    > On May 16, 11:31 pm, James Kanze <> wrote:
    > > On 16 mai, 20:43, Victor Bazarov <> wrote:


    > > > sebastian wrote:
    > > > > basically:
    > > > > int i = *( ( int * )ptr )
    > > > That is a very bad idea, 'ptr' may not be correctly aligned.


    > > Not to mention issues of size and representation. (As an
    > > extreme case, I know of one machine which uses 6 byte signed
    > > magnitude ints.)


    > > The original poster didn't begin to give enough information with
    > > regards to the input format for us to say, but if it's a
    > > standard Internet protocol, then you read an int with something
    > > like:


    > > int32_t
    > > getInt( std::istream& source )
    > > {
    > > uint32_t result = source.get() << 24 ;
    > > result |= source.get() << 16 ;
    > > result |= source.get() << 8 ;
    > > result |= source.get() ;
    > > return result ;
    > > }


    > > Except that you'd add some error handling. (And of course, if
    > > you don't have int32_t and uint32_t---which are only present if
    > > the hardware supports them directly, then the conversion from
    > > unsigned to signed becomes more difficult as well.)


    > > > It would be much better to supply the address of 'i' to the
    > > > procedure that reads the bytes, something like
    > > > int i = 0;
    > > > myfile.read(&i, sizeof(int));


    > > That doesn't work any better, really.


    > thanks for taking the time to respond, I really appreciate it.


    > The code is intended to read data from a file generated by a
    > fortran program and only ever will be run on windows machines.
    > I don't need a mega portable/robust app, just need a program
    > that will extract data for a particular version of the file.


    Well, the first thing you need is a specification of how the
    Fortran program wrote the data:).

    Beyond that: you may have raised a case that I've generally
    forgotten when talking about reading and writing binary data:
    migration. You've got some old data, written in some format
    (binary dump of the bytes in memory?), and you want to migrate
    it to an established format for a new program. You're only
    going to read the old data once, so worrying about what might
    happen in some future version of the compiler, or some future
    machine, is really irrelevant. In this case, if the code was
    written using a simple binary dump of the bytes in memory, and
    you can compile with something for which you're sure that the
    binary images will be identical (far from obvious if it was
    written in Fortran and you're reading it in C++), then the
    simplest solution is to read the data into a "byte buffer" (an
    std::vector< unsigned char > is what I usually use for this),
    and memcpy the individual elements out of it. (This solves the
    alignment problem, which may or may not be present, depending on
    how the bytes were written.)
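
    A minimal sketch of that byte-buffer approach, assuming the file
    really is a plain dump in the reading machine's own int format. The
    function names load_file and int_at are invented for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <vector>

// Read a whole file into a byte buffer.
std::vector<unsigned char> load_file(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open file");
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Copy one int out of the buffer at a byte offset.
// ASSUMPTION: the bytes at that offset are an int in native format.
// Throws std::out_of_range if the int would run past the end.
int int_at(const std::vector<unsigned char>& buf, std::size_t offset)
{
    if (offset > buf.size() || buf.size() - offset < sizeof(int))
        throw std::out_of_range("offset past end of buffer");
    int value = 0;
    std::memcpy(&value, &buf[offset], sizeof value);
    return value;
}
```

    Since memcpy works at any offset, the position of an int inside the
    buffer does not matter.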

    > If it runs on windows 2k/xp/vista for most processors then
    > that's good enough for the time being. At this point I prefer
    > not to make life too difficult for myself and would prefer to
    > use the typecasting trick proposed by sebastian. Do you think
    > that it's safe "enough"?


    If you're only reading the data once, maybe. You still have to
    establish the format used by Fortran when the data was written,
    and you have to worry about alignment issues (although that is
    normally not a problem on an Intel).

    > As an added difficulty the fortran file is "record" oriented
    > not "stream" oriented (i don't know if I'm using the right
    > official terminology) which means that there's some
    > peculiarity about how I have to read the data; some records
    > contain only one int, others contain more. Since all the
    > records are 4*4 bytes long it means I have to skip around over
    > empty records/control info to read everything.


    In other words, if I understand correctly, the file written by
    Fortran has the following format:

    -- It is a sequence of records.
    -- Each record starts with an integer containing the length
       of the record, or some information allowing you to establish
       the length (record type, etc.).
    -- Any further information in the record takes the form of a
       sequence of integers.

    We still don't know what format the integers are in, of course,
    except that they are four bytes, but that's a good start. The
    probability that they aren't two's complement seems very, very
    small in practice, so really, the only issue is byte order.
    Still, you'll have to find out how they were written in the
    Fortran program, and then find out what that means for the
    format on disk (from the Fortran documentation).
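
    Purely as an illustration of walking such a file once it is in a byte
    buffer, here is a sketch under two loudly stated assumptions that the
    thread has not confirmed: every record is exactly 16 bytes (four
    4-byte fields), and each field is a little-endian two's complement
    integer. The names Record, decode_le_i32 and parse_records are
    hypothetical, and the real layout has to come from the Fortran
    documentation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: a record is four 4-byte integer fields (16 bytes total).
struct Record { std::int32_t field[4]; };

// ASSUMPTION: fields are little-endian, two's complement.
std::int32_t decode_le_i32(const unsigned char* p)
{
    std::uint32_t u = static_cast<std::uint32_t>(p[0])
                    | static_cast<std::uint32_t>(p[1]) << 8
                    | static_cast<std::uint32_t>(p[2]) << 16
                    | static_cast<std::uint32_t>(p[3]) << 24;
    return static_cast<std::int32_t>(u);
}

// Split a byte buffer into records. Trailing bytes that do not fill a
// whole record are ignored.
std::vector<Record> parse_records(const std::vector<unsigned char>& buf)
{
    const std::size_t kRecordSize = 16;
    std::vector<Record> records;
    for (std::size_t pos = 0; pos + kRecordSize <= buf.size(); pos += kRecordSize) {
        Record r;
        for (int i = 0; i < 4; ++i)
            r.field[i] = decode_le_i32(&buf[pos + 4 * i]);
        records.push_back(r);
    }
    return records;
}
```

    Deciding which fields of a record carry data and which are empty or
    control information would be a separate step on top of this.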

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, May 18, 2008
    #13