converting char to int (reading from a binary file)


itdevries

Hi,
I'm trying to read some binary data from a file. I've read a few bytes
of the data into a char array with ifstream, and I know that the first
4 bytes in the char array represent an integer. How do I go about
converting those elements to an integer?
regards, Igor
 

itdevries

That is a very bad idea, 'ptr' may not be correctly aligned. It would
be much better to supply the address of 'i' to the procedure that reads
the bytes, something like

int i = 0;
myfile.read(&i, sizeof(int));

V

thanks for your response. I'm not 100% sure I understand what you mean
by correctly aligned, would you mind clarifying? I also can't get your
code snippet to work; I get the following compile error:

"Error 1 error C2664: 'std::basic_istream<_Elem,_Traits>::read' :
cannot convert parameter 1 from 'int' to 'char *' "

kind regards,
Igor
 

itdevries

On some hardware, objects of certain sizes (like 'int') must reside in
memory at addresses with certain properties, for example addresses
divisible by the size of the object. On such systems a 'char' can lie on
an odd byte boundary, which is not necessarily acceptable for an 'int'
that needs an address divisible by, say, 4. An attempt to access such an
object (by dereferencing a pointer formed by casting a pointer to char)
can trigger a hardware exception.
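To make "correctly aligned" concrete, here is a small sketch of the check itself. It assumes a C++11 compiler (for alignof) and that std::uintptr_t exists, which it does on mainstream platforms; isAlignedForInt is a hypothetical helper, not a standard function.

```cpp
#include <cstdint>

// Returns true if p is suitably aligned to be accessed as an int on this
// platform. Converting the pointer to std::uintptr_t yields the plain
// address on mainstream compilers (strictly, the mapping is
// implementation-defined).
bool isAlignedForInt(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}
```

For a buffer declared with alignas(int), the buffer start passes this check, while buffer + 1 fails it whenever alignof(int) is greater than 1; casting buffer + 1 to int* and dereferencing it is exactly the access described above.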




You probably missed the '&'. Also, to convert a pointer to 'int' to a
pointer to 'char' you may need to use 'reinterpret_cast' (which I didn't
use).
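For reference, this is Victor's snippet with the reinterpret_cast he mentions, as a sketch. It assumes the file was written with the same int size, representation and byte order as the reading machine; readRawInt32 is a hypothetical name. The ifstream should be opened with std::ios::binary so that Windows does not translate line endings inside the data.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>   // std::istringstream, handy for testing without a file
#include <stdexcept>
#include <string>

// Reads sizeof(std::int32_t) raw bytes straight into an int32_t.
// istream::read takes a char*, so the int's address must be converted
// with reinterpret_cast. The bytes are used exactly as stored, so this
// is only correct if the writer used the same size, representation and
// byte order as the reading machine.
std::int32_t readRawInt32(std::istream& in)
{
    std::int32_t i = 0;
    if (!in.read(reinterpret_cast<char*>(&i), sizeof i)) {
        throw std::runtime_error("readRawInt32: short read");
    }
    return i;
}
```

Reading directly into the int (rather than into a char array and then casting) also sidesteps the alignment problem, since the int object is always properly aligned.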

V

Victor,
Many thanks for taking the time to explain!
I think I understand what you're saying. Do you know what the chances
are of this happening on a Win32 platform?
regards,
Igor
 

Jim Langston

itdevries said:
thanks for your response, it does the trick...
Igor

Be aware that, depending on your OS, this may break at times and not at
others, or it may always work. It depends mainly on your OS and whether it
requires integers to be aligned on specific byte boundaries. I know that
this works fine on Windows systems. I understand that wrong alignment will
break it on Sun systems.

If this is platform-specific for you, you will never run it on another
platform, and you're sure that your system won't break on misaligned
integers, it should be fine to use. If you ever plan on running the code on
another system, you'll need to do it another way.
 

James Kanze

That is a very bad idea, 'ptr' may not be correctly aligned.

Not to mention issues of size and representation. (As an
extreme case, I know of one machine which uses 6-byte signed
magnitude ints.)

The original poster didn't begin to give enough information with
regards to the input format for us to say, but if it's a
standard Internet protocol, then you read an int with something
like:

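#include <cstdint>
#include <istream>

std::int32_t
getInt( std::istream& source )
{
    // Cast each byte to uint32_t before shifting: shifting a plain int
    // holding a value >= 128 left by 24 overflows.
    std::uint32_t result = static_cast< std::uint32_t >( source.get() ) << 24 ;
    result |= static_cast< std::uint32_t >( source.get() ) << 16 ;
    result |= static_cast< std::uint32_t >( source.get() ) << 8 ;
    result |= static_cast< std::uint32_t >( source.get() ) ;
    return static_cast< std::int32_t >( result ) ;
}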

Except that you'd add some error handling. (And of course, if
you don't have int32_t and uint32_t---which are only present if
the hardware supports them directly, then the conversion from
unsigned to signed becomes more difficult as well.)
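One possible shape for that error handling, as a sketch: it assumes big-endian (network order) input, as in the function above, and that throwing std::runtime_error on a short read is acceptable for the caller. Each source.get() result is checked for EOF before it is used, since EOF would otherwise be folded into the value as 0xFFFFFFFF.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>   // std::istringstream, for testing without a file
#include <stdexcept>
#include <string>

// Reads one 32-bit integer stored in big-endian (network) byte order.
// Throws if fewer than four bytes are available.
std::int32_t getInt(std::istream& source)
{
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const int byte = source.get();          // 0..255, or EOF
        if (byte == std::char_traits<char>::eof()) {
            throw std::runtime_error("getInt: unexpected end of input");
        }
        result = (result << 8) | static_cast<std::uint32_t>(byte);
    }
    // Assumes two's complement for the unsigned-to-signed conversion:
    // guaranteed since C++20 and true in practice on mainstream compilers.
    return static_cast<std::int32_t>(result);
}
```

Throwing is only one design choice; setting the stream's failbit and letting the caller test the stream would fit the iostream conventions equally well.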
It would be much better to supply the address of 'i' to the
procedure that reads the bytes, something like
int i = 0;
myfile.read(&i, sizeof(int));

That doesn't work any better, really.
 

James Kanze

Be aware that, depending on your OS, this may break at times
and not at others, or it may always work. It depends mainly on
your OS and whether it requires integers to be aligned on
specific byte boundaries. I know that this works fine on
Windows systems. I understand that wrong alignment will break
it on Sun systems.
If this is platform-specific for you, you will never run it on
another platform, and you're sure that your system won't break
on misaligned integers, it should be fine to use.
If you ever plan on running the code on another system, you'll
need to do it another way.

It will also fail on an Intel if the ints are in the standard
Internet format. In general, you can only count on it working
if you are reading and writing from the same run of the same
program---I've seen cases where just recompiling with a newer
version of the compiler made it fail.
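To illustrate the byte-order point: the two hypothetical helpers below decode the same four bytes as big-endian and as little-endian. A raw read or memcpy into an int behaves like the second one on an Intel (little-endian) machine, so bytes written in Internet (big-endian) order come out swapped.

```cpp
#include <cstdint>

// Interprets b[0..3] as a big-endian (Internet order) 32-bit value.
std::uint32_t fromBigEndian(const unsigned char b[4])
{
    return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
         | (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]);
}

// Interprets b[0..3] as a little-endian 32-bit value, which is what a
// plain byte copy into an int yields on Intel hardware.
std::uint32_t fromLittleEndian(const unsigned char b[4])
{
    return (std::uint32_t(b[3]) << 24) | (std::uint32_t(b[2]) << 16)
         | (std::uint32_t(b[1]) << 8)  |  std::uint32_t(b[0]);
}
```

For the bytes 01 02 03 04, fromBigEndian gives 0x01020304 and fromLittleEndian gives 0x04030201.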
 

sebastian

That doesn't work any better, really.

That's because istream::read expects a pointer to char (you must cast
it). But as others have already pointed out, there are many problems
with these sorts of casts. There are serialization libraries available
(such as Boost.Serialization) designed specifically for this purpose, in
case you really want to get the job done right...
 

coal

I agree that Boost.Serialization will produce correct results in this
case, but it may not produce those results efficiently; see
http://webEbenezer.net/comparison.html

Brian Wood
Ebenezer Enterprises
www.webEbenezer.net
 

James Kanze


Do tell what? To begin with, it won't compile without a
reinterpret_cast (which is a very good sign that something is
wrong with it). And it still ignores all issues of size and
representation.
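If one does use the raw cast anyway, the size and representation assumptions can at least be written down in code, so that a platform violating them fails to build instead of silently misreading data. A sketch assuming C++11; hostIsLittleEndian is a hypothetical helper (C++20 offers std::endian in <bit> for the same purpose).

```cpp
#include <climits>
#include <cstdint>
#include <cstring>

// Assumptions a raw binary read of a 4-byte int silently depends on.
static_assert(CHAR_BIT == 8, "expects 8-bit bytes");
static_assert(sizeof(int) == 4, "expects a 4-byte int");
static_assert((-1 & 3) == 3, "expects two's complement integers");

// Byte order cannot be checked with static_assert before C++20, so test
// it at run time: on a little-endian host the low-order byte comes first.
bool hostIsLittleEndian()
{
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

The program could then refuse to run, or fall back to explicit shift-based decoding, when hostIsLittleEndian() does not match the byte order of the file.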
 

itdevries

--
James Kanze (GABI Software) email:[email protected]
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Hi James,
thanks for taking the time to respond, I really appreciate it.

The code is intended to read data from a file generated by a Fortran
program and will only ever be run on Windows machines. I don't need a
mega portable/robust app, just a program that will extract data for a
particular version of the file. If it runs on Windows 2k/XP/Vista for
most processors, that's good enough for the time being. At this point I'd
prefer not to make life too difficult for myself and would rather use the
typecasting trick proposed by sebastian. Do you think that it's safe
"enough"?

As an added difficulty, the Fortran file is "record" oriented, not
"stream" oriented (I don't know if I'm using the right official
terminology), which means there's some peculiarity about how I have to
read the data: some records contain only one int, others contain more.
Since all the records are 4*4 bytes long, I have to skip around over
empty records/control info to read everything.

kind regards,
Igor
 

James Kanze

thanks for taking the time to respond, I really appreciate it.
The code is intended to read data from a file generated by a
Fortran program and will only ever be run on Windows machines.
I don't need a mega portable/robust app, just need a program
that will extract data for a particular version of the file.

Well, the first thing you need is a specification of how the
Fortran program wrote the data:).

Beyond that: you may have raised a case that I've generally
forgotten when talking about reading and writing binary data:
migration. You've got some old data, written in some format
(binary dump of the bytes in memory?), and you want to migrate
it to an established format for a new program. You're only
going to read the old data once, so worrying about what might
happen in some future version of the compiler, or some future
machine, is really irrelevant. In this case, if the code was
written using a simple binary dump of the bytes in memory, and
you can compile with something for which you're sure that the
binary images will be identical (far from obvious if it was
written in Fortran and you're reading it in C++), then the
simplest solution is to read the data into a "byte buffer" (an
std::vector< unsigned char > is what I usually use for this),
and memcpy the individual elements out of it. (This solves the
alignment problem, which may or may not be present, depending on
how the bytes were written.)
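A sketch of the byte-buffer approach described above: read the file into a std::vector<unsigned char>, then memcpy each value out at its offset. readFile and int32At are hypothetical names, and the values are taken in host byte order, which assumes the Fortran program wrote them in the same representation as the reading machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Reads an entire file into a byte buffer (binary mode, so Windows does
// not translate line endings).
std::vector<unsigned char> readFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Copies a 32-bit value out of the buffer at any byte offset. memcpy into
// a local int32_t avoids the alignment problem; the bytes are taken in the
// host's byte order and representation.
std::int32_t int32At(const std::vector<unsigned char>& buf, std::size_t offset)
{
    if (offset > buf.size() || buf.size() - offset < sizeof(std::int32_t))
        throw std::out_of_range("int32At: read past end of buffer");
    std::int32_t value;
    std::memcpy(&value, buf.data() + offset, sizeof value);
    return value;
}
```

Because int32At accepts any offset, skipping over records or control information reduces to adding the appropriate byte count to the offset.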
If it runs on Windows 2k/XP/Vista for most processors then
that's good enough for the time being. At this point I prefer
not to make life too difficult for myself and would prefer to
use the typecasting trick proposed by sebastian. Do you think
that it's safe "enough"?

If you're only reading the data once, maybe. You still have to
establish the format used by Fortran when the data was written,
and you have to worry about alignment issues (although that is
normally not a problem on an Intel).
As an added difficulty the Fortran file is "record" oriented
not "stream" oriented (I don't know if I'm using the right
official terminology) which means that there's some
peculiarity about how I have to read the data; some records
contain only one int, others contain more. Since all the
records are 4*4 bytes long it means I have to skip around over
empty records/control info to read everything.

In other words, if I understand correctly, the file written by
Fortran has the following format:

-- It is a sequence of records.
-- Each record consists of an integer, containing the length
of the record, or some information allowing you to establish
the length (record type, etc.)
-- Any further information in the record takes the form of a
sequence of integers.

We still don't know what format the integers are in, of course,
except that they are four bytes, but that's a good start. The
probability that they aren't two's complement seems very, very
small in practice, so really, the only issue is byte order.
Still, you'll have to find out how they were written in the
Fortran program, and then find out what that means for the
format on disk (from the Fortran documentation).
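A sketch of reading such a file record by record. It assumes a layout that the description above does not pin down: each record's payload is framed by a 4-byte length marker before and after it, with 4-byte two's complement integers in little-endian order. That is what gfortran writes by default for unformatted sequential files (in native byte order, little-endian on x86), but the compiler that produced this file may differ, so check its documentation first. readRecord, skipRecord and readLE32 are hypothetical helpers.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>   // std::istringstream, for testing without a file
#include <stdexcept>
#include <string>
#include <vector>

// ASSUMED layout (gfortran's default; verify for your compiler):
//   [4-byte length N][N bytes of payload][4-byte length N again]

// Reads one 4-byte little-endian unsigned value.
static std::uint32_t readLE32(std::istream& in)
{
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4))
        throw std::runtime_error("unexpected end of file");
    return std::uint32_t(b[0]) | (std::uint32_t(b[1]) << 8)
         | (std::uint32_t(b[2]) << 16) | (std::uint32_t(b[3]) << 24);
}

// Reads one record, treating its payload as 32-bit integers.
// Returns false at a clean end of file (no further record).
bool readRecord(std::istream& in, std::vector<std::int32_t>& values)
{
    if (in.peek() == std::char_traits<char>::eof())
        return false;
    const std::uint32_t length = readLE32(in);
    if (length % 4 != 0)
        throw std::runtime_error("payload is not a whole number of ints");
    values.clear();
    for (std::uint32_t k = 0; k < length / 4; ++k)
        values.push_back(static_cast<std::int32_t>(readLE32(in)));
    if (readLE32(in) != length)
        throw std::runtime_error("trailing record marker does not match");
    return true;
}

// Steps over one record (e.g. control information) without decoding it.
bool skipRecord(std::istream& in)
{
    if (in.peek() == std::char_traits<char>::eof())
        return false;
    const std::uint32_t length = readLE32(in);
    in.ignore(length);
    if (readLE32(in) != length)
        throw std::runtime_error("trailing record marker does not match");
    return true;
}
```

Comparing the leading and trailing markers is a cheap sanity check: if the assumed layout is wrong, it usually fails on the very first record instead of producing plausible-looking garbage.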
 
