Reading binary file into memory. Converting from char to uint32, float, double, ASCII strings etc (st

someone

Hi,

Here's my program:

==========================
#include <fstream>
#include <iostream>
#include <string>
#include <sstream>
#include <stdio.h>
//#include <boost/cstdint.hpp> // <-- I know this is not the group to ask in, maybe unnecessary

using namespace std;


int main( int argc, const char* argv[] )
{
    ifstream::pos_type fsize;
    char * memblock;
    ifstream infile;

    cout << "Program to read SENSOR file (for FLEX5). Build on: " <<
        __DATE__ << " / " << __TIME__ << endl;

    infile.open("DELETEME.res", ios::in|ios::binary);

    if (infile.is_open())
    {
        infile.seekg (0, ios::end); // go to end of file
        fsize = infile.tellg();
        infile.seekg (0, ios::beg); // go to beginning

        cout << "Size of file is: " << fsize << " bytes." << endl;
        memblock = new char [fsize];

        infile.read (memblock, fsize);
        infile.close();
        //cout << "the complete file content is in memory" << endl;
    }
    else
        cout << "Unable to open file" << endl;

    cout << endl; // delimiter before program outputs read data

    int R1_begin;
    R1_begin = static_cast<typeof R1_begin> (memblock[0]);//, 1, 'uint32',endianNess);
    cout << "Value = " << R1_begin << endl;

    delete[] memblock;
    return 0;
}
==========================

Problem:

R1_begin is the first value read from a binary file. But it's wrong!
It should be 72 but the program prints 49 out to the screen. So I
cannot continue before I get this right. The first few bytes I'm going
to read are:

R1begin = 1 value times sizeof 'uint32' = 4 bytes
I1_I6 = 6 values times sizeof 'uint32' = 6x4 = 24 bytes
TEXT = 40 ASCII characters of type 'char' = 40 bytes
R1end = 1 value times sizeof 'uint32' = 4 bytes

Total bytes read = 4+24+40+4 = 72 bytes. That's why I must read 72
bytes from the very first byte in the binary file... And I cannot
solve this! :-(

Please help (and please provide hints for reading the next few bytes
and how I should convert from data in memory into different data
types). Later I also need to convert some bytes from memblock[ ] into
floats and doubles...

I'm stuck...

And what about big endian / little endian? I'm confused. Looking
forward to hearing from you!
 

Ian Collins

int R1_begin;
R1_begin = static_cast<typeof R1_begin> (memblock[0]);//, 1, 'uint32',endianNess);

You can't just cast an array of char to an int and expect the right
result (unless the data is written in the same byte order as your machine).

The most convenient way to convert the byte order is to use the network
ordering functions (if your platform has them). See ntohl.

And what about big endian / little endian? I'm confused. Looking
forward to hearing from you!

Google "endian"
 

someone

On 10/16/11 08:00 AM, someone wrote:
   int R1_begin;
   R1_begin = static_cast<typeof R1_begin>  (memblock[0]);//, 1, 'uint32',endianNess);

You can't just cast an array of char to an int and expect the right
result (unless the data is written in the same byte order as your machine).

So how should I do it?
[to begin with, we can just assume that the data is written in the
same byte order and see if that works...]

The most convenient way to convert the byte order is to use the network
ordering functions (if your platform has them).  See ntohl.

Sorry, I have absolutely no idea about any network functions...
I was (pretty) sure that somebody had some code they could show here?
This "ntohl" looks a bit confusing to me - besides that, it's not
standard C++.
Google "endian"

I did it. But what I meant is, what are your thoughts about it, when
reading a binary file like shown in my code?
 

Richard Damon

Hi,

Here's my program: ....
int R1_begin;
R1_begin = static_cast<typeof R1_begin> (memblock[0]);//, 1, 'uint32',endianNess);
cout<< "Value = "<< R1_begin<< endl;

delete[] memblock;
return 0;
}
==========================

Problem:

R1_begin is the first value read from a binary file. But it's wrong!
It should be 72 but the program prints 49 out to the screen. So I
cannot continue before I get this right. The first few bytes I'm going
to read are:

R1begin = 1 value times sizeof 'uint32' = 4 bytes
I1_I6 = 6 values times sizeof 'uint32' = 6x4 = 24 bytes
TEXT = 40 ASCII characters of type 'char' = 40 bytes
R1end = 1 value times sizeof 'uint32' = 4 bytes

Total bytes read = 4+24+40+4 = 72 bytes. That's why I must read 72
bytes from the very first byte in the binary file... And I cannot
solve this! :-(

Please help (and please provide hints for reading the next few bytes
and how I should convert from data in memory into different data
types). Later I also need to convert some bytes from memblock[ ] into
floats and doubles...

I'm stuck...

And what about big endian / little endian? I'm confused. Looking
forward to hearing from you!
First, what is the first byte in the file? You are saying that you are
expecting it to be 72 but you are getting 49. Have you looked at the file?

Next you talk about getting R1begin which should be 4 bytes, but you are
setting it to the value of the first byte of the file (memblock[0] is
the first byte of the file). The static_cast to int is unneeded, as char
will convert to int implicitly.

Lastly, you are saying that you are expecting to read 72 bytes at the
beginning of the file, and also that you are expecting to get the value
72 from the beginning of the file, implying that R1begin is a byte count
for the record, but then you don't describe the record format depending on
that value at all, and then only look at 1 byte for that info, when you
describe it as a 4 byte number.

If R1_begin isn't a data length field, why are you expecting it to have
the value 72?
 

Ebenezer

   int R1_begin;
   R1_begin = static_cast<typeof R1_begin>  (memblock[0]);//,1, 'uint32',endianNess);
You can't just cast an array of char to an int and expect the right
result (unless the data is written in the same byte order as your machine).

So how should I do it?
[to begin with, we can just assume that the data is written in the
same byte order and see if that works...]
The most convenient way to convert the byte order is to use the network
ordering functions (if your platform has them).  See ntohl.

Sorry, I have absolutely no idea about any network functions...
I was (pretty) sure that somebody had some code they could show here?
This "ntohl" looks a bit confusing to me - besides that, it's not
standard C++.

I think these files may be what you're looking for:
http://webEbenezer.net/misc/Formatting.hh
http://webEbenezer.net/misc/ReceiveBuffer.hh

They are part of this archive --
http://webEbenezer.net/misc/direct.tar.bz2 .

Brian Wood
Ebenezer Enterprises
www.webEbenezer.net
 

Ian Collins

On 10/16/11 08:00 AM, someone wrote:
int R1_begin;
R1_begin = static_cast<typeof R1_begin> (memblock[0]);//, 1, 'uint32',endianNess);

You can't just cast an array of char to an int and expect the right
result (unless the data is written in the same byte order as your machine).

So how should I do it?
[to begin with, we can just assume that the data is written in the
same byte order and see if that works...]
The most convenient way to convert the byte order is to use the network
ordering functions (if your platform has them). See ntohl.

Sorry, I have absolutely no idea about any network functions...
I was (pretty) sure that somebody had some code they could show here?
This "ntohl" looks a bit confusing to me - besides that, it's not
standard C++.

They are typically macros, just look them up if you want to see how they
work.
I did it. But what I meant is, what are your thoughts about it, when
reading a binary file like shown in my code?

I use the network byte order functions, that's what they are there for.
 

Martin Jørgensen

On 10/15/11 3:00 PM, someone wrote:


First, what is the first byte in the file? You are saying that you are
expecting it to be 72 but you are getting 49. Have you looked at the file?

Good point. The first four bytes of the input file have the values
[048h 00h 00h 00h] (hexadecimal; I used a hex editor to see this), and
0x48 = 72 decimal... I don't even know if this is little endian or
not, but what I find very strange is the number 49... That number
makes no sense to me (why is it adding 1, if it's printing the number
in hexadecimal? What's it thinking?)
Next you talk about getting R1begin which should be 4 bytes, but you are
setting it to the value of the first byte of the file (memblock[0] is
the first byte of the file). The static_cast to int is unneeded, as char
will convert to int implicitly.

Oops, sorry. You're right. I should take the first 4 bytes and convert
them into the correct integer format (which I think is a 32-bit unsigned
int, but I'll have to test it). However, I still don't know exactly how
to do the conversion. Sorry, I'm a bit inexperienced and it took quite
a few hours to produce the first code in the original post :-(
Lastly, you are saying that you are expecting to read 72 bytes at the
beginning of the file, and also that you are expecting to get the value
72 from the beginning of the file, implying that R1begin is a byte count
for the record, but then you don't describe the record format depending on
that value at all, and then only look at 1 byte for that info, when you
describe it as a 4 byte number.

Exactly.

You're right. I gave up on that last part and concluded that I needed some
help. I was trying to get inspired by some template<T> code I found.
However, that code only worked for reading directly from the file, and
now I have copied the whole file into memory, in a buffer... The correct
thing to do is to take out the first 4 bytes and then convert them
into the relevant type. I'm currently struggling with that.
If R1_begin isn't a data length field, why are you expecting it to have
the value 72?

You're completely right. It is the data length field and I expect to
read in the value of 72 because: Total bytes read = 4+24+40+4 = 72
bytes. You don't happen to have some genius template that works for
this kind of problem, do you?

I'm sorry, but I still don't get the right value of 72. Here are 2 new
attempts, and I don't quite understand the output:


int R1_begin;
R1_begin = *(reinterpret_cast<char*> (&memblock));
cout << "Value = " << R1_begin << endl; // writes out: Value = 96 ?? why ?


int R1_begin; // makes no difference if I write unsigned int R1_begin here!
memcpy(&R1_begin, &memblock[0], sizeof(R1_begin));
cout << "Value = " << R1_begin << endl; // writes out: Value = 875770417 ?? why ?


Once I understand this and get it right, I'll look into the endian
thing (I'm still not sure whether I have an endian problem or not)...
 

Martin Jørgensen

On 10/16/11 10:17 AM, someone wrote:


They are typically macros, just look them up if you want to see how they
work.
Ok.



I use the network byte order functions, that's what they are there for.

Hmm... Based on my output to Richard, do I then have an endian
problem? Sorry, I'm a bit too inexperienced to fully understand this
yet. A few replies and I think I'll understand it...

Thanks a lot for your (you, Richard and Ebenezer) help so far! :)
 

someone

Sorry if it caused any confusion, but I'm the original poster (I had
two browser windows open with 2 different gmail accounts).
I now found out that my system is little endian. I also think the data
in the file is stored in little endian format (hex editor shows the
first four bytes: [048h 00h 00h 00h]) and that should correspond to an
integer value of 72, even though I cannot get it right yet... Guess
I'm casting incorrectly from char* to int.
 

someone

integer value of 72, even though I cannot get it right yet... Guess
I'm casting incorrectly from char* to int.

Oooops... Stupid me! I had commented out the correct file name and used
a much smaller working file in the line: infile.open("DELETEME.res",
ios::in|ios::binary)

--- sorry guys, maybe it's too late to work for me now! :)


Last question for today and I shall soon close the thread: Can I ask
you which of the following you think is the best to use (or suggest
alternatives)?


if (0) // my way of switching between alternate "solutions" when I want both in the code
    R1_begin = *(reinterpret_cast<unsigned char*> (&memblock[0]));//, 1, 'uint32',endianNess);
else
    memcpy(&R1_begin, &memblock[0], sizeof(R1_begin));


?
 

Ian Collins

On 10/15/11 3:00 PM, someone wrote:


First, what is the first byte in the file? You are saying that you are
expecting it to be 72 but you are getting 49. Have you looked at the file?

Good point. The first four bytes of the input file have the values
[048h 00h 00h 00h] (hexadecimal; I used a hex editor to see this), and
0x48 = 72 decimal... I don't even know if this is little endian or
not, but what I find very strange is the number 49... That number
makes no sense to me (why is it adding 1, if it's printing the number
in hexadecimal? What's it thinking?)

If you are on a 32 bit little endian (x86) machine, this will work:

#include <iostream>

int main()
{
    char tmp[4] = {0x48,0,0,0};

    std::cout << *reinterpret_cast<int*>(tmp) << std::endl;

    return 0;
}

If not, you will have to build the integer value from the individual
bytes. For example if the data were written on a big endian machine:

char be[4] = {0,0,0,0x48};

int n = be[3] + (be[2]<<8) + (be[1]<<16) + (be[0]<<24);

std::cout << n << std::endl;
 

someone

On 10/16/11 11:39 AM, Martin Jørgensen wrote:
If you are on a 32 bit little endian (x86) machine, this will work:

#include <iostream>

int main()
{
   char tmp[4] = {0x48,0,0,0};

   std::cout << *reinterpret_cast<int*>(tmp) << std::endl;

   return 0;

}

If not, you will have to build the integer value from the individual
bytes.  For example if the data were written on a big endian machine:

   char be[4] = {0,0,0,0x48};

   int n = be[3] + (be[2]<<8) + (be[1]<<16) + (be[0]<<24);

   std::cout << n << std::endl;


THANKS A LOT TO ALL WHO HELPED! :)

I'll get back if I get problems with reading the rest of the file,
however I think (I hope) everything will work out now :)
 

Richard Damon

integer value of 72, even though I cannot get it right yet... Guess
I'm casting incorrectly from char* to int.

Oooops... Stupid me! I had commented out the correct file name and used
a much smaller working file in the line: infile.open("DELETEME.res",
ios::in|ios::binary)

--- sorry guys, maybe it's too late to work for me now! :)


Last question for today and I shall soon close the thread: Can I ask
you which of the following you think is the best to use (or suggest
alternatives)?


if (0) // my way of switching between alternate "solutions" when I want both in the code
    R1_begin = *(reinterpret_cast<unsigned char*> (&memblock[0]));//, 1, 'uint32',endianNess);
else
    memcpy(&R1_begin,&memblock[0], sizeof(R1_begin));


?

The first doesn't do what you want. &memblock[0] is already a char* so
casting it to unsigned char* and dereferencing isn't going to make a big
change. If you did a cast to unsigned int*, that would be different.


You also shouldn't use reinterpret_cast here, static_cast is good
enough, and is actually what you want.

The memcpy is better, as the cast has the danger that some fields might
not line up on the right word boundaries in your data buffer. It also
means you can later swap in the network order calls to make your program
more portable.
 

Pavel

I think previous responders answered most of your questions. I just noticed a few
other things that may go wrong with your code sooner or later:

1. The return value of infile.read() is not checked.
2. memblock was not initialized, so delete[] can coredump (I would just use
std::vector, or unique_ptr with an array deleter if implemented and I cared about
performance).

-Pavel
 

Richard Damon

Hmm... Based on my output to Richard, do I then have an endian
problem? Sorry, I'm a bit too inexperienced to fully understand this
yet. A few replies and I think I'll understand it...

Thanks a lot for your (you, Richard and Ebenezer) help so far! :)

Your data in the file is little endian. You have a problem if your
processor is not little endian.
 

someone

Your data in the file is little endian. You have a problem if your
processor is not little endian.

Yep... Currently I think there's no problem (based on the first
unsigned integer, which is 4 bytes so now I don't want to learn more
than I can absorb) :)
 

someone

someone wrote:
I think previous responders answered most of your questions. I just noticed a few
other things that may go wrong with your code sooner or later:

1. The return value of infile.read() is not checked.

Ah, thanks a lot. I changed it into this:

if (infile.is_open())
{
    infile.seekg (0, ios::end); // go to end of file
    fsize = infile.tellg();
    infile.seekg (0, ios::beg); // go to beginning

    cout << "Size of file is: " << fsize << " bytes." << endl;
    memblock = new char[fsize];

    infile.read (memblock, fsize);
    if (!infile)
    {
        cout << "Some error occurred while reading the file, after: " <<
            infile.gcount() << " bytes..." << endl;
        return(1);
    }
    infile.close();
    //cout << "the complete file content is in memory" << endl;
}
else
    cout << "Unable to open file" << endl;

2. memblock was not initialized, so delete[] can coredump (I would just use
std::vector, or unique_ptr with an array deleter if implemented and I cared about
performance).

Huh? I don't understand this. Even though it was not initialized, it
has been allocated and therefore I try to clean up in the end. With
the command:

infile.read (memblock, fsize);

I guess that I'm initializing it... Or what do you suggest?
I can't see why it should coredump, so please advise, and thanks :)
 

Ian Collins

Yep... Currently I think there's no problem (based on the first
unsigned integer, which is 4 bytes so now I don't want to learn more
than I can absorb) :)

If you are happy that the file's number representations match your
machine's and portability isn't an issue, why don't you just stream the
values in directly rather than buggering about with a buffer?

int R1_begin;

infile >> R1_begin;
 

Stefan Ram

someone said:
infile.open("DELETEME.res", ios::in|ios::binary);
(...) infile.seekg (0, ios::end); // go to end of file

»seekg« seems to be defined in ISO/IEC 14882:2011
(in 27.7.2.3p40) using »pubseekoff« , which is defined
(in 27.6.3.2.2p2) in terms of »seekoff« , which is defined
(in 27.6.3.4.2p3) based on »::std::fseek« for file-based streams
(in 27.9.1.5p13); and ISO/IEC 9899:1990 explains about »fseek«:

»A binary stream need not meaningfully support
fseek calls with a whence value of SEEK_END.«
 
