Data storage/retrieval and dynamically allocating memory


Joe Estock

I have been tinkering with a project that stores data in binary files. The
number of records it stores is not known at compile time. For example, refer
to the ZIP file structure: there is basically a header describing each of the
files and their location within the archive, followed by the actual contents
of those files.

My header looks something like this:

struct myStruct {
    UINT uiNumItems;
    UINT *pStart;
    UINT *pSize;
};

uiNumItems is the total number of items that will be stored in the custom
binary file (let's say for the moment that they are png files). pStart holds
the starting location in my custom binary file where each png file is
written, and pSize holds the file size of each png image.

What I would like to do is the following:

myStruct ms;
int nNumItems = 20;

ms.pStart = new UINT[nNumItems];
ms.pSize = new UINT[nNumItems];

// here I will set the contents of my struct
// now I will write that struct to a binary file

delete[] ms.pStart;
delete[] ms.pSize;

My question is this: Is this the most elegant way to do this? I want to
implement a way to do this with the least amount of overhead possible.
Additionally, am I making proper use of new and delete[]?
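For illustration, the "write that struct to a binary file" step could look roughly like the sketch below (assuming C stdio and the myStruct/UINT definitions above). The struct has to be written field by field; dumping it with a single fwrite would store the raw pointer values of pStart and pSize, which mean nothing on disk. Note that this writes the host's native byte order and integer size, which matters for portability (discussed further down the thread).

#include <cstdio>

// Sketch only: write the header field by field, then the png data.
void writeHeader(std::FILE* fp, const myStruct& ms)
{
    std::fwrite(&ms.uiNumItems, sizeof(ms.uiNumItems), 1, fp);
    std::fwrite(ms.pStart, sizeof(UINT), ms.uiNumItems, fp);
    std::fwrite(ms.pSize, sizeof(UINT), ms.uiNumItems, fp);
    // ... the actual png contents would follow here ...
}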

Thanks in advance :)
 

Joe Estock

One more thing I forgot to add...Judging from my implementation, would this
be cross-platform compatible? i.e., will the file that is output from one
system be able to be read on another system without modifications? My main
concern is that the size of an int (or UINT, etc.) on my system may be
totally different on another system. Even if the size of a UINT is one byte
larger than on my system, this could drastically change the way I interpret
the data.
 

Karl Heinz Buchegger

Joe said:
My question is this: Is this the most elegant way to do this?

Beauty (and elegance) is in the eye of the beholder :)

But yes, it's a way to do the job.

I would have done it a little bit differently:

struct OneItem
{
    UINT Start;
    UINT Size;
};

struct Header
{
    UINT uiNumItems;
    OneItem* pItems;
};

Header MyHeader;

MyHeader.uiNumItems = 20;
MyHeader.pItems = new OneItem[ MyHeader.uiNumItems ];

so that the information describing one item stays together,
but in principle your approach would work too.
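A rough sketch of filling in and later releasing that layout (the zero values are just placeholders for the real offsets and sizes):

for (UINT i = 0; i < MyHeader.uiNumItems; ++i)
{
    MyHeader.pItems[i].Start = 0;   // real file offset of item i goes here
    MyHeader.pItems[i].Size  = 0;   // real size of item i goes here
}

// ... write the header and the item data ...

delete[] MyHeader.pItems;           // a single delete[] releases everything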

I want to implement a way to do this with the least amount of overhead possible.
Additionally, am I making proper use of new and delete[]?

Nothing wrong with that.
 

Karl Heinz Buchegger

Joe said:
One more thing I forgot to add...Judging from my implementation, would this
be cross-platform compatible? i.e., will the file that is output from one
system be able to be read on another system without modifications?

Depends on the hardware, operating system and/or compiler system actually
used on the target machine.
My main concern is that the size of an int (or UINT, etc.) on my system may be
totally different on another system.

Yep. That's one problem.
Even if the size of a UINT is one byte larger than on my system, this could
drastically change the way I interpret the data.

You've figured out what problems await you when transporting binary
files between different computers.

Other problems:
* endianness (low byte first vs. high byte first)
* actual floating point format used (there are lots of them around)
* character codes (e.g. ASCII vs. EBCDIC)
...

All of these problems can be solved, though some of them are really tricky.
E.g. endianness: either you nail your binary format down and require, say,
low byte first, or you add additional information to the header which tells
the file reader in which byte order the file was written, and the reader has
to account for that.
The floating point format problem is actually harder to solve. In principle
you again have the same two options, but converting from one format to
another may become a pain in the ass.
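As a sketch of the second option: the writer could store a known marker value in the header, and the reader could check which byte order it comes back in. The names and the constant 0x01020304 here are just made up for illustration, and the separate integer-size problem is ignored for the moment:

#include <cstdio>

const unsigned long kByteOrderMark = 0x01020304UL;

// Writer: store the marker in the machine's native byte order.
void writeByteOrderMark(std::FILE* fp)
{
    std::fwrite(&kByteOrderMark, sizeof(kByteOrderMark), 1, fp);
}

// Reader: if the marker does not read back as 0x01020304, the file
// was written on a machine with a different byte order, and every
// integer read from the file has to have its bytes swapped.
bool fileNeedsByteSwap(std::FILE* fp)
{
    unsigned long marker = 0;
    std::fread(&marker, sizeof(marker), 1, fp);
    return marker != kByteOrderMark;
}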
 

Peter van Merkerk

One more thing I forgot to add...Judging from my implementation, would this
be cross-platform compatible? i.e., will the file that is output from one
system be able to be read on another system without modifications? My main
concern is that the size of an int (or UINT, etc.) on my system may be
totally different on another system. Even if the size of a UINT is one byte
larger than on my system, this could drastically change the way I interpret
the data.

No, it won't be cross-platform compatible. The size of the UINT (I am
assuming this is a typedef for unsigned int; it is not a standard type)
is only one concern. The standard only guarantees the minimum size of the
primitive data types, but does not guarantee their exact size. To get
around the size differences, a pragmatic approach would be typedefs in
combination with conditional compilation:

#if defined(PLATFORM_1)
typedef unsigned long UINT32;
#elif defined(PLATFORM_2)
typedef unsigned __int32 UINT32;
#else
#error Unsupported platform.
#endif

Another concern is endianness: some platforms store the most significant
byte of an integer first (big endian), others store the least significant
byte first (little endian).

It is not a good idea to let the file format be a side effect of your
code. I recommend you define a file format first, and then write the code
that reads and writes that format. When you define the file format you will
have to choose between little endian and big endian byte order, and decide
how many bytes an integer occupies.
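As a sketch of that approach: once the format says, for example, "32-bit unsigned integers, least significant byte first", reading and writing can be done byte by byte so the host's native byte order never matters. This assumes the UINT32 typedef from above; the helper names are made up:

#include <cstdio>

// Write a UINT32 as 4 bytes, least significant byte first.
void write_uint32_le(std::FILE* fp, UINT32 value)
{
    unsigned char bytes[4];
    bytes[0] = (unsigned char)( value        & 0xFF);
    bytes[1] = (unsigned char)((value >>  8) & 0xFF);
    bytes[2] = (unsigned char)((value >> 16) & 0xFF);
    bytes[3] = (unsigned char)((value >> 24) & 0xFF);
    std::fwrite(bytes, 1, 4, fp);
}

// Read a UINT32 that was written least significant byte first.
UINT32 read_uint32_le(std::FILE* fp)
{
    unsigned char bytes[4] = { 0, 0, 0, 0 };
    std::fread(bytes, 1, 4, fp);
    return  (UINT32)bytes[0]
         | ((UINT32)bytes[1] <<  8)
         | ((UINT32)bytes[2] << 16)
         | ((UINT32)bytes[3] << 24);
}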
 

Joe Estock

Karl Heinz Buchegger said:
Depends on the hardware, operating system and/or compiler system actually
used on the target machine.


Yep. That's one problem.


You figured out what problems await you when transporting binary
files on different computers.

Other problems:
* endianness (low byte first vs. high byte first)

This is one of the things that I had not taken into account. How do I
determine the endianness of my system? Or is it something that is common
knowledge, depending on the type of system I am on?
* actual floating point format used (there are lots of them around)
* character codes (e.g. ASCII vs. EBCDIC)
...

All of these problems can be solved, though some of them are really tricky.
E.g. endianness: either you nail your binary format down and require, say,
low byte first, or you add additional information to the header which tells
the file reader in which byte order the file was written, and the reader has
to account for that.
The floating point format problem is actually harder to solve. In principle
you again have the same two options, but converting from one format to
another may become a pain in the ass.

How would I accomplish this? E.g., how would I force the low byte to be
written first, or vice versa?
 

lilburne

Joe said:
How would I accomplish this? E.g., how would I force the low byte to be
written first, or vice versa?

This makes ints little endian if the second parameter is true,
otherwise it makes them big endian.

#include <cassert>

int make_little_endian(int n, bool little)
{
    // this assumes that int is 4 bytes
    assert(sizeof(int) == 4);

    // Check the endianness of the current machine:
    // on a little endian machine the value 1 is stored with its
    // low byte in the lowest memory address, so dereferencing
    // the first byte of int(1) yields 1.
    const int t = 1;
    const bool machine_is_little = (*(const char*)&t == 1);

    unsigned int u = unsigned(n);

    // reverse the byte order if the machine's endianness is not
    // the one we want
    if (little != machine_is_little) {
        u =  (u >> 24)
          | ((u >>  8) & 0x0000FF00u)
          | ((u <<  8) & 0x00FF0000u)
          |  (u << 24);
    }
    return int(u);
}
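Presumably every integer would then be run through that function just before writing and just after reading, e.g. with the header from the original post (assuming fp is an open FILE*, <cstdio> is included, and UINT is 4 bytes):

UINT start = (UINT) make_little_endian((int) ms.pStart[i], true);
fwrite(&start, sizeof(start), 1, fp);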
 

Karl Heinz Buchegger

Joe said:
This is one of the things that I had not taken into account. How do I
determine the endianness of my system? Or is it something that is common
knowledge, depending on the type of system I am on?

There are ways to determine the endianness, but most of the time one simply
uses common knowledge. Endianness is usually fixed at the CPU level, although
there are CPUs which can be switched at runtime :)
How would I accomplish this? E.g., how would I force the low byte to be
written first, or vice versa?

That's not the issue I am talking about. A double may, for example, consist of
8 bytes. On machine A the floating point format may place the exponent
(expressed as a power of 10) in byte 4. On machine B the floating point
format may be completely different: the exponent (this time expressed as a
power of 2) resides in the 7th byte. On machine C the situation is radically
different: it doesn't have an exponent at all!

OK, the above is exaggerated, but the point is: you need to find documentation
on how your compiler and the target compiler store floating point values.
 
