canonical way for handling raw data

Matthias Czapla

Hi!

What's the canonical way of handling raw data? I want to read a file without
making any assumptions about its structure, store portions of it in memory,
and compare ranges with constant byte sequences. _I_ would read it
into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
novice C++ programmer and think that there's some better, typically used,
way.

Regards
lal
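
For reference, a minimal sketch of the comparison described in the question, assuming the data has already been read into a std::vector<unsigned char>. The helper names matches_at and matches_at_memcmp are made up for illustration. Both variants do the same job, so memcmp() remains a reasonable choice in C++.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: true if the n bytes at `pattern` occur in `data`
// starting at `offset`.
bool matches_at(const std::vector<unsigned char>& data, std::size_t offset,
                const unsigned char* pattern, std::size_t n)
{
    assert(pattern != nullptr || n == 0);
    // Bounds check first, written so that offset + n cannot overflow.
    if (offset > data.size() || n > data.size() - offset) return false;
    return std::equal(pattern, pattern + n, data.begin() + offset);
}

// The same check written with memcmp().
bool matches_at_memcmp(const std::vector<unsigned char>& data, std::size_t offset,
                       const unsigned char* pattern, std::size_t n)
{
    assert(pattern != nullptr || n == 0);
    if (offset > data.size() || n > data.size() - offset) return false;
    return n == 0 || std::memcmp(data.data() + offset, pattern, n) == 0;
}
```

Called as matches_at(data, 0, signature, sizeof signature), this reports whether the buffer begins with the given constant byte sequence.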
 
Gianni Mariani

I've seen all kinds of messes when handling raw data!

Before you go down the road of writing memcmp everywhere, ask yourself: what do
these "chunks of raw data" do?

Do you:
- concatenate them
- write to them
- convert them
- break them up into smaller chunks

... write a list of operations you do with them.

Sometimes you'll benefit from using a regular vector<char> and sometimes
you need something a little fancier.

I tend to write code that avoids copying data, so I usually have a
"Buffer" class where I can create chunks of raw data and
reference chunks within those chunks, etc. The idea is that data is
not copied.
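
The post does not show the "Buffer" class itself, so the following is only a sketch of the general idea of referencing chunks without copying them. ByteView, sub and view_of are hypothetical names, and the sketch assumes the underlying storage is a std::vector<unsigned char> that outlives every view.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view of a run of bytes that is stored elsewhere.
struct ByteView
{
    const unsigned char* data;
    std::size_t size;

    // A view of `count` bytes starting at `offset`; no bytes are copied.
    ByteView sub(std::size_t offset, std::size_t count) const
    {
        assert(offset <= size && count <= size - offset);
        return ByteView{data + offset, count};
    }
};

// View over a whole vector. The vector must outlive the view and must not
// be resized while the view is in use.
inline ByteView view_of(const std::vector<unsigned char>& v)
{
    return ByteView{v.data(), v.size()};
}
```

A view is just a pointer and a length, so passing it around or slicing it costs the same regardless of how much data it refers to.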
 
Matthias Czapla

OK, I have an image file of a smartcard used in a digital camera, which was
accidentally deleted/formatted. I want to search this file for occurrences
of any of several byte sequences that indicate the start of a JPEG picture.
So I'm interested in the positions of these sequences in the file.

I already wrote a pure C program which seems to work well, but I'm currently
in the process of grokking C++ and want to reimplement the program the C++ way.

Regards
lal
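
One way such a search might look in C++ is sketched below with std::search. It assumes the image (or a chunk of it) is already in memory, and it assumes two particular signatures: the JPEG start-of-image marker FF D8 followed by an APP0 marker (FF E0, JFIF) or an APP1 marker (FF E1, Exif). The thread does not say which sequences the original program used, and the names Bytes, kJfifStart, kExifStart and find_all are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Assumed signatures: SOI (FF D8) followed by an APP0 or APP1 marker.
const Bytes kJfifStart = {0xFF, 0xD8, 0xFF, 0xE0};
const Bytes kExifStart = {0xFF, 0xD8, 0xFF, 0xE1};

// Offsets of every occurrence of `pattern` in `data`.
std::vector<std::size_t> find_all(const Bytes& data, const Bytes& pattern)
{
    assert(!pattern.empty());
    std::vector<std::size_t> hits;
    Bytes::const_iterator it = data.begin();
    while ((it = std::search(it, data.end(), pattern.begin(), pattern.end()))
           != data.end())
    {
        hits.push_back(static_cast<std::size_t>(it - data.begin()));
        ++it;  // resume one byte past the start of this match
    }
    return hits;
}
```

For a card image too large to hold in memory, the same function could be applied chunk by chunk, provided consecutive chunks overlap by pattern.size() - 1 bytes so that a signature straddling a boundary is not missed.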
 
Thomas Matthews

The method for handling raw unstructured data is to read it into a
buffer, then parse the buffer.

One process that I use is to have classes for each datum type and have
the classes provide "load from buffer" and "store to buffer"
methods. I then pass a pointer to the buffer and call the load
methods of the class. The load method would advance the buffer
pointer:

    #include <cstring>   // std::memcpy

    typedef unsigned int Item_Type;   // stand-in for the type of my_item

    class MyClass
    {
    public:
        void load_from_buffer(unsigned char * & buffer_pointer);
    private:
        Item_Type my_item;
    };

    void
    MyClass::load_from_buffer(unsigned char * & buffer_pointer)
    {
        // Copy the bytes instead of casting the pointer: the buffer may
        // not be suitably aligned for Item_Type.
        std::memcpy(&my_item, buffer_pointer, sizeof(my_item));
        buffer_pointer += sizeof(my_item);
        // ...
    }

also:

    // AnyType must be trivially copyable (plain integers, PODs, ...).
    template <class AnyType>
    AnyType load_from_buffer(unsigned char * & buffer_pointer)
    {
        AnyType value;
        std::memcpy(&value, buffer_pointer, sizeof(AnyType));
        buffer_pointer += sizeof(AnyType);   // advance past the value
        return value;
    }
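
A related sketch, not taken from the post: when the file format fixes a byte order, a load routine can assemble the value byte by byte so that the result does not depend on the byte order of the machine running the program. The function name load_u32_le and the choice of a 32-bit little-endian field are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read a 32-bit little-endian value from the buffer
// and advance the pointer past it. The caller must guarantee that at
// least four bytes are available.
std::uint32_t load_u32_le(const unsigned char*& p)
{
    assert(p != nullptr);
    const std::uint32_t value =
          static_cast<std::uint32_t>(p[0])
        | static_cast<std::uint32_t>(p[1]) << 8
        | static_cast<std::uint32_t>(p[2]) << 16
        | static_cast<std::uint32_t>(p[3]) << 24;
    p += 4;
    return value;
}
```

Because each byte is widened from unsigned char before shifting, no sign extension can leak into the upper bits of the result.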



--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 
Matthias Czapla

Thanks for your reply. I thought about using a separate class for I/O too.
The most important point for me in your explanation is the use of unsigned
char to hold the data. Do you mind me asking what the advantage of using
unsigned over signed char is? Do you agree with using std::ifstream::read() for
reading the data?
 
Thomas Matthews

Unsigned char allows usage of all the bits, without any worries about
overflow or sign extension. I just want a simple 'byte' or smallest
accessible unit. The 'signed' quantities have issues when it comes
to bit manipulation (such as shifting).
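
A small sketch of the kind of issue meant here, assuming a two's complement platform where a signed char holding -1 has the bit pattern 0xFF. The function names are made up for illustration.

```cpp
#include <cassert>

// Widening an unsigned char keeps the byte value (0..255).
inline unsigned int widen(unsigned char byte) { return byte; }

// Widening a signed char sign-extends: negative values set all upper bits.
inline unsigned int widen_signed(signed char byte)
{
    return static_cast<unsigned int>(byte);
}

// Upper four bits of a byte; the shift is well defined for unsigned values.
inline unsigned int high_nibble(unsigned char byte) { return byte >> 4; }

void signedness_demo()
{
    const unsigned char u = 0xFF;
    const signed char s = -1;  // bit pattern 0xFF in two's complement

    assert(widen(u) == 0xFFu);
    assert(widen_signed(s) == ~0u);   // every bit set, not 0xFF
    assert(high_nibble(u) == 0x0Fu);
}
```

The same byte therefore compares equal to 0xFF when held in an unsigned char but not when held in a signed char, which is an easy way to miss a marker byte such as FF.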

I guess it's just my style. You can find good discussions about
signed and unsigned integral types in this newsgroup and
our neighbor news:comp.lang.c++.

You can use ifstream::read() as long as the file is opened in
binary mode. Binary mode tells the stream library and platform
_NOT_ to perform any translations (such as newline conversion) on the data.

There are also claims that fread() is simpler and faster.
However, since developer time and quality are more important
than speed, go with ifstream::read().
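
A minimal sketch of reading raw bytes with ifstream::read() in binary mode. The helper name read_bytes and its max_bytes parameter are assumptions for illustration. Note the cast: read() takes a char *, so an unsigned char buffer has to be passed through reinterpret_cast.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read up to `max_bytes` from the start of `path`
// with no translation. Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_bytes(const std::string& path,
                                      std::size_t max_bytes)
{
    std::vector<unsigned char> buffer;
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in || max_bytes == 0) return buffer;

    buffer.resize(max_bytes);
    in.read(reinterpret_cast<char*>(buffer.data()),
            static_cast<std::streamsize>(max_bytes));
    // gcount() reports how many bytes the last read() actually delivered.
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    assert(buffer.size() <= max_bytes);
    return buffer;
}
```

Without std::ios::binary, some platforms translate line endings or stop at an end-of-file control character, which would corrupt the offsets of anything found in the data.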

In my Binary_Stream class, I have a pure virtual function:

    virtual unsigned long size_on_stream() const = 0;

All classes that use the Binary_Stream interface must provide
the size that they occupy on the stream. This allows one to
query an object about the size of data it requires in order
to allocate a buffer for reading:

    unsigned long buffer_size = my_msg.size_on_stream();
    unsigned char * buffer = new unsigned char[buffer_size];
    // read() expects a char *, so the unsigned char buffer needs a cast.
    my_data_file.read(reinterpret_cast<char *>(buffer), buffer_size);
    unsigned char * buf_ptr(buffer);
    my_msg.load_from_buffer(buf_ptr);
    delete [] buffer;

One nice benefit is that objects can be written to and read
from a stream without knowing any details about the object!

 
Thomas Matthews

Thomas said:
I guess it's just my style. You can find good discussions about
signed and unsigned integral types in this newsgroup and
our neighbor news:comp.lang.c++.

That should be
 
Matthias Czapla

Thomas said:
Unsigned char allows usage of all the bits, without any worries about
overflow or sign extension. I just want a simple 'byte' or smallest
accessible unit. The 'signed' quantities have issues when it comes
to bit manipulation (such as shifting).

I see.

Thomas said:
I guess it's just my style. You can find good discussions about
signed and unsigned integral types in this newsgroup and
our neighbor news:comp.lang.c++.

You can use ifstream::read() as long as the file is opened in
binary mode. Binary mode tells the stream library and platform
_NOT_ to perform any translations (such as newline conversion) on the data.

I'll remember that.

Thomas said:
There are also claims that fread() is simpler and faster.
However, since developer time and quality are more important
than speed, go with ifstream::read().

And as I stated elsewhere, I want to do it the "C++ way".

Thomas said:
In my Binary_Stream class, I have a pure virtual function:

    virtual unsigned long size_on_stream() const = 0;

All classes that use the Binary_Stream interface must provide
the size that they occupy on the stream. This allows one to
query an object about the size of data it requires in order
to allocate a buffer for reading:

    unsigned long buffer_size = my_msg.size_on_stream();
    unsigned char * buffer = new unsigned char[buffer_size];
    // read() expects a char *, so the unsigned char buffer needs a cast.
    my_data_file.read(reinterpret_cast<char *>(buffer), buffer_size);
    unsigned char * buf_ptr(buffer);
    my_msg.load_from_buffer(buf_ptr);
    delete [] buffer;

One nice benefit is that objects can be written to and read
from a stream without knowing any details about the object!

Very nice. That has given me an idea of how to approach the topic. It seems
raw data handling in C++ isn't too different from C's, and when I think about
it this is logical, since it is very low level. Thank you for your help.

Regards
lal
 
