canonical way for handling raw data

Discussion in 'C++' started by Matthias Czapla, Aug 24, 2003.

  1. Hi!

    What's the canonical way of handling raw data? I want to read a file without
    making any assumptions about its structure, store portions of it in memory,
    and compare ranges with constant byte sequences. _I_ would read it
    into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
    novice C++ programmer, and I suspect there's some better, commonly used
    way.

    Regards
    lal
     
    Matthias Czapla, Aug 24, 2003
    #1

  2. Matthias Czapla wrote:
    > Hi!
    >
    > What's the canonical way of handling raw data? I want to read a file without
    > making any assumptions about its structure, store portions of it in memory,
    > and compare ranges with constant byte sequences. _I_ would read it
    > into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
    > novice C++ programmer, and I suspect there's some better, commonly used
    > way.
    >


    I've seen all kinds of messes when handling raw data!

    Before you start writing memcmp everywhere, ask yourself what
    these "chunks of raw data" are used for.

    Do you:
    - concatenate them
    - write to them
    - convert them
    - break them up into smaller chunks

    ... write a list of the operations you perform on them.

    Sometimes you'll benefit from using a regular vector<char>, and sometimes
    you need something a little fancier.

    I tend to write code that avoids copying data, so I usually have a
    "Buffer" class with which I can create chunks of raw data and
    reference chunks within those chunks, etc. The idea is that data is
    not copied.
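
A minimal sketch of the non-copying idea described above, assuming a small view type that refers to bytes stored elsewhere. `ByteView`, `sub` and `starts_with` are hypothetical names chosen for illustration, not the actual "Buffer" class mentioned in the post, and the view is only valid while the underlying storage stays alive.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical non-owning view over bytes that live in some other container.
class ByteView {
public:
    ByteView(const unsigned char* data, std::size_t size)
        : data_(data), size_(size) {}

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

    // Returns a view of at most `count` bytes starting at `offset`,
    // clamped to the bounds of this view. No bytes are copied.
    ByteView sub(std::size_t offset, std::size_t count) const {
        if (offset > size_) offset = size_;
        if (count > size_ - offset) count = size_ - offset;
        return ByteView(data_ + offset, count);
    }

    // True if the view begins with the `n` bytes at `pattern`.
    bool starts_with(const unsigned char* pattern, std::size_t n) const {
        return n <= size_ && (n == 0 || std::memcmp(data_, pattern, n) == 0);
    }

private:
    const unsigned char* data_;
    std::size_t size_;
};
```

A caller would construct one `ByteView` over a whole `std::vector<unsigned char>` and hand out `sub()` views of it, so that comparing or slicing never duplicates the data.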
     
    Gianni Mariani, Aug 24, 2003
    #2

  3. Gianni Mariani wrote:
    > Matthias Czapla wrote:
    > > Hi!
    > >
    > > What's the canonical way of handling raw data? I want to read a file without
    > > making any assumptions about its structure, store portions of it in memory,
    > > and compare ranges with constant byte sequences. _I_ would read it
    > > into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
    > > novice C++ programmer, and I suspect there's some better, commonly used
    > > way.
    > >

    >
    > I've seen all kinds of messes when handling raw data!
    >
    > Before you start writing memcmp everywhere, ask yourself what
    > these "chunks of raw data" are used for.
    >
    > Do you:
    > - concatenate them
    > - write to them
    > - convert them
    > - break them up into smaller chunks
    >
    > ... write a list of the operations you perform on them.


    OK, I have an image file of a smartcard used in a digital camera; the card was
    accidentally deleted/formatted. I want to search this file for occurrences
    of any of several byte sequences that indicate the start of a JPEG picture,
    so I'm interested in the positions of these sequences in the file.

    I have already written a pure C program which seems to work well, but I'm currently
    in the process of grokking C++ and want to reimplement the program the C++ way.

    Regards
    lal
     
    Matthias Czapla, Aug 24, 2003
    #3
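
One way the search described in the post above might look with the standard library, as a sketch only. It assumes the whole card image fits in memory as a `std::vector<unsigned char>`; `Bytes` and `find_all` are hypothetical names, and the post does not say which byte sequences are being searched for, so any pattern passed in (for example the JPEG start-of-image bytes FF D8 FF) is the caller's assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Returns the offset of every occurrence of `pattern` in `data`,
// including overlapping occurrences. An empty pattern yields no matches.
std::vector<std::size_t> find_all(const Bytes& data, const Bytes& pattern) {
    std::vector<std::size_t> offsets;
    if (pattern.empty()) return offsets;

    Bytes::const_iterator it = data.begin();
    while (true) {
        // std::search returns data.end() when no further match exists.
        it = std::search(it, data.end(), pattern.begin(), pattern.end());
        if (it == data.end()) break;
        offsets.push_back(static_cast<std::size_t>(it - data.begin()));
        ++it;  // resume one byte later so overlapping matches are found
    }
    return offsets;
}
```

Calling `find_all` once per candidate sequence gives the positions for each of the "several byte sequences"; for very large images a chunked reader would be needed instead.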
  4. Matthias Czapla wrote:
    > Hi!
    >
    > What's the canonical way of handling raw data? I want to read a file without
    > making any assumptions about its structure, store portions of it in memory,
    > and compare ranges with constant byte sequences. _I_ would read it
    > into arrays of unsigned char and use C's memcmp(), but as you can see I'm a
    > novice C++ programmer, and I suspect there's some better, commonly used
    > way.
    >
    > Regards
    > lal


    The method for handling raw unstructured data is to read it into a
    buffer, then parse the buffer.

    One process that I use is to have a class for each datum type and have
    each class provide "load from buffer" and "store to buffer"
    methods. I then pass a pointer to the buffer and call the load
    method of the class. The load method advances the buffer
    pointer:

        class MyClass
        {
        public:
            void load_from_buffer(unsigned char * & buffer_pointer);
        };

        void
        MyClass ::
        load_from_buffer(unsigned char * & buffer_pointer)
        {
            // Note: the cast assumes buffer_pointer is suitably aligned for the type.
            my_item = *((/* type of my_item */ *) buffer_pointer);
            buffer_pointer += sizeof(/* type of my_item */);
            // ...
            return;
        }

    also:

        template <class AnyType>
        AnyType load_from_buffer(unsigned char * & buffer_pointer)
        {
            AnyType value = *((AnyType *) buffer_pointer);
            buffer_pointer += sizeof(AnyType);  // advance past the value just read
            return value;
        }
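
A hedged variation on the template above that copies the bytes with `std::memcpy` instead of dereferencing a cast pointer, which sidesteps alignment problems when the value sits at an odd offset in the buffer. This is a swapped-in technique, not the poster's code. It assumes `T` is trivially copyable, that the buffer holds at least `sizeof(T)` more bytes, and that the data is in the host's byte order.

```cpp
#include <cstddef>
#include <cstring>

// Copies sizeof(T) bytes out of the buffer into a T and advances the
// pointer past them. No bounds checking is done here (an assumption
// of this sketch: the caller has already verified the remaining size).
template <class T>
T load_from_buffer(const unsigned char*& buffer_pointer) {
    T value;
    std::memcpy(&value, buffer_pointer, sizeof(T));
    buffer_pointer += sizeof(T);
    return value;
}
```

Successive calls such as `load_from_buffer<unsigned char>(p)` followed by `load_from_buffer<unsigned int>(p)` then walk through the buffer field by field.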



    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c++-faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c++/faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
     
    Thomas Matthews, Aug 25, 2003
    #4
  5. Thomas Matthews wrote:
    > The method for handling raw unstructured data is to read it into a
    > buffer, then parse the buffer.
    >
    > One process that I use is to have a class for each datum type and have
    > each class provide "load from buffer" and "store to buffer"
    > methods. I then pass a pointer to the buffer and call the load
    > method of the class. The load method advances the buffer
    > pointer:
    >
    >     class MyClass
    >     {
    >     public:
    >         void load_from_buffer(unsigned char * & buffer_pointer);
    >     };
    >
    >     void
    >     MyClass ::
    >     load_from_buffer(unsigned char * & buffer_pointer)
    >     {
    >         my_item = *((/* type of my_item */ *) buffer_pointer);
    >         buffer_pointer += sizeof(/* type of my_item */);
    >         // ...
    >         return;
    >     }
    >
    > also:
    >
    >     template <class AnyType>
    >     AnyType load_from_buffer(unsigned char * & buffer_pointer)
    >     {
    >         AnyType value = *((AnyType *) buffer_pointer);
    >         buffer_pointer += sizeof(AnyType);  // advance past the value just read
    >         return value;
    >     }


    Thanks for your reply. I thought about using a separate class for I/O too.
    The most important point for me in your explanation is the use of unsigned
    char to hold the data. Do you mind me asking what the advantage of using
    unsigned over signed char is? Do you agree with using std::ifstream::read() for
    reading the data?
     
    Matthias Czapla, Aug 25, 2003
    #5
  6. Matthias Czapla wrote:
    > Thomas Matthews wrote:
    >
    >
    > Thanks for your reply. I thought about using a separate class for I/O too.
    > The most important point for me in your explanation is the use of unsigned
    > char to hold the data. Do you mind me asking what the advantage of using
    > unsigned over signed char is? Do you agree with using std::ifstream::read() for
    > reading the data?


    Unsigned char allows usage of all the bits, without any worries about
    overflow or sign. I just want a simple 'byte', the smallest
    accessible unit. Signed quantities have issues when it comes
    to bit manipulation (such as shifting).

    I guess it's just my style. You can find good discussions about
    signed and unsigned integral types in this newsgroup and
    our neighbor news:comp.lang.c++.
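
A small sketch of the kind of bit-manipulation issue mentioned above, assuming two bytes are being combined into one larger value. `combine_unsigned` and `combine_signed` are hypothetical helper names. With `signed char`, a byte whose top bit is set is negative, and converting it to a wider type fills the upper bits with ones.

```cpp
// Combines two bytes into one value, high byte first.
unsigned int combine_unsigned(unsigned char hi, unsigned char lo) {
    return (static_cast<unsigned int>(hi) << 8) | lo;
}

// Same logic with signed char. A negative `lo` converts to an unsigned
// value with all upper bits set, which then swamps the high byte.
unsigned int combine_signed(signed char hi, signed char lo) {
    return (static_cast<unsigned int>(hi) << 8) | static_cast<unsigned int>(lo);
}
```

For the bytes 0x12 and 0xD8 the unsigned version gives 0x12D8, while the signed version (where 0xD8 is held as -40) does not, because the upper bits of the result are all set.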

    You can use ifstream::read() as long as the file is opened in
    binary mode. Binary mode tells the library/platform
    _NOT_ to perform any translations on the data (such as newline conversion).

    There are also claims that fread() is simpler and faster.
    However, since developer time and quality are more important
    than speed, go with ifstream::read().
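
A minimal sketch of reading a whole file with `ifstream::read()` in binary mode, under the assumption that the file is small enough to hold in memory. `read_file` and the 4096-byte chunk size are illustrative choices, not something specified in the post.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads the entire file at `path` into a vector of bytes.
std::vector<unsigned char> read_file(const std::string& path) {
    // std::ios::binary disables any platform text-mode translation.
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    std::vector<unsigned char> bytes;
    char chunk[4096];
    // read() fails on the final short chunk; gcount() still reports
    // how many bytes were actually delivered, so they are not lost.
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0) {
        bytes.insert(bytes.end(), chunk, chunk + in.gcount());
    }
    return bytes;
}
```

Note that `read()` takes a `char*`, so the chunk is a plain `char` array and the bytes are converted to `unsigned char` as they are appended.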

    In my Binary_Stream class, I have a pure virtual function:

        virtual unsigned long size_on_stream() const = 0;

    All classes that use the Binary_Stream interface must provide
    the size that they occupy on the stream. This allows one to
    query an object about the size of data it requires in order
    to allocate a buffer for reading:

        unsigned long buffer_size = my_msg.size_on_stream();
        unsigned char * buffer = new unsigned char[buffer_size];
        // ifstream::read() takes a char *, so the buffer pointer must be cast.
        my_data_file.read(reinterpret_cast<char *>(buffer), buffer_size);
        unsigned char * buf_ptr(buffer);
        my_msg.load_from_buffer(buf_ptr);
        delete [] buffer;

    One nice benefit is that objects can be written to and read
    from a stream without knowing any details about the object!
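
A hedged reconstruction of what such an interface might look like. Only `size_on_stream()`, `load_from_buffer()` and the "store to buffer" idea come from the posts; the exact signatures, the `Message` type with its two fields, and the `to_bytes` helper are assumptions made for illustration. Fields are stored in host byte order.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed shape of the interface described in the post.
class Binary_Stream {
public:
    virtual ~Binary_Stream() {}
    virtual unsigned long size_on_stream() const = 0;
    virtual void load_from_buffer(const unsigned char*& p) = 0;
    virtual void store_to_buffer(unsigned char*& p) const = 0;
};

// Illustrative message type with an id and a length field.
class Message : public Binary_Stream {
public:
    Message() : id(0), length(0) {}
    unsigned short id;
    unsigned int length;

    unsigned long size_on_stream() const { return sizeof id + sizeof length; }

    void load_from_buffer(const unsigned char*& p) {
        std::memcpy(&id, p, sizeof id);         p += sizeof id;
        std::memcpy(&length, p, sizeof length); p += sizeof length;
    }
    void store_to_buffer(unsigned char*& p) const {
        std::memcpy(p, &id, sizeof id);         p += sizeof id;
        std::memcpy(p, &length, sizeof length); p += sizeof length;
    }
};

// Serialises any Binary_Stream object without knowing its concrete type.
// A vector replaces new[]/delete[] so the buffer is released automatically.
std::vector<unsigned char> to_bytes(const Binary_Stream& obj) {
    std::vector<unsigned char> buffer(obj.size_on_stream());
    unsigned char* p = buffer.data();
    obj.store_to_buffer(p);
    return buffer;
}
```

The `to_bytes` helper shows the benefit named in the post: it works for any class implementing the interface, using only the size the object reports.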

     
    Thomas Matthews, Aug 26, 2003
    #6
  7. Thomas Matthews wrote:

    > Matthias Czapla wrote:
    >
    >> Thomas Matthews wrote:

    > I guess it's just my style. You can find good discussions about
    > signed and unsigned integral types in this newsgroup and
    > our neighbor news:comp.lang.c++.


    That should be news:comp.lang.c.

     
    Thomas Matthews, Aug 26, 2003
    #7
  8. Thomas Matthews wrote:
    > > char to hold the data. Do you mind me asking what the advantage of using
    > > unsigned over signed char is? Do you agree with using std::ifstream::read() for
    > > reading the data?

    >
    > Unsigned char allows usage of all the bits, without any worries about
    > overflow or sign. I just want a simple 'byte', the smallest
    > accessible unit. Signed quantities have issues when it comes
    > to bit manipulation (such as shifting).


    I see.

    > I guess it's just my style. You can find good discussions about
    > signed and unsigned integral types in this newsgroup and
    > our neighbor news:comp.lang.c++.
    >
    > You can use ifstream::read() as long as the file is opened in
    > binary mode. Binary mode tells the library/platform
    > _NOT_ to perform any translations on the data (such as newline conversion).


    I'll remember that.

    > There are also claims that fread() is simpler and faster.
    > However, since developer time and quality are more important
    > than speed, go with ifstream::read().


    And as I stated elsewhere I want to do it the "C++ way".

    > In my Binary_Stream class, I have a pure virtual function:
    >
    >     virtual unsigned long size_on_stream() const = 0;
    >
    > All classes that use the Binary_Stream interface must provide
    > the size that they occupy on the stream. This allows one to
    > query an object about the size of data it requires in order
    > to allocate a buffer for reading:
    >
    >     unsigned long buffer_size = my_msg.size_on_stream();
    >     unsigned char * buffer = new unsigned char[buffer_size];
    >     // ifstream::read() takes a char *, so the buffer pointer must be cast.
    >     my_data_file.read(reinterpret_cast<char *>(buffer), buffer_size);
    >     unsigned char * buf_ptr(buffer);
    >     my_msg.load_from_buffer(buf_ptr);
    >     delete [] buffer;
    >
    > One nice benefit is that objects can be written to and read
    > from a stream without knowing any details about the object!


    Very nice. That has given me an idea about the topic. It seems raw data
    handling in C++ isn't too different from C's, and when I think about it that is
    logical, since this is very low level. Thank you for your help.

    Regards
    lal
     
    Matthias Czapla, Aug 27, 2003
    #8