Binary data representation

Discussion in 'C++' started by Charles T., Feb 4, 2004.

  1. Charles T.

    Charles T. Guest

    Hi,

    I am currently writing a serialize/unserialize architecture. The read/write
    functions will read from and write to a binary file.

    My question is: is there some sort of defined standard to use when
    representing data types (int, int32, int64, double, string, etc.)?


    Thanks,
     
    Charles T., Feb 4, 2004
    #1

  2. Charles T.

    Phlip Guest

    Charles T. wrote:

    > I am currently writing a serialize/unserialize architecture. The read/write
    > functions will read from and write to a binary file.


    Why a binary file?

    > My question is: is there some sort of defined standard to use when
    > representing data types (int, int32, int64, double, string, etc.)?


    No. There is not even a "standard" for what order bytes go inside an int.
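To see Phlip's point about byte order concretely: the same 32-bit value 0x01020304 sits in memory as 04 03 02 01 on a little-endian machine and as 01 02 03 04 on a big-endian one. The sketch below probes the host by inspecting the first byte of a known value. The function name host_is_little_endian is made up for illustration; C++20 later added std::endian in <bit> for the same job.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (name invented for illustration).
// Returns true if the host stores the least significant byte of a
// multi-byte integer first. Big-endian and any exotic orderings give false.
bool host_is_little_endian() {
    const std::uint32_t probe = 0x01020304u;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);  // look at the object representation
    return first_byte == 0x04;
}
```

Because the answer differs between machines, a file written by dumping an int's memory directly is only readable on hosts that happen to share the writer's byte order.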

    The least heinous data format is XML. You can write very simple or very
    complex data structures in it, and you can read those structures in a text
    editor.

    But XML can be a little obese. Some data formats are compressed XML.
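To put "obese" in perspective, here is a minimal sketch of writing one record as XML. The Person type and the xml_escape/to_xml helpers are hypothetical, made up for this illustration; a real project would more likely use an existing XML library. Text values need the XML special characters escaped.

```cpp
#include <string>

// Escapes the five characters that are special in XML text and attribute values.
std::string xml_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical record type used only for illustration.
struct Person {
    std::string name;
    int age;
};

// Writes one record as a self-describing, human-readable XML element.
std::string to_xml(const Person& p) {
    return "<person name=\"" + xml_escape(p.name) + "\" age=\"" +
           std::to_string(p.age) + "\"/>";
}
```

For a record holding a 4-byte int, the output `<person name="Tom &amp; Jerry" age="42"/>` spends most of its bytes on markup; that is the price paid for being able to read it in a text editor.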

    --
    Phlip
    http://www.xpsd.org/cgi-bin/wiki?TestFirstUserInterfaces
     
    Phlip, Feb 4, 2004
    #2

  3. Charles T.

    AirPete Guest

    [snip]

    > The least heinous data format is XML. You can write very simple or
    > very complex data structures in it, and you can read those structures
    > in a text editor.
    >
    > But XML can be a little obese. Some data formats are compressed XML.


    I would recommend this, also.
    The game Age of Mythology uses XML compressed with zlib-compatible
    compression, and it generates very compact but easily decoded files.
    You can get zlib here:
    http://www.zlib.org

    - Pete
     
    AirPete, Feb 4, 2004
    #3
  4. Charles T. wrote:

    > Hi,
    >
    > I am currently writing a serialize/unserialize architecture. The read/write
    > functions will read from and write to a binary file.


    There has been much discussion on Serialization and Persistence in
    this newsgroup and news:comp.lang.c. Use a search engine and look
    for some ideas.


    > My question is: is there some sort of defined standard to use when
    > representing data types (int, int32, int64, double, string, etc.)?


    There is no standard across platforms. On some platforms, there
    may not even be one across OS versions or compiler versions.
    For better portability, write out the data in a consistent form
    (e.g. a uint64 is always 64 bits, little-endian) and let the programs
    convert the data into the native representation.
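A minimal sketch of this suggestion, assuming the file format fixes integers at 64 bits, little-endian, and stores doubles as their IEEE 754 bit pattern in the same byte order. The put_*/get_* names are hypothetical. Shifts are used instead of copying the integer's memory, so the code produces the same bytes on little- and big-endian hosts; the double helpers additionally assume the host's double and uint64_t share a byte order, which holds on mainstream platforms but is not guaranteed by the standard.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Assumption: 64-bit IEEE 754 doubles whose byte order matches uint64_t.
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 double required");
static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double required");

// Appends v as exactly 8 bytes, least significant byte first,
// whatever the host's native byte order is.
void put_u64_le(std::vector<unsigned char>& out, std::uint64_t v) {
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<unsigned char>(v >> (8 * i)));
}

// Rebuilds a native uint64_t from 8 little-endian bytes starting at pos and
// advances pos past them. The caller must ensure 8 bytes are available.
std::uint64_t get_u64_le(const std::vector<unsigned char>& in, std::size_t& pos) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v |= static_cast<std::uint64_t>(in[pos + i]) << (8 * i);
    pos += 8;
    return v;
}

// A double travels as its 64-bit pattern in the same fixed byte order.
void put_f64_le(std::vector<unsigned char>& out, double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    put_u64_le(out, bits);
}

double get_f64_le(const std::vector<unsigned char>& in, std::size_t& pos) {
    std::uint64_t bits = get_u64_le(in, pos);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

The filled buffer can then be written to a file opened with std::ios::binary in a single call.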

    Remember, when serializing, that the size of a structure may not
    be the sum of the size of its members. Compilers are allowed to
    add "padding bytes" between members.
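A sketch of the padding issue, assuming a typical compiler that aligns std::uint32_t on a 4-byte boundary (the amount of padding is implementation-defined). Record, serialize_record and deserialize_record are illustrative names, not from the thread. Writing each member separately keeps the file layout fixed at 5 bytes, whereas dumping the whole struct with one raw copy would also store the padding bytes and the host's byte order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record. Many compilers insert 3 padding bytes after 'flag'
// so that 'count' is 4-byte aligned, making sizeof(Record) 8 rather than 5.
struct Record {
    std::uint8_t flag;
    std::uint32_t count;
};

// Serializes member by member (count little-endian), so the output is always
// exactly 5 bytes and never contains padding.
std::vector<unsigned char> serialize_record(const Record& r) {
    std::vector<unsigned char> out;
    out.push_back(r.flag);
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>(r.count >> (8 * i)));
    return out;
}

// Inverse of serialize_record; expects at least 5 bytes.
Record deserialize_record(const std::vector<unsigned char>& in) {
    Record r{};
    r.flag = in[0];
    r.count = 0;
    for (int i = 0; i < 4; ++i)
        r.count |= static_cast<std::uint32_t>(in[1 + i]) << (8 * i);
    return r;
}
```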

    Pointers don't store well. There is a very small probability
    that an OS will allocate a variable in the same place for each
    execution of a program.
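One common workaround, shown here as a sketch rather than the only approach, is to store positions instead of addresses: if the objects live in one array, each pointer becomes an index into it, and real pointers are rebuilt after loading. Node, pointers_to_indices and indices_to_pointers are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node type linked by pointers. The pointer values are
// meaningless in a later run, so they are never written out.
struct Node {
    int value;
    Node* next;  // nullptr marks the end of the list
};

// link[i] is the index of nodes[i].next within 'nodes', or -1 for nullptr.
// These integers can be serialized. Assumes every non-null 'next' points
// into the same vector.
std::vector<std::int32_t> pointers_to_indices(const std::vector<Node>& nodes) {
    std::vector<std::int32_t> link;
    for (const Node& n : nodes)
        link.push_back(n.next ? static_cast<std::int32_t>(n.next - nodes.data()) : -1);
    return link;
}

// After loading the nodes, rebuild real pointers from the stored indices.
void indices_to_pointers(std::vector<Node>& nodes, const std::vector<std::int32_t>& link) {
    for (std::size_t i = 0; i < nodes.size(); ++i)
        nodes[i].next = link[i] < 0 ? nullptr : &nodes[link[i]];
}
```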

    Since pointers don't store well, don't store strings as pointers.
    Store text as <quantity, text> or <text, sentinel character>.
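The <quantity, text> variant could look like the sketch below; the 32-bit little-endian length prefix and the put_string/get_string names are assumptions made for illustration. A length prefix tells the reader exactly how much to read, and it handles text containing any byte value (including '\0') without escaping, which a sentinel-terminated form cannot do.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: writes <quantity, text> as a 32-bit little-endian
// byte count followed by the raw bytes.
void put_string(std::vector<unsigned char>& out, const std::string& s) {
    const std::uint32_t n = static_cast<std::uint32_t>(s.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>(n >> (8 * i)));
    out.insert(out.end(), s.begin(), s.end());
}

// Reads a string written by put_string starting at pos and advances pos.
// Throws if the buffer is shorter than the declared length.
std::string get_string(const std::vector<unsigned char>& in, std::size_t& pos) {
    if (pos > in.size() || in.size() - pos < 4) throw std::runtime_error("truncated length");
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    if (in.size() - pos < n) throw std::runtime_error("truncated text");
    std::string s(in.begin() + pos, in.begin() + pos + n);
    pos += n;
    return s;
}
```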

    See section [35] of the C++ FAQ (about serialization):
    http://www.parashift.com/c++-faq-lite/serialization.html




    --
    Thomas Matthews

    C++ newsgroup welcome message:
    http://www.slack.net/~shiva/welcome.txt
    C++ Faq: http://www.parashift.com/c++-faq-lite
    C Faq: http://www.eskimo.com/~scs/c-faq/top.html
    alt.comp.lang.learn.c-c++ faq:
    http://www.raos.demon.uk/acllc-c++/faq.html
    Other sites:
    http://www.josuttis.com -- C++ STL Library book
     
    Thomas Matthews, Feb 4, 2004
    #4
  5. Phlip wrote:
    > Charles T. wrote:
    >
    >
    >>I am currently writing a serialize/unserialize architecture. The read/write
    >>functions will read from and write to a binary file.

    >
    >
    > Why a binary file?
    >
    >
    >>My question is: is there some sort of defined standard to use when
    >>representing data types (int, int32, int64, double, string, etc.)?

    >
    >
    > No. There is not even a "standard" for what order bytes go inside an int.
    >
    > The least heinous data format is XML. You can write very simple or very
    > complex data structures in it, and you can read those structures in a text
    > editor.
    >
    > But XML can be a little obese. Some data formats are compressed XML.


    If you're talking about a real-time (streaming) system, the XML overhead
    may be too high a price to pay.

    In 1999 I built a binary XML format that could be "parsed" in a fraction
    of the time. But for some systems, even this one was too expensive.
     
    Gianni Mariani, Feb 4, 2004
    #5
  6. Charles T.

    AirPete Guest

    Gianni Mariani wrote:
    [snip]
    >
    > In 1999 I built a binary XML format that could be "parsed" in a
    > fraction of the time. But for some systems, even this one was too
    > expensive.


    Would you mind posting your implementation? I would be interested in seeing
    it.
    Thanks!

    - Pete
     
    AirPete, Feb 4, 2004
    #6
  7. You might want to look (depending on your application area and on
    whether you have time to learn it) at ASN.1, which is an ITU standard to
    provide "a notation for defining data structures [and] a defined
    (machine-independent) encoding for those data structures".

    Have a glance at www-sop.inria.fr/rodeo/personnel/hoschka/asn1.html,
    www.asn1.org, or google will bring back lots of links.
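ASN.1 separates the notation from the encoding rules (BER, DER, PER and others, defined in their own ITU-T recommendations). As a small taste of what a "defined (machine-independent) encoding" means in practice, here is a hand-written sketch of the DER encoding of an INTEGER: a tag byte 0x02, a length byte, then the value in big-endian two's complement using the minimum number of bytes. der_encode_integer is a hypothetical helper, not part of any ASN.1 library, and it only covers the short length form.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: DER-encodes a signed integer as an ASN.1 INTEGER.
// Layout: identifier 0x02, one length octet (short form suffices for at
// most 8 content bytes), then minimal big-endian two's complement content.
std::vector<unsigned char> der_encode_integer(std::int64_t value) {
    // All 8 big-endian bytes of the two's complement representation.
    const std::uint64_t u = static_cast<std::uint64_t>(value);
    unsigned char full[8];
    for (int i = 0; i < 8; ++i)
        full[i] = static_cast<unsigned char>(u >> (8 * (7 - i)));

    // Drop leading bytes that are pure sign extension: 0x00 followed by a
    // byte with top bit 0, or 0xFF followed by a byte with top bit 1.
    int start = 0;
    while (start < 7 &&
           ((full[start] == 0x00 && (full[start + 1] & 0x80) == 0) ||
            (full[start] == 0xFF && (full[start + 1] & 0x80) != 0)))
        ++start;

    std::vector<unsigned char> out;
    out.push_back(0x02);                                   // INTEGER tag
    out.push_back(static_cast<unsigned char>(8 - start));  // length
    out.insert(out.end(), full + start, full + 8);         // content octets
    return out;
}
```

For example, 127 encodes as 02 01 7F, while 128 needs a leading zero byte to stay positive: 02 02 00 80. Every conforming decoder reads these bytes the same way, regardless of the host's own integer layout.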

    Geoff Macartney

    Charles T. wrote:

    > Hi,
    >
    > I am currently writing a serialize/unserialize architecture. The read/write
    > functions will read from and write to a binary file.
    >
    > My question is: is there some sort of defined standard to use when
    > representing data types (int, int32, int64, double, string, etc.)?
    >
    > Thanks,
     
    Geoff Macartney, Feb 4, 2004
    #7
  8. On Wed, 04 Feb 2004 12:21:16 -0500, Gianni Mariani wrote:

    > In 1999 I built a binary XML format that could be "parsed" in a fraction
    > of the time. But for some systems, even this one was too expensive.


    No need to reinvent the wheel; have a look at ASN.1. Parsers abound, BTW.

    M4
     
    Martijn Lievaart, Feb 4, 2004
    #8
  9. Martijn Lievaart wrote:
    > On Wed, 04 Feb 2004 12:21:16 -0500, Gianni Mariani wrote:
    >
    >
    >>In 1999 I built a binary XML format that could be "parsed" in a fraction
    >>of the time. But for some systems, even this one was too expensive.

    >
    >
    > No need to reinvent the wheel; have a look at ASN.1. Parsers abound, BTW.


    ASN.1 is different - the binary format I'm talking about has a 1:1
    correlation to XML. The format was simply more efficient to parse than
    XML text - admittedly the XML parser I wrote was slower than molasses in
    a blizzard ... :)
     
    Gianni Mariani, Feb 5, 2004
    #9
  10. Charles T. wrote:
    > Hi,
    >
    > I am currently writing a serialize/unserialize architecture. The read/write
    > functions will read from and write to a binary file.
    >
    > My question is: is there some sort of defined standard to use when
    > representing data types (int, int32, int64, double, string, etc.)?
    >


    I have an application in which the compactness of a binary representation
    (as compared with, say, XML) is important, but where portability of that
    binary file, regardless of endianness, is also important. My solution is
    very simple: I just choose an endianness and stick with it, and make sure
    to write/read one byte at a time to construct/reconstruct the data. It
    works fine. The binary file is as compact as if I didn't care about
    portability, and it works with all kinds of endianness. In principle,
    reading and writing take a little longer because of the
    disassembling/assembling that takes place, but in practice this is
    not a problem at all because of buffering. I just read, say, 1 KB at a
    time and the problem disappears. Also, there are usually layers of
    buffering involved anyway: in the OS, in the disk, etc.
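This approach can be sketched as a small writer that emits each integer one byte at a time in a single chosen order (big-endian here, an arbitrary choice) into a 1 KB buffer, and only hands full buffers to the C stream. BigEndianWriter and get_u32_be are hypothetical names, and error handling is reduced to the boolean that flush() returns.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical writer: bytes go into a 1 KB buffer and reach the FILE only
// when the buffer is full, on flush(), or on destruction.
class BigEndianWriter {
public:
    explicit BigEndianWriter(std::FILE* f) : file_(f) {}
    ~BigEndianWriter() { flush(); }
    BigEndianWriter(const BigEndianWriter&) = delete;
    BigEndianWriter& operator=(const BigEndianWriter&) = delete;

    void put_u8(std::uint8_t b) {
        if (used_ == sizeof buf_) flush();
        buf_[used_++] = b;
    }

    // Most significant byte first, independent of host byte order.
    void put_u32(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            put_u8(static_cast<std::uint8_t>(v >> shift));
    }

    // Passes buffered bytes to the C stream (which buffers again).
    // Returns false if fwrite wrote fewer bytes than requested.
    bool flush() {
        bool ok = std::fwrite(buf_, 1, used_, file_) == used_;
        used_ = 0;
        return ok;
    }

private:
    std::FILE* file_;
    unsigned char buf_[1024];
    std::size_t used_ = 0;
};

// Reassembles a big-endian uint32 from four bytes, on any host.
std::uint32_t get_u32_be(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}
```

Per-byte work happens only in memory; the per-call cost of the file API is paid once per kilobyte, which matches the observation that buffering makes the byte-at-a-time approach cheap in practice.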

    /David
     
    David Rasmussen, Feb 5, 2004
    #10
  11. On Wed, 04 Feb 2004 19:50:36 -0500, Gianni Mariani wrote:

    > Martijn Lievaart wrote:
    >> On Wed, 04 Feb 2004 12:21:16 -0500, Gianni Mariani wrote:
    >>
    >>
    >>>In 1999 I built a binary XML format that could be "parsed" in a fraction
    >>>of the time. But for some systems, even this one was too expensive.

    >>
    >>
    >> No need to reinvent the wheel; have a look at ASN.1. Parsers abound, BTW.

    >
    > ASN.1 is different - the binary format I'm talking about has a 1:1
    > correlation to XML. The format was simply more efficient to parse than
    > XML text - admittedly the XML parser I wrote was slower than molasses in
    > a blizzard ... :)


    I think ASN.1 can easily handle binary XML. Something like this (it's been a
    while since I worked with ASN.1, so the terminology is likely to be incorrect):

    list xmlentitydef
    list xmltagdef
    utf8 xmltag
    list xmlattrdef
    utf8 attrkey
    utf8 attrval
    list xmlattr
    utf8 attrkey
    utf8 attrval
    utf8 entitybody

    The entity body could itself be a list of entities. Xmltag and xmlattrdef
    could probably use binary tags if there are only a few possible tags, thus
    saving greatly on space (and processing time).

    I don't think you can get very much more efficient than that.
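A minimal sketch of the "binary tags" idea, assuming writer and reader share a fixed table that maps each known tag or attribute name to a one-byte code (in the outline above, such a table could itself be stored in the file via the xmltagdef/xmlattrdef lists). Element, TagTable, encode_element, put_short_string and the byte layout are all invented for illustration; this is neither ASN.1 nor Gianni's format.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element: tag, (key, value) attributes, and a text body.
struct Element {
    std::string tag;
    std::vector<std::pair<std::string, std::string>> attrs;
    std::string body;
};

// Maps up to 256 known names to one-byte codes, by position in the list.
class TagTable {
public:
    explicit TagTable(const std::vector<std::string>& names) {
        for (std::size_t i = 0; i < names.size() && i < 256; ++i)
            codes_[names[i]] = static_cast<std::uint8_t>(i);
    }
    std::uint8_t code(const std::string& name) const {
        auto it = codes_.find(name);
        if (it == codes_.end()) throw std::invalid_argument("unknown name: " + name);
        return it->second;
    }
private:
    std::map<std::string, std::uint8_t> codes_;
};

// Appends a UTF-8 string with a 1-byte length prefix (so at most 255 bytes).
void put_short_string(std::vector<unsigned char>& out, const std::string& s) {
    if (s.size() > 255) throw std::length_error("string too long");
    out.push_back(static_cast<unsigned char>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Layout: tag code, attribute count, (key code, value) pairs, body.
std::vector<unsigned char> encode_element(const Element& e, const TagTable& names) {
    if (e.attrs.size() > 255) throw std::length_error("too many attributes");
    std::vector<unsigned char> out;
    out.push_back(names.code(e.tag));
    out.push_back(static_cast<unsigned char>(e.attrs.size()));
    for (const auto& kv : e.attrs) {
        out.push_back(names.code(kv.first));
        put_short_string(out, kv.second);
    }
    put_short_string(out, e.body);
    return out;
}
```

With the table {person, name, age}, the element that XML spells as the 29-character `<person name="Tom" age="42"/>` becomes 12 bytes, and the reader never has to compare tag names as strings.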

    M4
     
    Martijn Lievaart, Feb 5, 2004
    #11
  12. Charles T.

    Charles T. Guest

    Thanks for the responses.

    I will take a look at the ASN.1 stuff.


    "Geoff Macartney" <> wrote in
    message news:E9eUb.47484$...
    > You might want to look (depending on your application area and on
    > whether you have time to learn it) at ASN.1, which is an ITU standard to
    > provide "a notation for defining data structures [and] a defined
    > (machine-independent) encoding for those data structures".
    >
    > Have a glance at www-sop.inria.fr/rodeo/personnel/hoschka/asn1.html,
    > www.asn1.org, or google will bring back lots of links.
    >
    > Geoff Macartney
     
    Charles T., Feb 5, 2004
    #12
