Packing a simple dictionary into a string - extending struct?

Discussion in 'Python' started by Jonathan Fine, Jun 20, 2007.

  1. Hello

    I want to serialise a dictionary, whose keys and values are ordinary strings
    (i.e. a sequence of bytes).

    I can of course use pickle, but it has two big faults for me.
    1. It should not be used with untrusted data.
    2. I want non-Python programs to be able to read and write these
    dictionaries.

    I don't want to use XML because:
    1. It is verbose.
    2. It forces other applications to load an XML parser.

    I've written, in about 80 lines, Python code that will pack and unpack (to
    use the language of the struct module) such a dictionary. And then I
    thought I might be reinventing the wheel. But so far I've not found
    anything much like this out there. (The closest is work related to 'binary
    XML' - http://en.wikipedia.org/wiki/Binary_XML.)

    So, what I'm looking for is something like an extension of struct that
    allows dictionaries to be stored. Does anyone know of any related work?

    --
    Jonathan Fine
    Jonathan Fine, Jun 20, 2007
    #1
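For readers who want something concrete, the idea in the post above can be sketched roughly as follows. This is a minimal illustration in Python 3, not the author's 80-line module: the wire format (two big-endian 32-bit lengths, then the key bytes and value bytes) and the names `pack`, `unpack` and `_HEADER` are assumptions made for this example.

```python
import struct

# Assumed wire format (not necessarily the one used by the original poster):
# each entry is two big-endian unsigned 32-bit lengths, then key and value.
_HEADER = struct.Struct(">II")


def pack(bytedict):
    """Return a byte string holding every key/value pair of bytedict."""
    parts = []
    for key, val in bytedict.items():
        parts.append(_HEADER.pack(len(key), len(val)))
        parts.append(key)
        parts.append(val)
    return b"".join(parts)


def unpack(data):
    """Return the dictionary stored in data by pack()."""
    bytedict = {}
    ptr = 0
    while ptr < len(data):
        # unpack_from raises struct.error if fewer than 8 bytes remain.
        klen, vlen = _HEADER.unpack_from(data, ptr)
        ptr += _HEADER.size
        end = ptr + klen + vlen
        if end > len(data):
            raise ValueError("truncated entry")
        bytedict[data[ptr:ptr + klen]] = data[ptr + klen:end]
        ptr = end
    return bytedict


original = {b"k1": b"v1", b"\x00\xff": b"", b"": b"x"}
assert unpack(pack(original)) == original
```

Because the layout is only lengths and raw bytes, a reader in another language needs nothing beyond a way to decode two 32-bit integers.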

  2. In <f5b2f7$ptf$>, Jonathan Fine wrote:

    > I want to serialise a dictionary, whose keys and values are ordinary strings
    > (i.e. a sequence of bytes).


    Maybe you can use ConfigObj_ or JSON_ to store that data. Another format
    mentioned in the binary XML article you've linked in your post is
    `ASN.1`_. And there's a secure alternative to `pickle` called cerealizer_.

    .. _`ASN.1`: http://pyasn1.sourceforge.net/
    .. _cerealizer: http://home.gna.org/oomadness/en/cerealizer/
    .. _ConfigObj: http://www.voidspace.org.uk/python/configobj.html
    .. _JSON: http://www.json.org/

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, Jun 20, 2007
    #2

  3. On 6/20/07, Jonathan Fine <> wrote:
    > Hello
    >
    > I want to serialise a dictionary, whose keys and values are ordinary strings
    > (i.e. a sequence of bytes).
    >
    > I can of course use pickle, but it has two big faults for me.
    > 1. It should not be used with untrusted data.
    > 2. I want non-Python programs to be able to read and write these
    > dictionaries.
    >
    > I don't want to use XML because:
    > 1. It is verbose.
    > 2. It forces other applications to load an XML parser.
    >
    > I've written, in about 80 lines, Python code that will pack and unpack (to
    > use the language of the struct module) such a dictionary. And then I
    > thought I might be reinventing the wheel. But so far I've not found
    > anything much like this out there. (The closest is work related to 'binary
    > XML' - http://en.wikipedia.org/wiki/Binary_XML.)
    >
    > So, what I'm looking for is something like an extension of struct that
    > allows dictionaries to be stored. Does anyone know of any related work?
    >


    What about JSON? You can serialize your dictionary, for example, in
    JSON format and then unserialize it in any language that has a JSON
    parser (unless it is Javascript).

    --
    http://srid.nearfar.org/
    Sridhar Ratna, Jun 20, 2007
    #3
  4. > What about JSON? You can serialize your dictionary, for example, in
    > JSON format and then unserialize it in any language that has a JSON
    > parser (unless it is Javascript).


    There is an implementation available for python called simplejson, available
    through easy_install.

    Diez
    Diez B. Roggisch, Jun 20, 2007
    #4
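As an illustration of the JSON suggestion, here is a minimal sketch in Python 3, where the standard-library `json` module (derived from simplejson) offers the same `dumps`/`loads` calls. The dictionary contents and the name `settings` are made up for the example.

```python
import json

# A dictionary whose keys and values are text strings (example data).
settings = {"k1": "v1", "k2": "v2", "name": "caf\u00e9"}

# Serialise to a plain ASCII string; sort_keys makes the output reproducible.
text = json.dumps(settings, sort_keys=True)

# Any JSON parser, in any language, can read the string back.
restored = json.loads(text)
assert restored == settings
```

This works well when the strings are text. Arbitrary byte strings need an extra encoding step first.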
  5. John Machin (Guest)

    On Jun 20, 9:19 pm, "Jonathan Fine" <> wrote:
    > Hello
    >
    > I want to serialise a dictionary, whose keys and values are ordinary strings
    > (i.e. a sequence of bytes).
    >
    > I can of course use pickle, but it has two big faults for me.
    > 1. It should not be used with untrusted data.
    > 2. I want non-Python programs to be able to read and write these
    > dictionaries.
    >
    > I don't want to use XML because:
    > 1. It is verbose.
    > 2. It forces other applications to load an XML parser.
    >
    > I've written, in about 80 lines, Python code that will pack and unpack (to
    > use the language of the struct module) such a dictionary. And then I
    > thought I might be reinventing the wheel. But so far I've not found
    > anything much like this out there. (The closest is work related to 'binary
    > XML' - http://en.wikipedia.org/wiki/Binary_XML.)
    >
    > So, what I'm looking for is something like an extension of struct that
    > allows dictionaries to be stored. Does anyone know of any related work?
    >


    C:\junk>copy con adict.csv
    k1,v1
    k2,v2
    k3,v3
    ^Z
    1 file(s) copied.

    C:\junk>\python25\python
    Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
    (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import csv
    >>> adict = dict(csv.reader(open('adict.csv', 'rb')))
    >>> adict
    {'k3': 'v3', 'k2': 'v2', 'k1': 'v1'}
    >>> csv.writer(open('bdict.csv', 'wb')).writerows(adict.iteritems())
    >>> ^Z


    C:\junk>type bdict.csv
    k3,v3
    k2,v2
    k1,v1

    C:\junk>

    Easy enough?
    HTH,
    John
    John Machin, Jun 20, 2007
    #5
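The session above is Python 2.5. A rough Python 3 equivalent of the same CSV round trip might look like the sketch below. It uses an in-memory `io.StringIO` buffer instead of the files `adict.csv` and `bdict.csv`, which is a choice made here only to keep the example self-contained.

```python
import csv
import io

# Example data; the value with a comma shows that csv quotes it when needed.
adict = {"k1": "v1", "k2": "v2", "k3": "v,3"}

# Write one key,value row per dictionary item.
buf = io.StringIO(newline="")
csv.writer(buf).writerows(adict.items())
text = buf.getvalue()

# Read the rows back; each two-element row becomes one dictionary item.
bdict = dict(csv.reader(io.StringIO(text, newline="")))
assert bdict == adict
```

With real files in Python 3, the usual advice is to open them in text mode with `newline=""` rather than with the `'rb'` and `'wb'` modes shown in the Python 2 session.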
  6. "Sridhar Ratna" <> wrote in message

    > What about JSON? You can serialize your dictionary, for example, in
    > JSON format and then unserialize it in any language that has a JSON
    > parser (unless it is Javascript).


    Thank you for this suggestion. The growing adoption of JSON in Ajax
    programming is a strong argument for my using it in my application, although
    I think I'd prefer something a little more binary.

    So it looks like I'll be using JSON.

    Thanks.


    Jonathan
    Jonathan Fine, Jun 20, 2007
    #6
  7. Paddy (Guest)

    On Jun 20, 12:19 pm, "Jonathan Fine" <> wrote:
    > Hello
    >
    > I want to serialise a dictionary, whose keys and values are ordinary strings
    > (i.e. a sequence of bytes).
    >
    > I can of course use pickle, but it has two big faults for me.
    > 1. It should not be used with untrusted data.
    > 2. I want non-Python programs to be able to read and write these
    > dictionaries.
    >
    > I don't want to use XML because:
    > 1. It is verbose.
    > 2. It forces other applications to load an XML parser.
    >
    > I've written, in about 80 lines, Python code that will pack and unpack (to
    > use the language of the struct module) such a dictionary. And then I
    > thought I might be reinventing the wheel. But so far I've not found
    > anything much like this out there. (The closest is work related to 'binary
    > XML' - http://en.wikipedia.org/wiki/Binary_XML.)
    >
    > So, what I'm looking for is something like an extension of struct that
    > allows dictionaries to be stored. Does anyone know of any related work?
    >
    > --
    > Jonathan Fine


    You could use YAML or JSON then compress the output if size is an
    issue.

    - Paddy.
    Paddy, Jun 20, 2007
    #7
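The "serialise, then compress" suggestion could be sketched like this in Python 3, using JSON for the text format and the standard-library `zlib` module for compression. The generated dictionary is invented, deliberately repetitive example data, and how much space is saved depends entirely on the real content.

```python
import json
import zlib

# Example data with a lot of repetition, which compresses well.
data = {"key%03d" % i: "value%03d" % i for i in range(200)}

raw = json.dumps(data).encode("utf-8")   # text format, as bytes
packed = zlib.compress(raw)              # compressed form for storage

# Reverse both steps to get the dictionary back.
restored = json.loads(zlib.decompress(packed).decode("utf-8"))
assert restored == data
print(len(raw), len(packed))
```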
  8. Jonathan Fine wrote:

    > Thank you for this suggestion. The growing adoption of JSON in Ajax
    > programming is a strong argument for my using it in my application, although
    > I think I'd prefer something a little more binary.
    >
    > So it looks like I'll be using JSON.


    Well, I tried. But I came across two problems (see below).

    First, there's bloat. For binary byte data, on average one
    character becomes just over 4.

    Second, there's the inconvenience. I can't simply take a
    sequence of bytes and encode them using JSON. I have to
    turn them into Unicode first. And I guess there's a similar
    problem at the other end.

    So I'm going with my own solution:
    http://mathtran.cvs.sourceforge.net/mathtran/py/bytedict.py?revision=1.1&view=markup

    It seems to be related to cerealizer:
    http://home.gna.org/oomadness/en/cerealizer/index.html

    It seems to me that JSON works well for Unicode text, but not
    with binary data. Indeed, Unicode hides the binary form of
    the stored data, presenting only the code points. But I don't
    have Unicode strings!

    Here's my test script, which is why I'm not using JSON:
    ===
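    import simplejson

    # Build a Unicode string holding code points 0..255, one per byte value.
    x = u''
    for i in range(256):
        x += unichr(i)

    # Length of the JSON encoding of those 256 characters.
    print len(simplejson.dumps(x)), '\n'

    # A byte string containing a non-ASCII byte cannot be encoded directly.
    simplejson.dumps(chr(128))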
    ===

    Here's the output
    ===
    1046 # 256 bytes => 256 * 4 + 22 bytes

    Traceback (most recent call last):
      <snip>
      File "/usr/lib/python2.4/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
    ===

    --
    Jonathan
    Jonathan Fine, Jun 22, 2007
    #8
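The two complaints in the post above can be reproduced in Python 3 with the standard-library `json` module, as in the sketch below. The latin-1 mapping and the Base64 alternative are not from the thread. They are shown here only as common ways of carrying bytes inside JSON text.

```python
import base64
import json

raw = bytes(range(256))  # every possible byte value, once each

# json.dumps() accepts text, not bytes, so the bytes must first be mapped to
# code points. latin-1 maps byte n to code point n.
escaped = json.dumps(raw.decode("latin-1"))

# Base64 is one common workaround: 4 ASCII characters for every 3 bytes.
b64 = json.dumps(base64.b64encode(raw).decode("ascii"))

# Both forms round-trip, but they differ a lot in size.
assert json.loads(escaped).encode("latin-1") == raw
assert base64.b64decode(json.loads(b64)) == raw
print(len(raw), len(escaped), len(b64))
```

The escaped form is roughly four times the size of the input because most non-printable and non-ASCII characters become six-character `\uXXXX` escapes. The Base64 form is about a third larger than the input.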
  9. John Machin (Guest)

    On Jun 22, 5:08 pm, Jonathan Fine <> wrote:
    > Jonathan Fine wrote:
    > > Thank you for this suggestion. The growing adoption of JSON in Ajax
    > > programming is a strong argument for my using it in my application, although
    > > I think I'd prefer something a little more binary.
    > >
    > > So it looks like I'll be using JSON.
    >
    > Well, I tried. But I came across two problems (see below).
    >
    > First, there's bloat. For binary byte data, on average one
    > character becomes just over 4.
    >
    > Second, there's the inconvenience. I can't simply take a
    > sequence of bytes and encode them using JSON. I have to
    > turn them into Unicode first. And I guess there's a similar
    > problem at the other end.
    >
    > So I'm going with my own solution: http://mathtran.cvs.sourceforge.net/mathtran/py/bytedict.py?revision=...
    >


    def unpack(bytes, unpack_entry=unpack_entry):
        '''Return dictionary gotten by unpacking supplied bytes.
        Both keys and values in the returned dictionary are byte-strings.
        '''
        bytedict = {}
        ptr = 0
        while 1:
            key, val, ptr = unpack_entry(bytes, ptr)
            bytedict[key] = val
            if ptr == len(bytes):
                break
        # That's beautiful code -- as pretty as a cane-toad.
        # Well-behaved too, a very elegant response to unpack(pack({}))
        # Try this:
        blen = len(bytes)
        while ptr < blen:
            key, val, ptr = unpack_entry(bytes, ptr)
            bytedict[key] = val

        return bytedict

    HTH,
    John
    John Machin, Jun 22, 2007
    #9
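The difference between the two loop shapes shows up on empty input, as the sketch below tries to demonstrate in Python 3. The `unpack_entry` here is a hypothetical stand-in (two 32-bit lengths followed by key and value), since the real one lives in bytedict.py and is not shown in the thread. The function names `unpack_test_last` and `unpack_test_first` are likewise made up.

```python
import struct


def unpack_entry(data, ptr):
    """Hypothetical entry reader: two 32-bit lengths, then key and value."""
    klen, vlen = struct.unpack_from(">II", data, ptr)
    ptr += 8
    key = data[ptr:ptr + klen]
    val = data[ptr + klen:ptr + klen + vlen]
    return key, val, ptr + klen + vlen


def unpack_test_last(data):
    """Original shape: reads an entry before checking for the end."""
    bytedict = {}
    ptr = 0
    while True:
        key, val, ptr = unpack_entry(data, ptr)
        bytedict[key] = val
        if ptr == len(data):
            break
    return bytedict


def unpack_test_first(data):
    """Suggested shape: checks for the end before reading an entry."""
    bytedict = {}
    ptr = 0
    while ptr < len(data):
        key, val, ptr = unpack_entry(data, ptr)
        bytedict[key] = val
    return bytedict


# An empty dictionary packs to zero bytes, so nothing should be read.
assert unpack_test_first(b"") == {}
try:
    unpack_test_last(b"")
except struct.error:
    print("test-last loop tried to read an entry that is not there")
```

Both versions agree on non-empty input. Only the test-first loop handles the empty case without attempting a read.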
  10. John Machin wrote:

    > def unpack(bytes, unpack_entry=unpack_entry):
    >     '''Return dictionary gotten by unpacking supplied bytes.
    >     Both keys and values in the returned dictionary are byte-strings.
    >     '''
    >     bytedict = {}
    >     ptr = 0
    >     while 1:
    >         key, val, ptr = unpack_entry(bytes, ptr)
    >         bytedict[key] = val
    >         if ptr == len(bytes):
    >             break
    >     # That's beautiful code -- as pretty as a cane-toad.


    Well, it's nearly right. It has a transposition error.

    > # Well-behaved too, a very elegant response to unpack(pack({}))


    Yes, you're right. An attempt to read bytes that aren't there.

    > # Try this:
    >     blen = len(bytes)
    >     while ptr < blen:
    >         key, val, ptr = unpack_entry(bytes, ptr)
    >         bytedict[key] = val
    >
    >     return bytedict


    I've committed such a change. Thank you.

    --
    Jonathan
    Jonathan Fine, Jun 23, 2007
    #10