Is there any way to minimize str()/unicode() objects' memory usage [Python 2.6.4]?

Discussion in 'Python' started by dmtr, Aug 7, 2010.

  1. dmtr

    dmtr Guest

    I'm running into some performance / memory bottlenecks on large lists.
    Is there any easy way to minimize/optimize memory usage?

    Simple str() and unicode() objects [Python 2.6.4/Linux/x86]:
    >>> sys.getsizeof('') 24 bytes
    >>> sys.getsizeof('0') 25 bytes
    >>> sys.getsizeof(u'') 28 bytes
    >>> sys.getsizeof(u'0') 32 bytes


    Lists of str() and unicode() objects (see ref. code below):
    >>> [str(i) for i in xrange(0, 10000000)] 370 Mb (37 bytes/item)
    >>> [unicode(i) for i in xrange(0, 10000000)] 613 Mb (63 bytes/item)


    Well... 63 bytes per item for very short unicode strings... Is there
    any way to do better than that? Perhaps some compact unicode objects?

    -- Regards, Dmitry

    ----
    import os, time, re
    start = time.time()
    l = [unicode(i) for i in xrange(0, 10000000)]
    dt = time.time() - start
    vm = re.findall("(VmPeak.*|VmSize.*)", open('/proc/%d/status' % os.getpid()).read())
    print "%d keys, %s, %f seconds, %f keys per second" % (len(l), vm, dt, len(l) / dt)
     
    dmtr, Aug 7, 2010
    #1

  2. Re: Is there any way to minimize str()/unicode() objects' memory usage [Python 2.6.4]?

    On Fri, 06 Aug 2010 17:45:31 -0700, dmtr wrote:

    > I'm running into some performance / memory bottlenecks on large lists.
    > Is there any easy way to minimize/optimize memory usage?


    Yes, lots of ways. For example, do you *need* large lists? Often a better
    design is to use generators and iterators to lazily generate data when
    you need it, rather than creating a large list all at once.

    An optimization that sometimes may help is to intern strings, so that
    there's only a single copy of common strings rather than multiple copies
    of the same one.
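
    A minimal Python 2 sketch of both ideas (illustrative only, not code
    from the original post):

    # Generator expression: values are produced one at a time instead of
    # materializing a 10-million-element list up front.
    total = sum(len(str(i)) for i in xrange(10000000))

    # Interning: repeated, identical strings share a single object.
    tags = [intern(tag) for tag in ('GET', 'POST', 'GET', 'GET', 'POST')]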

    Can you compress the data and use that? Without knowing what you are
    trying to do, and why, it's really difficult to advise a better way to do
    it (other than vague suggestions like "use generators instead of lists").

    Very often, it is cheaper and faster to just put more memory in the
    machine than to try optimizing memory use. Memory is cheap, your time and
    effort is not.

    [...]
    > Well... 63 bytes per item for very short unicode strings... Is there
    > any way to do better than that? Perhaps some compact unicode objects?


    If you think that unicode objects are going to be *smaller* than byte
    strings, I think you're badly informed about the nature of unicode.

    Python is not a low-level language, and it trades off memory compactness
    for ease of use. Python strings are high-level rich objects, not merely a
    contiguous series of bytes. If all else fails, you might have to use
    something like the array module, or even implement your own data type in
    C.

    But as a general rule, as I mentioned above, the best way to minimize the
    memory used by a large list is to not use a large list. I can't emphasise
    that enough -- look into generators and iterators, and lazily handle your
    data whenever possible.


    --
    Steven
     
    Steven D'Aprano, Aug 7, 2010
    #2

  3. On 08/07/2010 02:45 AM, dmtr wrote:
    > I'm running into some performance / memory bottlenecks on large lists.
    > Is there any easy way to minimize/optimize memory usage?
    >
    > Simple str() and unicode() objects [Python 2.6.4/Linux/x86]:
    >>>> sys.getsizeof('') 24 bytes
    >>>> sys.getsizeof('0') 25 bytes
    >>>> sys.getsizeof(u'') 28 bytes
    >>>> sys.getsizeof(u'0') 32 bytes

    >
    > Lists of str() and unicode() objects (see ref. code below):
    >>>> [str(i) for i in xrange(0, 10000000)] 370 Mb (37 bytes/item)
    >>>> [unicode(i) for i in xrange(0, 10000000)] 613 Mb (63 bytes/item)

    >
    > Well... 63 bytes per item for very short unicode strings... Is there
    > any way to do better than that? Perhaps some compact unicode objects?


    There is a certain price you pay for having full-feature Python objects.

    What are you trying to accomplish anyway? Maybe the array module can be
    of some help. Or numpy?



    >
    > -- Regards, Dmitry
    >
    > ----
    > import os, time, re
    > start = time.time()
    > l = [unicode(i) for i in xrange(0, 10000000)]
    > dt = time.time() - start
    > vm = re.findall("(VmPeak.*|VmSize.*)", open('/proc/%d/status' % os.getpid()).read())
    > print "%d keys, %s, %f seconds, %f keys per second" % (len(l), vm, dt, len(l) / dt)
     
    Thomas Jollans, Aug 7, 2010
    #3
  4. dmtr

    dmtr Guest

    Steven, thank you for answering. See my comments inline. Perhaps I
    should have formulated my question a bit differently: Are there any
    *compact* high performance containers for unicode()/str() objects in
    Python? By *compact* I don't mean compression. Just optimized for
    memory usage, rather than performance.

    What I'm really looking for is a dict() that maps short unicode
    strings into tuples with integers. But just having a *compact* list
    container for unicode strings would help a lot (because I could add a
    __dict__ and go from it).


    > Yes, lots of ways. For example, do you *need* large lists? Often a better
    > design is to use generators and iterators to lazily generate data when
    > you need it, rather than creating a large list all at once.


    Yes. I do need to be able to process large data sets.
    No, there is no way I can use an iterator or lazily generate data when
    I need it.


    > An optimization that sometimes may help is to intern strings, so that
    > there's only a single copy of common strings rather than multiple copies
    > of the same one.


    Unfortunately the strings are unique (think usernames on Facebook or
    Wikipedia). And I can't afford to store them in a db/memcached/redis/
    etc... Too slow.


    > Can you compress the data and use that? Without knowing what you are
    > trying to do, and why, it's really difficult to advise a better way to do
    > it (other than vague suggestions like "use generators instead of lists").


    Yes. I've tried. But I was unable to find a good, unobtrusive way to
    do that. Every attempt either adds some unnecessary pesky code, or is
    slow, or something like that. See more at: http://bugs.python.org/issue9520


    > Very often, it is cheaper and faster to just put more memory in the
    > machine than to try optimizing memory use. Memory is cheap, your time and
    > effort is not.


    Well... I'd really prefer to use, say, 16 bytes for 10-char strings
    and fit the data into 8 GB, rather than paying an extra $1k for 32 GB.

    > > Well...  63 bytes per item for very short unicode strings... Is there
    > > any way to do better than that? Perhaps some compact unicode objects?

    >
    > If you think that unicode objects are going to be *smaller* than byte
    > strings, I think you're badly informed about the nature of unicode.


    I don't think that unicode objects are going to be *smaller*!
    But AFAIK internally CPython uses UTF-8? No? And 63 bytes per item
    seems a bit excessive.
    My question was - is there any way to do better than that....


    > Python is not a low-level language, and it trades off memory compactness
    > for ease of use. Python strings are high-level rich objects, not merely a
    > contiguous series of bytes. If all else fails, you might have to use
    > something like the array module, or even implement your own data type in
    > C.


    Are there any *compact* high performance containers (with dict, list
    interface) in Python?

    -- Regards, Dmitry
     
    dmtr, Aug 7, 2010
    #4
  5. dmtr

    dmtr Guest

    > > Well...  63 bytes per item for very short unicode strings... Is there
    > > any way to do better than that? Perhaps some compact unicode objects?

    >
    > There is a certain price you pay for having full-feature Python objects.


    Are there any *compact* Python objects? Optimized for compactness?

    > What are you trying to accomplish anyway? Maybe the array module can be
    > of some help. Or numpy?


    Ultimately a dict that can store ~20,000,000 entries: (u'short
    string' : (int, int, int, int, int, int, int)).

    -- Regards, Dmitry
     
    dmtr, Aug 7, 2010
    #5
  6. Chris Rebert

    Chris Rebert Guest

    On Fri, Aug 6, 2010 at 6:39 PM, dmtr <> wrote:
    <snip>
    >> > Well...  63 bytes per item for very short unicode strings... Is there
    >> > any way to do better than that? Perhaps some compact unicode objects?

    >>
    >> If you think that unicode objects are going to be *smaller* than byte
    >> strings, I think you're badly informed about the nature of unicode.

    >
    > I don't think that unicode objects are going to be *smaller*!
    > But AFAIK internally CPython uses UTF-8?


    Nope. unicode objects internally use UCS-2 or UCS-4, depending on how
    CPython was ./configure-d; the former is the default.
    See PEP 261.
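
    A quick way to check which build you're on (illustrative sketch):

    import sys
    # 65535 on narrow (UCS-2) builds, 1114111 on wide (UCS-4) builds.
    print sys.maxunicode
    # The per-character cost shows up in getsizeof:
    print sys.getsizeof(u''), sys.getsizeof(u'0123456789')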

    Cheers,
    Chris
    --
    http://blog.rebertia.com
     
    Chris Rebert, Aug 7, 2010
    #6
  7. Neil Hodgson

    Neil Hodgson Guest

    dmtr:

    > What I'm really looking for is a dict() that maps short unicode
    > strings into tuples with integers. But just having a *compact* list
    > container for unicode strings would help a lot (because I could add a
    > __dict__ and go from it).


    Add them all into one string or array and use indexes into that string.
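
    A rough sketch of that idea (illustrative only, assuming the strings
    are known up front):

    import array

    # Store all strings end-to-end in one str and keep cumulative offsets;
    # each entry then costs its bytes plus one integer, instead of a full
    # str object.
    words = [str(i) for i in xrange(1000)]          # example data
    blob = ''.join(words)
    offsets = array.array('L', [0])
    for w in words:
        offsets.append(offsets[-1] + len(w))

    def get(i):
        # Recover the i-th string by slicing the shared blob.
        return blob[offsets[i]:offsets[i + 1]]

    assert get(42) == '42'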

    Neil
     
    Neil Hodgson, Aug 7, 2010
    #7
  8. Carl Banks

    Carl Banks Guest

    On Aug 6, 6:56 pm, dmtr <> wrote:
    > > > Well...  63 bytes per item for very short unicode strings... Is there
    > > > any way to do better than that? Perhaps some compact unicode objects?

    >
    > > There is a certain price you pay for having full-feature Python objects.

    >
    > Are there any *compact* Python objects? Optimized for compactness?


    Yes, but probably not in the way that'd be useful to you.

    Look at the array module, and also consider the third-party numpy
    library. They store compact arrays of numeric types (mostly) but they
    have character type storage as well. That probably won't help you,
    though, since you have variable-length strings.

    I don't know of any third-party types that can do what you want, but
    there might be some. Search PyPI.


    > > What are you trying to accomplish anyway? Maybe the array module can be
    > > of some help. Or numpy?

    >
    > Ultimately a dict that can store ~20,000,000 entries: (u'short
    > string' : (int, int, int, int, int, int, int)).


    My recommendation would be to use sqlite3. Only if you know for sure
    that it's too slow--meaning that you've actually tried it and it was
    too slow, and nothing else--should you bother with a custom in-memory
    structure.
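
    A minimal sqlite3 sketch of that suggestion (the table and column
    names here are made up for illustration):

    import sqlite3

    conn = sqlite3.connect('words.sqlite')
    conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, "
                 "a INT, b INT, c INT, d INT, e INT, f INT, g INT)")
    conn.executemany("INSERT OR REPLACE INTO words VALUES (?,?,?,?,?,?,?,?)",
                     ((unicode(i), i, i+1, i+2, i+3, i+4, i+5, i+6)
                      for i in xrange(1000)))
    conn.commit()
    print conn.execute("SELECT * FROM words WHERE word = ?", (u'42',)).fetchone()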

    For such a custom structure I'd probably go with a binary tree rather
    than a hash. So you have a huge numpy character array that stores all
    20 million short strings end-to-end (in lexical order, so that you can
    look up the strings with a binary search), then you have a numpy
    integer array that stores the indices into this string where the word
    boundaries are, and then an Nx7 numpy integer array storing the integer
    return values. That's three compact arrays.
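
    A rough sketch of that layout, using the stdlib array module in place
    of numpy for brevity (illustrative only):

    import array

    keys = sorted(['alice', 'bob', 'carol', 'dave'])   # example strings
    blob = ''.join(keys)                               # all keys end-to-end
    bounds = array.array('L', [0])                     # word boundaries
    for k in keys:
        bounds.append(bounds[-1] + len(k))
    values = array.array('i', [0] * (7 * len(keys)))   # N x 7 ints, flattened

    def lookup(word):
        # Binary search over the lexically ordered blob.
        lo, hi = 0, len(bounds) - 2
        while lo <= hi:
            mid = (lo + hi) // 2
            k = blob[bounds[mid]:bounds[mid + 1]]
            if k == word:
                return values[7 * mid:7 * (mid + 1)]
            elif k < word:
                lo = mid + 1
            else:
                hi = mid - 1
        return None

    print lookup('carol')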


    Carl Banks
     
    Carl Banks, Aug 7, 2010
    #8
  9. On 08/06/2010 07:56 PM, dmtr wrote:
    > Ultimately a dict that can store ~20,000,000 entries: (u'short
    > string' : (int, int, int, int, int, int, int)).


    I think you really need a real database engine. With the proper
    indexes, MySQL could be very fast storing and retrieving this
    information for you. And it will use your RAM to cache as it sees fit.
    Don't try to reinvent the wheel here.
     
    Michael Torrie, Aug 7, 2010
    #9
  10. dmtr

    dmtr Guest

    On Aug 6, 10:56 pm, Michael Torrie <> wrote:
    > On 08/06/2010 07:56 PM, dmtr wrote:
    >
    > > Ultimately a dict that can store ~20,000,000 entries: (u'short
    > > string' : (int, int, int, int, int, int, int)).

    >
    > I think you really need a real database engine.  With the proper
    > indexes, MySQL could be very fast storing and retrieving this
    > information for you.  And it will use your RAM to cache as it sees fit.
    >  Don't try to reinvent the wheel here.


    No, I've tried. DB solutions are not even close in terms of speed.
    Processing would take weeks :( Memcached or REDIS sort of work, but
    they are still a bit too slow to be a pleasure to work with. The
    standard dict() container is *a lot* faster. It is also hassle-free
    (accepting unicode keys/etc.). I just wish there were a more compact
    dict container, optimized for large datasets and memory rather than
    speed. And with the default dict() I'm also running into some kind
    of nonlinear performance degradation, apparently after
    10,000,000-13,000,000 keys. But I can't recreate this with a solid
    test case (see http://bugs.python.org/issue9520 ) :(

    -- Dmitry
     
    dmtr, Aug 7, 2010
    #10
  11. Radovan Garabík

    Radovan Garabík Guest

    Re: Is there any way to minimize str()/unicode() objects' memory usage [Python 2.6.4]?

    dmtr <> wrote:
    >
    > What I'm really looking for is a dict() that maps short unicode
    > strings into tuples with integers. But just having a *compact* list
    > container for unicode strings would help a lot (because I could add a
    > __dict__ and go from it).
    >


    At this point, I'd suggest using one of the dbm modules, and packing
    the integers into short strings with struct.pack.
    Depending on your usage pattern, there are marked performance
    differences between the dbhash, gdbm, and dbm implementations, so it
    might pay off to invest some time in benchmarking.
    If your data are write-once, then cdb has excellent performance (but a
    different API).
    The file will usually be cached in RAM, so there is no need to worry
    about I/O bottlenecks... and if it is small enough, you can always put
    it on a ramdisk.

    If your strings are long enough, you can improve memory usage by using
    zlib.compress (a dumb and suboptimal way of applying compression, but
    easy and present in the standard library) - but always verify that the
    compressed strings are _shorter_ than the originals.
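
    A minimal sketch of the dbm-plus-struct idea (the file name is made
    up; anydbm picks whichever dbm implementation is available):

    import anydbm, struct

    db = anydbm.open('words.dbm', 'c')
    for i in xrange(1000):
        # UTF-8 byte-string keys, values packed as 7 ints into 28 bytes.
        db[unicode(i).encode('utf-8')] = struct.pack('7i', i, i+1, i+2,
                                                     i+3, i+4, i+5, i+6)
    print struct.unpack('7i', db['42'])
    db.close()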

    --
    -----------------------------------------------------------
    | Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
    | __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
    -----------------------------------------------------------
    Antivirus alert: file .signature infected by signature virus.
    Hi! I'm a signature virus! Copy me into your signature file to help me spread!
     
    Radovan Garabík, Aug 7, 2010
    #11
  12. Peter Otten

    Peter Otten Guest

    Re: Is there any way to minimize str()/unicode() objects' memory usage [Python 2.6.4]?

    dmtr wrote:

    >> > Well... 63 bytes per item for very short unicode strings... Is there
    >> > any way to do better than that? Perhaps some compact unicode objects?

    >>
    >> There is a certain price you pay for having full-feature Python objects.

    >
    > Are there any *compact* Python objects? Optimized for compactness?
    >
    >> What are you trying to accomplish anyway? Maybe the array module can be
    >> of some help. Or numpy?

    >
    > Ultimately a dict that can store ~20,000,000 entries: (u'short
    > string' : (int, int, int, int, int, int, int)).


    I don't know to what extent it still applies, but switching off cyclic
    garbage collection with

    import gc
    gc.disable()

    while building large data structures used to speed things up
    significantly. That's what I would try first with your real data.

    Encoding your unicode strings as UTF-8 could save some memory.

    When your integers fit into two bytes, say, you can use an array.array()
    instead of the tuple.
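
    A small sketch of the gc suggestion (the UTF-8 and array.array ideas
    are benchmarked in the follow-ups below):

    import gc

    gc.disable()          # no cyclic-GC passes while building
    try:
        d = dict((unicode(i).encode('utf-8'), i) for i in xrange(1000000))
    finally:
        gc.enable()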

    Peter
     
    Peter Otten, Aug 7, 2010
    #12
  13. dmtr

    dmtr Guest

    On Aug 6, 11:50 pm, Peter Otten <> wrote:
    > I don't know to what extent it still applies, but switching off cyclic garbage
    > collection with
    >
    > import gc
    > gc.disable()



    Haven't tried it on the real dataset. On the synthetic test it (and
    sys.setcheckinterval(100000)) gave ~2% speedup and no change in memory
    usage. Not significant. I'll try it on the real dataset though.


    > while building large data structures used to speed things up significantly.
    > That's what I would try first with your real data.
    >
    > Encoding your unicode strings as UTF-8 could save some memory.


    Yes... In fact that's what I'm trying now... .encode('utf-8')
    definitely creates some clutter in the code, but I guess I can
    subclass dict... And it does save memory! A lot of it. Seems to be a
    bit faster too....
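
    For instance, such a wrapper might look like this (purely illustrative,
    not code from the thread):

    class UTF8KeyDict(dict):
        # Store keys as UTF-8 byte strings internally, accept unicode keys.
        def __setitem__(self, key, value):
            dict.__setitem__(self, key.encode('utf-8'), value)
        def __getitem__(self, key):
            return dict.__getitem__(self, key.encode('utf-8'))
        def __contains__(self, key):
            return dict.__contains__(self, key.encode('utf-8'))

    d = UTF8KeyDict()
    d[u'42'] = (1, 2, 3)
    print d[u'42']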

    > When your integers fit into two bytes, say, you can use an array.array()
    > instead of the tuple.


    Excellent idea. Thanks! And it seems to work too, at least for the
    test code. Here are some benchmarks (x86 desktop):

    Unicode key / tuple:
    >>> for i in xrange(0, 1000000): d[unicode(i)] = (i, i+1, i+2, i+3, i+4, i+5, i+6)

    1000000 keys, ['VmPeak:\t 224704 kB', 'VmSize:\t 224704 kB'],
    4.079240 seconds, 245143.698209 keys per second

    >>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = array.array('i', (i, i+1, i+2, i+3, i+4, i+5, i+6))

    1000000 keys, ['VmPeak:\t 201440 kB', 'VmSize:\t 201440 kB'],
    4.985136 seconds, 200596.331486 keys per second

    >>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = (i, i+1, i+2, i+3, i+4, i+5, i+6)

    1000000 keys, ['VmPeak:\t 125652 kB', 'VmSize:\t 125652 kB'],
    3.572301 seconds, 279931.625282 keys per second

    Almost halved the memory usage. And faster too. Nice.

    -- Dmitry
     
    dmtr, Aug 7, 2010
    #13
  14. dmtr

    dmtr Guest

    Correction. I've copy-pasted it wrong! array.array('i', (i, i+1, i+2,
    i+3, i+4, i+5, i+6)) was the best.

    >>> for i in xrange(0, 1000000): d[unicode(i)] = (i, i+1, i+2, i+3, i+4, i+5, i+6)

    1000000 keys, ['VmPeak:\t 224704 kB', 'VmSize:\t 224704 kB'],
    4.079240 seconds, 245143.698209 keys per second

    >>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = (i, i+1, i+2, i+3, i+4, i+5, i+6)

    1000000 keys, ['VmPeak:\t 201440 kB', 'VmSize:\t 201440 kB'],
    4.985136 seconds, 200596.331486 keys per second

    >>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = array.array('i', (i, i+1, i+2, i+3, i+4, i+5, i+6))

    1000000 keys, ['VmPeak:\t 125652 kB', 'VmSize:\t 125652 kB'],
    3.572301 seconds, 279931.625282 keys per second

    -- Dmitry
     
    dmtr, Aug 7, 2010
    #14
  15. Peter Otten

    Peter Otten Guest

    Re: Is there any way to minimize str()/unicode() objects' memory usage [Python 2.6.4]?

    dmtr wrote:

    > On Aug 6, 11:50 pm, Peter Otten <> wrote:
    >> I don't know to what extent it still applies, but switching off cyclic
    >> garbage collection with
    >>
    >> import gc
    >> gc.disable()

    >
    >
    > Haven't tried it on the real dataset. On the synthetic test it (and
    > sys.setcheckinterval(100000)) gave ~2% speedup and no change in memory
    > usage. Not significant. I'll try it on the real dataset though.
    >
    >
    >> while building large data structures used to speed things up
    >> significantly. That's what I would try first with your real data.
    >>
    >> Encoding your unicode strings as UTF-8 could save some memory.

    >
    > Yes... In fact that's what I'm trying now... .encode('utf-8')
    > definitely creates some clutter in the code, but I guess I can
    > subclass dict... And it does save memory! A lot of it. Seems to be a
    > bit faster too....
    >
    >> When your integers fit into two bytes, say, you can use an array.array()
    >> instead of the tuple.

    >
    > Excellent idea. Thanks! And it seems to work too, at least for the
    > test code. Here are some benchmarks (x86 desktop):
    >
    > Unicode key / tuple:
    >>>> for i in xrange(0, 1000000): d[unicode(i)] = (i, i+1, i+2, i+3, i+4,
    >>>> i+5, i+6)

    > 1000000 keys, ['VmPeak:\t 224704 kB', 'VmSize:\t 224704 kB'],
    > 4.079240 seconds, 245143.698209 keys per second
    >
    >>>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] =
    >>>> array.array('i', (i, i+1, i+2, i+3, i+4, i+5, i+6))

    > 1000000 keys, ['VmPeak:\t 201440 kB', 'VmSize:\t 201440 kB'],
    > 4.985136 seconds, 200596.331486 keys per second
    >
    >>>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = (i, i+1,
    >>>> i+2, i+3, i+4, i+5, i+6)

    > 1000000 keys, ['VmPeak:\t 125652 kB', 'VmSize:\t 125652 kB'],
    > 3.572301 seconds, 279931.625282 keys per second
    >
    > Almost halved the memory usage. And faster too. Nice.


    > def benchmark_dict(d, N):
    >     start = time.time()
    >     for i in xrange(N):
    >         length = lengths[random.randint(0, 255)]
    >         word = ''.join([letters[random.randint(0, 255)] for i in xrange(length)])
    >         d[word] += 1
    >     dt = time.time() - start
    >     vm = re.findall("(VmPeak.*|VmSize.*)", open('/proc/%d/status' % os.getpid()).read())
    >     print "%d keys (%d unique), %s, %f seconds, %f keys per second" % (N, len(d), vm, dt, N / dt)


    Looking at your benchmark, random.choice(letters) probably has less overhead
    than letters[random.randint(...)]. You might even try to inline it as

    letters[int(random.random() * 256)]

    Peter
     
    Peter Otten, Aug 7, 2010
    #15
  16. dmtr

    dmtr Guest

    > Looking at your benchmark, random.choice(letters) has probably less overhead
    > than letters[random.randint(...)]. You might even try to inline it as


    Right... random.choice()... I'm a bit new to python, always something
    to learn. But anyway in that benchmark (from http://bugs.python.org/issue9520
    ) the code that generates 'words' takes 90% of the time. And I'm really
    looking at deltas between different methods, not the absolute value. I
    was also using different code to get benchmarks for my previous
    message... Here's the code:


    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import os, time, re, array

    start = time.time()
    d = dict()
    for i in xrange(0, 1000000):
        d[unicode(i).encode('utf-8')] = array.array('i', (i, i+1, i+2, i+3, i+4, i+5, i+6))
    dt = time.time() - start
    vm = re.findall("(VmPeak.*|VmSize.*)", open('/proc/%d/status' % os.getpid()).read())
    print "%d keys, %s, %f seconds, %f keys per second" % (len(d), vm, dt, len(d) / dt)
     
    dmtr, Aug 7, 2010
    #16
  17. dmtr

    dmtr Guest

    I guess with the actual dataset I'll be able to improve the memory
    usage a bit, with BioPython::trie. That would probably be enough
    optimization to continue working with some comfort. On this test code
    BioPython::trie gives a bit of improvement in terms of memory. Not
    much though...

    >>> d = dict()
    >>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = array.array('i', (i, i+1, i+2, i+3, i+4, i+5, i+6))


    1000000 keys, ['VmPeak:\t 125656 kB', 'VmSize:\t 125656 kB'],
    3.525858 seconds, 283618.896034 keys per second


    >>> from Bio import trie
    >>> d = trie.trie()
    >>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = array.array('i', (i, i+1, i+2, i+3, i+4, i+5, i+6))


    1000000 keys, ['VmPeak:\t 108932 kB', 'VmSize:\t 108932 kB'],
    4.142797 seconds, 241382.814950 keys per second
     
    dmtr, Aug 7, 2010
    #17
  18. Radovan Garabík

    Radovan Garabík Guest

    Re: Is there any way to minimize str()/unicode() objects' memory usage [Python 2.6.4]?

    dmtr <> wrote:
    > I guess with the actual dataset I'll be able to improve the memory
    > usage a bit, with BioPython::trie. That would probably be enough
    > optimization to continue working with some comfort. On this test code
    > BioPython::trie gives a bit of improvement in terms of memory. Not
    > much though...
    >
    >>>> d = dict()
    >>>> for i in xrange(0, 1000000): d[unicode(i).encode('utf-8')] = array.array('i', (i, i+1, i+2, i+3, i+4, i+5, i+6))

    >


    Using struct.pack('7i', i, i+1, i+2, i+3, i+4, i+5, i+6) instead of
    array.array gives a 20% improvement in time with (not surprisingly)
    the same memory usage.
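
    That replacement looks roughly like this (sketch):

    import struct

    d = {}
    for i in xrange(1000000):
        d[unicode(i).encode('utf-8')] = struct.pack('7i', i, i+1, i+2,
                                                    i+3, i+4, i+5, i+6)
    # Values come back as tuples of ints:
    print struct.unpack('7i', d['42'])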

    --
    -----------------------------------------------------------
    | Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
    | __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
    -----------------------------------------------------------
    Antivirus alert: file .signature infected by signature virus.
    Hi! I'm a signature virus! Copy me into your signature file to help me spread!
     
    Radovan Garabík, Aug 7, 2010
    #18
  19. Nobody

    Nobody Guest

    Re: Is there any way to minimize str()/unicode() objects' memory usage [Python 2.6.4]?

    On Fri, 06 Aug 2010 18:39:27 -0700, dmtr wrote:

    > Steven, thank you for answering. See my comments inline. Perhaps I
    > should have formulated my question a bit differently: Are there any
    > *compact* high performance containers for unicode()/str() objects in
    > Python? By *compact* I don't mean compression. Just optimized for
    > memory usage, rather than performance.
    >
    > What I'm really looking for is a dict() that maps short unicode
    > strings into tuples with integers.


    Use UTF-8 encoded strings (str/bytes) as keys rather than unicode objects.
     
    Nobody, Aug 7, 2010
    #19
