Internals and complexity of types, containers and algorithms

Discussion in 'Python' started by Harald Luessen, Jun 25, 2007.

  1. Hi, I am new to python and I miss some understanding of the internals
    of some types and containers. With my C/C++ background I hope to get
    some hints to chose the best data structure for my programs. Here are
    some questions:

    - Is there a brief description (not source) how the types tuple,
    string, list and dict are represented internally. Is a list behind
    the scenes just a double linked list or an array or a mixture of
    these things? Is a dict a tree or a hash array and what is the
    collision mechanism? Is the string an array with some header info?

    - What is the big-O complexity of the most common algorithms for
    these types and containers? Is it O(n), O(n*log(n)) or O(n**2)?
    I mean inserting, appending (front or back), finding something
    and so on.

    - The same questions for important and common library containers
    if you can name some.

    - Is this information somewhere in the web? Why is it not written
    in the documentation?

    - When I want to use a representation of a game board like chess
    in C/C++ I use an array[64] or bigger of int or char for the pieces.
    What data structure or type would be useful in Python when the
    performance ist most important? Is it list or string or an array
    from a library or what else?

    Harald
    Harald Luessen, Jun 25, 2007
    #1
    1. Advertising

  2. > - Is there a brief description (not source) how the types tuple,
    > string, list and dict are represented internally. Is a list behind
    > the scenes just a double linked list or an array or a mixture of
    > these things?


    Sure, see below:

    - tuples are represented as arrays, with a single block for the
    entire objects (object header, tuple size, and data)
    - list are represented as arrays, with two memory blocks:
    one for object header and sizes, and the other one for the
    "guts", i.e. the actual data. The list uses over-allocation,
    to avoid resizing on each addition.
    - strings are implemented as arrays, with a single block for
    the entire string. In addition to header, size, and data,
    it also contains a cached hash and a pointer to the interned
    version of the string (if any).
    - dicts are implemented as hash tables, with open addressing.

    > Is a dict a tree or a hash array and what is the
    > collision mechanism?


    It's a hash array. Not sure what you mean by "collision
    mechanism" - perhaps "open addressing" is the answer?
    In case you also want to know the probing:

    j(n+1) = 5*j(n) + 1 + perturb(n)
    perturb(n+1) = perturb(n) >> 5

    > Is the string an array with some header info?


    Exactly so.

    > - What is the big-O complexity of the most common algorithms for
    > these types and containers? Is it O(n), O(n*log(n)) or O(n**2)?
    > I mean inserting, appending (front or back), finding something
    > and so on.


    O(1) for indexed access in lists, tuples, and strings, and for
    computing the length of any container.
    O(n) for finding in lists, tuples, and strings. O(1) for inserting
    and finding in dictionaries, assuming few collisions, except
    when rehashing occurs. Amortized O(1) for inserting into lists.
    O(n) for removing from lists.
    O(1) for inserting inot

    > - Is this information somewhere in the web?


    Most likely.

    > Why is it not written in the documentation?


    Because nobody has contributed such documentation.

    > - When I want to use a representation of a game board like chess
    > in C/C++ I use an array[64] or bigger of int or char for the pieces.
    > What data structure or type would be useful in Python when the
    > performance ist most important? Is it list or string or an array
    > from a library or what else?


    Depends on whether and how you want to modify the chessboard.
    If you do want to modify, use a list.

    Regards,
    Martin
    =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=, Jun 25, 2007
    #2
    1. Advertising

  3. Harald Luessen

    James Stroud Guest

    Harald Luessen wrote:
    > Hi, I am new to python and I miss some understanding of the internals
    > of some types and containers. With my C/C++ background I hope to get
    > some hints to chose the best data structure for my programs. Here are
    > some questions:


    This depends on how you define "best". If you want speed and
    optimization, you can use the numpy package built with ATLAS tuned for a
    specific machine.

    Beyond speed, "best" in the python community usually means "most suited"
    from an idiomatic perspective and from the perspective of structure that
    lends itself to long term maintainability because [C]python data
    structures seem to undergo optimizations in their implementation at each
    revision.

    Your best bet is probably to forget about implementation and write code
    that makes sense. For example, some have suggested a tuple-keyed
    dictionary to represent a chess board:

    board = ((c,r) for r in xrange(1, 9) for c in 'abcdefgh')
    starting = 'RNBQKBNR' + 'P' * 8 + ' ' * 32 + 'p' * 8 + 'rnbqkbnr'
    position = dict(zip(board, starting))

    Of course we see here the chief problem with algebraic notation: it
    begins numbering at 1 which is painfully reminiscient of fortran.

    James
    James Stroud, Jun 25, 2007
    #3
  4. On Mon, 25 Jun 2007 James Stroud wrote:

    >Harald Luessen wrote:
    >> Hi, I am new to python and I miss some understanding of the internals
    >> of some types and containers. With my C/C++ background I hope to get
    >> some hints to chose the best data structure for my programs. Here are
    >> some questions:

    >
    >This depends on how you define "best". If you want speed and
    >optimization, you can use the numpy package built with ATLAS tuned for a
    >specific machine.


    Are there arrays in numpy that can efficiently be used for
    other things than matrix arithmetic? Are they faster than lists
    but still easy to use?

    no_piece = 0
    wpawn = 1
    ....
    board[square] = no_piece
    board[square+8] = pawn
    ....

    could be a typical pawn move.

    >Beyond speed, "best" in the python community usually means "most suited"
    >from an idiomatic perspective and from the perspective of structure that
    >lends itself to long term maintainability because [C]python data
    >structures seem to undergo optimizations in their implementation at each
    >revision.


    I like the python way of "best" code. But in this particular
    question the emphasis was on performance and speed.

    >Your best bet is probably to forget about implementation and write code
    >that makes sense. For example, some have suggested a tuple-keyed
    >dictionary to represent a chess board:
    >
    >board = ((c,r) for r in xrange(1, 9) for c in 'abcdefgh')
    >starting = 'RNBQKBNR' + 'P' * 8 + ' ' * 32 + 'p' * 8 + 'rnbqkbnr'
    >position = dict(zip(board, starting))


    I am not new to board game programming and I am used to
    an array approach or even bitboards. Therefore the dictionary
    looks a little bit strange to me.

    Harald
    Harald Luessen, Jun 26, 2007
    #4
  5. On Mon, 25 Jun 2007 Martin v. Löwis wrote:

    >Sure, see below:
    >
    >- tuples are represented as arrays, with a single block for the
    > entire objects (object header, tuple size, and data)
    >- list are represented as arrays, with two memory blocks:
    > one for object header and sizes, and the other one for the
    > "guts", i.e. the actual data. The list uses over-allocation,
    > to avoid resizing on each addition.
    >- strings are implemented as arrays, with a single block for
    > the entire string. In addition to header, size, and data,
    > it also contains a cached hash and a pointer to the interned
    > version of the string (if any).
    >- dicts are implemented as hash tables, with open addressing.

    .... and more interesting things ...

    Thank you. That was the information I was looking for.
    I just forgot to ask for sets.

    Harald
    Harald Luessen, Jun 26, 2007
    #5
  6. Harald Luessen

    Paul Rubin Guest

    "Martin v. Löwis" <> writes:
    > Amortized O(1) for inserting into lists.


    I think you mean amortized O(1) for appending to lists.
    Paul Rubin, Jun 26, 2007
    #6
  7. Harald Luessen wrote:
    > On Mon, 25 Jun 2007 Martin v. Löwis wrote:
    >
    >> Sure, see below:
    >>
    >> - tuples are represented as arrays, with a single block for the
    >> entire objects (object header, tuple size, and data)
    >> - list are represented as arrays, with two memory blocks:
    >> one for object header and sizes, and the other one for the
    >> "guts", i.e. the actual data. The list uses over-allocation,
    >> to avoid resizing on each addition.
    >> - strings are implemented as arrays, with a single block for
    >> the entire string. In addition to header, size, and data,
    >> it also contains a cached hash and a pointer to the interned
    >> version of the string (if any).
    >> - dicts are implemented as hash tables, with open addressing.

    > ... and more interesting things ...
    >
    > Thank you. That was the information I was looking for.
    > I just forgot to ask for sets.


    Generally, sets are dictionaries without values (and some tweaks related
    to handling set operations).

    - Josiah
    Josiah Carlson, Jun 27, 2007
    #7
  8. Paul Rubin schrieb:
    > "Martin v. Löwis" <> writes:
    >> Amortized O(1) for inserting into lists.

    >
    > I think you mean amortized O(1) for appending to lists.


    Indeed so; insertion is O(n).

    Regards,
    Martin
    =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=, Jun 27, 2007
    #8
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Gawelek
    Replies:
    0
    Views:
    399
    Gawelek
    Nov 23, 2003
  2. Jobs Gooogle
    Replies:
    2
    Views:
    467
    Patricia Shanahan
    May 11, 2007
  3. Jobs Gooogle
    Replies:
    1
    Views:
    308
    Victor Bazarov
    May 10, 2007
  4. Jobs Gooogle

    .Net VC++ Java C++ Windows Internals Unix Internals

    Jobs Gooogle, May 10, 2007, in forum: C Programming
    Replies:
    0
    Views:
    345
    Jobs Gooogle
    May 10, 2007
  5. Jobs Gooogle
    Replies:
    0
    Views:
    117
    Jobs Gooogle
    May 10, 2007
Loading...

Share This Page