Another Python rookie trying to port to C/C++ question.

Discussion in 'Python' started by Don Bruder, Sep 25, 2003.

  1. Don Bruder

    Don Bruder Guest

    A week or two ago, I asked here about porting Python to C. Got some good
    answers (Note 1) but now I've got another question. Actually, more a
    request for clarification of a topic that both the Python tutorial and
    docs leave a touch murky to my understanding.

    Dictionaries/"dict" types...

    Am I understanding/interpreting correctly when I go with the idea that a
    "dict" variable can be looked at as (effectively) two parallel arrays?
    Something like this C construct:

    struct DictStruct
    {
    char *KeyArray[<somenumber>];
    ValType ValArray[<somenumber>];
    }

    Where "ValType" would be likely be a union, to allow
    assigning/retrieving the actual values correctly depending on what their
    individiual types were, and "<somenumber>" is just that: Some
    more-or-less arbitrarily selected value that provides enough space to
    handle all expected numbers of key/value pairs?

    Similarly, would something like:

    struct DictStruct
    {
    char Key[25];
    ValType Value;
    DictStruct *PrevDictEntry;
    DictStruct *NextDictEntry;
    }

    (again, ValType would probably be a union to cover each type of value
    that might be wanted in the dictionary - but in this case, Key directly
    contains the Key's name) then building a double-linked list of struct
    DictStructs end up behaving at least reasonably like a Python "dict"?

    I expect that either will be a reasonable facsimile of how Python
    handles dicts, with, of course, the corresponding overhead (which may be
    surprisingly little, I'm thinking... but I have yet to code that part,
    so I may find out it's going to be a nightmare) of "We want the value
    that goes with key, so we've got to walk the KeyArray[] until we find a
    match, then extract the value from the other array", or a similar
    procedure for the doubly-linked list.

    Or am I way out in left field with the way I'm RTFM?



    (Note 1)
    Along with quite a bit of entirely unwelcome "Python evangelism" --
    Which part of "I don't care how good, bad, or indifferent Python is
    compared to C/C++, or any other language. I want this Python code to be
    written in C/C++, end of discussion." wasn't clear enough for those of
    you who took it upon yourselves to preach (including the idiot who
    decided to spew a relatively impressive (as such things go) string of
    insults about how stupid I am, how far up my ass my head is, and how
    pathetic my family all the way back to Adam must be for producing such a
    throwback because I've got no interest in adopting "the perfect
    language...") at me on the merits of Python, both on group and
    in-mailbox?

    --
    Don Bruder - <--- Preferred Email - SpamAssassinated.
    Hate SPAM? See <http://www.spamassassin.org> for some seriously great info.
    I will choose a path that's clear: I will choose Free Will! - N. Peart
    Fly trap info pages: <http://www.sonic.net/~dakidd/Horses/FlyTrap/index.html>
     
    Don Bruder, Sep 25, 2003
    #1
    1. Advertising

  2. Don Bruder <> writes:

    [rewriting a Python program in C/C++]
    > Something like this C construct:
    >
    > struct DictStruct
    > {
    > char *KeyArray[<somenumber>];
    > ValType ValArray[<somenumber>];
    > }
    >

    [...]
    > Similarly, would something like:
    >
    > struct DictStruct
    > {
    > char Key[25];
    > ValType Value;
    > DictStruct *PrevDictEntry;
    > DictStruct *NextDictEntry;
    > }
    >
    > [...] end up behaving at least reasonably like a Python "dict"?


    Yes.

    Except that it would be less flexible, *much* slower, and
    probably much more buggy.

    Thomas
     
    Thomas Heller, Sep 25, 2003
    #2
    1. Advertising

  3. Don Bruder

    Peter Otten Guest

    Don Bruder wrote:

    > A week or two ago, I asked here about porting Python to C. Got some good
    > answers (Note 1) but now I've got another question. Actually, more a
    > request for clarification of a topic that both the Python tutorial and
    > docs leave a touch murky to my understanding.
    >
    > Dictionaries/"dict" types...


    As you seem to feel at home with C, you could have a look into
    Objects/dictobject.c and Include/dictobject.h of the source distribution.

    > Which part of "I don't care how good, bad, or indifferent Python is
    > compared to C/C++, or any other language. I want this Python code to be
    > written in C/C++, end of discussion." wasn't clear enough for those of


    Smart people use efficient tools and build on existing libraries. If you are
    serious about the C++ part of your subject line, the Standard library aka
    STL has a map template that resembles Python's dict and might enhance both
    development and execution speed.

    Off topic (or not):
    A single person can start a discussion but not limit it, I think that's a
    feature of newsgroups rather than an annoyance.


    Peter
     
    Peter Otten, Sep 25, 2003
    #3
  4. Don Bruder

    Peter Otten Guest

    Don Bruder wrote:

    > A week or two ago, I asked here about porting Python to C. Got some good
    > answers (Note 1) but now I've got another question. Actually, more a
    > request for clarification of a topic that both the Python tutorial and
    > docs leave a touch murky to my understanding.
    >
    > Dictionaries/"dict" types...


    As you seem to feel at home with C, you could have a look into
    Objects/dictobject.c and Include/dictobject.h of the source distribution.

    > Which part of "I don't care how good, bad, or indifferent Python is
    > compared to C/C++, or any other language. I want this Python code to be
    > written in C/C++, end of discussion." wasn't clear enough for those of


    Smart people use efficient tools and build on existing libraries. The
    Standard library aka STL has a <map> template that might enhance both
    development and execution speed.

    Off topic (or not):
    A single person can start a discussion but not end it, I think that's a
    feature of newsgroups rather than an annoyance.


    Peter
     
    Peter Otten, Sep 25, 2003
    #4
  5. Don Bruder

    Ulrich Petri Guest

    "Don Bruder" <> schrieb im Newsbeitrag
    news:GSGcb.24394$...
    >


    > I expect that either will be a reasonable facsimile of how Python
    > handles dicts, with, of course, the corresponding overhead (which may be
    > surprisingly little, I'm thinking... but I have yet to code that part,
    > so I may find out it's going to be a nightmare) of "We want the value
    > that goes with key, so we've got to walk the KeyArray[] until we find a
    > match, then extract the value from the other array", or a similar
    > procedure for the doubly-linked list.


    Python hashes the keys for fast lookup.


    > (Note 1)
    > Along with quite a bit of entirely unwelcome "Python evangelism" --
    > Which part of "I don't care how good, bad, or indifferent Python is
    > compared to C/C++, or any other language. I want this Python code to be
    > written in C/C++, end of discussion." wasn't clear enough for those of
    > you who took it upon yourselves to preach (including the idiot who
    > decided to spew a relatively impressive (as such things go) string of
    > insults about how stupid I am, how far up my ass my head is, and how
    > pathetic my family all the way back to Adam must be for producing such a
    > throwback because I've got no interest in adopting "the perfect
    > language...") at me on the merits of Python, both on group and
    > in-mailbox?


    You are treated as you treat others.
    It might suprise you but this is a newsgroup not your personal information
    lookup place.
     
    Ulrich Petri, Sep 25, 2003
    #5
  6. Don Bruder

    Don Bruder Guest

    In article <bkvklh$e2i$02$-online.com>,
    Peter Otten <> wrote:

    > Don Bruder wrote:
    >
    > > A week or two ago, I asked here about porting Python to C. Got some good
    > > answers (Note 1) but now I've got another question. Actually, more a
    > > request for clarification of a topic that both the Python tutorial and
    > > docs leave a touch murky to my understanding.
    > >
    > > Dictionaries/"dict" types...

    >
    > As you seem to feel at home with C, you could have a look into
    > Objects/dictobject.c and Include/dictobject.h of the source distribution.


    I may just go poking around there for more details.
    Someone (via email) suggests that dicts are "supposed to be" (my
    synopsis, not his words) a bit "murky", because all that's supposed to
    be relevant is stuffing things into them, and pulling things out. That's
    fine, completely admirable, and fully acceptable. *IF* You're working
    directly in Python. I'm not. I'm working in a combination of C and C++,
    attempting to duplicate the functionality of a program originally
    written in Python. As such, the details *MUST* get clear if I'm to
    create equivalent functionality. If only because I need to read data
    that was written (presumably...) by a Python program. Hence, it becomes
    neccesary to get down "in the muck" and actually know what's going on
    behind the nice, tidy "dict.whosit.get()" (or whatever other operators
    might be involved) interface that Python provides.
    >
    > > Which part of "I don't care how good, bad, or indifferent Python is
    > > compared to C/C++, or any other language. I want this Python code to be
    > > written in C/C++, end of discussion." wasn't clear enough for those of

    >
    > Smart people use efficient tools and build on existing libraries.


    There's a statement that needed saying.... Not...


    > If you are
    > serious about the C++ part of your subject line, the Standard library aka
    > STL has a map template that resembles Python's dict and might enhance both
    > development and execution speed.


    I'm *REASONABLY* serious. This port started life as a "Do you really
    know as much about C++ as you think you do? Prove it! Take this Python
    program and make it C++!" sort of "mid-term" test in my self-taught C++
    course. It's rapidly degenerating (due to some technicalities of C++
    that I've never before encountered, and therefore had no expectation of
    trouble from) into "Let's just get the damn thing working in straight C,
    then worry about trying to turn it into C++ later."

    >
    > Off topic (or not):
    > A single person can start a discussion but not limit it, I think that's a
    > feature of newsgroups rather than an annoyance.


    Perhaps, and perhaps not. Guess it depends on your perspective. Having a
    stream of insult and invective dumped into your mailbox because you've
    made it plain that you don't consider somebody's "pet" language the
    be-all and end-all of computer programming is hardly what I'd call a
    "feature"...

    Be that as it may, fortunately I have a thick enough hide to
    figuratively "flip off" the jerk that was involved, and go on with life
    knowing full well that in at least one respect, I'm so far superior to
    him that words can't describe the gulf between us. Doesn't mean I'm not
    going to mention that I think he's a hopeless putz, but I'm sure not
    going to let it get me down to the point of giving up.

    --
    Don Bruder - <--- Preferred Email - SpamAssassinated.
    Hate SPAM? See <http://www.spamassassin.org> for some seriously great info.
    I will choose a path that's clear: I will choose Free Will! - N. Peart
    Fly trap info pages: <http://www.sonic.net/~dakidd/Horses/FlyTrap/index.html>
     
    Don Bruder, Sep 25, 2003
    #6
  7. Don Bruder

    Don Bruder Guest

    In article <>,
    Thomas Heller <> wrote:

    > Don Bruder <> writes:
    >
    > [rewriting a Python program in C/C++]
    > > Something like this C construct:
    > >
    > > struct DictStruct
    > > {
    > > char *KeyArray[<somenumber>];
    > > ValType ValArray[<somenumber>];
    > > }
    > >

    > [...]
    > > Similarly, would something like:
    > >
    > > struct DictStruct
    > > {
    > > char Key[25];
    > > ValType Value;
    > > DictStruct *PrevDictEntry;
    > > DictStruct *NextDictEntry;
    > > }
    > >
    > > [...] end up behaving at least reasonably like a Python "dict"?

    >
    > Yes.
    >
    > Except that it would be less flexible, *much* slower, and
    > probably much more buggy.
    >
    > Thomas


    Well, since I'm not worried about "flexible", have my doubts about the
    "much slower" part, and have no reason to believe that choice of
    language automatically makes a routine buggy, I'll go with the tenative
    plan I had. Seems you're confirming that I'm understanding things
    properly when it comes to "This is what a dict is, and here's how it
    works." Now to put finishing touches on my understanding of exactly what
    goes on inside that "black box" that is the "dict"
    class/type/whatever-it-is-you-Python-types-like-to-call-such-things, and
    make some working code out of it...

    --
    Don Bruder - <--- Preferred Email - SpamAssassinated.
    Hate SPAM? See <http://www.spamassassin.org> for some seriously great info.
    I will choose a path that's clear: I will choose Free Will! - N. Peart
    Fly trap info pages: <http://www.sonic.net/~dakidd/Horses/FlyTrap/index.html>
     
    Don Bruder, Sep 25, 2003
    #7
  8. On Thu, 25 Sep 2003 18:55:34 GMT, Don Bruder <> wrote:

    >Dictionaries/"dict" types...
    >
    >Am I understanding/interpreting correctly when I go with the idea that a
    >"dict" variable can be looked at as (effectively) two parallel arrays?
    >Something like this C construct:
    >
    >struct DictStruct
    > {
    > char *KeyArray[<somenumber>];
    > ValType ValArray[<somenumber>];
    > }


    Sort of, in a sense, but not quite. The types of items within the
    dictionary (both key and value) are not constrained. This is no
    problem as all Python objects are held using a reference scheme. So
    (assuming that a pair-object scheme is used in the dictionary) the
    struct will be defined much like...

    struct DictPair
    {
    PyObject *Key;
    PyObject *Data;
    }

    The items will almost certainly be held in a single 'array' of pair
    objects.

    The Python dictionaries use hashing. I don't know the details of the
    hashing used. Anyway, there are many ways to handle dictionary-like
    data structures.

    C++ has the std::map template, which uses a form of binary tree called
    a red-black tree (the red-black refers to a partial balancing system).

    Tree-based mappings can be slightly more flexible than hashing (for
    instance, a tree can be 'walked' in sorted order - the hashing process
    does not preserve relative ordering) but hashing tends to be faster.

    An interesting approach to dictionary-like objects is the ternary
    tree. This is rather like cascading binary trees - each subtree
    identifies one character and cascades into the subtree that finds the
    next character. The third link in the 'ternary' is the 'I've
    identified one character and moving on to the next' link.

    A ternary tree can be faster than hashing - in particular at deciding
    that a particular key is not present (hashing needs to calculate a
    hash over the whole tree before looking up the result - a ternary tree
    search may fail at the first character, without looking at the whole
    key).

    Digital trees (or tries, IIRC) can be even faster still - but wastes a
    lot of memory. In a digital tree, each node has an array subscripted
    by character/digit code of pointers to the next subtree.

    Multiway trees (most often used on disk for database indexes - B trees
    and B+ trees being types of multiway trees) may well be appropriate in
    main store on modern desktop machines with caching and virtual memory.

    If these options are just too complicated, sorted arrays can be
    extremely effective as long as you don't have too many items. The C++
    library includes std::vector (a resizable array) and binary search
    algorithms which can ease the implementation.


    Odds are, if you are translating Python to C++, you should use an
    std::map to replace a dictionary. There are some different
    assumptions, though. Dictionaries assume that the key is hashable, but
    don't require relative comparisons ('<', '<=' etc). The std::map
    requires relative comparisons.

    This simply arises out of the different approaches - Python using
    hashing and C++ using binary trees. Going from Python to C++, 99 times
    out of 100 this is not a problem. From C++ to Python there is the
    minor issue that Python dictionaries can't be _directly_ iterated in
    sorted order, though there are certainly easy ways around that.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
     
    Stephen Horne, Sep 25, 2003
    #8
  9. On Thu, 25 Sep 2003 15:19:00 -0700, Brian Quinlan <>
    wrote:

    >> Well, since I'm not worried about "flexible", have my doubts about the
    >> "much slower" part,

    >
    >Your implementation is O(n) with the number of entries while the Python
    >implementation is O(log(n)). If n is small then your implementation
    >might be acceptably fast. Can't you just use the STL hash_map template
    >class?


    Are you sure Python dictionaries are O(log n)? - Remember, they are
    based on hashing.

    Simple hashing is O(1). Handling of collisions and stuff will affect
    that (and I'm no expert on hashing techniques) but I can't help
    wondering if you're confusing Pythons hashing with tree-based
    techniques.


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
     
    Stephen Horne, Sep 26, 2003
    #9
  10. Don Bruder

    Tom Anderson Guest

    On Thu, 25 Sep 2003, Don Bruder wrote:

    > [...] As such, the details *MUST* get clear if I'm to create equivalent
    > functionality. If only because I need to read data that was written
    > (presumably...) by a Python program. Hence, it becomes neccesary to get
    > down "in the muck" and actually know what's going on behind the nice,
    > tidy "dict.whosit.get()" (or whatever other operators might be involved)
    > interface that Python provides.


    is this data in 'pickle' format (as opposed to text, XML, or some specific
    binary format)? if it is, reading it in C/C++ is going to be very hard
    indeed; you might consider writing a small python program to read it and
    convert it to something more tractable. if it isn't, you don't need to
    'get down in the muck'; you just need to write a C program to do the job
    that needs to be done.

    if pickling is entirely new to you, then your data probably isn't pickled,
    so you don't need to worry about it.

    > > Off topic (or not): A single person can start a discussion but not
    > > limit it, I think that's a feature of newsgroups rather than an
    > > annoyance.

    >
    > Perhaps, and perhaps not. Guess it depends on your perspective. Having a
    > stream of insult and invective dumped into your mailbox because you've
    > made it plain that you don't consider somebody's "pet" language the
    > be-all and end-all of computer programming is hardly what I'd call a
    > "feature"...


    i'm sorry that happened; i hope you won't think badly of python and
    pythoneers because of one moron. discreetly let the relevant people know
    who he is and we'll fetch the Comfy Chair ... 8)

    tom

    --
    Things fall apart - it's scientific
     
    Tom Anderson, Sep 26, 2003
    #10
  11. Don Bruder

    Tom Anderson Guest

    On Thu, 25 Sep 2003, Don Bruder wrote:

    > In article <>,
    > Thomas Heller <> wrote:
    >
    > > Don Bruder <> writes:
    > >
    > > [rewriting a Python program in C/C++]
    > > > Something like this C construct:
    > > >
    > > > struct DictStruct
    > > > {
    > > > char *KeyArray[<somenumber>];
    > > > ValType ValArray[<somenumber>];
    > > > }
    > > >

    > > [...]
    > > > Similarly, would something like:
    > > >
    > > > struct DictStruct
    > > > {
    > > > char Key[25];
    > > > ValType Value;
    > > > DictStruct *PrevDictEntry;
    > > > DictStruct *NextDictEntry;
    > > > }
    > > >
    > > > [...] end up behaving at least reasonably like a Python "dict"?

    > >
    > > Yes.
    > >
    > > Except that it would be less flexible, *much* slower, and
    > > probably much more buggy.

    >
    > Well, since I'm not worried about "flexible", have my doubts about the
    > "much slower" part,


    think again. python's dictionaries use an efficient implementation, are
    written in C, and have been tuned over the course of several years. your
    sketched design uses a very slow implementation, and is brand new.

    > and have no reason to believe that choice of language automatically
    > makes a routine buggy,


    it's maturity that matters in this case; you will make mistakes, just like
    the python (and STL) implementers did, but they've already had years to
    find and fix them.

    if you don't fancy using STL (which will be the case if you're steering
    clear of C++ - and might well be even if you weren't!), at least go and
    read up on how to implement dictionaries; they're a very well-studied
    abstract type. hashtables are simple and efficient, and binary trees are
    good too. there are more complicated things like ternary trees, red-black
    trees, Patricia trees (my personal favourite :) ) and skip lists, but
    you'll probably be fine with hashtables. if you can lay your hands on a
    good data structures and algorithms textbook, you'll be fine (Sedgewick's
    'Algorithms in C' is where i learnt my craft).

    tom

    --
    Things fall apart - it's scientific
     
    Tom Anderson, Sep 26, 2003
    #11
  12. On Fri, 26 Sep 2003 00:33:46 +0100, Tom Anderson
    <> wrote:

    >Patricia trees (my personal favourite :)


    Oh goody - Something new to look up!


    --
    Steve Horne

    steve at ninereeds dot fsnet dot co dot uk
     
    Stephen Horne, Sep 26, 2003
    #12
  13. Don Bruder

    Duncan Booth Guest

    Stephen Horne <$$$$$$$$$$$$$$$$$@$$$$$$$$$$$$$$$$$$$$.co.uk> wrote in
    news::

    > On Thu, 25 Sep 2003 15:19:00 -0700, Brian Quinlan <>
    > wrote:
    >
    >>> Well, since I'm not worried about "flexible", have my doubts about the
    >>> "much slower" part,

    >>
    >>Your implementation is O(n) with the number of entries while the Python
    >>implementation is O(log(n)). If n is small then your implementation
    >>might be acceptably fast. Can't you just use the STL hash_map template
    >>class?

    >
    > Are you sure Python dictionaries are O(log n)? - Remember, they are
    > based on hashing.
    >
    > Simple hashing is O(1). Handling of collisions and stuff will affect
    > that (and I'm no expert on hashing techniques) but I can't help
    > wondering if you're confusing Pythons hashing with tree-based
    > techniques.
    >


    Python dictionaries are effectively O(1). Dictionaries are never more than
    2/3rds full, so on average a dictionary lookup only needs a couple of
    probes. Python's collision resolution is designed such that in most cases
    lookups that collide on the first attempt will diverge for the next
    attempt.

    As some of the other posters have said dictionaries in Python have been
    tuned over many years, and since the whole of Python is based on fast
    dictionary lookups it is very unlikely that the OP will get close to the
    same efficiency. That may not matter though as Python dictionaries are
    tuned for general use, so a specific application might be able to do better
    by losing some of the generality.

    Some of the tuning extends beyond dictionaries, for example strings are
    commonly used as dictionary keys, and Python avoids recalculating hash
    values more than once for each string. It is unlikely that a C or C++
    programmer would go to the effort of using a string type that cached hash
    values throughout their application, so for many uses, string lookups in
    dictionaries will almost certainly be slower (although there could be gains
    elsewhere).

    --
    Duncan Booth
    int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
    "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
     
    Duncan Booth, Sep 26, 2003
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Big E

    Rookie Question

    Big E, Jun 17, 2004, in forum: ASP .Net
    Replies:
    2
    Views:
    369
    avnrao
    Jun 17, 2004
  2. Greg Smith

    Rookie question

    Greg Smith, Oct 7, 2005, in forum: Java
    Replies:
    4
    Views:
    369
    Monique Y. Mudama
    Oct 8, 2005
  3. Gerald Klix
    Replies:
    0
    Views:
    1,328
    Gerald Klix
    Oct 26, 2005
  4. THurkmans
    Replies:
    14
    Views:
    1,962
    Mike Treseler
    Aug 11, 2009
  5. Replies:
    14
    Views:
    303
Loading...

Share This Page