Another Python rookie trying to port to C/C++ question.

D

Don Bruder

A week or two ago, I asked here about porting Python to C. Got some good
answers (Note 1) but now I've got another question. Actually, more a
request for clarification of a topic that both the Python tutorial and
docs leave a touch murky to my understanding.

Dictionaries/"dict" types...

Am I understanding/interpreting correctly when I go with the idea that a
"dict" variable can be looked at as (effectively) two parallel arrays?
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}

Where "ValType" would be likely be a union, to allow
assigning/retrieving the actual values correctly depending on what their
individiual types were, and "<somenumber>" is just that: Some
more-or-less arbitrarily selected value that provides enough space to
handle all expected numbers of key/value pairs?

Similarly, would something like:

struct DictStruct
{
char Key[25];
ValType Value;
DictStruct *PrevDictEntry;
DictStruct *NextDictEntry;
}

(again, ValType would probably be a union to cover each type of value
that might be wanted in the dictionary - but in this case, Key directly
contains the Key's name) then building a double-linked list of struct
DictStructs end up behaving at least reasonably like a Python "dict"?

I expect that either will be a reasonable facsimile of how Python
handles dicts, with, of course, the corresponding overhead (which may be
surprisingly little, I'm thinking... but I have yet to code that part,
so I may find out it's going to be a nightmare) of "We want the value
that goes with key, so we've got to walk the KeyArray[] until we find a
match, then extract the value from the other array", or a similar
procedure for the doubly-linked list.

Or am I way out in left field with the way I'm RTFM?



(Note 1)
Along with quite a bit of entirely unwelcome "Python evangelism" --
Which part of "I don't care how good, bad, or indifferent Python is
compared to C/C++, or any other language. I want this Python code to be
written in C/C++, end of discussion." wasn't clear enough for those of
you who took it upon yourselves to preach (including the idiot who
decided to spew a relatively impressive (as such things go) string of
insults about how stupid I am, how far up my ass my head is, and how
pathetic my family all the way back to Adam must be for producing such a
throwback because I've got no interest in adopting "the perfect
language...") at me on the merits of Python, both on group and
in-mailbox?
 
T

Thomas Heller

[rewriting a Python program in C/C++]
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}
[...]
Similarly, would something like:

struct DictStruct
{
char Key[25];
ValType Value;
DictStruct *PrevDictEntry;
DictStruct *NextDictEntry;
}

[...] end up behaving at least reasonably like a Python "dict"?

Yes.

Except that it would be less flexible, *much* slower, and
probably much more buggy.

Thomas
 
P

Peter Otten

Don said:
A week or two ago, I asked here about porting Python to C. Got some good
answers (Note 1) but now I've got another question. Actually, more a
request for clarification of a topic that both the Python tutorial and
docs leave a touch murky to my understanding.

Dictionaries/"dict" types...

As you seem to feel at home with C, you could have a look into
Objects/dictobject.c and Include/dictobject.h of the source distribution.
Which part of "I don't care how good, bad, or indifferent Python is
compared to C/C++, or any other language. I want this Python code to be
written in C/C++, end of discussion." wasn't clear enough for those of

Smart people use efficient tools and build on existing libraries. If you are
serious about the C++ part of your subject line, the Standard library aka
STL has a map template that resembles Python's dict and might enhance both
development and execution speed.

Off topic (or not):
A single person can start a discussion but not limit it, I think that's a
feature of newsgroups rather than an annoyance.


Peter
 
P

Peter Otten

Don said:
A week or two ago, I asked here about porting Python to C. Got some good
answers (Note 1) but now I've got another question. Actually, more a
request for clarification of a topic that both the Python tutorial and
docs leave a touch murky to my understanding.

Dictionaries/"dict" types...

As you seem to feel at home with C, you could have a look into
Objects/dictobject.c and Include/dictobject.h of the source distribution.
Which part of "I don't care how good, bad, or indifferent Python is
compared to C/C++, or any other language. I want this Python code to be
written in C/C++, end of discussion." wasn't clear enough for those of

Smart people use efficient tools and build on existing libraries. The
Standard library aka STL has a <map> template that might enhance both
development and execution speed.

Off topic (or not):
A single person can start a discussion but not end it, I think that's a
feature of newsgroups rather than an annoyance.


Peter
 
U

Ulrich Petri

I expect that either will be a reasonable facsimile of how Python
handles dicts, with, of course, the corresponding overhead (which may be
surprisingly little, I'm thinking... but I have yet to code that part,
so I may find out it's going to be a nightmare) of "We want the value
that goes with key, so we've got to walk the KeyArray[] until we find a
match, then extract the value from the other array", or a similar
procedure for the doubly-linked list.

Python hashes the keys for fast lookup.

(Note 1)
Along with quite a bit of entirely unwelcome "Python evangelism" --
Which part of "I don't care how good, bad, or indifferent Python is
compared to C/C++, or any other language. I want this Python code to be
written in C/C++, end of discussion." wasn't clear enough for those of
you who took it upon yourselves to preach (including the idiot who
decided to spew a relatively impressive (as such things go) string of
insults about how stupid I am, how far up my ass my head is, and how
pathetic my family all the way back to Adam must be for producing such a
throwback because I've got no interest in adopting "the perfect
language...") at me on the merits of Python, both on group and
in-mailbox?

You are treated as you treat others.
It might suprise you but this is a newsgroup not your personal information
lookup place.
 
D

Don Bruder

Peter Otten said:
As you seem to feel at home with C, you could have a look into
Objects/dictobject.c and Include/dictobject.h of the source distribution.

I may just go poking around there for more details.
Someone (via email) suggests that dicts are "supposed to be" (my
synopsis, not his words) a bit "murky", because all that's supposed to
be relevant is stuffing things into them, and pulling things out. That's
fine, completely admirable, and fully acceptable. *IF* You're working
directly in Python. I'm not. I'm working in a combination of C and C++,
attempting to duplicate the functionality of a program originally
written in Python. As such, the details *MUST* get clear if I'm to
create equivalent functionality. If only because I need to read data
that was written (presumably...) by a Python program. Hence, it becomes
neccesary to get down "in the muck" and actually know what's going on
behind the nice, tidy "dict.whosit.get()" (or whatever other operators
might be involved) interface that Python provides.
Smart people use efficient tools and build on existing libraries.

There's a statement that needed saying.... Not...

If you are
serious about the C++ part of your subject line, the Standard library aka
STL has a map template that resembles Python's dict and might enhance both
development and execution speed.

I'm *REASONABLY* serious. This port started life as a "Do you really
know as much about C++ as you think you do? Prove it! Take this Python
program and make it C++!" sort of "mid-term" test in my self-taught C++
course. It's rapidly degenerating (due to some technicalities of C++
that I've never before encountered, and therefore had no expectation of
trouble from) into "Let's just get the damn thing working in straight C,
then worry about trying to turn it into C++ later."
Off topic (or not):
A single person can start a discussion but not limit it, I think that's a
feature of newsgroups rather than an annoyance.

Perhaps, and perhaps not. Guess it depends on your perspective. Having a
stream of insult and invective dumped into your mailbox because you've
made it plain that you don't consider somebody's "pet" language the
be-all and end-all of computer programming is hardly what I'd call a
"feature"...

Be that as it may, fortunately I have a thick enough hide to
figuratively "flip off" the jerk that was involved, and go on with life
knowing full well that in at least one respect, I'm so far superior to
him that words can't describe the gulf between us. Doesn't mean I'm not
going to mention that I think he's a hopeless putz, but I'm sure not
going to let it get me down to the point of giving up.
 
D

Don Bruder

Thomas Heller said:
[rewriting a Python program in C/C++]
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}
[...]
Similarly, would something like:

struct DictStruct
{
char Key[25];
ValType Value;
DictStruct *PrevDictEntry;
DictStruct *NextDictEntry;
}

[...] end up behaving at least reasonably like a Python "dict"?

Yes.

Except that it would be less flexible, *much* slower, and
probably much more buggy.

Thomas

Well, since I'm not worried about "flexible", have my doubts about the
"much slower" part, and have no reason to believe that choice of
language automatically makes a routine buggy, I'll go with the tenative
plan I had. Seems you're confirming that I'm understanding things
properly when it comes to "This is what a dict is, and here's how it
works." Now to put finishing touches on my understanding of exactly what
goes on inside that "black box" that is the "dict"
class/type/whatever-it-is-you-Python-types-like-to-call-such-things, and
make some working code out of it...
 
S

Stephen Horne

Dictionaries/"dict" types...

Am I understanding/interpreting correctly when I go with the idea that a
"dict" variable can be looked at as (effectively) two parallel arrays?
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}

Sort of, in a sense, but not quite. The types of items within the
dictionary (both key and value) are not constrained. This is no
problem as all Python objects are held using a reference scheme. So
(assuming that a pair-object scheme is used in the dictionary) the
struct will be defined much like...

struct DictPair
{
PyObject *Key;
PyObject *Data;
}

The items will almost certainly be held in a single 'array' of pair
objects.

The Python dictionaries use hashing. I don't know the details of the
hashing used. Anyway, there are many ways to handle dictionary-like
data structures.

C++ has the std::map template, which uses a form of binary tree called
a red-black tree (the red-black refers to a partial balancing system).

Tree-based mappings can be slightly more flexible than hashing (for
instance, a tree can be 'walked' in sorted order - the hashing process
does not preserve relative ordering) but hashing tends to be faster.

An interesting approach to dictionary-like objects is the ternary
tree. This is rather like cascading binary trees - each subtree
identifies one character and cascades into the subtree that finds the
next character. The third link in the 'ternary' is the 'I've
identified one character and moving on to the next' link.

A ternary tree can be faster than hashing - in particular at deciding
that a particular key is not present (hashing needs to calculate a
hash over the whole tree before looking up the result - a ternary tree
search may fail at the first character, without looking at the whole
key).

Digital trees (or tries, IIRC) can be even faster still - but wastes a
lot of memory. In a digital tree, each node has an array subscripted
by character/digit code of pointers to the next subtree.

Multiway trees (most often used on disk for database indexes - B trees
and B+ trees being types of multiway trees) may well be appropriate in
main store on modern desktop machines with caching and virtual memory.

If these options are just too complicated, sorted arrays can be
extremely effective as long as you don't have too many items. The C++
library includes std::vector (a resizable array) and binary search
algorithms which can ease the implementation.


Odds are, if you are translating Python to C++, you should use an
std::map to replace a dictionary. There are some different
assumptions, though. Dictionaries assume that the key is hashable, but
don't require relative comparisons ('<', '<=' etc). The std::map
requires relative comparisons.

This simply arises out of the different approaches - Python using
hashing and C++ using binary trees. Going from Python to C++, 99 times
out of 100 this is not a problem. From C++ to Python there is the
minor issue that Python dictionaries can't be _directly_ iterated in
sorted order, though there are certainly easy ways around that.
 
S

Stephen Horne

Your implementation is O(n) with the number of entries while the Python
implementation is O(log(n)). If n is small then your implementation
might be acceptably fast. Can't you just use the STL hash_map template
class?

Are you sure Python dictionaries are O(log n)? - Remember, they are
based on hashing.

Simple hashing is O(1). Handling of collisions and stuff will affect
that (and I'm no expert on hashing techniques) but I can't help
wondering if you're confusing Pythons hashing with tree-based
techniques.
 
T

Tom Anderson

[...] As such, the details *MUST* get clear if I'm to create equivalent
functionality. If only because I need to read data that was written
(presumably...) by a Python program. Hence, it becomes neccesary to get
down "in the muck" and actually know what's going on behind the nice,
tidy "dict.whosit.get()" (or whatever other operators might be involved)
interface that Python provides.

is this data in 'pickle' format (as opposed to text, XML, or some specific
binary format)? if it is, reading it in C/C++ is going to be very hard
indeed; you might consider writing a small python program to read it and
convert it to something more tractable. if it isn't, you don't need to
'get down in the muck'; you just need to write a C program to do the job
that needs to be done.

if pickling is entirely new to you, then your data probably isn't pickled,
so you don't need to worry about it.
Perhaps, and perhaps not. Guess it depends on your perspective. Having a
stream of insult and invective dumped into your mailbox because you've
made it plain that you don't consider somebody's "pet" language the
be-all and end-all of computer programming is hardly what I'd call a
"feature"...

i'm sorry that happened; i hope you won't think badly of python and
pythoneers because of one moron. discreetly let the relevant people know
who he is and we'll fetch the Comfy Chair ... 8)

tom
 
T

Tom Anderson

Thomas Heller said:
[rewriting a Python program in C/C++]
Something like this C construct:

struct DictStruct
{
char *KeyArray[<somenumber>];
ValType ValArray[<somenumber>];
}
[...]
Similarly, would something like:

struct DictStruct
{
char Key[25];
ValType Value;
DictStruct *PrevDictEntry;
DictStruct *NextDictEntry;
}

[...] end up behaving at least reasonably like a Python "dict"?

Yes.

Except that it would be less flexible, *much* slower, and
probably much more buggy.

Well, since I'm not worried about "flexible", have my doubts about the
"much slower" part,

think again. python's dictionaries use an efficient implementation, are
written in C, and have been tuned over the course of several years. your
sketched design uses a very slow implementation, and is brand new.
and have no reason to believe that choice of language automatically
makes a routine buggy,

it's maturity that matters in this case; you will make mistakes, just like
the python (and STL) implementers did, but they've already had years to
find and fix them.

if you don't fancy using STL (which will be the case if you're steering
clear of C++ - and might well be even if you weren't!), at least go and
read up on how to implement dictionaries; they're a very well-studied
abstract type. hashtables are simple and efficient, and binary trees are
good too. there are more complicated things like ternary trees, red-black
trees, Patricia trees (my personal favourite :) ) and skip lists, but
you'll probably be fine with hashtables. if you can lay your hands on a
good data structures and algorithms textbook, you'll be fine (Sedgewick's
'Algorithms in C' is where i learnt my craft).

tom
 
D

Duncan Booth

Are you sure Python dictionaries are O(log n)? - Remember, they are
based on hashing.

Simple hashing is O(1). Handling of collisions and stuff will affect
that (and I'm no expert on hashing techniques) but I can't help
wondering if you're confusing Pythons hashing with tree-based
techniques.

Python dictionaries are effectively O(1). Dictionaries are never more than
2/3rds full, so on average a dictionary lookup only needs a couple of
probes. Python's collision resolution is designed such that in most cases
lookups that collide on the first attempt will diverge for the next
attempt.

As some of the other posters have said dictionaries in Python have been
tuned over many years, and since the whole of Python is based on fast
dictionary lookups it is very unlikely that the OP will get close to the
same efficiency. That may not matter though as Python dictionaries are
tuned for general use, so a specific application might be able to do better
by losing some of the generality.

Some of the tuning extends beyond dictionaries, for example strings are
commonly used as dictionary keys, and Python avoids recalculating hash
values more than once for each string. It is unlikely that a C or C++
programmer would go to the effort of using a string type that cached hash
values throughout their application, so for many uses, string lookups in
dictionaries will almost certainly be slower (although there could be gains
elsewhere).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,020
Latest member
GenesisGai

Latest Threads

Top