save dictionary to a file without brackets.

Chris Angelico · Aug 9, 2012

O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.

I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O(n) would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.

Where have you seen dictionaries so poorly implemented?

In vanilla CPython up to version (I think) 3.3, where it's possible to
DoS the hash generator. Hash collisions are always possible, just
ridiculously unlikely unless deliberately exploited.

(And yes, I know an option was added to older versions to randomize
the hashes there too. It's not active by default, so "vanilla CPython"
is still vulnerable.)

ChrisA
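To illustrate "always possible, just ridiculously unlikely unless deliberately exploited", here is a minimal sketch that is not from the thread. It assumes CPython, where the hash of a non-negative int is its value modulo `sys.hash_info.modulus`, and where integer hashes are not affected by hash randomization. The helper name `colliding_ints` is made up for the example.

```python
import sys

# CPython detail (an assumption of this sketch): hash(n) for a non-negative
# int is n modulo sys.hash_info.modulus, so every multiple hashes to 0.
MODULUS = sys.hash_info.modulus


def colliding_ints(count):
    """Return `count` distinct ints that all share one hash value on CPython."""
    return [i * MODULUS for i in range(1, count + 1)]


keys = colliding_ints(1000)
distinct_hashes = {hash(k) for k in keys}

# The dict still stores every key correctly; each insert just has to step
# past all the earlier keys that landed on the same probe sequence.
table = dict.fromkeys(keys)
print(len(table), "keys share", len(distinct_hashes), "hash value(s)")
```

Nothing breaks functionally: the dict stays correct and only the work per operation grows with the number of colliding keys.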

Andrew Cooper · Aug 9, 2012

Slightly off topic, but looking up a value in a dictionary is actually
O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.

~Andrew

I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O(n) would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.

Where have you seen dictionaries so poorly implemented?

Different n, which I should have made more clear. I was using it for
consistency with O() notation. My statement was O(n) where n is the
number of hash collisions.

The choice of hash algorithm (or several depending on the
implementation) should specifically be chosen to reduce collisions to
aid in efficient space utilisation and lookup times, but any
implementation must allow for collisions. There are certainly runtime
methods of improving efficiency using amortized operations.

As for poor implementations,

class Foo(object):
    ...
    def __hash__(self):
        return 0

I seriously found that in some older code I had the misfortune of
reading. It didn't remain in that state for long.

~Andrew
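To make the cost of a constant `__hash__` concrete, here is a minimal sketch that is not from the thread. `ConstantHash` and `comparisons_for_lookup` are hypothetical names. It counts how many `__eq__` calls a single dictionary lookup triggers when every key collides.

```python
class ConstantHash:
    """Hypothetical key type whose __hash__ always returns 0."""

    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 0  # every instance collides with every other instance

    def __eq__(self, other):
        ConstantHash.eq_calls += 1
        return isinstance(other, ConstantHash) and self.value == other.value


def comparisons_for_lookup(n):
    """Build a dict of n colliding keys, then count __eq__ calls for one lookup."""
    table = {ConstantHash(i): i for i in range(n)}
    probe = ConstantHash(n - 1)  # equal to, but not the same object as, the last key
    ConstantHash.eq_calls = 0
    assert table[probe] == n - 1
    return ConstantHash.eq_calls


for size in (10, 100, 1000):
    print(size, comparisons_for_lookup(size))
```

The count should grow with the number of colliding keys. That is the "O(n) where n is the number of hash collisions" case, as opposed to the near-constant cost seen with a well-distributed hash.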

Chris Angelico · Aug 10, 2012

On 08/09/2012 06:03 PM, Andrew Cooper wrote:
I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O(n) would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.

Where have you seen dictionaries so poorly implemented?

PHP?

http://www.phpclasses.org/blog/post/171-PHP-Vulnerability-May-Halt-Millions-of-Servers.html

That's the same hash collision attack that I alluded to above, and it
strikes *many* language implementations. Most released a patch fairly
quickly and quietly (Pike, Lua, V8 (JavaScript/ECMAScript), PHP), but
CPython dared not, on account of various applications depending on
hash order (at least for tests). It's not (for once) an indictment of
PHP (maybe that should be an "inarrayment"?), it's a consequence of a
hashing algorithm that favored simplicity over cryptographic
qualities.

(It feels weird to be defending PHP...)

ChrisA

Roy Smith · Aug 10, 2012

Andrew Cooper said:
As for poor implementations,

class Foo(object):
    def __hash__(self):
        return 0

I seriously found that in some older code I had the misfortune of
reading.

Python assumes you are a consenting adult. If you wish to engage in
activities which are hazardous to your health, so be it. But then
again, you could commit this particular stupidity just as easily in C++
or any other language which lets you define your own hash() function.

Chris Angelico · Aug 10, 2012

Python assumes you are a consenting adult. If you wish to engage in
activities which are hazardous to your health, so be it.

... you mean, Python lets you make a hash of it?

*ducks for cover*

ChrisA

Roy Smith · Aug 10, 2012

Chris Angelico said:
... you mean, Python lets you make a hash of it?

Only if you order it with spam, spam, spam, spam, spam, spam, and spam.

Mark Lawrence · Aug 10, 2012

Only if you order it with spam, spam, spam, spam, spam, spam, and spam.

Now now gentlemen we're getting slightly off topic here and wouldn't
want to upset the people who insist on staying on topic. Or would we?

Dave Angel · Aug 10, 2012

On 09/08/2012 22:34, Roman Vashkevich wrote:
Actually, they are different.
Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
Dict uses hashing to get a value from the dict and this is why it's O(1).

Slightly off topic, but looking up a value in a dictionary is actually
O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.

~Andrew

I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O(n) would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.

Where have you seen dictionaries so poorly implemented?

Different n, which I should have made more clear. I was using it for
consistency with O() notation. My statement was O(n) where n is the
number of hash collisions.

That's a little like doing a survey, and reporting the results as
showing that 100% of the women hit their husbands, among the population
of women who hit their husbands.

In your original message, you already stated the assumption that a
proper hash algorithm would be chosen, then went on to apparently claim
that large datasets would still have an order n problem. That last is
what I was challenging.

The rest of your message here refers to client code, not the system.

Dave Angel · Aug 10, 2012

O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.

I'm glad you're wrong for CPython's dictionaries. The only time the
lookup would degenerate to O(n) would be if the hash table had only one
slot. CPython sensibly increases the hash table size when it becomes
too small for efficiency.

Where have you seen dictionaries so poorly implemented?

In vanilla CPython up to version (I think) 3.3, where it's possible to
DoS the hash generator. Hash collisions are always possible, just
ridiculously unlikely unless deliberately exploited.

(And yes, I know an option was added to older versions to randomize
the hashes there too. It's not active by default, so "vanilla CPython"
is still vulnerable.)

ChrisA

Thank you to you and others, who have corrected my over-general
response. I was not intending to claim anything about either a
deliberate DoS or a foolishly chosen hash function. But the message I
was replying to seemed to me to claim that for large datasets, even a
good hash algorithm would end up giving O(n) performance.

Tim Chase · Aug 10, 2012

Now now gentlemen we're getting slightly off topic here and wouldn't
want to upset the people who insist on staying on topic. Or would we?

We apologise for the off-topicness in the thread. Those responsible
have been sacked...

-tkc

Dave Angel · Aug 10, 2012

We apologise for the off-topicness in the thread. Those responsible
have been sacked...

-tkc

Paper or plastic?

Chris Angelico · Aug 10, 2012

We apologise for the off-topicness in the thread. Those responsible
have been sacked...

So if you take every mapping variable in your program and name them
"dFoo", "dBar", "dQuux", etc, for "dict"... would that be a dirty
Hungarian dictionary?

Excuse me, I'll go and sack myself now.

ChrisA

88888 Dihedral · Aug 10, 2012

Andrew Cooper wrote on Friday, August 10, 2012 at 6:03:26 AM UTC+8:

Slightly off topic, but looking up a value in a dictionary is actually
O(n) for all other entries in the dict which suffer a hash collision
with the searched entry.

True, a sensible choice of hash function will reduce n to 1 in common
cases, but it becomes an important consideration for larger datasets.

~Andrew

This is the classic problem of storing the colliding items one by one.
Those items should be stored in some order.

Steven D'Aprano · Aug 10, 2012

We apologise for the off-topicness in the thread. Those responsible
have been sacked...

Sacked? They were beaten to death with a large halibut!

Mark Lawrence · Aug 10, 2012

Sacked? They were beaten to death with a large halibut!

Well whatever you do *DON'T* mention Cython. I mentioned it just now but
I think I've got away with it.

Roy Smith · Aug 10, 2012

Mark Lawrence said:
Well whatever you do *DON'T* mention Cython. I mentioned it just now but
I think I've got away with it.

What if I spell it Kython?

Mark Lawrence · Aug 10, 2012

What if I spell it Kython?

What a silly bunt!!!

Dennis Lee Bieber · Aug 10, 2012

Sacked? They were beaten to death with a large halibut!

I think the thread is floundering...

88888 Dihedral · Aug 10, 2012

Dave Angel wrote on Friday, August 10, 2012 at 5:47:45 AM UTC+8:

Actually, they are different.
Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
Dict uses hashing to get a value from the dict and this is why it's O(1).

Sure, that's why

for key in dict:
    print key[0], key[1], dict[key]

is probably slower than

for (edge1, edge2), cost in d.iteritems(): # or .items()
    print edge1, edge2, cost

So, the latter is both faster and easier to read. Why are you arguing against it?

Also, please stop top-posting. It's impolite here, and makes it much harder to figure out who is saying what, in what order.

--

DaveA
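For readers on Python 3, the two loops quoted above can be sketched as follows. This is an illustration only: the `costs` data and both function names are invented for the example, and rows are collected into lists rather than printed so the two results can be compared.

```python
# Hypothetical edge costs keyed by (edge1, edge2) tuples.
costs = {("a", "b"): 3, ("b", "c"): 5, ("a", "c"): 9}


def rows_by_key_lookup(d):
    """Iterate over keys, then hash each key again to fetch its value."""
    return [(key[0], key[1], d[key]) for key in d]


def rows_by_items(d):
    """Iterate over (key, value) pairs directly; no second lookup is needed."""
    return [(edge1, edge2, cost) for (edge1, edge2), cost in d.items()]


for edge1, edge2, cost in rows_by_items(costs):
    print(edge1, edge2, cost)
```

Both functions return the same rows. The second skips the extra `d[key]` hash lookup per entry and unpacks the tuple key in the loop header.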

OK, let's estimate the hash collision rate first.

For those items hashed to the same key, I'll store a sorted list with a
known length m to be accessed in O(log m).

Of course another hash can be attached.
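The sorted-bucket idea can be sketched roughly like this. It is a toy chained hash table with an invented class name, `SortedBucketTable`, and is not how CPython's dict works (CPython uses open addressing). It assumes that keys sharing a bucket can be ordered with `<`.

```python
from bisect import bisect_left


class SortedBucketTable:
    """Toy chained hash table whose buckets are kept sorted by key."""

    def __init__(self, bucket_count=8):
        # Each bucket is a pair of parallel lists: sorted keys and their values.
        self._buckets = [([], []) for _ in range(bucket_count)]

    def _bucket(self, key):
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        keys, values = self._bucket(key)
        i = bisect_left(keys, key)  # binary search: O(log m) comparisons
        if i < len(keys) and keys[i] == key:
            values[i] = value  # overwrite an existing entry
        else:
            keys.insert(i, key)  # list insertion still shifts O(m) items
            values.insert(i, value)

    def __getitem__(self, key):
        keys, values = self._bucket(key)
        i = bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return values[i]
        raise KeyError(key)


# A single bucket forces every key to collide, so lookups rely on bisect alone.
table = SortedBucketTable(bucket_count=1)
for word in ["spam", "eggs", "ham"]:
    table[word] = len(word)
print(table["ham"])
```

Lookups within a bucket of m colliding keys need O(log m) comparisons, but inserting into a sorted Python list still moves O(m) elements, so the gain is on reads only.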
