Set & Frozenset?

Hans Larsen · Mar 8, 2009

Could you help me?
How can I "take" an element from a set or a frozenset .-) ?

From a string (unicode in Python < 3), a tuple, or
a list: by index or slice.
From a dict: by key.
But what about a set or frozenset?

Hope somebody can help!

Diez B. Roggisch · Mar 8, 2009

Hans said:
Could you help me?
How can I "take" an element from a set or a frozenset .-) ?

From a string (unicode in Python < 3), a tuple, or
a list: by index or slice.
From a dict: by key.
But what about a set or frozenset?

Hope somebody can help!

You iterate over them. If you only want one value, use

iter(the_set).next()
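
For readers on Python 3: the `.next()` method exists only in Python 2, while the built-in `next()` (available since Python 2.6) works in both. A minimal sketch, where `first_element` and its `default` parameter are illustrative names rather than anything from this thread:

```python
def first_element(the_set, default=None):
    """Return one arbitrary element without modifying the set."""
    # next() with a default avoids StopIteration on an empty set.
    return next(iter(the_set), default)

the_set = frozenset(["a", "b", "c"])
item = first_element(the_set)
print(item in the_set)         # True; which element you get is arbitrary
print(first_element(set()))    # None
```

This works the same way for `set` and `frozenset`, since both are iterable.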

Diez

Tim Golden · Mar 8, 2009

Diez said:
You iterate over them. If you only want one value, use

iter(the_set).next()

or the_set.pop ()

TJG

Tim Golden · Mar 8, 2009

Tim said:
or the_set.pop ()

Which will, in addition, remove it from the set.
(May not be what you want.)
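
A small sketch of that difference, in Python 3 syntax with illustrative variable names: `pop()` mutates the set, and `frozenset` does not offer it at all.

```python
mutable = {1, 2, 3}
taken = mutable.pop()            # removes and returns an arbitrary element
print(taken in mutable)          # False
print(len(mutable))              # 2

frozen = frozenset({1, 2, 3})
print(hasattr(frozen, "pop"))    # False: frozenset is immutable
peeked = next(iter(frozen))      # non-destructive alternative
print(len(frozen))               # 3
```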

TJG

Alan G Isaac · Mar 9, 2009

Diez said:
You iterate over them. If you only want one value, use
iter(the_set).next()

I recall a claim that

for result in myset: break

is the most efficient way to get one result.
Is this right? (It seems nearly the same.)

Alan Isaac

Matt Nordhoff · Mar 9, 2009

Alan said:
I recall a claim that

for result in myset: break

is the most efficient way to get one result.
Is this right? (It seems nearly the same.)

Alan Isaac

Checking Python 2.5 on Linux, your solution is much faster, but seeing
as they both come in under a microsecond, it hardly matters.

Lie Ryan · Mar 10, 2009

Matt said:
Checking Python 2.5 on Linux, your solution is much faster, but seeing
as they both come in under a microsecond, it hardly matters.

It's unexpected...

'res=myset.next()' with a precreated iterator (setup ending in myset=iter(myset)):
0.49165520000002516

'for res in myset: break':
0.32933007999997699

I'd never have expected for-loop assignment to be even faster than a
precreated iter object (the second test)... but I don't think this
leaking of the for-loop variable is guaranteed, is it?

Note: the second one exhausts the iter object.

Terry Reedy · Mar 10, 2009

Lie Ryan said:
I'd never have expected for-loop assignment to be even faster than a
precreated iter object (the second test)... but I don't think this
leaking of the for-loop variable is guaranteed, is it?

It is an intentional, documented feature:

"Names in the target list are not deleted when the loop is finished, but
if the sequence is empty, it will not have been assigned to at all by
the loop."
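
A short sketch of what that sentence means in practice (Python 3 syntax; `first_via_loop` and `unguarded` are made-up helper names): the loop target stays bound after `break`, but with an empty set it is never bound at all, so the idiom needs a guard.

```python
def first_via_loop(items):
    """Return one element using the for/break idiom, or raise on empty input."""
    for result in items:
        break
    else:
        # The else clause runs only when the loop finished without break,
        # which here means the loop body never ran and `result` is unbound.
        raise ValueError("empty iterable")
    return result

def unguarded(items):
    """Same idiom without the guard."""
    for result in items:
        break
    return result   # fails if items is empty

print(first_via_loop({42}))      # 42
try:
    unguarded(set())
except UnboundLocalError as exc:
    print("empty input:", type(exc).__name__)   # empty input: UnboundLocalError
```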

Paul Rubin · Mar 10, 2009

Terry Reedy said:
It is an intentional, documented feature: ...

I prefer thinking of it as a documented bug. It is fixed in 3.x.
I usually avoid the [... for x in xiter] listcomp syntax in favor of
list(... for x in xiter) just as an effort to be a bit less bug-prone.

Terry Reedy · Mar 10, 2009

Paul said:
I prefer thinking of it as a documented bug. It is fixed in 3.x.

Nope to both. We are talking about for-loop statements.
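
To illustrate the distinction being drawn here, a sketch for Python 3 with made-up names: list comprehensions got their own scope in Python 3, so their variable no longer leaks, but the target of a `for` statement still stays bound after the loop.

```python
def leak_check():
    """Report whether a comprehension variable and a for-loop target leak."""
    squares = [comp_var * comp_var for comp_var in range(3)]
    try:
        comp_var                      # not visible outside the comprehension
        comprehension_leaked = True
    except NameError:
        comprehension_leaked = False

    for loop_var in range(3):
        pass
    loop_leaked = (loop_var == 2)     # still bound to the last value
    return comprehension_leaked, loop_leaked

print(leak_check())   # (False, True) on Python 3
```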

R. David Murray · Mar 10, 2009

Lie Ryan said:
It's unexpected...

0.32933007999997699

I'd never have expected for-loop assignment to be even faster than a
precreated iter object (the second test)... but I don't think this
leaking of the for-loop variable is guaranteed, is it?

My guess would be that what's controlling the timing here is
name lookup. Three in the first example, two in the second,
and one in the third.
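
One way to check a guess like this is to count the load instructions in the compiled bytecode. The sketch below assumes the three examples are the statements named later in this thread, respelled for Python 3 (`__next__` instead of `next`); `count_lookups` and `LOOKUP_OPS` are illustrative names, and the exact opcode names vary between CPython versions.

```python
import dis

# Opcodes that load a name or an attribute (names differ across versions).
LOOKUP_OPS = {"LOAD_NAME", "LOAD_GLOBAL", "LOAD_FAST", "LOAD_ATTR", "LOAD_METHOD"}

def count_lookups(statement):
    """Count name and attribute load instructions in one compiled statement."""
    code = compile(statement, "<statement>", "exec")
    return sum(1 for ins in dis.get_instructions(code) if ins.opname in LOOKUP_OPS)

statements = (
    "res = iter(myset).__next__()",
    "res = myset.__next__()",
    "for res in myset: break",
)
for stmt in statements:
    print(count_lookups(stmt), stmt)
```

The statements are only compiled, never executed, so `myset` does not need to exist.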

Lie Ryan · Mar 11, 2009

R. David Murray said:
My guess would be that what's controlling the timing here is
name lookup. Three in the first example, two in the second,
and one in the third.

You got it:
'res=myset()' with a precreated bound method (setup ending in myset=iter(myset).next):
0.26465903999999796

----------------------

The following is a complete benchmark:

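'res=iter(myset).next()' (number=10000000):
8.5145002000000432

'res=myset.next()' (setup ending in myset=iter(myset), number=10000000):
4.5509802800000898

'for res in myset: break' (number=10000000):
2.9994213600000421

'res=myset()' (setup ending in myset=iter(myset).next, number=10000000):
2.2228832400001011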
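
For anyone wanting to repeat this on Python 3, here is a rough sketch rather than the original script. It assumes the four statements are the ones named in the summary further down, respelled with `__next__`; the function name `run_benchmark`, the names `it` and `nxt`, and the set contents are my own choices. The set holds `number` elements so the precreated iterators are not exhausted mid-run.

```python
import timeit

def run_benchmark(number=100_000):
    """Time four ways of fetching one element; returns {statement: seconds}."""
    base = "myset = set(range(%d))" % number
    cases = [
        ("res = iter(myset).__next__()", base),
        ("res = it.__next__()", base + "; it = iter(myset)"),
        ("for res in myset: break", base),
        ("res = nxt()", base + "; nxt = iter(myset).__next__"),
    ]
    # timeit runs the setup once, then the statement `number` times.
    return {stmt: timeit.timeit(stmt, setup, number=number) for stmt, setup in cases}

if __name__ == "__main__":
    for stmt, seconds in run_benchmark().items():
        print("%-30s %.4f s" % (stmt, seconds))
```

Absolute numbers will differ by machine and interpreter version, so only the relative ordering is worth comparing.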

----------------------
I also performed additional timing for overhead:

Local name lookup: 1.1086400799999865

Global name lookup: 1.8149410799999259

Attribute lookup (setup ending in myset=iter(myset), number=10000000): 3.3011333999997987

Combined multiple name lookup that troubled the first test (number=10000000): 6.5599374800000305

Creating iterables: 4.259406719999788

----------------------
So adjusting the overheads:

Attribute lookup (setup ending in myset=iter(myset), number=10000000): 3.3011333999997987
The timing for attribute lookup also includes a local name lookup (myset), so
the real attribute lookup time should be:
3.3011333999997987 - 1.1086400799999865 = 2.1924933199998122

Creating iterables: 4.259406719999788
Creating an iterable involves a global name lookup, so the real time should be:
4.259406719999788 - 1.8149410799999259 = 2.4444656399998621

----------------------
To summarize the adjusted overheads:

Local name lookup: 1.1086400799999865
Global name lookup: 1.8149410799999259
Attribute lookup: 2.1924933199998122
Creating iterables: 2.4444656399998621

----------------------
Back to the problem: now we'll adjust the timing of each statement:
'res=iter(myset).next()': 8.5145002000000432
Adjusting with the "Combined multiple name lookup"
8.5145002000000432 - 6.5599374800000305 = 1.9545627200000126
Another way to do the adjustment:
Adjusting global name lookup (iter):
8.5145002000000432 - 1.8149410799999259 = 6.6995591200001172
Adjusting iterable creation:
6.6995591200001172 - 2.4444656399998621 = 4.2550934800002551
Adjusting attribute lookup:
4.2550934800002551 - 2.1924933199998122 = 2.0626001600004429

'res=myset.next()': 4.5509802800000898
Adjusting with |unadjusted| attribute lookup:
4.5509802800000898 - 3.3011333999997987 = 1.2498468800002911
Another way to do the adjustment:
Adjusting with local name lookup:
4.5509802800000898 - 1.1086400799999865 = 3.4423402000001033
Adjusting with attribute lookup:
3.4423402000001033 - 2.1924933199998122 = 1.2498468800002911

'for res in myset: break': 2.9994213600000421
Adjusting for local name lookup (myset):
2.9994213600000421 - 1.1086400799999865 = 1.8907812800000556

'res=myset()': 2.2228832400001011
Adjusting for local name lookup
2.2228832400001011 - 1.1086400799999865 = 1.1142431600001146

----------------------

To summarize:
'res=iter(myset).next()': 1.9545627200000126 / 2.0626001600004429
'res=myset.next()': 1.2498468800002911 / 1.2498468800002911
'for res in myset: break': 1.8907812800000556
'res=myset()': 1.1142431600001146
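
The subtraction behind this summary can be replayed in a few lines. This is only a check of the arithmetic using the figures posted above (first adjustment method for each statement); the variable names are mine.

```python
# Raw totals (seconds for 10**7 runs) as posted above.
raw = {
    "res=iter(myset).next()": 8.5145002000000432,
    "res=myset.next()": 4.5509802800000898,
    "for res in myset: break": 2.9994213600000421,
    "res=myset()": 2.2228832400001011,
}
local_lookup = 1.1086400799999865
attribute_unadjusted = 3.3011333999997987   # includes one local lookup
combined_lookup = 6.5599374800000305

adjusted = {
    "res=iter(myset).next()": raw["res=iter(myset).next()"] - combined_lookup,
    "res=myset.next()": raw["res=myset.next()"] - attribute_unadjusted,
    "for res in myset: break": raw["for res in myset: break"] - local_lookup,
    "res=myset()": raw["res=myset()"] - local_lookup,
}

fastest = adjusted["res=myset()"]
for stmt, seconds in adjusted.items():
    print("%-26s %.4f s  (%.2fx the fastest)" % (stmt, seconds, seconds / fastest))
```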

----------------------

To conclude, 'for res in myset: break' is actually not much faster than
'res=iter(myset).next()', except that the former saves a lot of name lookups.
The problem with 'res=iter(myset).next()' is too many name lookups and
the creation of an iter() object.

The fastest method is 'res=myset()', which eliminates the name lookups; by
the adjusted figures above it is about 1.1 to 1.9 times as fast as the
other methods after all the overheads are eliminated.

DISCLAIMER: I cannot guarantee there aren't any mistakes.

PS: The results of the benchmark must be taken with a grain of salt. The
differences only become apparent after 10000000 (10**7) iterations, which
means a one-second difference in the totals is only a 10**-7 second
difference per call.
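
To put that in per-call terms, a quick conversion of the raw totals from this post (illustrative variable names; the totals are copied from the benchmark above):

```python
NUMBER = 10**7   # iterations used for each timing

totals = {
    "res=iter(myset).next()": 8.5145002000000432,
    "res=myset.next()": 4.5509802800000898,
    "for res in myset: break": 2.9994213600000421,
    "res=myset()": 2.2228832400001011,
}

# Seconds per run, expressed in nanoseconds.
per_call_ns = {stmt: total / NUMBER * 1e9 for stmt, total in totals.items()}
for stmt, ns in per_call_ns.items():
    print("%-26s %.0f ns per call" % (stmt, ns))
```

Every variant stays below 1000 ns, which matches the earlier remark that they all come in under a microsecond.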
