Disable automatic interning

George Sakkis · Mar 18, 2009

Is there a way to turn off (either globally or explicitly per
instance) the automatic interning optimization that happens for small
integers and strings (and perhaps other types) ? I tried several
workarounds but nothing worked:
True

George

R. David Murray · Mar 18, 2009

George Sakkis said:
Is there a way to turn off (either globally or explicitly per
instance) the automatic interning optimization that happens for small
integers and strings (and perhaps other types) ? I tried several
workarounds but nothing worked:

No. It's an implementation detail.

What use case do you have for wanting to disable it?
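
To make "implementation detail" concrete, here is a minimal sketch. The variable names are illustrative only. The identity results are deliberately left without an expected value, since they can differ between interpreters and versions; only the equality results are guaranteed by the language.

```python
# Equality of values is guaranteed by the language; identity is not.
a = 7
b = int("7")                # built at run time rather than from a literal

s = "node"
t = "".join(["no", "de"])   # built at run time rather than from a literal

print(a == b, s == t)       # True True
print(a is b, s is t)       # depends on the interpreter's caching; do not rely on it
```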

George Sakkis · Mar 18, 2009

No. It's an implementation detail.

What use case do you have for wanting to disable it?

I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something") but ideally I
wouldn't want to impose any constraint on what a node is (i.e. require
a base Node class). It's not a show stopper, but it would be
problematic if something broke when nodes happen to be (small)
integers or strings.

George

Martin v. Löwis · Mar 18, 2009

Is there a way to turn off (either globally or explicitly per

I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something) but ideally I
wouldn't want to impose any constraint on what a node is (i.e. require
a base Node class). It's not a show stopper, but it would be
problematic if something broke when nodes happen to be (small)
integers or strings.

In essence, yes, there is a way for turning off the automatic interning
optimization: in your API, explicitly wrap all nodes with another
object, and use *that* object as the node.

Regards,
Martin
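
A minimal sketch of the wrapping idea, assuming a hypothetical Node class (the name and the `value` attribute are not from the thread's code). Because the class does not define `__eq__`, two wrappers compare equal only if they are the same object, whatever they wrap.

```python
class Node:
    """Hypothetical wrapper: every instance is a distinct object."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "Node(%r)" % (self.value,)


n1 = Node("x")
n2 = Node("x")

print(n1 is n2)              # False: two separate wrapper objects
print(n1.value == n2.value)  # True: the wrapped values are still equal
```

The graph code can then compare nodes with `is` and still accept small integers or strings as payloads.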

Terry Reedy · Mar 18, 2009

And explicitly defined as such and definitely hardcoded, and used by the
interpreter itself, and for good reason. After starting up 3.0.1580

Subtracting the extra two refs for each call and the two needed for the
two cached objects, that is 1200 ints *not* allocated on startup, plus
hundreds more for the other values.

I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something) but ideally I
wouldn't want to impose any constraint on what a node is (i.e. require
a base Node class). It's not a show stopper, but it would be
problematic if something broke when nodes happen to be (small)
integers or strings.

I do not get this. Regardless of class, if you want to compare by
identity, each node should be a unique object with a unique value. Auto
interning makes that easier, not harder. Robust code would not,
however, depend on that help. (IE, it would explicitly make sure that
the 'equal' entries in the edge matrix or adjacency lists were identical.)

tjr
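
One way to "explicitly make sure" that equal entries are identical is to canonicalize them through a dict. This is only a sketch: the function name `canonicalize` and the edge-list input format are assumptions for illustration, not something posted in the thread.

```python
def canonicalize(edges):
    """Return edges whose equal endpoints are the very same object.

    `edges` is an iterable of (source, target) pairs of hashable values.
    """
    canonical = {}  # value -> the first object seen with that value
    result = []
    for source, target in edges:
        source = canonical.setdefault(source, source)
        target = canonical.setdefault(target, target)
        result.append((source, target))
    return result


# Equal node labels built at run time, so they need not be the same object.
a1 = tuple(["page", 1])
a2 = tuple(["page", 1])
b = tuple(["page", 2])

edges = canonicalize([(a1, b), (b, a2)])
print(edges[0][0] is edges[1][1])  # True: both now refer to the first object seen
```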

Daniel Fetchinson · Mar 18, 2009

Is there a way to turn off (either globally or explicitly per

I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something) but ideally I
wouldn't want to impose any constraint on what a node is (i.e. require
a base Node class). It's not a show stopper, but it would be
problematic if something broke when nodes happen to be (small)
integers or strings.

But if two different nodes are both identified by, let's say the
string 'x' then you surely consider this an error anyway, don't you?
What's the point of identifying two different nodes by the same
string? If you use different strings (or numbers, even small ones) for
different nodes the whole problem will not arise.

Cheers,
Daniel

George Sakkis · Mar 19, 2009

But if two different nodes are both identified by, let's say the
string 'x' then you surely consider this an error anyway, don't you?
What's the point of identifying two different nodes by the same
string?

In this particular problem the graph represents web surfing behavior
and in the simplest case the nodes are plain URLs. Now suppose a
session log has recorded the URL sequence [u1, u2, u1]. There are two
scenarios for the second occurrence of u1: it's either caused by a
forward action (e.g. clicking on a link to u1 from page u2) or a back
action (i.e. the user clicked the back button). If this information is
available, it makes sense to differentiate them. One way to do so is
to represent the result of every forward action with a brand-new node
and the result of a back action with an existing node. So even though
the states of the two occurrences of u1 are the same, they are not
necessarily represented by a single node.

If it were always possible to make a copy of a string instance (say,
with a hypothetical str.new() classmethod), then it would be sufficient
to pass "map(str.new, session_urls)" to the graph generator. Equality
would still work as before, but all instances in the sequence would be
guaranteed to be unique. Thankfully, as Martin mentioned, this is easy
even without str.new(), simply by wrapping each URL in an instance of a
small Node class.

George
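
The forward/back scheme described above might be sketched as follows. The (url, action) log format, the `build_nodes` function, and the simple back-stack model are all assumptions made for illustration; a real session log may look quite different.

```python
class Node:
    """Hypothetical wrapper giving every forward visit its own identity."""

    def __init__(self, url):
        self.url = url


def build_nodes(session):
    """Map a session log to the node reached at each step.

    `session` is a list of (url, action) pairs, where action is
    "forward" or "back".
    """
    history = []  # stack of nodes, mimicking the browser's back stack
    visited = []  # node reached at each step of the log
    for url, action in session:
        if action == "back" and len(history) >= 2 and history[-2].url == url:
            history.pop()        # leave the current page
            node = history[-1]   # back action: reuse the existing node
        else:
            node = Node(url)     # forward action: brand-new node
            history.append(node)
        visited.append(node)
    return visited


back = build_nodes([("u1", "forward"), ("u2", "forward"), ("u1", "back")])
fwd = build_nodes([("u1", "forward"), ("u2", "forward"), ("u1", "forward")])

print(back[0] is back[2])        # True: the back button returns to the same node
print(fwd[0] is fwd[2])          # False: a forward visit creates a new node
print(fwd[0].url == fwd[2].url)  # True: the underlying URL is the same
```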

George Sakkis · Mar 19, 2009

this is completely normal (i do exactly this all the time), BUT you should
use "==", not "is".

Typically, but not always; for example check out the identity map [1]
pattern used in SQLAlchemy [2].

George

[1] http://martinfowler.com/eaaCatalog/identityMap.html
[2] http://www.sqlalchemy.org/docs/05/session.html#what-does-the-session-do

Daniel Fetchinson · Mar 19, 2009

I'm working on some graph generation problem where the node identity

But if two different nodes are both identified by, let's say the
string 'x' then you surely consider this an error anyway, don't you?
What's the point of identifying two different nodes by the same
string?

In this particular problem the graph represents web surfing behavior
and in the simplest case the nodes are plain URLs. Now suppose a
session log has recorded the URL sequence [u1, u2, u1]. There are two
scenarios for the second occurrence of u1: it's either caused by a
forward action (e.g. clicking on a link to u1 from page u2) or a back
action (i.e. the user clicked the back button). If this information is
available, it makes sense to differentiate them. One way to do so is
to represent the result of every forward action with a brand-new node
and the result of a back action with an existing node. So even though
the state of the two occurrences of u1 are the same, they are not
necessarily represented by a single node.

Okay, I think I understand what you want to accomplish but in this
case I would use a different data structure such that u1 is always
represented by the same string, same identifier, whatever, let's say
'x', and then I'd be happy with 'x' is 'x' being always True. The
relationship between u1 and u2, and u2 and u1 would be represented by
additional data so the difference between the first u1 and the second
u1 would be clear once this additional data is available, because it
would be used in the comparison explicitly.

Cheers,
Daniel
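
A rough sketch of this alternative, where each URL is always the same plain string and the forward/back distinction is carried as explicit edge data. The `Edge` record and the (url, action) log format are hypothetical names chosen for illustration.

```python
from collections import namedtuple

# Hypothetical edge record: the kind of transition is stored explicitly.
Edge = namedtuple("Edge", ["source", "target", "action"])


def build_edges(session):
    """Turn a list of (url, action) steps into edges between consecutive steps."""
    edges = []
    for (prev_url, _), (url, action) in zip(session, session[1:]):
        edges.append(Edge(prev_url, url, action))
    return edges


edges = build_edges([("u1", "forward"), ("u2", "forward"), ("u1", "back")])
for edge in edges:
    print(edge)
```

Here the two occurrences of u1 are told apart by comparing the `action` field, so nothing depends on object identity.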

Hrvoje Niksic · Mar 23, 2009

George Sakkis said:
I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something) but ideally I
wouldn't want to impose any constraint on what a node is

I'm not sure if it helps in your case, but you can easily turn off the
optimization by subclassing from the string type:
>>> class mystr(str):
...     pass
...
>>> mystr('x') is mystr('x')
False

Since mystr can have additional functionality on top of str, the
caching doesn't apply to the subclass instances.
