Disable automatic interning

George Sakkis

Is there a way to turn off (either globally or explicitly per
instance) the automatic interning optimization that happens for small
integers and strings (and perhaps other types)? I tried several
workarounds but nothing worked:
True
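
For illustration only (this is not the code from the post above), a short sketch of why the obvious copying workarounds tend to fail. It assumes CPython, where small integers are cached and copying an immutable built-in hands back the same object:

```python
import copy

# Assumes CPython: ints from -5 to 256 come from a shared cache.
small = int("7")                       # built at run time, not a literal
same_int = small is int("7")

text = "node"
same_via_str = str(text) is text            # str() of an exact str returns it unchanged
same_via_copy = copy.copy(text) is text     # copy treats immutable built-ins as atomic
same_via_deepcopy = copy.deepcopy(text) is text

print(same_int, same_via_str, same_via_copy, same_via_deepcopy)
```

On CPython all four comparisons come out True, so none of these calls yields a distinct object.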


George
 
R. David Murray

George Sakkis said:
Is there a way to turn off (either globally or explicitly per
instance) the automatic interning optimization that happens for small
integers and strings (and perhaps other types) ? I tried several
workarounds but nothing worked:

No. It's an implementation detail.

What use case do you have for wanting to disable it?
 
George Sakkis

R. David Murray said:
No. It's an implementation detail.

What use case do you have for wanting to disable it?

I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something") but ideally I
wouldn't want to impose any constraint on what a node is (i.e. require
a base Node class). It's not a show stopper, but it would be
problematic if something broke when nodes happen to be (small)
integers or strings.

George
 
Martin v. Löwis

George Sakkis said:
I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something") but ideally I
wouldn't want to impose any constraint on what a node is (i.e. require
a base Node class). It's not a show stopper, but it would be
problematic if something broke when nodes happen to be (small)
integers or strings.

In essence, yes, there is a way to turn off the automatic interning
optimization: in your API, explicitly wrap all nodes with another
object, and use *that* object as the node.

Regards,
Martin
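
The wrapping approach Martin describes could be sketched roughly as follows. The `Node` class and its `value` attribute are hypothetical names chosen for illustration; the point is only that identity lives in the wrapper, so interning of the payload no longer matters:

```python
class Node:
    """Hypothetical wrapper: identity comes from the wrapper, not the payload."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Node({self.value!r})"


# Two nodes with the same payload are still distinct objects.
a = Node("x")
b = Node("x")
print(a is b)              # False
print(a.value == b.value)  # True
```

Because `Node` defines neither `__eq__` nor `__hash__`, it inherits the default identity-based comparison and hashing, so nodes can also serve as dictionary keys without equal payloads colliding.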
 
Terry Reedy

And explicitly defined as such and definitely hardcoded, and used by the
interpreter itself, and for good reason. After starting up 3.0.1580
Subtracting the extra two ref for each call and the two needed for the
two cached objects, that is 1200 ints *not* allocated on startup, plus
hundreds more for the other values.

George Sakkis said:
I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something") but ideally I
wouldn't want to impose any constraint on what a node is (i.e. require
a base Node class). It's not a show stopper, but it would be
problematic if something broke when nodes happen to be (small)
integers or strings.

I do not get this. Regardless of class, if you want to compare by
identity, each node should be a unique object with a unique value. Auto
interning makes that easier, not harder. Robust code would not,
however, depend on that help. (I.e., it would explicitly make sure that
the 'equal' entries in the edge matrix or adjacency lists were identical.)

tjr
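
One way to read the "explicitly make sure" advice above is to canonicalize nodes yourself instead of relying on interning. A minimal sketch, assuming nodes are hashable and stored in adjacency lists; the helper name `canonicalize` is made up here:

```python
def canonicalize(adjacency):
    """Return a copy in which equal nodes are replaced by one shared object."""
    canon = {}

    def pick(node):
        # The first object seen for a given value becomes the canonical one.
        return canon.setdefault(node, node)

    return {pick(src): [pick(dst) for dst in dsts]
            for src, dsts in adjacency.items()}


# Build two equal strings at run time so they need not be the same object.
first = "".join(["no", "de"])
second = "".join(["no", "de"])
graph = canonicalize({first: [second]})
key = next(iter(graph))
print(key is graph[key][0])  # True: both entries now refer to one object
```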
 
Daniel Fetchinson

George Sakkis said:
I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something") but ideally I
wouldn't want to impose any constraint on what a node is (i.e. require
a base Node class). It's not a show stopper, but it would be
problematic if something broke when nodes happen to be (small)
integers or strings.

But if two different nodes are both identified by, let's say the
string 'x' then you surely consider this an error anyway, don't you?
What's the point of identifying two different nodes by the same
string? If you use different strings (or numbers, even small ones) for
different nodes the whole problem will not arise.

Cheers,
Daniel
 
George Sakkis

Daniel Fetchinson said:
But if two different nodes are both identified by, let's say the
string 'x' then you surely consider this an error anyway, don't you?
What's the point of identifying two different nodes by the same
string?

In this particular problem the graph represents web surfing behavior
and in the simplest case the nodes are plain URLs. Now suppose a
session log has recorded the URL sequence [u1, u2, u1]. There are two
scenarios for the second occurrence of u1: it's either caused by a
forward action (e.g. clicking on a link to u1 from page u2) or a back
action (i.e. the user clicked the back button). If this information is
available, it makes sense to differentiate them. One way to do so is
to represent the result of every forward action with a brand-new node
and the result of a back action with an existing node. So even though
the state of the two occurrences of u1 is the same, they are not
necessarily represented by a single node.

If it were always possible to make a copy of a string instance (say,
with a str.new() classmethod), then it would be sufficient to pass
"map(str.new, session_urls)" to the graph generator. Equality would still
work as before but all instances in the sequence would be guaranteed
to be unique. Thankfully, as Martin mentioned, this is easy even
without str.new(), simply by wrapping each url in an instance of a
small Node class.

George
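
The forward/back scheme described above might look something like this. It is only a sketch: the log format (pairs of an action and a URL), the `Node` class, and `build_path` are all assumptions made for illustration:

```python
class Node:
    """Hypothetical wrapper around a URL; each instance is a distinct node."""

    def __init__(self, url):
        self.url = url


def build_path(actions):
    """Map ("forward", url) / ("back", None) pairs to a list of nodes."""
    history = []  # stack of nodes, mimicking the browser's back history
    path = []     # nodes in the order they were visited
    for kind, url in actions:
        if kind == "forward":
            node = Node(url)       # a forward action always gets a brand-new node
            history.append(node)
        elif kind == "back":
            if len(history) < 2:
                raise ValueError("nothing to go back to")
            history.pop()
            node = history[-1]     # a back action reuses the existing node
        else:
            raise ValueError(f"unknown action: {kind!r}")
        path.append(node)
    return path


via_back = build_path([("forward", "u1"), ("forward", "u2"), ("back", None)])
via_link = build_path([("forward", "u1"), ("forward", "u2"), ("forward", "u1")])
print(via_back[0] is via_back[2])  # True: same node revisited
print(via_link[0] is via_link[2])  # False: same URL, different node
```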
 
Daniel Fetchinson

Daniel Fetchinson said:
But if two different nodes are both identified by, let's say the
string 'x' then you surely consider this an error anyway, don't you?
What's the point of identifying two different nodes by the same
string?

George Sakkis said:
In this particular problem the graph represents web surfing behavior
and in the simplest case the nodes are plain URLs. Now suppose a
session log has recorded the URL sequence [u1, u2, u1]. There are two
scenarios for the second occurrence of u1: it's either caused by a
forward action (e.g. clicking on a link to u1 from page u2) or a back
action (i.e. the user clicked the back button). If this information is
available, it makes sense to differentiate them. One way to do so is
to represent the result of every forward action with a brand-new node
and the result of a back action with an existing node. So even though
the state of the two occurrences of u1 is the same, they are not
necessarily represented by a single node.

Okay, I think I understand what you want to accomplish but in this
case I would use a different data structure such that u1 is always
represented by the same string, same identifier, whatever, let's say
'x', and then I'd be happy with 'x' is 'x' being always True. The
relationship between u1 and u2, and u2 and u1 would be represented by
additional data so the difference between the first u1 and the second
u1 would be clear once this additional data is available, because it
would be used in the comparison explicitly.

Cheers,
Daniel
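
The alternative Daniel outlines, keeping one identifier per URL and putting the forward/back distinction into extra data, could be sketched like this. The `Step` record and the log format are hypothetical:

```python
from collections import namedtuple

# Hypothetical record: one navigation step between two URL identifiers.
Step = namedtuple("Step", ["src", "dst", "action"])


def steps_from_log(log):
    """Turn [(url, action), ...] into labeled steps; the log format is assumed."""
    steps = []
    # Pair each entry with its successor; the action describes how dst was reached.
    for (prev_url, _), (url, action) in zip(log, log[1:]):
        steps.append(Step(prev_url, url, action))
    return steps


log = [("u1", "start"), ("u2", "forward"), ("u1", "back")]
steps = steps_from_log(log)
for step in steps:
    print(step)
```

Here "u1" stays a single identifier throughout, and the two visits are told apart by the `action` field rather than by object identity.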
 
Hrvoje Niksic

George Sakkis said:
I'm working on some graph generation problem where the node identity
is significant (e.g. "if node1 is node2: # do something") but ideally I
wouldn't want to impose any constraint on what a node is

I'm not sure if it helps in your case, but you can easily turn off the
optimization by subclassing from the string type:
>>> class mystr(str):
...     pass
...
>>> mystr('x') is mystr('x')
False

Since mystr can have additional functionality on top of str, the
caching doesn't apply to the subclass instances.
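
Combined with the "map over session_urls" idea from earlier in the thread, the subclass trick might be used along these lines. This is a sketch under the assumption that nodes are plain strings; `session_urls` is sample data:

```python
class mystr(str):
    """Trivial str subclass; every call creates a new object."""


session_urls = ["u1", "u2", "u1"]
nodes = list(map(mystr, session_urls))

print(nodes[0] == nodes[2])           # True: equality is still by value
print(nodes[0] is nodes[2])           # False: each instance is a separate object
print(len(set(nodes)))                # 2: equal values still collapse in a set
print(len({id(n) for n in nodes}))    # 3: identities are all distinct
```

Since equality and hashing remain value-based, these instances still merge when used as dict keys or set members; anything that must follow identity would need to key on `id(node)` or use a wrapper class instead.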
 
