Characters contain themselves?

  • Thread starter WENDUM Denis 47.76.11 (agent)
  • Start date
W

WENDUM Denis 47.76.11 (agent)

While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.See session:

system prompt% python
Python 2.3.5 (#2, Feb 9 2005, 00:38:15)
[GCC 3.3.5 (Debian 1:3.3.5-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
....
>>> 'a' is 'a' True
>>> 'a' in 'a' True
>>> 'a' in ['a'] True
>>> ....

Leading to paradoxes and loops objects which contain themselves (and
other kinds of monsters) are killed in set theory with the Axiom of
Foundation:=)

But let's go back to more earthly matters. I couldn't find any clue in a
python FAQ after having googled with the following "Python strings FAQ"
about why this design choice and how to avoid falling in this trap
without having to litter my code everywhere with tests for stringiness
each time I process a generic list of items.

Any hints would be appreciated.
 
R

Rene Pijlman

WENDUM Denis 47.76.11 (agent):
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.

No, strings contain characters. And 'a' is a string consisting of one
character.

"The items of a string are characters. There is no separate character
type; a character is represented by a string of one item."
http://docs.python.org/ref/types.html

(One item of what type, one might ask)
 
G

gry

In fact, not just characters, but strings contain themselves:
True

This is a very nice(i.e. clear and concise) shortcut for:
True

Which I always found contorted and awkward.

Could you be a bit more concrete about your complaint?

-- George
[Thanks, I did enjoy looking up the Axiom of Foundation!]
 
M

Mark Jackson

Rene Pijlman said:
WENDUM Denis 47.76.11 (agent):

No, strings contain characters. And 'a' is a string consisting of one
character.

"The items of a string are characters. There is no separate character
type; a character is represented by a string of one item."
http://docs.python.org/ref/types.html

(One item of what type, one might ask)

Good point. ". . .represented by a string of length one" would be
better.
 
B

Bruno Desthuilliers

WENDUM Denis 47.76.11 (agent) a écrit :
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.

There is *no* character type in Python. 'a' is a string, and of course
the string 'a' is a substring of string 'a'.
 
S

Steven D'Aprano

While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.

A "character" is just a string of length 1.


[snip]
Leading to paradoxes and loops objects which contain themselves (and
other kinds of monsters) are killed in set theory with the Axiom of
Foundation:=)

Let me see if I understand you... you would like behaviour like this?
False # not the real result

Ain't going to happen.

But let's go back to more earthly matters. I couldn't find any clue in a
python FAQ after having googled with the following "Python strings FAQ"
about why this design choice and how to avoid falling in this trap
without having to litter my code everywhere with tests for stringiness
each time I process a generic list of items.

Any hints would be appreciated.

I think that if you want to do work with sets, use sets, not strings.

In Python 2.3, you need:

In Python 2.4, Sets (note the initial capital) are a built-in.
 
S

Steven D'Aprano

But let's go back to more earthly matters. I couldn't find any clue in a
python FAQ after having googled with the following "Python strings FAQ"
about why this design choice and how to avoid falling in this trap
without having to litter my code everywhere with tests for stringiness
each time I process a generic list of items.

Any hints would be appreciated.

Here is another suggestion.

If you absolutely must use strings rather than sets or lists, create a
custom string class:
.... def __contains__(self, other):
.... if self == other: return False
.... return super(Mystr, self).__contains__(other)
....False


Hope this helps.
 
A

Atanas Banov

congratulations for (ostensibly) discovering the Barber's paradox (if
the village barber shaves all and only those who don't shave
tehmselves, who shaves the barber?
http://en.wikipedia.org/wiki/Barber_paradox) in python ! :-D

as far as i see it, you complaint is not just that any string X
contains itself but that string X can contain another string Y (i.e.
object of class string to contain another of class string) - where you
understand "contain" as per the operator "in" to be set-theory
operator, when in fact the meaning put for strings is instead "has a
substring".

therefore your grudge is not just with
'a' in 'a'
but also with
'a' in 'abcd'

here is excerpt from the reference manual:
----------------------------------------
The operators in and not in test for set membership. x in s evaluates
to true if x is a member of the set s, and false otherwise. x not in s
returns the negation of x in s. The set membership test has
traditionally been bound to sequences; an object is a member of a set
if the set is a sequence and contains an element equal to that object.
However, it is possible for an object to support membership tests
without being a sequence. In particular, dictionaries support
membership testing as a nicer way of spelling key in dict; other
mapping types may follow suit.

For the list and tuple types, x in y is true if and only if there
exists an index i such that x == y is true.

For the Unicode and string types, x in y is true if and only if x is a
substring of y. An equivalent test is y.find(x) != -1. Note, x and y
need not be the same type; consequently, u'ab' in 'abc' will return
True. Empty strings are always considered to be a substring of any
other string, so "" in "abc" will return True.
----------------------------------------

it is apparent "in" was overriden for strings for convenience's sake,
not to get freaky on the therory of sets.

what can you do about it? well, you can check for string type
specifically but there are no guarantees in life: someone else can
define new type with "in" that behaves like that: say "interval(x,y)",
where "interval(x,y) in interval(a,b)" checks if [x,y] is a
sub-interval of [a,b] - very intuitive - but there you have the problem
again!

or you can specifically check if the objects are from a "semanthically
supported group" of classes - but that will hamper authomatic extension
by introducing new types.

- Nas
While testing recursive algoritms dealing with generic lists I stumbled
'a' is 'a' True
'a' in 'a' True
'a' in ['a'] True
....

Leading to paradoxes and loops objects which contain themselves (and
other kinds of monsters) are killed in set theory with the Axiom of
Foundation:=)

But let's go back to more earthly matters. I couldn't find any clue in a
python FAQ after having googled with the following "Python strings FAQ"
about why this design choice and how to avoid falling in this trap
without having to litter my code everywhere with tests for stringiness
each time I process a generic list of items.

Any hints would be appreciated.
 
K

Kent Johnson

WENDUM said:
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.See session:

Strings are sequences and this is a problem for recursive functions that
process nested lists. You do need to test for stringiness somehow. IIRC
the usual solutions are

- explicit test for string: isinstance(xx, basestring)

- explicit test for list: isinstance(xx, list) - this fails for
user-defined sequences that don't subclass list

- test for __iter__ attribute. IIUC this relies on an implementation
detail that strings don't define __iter__:

In [6]: hasattr('aa', '__iter__')
Out[6]: False

In [7]: hasattr([], '__iter__')
Out[7]: True

Kent
 
A

Azolex

WENDUM said:
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves.

Note that the empty string is contained in all strings, including itself.
True
 
W

WENDUM Denis 47.76.11 (agent)

In fact, not just characters, but strings contain themselves:


True

This is a very nice(i.e. clear and concise) shortcut for:


True

Which I always found contorted and awkward.

Could you be a bit more concrete about your complaint?

-- George
[Thanks, I did enjoy looking up the Axiom of Foundation!]
Here is the minimal context triggering an infinite descent in recursion.
From the answers I've got it seems I'll have to check if I'm iterating
on a string or on another kind of "list".
Python 2.3.5 (#2, Feb 9 2005, 00:38:15)
[GCC 3.3.5 (Debian 1:3.3.5-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information..... try: # assuming List supports iteration
.... new=[f(item) for item in List]
.... except:# if no iteration is possible
.... new=1
.... return new
....
f([1, [2,3]]) # f substitutes each non iterable item by 1 (silly butshows the problem) [1, [1, 1]]
f([1, [2.3, (5,6,7)]]) [1, [1, [1, 1, 1]]]
f('bac') [[[[[[[[1]]]]]]], [[[[[[[1]]]]]]], [[[[[[[1]]]]]]]]
sys.setrecursionlimit(5)
f('bac') [[[1]], [[1]], [[1]]]
# each item in 'bac' is a character,
ie. a string of length one on wich one can iterate
and so triggers an infinite descent of recursion.
...
 
S

Scott David Daniels

WENDUM said:
... From the answers I've got it seems I'll have to check if I'm
iterating on a string or on another kind of "list"....... try: # assuming List supports iteration
... new=[f(item) for item in List]
... except:# if no iteration is possible
... new=1
... return new
f([1, [2,3]]) # f substitutes each non iterable item by 1 (silly
butshows the problem) [1, [1, 1]]
f([1, [2.3, (5,6,7)]]) [1, [1, [1, 1, 1]]]
f('bac') [[[[[[[[1]]]]]]], [[[[[[[1]]]]]]], [[[[[[[1]]]]]]]]
sys.setrecursionlimit(5)
f('bac') [[[1]], [[1]], [[1]]]
# each item in 'bac' is a character,
ie. a string of length one on wich one can iterate
and so triggers an infinite descent of recursion.

I'd rewrite your 'f' as follows:
def copier(List):
try: # assuming List supports iteration
new = [f(item) for item in List]
except TypeError: # Non-iterables get replaced
return '?'
except RuntimeError: # Recursion failures get shown
return 'Recursion'
return new

Note, even if you got your wish and somehow you couldn't have an issue
with characters in strings, you would have to deal with this:

simple_list = [3.1415]
container = [simple_list]
simple_list.append(container)

Your function (and mine) assumes the argument is a DAG (Directed Acyclic
Graph), but there is no such guarantee about data structures in python.

--Scott David Daniels
(e-mail address removed)
 
G

Graham Fawcett

WENDUM said:
While testing recursive algoritms dealing with generic lists I stumbled
on infinite loops which were triggered by the fact that (at least for my
version of Pyton) characters contain themselves. [snip]
Leading to paradoxes and loops objects which contain themselves (and
other kinds of monsters) are killed in set theory with the Axiom of
Foundation:=)

You could always use an "is-proper-subset-of" function, which is closer
to the intent of your algorithm. Using Jamitzky's very clever infix
recipe [1], you can even write it as an infix operator:

#Jamitzky's infix-operator class, abbreviated
class Infix:
def __init__(self, function):
self.function = function
def __ror__(self, other):
return Infix(lambda x, self=self, other=other:
self.function(other, x))
def __or__(self, other):
return self.function(other)
def __call__(self, value1, value2):
return self.function(value1, value2)

# define our is-proper-subset operator...
ips = Infix(lambda a, b: (a is not b) and (a in b))
False

Of course, |ips| looks like Lips-L, which is odd; but you could pick a
better name. ;-)

Graham

[1] http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/384122
 
S

Sion Arrowsmith

Graham Fawcett said:
You could always use an "is-proper-subset-of" function, which is closer
to the intent of your algorithm. Using Jamitzky's very clever infix
recipe [1], you can even write it as an infix operator:

#Jamitzky's infix-operator class, abbreviated
class Infix:
[ ... ]

# define our is-proper-subset operator...
ips = Infix(lambda a, b: (a is not b) and (a in b))

Unfortunately:
True

Which might not be what you want. On the other hand, it's a simple
fix:
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,015
Latest member
AmbrosePal

Latest Threads

Top