transforming a list into a string

Terry Reedy

Tim Peters said:
Absolutely. Note that Peter Otten previously posted a lovely O(N)
solution in this thread, although it may be too clever for some
tastes:
from itertools import izip
items = ['1','2','7','8','12','13']
it = iter(items)
",".join(["{%s,%s}" % i for i in izip(it, it)])
'{1,2},{7,8},{12,13}'

While this usage of izip(it,it) is clever (and zip(it,it) behaves the
same), it *breaks* the documented behavior of zip and hence of izip, and
depends on seemingly inessential implementation details of zip/izip (which
are documented only for izip).

The Lib Manual 2.1 zip entry calls the args 'sequences' whereas it should
say 'iterables'. It goes on "This function returns a list of tuples, where
the i-th tuple contains the i-th element from each of the argument
sequences. ... The returned list is truncated in length to the length of
the shortest argument sequence." In fact, even back in 2.2:
it = iter([1,2,3,4])
zip(it,it)
[(1, 2), (3, 4)]

I believe the definition of zip would allow iterator inputs to be handled
by list()ing them (for instance, left to right). A reason to do this might
be to fix the otherwise unknown lengths so that the min could be determined
so that the output list could be preallocated. If this were done, however,
zip(it,it) would output [] instead.

If zip were to build output tuples from the end, which the definition would
also seem to allow, then zip(it,it) above would be [(2,1), (4,3)] instead.
Zip's behavior seems to me undefined for this corner case.
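
The corner case described above can be checked directly. The following is a minimal sketch, assuming Python 3 (where the built-in zip is lazy, like izip in this thread). The function zip_listing_first is a hypothetical variant written only to illustrate the "list() the arguments first" alternative; it is not how any real zip is implemented.

```python
def zip_listing_first(*iterables):
    # Hypothetical variant: materialize every argument, left to right,
    # before pairing anything up.
    lists = [list(obj) for obj in iterables]
    shortest = min(len(lst) for lst in lists) if lists else 0
    return [tuple(lst[i] for lst in lists) for i in range(shortest)]

it = iter([1, 2, 3, 4])
pairs_builtin = list(zip(it, it))         # built-in pulls items left to right

it = iter([1, 2, 3, 4])
pairs_listed = zip_listing_first(it, it)  # the first list() drains the iterator

print(pairs_builtin)  # [(1, 2), (3, 4)]
print(pairs_listed)   # []
```

Passing the same iterator twice only pairs adjacent items when the arguments are consumed one item at a time, in order.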

So the doc needs to be updated to specify the args as iterables, not
restricted to sequences, and qualify the 'usual' behavior as depending on
having distinct iterator inputs.

The 5.14.1 Itertool functions izip entry says simply "Like zip() except
that it returns an iterator instead of a list." It also gives 'equivalent'
Python code which happens to pin down the behavior for this corner case. I
wonder if this code should really be taken as determinative documentation
rather than as illustrative of possible implementation. (I may ask RH if
need be.) If the former, then the zip doc could reference the izip
equivalent code as specifying its behavior.

Terry J. Reedy
 
Christopher T King

[Christopher T King]
...
Curious, why isn't slicing of generators defined, using islice(), so "it1
= iter(items)[0::2]" is valid?

If only some iterators implement it (like generator-iterators), then the
general interchangeability of iterable objects we enjoy today would be
damaged too.

Ah, I see your point. But most functions that expect iterators use iter()
on them first (to iterize sequences), do they not? So long as iter()
supplies the necessary __getslice__ implementation, the world would be
happy. This situation would mirror the situation with list(): though a
user-defined sequence might not implement __getslice__ (and still be
usable as a list), the object returned by list() is guaranteed to.
 
Tim Peters

[Christopher T King]
Ah, I see your point. But most functions that expect iterators use iter()
on them first (to iterize sequences), do they not? So long as iter()
supplies the necessary __getslice__ implementation, the world would be
happy. This situation would mirror the situation with list(): though a
user-defined sequence might not implement __getslice__ (and still be
usable as a list), the object returned by list() is guaranteed to.

The difference is that list() creates a concrete list object from its
argument, but there's no such thing as "a concrete iter object".
Iteration is a protocol, not a type. iter(obj) invokes
obj.__iter__(), and I don't know of any existing __iter__
implementation that returns an object that supports slicing. The only
thing required of __iter__ is that it return an object that supports
the iteration protocol (meaning an object that supports next() and
__iter__() methods). So again, adding the ability to slice too would
mean requiring more of __iter__ methods -- or changing the
implementation of iter() to ignore __iter__ methods and make something
up itself. It's A Visible Change no matter how you cut it.
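
To make the point above concrete, here is a small sketch, assuming Python 3. It shows that the object returned by iter() on a list does not support slice notation, while itertools.islice gives the same effect through the iteration protocol alone. The variable names are only illustrative.

```python
from itertools import islice

items = ['1', '2', '7', '8', '12', '13']
it = iter(items)

try:
    it[0::2]               # a list iterator has no __getitem__
    sliced_ok = True
except TypeError:
    sliced_ok = False

# islice needs nothing beyond the iteration protocol.
evens = list(islice(iter(items), 0, None, 2))
odds = list(islice(iter(items), 1, None, 2))

print(sliced_ok)  # False
print(evens)      # ['1', '7', '12']
print(odds)       # ['2', '8', '13']
```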
 
Christopher T King

The difference is that list() creates a concrete list object from its
argument, but there's no such thing as "a concrete iter object".

You've got me there.

Iteration is a protocol, not a type. iter(obj) invokes
obj.__iter__(),

Only if obj.__iter__() is defined; otherwise it makes something up using
__getitem__ as appropriate.

and I don't know of any existing __iter__ implementation that returns an
object that supports slicing.

True. Built-in classes (such as list & tupleiterator) would have to be
extended with this function (trivial if they all subclass from a common
base class). As to user classes, I'm proposing this on the assumption
that a programmer would implement their own __getitem__ (I've been saying
__getslice__ previously, __getitem__ is what I should be saying) if the
functionality is so desired, which can be as trivial as setting
__getitem__=islice, provided islice is extended to accept slice objects.
Though, this would break __getitem__ in the case of getting a single
item (I can see where this is heading)...

The only thing required of __iter__ is that it return an object that
supports the iteration protocol (meaning an object that supports next()
and __iter__() methods). So again, adding the ability to slice too
would mean requiring more of __iter__ methods -- or changing the
implementation of iter() to ignore __iter__ methods and make something
up itself. It's A Visible Change no matter how you cut it.

You've made your point. I guess this will just have to go on the "it
would be neat if it worked this way, but it just doesn't" list for now ;)
 
Bruce Eckel

Iteration is a protocol, not a type.

I know the term "protocol" has been used to describe a language
feature in a number of languages, but since we have no official
"protocol" support in Python I'm interested in what "we" mean by this
term. I'm going to guess that a protocol is like an interface in Java,
except that it doesn't have a concrete definition anywhere, but it is
implied through convention and use. Thus a protocol is a "latent
interface." Am I close? I'd like to understand this term better.

Bruce Eckel
 
Phil Frost

"Protocol" in python has no meaning beyond normal English. Specifically
the iteration protocol says the iterable must have a __iter__ method
which returns an object that has an __iter__ which returns self, and a
next() method that returns the next thing in the thing being iterated.
When there are no more things, next() raises StopIteration. All of this
is a simple protocol defined by the python language. It doesn't
introduce anything new to the language besides the protocol itself;
__iter__ and next are regular methods and StopIteration is raised just
as any other class can be raised as an exception.
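
The protocol described above can be illustrated with a small class. This is a minimal sketch, assuming Python 3, where the method the thread calls next() is spelled __next__; the Countdown class is a made-up example, not anything from the standard library.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):          # spelled next() in the Python 2 of this thread
        if self.current <= 0:
            raise StopIteration  # signals that there are no more items
        value = self.current
        self.current -= 1
        return value

values = list(Countdown(3))
print(values)  # [3, 2, 1]
```

Nothing here is special syntax: the for statement and list() simply call these two ordinary methods and stop when StopIteration is raised.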
 
kosh

jblazi said:
Let us assume I have a list like

['1','2','7','8','12','13']

and would like to transform it into the string

'{1,2},{7,8},{12,13}'

Which is the simplest way of achieving this? (The list is in fact much
longer and I may have to cut the resulting strings into chunks of 100 or
so.)
from itertools import izip
items = ['1','2','7','8','12','13']
it = iter(items)
",".join(["{%s,%s}" % i for i in izip(it, it)])

'{1,2},{7,8},{12,13}'


Peter

This way also works, though I am not sure how it compares on large sets for
memory usage, CPU time, etc.

items = ['1','2','7','8','12','13']
# The length check makes sure the number of items is evenly divisible by 2.
if len(items) % 2 == 0:
    # Build one "{%s,%s}" group per pair, join the groups with commas,
    # then fill in all the items at once.
    output = ",".join(["{%s,%s}"] * (len(items) // 2)) % tuple(items)
 
Duncan Booth

Absolutely. Note that Peter Otten previously posted a lovely O(N)
solution in this thread, although it may be too clever for some
tastes:
from itertools import izip
items = ['1','2','7','8','12','13']
it = iter(items)
",".join(["{%s,%s}" % i for i in izip(it, it)])
'{1,2},{7,8},{12,13}'

A bit too clever for mine, mostly because neither izip() nor zip() is
guaranteed to process its arguments in a left-to-right order (although
there's no reason for them not to).

You should read an earlier thread on this topic:

http://groups.google.co.uk/[email protected]

I made exactly that point, that the order isn't guaranteed, and was
refuted fairly convincingly by Peter Otten. The documentation says that
izip is equivalent to a particular reference implementation. Any
implementation which didn't preserve the left to right ordering wouldn't
match the reference:

Peter said:
Passing the same iterator multiple times to izip is a pretty neat
idea, but I would still be happier if the documentation explicitly
stated that it consumes its arguments left to right.

From the itertools documentation:

"""
izip(*iterables)

Make an iterator that aggregates elements from each of the iterables.
Like zip() except that it returns an iterator instead of a list. Used
for lock-step iteration over several iterables at a time. Equivalent
to:

def izip(*iterables):
    iterables = map(iter, iterables)
    while iterables:
        result = [i.next() for i in iterables]
        yield tuple(result)
"""

I'd say the "Equivalent to [reference implementation]" statement
should meet your request.
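
For readers on a current interpreter, the quoted reference implementation can be rendered roughly as follows. This is a sketch assuming Python 3; izip_sketch is a hypothetical name, and the explicit try/except is needed because a StopIteration escaping from inside a generator is turned into a RuntimeError since PEP 479.

```python
def izip_sketch(*iterables):
    iterators = [iter(obj) for obj in iterables]
    while iterators:
        result = []
        for iterator in iterators:        # strictly left to right
            try:
                result.append(next(iterator))
            except StopIteration:
                return                    # shortest input is exhausted
        yield tuple(result)

items = ['1', '2', '7', '8', '12', '13']
it = iter(items)
joined = ",".join("{%s,%s}" % pair for pair in izip_sketch(it, it))
print(joined)  # {1,2},{7,8},{12,13}
```

Because each output tuple is filled by walking the arguments in order, handing in the same iterator twice pairs up adjacent items.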
 
Aahz

[cc'ing Alex so he can jump in if he wants]

I know the term "protocol" has been used to describe a language feature
in a number of languages, but since we have no official "protocol"
support in Python I'm interested in what "we" mean by this term. I'm
going to guess that a protocol is like an interface in Java, except
that it doesn't have a concrete definition anywhere, but it is implied
through convention and use. Thus a protocol is a "latent interface." Am
I close? I'd like to understand this term better.

Alex Martelli gave an excellent presentation on Design Patterns at OSCON,
where he made the point that "interface" is roughly equivalent to syntax,
whereas "protocol" is roughly equivalent to syntax plus semantics. In
other words, computer languages rarely (if ever -- although I suppose
Eiffel comes close) enforce protocols in any meaningful way.

I'm hoping Alex posts his slides soon.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"To me vi is Zen. To use vi is to practice zen. Every command is a
koan. Profound to the user, unintelligible to the uninitiated. You
discover truth everytime you use it." (e-mail address removed)
 
Tom B.

jblazi said:
Let us assume I have a list like

['1','2','7','8','12','13']

and would like to transform it into the string

'{1,2},{7,8},{12,13}'

Which is the simplest way of achieving this? (The list is in fact much
longer and I may have to cut the resulting strings into chunks of 100 or
so.)

TIA,

jb
Here's a one-liner:
items = ['1','2','7','8','12','13']
[(items[x],items[x+1]) for x in range(0,len(items)-1,2)]
[('1', '2'), ('7', '8'), ('12', '13')]

Tom
 
Tom B.

jblazi said:
Let us assume I have a list like

['1','2','7','8','12','13']

and would like to transform it into the string

'{1,2},{7,8},{12,13}'

Which is the simplest way of achieving this? (The list is in fact much
longer and I may have to cut the resulting strings into chunks of 100 or
so.)

TIA,

jb
Try:
items = ['1','2','7','8','12','13']
import string
# Pass ',' explicitly: string.join() separates items with a space by default.
string.join([('{'+str(items[x])+','+str(items[x+1])+'}') for x in
             range(0,len(items)-1,2)], ',')

Tom
 
Dan Christensen

Tim Peters said:
The only thing required of __iter__ is that it return an object that
supports the iteration protocol (meaning an object that supports
next() and __iter__() methods). So again, adding the ability to
slice too would mean requiring more of __iter__ methods -- or
changing the implementation of iter() to ignore __iter__ methods and
make something up itself. It's A Visible Change no matter how you
cut it.

What if object[a:b:c] did the following?

1) if __getslice__ is supplied by object, use it.
2) if __getitem__ is supplied, use it.
3) if __iter__ is supplied, use islice(__iter__(object),slice(a,b,c)).

(IIUC, 1) and 2) are what is done currently. As Christopher pointed
out, for 3) to work, islice would have to be modified to accept a slice.)

Would this be backward compatible (a "Visible Change"), or would it
break something?
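
The dispatch proposed above can be approximated today with a wrapper object. The sketch below assumes Python 3 (which no longer has __getslice__, so step 1 is omitted); Sliceable is a hypothetical class, and it unpacks the slice itself because islice takes start, stop and step as separate arguments rather than a slice object.

```python
from itertools import islice

class Sliceable:
    """Hypothetical wrapper that adds [a:b:c] to any iterable."""

    def __init__(self, obj):
        self.obj = obj

    def __getitem__(self, key):
        # Step 2: defer to the object's own __getitem__ when it has one.
        if hasattr(self.obj, '__getitem__'):
            return self.obj[key]
        # Step 3: otherwise fall back to islice over the iterator.
        if isinstance(key, slice):
            return islice(iter(self.obj), key.start, key.stop, key.step)
        raise TypeError("only slices are supported for plain iterables")

def numbers():
    for n in range(10):
        yield n

from_generator = list(Sliceable(numbers())[1:8:3])
from_list = Sliceable([10, 20, 30, 40])[1:3]

print(from_generator)  # [1, 4, 7]
print(from_list)       # [20, 30]
```

One visible difference from sequence slicing remains: islice rejects negative start, stop or step values, so notation such as [-3:] would not carry over to plain iterators.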

It seems to parallel other dispatching mechanisms, e.g. (IIUC) "in" is
implemented just using the iterator protocol, unless the object
supplies __contains__.
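
The fallback mentioned above for "in" can be observed with two small classes. This is a sketch assuming Python 3; both classes are hypothetical, and the __contains__ rule is deliberately odd so the two code paths give different answers.

```python
class OnlyIter:
    """Hypothetical container that supports iteration only."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return iter(self.data)

class WithContains(OnlyIter):
    """Same container, but membership is answered by __contains__."""

    def __contains__(self, item):
        # Deliberately unrelated to the stored data.
        return item == "special"

found_by_iteration = 3 in OnlyIter([1, 2, 3])      # uses __iter__
found_by_contains = 3 in WithContains([1, 2, 3])   # uses __contains__
special = "special" in WithContains([1, 2, 3])

print(found_by_iteration, found_by_contains, special)  # True False True
```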

I like the idea of not needing to know about islice, and just using
the [a:b:c] notation. One of python's strongest features is its
consistency of notation.

Dan
 
Bruce Eckel

Alex Martelli gave an excellent presentation on Design Patterns at OSCON,
where he made the point that "interface" is roughly equivalent to syntax,
whereas "protocol" is roughly equivalent to syntax plus semantics. In
other words, computer languages rarely (if ever -- although I suppose
Eiffel comes close) enforce protocols in any meaningful way.

But what would the syntax be in Python? -- there is none. Perhaps you
mean that in Java a protocol would be an interface plus agreed-upon
semantics, whereas in Python the protocol would be the agreed-upon
(e.g. "latent") interface plus the agreed-upon semantics. In a sense,
both the syntax and semantics would be latent; they would only be
exposed and tested during use, since there is no formalized way to
define them.

In Python, of course, you could simply use a class with "pass" methods
everywhere to define a protocol. If there were some way to write
assertions about the protocol you could include those in the class.

Bruce Eckel
 
Peter Hansen

Bruce said:
But what would the syntax be in Python? -- there is none. Perhaps you
mean that in Java a protocol would be an interface plus agreed-upon
semantics, whereas in Python the protocol would be the agreed-upon
(e.g. "latent") interface plus the agreed-upon semantics. In a sense,
both the syntax and semantics would be latent; they would only be
exposed and tested during use, since there is no formalized way to
define them.

In Python, of course, you could simply use a class with "pass" methods
everywhere to define a protocol. If there were some way to write
assertions about the protocol you could include those in the class.

My reading of Alex's description seems to be the opposite. By defining
protocol to include the semantics, you are basically ensuring that an
implementation with "pass" would not be valid, except for a useless
protocol that defined null semantics. But yes, what you say about the
"latency" is true, and that's how Python works in many areas. You
need decent automated tests to catch certain errors in Python, though
tools like PyChecker can help with some of the ones that compilers
will find in statically typed languages.

-Peter
 
Alex Martelli

[cc'ing Alex so he can jump in if he wants]

I know the term "protocol" has been used to describe a language feature
in a number of languages, but since we have no official "protocol"
support in Python I'm interested in what "we" mean by this term. I'm
going to guess that a protocol is like an interface in Java, except
that it doesn't have a concrete definition anywhere, but it is implied
through convention and use. Thus a protocol is a "latent interface." Am
I close? I'd like to understand this term better.

Alex Martelli gave an excellent presentation on Design Patterns at OSCON,
where he made the point that "interface" is roughly equivalent to syntax,
whereas "protocol" is roughly equivalent to syntax plus semantics. In
other words, computer languages rarely (if ever -- although I suppose
Eiffel comes close) enforce protocols in any meaningful way.

I'm hoping Alex posts his slides soon.

Sorry, ADSL down, leaving for the Alps tomorrow, emergency netting from
an internet cafe at $$/hour. Hope to have my slides up when I get
back (Aug 17 or thereabouts). But yes, what I deduced as the meaning
of "protocol" from its occasional usage in the literature (about
components, mostly) is that it adds to the interface (syntax==method
names/signatures) the semantics and pragmatics ("you cannot call any
other method after calling close" being more pragmatics than semantics,
at least in linguistics terms). Languages do enforce protocols (e.g.,
raise an exception if you do mistakenly call something else after
calling close), just not "at compile time" (not even Eiffel, where
contract checking is runtime -- not even Haskell for its typeclasses,
which is what I believe comes closest to compiletime protocol
checking).
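
The "no other method after close" example above can be seen with an in-memory file object. This is a small sketch assuming Python 3 and io.StringIO; the rule is checked when the call is made, not before the program runs.

```python
import io

stream = io.StringIO("some text")
stream.close()

try:
    stream.read()          # violates the protocol: read after close
    error = None
except ValueError as exc:  # raised at run time for a closed file object
    error = exc

print(type(error).__name__)  # ValueError
```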


Alex
 
