Indexing list of lists

Hilde Roth

This may have been asked before but I can't find it. If I have
a rectangular list of lists, say, l = [[1,10],[2,20],[3,30]], is
there a handy syntax for retrieving the ith item of every sublist?
I know about [i[0] for i in l] but I was hoping for something more
like l[;0].

Hilde
 
John J. Lee

This may have been asked before but I can't find it. If I have
a rectangular list of lists, say, l = [[1,10],[2,20],[3,30]], is
there a handy syntax for retrieving the ith item of every sublist?
I know about [i[0] for i in l] but I was hoping for something more
like l[;0].

If you need that kind of thing a lot, look at Numeric (or its
replacement, numarray), or perhaps the standard library's array
module.


John
 
David Eppstein

This may have been asked before but I can't find it. If I have
a rectangular list of lists, say, l = [[1,10],[2,20],[3,30]], is
there a handy syntax for retrieving the ith item of every sublist?
I know about [i[0] for i in l] but I was hoping for something more
like l[;0].
>>> l = [[1,10],[2,20],[3,30]]
>>> zip(*l)[0]
(1, 2, 3)
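In Python 3, `zip()` returns an iterator rather than a list, so the `zip(*l)[0]` idiom above no longer works as written. A minimal sketch of Python 3 equivalents, assuming the same `l` as in the question:

```python
import operator

l = [[1, 10], [2, 20], [3, 30]]

# zip() is lazy in Python 3, so materialise it before indexing
# (this still builds every transposed column).
first_via_zip = list(zip(*l))[0]           # (1, 2, 3)

# next() pulls only the first transposed column out of the iterator.
first_via_next = next(zip(*l))             # (1, 2, 3)

# The comprehension from the question, and a map/itemgetter variant.
first_via_comp = [row[0] for row in l]                   # [1, 2, 3]
first_via_getter = list(map(operator.itemgetter(0), l))  # [1, 2, 3]
```

The comprehension is still the most direct spelling; the `zip` forms return a tuple rather than a list.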
 
Peter Otten

Hilde said:
This may have been asked before but I can't find it. If I have
a rectangular list of lists, say, l = [[1,10],[2,20],[3,30]], is
there a handy syntax for retrieving the ith item of every sublist?
I know about [i[0] for i in l] but I was hoping for something more
like l[;0].

If efficiency is not an issue and/or you need
[item[index] for item in theList] for more than one index at a time, you can
do:
>>> s = [[1,2],[3,4]]
>>> t = zip(*s)
>>> t
[(1, 3), (2, 4)]
>>> t[1]
(2, 4)

This creates a transposed (?) copy of the "matrix". The side effect of
creating tuples instead of inner lists should do no harm if you need only
read access to the entries.
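If write access to the transposed copy is needed, the tuples can be turned back into lists. A small sketch under Python 3 (where `zip()` must be wrapped in `list()`), which also shows one assumption worth knowing: `zip()` stops at the shortest row, so it only transposes rectangular input faithfully.

```python
s = [[1, 2], [3, 4]]

# Transposed copy as tuples (read-only rows).
t = list(zip(*s))                          # [(1, 3), (2, 4)]

# Transposed copy as lists, so individual entries can be changed.
t_lists = [list(col) for col in zip(*s)]   # [[1, 3], [2, 4]]
t_lists[1][0] = 99                         # changes the copy only, not s

# zip() truncates to the shortest row: the 2 below is silently dropped.
ragged = list(zip(*[[1, 2], [3]]))         # [(1, 3)]
```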

Peter
 
Hilde Roth

Thanks for the suggestion but zip is not nice for large lists
and as for array/numpy, although I chose a numeric example in
the posting, I don't see why only numeric arrays should enjoy
the benefit of such a notation.

l[;0] is illegal right now, but does anyone know of any other bit
of syntax it might conflict with if proposed as an extension?

Hilde
 
Peter Otten

Hilde said:
Thanks for the suggestion but zip is not nice for large lists
and as for array/numpy, although I chose a numeric example in
the posting, I don't see why only numeric arrays should enjoy
the benefit of such a notation.

l[;0] is illegal right now, but does anyone know of any other bit
of syntax it might conflict with if proposed as an extension?

I think that in alist[from:to:step] the step argument is already overkill.
If there is sufficient demand for column extraction, I would rather make it
a method of list, as alist[:columnIndex] can easily be confused with
alist[;toIndex] (or was it the other way round :). Would you allow
slicing, too, or make slicing and column extraction mutually exclusive?
Here's how to extract rows 2,4,6 and then columns 4 to 5:

m = n[2:7:2;4:6] # not valid python

Also, ";" is already used (though seldom found in real code) as an alternate
way to delimit statements.
So your suggestion might further complicate the compiler without compelling
benefits over the method approach.
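To make the "method instead of syntax" suggestion concrete, here is a minimal sketch of what such a method might look like on a list subclass. The class name `Rows` and the method name `column()` are hypothetical; plain lists have no such method.

```python
class Rows(list):
    """A list of rows with a hypothetical column() method (not a real list API)."""

    def column(self, index):
        # Collect the item at `index` from every row; an IndexError
        # propagates unchanged if some row is too short.
        return [row[index] for row in self]


m = Rows([[1, 10], [2, 20], [3, 30]])
print(m.column(0))    # [1, 2, 3]
print(m.column(-1))   # [10, 20, 30]
```

Because it is an ordinary method, it needs no change to the grammar, and negative indices work exactly as they do for a single row.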

Peter
 
Hilde Roth

Would you allow slicing, too, or make slicing and column extraction
mutually exclusive?

This is not about "column extraction" but about operating on different
dimensions of the array, which there doesn't seem to be any handy way
of doing right now. So yes I would allow both: within each ";" separated
index group, the current syntax (whatever it is) would apply.
Also, ";" is already used (though seldom found in real code) as an
alternate way to delimit statements.

I doubt it can appear within square brackets, so it would not be difficult
to disambiguate.

While we are at it, I also don't understand why sequences can't be
used as indices. Why not, say, l[[2,3]] or l[(2, 3)]? Why a special
slice concept? To me, it's not just the step argument in the slice
that is overkill...

Hilde
 
Peter Otten

Hilde said:
This is not about "column extraction" but about operating on different
dimensions of the array, which there doesn't seem to be any handy way

Like it or not, there are no "different dimensions", just lists of lists of
lists... so the N dimension case would resolve to (N-1) 2 dimension
operations.
While we are at it, I also don't understand why sequences can't be
used as indices. Why not, say, l[[2,3]] or l[(2, 3)]? Why a special
slice concept? To me, it's not just the step argument in the slice
that is overkill...

(a) Why not alist[[2, -3, 7]]?
OK with me.

[alist[2], alist[-3], alist[7]]

and

[alist[i] for i in [2, -3, 7]]

are not particularly cumbersome, though.
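The standard library already offers a helper in this direction: `operator.itemgetter` with several arguments returns a callable that fetches all of them at once, as a tuple. A sketch, assuming `alist` has at least eight items so that all three indices are valid:

```python
from operator import itemgetter

alist = list(range(0, 100, 10))   # [0, 10, 20, ..., 90]
indices = [2, -3, 7]

picked_comp = [alist[i] for i in indices]      # [20, 70, 70]
picked_getter = itemgetter(*indices)(alist)    # (20, 70, 70)

# Caveat: with a single index, itemgetter returns the bare item,
# not a 1-tuple, so itemgetter(2)(alist) is 20.
single = itemgetter(2)(alist)
```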

(b) Why a special slice concept?
It covers the most common cases of list item extraction with a concise
syntax.

Peter
 
Lukasz Pankowski

While we are at it, I also don't understand why sequences can't be
used as indices. Why not, say, l[[2,3]] or l[(2, 3)]? Why a special
slice concept? To me, it's not just the step argument in the slice
that is overkill...

1. It will be more typing and harder to visually parse

l[:3] would be l[(0, 3)]
l[3:] would be l[(3,-1)]

2. Slicing a two-dimensional object will not be possible, as the notation
you proposed is used for just that (e.g. l[1,2], which is equivalent
to l[(1,2)], see below), and Numeric and numarray use it. See what
happens with an object which, on indexing, just returns the index:
>>> class C:
...     def __getitem__(self, i): return i
...
>>> c = C()
>>> c[3]
3
>>> c[:3]
slice(0, 3, None)
>>> # multi dimensional indexing
>>> c[1,3]
(1, 3)
>>> c[1:3,3:5]
(slice(1, 3, None), slice(3, 5, None))
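The same experiment in Python 3, as a sketch. One difference from the Python 2 session above: Python 3 passes the slice exactly as written, so the omitted start shows up as `None` rather than `0`.

```python
class C:
    """Return whatever key the subscript expression passes to __getitem__."""

    def __getitem__(self, key):
        return key


c = C()
print(c[3])          # 3
print(c[:3])         # slice(None, 3, None)
print(c[1, 3])       # (1, 3)
print(c[1:3, 3:5])   # (slice(1, 3, None), slice(3, 5, None))
```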
 
John J. Lee

of doing right now. So yes I would allow both: within each ";" separated
index group, the current syntax (whatever it is) would apply.

Ain't going to happen. If you want that kind of thing without forking
your own version of Python, Numeric/numarray is the closest you'll get
(no special syntax, but lots of useful functions and 'ufuncs').


[...]
While we are at it, I also don't understand why sequences can't be
used as indices. Why not, say, l[[2,3]] or l[(2, 3)]? Why a special
[...]

I'd guess you can subclass numarray's arrays and get this behaviour.
Or simply write your own sequence object and override __getitem__.

I'd guess it's highly unlikely ever to be part of the standard
sequence protocol, though.


John
 
John J. Lee

Thanks for the suggestion but zip is not nice for large lists
and as for array/numpy, although I chose a numeric example in
the posting, I don't see why only numeric arrays should enjoy
the benefit of such a notation.

Numeric (and presumably numarray) can handle arbitrary Python objects.

l[;0] is illegal right now, but does anyone know of any other bit
of syntax it might conflict with if proposed as an extension?

BTW, just remembered that Numeric/numarray *does* use commas for
multi-dimensional indexing. Apparently (glancing at the language ref)
that's not actually indexing with a tuple, but part of Python's
syntax. Same goes for that other obscure bit of Python syntax, the
ellipsis, as used by Numeric: foo[a, ..., b].
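One way to see what actually reaches `__getitem__` for the comma and ellipsis forms is a probe object. In current Python 3, at least, the key is an ordinary tuple, and `...` arrives as the `Ellipsis` singleton. The `Probe` class is illustrative only:

```python
class Probe:
    """Echo back the subscript key (illustrative helper, not a library class)."""

    def __getitem__(self, key):
        return key


p = Probe()
assert p[1, 2] == p[(1, 2)] == (1, 2)   # commas build a tuple key
print(p[0, ..., 5])                      # (0, Ellipsis, 5)
print(p[...] is Ellipsis)                # True
```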


John
 
Hilde Roth

Like it or not, there are no "different dimensions", just lists of lists
of lists...

You are being too literal. A list of lists is like a 2D array from an
indexing point of view, a list of lists of lists like a 3D array, etc.
E.g., (((1,10),(2,20),(3,30)),((-1,'A'),(-2,'B'),(-3,'C'))) is a
2 x 3 x 2 rectangular data structure and has 3 dimensions. Hence,
e.g., l[0;2;1] ~ l[0][2][1] = 30
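Without new syntax, a whole index path like `(0, 2, 1)` can already be applied in one call by folding `operator.getitem` over it. A minimal sketch; `get_path` is a hypothetical helper name:

```python
from functools import reduce
from operator import getitem

l = (((1, 10), (2, 20), (3, 30)), ((-1, 'A'), (-2, 'B'), (-3, 'C')))


def get_path(data, path):
    """Apply successive indices, so get_path(l, (0, 2, 1)) is l[0][2][1]."""
    return reduce(getitem, path, data)


print(get_path(l, (0, 2, 1)))   # 30
print(get_path(l, (1, 0, 1)))   # A
```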
[alist[2], alist[-3], alist[7]] and [alist[i] for i in [2, -3, 7]]


I agree that comprehensions alleviate the problem to an extent.
However the first notation is definitely cumbersome for all but the
shortest index lists.
It covers the most common cases of list item extraction with a concise
syntax.

Maybe but
1/ it is more or less redundant: the (x)range syntax could have been
extended with the same effect
2/ it lacks generality since it can only generate arithmetic progressions
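The overlap between slices and (x)range that this point is about is visible in current Python: a slice object can be resolved against a sequence length with `slice.indices()`, and the result fed directly to `range()`. A small sketch:

```python
x = [10, 20, 30, 40, 50, 60, 70]

s = slice(1, None, 2)          # the object behind x[1::2]
assert x[s] == x[1::2]         # [20, 40, 60]

# slice.indices() resolves None/negative bounds against a length;
# the (start, stop, step) triple can be passed straight to range().
bounds = s.indices(len(x))              # (1, 7, 2)
positions = range(*bounds)              # 1, 3, 5
picked = [x[i] for i in positions]      # [20, 40, 60]
```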

Hilde
 
Hilde Roth

It will be harder to parse.

No, because in many cases the reason why you want to use the syntax
l[seq] is that you already have seq, so you would refer to it by name,
and "l" is definitely not hard to parse.
l[:3] would be l[(0, 3)]
l[3:] would be l[(3,-1)]

Not at all. I was suggesting a semicolon, not a colon. Thus if
l is (10,20,30,40), l[:3] -> (10,20,30) whereas l[(0,3)] -> (10, 40),
i.e., same as in your class-based example, minus the parentheses, which
I now realize are superfluous (odd that python allows you to omit the
parentheses but not the commas).
2. Slicing two dimensional object will not be possible as the notion
you proposed is used just for that (ex. l[1,2] which is equivalent
to l[(1,2)] see below), and Numeric and numarray use it.

Same misunderstanding as above, I believe. Let, e.g., l be
((1,10,100),(2,20,200),(3,30,300),(4,40,400)). Then l[2:; 1:] ->
[(30, 300), (40, 400)]. This is equivalent to [i[1:] for i in l[2:]]
but, at least to me, it is the index notation that is easier to parse.
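The `;` form was never added to the grammar, but the per-dimension idea can be approximated with the existing comma syntax, because `obj[a, b]` hands the tuple `(a, b)` to `__getitem__`. A minimal sketch, assuming a rectangular structure; the `Nested` class is hypothetical, handles only integers and slices, and returns lists (not tuples) at every sliced level:

```python
class Nested:
    """Wrap a nested sequence; n[k0, k1, ...] applies k0 at level 0, k1 at level 1, etc."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        keys = key if isinstance(key, tuple) else (key,)
        return self._get(self.data, keys)

    def _get(self, data, keys):
        if not keys:
            return data
        first, rest = keys[0], keys[1:]
        if isinstance(first, slice):
            # A slice keeps this level, so apply the remaining keys to each item.
            return [self._get(item, rest) for item in data[first]]
        # An integer index removes this level.
        return self._get(data[first], rest)


l = ((1, 10, 100), (2, 20, 200), (3, 30, 300), (4, 40, 400))
n = Nested(l)
print(n[2:, 1:])   # [[30, 300], [40, 400]]
print(n[:, 0])     # [1, 2, 3, 4]
print(n[0, 2])     # 100
```

`n[2:, 1:]` gives the same values as `[i[1:] for i in l[2:]]`, just as lists.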

Incidentally, it strikes me that there are a lot of superfluous
commas. Why not just (1 10 100) or even 1 10 100 instead of (1,10,100)?
The commas do make the expression harder to parse visually.
# multi dimensional indexing
c[1,3]

I disagree that this is "multidimensional". You are passing a list
and getting back a list, so this is still flat. I think you are
confusing dimensionality and cardinality.
Numeric and numarray use it

This is good if it is true (I have yet to look at these two because
my work is not primarily numerical) but, again, the restriction of this
syntax to arrays, and arrays of numeric values at that, strikes me
as completely arbitrary.

The bottom line is that python claims to be simple but has a syntax
which, in this little corner at least, is neither simple nor regular:
xranges and slices are both sequence abstractions but are used in
different contexts and have different syntaxes; C-style arrays are
treated differently than lists of lists of ..., although they are
conceptually equivalent; numeric structures are treated differently
than non-numeric ones etc etc

Oh well. Maybe a future PEP will streamline all that.

Hilde
 
David C. Fox

Hilde said:
Like it or not, there are no "different dimensions", just lists of lists
of lists...


You are being too literal. A list of lists is like a 2D array from an
indexing point of view, a list of lists of lists like a 3D array, etc.
E.g., (((1,10),(2,20),(3,30)),((-1,'A'),(-2,'B'),(-3,'C'))) is a
2 x 3 x 2 rectangular data structure and has 3 dimensions. Hence,
e.g., l[0;2;1] ~ l[0][2][1] = 30

Only if all the sublists are of the same length, which is guaranteed for
a multi-dimensional array, but not for a list of lists.

What do you expect a[;1] to return if a = [[], [1, 2, 3], [4], 5]?

That's why Numeric has a specific type for multi-dimensional arrays.
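With plain lists, rectangularity is something the code has to check for itself. A small sketch; `is_rectangular` is a hypothetical helper, and the last part shows what a straightforward column extraction does with the ragged example above:

```python
def is_rectangular(rows):
    """Hypothetical helper: True if every row is a list/tuple of the same length."""
    if not all(isinstance(row, (list, tuple)) for row in rows):
        return False
    return len({len(row) for row in rows}) <= 1


print(is_rectangular([[1, 10], [2, 20], [3, 30]]))   # True

a = [[], [1, 2, 3], [4], 5]
print(is_rectangular(a))                              # False

# Column extraction on the ragged list fails at the first short row ([]).
try:
    [row[1] for row in a]
except IndexError as exc:
    print("IndexError:", exc)
```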

David
 
Hilde Roth

Numeric (and presumably numarray) can handle arbitrary Python objects.

Oh. I will look at them pronto, then. As for:
I'd guess it's highly unlikely ever to be part of the standard
sequence protocol, though.

Passing a slice is passing an abstraction of a sequence (since indexing
a list with a slice returns a list). Given that, it seems hard to justify
not accepting the sequence itself... Passing the abstraction rather than
the thing itself should be an optimization, as in xrange vs. range, left
to the discretion of the programmer.

-- O.L.
 
Lukasz Pankowski

No because in many cases the reason why you want to use the syntax
l[seq] is that you already have seq, so you would refer to it by name
and "l" is definitely not hard to parse.


So you want l[seq] to be a shorter way of writing the current

[l[i] for i in seq]

which is Pythonic because it is explicit, and easier to read if you did
not write the code yesterday, although in some situations your
interpretation of l[seq] might be guessable from the context.

Not at all. I was suggesting a semicolon, not a colon. Thus if
l is (10,20,30,40), l[:3] -> (10,20,30) whereas l[(0,3)] -> (10, 40),
i.e., same as in your class-based example, minus the parentheses, which
I now realize are superfluous (odd that python allows you to omit the
parentheses but not the commas).

Sorry for my misunderstanding; yes, it would be nice to have the
possibility of indexing a sequence with a list of indices. Here is a pure
Python (>= 2.2) implementation of the idea:

class List(list):

    def __getitem__(self, index):
        if isinstance(index, (tuple, list)):
            return [list.__getitem__(self, i) for i in index]
        else:
            return list.__getitem__(self, index)

>>> l = List(range(0, 100, 10))
>>> l[0,2,3]
[0, 20, 30]

but in this simple version, using both commas and slices will not work as
expected:
[0, 10, [70, 80, 90]]

2. Slicing two dimensional object will not be possible as the notion
you proposed is used just for that (ex. l[1,2] which is equivalent
to l[(1,2)] see below), and Numeric and numarray use it.

Same misunderstanding as above, I believe. Let, e.g., l be
((1,10,100),(2,20,200),(3,30,300),(4,40,400)). Then l[2:; 1:] ->
[(30, 300), (40, 400)]. This is equivalent to [i[1:] for i in l[2:]]
but, at least to me, it is the index notation that is easier to parse.

Incidentally, it strikes me that there are a lot of superfluous
commas. Why not just (1 10 100) or even 1 10 100 instead of (1,10,100)?
The commas do make the expression harder to parse visually.

This will work only as long as there are no expressions in the sequence.
If there are, it is harder to read (and may be more error-prone):

(1 + 3 4 + 2)

# multi dimensional indexing
c[1,3]

I disagree that this is "multidimensional". You are passing a list
and getting back a list, so this is still flat. I think you are
confusing dimensionality and cardinality.
That is the notion of multidimensional indexing in Python.
This is good if it is true (I have yet to look at these two because
my work is not primarily numerical) but, again, the restriction of this
syntax to arrays, and arrays of numeric values at that, strikes me
as completely arbitrary.

Here is an example of a two-dimensional Numeric array and its indexing:
>>> a = reshape(arange(9), (3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a[1][0]
3
>>> a[1,0]
3
>>> a[(1,0)]
3

So indexing with a sequence currently has a settled meaning, namely
multidimensional indexing. If lists and tuples allowed indexing by a
sequence, then either:

1. it might be confused with multidimensional indexing of numeric
types (the same notation for two different things), or

2. it would require reworking multidimensional indexing, maybe with your
semicolon notation, but would introduce incompatibilities (backward and
forward).
The bottom line is that python claims to be simple but has a syntax
which, in this little corner at least, is neither simple nor regular:
xranges and slices are both sequence abstractions but are used in
different contexts and have different syntaxes; C-style arrays are
treated differently than lists of lists of ..., although they are
conceptually equivalent; numeric structures are treated differently
than non-numeric ones etc etc

With multidimensional arrays and lists of lists you may both write
a[i][j], so it is consistent; with arrays you may also write a[i,j] if you
know that you have a 2-dim array in your hand.
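A quick check of that consistency point with a plain list of lists in Python 3: chained indexing works, while the comma form passes a tuple key that a built-in list rejects with a `TypeError`.

```python
grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

i, j = 1, 0
print(grid[i][j])    # 3: chained indexing works on a list of lists

try:
    grid[i, j]       # the key here is the tuple (1, 0)
except TypeError as exc:
    print("TypeError:", exc)
```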
 
Peter Otten

Lukasz said:
Sorry for my misunderstanding; yes, it would be nice to have the
possibility of indexing a sequence with a list of indices. Here is a pure
Python (>= 2.2) implementation of the idea:

class List(list):

    def __getitem__(self, index):
        if isinstance(index, (tuple, list)):
            return [list.__getitem__(self, i) for i in index]
        else:
            return list.__getitem__(self, index)

>>> l = List(range(0, 100, 10))
>>> l[0,2,3]
[0, 20, 30]

This is nice :)
but in this simple version, using both commas and slices will not work as
expected:
[0, 10, [70, 80, 90]]

Your implementation can be extended to handle slices and still remain
simple:

class List3(list):
    def __getitem__(self, index):
        if hasattr(index, "__getitem__"): # is index list-like?
            result = []
            for i in index:
                if hasattr(i, "start"): # is i slice-like?
                    result.extend(list.__getitem__(self, i))
                else:
                    result.append(list.__getitem__(self, i))
            return result
        else:
            return list.__getitem__(self, index)

I have used hasattr() instead of isinstance() tests because I think that the
"we take everything that behaves like X" approach is more pythonic than
"must be an X instance".
While __getitem__() is fairly basic for lists, I am not sure if start can be
considered mandatory for slices, though.

Peter
 
Hilde Roth

Only if all the sublists are of the same length, which is guaranteed for
a multi-dimensional array, but not for a list of lists.

This is a red herring.
What do you expect a[;1] to return if a = [[], [1, 2, 3], [4], 5]?

Whatever error python returns if you ask, e.g., for (1,2,3)[4].

Hilde
 
