Specify start and length, beside start and end, in slices

Noam Raphael

Hello,
Many times I find myself asking for a slice of a specific length, and
writing something like l[12345:12345+10].
This happens both in interactive use and when writing Python programs,
where I have to write an expression twice (or use a temporary variable).

Wouldn't it be nice if the Python grammar had supported this frequent
use? My idea is that the expression above might be expressed as
l[12345:>10].

This change, as far as I can see, is quite small: it affects only the
grammar and byte-compiling, and has no side effects.

The only change in syntax is that short_slice would be changed from
[lower_bound] ":" [upper_bound]
to
([lower_bound] ":" [upper_bound]) | ([lower_bound] ":>" [slice_length])

Just to show what will happen to the byte code: l[12345:12345+10] is
compiled to:
LOAD_GLOBAL 0 (l)
LOAD_CONST 1 (12345)
LOAD_CONST 1 (12345)
LOAD_CONST 2 (10)
BINARY_ADD
SLICE+3

I suggest that l[12345:>10] would be compiled to:
LOAD_GLOBAL 0 (l)
LOAD_CONST 1 (12345)
DUP_TOP
LOAD_CONST 2 (10)
BINARY_ADD
SLICE+3
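
To illustrate why the repeated start expression matters, here is a minimal sketch in present-day Python (the `:>` form itself is not valid syntax). The function `counting_start` and the list `calls` are hypothetical names used only for this demonstration. It shows that the spelled-out form evaluates the start expression twice, whereas computing it once and reusing it, which is roughly what `DUP_TOP` would do in the suggested bytecode, evaluates it once.

```python
calls = []  # one entry is appended each time the start expression runs

def counting_start():
    """Stand-in for an arbitrary start expression."""
    calls.append(1)
    return 12345

l = list(range(20000))

# Spelled-out form: the start expression appears, and is evaluated, twice.
calls.clear()
a = l[counting_start():counting_start() + 10]
calls_spelled_out = len(calls)

# Temporary-variable form: evaluated once, then reused.
calls.clear()
start = counting_start()
b = l[start:start + 10]
calls_with_temp = len(calls)
```

Both spellings return the same ten items; only the number of evaluations differs.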

Well, what do you think? I would like to hear your comments.

Have a good day (or night),
Noam Raphael
 
Grant Edwards

Many times I find myself asking for a slice of a specific length, and
writing something like l[12345:12345+10].
[...]

Wouldn't it be nice if the Python grammar had supported this frequent
use? My idea is that the expression above might be expressed as
l[12345:>10].

It's a bit less efficient, but you can currently spell that as

l[12345:][:10]
 
Noam Raphael

Grant said:
Many times I find myself asking for a slice of a specific length, and
writing something like l[12345:12345+10].

[...]


Wouldn't it be nice if the Python grammar had supported this frequent
use? My idea is that the expression above might be expressed as
l[12345:>10].


It's a bit less efficient, but you can currently spell that as

l[12345:][:10]

That is true, but if the list is long, it's *much* less efficient.

Thanks for your comment,
Noam
 
Peter Hansen

Noam said:
Grant said:
It's a bit less efficient, but you can currently spell that as

l[12345:][:10]

That is true, but if the list is long, it's *much* less efficient.

Considering that the interpreter special-cases some integer math
including the BINARY_ADD, it likely wouldn't take a very long list
to pass the point where they're the same.

I like the idea of the optimization, in a sense, but I don't
like the syntax and doubt that there is much performance gain to be
had. There are probably better places for people to hack on the
interpreter, and which don't need syntax changes.

-Peter
 
Noam Raphael

Peter said:
Noam said:
Grant said:
It's a bit less efficient, but you can currently spell that as

l[12345:][:10]

That is true, but if the list is long, it's *much* less efficient.


Considering that the interpreter special-cases some integer math
including the BINARY_ADD, it likely wouldn't take a very long list
to pass the point where they're the same.

I don't understand: If the list is of length 1000000, wouldn't Grant
Edwards' suggestion make 1000000-12345 new references, and then take
only the first ten of them?
 
Grant Edwards

It's a bit less efficient, but you can currently spell that as

l[12345:][:10]

That is true, but if the list is long, it's *much* less efficient.

Considering that the interpreter special-cases some integer math
including the BINARY_ADD, it likely wouldn't take a very long list
to pass the point where they're the same.

I'm afraid I don't understand either. Where do integer math
shortcuts enter the picture? It seems to me it's all about
building a (possibly long) new list which you're going to throw
away after you build another list from the front of it.

Unless the compiler is smart enough to figure out what you're
aiming at and skip the intermediate list entirely.

I don't understand: If the list is of length 1000000, wouldn't
Grant Edwards' suggestion make 1000000-12345 new references,
and then take only the first ten of them?

Yes, according to my understanding of how things work, that's
what happens (my spelling is pretty inefficient for pulling
small chunks from the beginnings of long lists), so if you do
a lot of that, it may be worth worrying about.
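
The cost described above can be made concrete with a minimal sketch; the list size and the names `tail`, `two_step` and `direct` are chosen only for illustration. The two-step spelling first copies every reference from the start index to the end of the list, then keeps ten of them, while the direct spelling copies only the ten.

```python
l = list(range(1_000_000))
start, length = 12345, 10

tail = l[start:]                  # intermediate copy: everything from start onward
two_step = tail[:length]          # then the first `length` items of that copy
direct = l[start:start + length]  # copies only `length` references

print(len(tail), len(direct))
```

The results are equal, but the intermediate list holds 1000000 - 12345 references that are discarded immediately.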
 
Peter Hansen

Noam said:
Peter said:
Noam said:
Grant Edwards wrote:

It's a bit less efficient, but you can currently spell that as

l[12345:][:10]

That is true, but if the list is long, it's *much* less efficient.



Considering that the interpreter special-cases some integer math
including the BINARY_ADD, it likely wouldn't take a very long list
to pass the point where they're the same.

I don't understand: If the list is of length 1000000, wouldn't Grant
Edwards' suggestion make 1000000-12345 new references, and then take
only the first ten of them?

Sorry, it was perhaps unclear that I was agreeing with you. For
an extremely short list, it's possible that it would be faster
to do Grant's method, but what I was trying to say is that even
if that's true, I expect that for a list of more than a few dozen
elements it would not be faster. Looking at it again, I suspect
that it would actually never be faster, given that probably
about as many bytecode instructions are executed, and then there's
the extra memory allocation for the temporary list, the copying,
etc.

-Peter
 
Peter Hansen

Peter said:
For an extremely short list, it's possible that it would be faster
to do Grant's method, but what I was trying to say is that even
if that's true, I expect that for a list of more than a few dozen
elements it would not be faster. Looking at it again, I suspect
that it would actually never be faster, given that probably
about as many bytecode instructions are executed, and then there's
the extra memory allocation for the temporary list, the copying,

timeit confirms this with variations on this:

c:\>python -c "import timeit as t; t = t.Timer('x[y:][:10]', 'y=10000; x=range(y)'); print t.timeit()"

and this:

c:\>python -c "import timeit as t; t = t.Timer('x[y:y+10]', 'y=10000; x=range(y)'); print t.timeit()"
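
For readers on Python 3, a comparable measurement might look like the sketch below. The list length, start index and repetition count are assumptions picked for this example (the list is deliberately longer than the start index so that the tail slice is non-empty); absolute timings will vary by machine and interpreter.

```python
import timeit

# Setup shared by both statements: a 100,000-item list and a start index.
setup = "x = list(range(100_000)); y = 12345"

# Two-step spelling: builds a temporary list of everything after y.
t_two_step = timeit.timeit("x[y:][:10]", setup=setup, number=200)

# Direct spelling: copies only ten references.
t_direct = timeit.timeit("x[y:y+10]", setup=setup, number=200)

print(f"x[y:][:10]: {t_two_step:.6f} s")
print(f"x[y:y+10]:  {t_direct:.6f} s")
```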

-Peter
 
Terry Reedy

Many times I find myself asking for a slice of a specific length, and
writing something like l[12345:12345+10].
This happens both in interactive use and when writing Python programs,
where I have to write an expression twice (or use a temporary variable).

With an expression, I'd go for the temp var.

Wouldn't it be nice if the Python grammar had supported this frequent
use?

I take this as 'directly support' versus the current indirect support via
start+len.
My answer: superficially (in isolation) yes, but overall, in the context of
Python's somewhat minimalistic grammar/syntax, no. Two ways to slice might
easily be seen as one too many. In addition, the rationale for this, your
favorite little addition, would admit perhaps 50 others like it.

My idea is that the expression above might be expressed as l[12345:>10].

Sorry, this strikes me as ugly, too much like and easily confused with
l[12345:-10], and too much looking like a syntax error.

Given that some other languages slice with (start,len) arguments (but not
then, that I remember or know of, also with a start,stop option), I am
*sure* that Guido thought carefully about the issue. A plus with his
choice is ability to offset (index) from the end *without* calling the len
function.
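
As a small illustration of the point about offsetting from the end, the sketch below (the list `l` is made up for the example) shows that a negative bound in the (start, end) form selects items relative to the end of the sequence without a call to `len`.

```python
l = list(range(100))

last_ten = l[-10:]               # negative start counts from the end
same_with_len = l[len(l) - 10:]  # the same slice spelled with len()
middle = l[-10:-2]               # both bounds may be relative to the end
```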
This change, as far as I can see, is quite small: it affects only the
grammar and byte-compiling, and has no side effects.

Except the cognitive dissonance of two *almost* identical syntaxes and the
flood of other 'small', 'no side effect' change requests.

Well, what do you think? I would like to hear your comments.

Your wish ...

Terry J. Reedy
 
Larry Bates

I think it is odd that I have never encountered
many of these types of constructs repeatedly in
my code. Perhaps you could share a little more
of where you see this type of thing popping up
a lot? I suspect that there is another method
for solving the problem that might be faster
and easier to read/program.

Larry Bates,
Syscon, Inc.
 
Noam Raphael

Hello,

Terry said:
Many times I find myself asking for a slice of a specific length, and
writing something like l[12345:12345+10].
This happens both in interactive use and when writing Python programs,
where I have to write an expression twice (or use a temporary variable).


With an expression, I'd go for the temp var.

Wouldn't it be nice if the Python grammar had supported this frequent
use?


I take this as 'directly support' versus the current indirect support via
start+len.
My answer: superficially (in isolation) yes, but overall, in the context of
Python's somewhat minimalistic grammar/syntax, no. Two ways to slice might
easily be seen as one too many.

I agree that Python should be kept easy to read and understand. However,
it doesn't mean that there's only one way to do everything. An example
(it's even from slices): the Numeric people asked for the "..." token
and got it, even though you can live without it - it simply makes your
life easier.

In addition, the rationale for this, your
favorite little addition, would admit perhaps 50 others like it.

My idea is that the expression above might be expressed as l[12345:>10].

Sorry, this strikes me as ugly, too much like and easily confused with
l[12345:-10], and too much looking like a syntax error.

Well, of course, it *is* a syntax error right now. As for what it looks
like - I can't argue with what it looks like to you, but since '>' is
generally perceived as having something to do with "go in the right
direction", I think that l[12345:>10] can easily be read as "start from
12345, and take 10 steps to the right. Take all the items you passed over."

Given that some other languages slice with (start,len) arguments (but not
then, that I remember or know of, also with a start,stop option), I am
*sure* that Guido thought carefully about the issue. A plus with his
choice is ability to offset (index) from the end *without* calling the len
function.

I think that the fact that other languages use (start, len) quite
contradicts your assumption that only 50 other people would like it. I
don't see what brings you to think that you represent 99.99 percent of
Python users.
I like Python's slicing very much, and I agree that given only one
slicing method, (start, end) should be chosen, but what's wrong with
adding another?

Except the cognitive dissonance of two *almost* identical syntaxes and the
flood of other 'small', 'no side effect' change requests.

Why not judge each 'small, no side effect' change request for its own
sake? Do you think that Python should only undergo big and complex changes?

Your wish ...

Yes, I do like to hear other opinions. Perhaps *you* could have been a
bit more open to hear them...

Terry J. Reedy

Noam Raphael
 
Noam Raphael

Hello,

Larry said:
I think it is odd that I have never encountered
many of these types of constructs repeatedly in
my code. Perhaps you could share a little more
of where you see this type of thing popping up
a lot? I suspect that there is another method
for solving the problem that might be faster
and easier to read/program.

Larry Bates,
Syscon, Inc.

With pleasure. Here are two examples:

1. Say I have a list with the number of panda bears hunted in each
month, starting from 1900. Now I want to know how many panda bears were
hunted in year y. Currently, I have to write something like this:
sum(huntedPandas[(y-1900)*12:(y-1900)*12+12])
If my suggestion is accepted, I would be able to write:
sum(huntedPandas[(y-1900)*12:>12])

(Yes, I know that it may also be expressed as
sum(huntedPandas[(y-1900)*12:(y-1899)*12]), but it's less clear what I
mean, and it's still longer)
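
For comparison, the monthly-totals example can be written today with a small helper that builds a slice object from a start and a length. This is only a sketch: `span`, `hunted_in_year` and the sample data are hypothetical names and values introduced for illustration, and `span` assumes a non-negative start.

```python
def span(start, length):
    """Hypothetical helper: a slice covering `length` items from `start`.

    Assumes start >= 0; a negative start would need extra care.
    """
    return slice(start, start + length)

# Made-up data: one count per month, January 1900 onward (here 1900-1909).
hunted_pandas = [month % 7 for month in range(10 * 12)]

def hunted_in_year(y):
    # The start expression is written once; span() adds the length.
    return sum(hunted_pandas[span((y - 1900) * 12, 12)])

y = 1905
spelled_out = sum(hunted_pandas[(y - 1900) * 12:(y - 1900) * 12 + 12])
```

Because `span` returns an ordinary `slice` object, it works with any sequence that supports slicing.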

2. Many data files contain fields of fixed length. Just an example: say
I want to get the color of the first pixel of a 24-bit color BMP file.
Say I have a function which gets a 4-byte string and converts it into a
32-bit integer. The four bytes, from byte no. 10, are the size of the
header, in bytes. Right now, if I don't want to use temporary variables,
I have to write:
picture[s2i(picture[10:14]):s2i(picture[10:14])+4]
I think this is nicer (and quicker):
picture[s2i(picture[10:>4]):>4]
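
For fixed-length binary fields specifically, the standard library's `struct.unpack_from` accepts an offset, so the start position is written only once. The sketch below uses a synthetic buffer rather than a real BMP file, and `s2i` is reconstructed as a little-endian conversion, which is an assumption since the post does not state the byte order.

```python
import struct

def s2i(four_bytes):
    """Sketch of the post's s2i: 4 bytes -> unsigned 32-bit int.

    Little-endian is an assumption; the post does not specify byte order.
    """
    return int.from_bytes(four_bytes, "little")

# Synthetic buffer (not a real BMP): a 4-byte offset stored at byte 10
# points at 4 bytes of payload further on.
buf = bytearray(64)
buf[10:14] = (54).to_bytes(4, "little")
buf[54:58] = b"\x01\x02\x03\x04"
picture = bytes(buf)

# Spelled-out slicing, as in the post.
spelled_out = picture[s2i(picture[10:14]):s2i(picture[10:14]) + 4]

# struct.unpack_from reads a fixed-size field at a given offset.
(offset,) = struct.unpack_from("<I", picture, 10)
payload = picture[offset:offset + 4]
```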

Thanks for your interest,
Noam Raphael
 
Terry Reedy

Noam Raphael said:
contradicts your assumption that only 50 other people would like it. I
don't see what brings you to think that you represent 99.99 percent of
Python users.

Projecting thoughts into my brain that I never had is stupid. I really
don't like that.

Perhaps *you* could have been a bit more open to hear them...

Making false ad hominem comments is stupid. I don't like that either.

I am disappointed. Sorry I took your request for comments *on the
proposal* seriously.

Terry J. Reedy
 
Antoon Pardon

On 2004-05-21 said:
Many times I find myself asking for a slice of a specific length, and
writing something like l[12345:12345+10].
This happens both in interactive use and when writing Python programs,
where I have to write an expression twice (or use a temporary variable).

With an expression, I'd go for the temp var.
Wouldn't it be nice if the Python grammar had supported this frequent
use?

I take this as 'directly support' versus the current indirect support via
start+len.
My answer: superficially (in isolation) yes, but overall, in the context of
Python's somewhat minimalistic grammar/syntax, no. Two ways to slice might
easily be seen as one too many. In addition, the rationale for this, your
favorite little addition, would admit perhaps 50 others like it.
My idea is that the expression above might be expressed as l[12345:>10].

Sorry, this strikes me as ugly, too much like and easily confused with
l[12345:-10], and too much looking like a syntax error.

Given that some other languages slice with (start,len) arguments (but not
then, that I remember or know of, also with a start,stop option), I am
*sure* that Guido thought carefully about the issue. A plus with his
choice is ability to offset (index) from the end *without* calling the len
function.

Well I hate his choice. It is inconsistent with the fact that generally
l[a:b] produces the empty list when a > b.
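
The behaviour being objected to can be seen in a short sketch (the list is made up for the example): with non-negative bounds, a start greater than the end yields an empty list, but a negative end is interpreted relative to the end of the list, so the slice can be non-empty even though the start is numerically larger.

```python
l = list(range(10))

empty = l[5:2]       # start > end, both non-negative: empty list
non_empty = l[5:-2]  # 5 > -2 numerically, yet -2 means len(l) - 2 == 8

print(empty, non_empty)
```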

It is only inconsistent with the Zen of Python, which says there should
be only one way to do something.
 
Fredrik Lundh

Noam said:
I have to write:
picture[s2i(picture[10:14]):s2i(picture[10:14])+4]
I think this is nicer (and quicker):
picture[s2i(picture[10:>4]):>4]

that's spelled

picture = Image.open(file)
picture.getpixel((0, 0))

</F>
 
Peter Abel

Noam Raphael said:
Hello,
Many times I find myself asking for a slice of a specific length, and
writing something like l[12345:12345+10].
This happens both in interactive use and when writing Python programs,
where I have to write an expression twice (or use a temporary variable).

Wouldn't it be nice if the Python grammar had supported this frequent
use? My idea is that the expression above might be expressed as
l[12345:>10].

This change, as far as I can see, is quite small: it affects only the
grammar and byte-compiling, and has no side effects.

The only change in syntax is that short_slice would be changed from
[lower_bound] ":" [upper_bound]
to
([lower_bound] ":" [upper_bound]) | ([lower_bound] ":>" [slice_length])

Just to show what will happen to the byte code: l[12345:12345+10] is
compiled to:
LOAD_GLOBAL 0 (l)
LOAD_CONST 1 (12345)
LOAD_CONST 1 (12345)
LOAD_CONST 2 (10)
BINARY_ADD
SLICE+3

I suggest that l[12345:>10] would be compiled to:
LOAD_GLOBAL 0 (l)
LOAD_CONST 1 (12345)
DUP_TOP
LOAD_CONST 2 (10)
BINARY_ADD
SLICE+3

Well, what do you think? I would like to hear your comments.

Have a good day (or night),
Noam Raphael

Python has a ready workaround for nearly every problem.
What about the following?

>>> # iNCREMENTALslICE
>>> isl=lambda l,start,increment:l.__getslice__(start,start+increment)
>>> l='zero one two three four five six'.split()
>>> l
['zero', 'one', 'two', 'three', 'four', 'five', 'six']
>>> isl(l,3,3)
['three', 'four', 'five']
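
Note that `__getslice__` exists only in Python 2; it was removed in Python 3. A sketch of the same helper that relies on ordinary slicing, and so should work on either version, might look like this (the name `isl` is kept from the post above):

```python
def isl(seq, start, increment):
    """Return `increment` items of `seq` beginning at index `start`."""
    return seq[start:start + increment]

l = 'zero one two three four five six'.split()
result = isl(l, 3, 3)
print(result)
```

Since it uses plain slicing, the helper also works on strings, tuples and other sliceable sequences.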

Regards
Peter
 
Noam Raphael

Fredrik said:
Noam Raphael wrote:

I have to write:
picture[s2i(picture[10:14]):s2i(picture[10:14])+4]
I think this is nicer (and quicker):
picture[s2i(picture[10:>4]):>4]


that's spelled

picture = Image.open(file)
picture.getpixel((0, 0))

</F>

Hello,
Thanks for your suggestion, but I meant to give an example for the need
of those slices when handling files of a format which is not already
handled by a module someone wrote.
And what if I want to write a new module for handling images?

Noam Raphael
 
Noam Raphael

Terry said:
Projecting thoughts into my brain that I never had is stupid. I really
don't like that.

When you assume that only 50 people would like my suggestion, you assume
that all the other 99.99 percent of Python users wouldn't like it, just
because you don't. If I am wrong - correct me.

Making false ad hominem comments is stupid. I don't like that either.

I am disappointed. Sorry I took your request for comments *on the
proposal* seriously.

Terry J. Reedy

As you may have noticed, I did take your comments seriously, and
referred to every one of them.
I'm sorry if my remark offended you. I will try to be more polite in my
future posts. However, I did sense a tone of impatience in your reply,
and I think you should try to eliminate it in your future posts.

Best wishes,
Noam Raphael
 
