pairs from a list

G

George Sakkis

Given the human psychology displayed involved, in the absence of
definitive evidence one way or another it is a far safer bet to assume
that people are unnecessarily asking for "the fastest" out of a misguided
and often ignorant belief that they need it, rather than the opposite.
People who actually need a faster solution usually know enough to preface
their comments with an explanation of why their existing solution is too
slow rather than just a context-free demand for "the fastest" solution.

As I mentioned already, I consider the seeking of the most efficient
solution a legitimate question, regardless of whether a "dumb"
solution is fast enough for an application. Call it a "don't be
sloppy" principle if you wish. It's the same reason I always use
xrange() instead of range() for a loop, although in practice the
difference is rarely measurable.
Fast code is like fast cars. There *are* people who really genuinely need
to have the fastest car available, but that number is dwarfed by the vast
legions of tossers trying to make up for their lack of self-esteem by
buying a car with a spoiler. Yeah, you're going to be traveling SO FAST
on the way to the mall that the car is at risk of getting airborne, sure,
we believe you.

(The above sarcasm naturally doesn't apply to those who actually do need
to travel at 200mph in a school zone, like police, taxi drivers and stock
brokers.)

Good example; it shows that there's more than the utilitarian point of
view. People don't buy these cars because of an actual need but rather
because of the brand, the (perceived) social value and other reasons.

And since you like metaphors, here's another one: caring about
efficient code only when you need it is like keeping notes for a
course only for the material to be included in the final exams,
skipping the more encyclopedic, general knowledge lectures. Sure, you
may pass the class, even with a good grade, but for some people a
class is more than a final grade.

George
 
S

Steven D'Aprano

As I mentioned already, I consider the seeking of the most efficient
solution a legitimate question, regardless of whether a "dumb" solution
is fast enough for an application. Call it a "don't be sloppy" principle
if you wish.

Sure, by why do you limit "efficient" and "don't be sloppy" to mean
"write the fastest executing code you can, regardless of every other
trade-off"?

Because there are trade-offs:

* execution time
* compilation time
* memory use at run-time
* size of source code on disk
* size of compiled code on disk
* ease of writing it in the first place
* ease of maintenance
* correctness
* fragility in the face of malformed data
* readability and ease of comprehension
* elegance (however you judge that!)
* making sure the usage of various resources never exceed some set of
upper bounds
etc.

Some of these are relatively unimportant (e.g. the size of the source
code), but others are extremely important and far too often ignored (e.g.
readability and even correctness).

In fact, "fastest" isn't even a meaningful attribute. Does it mean:

* the worst-case is fastest
* the best-case is fastest
* the average-case is fastest
* fastest on typical data
* all of the above
etc.

What counts as "typical" data? How do you average all the cases?

Asking "what's the fastest" without considering these issues is a good
sign that the person asking is being sloppy, something you should be
against.

It's the same reason I always use xrange() instead of
range() for a loop, although in practice the difference is rarely
measurable.

Well, that's (now) a no-brainer. Once upon a time, range() used to be
faster for small enough lists, and whatever difference there is in
readability is insignificant except for utter newbies, so there's no real
trade-off to make unless you care about the extra "x".

But... do you write list.__len__() instead of len(list) to save a few
nanoseconds?

If you do, you're guilty of pessimization instead of optimization: for
built-in lists, len() is actually faster, by a lot. It's only an
optimization -- and a significant one -- for custom classes.

So, do you write this:

if isinstance(alist, list):
value = len(alist) # optimize for lists
else:
value = alist.__len__() # optimize for classes

every time you want the length of an arbitrary sequence?

If you did, you'd be pessimizing again. That code runs nearly twice as
slowly as just calling the built-in len(). Optimizing the parts doesn't
mean you have optimized the whole.
 
A

Alan G Isaac

Steven said:
In fact, "fastest" isn't even a meaningful attribute. Does it mean:

* the worst-case is fastest
* the best-case is fastest
* the average-case is fastest
* fastest on typical data
* all of the above


I confess that it did not occur to me that there
might be an interesting distinction among these
cases for the question of how to get sequential
pairs from a list. How would one draw these
distinctions in this case?

Thanks,
Alan Isaac

PS Just for context, the sequential pairs were
needed in a simulation, but my question was
primarily prompted by my surprise that the
approaches I looked at differed as much as they did.
 
P

Paddy

For such trivial problems, getting it right alone isn't a particularly
high expectation.

Hi George, not 'alone' but 'first'. And it is the ultimate
expectation. Who wants to recieve wrong software?
The OP didn't mention anything about the context; for all we know,
this might be a homework problem or the body of a tight inner loop.

Thats why I thought to ask about the context.
There is this tendency on c.l.py to assume that every optimization
question is about a tiny subproblem of a 100 KLOC application. Without
further context, we just don't know.

Again, I did not assume; I asked about the wider context. I too don't
believe your statement about c.l.p.
I don't agree with this logic in general.
But you don't defend yourself well by such a far-fetched example.

I've heard quality expressed as "meeting requirements", which I think
is apt. Falling short of requirements everyone knows doesn't give a
quality result, but equally 'exceeding' requirements also detracts
from quality (as does not knowing your requirements).
It is good to learn optimization techniques, which may be part of what
you are saying, but part of that is learning when it pays to apply
them and/or search for them; and when it does not.

- Paddy.
 
G

George Sakkis

Sure, by why do you limit "efficient" and "don't be sloppy" to mean
"write the fastest executing code you can, regardless of every other
trade-off"?

I explicitly didn't limit sloppiness to inefficiency and mentioned
it's a tradeoff:

"... all else being equal or at least comparable (elegance,
conciseness, readability, etc.). Of course it's a tradeoff;
spending a week to save a few milliseconds on average is usually a
waste for most applications, but being a lazy keyboard banger writing
the first thing that pops into mind is not that good either."
But... do you write list.__len__() instead of len(list) to save a few
nanoseconds?

No, of course not, it's not worth it, but that doesn't mean that being
curious about what's faster and using timeit to find out is totally
worthless.

Another example: avoiding attribute lookups within a loop. I rarely
write
bar = foo.bar
for i in big_list: bar(i)

but it's valuable to know that it can make a difference when I really
need it. Always writing the first thing that "just works" prevents one
from even considering that there might be faster (or more elegant,
more general, etc.) alternatives.

George
 
G

George Sakkis

I've heard quality expressed as "meeting requirements", which I think
is apt. Falling short of requirements everyone knows doesn't give a
quality result, but equally 'exceeding' requirements also detracts
from quality (as does not knowing your requirements).
It is good to learn optimization techniques, which may be part of what
you are saying, but part of that is learning when it pays to apply
them and/or search for them; and when it does not.

The OP wanted an answer to a simple question, not a lecture on good
software engineering principles. This whole subthread reminds of a
movie (can't remember which) where someone asks his buddy in the
stadium "what do you want?". His buddy gets it wrong and embarks in a
long diatribe of what he wants in life now, what he wanted as a child,
what's the meaning of one's life and so on. After a couple of minutes
the guy cuts him and asks again:
- "Man, what do you want, burger or hot dog?"
- "Oh, a hot dog".

Sometimes you want to see the tree right in front of you, not the
whole damn forest.

George
 
S

Steven D'Aprano

I explicitly didn't limit sloppiness to inefficiency and mentioned it's
a tradeoff:

Of course you did, and I was being sloppy. The "you" was meant more as a
generic you than you yourself. Sorry for the confusion.

As for your other points, I think we're actually very much in agreement,
except for your tolerance of random posters asking what I believe is an
incoherent question: "what's the fastest way to do ...?". It seems to me
you're willing to give them the benefit of the doubt that they've done
their profiling and considered their trade-offs, or at the very worst are
asking from purely intellectual curiosity. Call me cynical if you like,
but I think that in the absence of any direct evidence supporting those
things, the most likely possibility is the opposite.
 
S

Steven D'Aprano

I confess that it did not occur to me that there might be an interesting
distinction among these cases for the question of how to get sequential
pairs from a list. How would one draw these distinctions in this case?

Well, in this *specific* case I don't think they're terribly interesting
because your problem is so simple. Best-case will, presumably, occur when
the list is empty. Worst-case will occur if somebody passes a non-
terminating iterator to your code (although that depends on the nature of
your solution and how you are using it).

But in the broader context of optimizing, these distinctions are very
important. For example, Quicksort is very fast on randomly ordered data,
but terribly slow on data which is already sorted: about as slow as
Bubblesort. Mergesort's best-case isn't as fast as Quicksort's best-case,
but it's worst-case behaviour is much, much better. And both Quicksort
and Mergesort are complex enough that a simpler, slower algorithm like
Shell Sort will be significantly faster for small data sets.

But I'm sure you already know all this, I'm just mentioning it for the
benefit of anybody else reading who still thinks that a generic "what's
the fastest way" type question is meaningful.
 
S

Steven D'Aprano

The OP wanted an answer to a simple question, not a lecture on good
software engineering principles.


(1) It isn't a simple question. There were at least five alternatives,
one of which would have been "fastest" except it failed the correctness
test. The OP went on to ask:

"I suppose my question should have been, is there an obviously faster way?
Anyway, of the four ways below, the first is substantially fastest.
[Note: actually it wasn't.] Is there an obvious reason why?"

Asking *why* something is faster is very different from asking *which* is
faster. I don't believe anyone actually answered that "why" question.

So, for the record, here's my attempt at an answer:

No, there is no "obvious" reason why one solution should be faster than
another. But perhaps the two most general rules of thumb are:

* the more work you can push into built-ins written in C, and the least
in slow code written in Python, the faster;

* lazy iterators are often (but not always) faster than building the
entire list all at once.



Other than that, the details of why one particular technique is faster
than another depends on the implementation of each, and that's not
obvious.



(2) What the OP wanted and what he needed are not necessarily the same.
Answering the OP's direct question is like giving him a fish -- he'll be
hungry tomorrow. Teaching him good engineering principles is like
teaching him to fish.

(3) This is Usenet, and the advice is free. Sometimes you get more than
you expected, sometimes less. Regardless of what the OP wanted, the
thread will go where the thread will go. The discussions of software
engineering were mostly in response to you, not the OP.
 
G

George Sakkis

As for your other points, I think we're actually very much in agreement,
except for your tolerance of random posters asking what I believe is an
incoherent question: "what's the fastest way to do ...?". It seems to me
you're willing to give them the benefit of the doubt that they've done
their profiling and considered their trade-offs, or at the very worst are
asking from purely intellectual curiosity.

It depends on the specific problem and the context. For small well-
defined algorithmic type problems, I just assume it's intellectual
curiosity or that performance really matters. If the same question was
asked in the context of, say, a typical web application fetching data
from a database and rendering dynamic pages, I might have dismissed it
as YAGNI.

George
 
P

Paddy

It depends on the specific problem and the context. For small well-
defined algorithmic type problems, I just assume it's intellectual
curiosity or that performance really matters. If the same question was
asked in the context of, say, a typical web application fetching data
from a database and rendering dynamic pages, I might have dismissed it
as YAGNI.

George

George, I tend to think that the more context an OP gives, the more
thought they have given their problem. Often, to get a better answer
you need to help the OP divulge context.

I too like the intellectual challenge of exploring small problem, and
from lurking on c.l.p I thought there would be lots of answers of that
ilk, but this time I thought why not contribute in a different way?

Reading c.l.p I am often reminded of good software practice memes in
answers and think its part of what makes c.l.p. a rewarding
experience. It may be hubris to think that a random reader might read
my post and then follow my points before making a routine faster; but
there you go :)

- Paddy.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,432
Messages
2,571,682
Members
48,796
Latest member
Greg L.

Latest Threads

Top