max(), sum(), next()

bearophileHUGS

Empty Python lists [] don't know the type of the items they will
contain, so this sounds strange:
sum([])
0

Because that [] may be an empty sequence of someobject:
sum(s for s in ["a", "b"] if len(s) > 2)
0

In a statically typed language in that situation you may answer the
initializer value of the type of the items of the list, as I do in the
sum() in D.

This sounds like a more correct/clean thing to do:
max([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence

So it may be better to make sum([]) raise a ValueError too, in
Python 3/3.1 (if this isn't already true). On the other hand, often
enough I have code like this:

max(fun(x) for x in iterable if predicate(x))

This may raise the ValueError both if iterable is empty or if the
predicate on its items is always false, so instead of catching
exceptions, which I try to avoid, I usually end up with a normal loop
that's readable and fast:

max_value = smallvalue
for x in iterable:
    if predicate(x):
        max_value = max(max_value, fun(x))

Where running speed matters, I may even replace that max(max_value,
fun(x)) with a more normal if/else.
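
As a rough sketch of that last point, the loop with an explicit comparison might look like the following. The function name filtered_max is made up for illustration; predicate, fun and smallvalue are the names used above.

```python
def filtered_max(iterable, predicate, fun, smallvalue):
    """Largest fun(x) over the items passing predicate, or smallvalue if none pass."""
    max_value = smallvalue
    for x in iterable:
        if predicate(x):
            y = fun(x)
            # Plain comparison instead of calling max() on every iteration.
            if y > max_value:
                max_value = y
    return max_value


print(filtered_max([1, 5, 3], lambda x: x % 2 == 1, lambda x: x * 10, -1))
print(filtered_max([], lambda x: True, lambda x: x, -1))
```

Note that, as with the original loop, a result equal to smallvalue is ambiguous: it may mean "no item matched" or "the real maximum happened to be smallvalue".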

A possible alternative is to add a default to max(), like the next()
built-in of Python 2.6:

This returns smallvalue if there are no items to compute the max of.
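
For readers on a later Python: max() and min() accept a keyword-only default argument from Python 3.4 onward, which covers this case. A minimal sketch, where predicate, fun, iterable and smallvalue are hypothetical stand-ins for the names used in the post:

```python
# Hypothetical stand-ins for the names used in the post.
def predicate(x):
    return x > 10


def fun(x):
    return x * x


smallvalue = -1
iterable = [1, 2, 3]

# The generator must be parenthesized because another argument follows it.
# default= is returned only when the generator yields nothing.
max_value = max((fun(x) for x in iterable if predicate(x)), default=smallvalue)
print(max_value)
```

Without default=, the same call raises ValueError when no item passes the predicate.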

Bye,
bearophile
 
Sion Arrowsmith

Empty Python lists [] don't know the type of the items it will
contain, so this sounds strange:
sum(...)
sum(sequence, start=0) -> value
Traceback (most recent call last):
File said:
sum((range(x) for x in range(5)), [])
[0, 0, 1, 0, 1, 2, 0, 1, 2, 3]

.... so the list might not know what type it contains, but sum
does. And if you don't tell it, it makes a sensible guess. And
it *is* a case where refusing the temptation to guess is the
wrong thing: how many times would you use sum to do anything
other than sum numeric values? And how tedious would it be to
have to write sum(..., 0) for every other case? Particularly
bearing in mind:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]
 
Laszlo Nagy

Empty Python lists [] don't know the type of the items it will
contain, so this sounds strange:

sum([])
0

Because that [] may be an empty sequence of someobject:

You are right in that sum could be used to sum arbitrary objects.
However, in 99.99% of the cases, you will be summing numerical values.
When adding real numbers, the neutral element is zero. ( X + 0 = X) It
is very logical to return zero for empty sequences.

Same way, if we would have a prod() function, it should return one for
empty sequences because X*1 = X. The neutral element for this operation
is one.
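
A prod() along these lines did eventually appear as math.prod() in Python 3.8, and it behaves as described. A small check, assuming Python 3.8 or later:

```python
import math

empty = []

# Neutral element of addition for the empty sum.
empty_sum = sum(empty)
# Neutral element of multiplication for the empty product.
empty_product = math.prod(empty)

print(empty_sum, empty_product)
```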

Of course this is not good for summing other types of objects. But how
clumsy would it be to use

sum( L +[0] )

or

if L:
value = sum(L)
else:
value = 0

instead of sum(L).

Once again, this is what sum() is used for in most cases, so this
behavior is the "expected" one.

Another argument to convince you: the sum() function in SQL for empty
row sets returns zero in most relational databases.

But of course it could have been implemented in a different way... I
believe that there have been excessive discussions about this decision,
and the current implementation is very good, if not the best.

Best,

Laszlo
 
MRAB

Empty Python lists [] don't know the type of the items it will
contain, so this sounds strange:

Because that [] may be an empty sequence of someobject:

You are right in that sum could be used to sum arbitrary objects.
However, in 99.99% of the cases, you will be summing numerical values.
When adding real numbers, the neutral element is zero. ( X + 0 = X) It
is very logical to return zero for empty sequences.

Same way, if we would have a prod() function, it should return one for
empty sequences because X*1 = X. The neutral element for this operation
is one.

Of course this is not good for summing other types of objects. But how
clumsy would it be to use

sum( L +[0] )

or

if L:
value = sum(L)
else:
value = 0

instead of sum(L).

Once again, this is what sum() is used for in most cases, so this
behavior is the "expected" one.

Another argument to convince you: the sum() function in SQL for empty
row sets returns zero in most relational databases.

But of course it could have been implemented in a different way... I
believe that there have been excessive discussions about this decision,
and the current implementation is very good, if not the best.
An alternative would be for the start value to default to None, which
would mean no start value. At the moment it starts with the start
value and then 'adds' the items in the sequence to it, but it could
start with the first item and then 'add' the following items to it.
So:

sum([1, 2, 3]) => 6
sum(["a", "b", "c"]) => "abc"

For backward compatibility, if the sequence is empty and the start
value is None then return 0.
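
A minimal sketch of this proposal, written as an ordinary function rather than a change to the built-in. The name sum_from_first is hypothetical, and this is only one way to read the rules above:

```python
def sum_from_first(iterable, start=None):
    """Sum that begins with the first item when no start value is given."""
    it = iter(iterable)
    if start is None:
        try:
            total = next(it)
        except StopIteration:
            # Backward compatibility: empty input and no start gives 0.
            return 0
    else:
        total = start
    for item in it:
        total = total + item
    return total


print(sum_from_first([1, 2, 3]))
print(sum_from_first(["a", "b", "c"]))
print(sum_from_first([]))
```

One consequence of this design is that None can no longer be used as a real start value.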
 
bearophileHUGS

Laszlo Nagy:
I believe that there have been excessive discussions about this
decision, and the current implementation is very good, if not the best.

I see. But note that my post is mostly about the max()/min()
functions :)

Bye,
bearophile
 
Mensanator

Empty Python lists [] don't know the type of the items it will
contain, so this sounds strange:

Because that [] may be an empty sequence of someobject:

You are right in that sum could be used to sum arbitrary objects.
However, in 99.99% of the cases, you will be summing numerical values.
When adding real numbers, the neutral element is zero. ( X + 0 = X) It
is very logical to return zero for empty sequences.

No it isn't. Nothing is not 0, check with MS-Access, for instance:

Null + 1 returns Null. Any arithmetic expression involving a
Null evaluates to Null. Adding something to an unknown returns
an unknown, as it should.

It is a logical fallacy to equate unknown with 0.

For example, the water table elevation in ft above Mean Sea Level
is WTE = TopOfCasing - DepthToWater.

TopOfCasing is usually known and constant (until resurveyed).
But DepthToWater may or may not exist for a given event (well
may be covered with fire ants, for example).

Now, if you equate Null with 0, then the WTE calculation says
the water table elevation is flush with the top of the well,
falsely implying that the site is underwater.

And, since this particular site is on the Mississippi River,
it sometimes IS underwater, but this is NEVER determined by
water table elevations, which, due to the CORRECT treatment
of Nulls by Access, never returns FALSE calculations.

sum([])
0

is a bug, just as it's a bug in Excel to evaluate blank cells
as 0. It should return None or throw an exception like sum([None,1])
does.
Same way, if we would have a prod() function, it should return one for
empty sequences because X*1 = X. The neutral element for this operation
is one.

Of course this is not good for summing other types of objects. But how
clumsy would it be to use

sum( L +[0] )

or

if L:
value = sum(L)
else:
value = 0

instead of sum(L).

Once again, this is what sum() is used for in most cases, so this
behavior is the "expected" one.

Another argument to convince you: the sum() function in SQL for empty
row sets returns zero in most relational databases.

But of course it could have been implemented in a different way... I
believe that there have been excessive discussions about this decision,
and the current implementation is very good, if not the best.

Best,

Laszlo
 
castironpi

Empty Python lists [] don't know the type of the items it will
contain, so this sounds strange:

sum([])
0

Because that [] may be an empty sequence of someobject:
sum(s for s in ["a", "b"] if len(s) > 2)

0

In a statically typed language in that situation you may answer the
initializer value of the type of the items of the list, as I do in the
sum() in D.

This sounds like a more correct/clean thing to do:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: max() arg is an empty sequence

So it may be better to make the sum([]) too raise a ValueError, in
Python 3/3.1 (if this isn't already true). On the other hand often
enough I have code like this:

This may raise the ValueError both if iterable is empty or if the
predicate on its items is always false, so instead of catching
exceptions, that I try to avoid, I usually end with a normal loop,
that's readable and fast:

max_value = smallvalue
for x in iterable:
    if predicate(x):
        max_value = max(max_value, fun(x))

Where running speed matters, I may even replace that max(max_value,
fun(x)) with a more normal if/else.

A possible alternative is to add a default to max(), like the next()
built-in of Python 2.6:

This returns smallvalue if there are no items to compute the max of.

Bye,
bearophile

Two thoughts:
1/ 'Reduce' has a 'default' argument-- they call it 'initial'.
reduce( max, [ 0, 1, 2, 3 ] )
3
reduce( max, [ 0, 1, 2, 'a' ] )
'a'
reduce( max, [ 0, 1, 2, 'a', 'b' ] )
'b'

2/ Introduce a 'max' class object that takes a default type or default
argument. Query the default for an 'additive' identity, or query for
a 'comparative' identity, comparisons to which always return true; or
call the constructor with no arguments to construct one.
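
To illustrate the first thought, here is a small sketch of reduce() with its initial argument acting as the default. It assumes Python 3, where reduce lives in functools; smallvalue is a hypothetical fallback. The mixed int/str comparisons shown above only work on Python 2, so they are left out.

```python
from functools import reduce

smallvalue = -1  # hypothetical fallback value

# With an initial value, an empty sequence simply returns that value.
nonempty_result = reduce(max, [0, 1, 2, 3], smallvalue)
empty_result = reduce(max, [], smallvalue)
print(nonempty_result)
print(empty_result)

# Without an initial value, an empty sequence is an error.
try:
    reduce(max, [])
except TypeError as exc:
    print("TypeError:", exc)
```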
 
Steven D'Aprano

sum([])
0

is a bug, just as it's a bug in Excel to evaluate blank cells as 0. It
should return None or throw an exception like sum([None,1]) does.

You're wrong, because 99.9% of the time when users leave a blank cell in
Excel, they want it to be treated as zero. Spreadsheet sum() is not the
same as mathematician's sum, which doesn't have a concept of "blank
cells". (But if it did, it would treat them as zero, since that's the
only useful thing and mathematicians are just as much pragmatists as
spreadsheet users.) The Excel code does the right thing, and your "pure"
solution would do the unwanted and unexpected thing and is therefore
buggy.

Bugs are defined by "does the code do what the user wants it to do?", not
"is it mathematically pure?". The current behaviour of sum([]) does the
right thing for the 99% of the time when users expect an integer. And the
rest of the time, they have to specify a starting value for the sum
anyway, and so sum([], initial_value) does the right thing *always*.

The only time it does the wrong thing[1] is when you forget to pass an
initial value but expect a non-numeric result. And that's the
programmer's error, not a function bug.





[1] I believe it also does the wrong thing by refusing to sum strings,
but that's another story.
 
Luis Zarrabeitia

Quoting Laszlo Nagy said:
Empty Python lists [] don't know the type of the items it will
contain, so this sounds strange:

sum([])
0

Because that [] may be an empty sequence of someobject:

You are right in that sum could be used to sum arbitrary objects.
However, in 99.99% of the cases, you will be summing numerical values.
When adding real numbers, the neutral element is zero. ( X + 0 = X) It
is very logical to return zero for empty sequences.

Even better:

help(sum) shows

===
sum(...)
sum(sequence, start=0) -> value

Returns the sum of a sequence of numbers (NOT strings) plus the value
of parameter 'start'. When the sequence is empty, returns start.
===

so the fact that sum([]) returns zero is just because the start value is zero...
sum([],object()) would return an object().
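
A quick way to check this, as a small sketch (the name marker is just for illustration):

```python
marker = object()

# With no items to add, sum() hands back the start value itself.
result = sum([], marker)
print(result is marker)

# A list start value makes sum() concatenate lists.
flattened = sum([[1], [2, 3]], [])
print(flattened)
```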

BTW, the original code:
sum(s for s in ["a", "b"] if len(s) > 2)

wouldn't work anyway... it seems that sum doesn't like to sum strings:

<type 'exceptions.TypeError'>: sum() can't sum strings [use ''.join(seq) instead]

Cheers,
 
Mensanator

sum([])
0

is a bug, just as it's a bug in Excel to evaluate blank cells as 0. It
should return None or throw an exception like sum([None,1]) does.

You're wrong, because 99.9% of the time when users leave a blank cell in
Excel, they want it to be treated as zero.

Then 99.9% of users want the wrong thing. Microsoft knows that
this is a bug but refuses to fix it to prevent breaking legacy
documents (probably dating back to VisiCalc). When graphing data,
a missing value should be interpreted as a hole in the graph

+------+             +--+------+------+-----+

and not evaluated as 0

+------+             +--+------+------+-----+
        \           /
         \         /
          \       /
           \     /
            \   /
             \+/

(depending on the context of the graph, of course).

And Microsoft provides a workaround for graphs to make 0's
appear as holes. Of course, this will cause legitimate 0
values to disappear, so the workaround is inconsistent.

Spreadsheet sum() is not the
same as mathematician's sum, which doesn't have a concept of "blank
cells". (But if it did, it would treat them as zero, since that's the
only useful thing and mathematicians are just as much pragmatists as
spreadsheet users.) The Excel code does the right thing, and your "pure"
solution would do the unwanted and unexpected thing and is therefore
buggy.

Apparently, you don't use databases or make surface contours.
Contour programs REQUIRE that blanks are null, not 0, so that
the Kriging algorithm interpolates around the holes rather than
return false calculations. Excel's treatment of blank cells is
inconsistent with Access' treatment of Nulls and therefore wrong,
anyway you slice it. Math isn't a democracy, what most people want
is irrelevant.

I don't pull these things out of my ass, it's real world stuff
I observe when I help CAD operators and such debug problems.

Maybe you want to say a bug is when it doesn't do what the
author intended, but I say if what the intention was is wrong,
then a perfect implementation is still a bug because it doesn't
do what it's supposed to do.
Bugs are defined by "does the code do what the user wants it to do?", not
"is it mathematically pure?".

Really? So you think math IS a democracy? There is no reason to
violate mathematical purity. If I don't get EXACTLY the same answer
from Excel, Access, Mathematica and Python, then SOMEBODY is wrong.
It would be a shame if that somebody was Python.
The current behaviour of sum([]) does the
right thing for the 99% of the time when users expect an integer.

Why shouldn't the users expect an exception? Isn't that why we have
try:except? Maybe 99% of users expect sum([])==0, but _I_ expect to
be able to distinguish an empty list from [4,-4].
And the
rest of the time, they have to specify a starting value for the sum
anyway, and so sum([], initial_value) does the right thing *always*.

So if you really want [] to be 0, why not say sum([],0)?

Why shouldn't nothing added to nothing return nothing?
Having it evaluate to 0 is wrong 99.9% of the time.
The only time it does the wrong thing[1] is when you forget to pass an
initial value but expect a non-numeric result. And that's the
programmer's error, not a function bug.

[1] I believe it also does the wrong thing by refusing to sum strings,
but that's another story.
 
Mensanator

sum([])
0
is a bug, just as it's a bug in Excel to evaluate blank cells as 0. It
should return None or throw an exception like sum([None,1]) does.
You're wrong, because 99.9% of the time when users leave a blank cell in
Excel, they want it to be treated as zero.

Then 99.9% of users want the wrong thing. Microsoft knows that
this is a bug but refuses to fix it to prevent breaking legacy
documents (probably dating back to VisiCalc). When graphing data,
a missing value should be interpreted as a hole in the graph

+------+             +--+------+------+-----+

and not evaluated as 0

+------+             +--+------+------+-----+
        \           /
         \         /
          \       /
           \     /
            \   /
             \+/

(depending on the context of the graph, of course).

And Microsoft provides a workaround for graphs to make 0's
appear as holes. Of course, this will cause legitimate 0
values to disappear, so the workaround is inconsistent.

I just checked and I mis-remembered how this works.
The option is for blanks to plot as holes or 0 or
be interpolated. 0 always plots as 0. The inconsistency
is that blanks are still evaluated as 0 in formulae
and macros.
Spreadsheet sum() is not the
same as mathematician's sum, which doesn't have a concept of "blank
cells". (But if it did, it would treat them as zero, since that's the
only useful thing and mathematicians are just as much pragmatists as
spreadsheet users.) The Excel code does the right thing, and your "pure"
solution would do the unwanted and unexpected thing and is therefore
buggy.

Apparently, you don't use databases or make surface contours.
Contour programs REQUIRE that blanks are null, not 0, so that
the Kriging algorithm interpolates around the holes rather than
return false calculations. Excel's treatment of blank cells is
inconsistent with Access' treatment of Nulls and therefore wrong,
anyway you slice it. Math isn't a democracy, what most people want
is irrelevant.

I don't pull these things out of my ass, it's real world stuff
I observe when I help CAD operators and such debug problems.

Maybe you want to say a bug is when it doesn't do what the
author intended, but I say if what the intention was is wrong,
then a perfect implementation is still a bug because it doesn't
do what it's supposed to do.


Bugs are defined by "does the code do what the user wants it to do?", not
"is it mathematically pure?".

Really? So you think math IS a democracy? There is no reason to
violate mathematical purity. If I don't get EXACTLY the same answer
from Excel, Access, Mathematica and Python, then SOMEBODY is wrong.
It would be a shame if that somebody was Python.
The current behaviour of sum([]) does the
right thing for the 99% of the time when users expect an integer.

Why shouldn't the users expect an exception? Isn't that why we have
try:except? Maybe 99% of users expect sum([])==0, but _I_ expect to
be able to distinguish an empty list from [4,-4].
And the
rest of the time, they have to specify a starting value for the sum
anyway, and so sum([], initial_value) does the right thing *always*.

So if you really want [] to be 0, why not say sum([],0)?

Why shouldn't nothing added to nothing return nothing?
Having it evaluate to 0 is wrong 99.9% of the time.




The only time it does the wrong thing[1] is when you forget to pass an
initial value but expect a non-numeric result. And that's the
programmer's error, not a function bug.
[1] I believe it also does the wrong thing by refusing to sum strings,
but that's another story.

 
Fredrik Lundh

Mensanator said:
No it isn't. Nothing is not 0, check with MS-Access, for instance:

Null + 1 returns Null. Any arithmetic expression involving a
Null evaluates to Null. Adding something to an unknown returns
an unknown, as it should.

It is a logical fallacy to equate unknown with 0.

http://en.wikipedia.org/wiki/Empty_sum

"In mathematics, the empty sum, or nullary sum, is the result of adding
no numbers, in summation for example. Its numerical value is zero."

</F>
 
Steven D'Aprano

sum([])
0
is a bug, just as it's a bug in Excel to evaluate blank cells as 0.
It should return None or throw an exception like sum([None,1]) does.

You're wrong, because 99.9% of the time when users leave a blank cell
in Excel, they want it to be treated as zero.

Then 99.9% of users want the wrong thing.

It is to laugh.


Microsoft knows that this is a bug

Says you.

but refuses to fix it to prevent breaking legacy documents (probably
dating back to VisiCalc). When graphing data, a missing value should be
interpreted as a hole in the graph

"Graphing data" is not sum(). I don't expect graphing data to result in
the same result as sum(), why would I expect them to interpret input the
same way?

+------+             +--+------+------+-----+

Why should the graphing application ignore blanks ("missing data"), but
sum() treat missing data as an error? That makes no sense at all.


and not evaluated as 0

And Microsoft provides a workaround for graphs to make 0's appear as
holes. Of course, this will cause legitimate 0 values to disappear, so
the workaround is inconsistent.

I'm not aware of any spreadsheet that treats empty cells as zero for the
purpose of graphing, and I find your claim that Excel can't draw graphs
with zero in them implausible, but I don't have a copy of Excel to test
it.


Apparently, you don't use databases or make surface contours.

Neither databases nor surface contours are sum(). What possible relevance
are they to the question of what sum() should do?

Do you perhaps imagine that there is only "ONE POSSIBLE CORRECT WAY" to
deal with missing data, and every function and program must deal with it
the same way?

Contour programs REQUIRE that blanks are null, not 0

Lucky for them that null is not 0 then.

so that the Kriging
algorithm interpolates around the holes rather than return false
calculations. Excel's treatment of blank cells is inconsistent with
Access' treatment of Nulls and therefore wrong, anyway you slice it.

No no no, you messed that sentence up. What you *really* meant was:

"Access' treatment of Nulls is inconsistent with Excel's treatment of
blank cells and therefore wrong, anyway you slice it."

No of course not. That would be stupid, just as stupid as your sentence.
Excel is not Access. They do different things. Why should they
necessarily interpret data the same way?

Maybe you want to say a bug is when it doesn't do what the author
intended, but I say if what the intention was is wrong, then a perfect
implementation is still a bug because it doesn't do what it's supposed to
do.

Who decides what it is supposed to do if not the author? You, in your
ivory tower who doesn't care a fig for what people want the software to
do?

Bug report: "Software does what users want it to do."
Fix: "Make the software do something that users don't want."

Great.

Really? So you think math IS a democracy? There is no reason to violate
mathematical purity.

You've given a good example yourself: the Kriging algorithm needs a Null
value which is not zero. There is no mathematical "null" which is
distinct from zero, so there's an excellent violation of mathematical
purity right there.


If I am given the job of adding up the number of widgets inside a box,
and the box is empty, I answer that there are 0 widgets inside it. If I
were to follow your advice and declare that "An error occurred, can't
determine the number of widgets inside an empty box!" people would treat
me as an idiot, and rightly so.


If I don't get EXACTLY the same answer from Excel,
Access, Mathematica and Python, then SOMEBODY is wrong. It would be a
shame if that somebody was Python.

Well Excel, Python agree that the sum of an empty list is 0. What do
Access and Mathematica do?


The current behaviour of sum([]) does the right thing for the 99% of
the time when users expect an integer.

Why shouldn't the users expect an exception? Isn't that why we have
try:except? Maybe 99% of users expect sum([])==0, but _I_ expect to be
able to distinguish an empty list from [4,-4].

The way to distinguish lists is NOT to add them up and compare the sums:
sum([4, -4]) == sum([0]) == sum([1, 2, 3, -6]) == sum([-1, 2, -1])
True

The correct way is by comparing the lists themselves:
[] == [4, -4]
False


And the
rest of the time, they have to specify a starting value for the sum
anyway, and so sum([], initial_value) does the right thing *always*.

So if you really want [] to be 0, why not say sum([],0)?

I don't want [] == 0. That's foolish. I want the sum of an empty list to
be 0, which is a very different thing.

And I don't need to say sum([],0) because the default value for the
second argument is 0.


Why shouldn't nothing added to nothing return nothing? Having it
evaluate to 0 is wrong 99.9% of the time.

It is to laugh.

What's the difference between having 0 widgets in a box and having an
empty box with, er, no widgets in it?
 
Thomas Bellman

Mensanator said:
No, but blank cells are 0 as far as Excel is concerned.
That behaviour causes nothing but trouble and I am
saddened to see Python emulate such nonsense.

Then you should feel glad that the Python sum() function *does*
signal an error for the closest equivalent of "blank cells" in
a list:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Summing the elements of an empty list is *not* the same thing as
summing elements of a list where one element is None.
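
The distinction can be checked directly. In this small sketch, the helper name sum_or_error is hypothetical and simply captures the exception instead of letting it propagate:

```python
def sum_or_error(values):
    """Return sum(values), or the TypeError instance if summing fails."""
    try:
        return sum(values)
    except TypeError as exc:
        return exc


# An empty list: nothing to add, so the start value 0 comes back.
print(sum_or_error([]))
# A list with a "blank" element: the addition itself fails.
print(repr(sum_or_error([None, 1])))
```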

There are no "empty" boxes. There are only boxes with
known quantities and those with unknown quantities.
I hope that's not too ivory tower.

The sum() function in Python requires exactly one box. That box
can be empty, can contain "known quantities" (numbers, presumably),
or "unknown quantities" (non-numbers, e.g., None). But you can't
give it zero boxes, or three boxes.


I don't have a strong view of whether sum([]) should return 0 or
raise an error, but please do not mix that question up with what
a sum over empty cells or over NULL values should yield. They
are very different questions.

As it happens, the SQL sum() function (at least in MySQL; I don't
have any other database easily available, nor any SQL standard to
read) does return NULL for a sum over the empty sequence, so you
could argue that that would be the correct behaviour for the
Python sum() function as well, but you can't argue that because a
sum *involving* a NULL value returns NULL.
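
The three cases can be told apart with a short experiment. This sketch uses SQLite through Python's standard sqlite3 module rather than MySQL, purely because it needs no server; that substitution is an assumption of the example, and the table name readings is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value INTEGER)")

# No rows at all: SUM() yields NULL, which sqlite3 maps to None.
empty_sum = conn.execute("SELECT SUM(value) FROM readings").fetchone()[0]

conn.executemany("INSERT INTO readings VALUES (?)", [(4,), (None,), (-4,)])

# The aggregate skips NULL rows, so this is a real 0, not NULL.
mixed_sum = conn.execute("SELECT SUM(value) FROM readings").fetchone()[0]

# Plain arithmetic involving NULL propagates NULL.
null_plus_one = conn.execute("SELECT NULL + 1").fetchone()[0]

print(empty_sum, mixed_sum, null_plus_one)
conn.close()
```

So in SQLite, the aggregate over an empty set, the aggregate over rows that include NULL, and an expression involving NULL are three separate behaviours.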
 
Mensanator

On Wed, 03 Sep 2008 16:20:39 -0700, Mensanator wrote:
sum([])
0
is a bug, just as it's a bug in Excel to evaluate blank cells as 0.
It should return None or throw an exception like sum([None,1]) does.
You're wrong, because 99.9% of the time when users leave a blank cell
in Excel, they want it to be treated as zero.
Then 99.9% of users want the wrong thing.

It is to laugh.
Microsoft knows that this is a bug

Says you.
but refuses to fix it to prevent breaking legacy documents (probably
dating back to VisiCalc). When graphing data, a missing value should be
interpreted as a hole in the graph

"Graphing data" is not sum(). I don't expect graphing data to result in
the same result as sum(), why would I expect them to interpret input the
same way?
+------+             +--+------+------+-----+

Why should the graphing application ignore blanks ("missing data"), but
sum() treat missing data as an error? That makes no sense at all.

Maybe it's important to know data is missing. You can see
the holes in a graph. You can't see the holes in a sum.
I'm not aware of any spreadsheet that treats empty cells as zero for the
purpose of graphing, and I find your claim that Excel can't draw graphs
with zero in them implausible, but I don't have a copy of Excel to test
it.

That was a mistake. I made a followup correction, but
you probably didn't see it.
Neither databases nor surface contours are sum(). What possible relevance
are they to the question of what sum() should do?

Because a sum that includes Nulls isn't valid. If you treated
Nulls as 0, then not only would your sum be wrong, but so
would your count and the average based on those. Now you
can EXPLICITLY tell the database to only consider non-Null
values, which doesn't change the total, but DOES change
the count.
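
A rough Python analogue of that point, using None to stand in for Null. The data values are invented for illustration:

```python
readings = [12.0, None, 8.0]  # None stands in for a missing measurement

# Consider only the known values.
known = [r for r in readings if r is not None]
total = sum(known)
count_known = len(known)
average = total / count_known

# Treating the missing value as 0 keeps the total but inflates the count.
zero_filled = [0.0 if r is None else r for r in readings]
zero_total = sum(zero_filled)
zero_average = zero_total / len(zero_filled)

print(total, count_known, average)
print(zero_total, len(zero_filled), zero_average)
```

The totals agree, but the average based on the zero-filled data is lower because the count went from 2 to 3.
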
Do you perhaps imagine that there is only "ONE POSSIBLE CORRECT WAY" to
deal with missing data, and every function and program must deal with it
the same way?

But that's what sum() is doing now, treating sum([]) the same
as sum([],0). Why isn't sum() defined such that "...if list
is empty, return start, IF SPECIFIED, otherwise raise exception."
Then, instead of "ONE POSSIBLE CORRECT WAY", the user could
specify whether he wants Excel compatible behaviour or
Access compatible behaviour.
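
A minimal sketch of the behaviour being asked for, as a wrapper around the built-in. The name strict_sum and the sentinel _MISSING are hypothetical:

```python
_MISSING = object()  # sentinel meaning "no start value was given"


def strict_sum(iterable, start=_MISSING):
    """Like sum(), but an empty input is an error unless start is given."""
    items = list(iterable)
    if not items:
        if start is _MISSING:
            raise ValueError("strict_sum() of an empty sequence with no start value")
        return start
    if start is _MISSING:
        return sum(items)
    return sum(items, start)


print(strict_sum([4, -4]))
print(strict_sum([], 0))
try:
    strict_sum([])
except ValueError as exc:
    print("ValueError:", exc)
```

With this, strict_sum([], 0) spells out the "empty means zero" choice explicitly, while strict_sum([]) refuses to guess.
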
Lucky for them that null is not 0 then.

No, but blank cells are 0 as far as Excel is concerned.
That behaviour causes nothing but trouble and I am
saddened to see Python emulate such nonsense.
No no no, you messed that sentence up. What you *really* meant was:

"Access' treatment of Nulls is inconsistent with Excel's treatment of
blank cells and therefore wrong, anyway you slice it."

No of course not. That would be stupid, just as stupid as your sentence.
Excel is not Access. They do different things. Why should they
necessarily interpret data the same way?

Because you want consistent results?
Who decides what it is supposed to do if not the author?

The author can't change math on a whim.
You, in your ivory tower who doesn't care a fig for
what people want the software to do?

True, I could care less what people want to do...

....as long as they do it consistently.
Bug report: "Software does what users want it to do."
Fix: "Make the software do something that users don't want."

What the users want doesn't carry any weight with respect
to what the database wants. The user must conform to the
needs of the database because the other way ain't ever gonna
happen.

If only. But then, I probably wouldn't have a job.
You've given a good example yourself: the Kriging algorithm needs a Null
value which is not zero. There is no mathematical "null" which is
distinct from zero, so there's an excellent violation of mathematical
purity right there.

Hey, I was talking databases, you brought up mathematical purity.
If I am given the job of adding up the number of widgets inside a box,
and the box is empty, I answer that there are 0 widgets inside it.

Right. it has a known quantity and that quantity is 0.
Just because the box is empty doesn't mean the quantity
is Null.
If I
were to follow your advice and declare that "An error occurred, can't
determine the number of widgets inside an empty box!" people would treat
me as an idiot, and rightly so.

Right. But a better analogy is when a new shipment is due
but hasn't arrived yet so the quantity is unknown. Now the
boss comes up and says he needs to ship 5 widgets tomorrow
and asks how many you have. You say 0. Now the boss runs
out to Joe's Widget Emporium and pays retail only to discover
when he gets back that the shipment has arrived containing
12 widgets. Because you didn't say "I don't know, today's
shipment isn't here yet", the boss not only thinks you're
an idiot, but he fires you as well.
Well Excel, Python agree that the sum of an empty list is 0. What do
Access and Mathematica do?

I don't know about Mathematica, but if you EXPLICITLY
tell Access to sum only the non-Null values, you'll get the
same answer Excel does. Otherwise, any expression that
includes a Null evaluates to Null, which certainly isn't
the same answer Excel gives.
The current behaviour of sum([]) does the right thing for the 99% of
the time when users expect an integer.
Why shouldn't the users expect an exception? Isn't that why we have
try:except? Maybe 99% of users expect sum([])==0, but _I_ expect to be
able to distinguish an empty list from [4,-4].

The way to distinguish lists is NOT to add them up and compare the sums:
sum([4, -4]) == sum([0]) == sum([1, 2, 3, -6]) == sum([-1, 2, -1])

True

The correct way is by comparing the lists themselves:
[] == [4, -4]
False
And the
rest of the time, they have to specify a starting value for the sum
anyway, and so sum([], initial_value) does the right thing *always*.
So if you really want [] to be 0, why not say sum([],0)?

I don't want [] == 0. That's foolish. I want the sum of an empty list to
be 0, which is a very different thing.

In certain circumstances. In others, an empty list summing
to 0 is just as foolish. That's why sum([]) should be an
error, so you can have it either way.

Isn't one of Python's slogans "Explicit is better than implicit"?
And I don't need to say sum([],0) because the default value for the
second argument is 0.

That's the problem. There is no justification for assuming
that unknown quantities are 0.
It is to laugh.

What's the difference between having 0 widgets in a box and having an
empty box with, er, no widgets in it?

There are no "empty" boxes. There are only boxes with
known quantities and those with unknown quantities.
I hope that's not too ivory tower.
 
Mensanator

 Mensanator said:
(e-mail address removed) wrote:
Empty Python lists [] don't know the type of the items it will
contain, so this sounds strange:
sum([])
0
Because that [] may be an empty sequence of someobject:
You are right in that sum could be used to sum arbitrary objects.
However, in 99.99% of the cases, you will be summing numerical values.
When adding real numbers, the neutral element is zero. ( X + 0 = X) It
is very logical to return zero for empty sequences.
No it isn't. Nothing is not 0, check with MS-Access, for instance:
Null + 1 returns Null. Any arithmetic expression involving a
Null evaluates to Null. Adding something to an unknown returns
an unknown, as it should.
It is a logical fallacy to equate unknown with 0.

Which has nothing to do with the "right" value for an
empty sum.

I'm less concerned about the "right" value than a consistent
value. I'm fairly certain you can't get 0 from a query that
returns no records, so I don't like seeing empty being
treated as 0, even if it means that in set theory because
databases aren't sets.
If they hear about what you said here in
sci.math they're gonna kick you out

They usually don't kick me out, just kick me.
- what do you
imagine the universally accepted value of \sum_{j=1}^0
is?

I can't follow your banter, so I'm not sure what it should be.
For example, the water table elevation in ft above Mean Sea Level
is WTE = TopOfCasing - DepthToWater.
TopOfCasing is usually known and constant (until resurveyed).
But DepthToWater may or may not exist for a given event (well
may be covered with fire ants, for example).
Now, if you equate Null with 0, then the WTE calculation says
the water table elevation is flush with the top of the well,
falsely implying that the site is underwater.
And, since this particular site is on the Mississippi River,
it sometimes IS underwater, but this is NEVER determined by
water table elevations, which, due to the CORRECT treatment
of Nulls by Access, never returns FALSE calculations.
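The same propagation of "unknown" can be sketched in Python, using None in the role of Null. This is only an illustration: the function name and the elevation figures are made up, not taken from the site described above.

```python
from typing import Optional

def water_table_elevation(top_of_casing: float,
                          depth_to_water: Optional[float]) -> Optional[float]:
    """Return the elevation in ft above MSL, or None if no depth was measured."""
    if depth_to_water is None:
        # Unknown input: propagate "unknown" instead of pretending the depth is 0.
        return None
    return top_of_casing - depth_to_water

measured = water_table_elevation(412.5, 12.5)   # hypothetical readings
missing = water_table_elevation(412.5, None)    # well was not measurable
print(measured)  # 400.0
print(missing)   # None
```

Treating the missing depth as 0 would instead report 412.5, i.e. water flush with the top of the casing.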

is a bug, just as it's a bug in Excel to evaluate blank cells
as 0. It should return None or throw an exception like sum([None,1])
does.
Same way, if we would have a prod() function, it should return one for
empty sequences because X*1 = X. The neutral element for this operation
is one.
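A minimal sketch of such a prod(), assuming it should mirror sum() by starting from the neutral element; the name and the `start` parameter are illustrative. (Python 3.8 and later ship math.prod, which also returns 1 for an empty iterable.)

```python
from functools import reduce
import operator

def prod(values, start=1):
    """Multiply the items of `values`, starting from `start` (the neutral element)."""
    return reduce(operator.mul, values, start)

print(prod([2, 3, 4]))  # 24
print(prod([]))         # 1, because X * 1 == X
print(sum([]))          # 0, because X + 0 == X
```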
Of course this is not good for summing other types of objects. But how
clumsy would it be to use
sum( L +[0] )
or
    if L:
        value = sum(L)
    else:
        value = 0
instead of sum(L).
Once again, this is what sum() is used for in most cases, so this
behavior is the "expected" one.
Another argument to convince you: the sum() function in SQL for empty
row sets returns zero in most relational databases.
But of course it could have been implemented in a different way... I
believe that there have been excessive discussions about this decision,
and the current implementation is very good, if not the best.
Best,
Laszlo
 
T

Thomas Bellman

Mensanator said:
Ok, but I don't understand why an empty list is a valid sum
whereas a list containing None is not.

You can't conclude the behaviour of the one from the behaviour
of the other, because the two situations have nothing at all in
common.
I'm not following that. Are you saying a query that returns no
records doesn't have a specific field containing a Null so there
are no Nulls to poison the sum? ...tap...tap...tap. Ok, I can see
that,
Exactly.

but you don't get 0 either.

That's because the SQL sum() has a special case for "no rows
returned". A *different* special case than the one that taints
the sum when encountering a NULL. It does the equivalent of

    if len(rows_returned) == 0:
        # Special case for no rows returned
        return NULL
    total = 0
    for row in rows_returned:
        value = row[column]
        if value is NULL:
            # Special case for encountering a NULL value
            return NULL
        total += value
    return total

Two different special cases for the two different situations. If
you were to remove the special case for no rows returned, you
would get zero when the SELECT statement finds no rows, but the
sum would still be tainted when a NULL value is encountered.
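One way to check these cases against an actual engine is Python's built-in sqlite3 module. This is a minimal sketch that assumes SQLite, not the MySQL or Access discussed here, and the table name `readings` is made up. In SQLite the aggregate SUM() skips NULL values rather than returning NULL for them, so only the no-rows case and plain expressions such as NULL + 1 come out as NULL; other databases may need checking separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value INTEGER)")

# No rows at all: SUM() yields NULL, which sqlite3 maps to None.
empty_sum = conn.execute("SELECT SUM(value) FROM readings").fetchone()[0]

# Rows that include a NULL: SQLite's aggregate skips the NULL row.
conn.executemany("INSERT INTO readings VALUES (?)", [(1,), (None,), (2,)])
sum_with_null = conn.execute("SELECT SUM(value) FROM readings").fetchone()[0]

# A plain arithmetic expression involving NULL does evaluate to NULL.
null_plus_one = conn.execute("SELECT NULL + 1").fetchone()[0]

print(empty_sum, sum_with_null, null_plus_one)  # None 3 None
conn.close()
```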

The definition of sum in mathematics *does* do away with that
special case. The sum of zero terms is zero. And the Python
sum() function follows the mathematics definition in this
respect, not the SQL definition.


You can argue that Python sum() should have special cased the
empty sequence. It's not an illogical stance to take. It's just
a totally different issue from encountering a non-numeric element
in the sequence. In some cases it might actually make sense to
treat the empty sequence as an error, but just ignore non-numeric
elements (i.e., treat them as if they were zero). And in some
cases both should be an error, and in some neither should be an
error.
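Since the two questions are independent, a wrapper can expose them as two separate switches. The following is a sketch only: `checked_sum` and its parameter names are hypothetical, not an existing Python API.

```python
from numbers import Number

def checked_sum(values, empty_is_error=False, skip_non_numeric=False):
    """Sum `values` under an explicit policy for each of the two cases."""
    total = 0
    seen_any = False
    for value in values:
        seen_any = True
        if not isinstance(value, Number):
            if skip_non_numeric:
                continue  # treat the element as if it were zero
            raise TypeError("non-numeric element: %r" % (value,))
        total += value
    if empty_is_error and not seen_any:
        raise ValueError("checked_sum() arg is an empty sequence")
    return total

print(checked_sum([]))                                    # 0
print(checked_sum([1, None, 2], skip_non_numeric=True))   # 3
```

With both flags left at their defaults this behaves like the built-in sum() on numbers; `checked_sum([], empty_is_error=True)` raises ValueError instead of returning 0.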
 
M

Mensanator

Mensanator said:
No, but blank cells are 0 as far as Excel is concerned.
That behaviour causes nothing but trouble and I am
saddened to see Python emulate such nonsense.

Then you should feel glad that the Python sum() function *does*
signal an error for the closest equivalent of "blank cells" in
a list:

    >>> sum([1, 2, 3, None, 5, 6])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Yes, I am in fact happy to see that behaviour.
Summing the elements of an empty list is *not* the same thing as
summing elements of a list where one element is None.
So,
    >>> sum([1, 2, 3, None, 5, 6])
    Traceback (most recent call last):
      File "<pyshell#0>", line 1, in <module>
        sum([1, 2, 3, None, 5, 6])
    TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

gives me an error.

As does
    >>> sum([None, None, None, None, None, None])
    Traceback (most recent call last):
      File "<pyshell#1>", line 1, in <module>
        sum([None, None, None, None, None, None])
    TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

Why then, doesn't
    >>> sum([A for A in [None, None, None, None, None, None] if A != None])
    0

give me an error?

Ok, it's not a bug.

"This behaviour is by design." - Microsoft Knowledge Base

I don't like it, but I guess I'll just have to live with it.
There are no "empty" boxes. There are only boxes with
known quantities and those with unknown quantities.
I hope that's not too ivory tower.

The sum() function in Python requires exactly one box.  That box
can be empty, can contain "known quantities" (numbers, presumably),
or "unknown quantities" (non-numbers, e.g., None).  But you can't
give it zero boxes, or three boxes.

I don't have a strong view of whether sum([]) should return 0 or
raise an error, but please do not mix that question up with what
a sum over empty cells or over NULL values should yield.  They
are very different questions.

Ok, but I don't understand why an empty list is a valid sum
whereas a list containing None is not.
As it happens, the SQL sum() function (at least in MySQL; I don't
have any other database easily available, nor any SQL standard to
read) does return NULL for a sum over the empty sequence, so you
could argue that that would be the correct behaviour for the
Python sum() function as well, but you can't argue that because a
sum *involving* a NULL value returns NULL.

I'm not following that. Are you saying a query that returns no
records doesn't have a specific field containing a Null so there
are no Nulls to poison the sum? ...tap...tap...tap. Ok, I can see
that, but you don't get 0 either.
 
B

bearophileHUGS

David C. Ullrich:
At least in mathematics, the sum of the elements of
the empty set _is_ 0, while the maximum element of the
empty set is undefined.

What do you think about my idea of adding that 'default' argument to
the max()/min() functions?
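For reference, later Python versions (3.4 and up) accept a keyword-only `default` argument in max() and min(), which is returned when the iterable is empty. A short sketch of how that reads with a filtering generator; the sample values and `small_value` are made up:

```python
values = [3, 8, 5]
small_value = 0  # hypothetical fallback, plays the role of "smallvalue"

# The predicate matches some items: default is ignored.
largest_even = max((v for v in values if v % 2 == 0), default=small_value)

# The predicate matches nothing: default is returned instead of raising ValueError.
largest_big = max((v for v in values if v > 100), default=small_value)

print(largest_even)  # 8
print(largest_big)   # 0
```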

Bye,
bearophile
 
C

castironpi

David C. Ullrich:


What do you think about my idea of adding that 'default' argument to
the max()/min() functions?

Bye,
bearophile

For max and min, why can't you just add your argument to the set
itself?

The reason max([]) is undefined is that max( S ) is in S. The reason
sum([]) is 0 is that sum( [ x ] ) - x = 0.
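A sketch of that suggestion, appending the fallback to the candidates with itertools.chain; the helper name `max_with_floor` is hypothetical. Note that this is not quite the same as a default: the added value also wins whenever every real item is smaller than it.

```python
from itertools import chain

def max_with_floor(values, floor):
    """Return the max of `values` with `floor` added to the candidates."""
    return max(chain(values, [floor]))

print(max_with_floor([3, 8, 5], 0))   # 8
print(max_with_floor([], 0))          # 0
print(max_with_floor([-4, -2], 0))    # 0, the floor beats every real item

# The identity behind sum([]) == 0: removing the only item leaves nothing.
print(all(sum([x]) - x == 0 for x in (4, -4, 2.5)))  # True
```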
 
