You have a bunch of marbles you want to put into bins. The division
tells you how many marbles to put into each bin. That would be an
integer since you cannot cut up individual marbles.
(Actually you can. As a small child, one of my most precious possessions
was a marble which had cracked into two halves.)
No, that doesn't follow, because you don't get the result you want if the
number of marbles is entered as Decimals or floats. Maybe the data came
from a marble-counting device that always returns floats.
You're expecting the function to magically know what you want to do with
the result and return the right kind of answer, which is the wrong way to
go about it. For example, there are situations where your data is given
in integers, but the number you want is a float.
    # number of 20kg bags of flour per order
    data = [5, 7, 20, 2, 7, 6, 1, 37, 3]
    weights = [20*n for n in data]
    mean(weights)
    195.55555555555554
If I were using a library that arbitrarily decided to round the mean
weight per order down to 195kg, I'd report that as a bug. Maybe I want
the next higher integer, not the next lower. Maybe I do care about that
extra 5/9ths of a kilo. It simply isn't acceptable for the function to
try to guess what I'm going to do with the result.
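
To make that concrete, here is a minimal sketch of leaving the rounding to the caller. It assumes mean() is the standard library's statistics.mean, which may not be the function under discussion; the variable names are only for illustration.

```python
import math
from statistics import mean  # assumption: a stand-in for the mean() being discussed

# number of 20kg bags of flour per order (same data as above)
data = [5, 7, 20, 2, 7, 6, 1, 37, 3]
weights = [20 * n for n in data]

m = mean(weights)              # the mathematical mean, roughly 195.56

# The caller, not the library, decides how to restrict the result.
rounded_down = math.floor(m)   # 195
rounded_up = math.ceil(m)      # 196
nearest = round(m)             # 196
```

Each of the three integers is a defensible answer for somebody, which is why the choice belongs in the calling code.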
I think it's more important that a program never give a wrong answer
than that it save a few keystrokes. So, that polymorphic mean function is a bit
scary. It might be best to throw an error if the args are all integers.
There is no definitely correct way to handle it so it's better to
require explicit directions.
Of course there's a correct way to handle it. You write a function that
returns the mathematical mean. And then, if you need special processing
of that mean, (say) truncating if the numbers are all ints, or on
Tuesdays, you do so afterwards:
    from datetime import date
    x = mean(data)
    is_tuesday = date.today().weekday() == 1  # Monday is 0
    if all(isinstance(n, int) for n in data) or is_tuesday:
        x = int(x)
I suppose that if your application is always going to truncate the mean
you might be justified in writing an optimized function that does that.
But don't call it "truncated_mean", because that has a specific meaning
to statisticians that is not the same as what you're talking about.
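
For anyone who hasn't met the statisticians' term: a truncated (or trimmed) mean discards some fraction of the smallest and largest values and averages what is left. A rough sketch follows, where trimmed_mean and its proportion parameter are names I've made up for illustration, not an existing API:

```python
from statistics import mean

def trimmed_mean(data, proportion):
    """Hypothetical helper: drop `proportion` of the values from each
    end of the sorted data, then return the mean of the remainder."""
    values = sorted(data)
    k = int(len(values) * proportion)      # how many to drop per end
    kept = values[k:len(values) - k]
    if not kept:
        raise ValueError("proportion too large; nothing left to average")
    return mean(kept)

data = [5, 7, 20, 2, 7, 6, 1, 37, 3]

trimmed = trimmed_mean(data, 0.2)   # drops 1 and 37, averages the other seven
chopped = int(mean(data))           # mean is 88/9, so this is 9
```

The first trims outliers from the data before averaging; the second chops the fractional part off the answer. They are different operations and deserve different names.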
Paul, I'm pretty sure you've publicly defended duck typing before. Now
you're all scared of some imagined type non-safety that results from
numeric coercions. I can't imagine why you think that this should be
allowed:
    class Float(float):
        pass

    x = Float(1.0)
    mean([x, 2.0, 3.0, 5.0])
but this gives you the heebie-jeebies:
    mean([1, 2.0, 3.0, 5.0])
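
Here is a small sketch of the two cases side by side. The mean() below is a plain sum-over-count stand-in that I am assuming for illustration; it is not necessarily the implementation being argued about.

```python
def mean(data):
    # Assumed stand-in: the arithmetic mean as sum over count.
    return sum(data) / len(data)

class Float(float):
    pass

with_subclass = mean([Float(1.0), 2.0, 3.0, 5.0])   # 2.75
with_int = mean([1, 2.0, 3.0, 5.0])                 # 2.75
```

Both calls mix types, and both give the same answer, because ordinary numeric coercion treats the int 1 and the Float 1.0 as the same number.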
As a general principle, I'd agree that arbitrarily coercing any old type
into any other type is a bad idea. But in the specific case of numeric
coercions, 99% of the time the Right Way is to treat all numbers
identically, and then restrict the result if you want a restricted
result, so the language should make that the easy case, and leave the 1%
to the developer to write special code:
    def pmean(data):  # Paul Rubin's mean
        """Returns the arithmetic mean of data, unless data is all
        ints, in which case returns the mean rounded to the nearest
        integer less than the arithmetic mean."""
        s = sum(data)
        if isinstance(s, int):
            return s // len(data)
        else:
            return s / len(data)
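
A quick sketch of what that special case does in practice, with pmean repeated so the snippet stands alone. The sample inputs are mine, chosen only to show the behaviour.

```python
def pmean(data):
    # Same logic as pmean above: floor-divide when the sum is an int.
    s = sum(data)
    if isinstance(s, int):
        return s // len(data)
    return s / len(data)

all_ints = pmean([1, 2, 3, 5])       # 11 // 4 == 2
one_float = pmean([1, 2, 3, 5.0])    # 11.0 / 4 == 2.75
negatives = pmean([-1, -2])          # -3 // 2 == -2, not -1
```

Changing the type of a single element moves the answer from 2 to 2.75, and floor division rounds toward negative infinity, so even "truncating" is not quite what the int branch does. That is the sort of surprise the general-purpose function should not spring on anyone.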