scipy.stats.itemfreq: overflow with add.reduce

Hans Georg Krauthaeuser · Dec 21, 2005

Hi All,

I was playing with scipy.stats.itemfreq when I observed the following
overflow:

In [119]:for i in [254,255,256,257,258]:
   .....:     l=[0]*i
   .....:     print i, stats.itemfreq(l), l.count(0)
   .....:
254 [ [ 0 254]] 254
255 [ [ 0 255]] 255
256 [ [0 0]] 256
257 [ [0 1]] 257
258 [ [0 2]] 258

itemfreq is pretty small (in stats.py):

----------------------------------------------------------------------
def itemfreq(a):
    """
    Returns a 2D array of item frequencies. Column 1 contains item values,
    column 2 contains their respective counts. Assumes a 1D array is passed.

    Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
    """
    scores = _support.unique(a)
    scores = sort(scores)
    freq = zeros(len(scores))
    for i in range(len(scores)):
        freq[i] = add.reduce(equal(a,scores[i]))
    return array(_support.abut(scores, freq))
----------------------------------------------------------------------

It seems that add.reduce is the source for the overflow:

In [116]:from scipy import *

In [117]:for i in [254,255,256,257,258]:
   .....:     l=[0]*i
   .....:     print i, add.reduce(equal(l,0))
   .....:
254 254
255 255
256 0
257 1
258 2
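
The numbers above are what one would expect if the sum is accumulated in an 8-bit unsigned value, so that the count is effectively taken modulo 256. A minimal pure-Python sketch of that arithmetic follows; wrapped_count is a hypothetical helper written only for illustration and is not part of scipy:

```python
def wrapped_count(n, bits=8):
    """Count n matches with an accumulator that keeps only `bits` bits."""
    total = 0
    for _ in range(n):
        # Adding 1 and discarding the carry out of the top bit mimics wraparound.
        total = (total + 1) % (1 << bits)
    return total

for n in [254, 255, 256, 257, 258]:
    print(n, wrapped_count(n))
```

This prints 254, 255, 0, 1, 2 for the five list lengths, the same pattern as the add.reduce output.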

Is there any possibility to avoid the overflow?

BTW:
Python 2.3.5 (#2, Aug 30 2005, 15:50:26)
[GCC 4.0.2 20050821 (prerelease) (Debian 4.0.1-6)] on linux2

scipy_version.scipy_version --> '0.3.2'

Thanks and best regards
Hans Georg Krauthäuser

Hans Georg Krauthaeuser · Dec 21, 2005

After some further investigation:

In [150]:add.reduce(array(equal([0]*256,0),typecode='l'))
Out[150]:256

In [151]:add.reduce(equal([0]*256,0))
Out[151]:0

The problem occurs with arrays with typecode 'b' (as returned by equal).

Workaround patch for itemfreq is obvious, but ... is it a bug or a feature?
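
For illustration, the workaround could look roughly like the following. This is only a sketch and it assumes modern NumPy rather than the scipy 0.3.2 / Numeric setup used above; the name itemfreq_widened, np.unique and np.column_stack are stand-ins chosen here, not the actual patch to stats.py. The idea is the same as in In [150]: convert the comparison result to a wide integer type before reducing.

```python
import numpy as np

def itemfreq_widened(a):
    """Return a 2D table: column 0 holds the sorted unique values, column 1 their counts."""
    a = np.asarray(a)
    scores = np.unique(a)  # sorted unique values
    freq = np.zeros(len(scores), dtype=np.int64)
    for i in range(len(scores)):
        # Widen the comparison result before summing so the count cannot wrap at 256.
        matches = np.equal(a, scores[i]).astype(np.int64)
        freq[i] = np.add.reduce(matches)
    return np.column_stack((scores, freq))

print(itemfreq_widened([0] * 256))
```

With the widened type, a list of 256 zeros is reported with a count of 256 instead of 0.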

regards
Hans Georg

Hans Georg Krauthaeuser · Dec 22, 2005

I feel a bit lonely here, but nevertheless, a further remark:

The problem comes directly from the ufunc 'add' for typecode 'b'. In
contrast to 'multiply', the typecode is not upcast:

In [178]:array(array([1],'b')*2)
Out[178]:array([2],'i')

In [179]:array(array([1],'b')+array([1],'b'))
Out[179]:array([2],'b')

So, for an array a with typecode 'b', it follows that

a+a != a*2
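
The same effect can be sketched with modern NumPy, with the caveat that its casting rules differ from the Numeric behaviour shown above, so this is an analogy rather than a reproduction. The assumption here is that np.uint8 plays the role of typecode 'b': whenever the result stays 8 bits wide, the value wraps, and widening first avoids it.

```python
import numpy as np

a = np.array([200], dtype=np.uint8)

# Both operands are 8-bit, so the result stays 8-bit and 400 wraps to 400 - 256.
narrow = a + a

# Converting to a wider integer type first keeps the full value.
wide = a.astype(np.int64) * 2

print(narrow, narrow.dtype)
print(wide, wide.dtype)
```

Here narrow holds 144 and wide holds 400, so the two expressions disagree purely because of the result type.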

At the moment, I don't have the time to try the new scipy_core. It would
be nice to hear whether the problem is known or even already fixed!?

Regards
Hans Georg Krauthäuser
