looping in array vs looping in a dic

giuseppe.amatulli · Sep 20, 2012

Hi,
I have this script in python that i need to apply for very large arrays (arrays coming from satellite images).
The script works grate but i would like to speed up the process.
The larger computational time is in the for loop process.
Is there is a way to improve that part?
Should be better to use dic() instead of np.ndarray for saving the results?
and if yes how i can make the sum in dic()(like in the correspondent matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col] )?
If the dic() is the solution way is faster?

Thanks
Giuseppe

import numpy as np
import sys
from time import clock, time

# create the arrays

start = time()
valuesRaster = np.random.random_integers(0, 100, 100).reshape(10, 10)
valuesCategory = np.random.random_integers(1, 10, 100).reshape(10, 10)

elapsed = (time() - start)
print(elapsed , "create the data")

start = time()

categories = np.unique(valuesCategory)
matrix = np.c_[ categories , np.zeros(len(categories))]

elapsed = (time() - start)
print(elapsed , "create the matrix and append a colum zero ")

rows = 10
cols = 10

start = time()

for col in range(0,cols):
for row in range(0,rows):
for row_c in range(0,len(matrix)) :
if valuesCategory[row,col] == matrix[row_c,0] :
matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col]
break
elapsed = (time() - start)
print(elapsed , "loop in the data ")

print (matrix)

MRAB · Sep 20, 2012

Hi,
I have this script in python that i need to apply for very large arrays (arrays coming from satellite images).
The script works grate but i would like to speed up the process.
The larger computational time is in the for loop process.
Is there is a way to improve that part?
Should be better to use dic() instead of np.ndarray for saving the results?
and if yes how i can make the sum in dic()(like in the correspondent matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col] )?
If the dic() is the solution way is faster?

Thanks
Giuseppe

import numpy as np
import sys
from time import clock, time

# create the arrays

start = time()
valuesRaster = np.random.random_integers(0, 100, 100).reshape(10, 10)
valuesCategory = np.random.random_integers(1, 10, 100).reshape(10, 10)

elapsed = (time() - start)
print(elapsed , "create the data")

start = time()

categories = np.unique(valuesCategory)
matrix = np.c_[ categories , np.zeros(len(categories))]

elapsed = (time() - start)
print(elapsed , "create the matrix and append a colum zero ")

rows = 10
cols = 10

start = time()

for col in range(0,cols):
for row in range(0,rows):
for row_c in range(0,len(matrix)) :
if valuesCategory[row,col] == matrix[row_c,0] :
matrix[row_c,1] = matrix[row_c,1] + valuesRaster[row,col]
break
elapsed = (time() - start)
print(elapsed , "loop in the data ")

print (matrix)

If I understand the code correctly, 'matrix' contains the categories in
column 0 and the totals in column 1.

What you're doing is performing a linear search through the categories
and then adding to the corresponding total.

Linear searches are slow because on average you have to search through
half of the list. Using a dict would be much faster (although you
should of course measure it!).

Try something like this:

import numpy as np
from time import time

# Create the arrays.

start = time()

valuesRaster = np.random.random_integers(0, 100, 100).reshape(10, 10)
valuesCategory = np.random.random_integers(1, 10, 100).reshape(10, 10)

elapsed = time() - start
print(elapsed, "Create the data.")

start = time()

categories = np.unique(valuesCategory)
totals = dict.fromkeys(categories, 0)

elapsed = time() - start
print(elapsed, "Create the totals dict.")

rows = 100
cols = 10

start = time()

for col in range(cols):
for row in range(rows):
cat = valuesCategory[row, col]
ras = valuesRaster[row, col]
totals[cat] += ras

elapsed = time() - start
print(elapsed, "Loop in the data.")

print(totals)

Ian Kelly · Sep 20, 2012

for col in range(cols):
for row in range(rows):
cat = valuesCategory[row, col]
ras = valuesRaster[row, col]
totals[cat] += ras

Expanding on what MRAB wrote, since you probably have far fewer
categories than pixels, you may be able to take better advantage of
numpy's vectorized operations (which are pretty much the whole point
of using numpy in the first place) by looping over the categories
instead:

for cat in categories:
totals[cat] += np.sum(valuesCategory * (valuesRaster == cat))

giuseppe.amatulli · Sep 21, 2012

Hi Ian and MRAB
thanks to you input i have improve the speed of my code. Definitely reading in dic() is faster. I have one more question.
In the dic() I calculate the sum of the values, but i want count also the number of observation, in order to calculate the average in the end.
Should i create a new dic() or is possible to do in the same dic().
Here in the final code.
Thanks Giuseppe

rows = dsCategory.RasterYSize
cols = dsCategory.RasterXSize

print("Generating output file %s" %(dst_file))

start = time()

unique=dict()

for irows in xrange(rows):
valuesRaster=dsRaster.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
valuesCategory=dsCategory.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
for icols in xrange(cols):
if ( valuesRaster[0,icols] != no_data_Raster ) and ( valuesCategory[0,icols] != no_data_Category ) :
row = valuesCategory[0, icols],valuesRaster[0, icols]
if row[0] in unique :
unique[row[0]] += row[1]
else:
unique[row[0]] = 0+row[1] # this 0 was add if not the first observation was considered = 0

giuseppe.amatulli · Sep 21, 2012

Hi Ian and MRAB
thanks to you input i have improve the speed of my code. Definitely reading in dic() is faster. I have one more question.
In the dic() I calculate the sum of the values, but i want count also the number of observation, in order to calculate the average in the end.
Should i create a new dic() or is possible to do in the same dic().
Here in the final code.
Thanks Giuseppe

rows = dsCategory.RasterYSize
cols = dsCategory.RasterXSize

print("Generating output file %s" %(dst_file))

start = time()

unique=dict()

for irows in xrange(rows):
valuesRaster=dsRaster.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
valuesCategory=dsCategory.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
for icols in xrange(cols):
if ( valuesRaster[0,icols] != no_data_Raster ) and ( valuesCategory[0,icols] != no_data_Category ) :
row = valuesCategory[0, icols],valuesRaster[0, icols]
if row[0] in unique :
unique[row[0]] += row[1]
else:
unique[row[0]] = 0+row[1] # this 0 was add if not the first observation was considered = 0

MRAB · Sep 21, 2012

Hi Ian and MRAB
thanks to you input i have improve the speed of my code. Definitely reading in dic() is faster. I have one more question.
In the dic() I calculate the sum of the values, but i want count also the number of observation, in order to calculate the average in the end.
Should i create a new dic() or is possible to do in the same dic().
Here in the final code.
Thanks Giuseppe

Keep it simple. Use 2 dicts.

rows = dsCategory.RasterYSize
cols = dsCategory.RasterXSize

print("Generating output file %s" %(dst_file))

start = time()

unique=dict()

for irows in xrange(rows):
valuesRaster=dsRaster.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
valuesCategory=dsCategory.GetRasterBand(1).ReadAsArray(0,irows,cols,1)
for icols in xrange(cols):
if ( valuesRaster[0,icols] != no_data_Raster ) and ( valuesCategory[0,icols] != no_data_Category ) :
row = valuesCategory[0, icols],valuesRaster[0, icols]
if row[0] in unique :
unique[row[0]] += row[1]
else:
unique[row[0]] = 0+row[1] # this 0 was add if not the first observation was considered = 0

You could use defaultdict instead:

from collections import defaultdict

unique = defaultdict(int)
....
category, raster = valuesCategory[0, icols],
valuesRaster[0, icols]
unique[category] += raster

looping versus comprehension	0	Jan 30, 2013
I Need Fix In Code	1	Apr 12, 2023
Index Error during backpropagation in a multilayer neural network.	1	Jun 17, 2023
How to automatically append x,y values to a matrix	2	Jan 30, 2023
Yet another "simple" headscratcher	4	May 31, 2014
How to use Densenet121 in monai	0	Feb 16, 2024
Python bug? Indexing to matrices	4	Jul 12, 2011
Nested Looping SQL Querys	8	Sep 20, 2006

looping in array vs looping in a dic

giuseppe.amatulli

MRAB

Ian Kelly

giuseppe.amatulli

giuseppe.amatulli

MRAB

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads