frequency of values in a field

noydb · Feb 8, 2011

I am looking for ways to go about capturing the frequency of unique
values in one field in a dbf table which contains ~50k records. The
values are numbers with at least 5 digits to the right of the decimal,
but I want the frequency of values to only 2 decimal places. I do
have a method to do this courtesy of a provided tool in ArcGIS. Was
just curious about ways to do it without arcgis sw, using just python.

Saw this http://code.activestate.com/recipes/277600-one-liner-frequency-count/
using itertools.

I'd be curious to see how experienced Pythoners (or not too
experienced!) would go about doing this.

Thanks for any snippets provided, this should be interesting and
educational!

Paul Rubin · Feb 8, 2011

noydb said:
I am looking for ways to go about capturing the frequency of unique
values in one field in a dbf table which contains ~50k records. The
values are numbers with at least 5 digits to the right of the decimal,
but I want the frequency of values to only 2 decimal places. I do
have a method to do this courtesy of a provided tool in ArcGIS. Was
just curious about ways to do it without arcgis sw, using just python.

The Decimal module is pretty slow but is conceptually probably the right
way to do this. With just 50k records it shouldn't be too bad. With
more records you might look for a faster way.

from decimal import Decimal as D
from collections import defaultdict

records = ['3.14159','2.71828','3.142857']

td = defaultdict(int)
for x in records:
    td[D(x).quantize(D('0.01'))] += 1

print td
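
One possible shape for a "faster way" on larger tables is to skip Decimal and count integer hundredths instead. This is only a sketch, written in Python 3 syntax rather than the 2.x used in this thread, and it has not been timed here. The helper name `count_hundredths` is made up for illustration. Rounding happens on binary floats, so a value sitting exactly on a bin boundary may land in a different bin than it would with Decimal.

```python
from collections import defaultdict

def count_hundredths(values):
    """Count values binned to 2 decimal places, keyed by integer hundredths."""
    freq = defaultdict(int)
    for v in values:
        # float() accepts strings and numbers; round() returns an int in Python 3
        freq[round(float(v) * 100)] += 1
    return dict(freq)

records = ['3.14159', '2.71828', '3.142857']
freq = count_hundredths(records)
for key in sorted(freq):
    # turn the integer key back into a 2-decimal string for display
    print('%.2f %d' % (key / 100.0, freq[key]))
```

Integer keys hash and compare cheaply, which is the reason for picking them here; whether that matters at 50k records is an open question.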

Saw this http://code.activestate.com/recipes/277600-one-liner-frequency-count/
using itertools.

That is cute but I cringe a bit at the temporary lists and the n log n algorithm.
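
For comparison, the sort-then-group idea behind that recipe can be sketched roughly as follows. This is an illustration of the approach in Python 3, not the recipe's exact code. The sort is the n log n step, whereas the dictionary loop above makes a single pass over the records.

```python
from decimal import Decimal as D
from itertools import groupby

records = ['3.14159', '2.71828', '3.142857']

# Quantize every value, then sort so that equal keys become adjacent.
keys = sorted(D(x).quantize(D('0.01')) for x in records)

# groupby yields one group per run of equal keys; count the length of each run.
counts = [(key, sum(1 for _ in group)) for key, group in groupby(keys)]

for key, n in counts:
    print(key, n)
```

A side effect of sorting first is that the result already comes out ordered by key.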

Vlastimil Brom · Feb 8, 2011

Paul Rubin said:
noydb said:

I am looking for ways to go about capturing the frequency of unique
values in one field in a dbf table which contains ~50k records. The
values are numbers with at least 5 digits to the right of the decimal,
but I want the frequency of values to only 2 decimal places. I do
have a method to do this courtesy of a provided tool in ArcGIS. Was
just curious about ways to do it without arcgis sw, using just python.


...

from decimal import Decimal as D
from collections import defaultdict

records = ['3.14159','2.71828','3.142857']

td = defaultdict(int)
for x in records:
    td[D(x).quantize(D('0.01'))] += 1

print td

...


Another variant of the above code uses collections.Counter (available in
newer Python versions, 2.7 and 3.1 onward).
The frequency counting itself is just the single instantiation of the
Counter from an iterable. The handling of the number values can be
tweaked as needed.

>>> from decimal import Decimal as D
>>> from collections import Counter
>>> records = ['3.14159','2.71828','3.142857']
>>> Counter(D(x).quantize(D('0.01')) for x in records)
Counter({Decimal('3.14'): 2, Decimal('2.72'): 1})

vbr
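
As one example of such tweaking, `quantize` accepts a `rounding` argument. The sketch below (Python 3, with made-up sample values and a hypothetical helper named `frequencies`) shows how a value exactly halfway between two bins is counted differently under the default ROUND_HALF_EVEN and under ROUND_HALF_UP.

```python
from collections import Counter
from decimal import Decimal as D, ROUND_HALF_EVEN, ROUND_HALF_UP

# Made-up values; '2.665' sits exactly between 2.66 and 2.67.
records = ['2.665', '2.67', '2.6601']

def frequencies(values, rounding):
    """Count values quantized to 2 decimal places with the given rounding mode."""
    step = D('0.01')
    return Counter(D(v).quantize(step, rounding=rounding) for v in values)

half_even = frequencies(records, ROUND_HALF_EVEN)
half_up = frequencies(records, ROUND_HALF_UP)

print(sorted(half_even.items()))
print(sorted(half_up.items()))
```

With real data carrying five or more decimals, exact ties should be rare, so the choice of mode probably only matters at the margins.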

noydb · Feb 9, 2011

Paul Rubin said:
The Decimal module is pretty slow but is conceptually probably the right
way to do this. With just 50k records it shouldn't be too bad. With
more records you might look for a faster way.

from decimal import Decimal as D
from collections import defaultdict

records = ['3.14159','2.71828','3.142857']

td = defaultdict(int)
for x in records:
    td[D(x).quantize(D('0.01'))] += 1

print td

I played with this - it worked. I'm using Python 2.6, so Counter is not available.

I require an output text file of sorted "key value" so I added
(further code to write out to an actual textfile, not important here)

for z in sorted(set(td)):
    print z, td[z]
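
The file-writing step left out above might look roughly like this. It is a sketch in Python 3 syntax; the function name `write_frequencies` and the output file name are placeholders, not taken from the original post.

```python
import os
import tempfile
from collections import defaultdict
from decimal import Decimal as D

def write_frequencies(td, path):
    """Write one 'key value' line per distinct key, sorted by key."""
    with open(path, 'w') as out:
        for key in sorted(td):
            out.write('%s %d\n' % (key, td[key]))

records = ['3.14159', '2.71828', '3.142857']
td = defaultdict(int)
for x in records:
    td[D(x).quantize(D('0.01'))] += 1

# Placeholder output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), 'frequency.txt')
write_frequencies(td, out_path)
```

Iterating over `sorted(td)` directly gives the sorted keys, so the extra `set()` call is not needed.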


So it seems the idea is to add all the values from the particular field
of interest into a list (records). How does one do this in pure
Python?
Normally in my work with gis/arcgis sw, I would do a search cursor on
the DBF file and add each value in the particular field into a list
(to populate records above). Something like:

import arcgisscripting
# Create the geoprocessor object
gp = arcgisscripting.create()
records_list = []
cur = gp.SearchCursor(dbfTable)
row = cur.Next()
while row:
    value = row.particular_field
    records_list.append(value)
    row = cur.Next()  # advance the cursor; without this the loop never ends


Ethan Furman · Feb 9, 2011

noydb said:
So it seems the idea is to add all the records in the particular field
of interest into a list (record). How does one do this in pure
Python?
Normally in my work with gis/arcgis sw, I would do a search cursor on
the DBF file and add each value in the particular field into a list
(to populate records above). Something like:

--> import arcgisscripting
--> # Create the geoprocessor object
--> gp = arcgisscripting.create()
--> records_list = []
--> cur = gp.SearchCursor(dbfTable)
--> row = cur.Next()
--> while row:
-->     value = row.particular_field
-->     records_list.append(value)
-->     row = cur.Next()  # advance the cursor; without this the loop never ends

Are you trying to get away from arcgisscripting? There is a pure Python
dbf package on PyPI (I know, I put it there) that you can use to
access the .dbf file in question (assuming it's a dBase III, IV, or
FoxPro format).

http://pypi.python.org/pypi/dbf/0.88.16 if you're interested.

Using it, the code above could be:

-----------------------------------------------------
import dbf
from collections import defaultdict
from decimal import Decimal

table = dbf.Table('path/to/table/table_name')

freq = defaultdict(int)
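for record in table:
    # numeric fields come back as floats; Decimal() in Python 2.6 does not
    # accept a float directly, so convert via str() first
    value = Decimal(str(record['field_of_interest']))
    key = value.quantize(Decimal('0.01'))
    freq[key] += 1

for z in sorted(freq):
    print z, freq[z]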

-----------------------------------------------------

Numeric/Float field types are returned as python floats*, so there may
be slight discrepancies between the stored value and the returned value.

Hope this helps.

~Ethan~

*Unless created with zero decimal places, in which case they are
returned as python integers.
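
To illustrate the float caveat: many decimal values have no exact binary representation, so the float that comes back can be slightly off from what was stored. The sketch below (Python 3, with a made-up value of 2.675) shows that converting the float directly and converting its string form can land in different 2-decimal bins.

```python
from decimal import Decimal

value = 2.675  # made-up field value, held as a float

# Direct conversion keeps the exact binary value, which is just below 2.675.
exact = Decimal(value)
# Going through str() recovers the short decimal form '2.675'.
via_str = Decimal(str(value))

step = Decimal('0.01')
print(exact.quantize(step))    # 2.67
print(via_str.quantize(step))  # 2.68
```

Only values sitting right on a bin boundary are affected, so with five or more stored decimals the two conversions should agree almost everywhere.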

noydb · Feb 9, 2011

Oops, didn't see this before I posted last.

Thanks! I'll try this, looks good, makes sense.
