frequency of values in a field


noydb

I am looking for ways to go about capturing the frequency of unique
values in one field of a dbf table that contains ~50k records. The
values are numbers with at least 5 digits to the right of the decimal point,
but I want the frequencies of the values rounded to only 2 decimal places. I do
have a method to do this, courtesy of a tool provided in ArcGIS. I was
just curious about ways to do it without the ArcGIS software, using just Python.

I saw this recipe, which uses itertools:
http://code.activestate.com/recipes/277600-one-liner-frequency-count/

I'd be curious to see how experienced Pythonistas (or not-too-experienced
ones!) would go about doing this.

Thanks for any snippets provided; this should be interesting and
educational!
 

Paul Rubin

noydb said:
> I am looking for ways to go about capturing the frequency of unique
> values in one field in a dbf table which contains ~50k records. The
> values are numbers with atleast 5 digits to the right of the decimal,
> but I want the frequency of values to only 2 decimal places. I do
> have a method to do this courtesy of a provided tool in ArcGIS. Was
> just curious about ways to do it without arcgis sw, using just python.

The Decimal module is pretty slow but is conceptually probably the right
way to do this. With just 50k records it shouldn't be too bad. With
more records you might look for a faster way.

from decimal import Decimal as D
from collections import defaultdict

records = ['3.14159','2.71828','3.142857']

td = defaultdict(int)            # missing keys start at 0
for x in records:
    # round each value to 2 decimal places and count it
    td[D(x).quantize(D('0.01'))] += 1

print td
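If Decimal turns out to be too slow for a bigger table, one alternative worth timing (a sketch, not benchmarked here) is to round binary floats with the built-in round() and count those. The helper name `count_rounded` is made up for this example, and the code is written for Python 3. The trade-off: floats can't represent most decimal fractions exactly, so inputs that look like exact ties (such as '2.675') may not round the way Decimal.quantize would.

```python
from collections import defaultdict


def count_rounded(values, places=2):
    """Count values by their float rounded to `places` decimal places.

    Hypothetical helper: uses binary floats instead of Decimal, so an
    apparent tie such as '2.675' rounds according to the float's exact
    binary value (down to 2.67), unlike Decimal.quantize.
    """
    counts = defaultdict(int)
    for s in values:
        counts[round(float(s), places)] += 1
    return dict(counts)


records = ['3.14159', '2.71828', '3.142857']
freq = count_rounded(records)
for key in sorted(freq):
    print(key, freq[key])
# 2.72 1
# 3.14 2
```

Because round() returns the float nearest to the rounded decimal result, every input that rounds to the same two-place value lands on the same dict key.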

As for the itertools recipe: that is cute, but I cringe a bit at the temporary lists and the n log n sort.
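For comparison, the sort-then-group technique behind one-liner recipes like that one looks roughly like this (a Python 3 sketch of the general idea, not the recipe's exact code). sorted() is the O(n log n) step, and len(list(run)) builds a throwaway list for every group; the defaultdict version above makes a single O(n) pass instead.

```python
from decimal import Decimal as D
from itertools import groupby

records = ['3.14159', '2.71828', '3.142857']

# Sorting puts equal keys next to each other: O(n log n)
keys = sorted(D(x).quantize(D('0.01')) for x in records)

# groupby() yields (key, run) pairs for each run of equal keys;
# len(list(run)) materializes a temporary list just to count it
freq = [(k, len(list(run))) for k, run in groupby(keys)]

# Counting each run with a generator avoids the temporary lists
freq_no_lists = [(k, sum(1 for _ in run)) for k, run in groupby(keys)]

print(freq)  # [(Decimal('2.72'), 1), (Decimal('3.14'), 2)]
```

One upside of the sorted version is that the result already comes out in key order, which helps if a sorted listing is the end goal.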
 

Vlastimil Brom

On 2011/2/8, Paul Rubin wrote:
> noydb said:
>> I am looking for ways to go about capturing the frequency of unique
>> values in one field in a dbf table which contains ~50k records. The
>> values are numbers with atleast 5 digits to the right of the decimal,
>> but I want the frequency of values to only 2 decimal places. I do
>> have a method to do this courtesy of a provided tool in ArcGIS. Was
>> just curious about ways to do it without arcgis sw, using just python.
> ...
>
> from decimal import Decimal as D
> from collections import defaultdict
>
> records = ['3.14159','2.71828','3.142857']
>
> td = defaultdict(int)
> for x in records:
>     td[D(x).quantize(D('0.01'))] += 1
>
> print td

Another variant of the above code, using collections.Counter (available
in Python 2.7 and 3.1 or newer).
The actual frequency counting is just the single instantiation of the
Counter from an iterable; the handling of the number values can be
tweaked as needed:
>>> from decimal import Decimal as D
>>> from collections import Counter
>>> records = ['3.14159','2.71828','3.142857']
>>> Counter(D(x).quantize(D('0.01')) for x in records)
Counter({Decimal('3.14'): 2, Decimal('2.72'): 1})
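As one example of tweaking the number handling: quantize() rounds with the context's rounding mode, which defaults to ROUND_HALF_EVEN, and it also accepts an explicit `rounding` argument. The sample values below are made up to show where the modes disagree. This is Python 3; Counter needs 2.7+, but the same `rounding` argument works in the defaultdict version on 2.6.

```python
from collections import Counter
from decimal import Decimal as D, ROUND_DOWN, ROUND_HALF_UP

CENT = D('0.01')
records = ['2.665', '2.67', '2.6649']  # made-up sample values

# Default context rounding: ties go to the even digit, so 2.665 -> 2.66
half_even = Counter(D(x).quantize(CENT) for x in records)
# Schoolbook rounding: ties go away from zero, so 2.665 -> 2.67
half_up = Counter(D(x).quantize(CENT, rounding=ROUND_HALF_UP) for x in records)
# Truncation: drop everything past the second decimal place
truncated = Counter(D(x).quantize(CENT, rounding=ROUND_DOWN) for x in records)

print(sorted(half_even.items()))  # [(Decimal('2.66'), 2), (Decimal('2.67'), 1)]
print(sorted(half_up.items()))    # [(Decimal('2.66'), 1), (Decimal('2.67'), 2)]
print(sorted(truncated.items()))  # [(Decimal('2.66'), 2), (Decimal('2.67'), 1)]
```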

vbr
 

noydb

> The Decimal module is pretty slow but is conceptually probably the right
> way to do this.  With just 50k records it shouldn't be too bad.  With
> more records you might look for a faster way.
>
>     from decimal import Decimal as D
>     from collections import defaultdict
>
>     records = ['3.14159','2.71828','3.142857']
>
>     td = defaultdict(int)
>     for x in records:
>         td[D(x).quantize(D('0.01'))] += 1
>
>     print td

I played with this, and it worked. I'm using Python 2.6, so Counter is no good.

I need an output text file of "key value" lines sorted by key, so I added the
following (plus further code to write out to an actual text file, not important here):
for z in sorted(set(td)):
    print z, td[z]
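A minimal sketch of the file-writing step. The helper name `write_freq_table` and the file name in the comment are hypothetical. It takes any writable text file object, so the same code can write to disk or, as here, to the console. Iterating over a dict already yields each key once, so sorted(td) gives the same result as sorted(set(td)).

```python
import sys
from collections import defaultdict
from decimal import Decimal as D


def write_freq_table(freq, out):
    """Write one "key value" line per key, sorted by key, to `out`."""
    for key in sorted(freq):  # dict keys are already unique; no set() needed
        out.write('%s %d\n' % (key, freq[key]))


records = ['3.14159', '2.71828', '3.142857']
td = defaultdict(int)
for x in records:
    td[D(x).quantize(D('0.01'))] += 1

# To write to disk instead (hypothetical file name):
#     with open('frequencies.txt', 'w') as out:
#         write_freq_table(td, out)
write_freq_table(td, sys.stdout)
```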

So it seems the idea is to put all the values of the particular field
of interest into a list (records). How does one do this in pure
Python?
Normally, in my work with GIS/ArcGIS software, I would run a search cursor on
the DBF file and append each value of that field to a list
(to populate records above). Something like:
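import arcgisscripting
# Create the geoprocessor object
gp = arcgisscripting.create()
records_list = []
cur = gp.SearchCursor(dbfTable)     # dbfTable: path to the .dbf table
row = cur.Next()
while row:
    value = row.particular_field    # the field of interest
    records_list.append(value)
    row = cur.Next()                # advance, otherwise the loop never ends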
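For the "pure Python" part of the question, here is a minimal sketch that uses only the standard library's struct module to pull one field out of a dBase III-style .dbf file (shapefile attribute tables typically use this layout). It assumes no memo fields, no FoxPro or dBase 7 extensions, and no code-page handling; `read_dbf_field` and the test-only writer `write_dbf_numeric` are hypothetical names, and the code is Python 3. A dedicated dbf package will cope with far more variants.

```python
import io
import struct


def read_dbf_field(f, field_name):
    """Yield the stripped text of one field for each non-deleted record.

    Sketch for dBase III-style tables only; `f` must be opened in binary mode.
    """
    # 32-byte table header: version, date, record count, header size, record size
    numrec, header_len, record_len = struct.unpack('<4xIHH20x', f.read(32))

    # 32-byte field descriptors follow, terminated by a 0x0D byte
    fields = []
    while True:
        first = f.read(1)
        if first in (b'\r', b''):
            break
        name, _ftype, size, _decimals = struct.unpack(
            '<11sc4xBB14x', first + f.read(31))
        fields.append((name.split(b'\0', 1)[0].decode('ascii').strip(), size))

    # Byte offset of the wanted field; byte 0 of each record is the deletion flag
    offset = 1
    for name, size in fields:
        if name.upper() == field_name.upper():
            width = size
            break
        offset += size
    else:
        raise KeyError(field_name)

    f.seek(header_len)
    for _ in range(numrec):
        record = f.read(record_len)
        if len(record) < record_len:
            break                      # truncated file
        if record[:1] == b'*':
            continue                   # record marked as deleted
        yield record[offset:offset + width].decode('latin-1').strip()


def write_dbf_numeric(f, field_name, width, decimals, values):
    """Hypothetical test helper: write a one-column numeric (N) table."""
    header_len = 32 + 32 + 1
    # version 3, arbitrary last-update date (2000-01-01)
    f.write(struct.pack('<BBBBIHH20x', 0x03, 100, 1, 1,
                        len(values), header_len, 1 + width))
    f.write(struct.pack('<11sc4xBB14x', field_name.encode('ascii'), b'N',
                        width, decimals))
    f.write(b'\r')
    for v in values:
        f.write(b' ' + ('%*.*f' % (width, decimals, v)).encode('ascii'))
    f.write(b'\x1a')                   # end-of-file marker


buf = io.BytesIO()
write_dbf_numeric(buf, 'VAL', 12, 6, [3.14159, 2.71828, 3.142857])
buf.seek(0)
records = list(read_dbf_field(buf, 'val'))
print(records)  # ['3.141590', '2.718280', '3.142857']

# With a real file (hypothetical path and field name):
#     with open('path/to/table.dbf', 'rb') as f:
#         records = list(read_dbf_field(f, 'FIELD_NAME'))
```

The strings it yields can go straight into the Decimal-based counting code, since Decimal() accepts them as-is.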
 

Ethan Furman

noydb said:
>> The Decimal module is pretty slow but is conceptually probably the right
>> way to do this. With just 50k records it shouldn't be too bad. With
>> more records you might look for a faster way.
>>
>> from decimal import Decimal as D
>> from collections import defaultdict
>>
>> records = ['3.14159','2.71828','3.142857']
>>
>> td = defaultdict(int)
>> for x in records:
>>     td[D(x).quantize(D('0.01'))] += 1
>>
>> print td
>
> I played with this - it worked. Using Python 2.6 so counter no good.
>
> I require an output text file of sorted "key value" so I added
> (further code to write out to an actual textfile, not important here)
> for z in sorted(set(td)):
>     print z, td[z]
>
> So it seems the idea is to add all the records in the particular field
> of interest into a list (record). How does one do this in pure
> Python?
> Normally in my work with gis/arcgis sw, I would do a search cursor on
> the DBF file and add each value in the particular field into a list
> (to populate records above). Something like:
>
> --> import arcgisscripting
> --> # Create the geoprocessor object
> --> gp = arcgisscripting.create()
> --> records_list = []
> --> cur = gp.SearchCursor(dbfTable)
> --> row = cur.Next()
> --> while row:
> -->     value = row.particular_field
> -->     records_list.append(value)

Are you trying to get away from arcgisscripting? There is a pure-Python
dbf package on PyPI (I know, I put it there ;) that you can use to
access the .dbf file in question (assuming it's in dBase III, IV, or
FoxPro format).

http://pypi.python.org/pypi/dbf/0.88.16 if you're interested.

Using it, the code above could be:

-----------------------------------------------------
import dbf
from collections import defaultdict
from decimal import Decimal

table = dbf.Table('path/to/table/table_name')

freq = defaultdict(int)
for record in table:
    # Numeric fields come back as floats, and Python 2.6's Decimal()
    # won't take a float directly, so convert via str() first
    value = Decimal(str(record['field_of_interest']))
    key = value.quantize(Decimal('0.01'))
    freq[key] += 1

for z in sorted(freq):
    print z, freq[z]

-----------------------------------------------------

Numeric/Float field types are returned as python floats*, so there may
be slight discrepancies between the stored value and the returned value.
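To see the kind of discrepancy meant here, and one way to sidestep it, compare building a Decimal straight from a float with going through str() first. This is a Python 3 sketch and 2.675 is just a sample value. Decimal(float) captures the float's exact binary value (and Python 2.6's Decimal() does not accept floats at all), while str() gives the short decimal form, which is usually what was written to the table.

```python
from decimal import Decimal

CENT = Decimal('0.01')
value = 2.675                  # sample float, as a Numeric field might be returned

exact = Decimal(value)         # exact binary value, slightly below 2.675
via_str = Decimal(str(value))  # Decimal('2.675')

print(exact.quantize(CENT))    # 2.67
print(via_str.quantize(CENT))  # 2.68 (a true tie, rounded half-even to the even digit 8)
```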

Hope this helps.

~Ethan~

*Unless created with zero decimal places, in which case they are
returned as python integers.
 

noydb

Oops, didn't see this before I posted last.

Thanks! I'll try this, looks good, makes sense.
 
