How to implement key of key in Python?

eckhleung

I'm migrating from Perl to Python and can't find the equivalent of Perl's "key of key" (nested hash) concept. The following code runs well,

import csv

attr = {}

with open('test.txt','rb') as tsvin:
    tsvin = csv.reader(tsvin, delimiter='\t')

    for row in tsvin:
        ID = row[1]

until I add this line inside the loop:

        attr[ID]['adm3'] = row[2]

I then tried:

        attr[ID].adm3 = row[2]

which still doesn't work. Some posts suggest using module dict but some do not. I'm a bit confused now. Any suggestions?
 
CHIN Dihedral

Note that your attr starts out as an empty dictionary (what Perl calls a hash).

The syntax for adding a (key, value) pair
differs between Python and Perl.
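To make the difference concrete, here is a minimal sketch (assuming Python 3) of what happens with the two assignments from the question. The `row` list is made-up illustration data, not taken from the real test.txt.

```python
attr = {}                       # empty dict, the analogue of an empty Perl hash
row = ["x", "ABC123", "Ohio"]   # hypothetical row, for illustration only
ID = row[1]

try:
    # attr[ID] is looked up first; the key does not exist yet
    attr[ID]['adm3'] = row[2]
except KeyError as exc:
    first_error = type(exc).__name__

attr[ID] = {}                   # create the inner dict explicitly
attr[ID]['adm3'] = row[2]       # now the nested assignment works

try:
    # a plain dict does not accept arbitrary attributes
    attr[ID].adm3 = row[2]
except AttributeError as exc:
    second_error = type(exc).__name__

print(first_error, second_error, attr)
```

So the first form fails because the inner dict is missing, and the second because dict entries are reached with `[...]`, not with dot notation.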
 
MRAB

Python doesn't have Perl's autovivification feature. If you want the
value to be a dict then you need to create that dict first:

attr[ID] = {}
attr[ID]['adm3'] = row[2]

You could also have a look at the 'defaultdict' class in the
'collections' module.
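As a rough sketch of the defaultdict suggestion (assuming Python 3): `defaultdict(dict)` creates the inner dict on first access, and `dict.setdefault` does much the same for a plain dict. The `rows` list is hypothetical sample data standing in for the file.

```python
from collections import defaultdict

# hypothetical rows, standing in for what csv.reader would yield
rows = [["r1", "ABC123", "Ohio"], ["r2", "ACC499", "London"]]

# defaultdict(dict) builds an empty inner dict for any missing key
attr = defaultdict(dict)
for row in rows:
    ID = row[1]
    attr[ID]['adm3'] = row[2]

# the same effect with a plain dict and setdefault
plain = {}
for row in rows:
    plain.setdefault(row[1], {})['adm3'] = row[2]

print(attr["ABC123"]["adm3"])
```

One thing to keep in mind with defaultdict: merely reading a missing key, such as `attr["typo"]`, also creates an empty entry, which a plain dict would report as a KeyError.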
 
eckhleung

I found the following example:

from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)
for k, v in s:
    d[k].append(v)

While it is fine for a small dataset, I need a more generic way to do so. Indeed, the "test.txt" in my example contains more columns of attributes, like:

ID      address    age  gender  phone-number  race      education  ...
ABC123  Ohio, USA  18   F       800-123-456   european  university
ACC499  London     33   M       800-111-400   african   university
....

so that later I can retrieve the information in Python with:

attr['ABC123'].address (containing 'Ohio, USA')
attr['ABC123'].race (containing 'european')
attr['ACC499'].age (containing '33')

The following links mention something similar,

http://courses.cs.washington.edu/courses/cse140/13wi/csv-parsing.html
http://stackoverflow.com/questions/8800111/parse-csv-file-and-aggregate-the-values
http://stackoverflow.com/questions/17763642/reading-tab-separated-file-into-a-defaultdict-python
http://semanticbible.com/blogos/2009/06/12/reading-tab-delimited-data-in-python-with-csv/

Unfortunately, none of them illustrates how to store the values and access them later. Moreover, they introduce some new terms, e.g. combined, [], etc.

Is there any better reference?

Thanks again!
 
Andrea D'Amore

While it is fine for a small dataset, I need a more generic way to do so.

I don't get how the dataset size affects the generality of the solution here.

From your first message:
attr = {}
with open('test.txt','rb') as tsvin:
    tsvin = csv.reader(tsvin, delimiter='\t')
    for row in tsvin:
        ID = row[1]

so your problem is solved by adding a simple
        attr[ID] = {}

after the ID assignment. It seems simple to implement and generic enough to me.
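A minimal sketch of how that could extend to every column (assuming Python 3, a header line in the file, and the ID in the first column as in the sample table; the in-memory `data` string is a hypothetical stand-in for test.txt):

```python
import csv
import io

# hypothetical tab-separated content standing in for test.txt
data = "ID\taddress\tage\nABC123\tOhio, USA\t18\nACC499\tLondon\t33\n"

attr = {}
reader = csv.reader(io.StringIO(data), delimiter='\t')
header = next(reader)            # first line holds the column names
for row in reader:
    ID = row[0]                  # assumption: the ID is the first column
    attr[ID] = {}                # create the inner dict before filling it
    for name, value in zip(header[1:], row[1:]):
        attr[ID][name] = value

print(attr['ABC123']['address'])
```

The values are then read back with `attr['ABC123']['address']` rather than `attr['ABC123'].address`.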

unfortunately none of them illustrates how to store the values and
access them later.

You access the stored value by using the variable name that holds it,
but here you should probably make clearer what your actual issue is.

Moreover, they bring some new terms, e.g. combined, [], etc.

The "[]" syntax is used in Python for lists; the same brackets are also
used for indexing and for dictionary key lookup.

The term "combined" has no specific Pythonic meaning there and is
just used as a meaningful variable name, as the author is combining,
i.e. adding, numerical values.
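A small sketch of the uses of square brackets one is likely to meet in such examples (assuming Python 3; the names `values` and `combined` are only illustrative and not taken from the linked pages):

```python
values = []                 # [] on its own is an empty list literal
values.append(3)

combined = {}               # {} is an empty dict
combined['yellow'] = 4      # name[key] = ... stores a value under a key

first = values[0]           # name[index] reads a list element
total = combined['yellow'] + first   # name[key] reads a dict value

print(total)
```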
 
Peter Otten

I'd like to retrieve the information in Python later by:

attr['ABC123'].address (containing 'Ohio, USA')
attr['ABC123'].race (containing 'european')
attr['ACC499'].age (containing '33')

Using a csv.DictReader comes close with minimal effort:

# write demo data to make the example self-contained
with open("tmp.csv", "w") as f:
    f.write("""\
ID,address,age,gender,phone-number,race,education
ABC123,"Ohio, USA",18,F,800-123-456,european,university
ACC499,London,33,M,800-111-400,african,university
""")

import csv
import pprint

with open("tmp.csv") as f:
    attr = {row["ID"]: row for row in csv.DictReader(f)}

pprint.pprint(attr)

print(attr["ACC499"]["age"])

The "dict comprehension"

attr = {row["ID"]: row for row in csv.DictReader(f)}

is a shortcut for

attr = {}
for row in csv.DictReader(f):
    attr[row["ID"]] = row
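Since the original test.txt is tab-separated rather than comma-separated, the same idea should carry over by passing `delimiter="\t"` to `csv.DictReader`. A minimal sketch, assuming Python 3 and a header line in the file; the `tsv_text` string is hypothetical demo data used in place of the real file:

```python
import csv
import io

# hypothetical tab-separated demo data standing in for test.txt
tsv_text = "ID\taddress\tage\nABC123\tOhio, USA\t18\nACC499\tLondon\t33\n"

with io.StringIO(tsv_text) as f:
    # extra keyword arguments are passed through to the underlying csv.reader
    attr = {row["ID"]: row for row in csv.DictReader(f, delimiter="\t")}

print(attr["ABC123"]["address"])
```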

If you insist on attribute access (row.age instead of row["age"]) you can
use a namedtuple. This is a bit more involved:

import csv
import pprint
from collections import namedtuple

with open("tmp.csv") as f:
    rows = csv.reader(f)
    header = next(rows)

    # make sure column names are valid Python identifiers
    header = [column.replace("-", "_") for column in header]

    RowType = namedtuple("RowType", header)
    key_index = header.index("ID")
    attr = {row[key_index]: RowType(*row) for row in rows}

pprint.pprint(attr)

print(attr["ABC123"].race)

The following links mention something similar,

Too many, so I checked none of them ;)
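A namedtuple is immutable, so fields cannot be reassigned after a row is built. If attribute access is wanted on rows that can still be modified, one possible alternative (not part of the suggestions above) is `types.SimpleNamespace` from the standard library. A rough sketch, assuming Python 3, with `csv_text` as hypothetical demo data:

```python
import csv
import io
from types import SimpleNamespace

# hypothetical demo data, a shortened version of the sample file
csv_text = '''ID,address,age,phone-number
ABC123,"Ohio, USA",18,800-123-456
ACC499,London,33,800-111-400
'''

attr = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    # column names must be valid identifiers to be usable as attributes
    fields = {name.replace("-", "_"): value for name, value in row.items()}
    attr[row["ID"]] = SimpleNamespace(**fields)

attr["ABC123"].age = "19"        # attributes can be reassigned later
print(attr["ACC499"].phone_number)
```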
 
