Numpy record array - field names for all dimensions

ShanMayne

Greetings All

I am seeking to represent datasets where each data element is the
calculated result of several (4 for now) other data types. A matrix-like
structure (in the general mathematical sense) seems logical, where the
intersection of the 4 values (from different data sets) holds the value
derived from them, with the 4 values serving as indexes.

So, each matrix/array element is associated with 4 fields,
e.g.:
matrix element/output value = 24.235 -->
'Formula' = 'C12H24O2N2'
'Solvent' = 'Acetonitrile'
'fragmentation_method' = 'CID'
'resolution' = 'unit'

Ideally I would like to retrieve the output value by indexing the matrix
with the input information, e.g.:

matrix['C12H24O2N2']['Acetonitrile']['CID']['unit'] = 24.235

NumPy's record arrays seemingly don't allow all dimensions to carry
field names, i.e. each column/row carrying a label. Instead, field name
usage appears to create a "new dimension", as denoted by square
brackets.

e.g.:
import numpy as np
pixel_matrix = np.array([[(1,2,3), (4,5,6)], [(7,8,9), (10,11,12)]],
    dtype=[('r',np.float32),('g',np.float32),('b',np.float32)])
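
To make the behaviour described above concrete, here is a minimal sketch (assuming a reasonably recent NumPy imported as np; the names rgb and red are only illustrative). The field names select a component inside each element, while the two array axes themselves stay unlabeled and are still indexed by integer position:

```python
import numpy as np

# Structured dtype with three named float32 fields per element.
rgb = np.dtype([('r', np.float32), ('g', np.float32), ('b', np.float32)])

pixel_matrix = np.array(
    [[(1, 2, 3), (4, 5, 6)],
     [(7, 8, 9), (10, 11, 12)]],
    dtype=rgb,
)

# The array is still 2 x 2; the fields do not label either axis.
print(pixel_matrix.shape)

# A field name pulls out one component of every element,
# giving a plain 2 x 2 float32 array.
red = pixel_matrix['r']
print(red)

# Rows and columns are addressed by integer position only;
# the field name is applied to the selected element.
print(pixel_matrix[0, 1]['g'])
```

So the labels apply to the contents of an element, not to the rows or columns, which is why this does not give the label-per-dimension lookup I am after.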


Q:
Can anyone tell me if the sort of data structuring I seek can be done
with Numpy record arrays or, if not, can you recommend a more suitable
module?

Great & Glowing Thanks!
Robert Kern

ShanMayne said:
> Greetings All

Greetings! If you have more numpy questions, you will find numpy-discussion to
be a better forum:

http://www.scipy.org/Mailing_Lists

> I am seeking to represent datasets where each data element is the
> calculated result of several (4 for now) other data types. A matrix-like
> structure (in the general mathematical sense) seems logical, where the
> intersection of the 4 values (from different data sets) holds the value
> derived from them, with the 4 values serving as indexes.
>
> So, each matrix/array element is associated with 4 fields,
> e.g.:
> matrix element/output value = 24.235 -->
> 'Formula' = 'C12H24O2N2'
> 'Solvent' = 'Acetonitrile'
> 'fragmentation_method' = 'CID'
> 'resolution' = 'unit'
>
> Ideally I would like to retrieve the output value by indexing the matrix
> with the input information, e.g.:
>
> matrix['C12H24O2N2']['Acetonitrile']['CID']['unit'] = 24.235
>
> NumPy's record arrays seemingly don't allow all dimensions to carry
> field names, i.e. each column/row carrying a label. Instead, field name
> usage appears to create a "new dimension", as denoted by square
> brackets.

Pretty much. You can make nested dtypes, but that's not really the data
structure that you want. You probably want a simple dictionary.

d = {
    ('C12H24O2N2', 'Acetonitrile', 'CID', 'unit'): 24.235,
    # ...
}

assert d['C12H24O2N2', 'Acetonitrile', 'CID', 'unit'] == 24.235

If you want to make partial queries (e.g. Formula='C12H24O2N2' and
resolution='unit'), this becomes more like a typical relational database, but
you can probably get along with a few simple functions that loop over the
dictionary and pull out the relevant keys pretty quickly.
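
One such helper might look like the sketch below. The names FIELDS and select are hypothetical, chosen here for illustration, and the second dictionary entry is a made-up placeholder (not real data) so that the filter has something to exclude. It assumes every key is a 4-tuple in the order Formula, Solvent, fragmentation_method, resolution:

```python
# Assumed order of the components in each tuple key.
FIELDS = ('Formula', 'Solvent', 'fragmentation_method', 'resolution')

d = {
    ('C12H24O2N2', 'Acetonitrile', 'CID', 'unit'): 24.235,
    # Placeholder entry with an invented solvent and value, for illustration only.
    ('C12H24O2N2', 'Methanol', 'CID', 'unit'): 0.0,
}


def select(data, **criteria):
    """Return the entries whose key matches every given field=value pair."""
    # Translate field names into positions within the tuple key.
    # An unknown field name raises ValueError here.
    positions = {FIELDS.index(name): value for name, value in criteria.items()}
    return {
        key: result
        for key, result in data.items()
        if all(key[i] == value for i, value in positions.items())
    }


# Partial query on two of the four fields.
matches = select(d, Formula='C12H24O2N2', resolution='unit')
for key, result in matches.items():
    print(key, result)
```

This is a linear scan over the whole dictionary on every query, which is fine for modest sizes. If the partial queries get more elaborate or the data grows, that is the point where a real relational database starts to pay off.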

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco