Data mapper - need to map a dictionary of values to a model

Luke

I am writing an order management console. I need to create an import
system that is easy to extend. For now, I want to accept a dictionary
of values and map them to my data model. The thing is, I need to do
things to certain columns:

- I need to filter some of the values (data comes in as
YYYY-MM-DDTHH:MM:SS-(TIMEZONE-OFFSET) and it needs to map to Order.date
as a YYYY-MM-DD field)
- I need to map parts of an input column to more than one model param
(for instance if I get a full name for input--like "John Smith"--I
need a function to break it apart and map it to
Order.shipping_first_name and Order.shipping_last_name)
- Sometimes I need to do it the other way too... I need to map
multiple input columns to one model param (If I get a shipping fee, a
shipping tax, and a shipping discount, I need them added together and
mapped to Order.shipping_fee)

I have begun this process, but I'm finding it difficult to come up
with a good system that is extensible and easy to understand. I won't
always be the one writing the importers, so I'd like it to be pretty
straightforward. Any ideas?

Oh, I should also mention that many times the data will map to several
different models. For instance, the importer I'm writing first would
map to 3 different models (Order, OrderItem, and OrderCharge)

I am not looking for anybody to write any code for me. I'm simply
asking for inspiration. What design patterns would you use here? Why?
 
Luke

Luke:


What about "generator (scanner) with parameters"? :)

Bye,
bearophile

I'm not familiar with this pattern. I will search around, but if you
have any links or you would like to elaborate, that would be
wonderful. :)
 
bearophileHUGS

Luke:
I'm not familiar with this pattern. I will search around, but if you
have any links or you would like to elaborate, that would be
wonderful. :)

It's not a pattern, it's a little thing:

def line_filter(filein, params):
    for line in filein:
        if good(line, params):
            yield extract(line, params)

That is equivalent to this:

def line_filter(filein, params):
    return (extract(line, params) for line in filein
            if good(line, params))
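
For anyone who wants to run the generator version, here is a minimal self-contained sketch. The `good` and `extract` helpers, the keys in `params`, and the comma-separated input lines are all hypothetical stand-ins, since the post leaves them undefined.

```python
def good(line, params):
    # Hypothetical predicate: keep only lines containing the marker.
    return params["marker"] in line


def extract(line, params):
    # Hypothetical extractor: split the line and pick one column.
    return line.strip().split(params["sep"])[params["column"]]


def line_filter(filein, params):
    # Lazily yield the extracted value of every line that passes the filter.
    for line in filein:
        if good(line, params):
            yield extract(line, params)


params = {"marker": "ORDER", "sep": ",", "column": 1}
lines = ["ORDER,1001,John Smith\n", "NOTE,ignored\n", "ORDER,1002,Jane Doe\n"]
result = list(line_filter(lines, params))
print(result)
```

Because `line_filter` is a generator, it works the same on an open file object as on the list used here, and nothing is read until the result is iterated.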

But probably that's not enough to solve your problem, so other people
can give you a better answer.

Bye,
bearophile
 
George Sakkis

The specific transformations you describe are simple to code
directly, but unless you constrain the set of possible transformations
that can take place, I don't see how this can be generalized in any
useful way. It just seems too open-ended.

The only pattern I can see here is breaking down the overall
transformation into independent steps, just like the three you
described. Given some way to specify each separate transformation,
their combination can be factored out. To illustrate, here's a trivial
example (with dicts for both input and output):

class MultiTransformer(object):
    def __init__(self, *tranformers):
        self._tranformers = tranformers

    def __call__(self, input):
        # Merge the dict returned by each transformer into one output dict.
        output = {}
        for t in self._tranformers:
            output.update(t(input))
        return output

date_tranformer = lambda input: {'date' : input['date'][:10]}
name_tranformer = lambda input: dict(
    zip(('first_name', 'last_name'),
        input['name']))
fee_tranformer = lambda input: {'fee' : sum([input['fee'],
                                             input['tax'],
                                             input['discount']])}
tranformer = MultiTransformer(date_tranformer,
                              name_tranformer,
                              fee_tranformer)
print(tranformer(dict(date='2007-12-22 03:18:99-EST',
                      name='John Smith',
                      fee=30450.99,
                      tax=459.15,
                      discount=985)))
# output
# {'date': '2007-12-22', 'fee': 31895.140000000003,
#  'first_name': 'J', 'last_name': 'o'}


You can see that the MultiTransformer doesn't buy you much by itself;
it just allows dividing the overall task into smaller bits that can be
documented, tested and reused separately. For anything more
sophisticated, you have to constrain which transformations are
possible. I did something similar for
transforming CSV input rows (http://pypi.python.org/pypi/csvutils/) so
that it's easy to specify 1-to-{0,1} transformations but not 1-to-many
or many-to-1.
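
One possible way to constrain the transformations, sketched here purely as an illustration: describe each step as a declarative rule that names its input columns, its target (model, field) pairs, and a function between them. The `Rule` tuple, `apply_rules`, `split_name`, and the sample row are hypothetical names and data, not part of csvutils or of the original poster's code.

```python
from collections import namedtuple

# Hypothetical rule: read `sources` from the input row, pass them to `func`,
# and store the returned tuple under `targets`, a tuple of (model, field) pairs.
Rule = namedtuple("Rule", "sources targets func")


def split_name(full_name):
    # One input column feeding two fields.
    first, _, last = full_name.partition(" ")
    return first, last


RULES = [
    # Filter: keep only the YYYY-MM-DD part of the timestamp.
    Rule(("date",), (("Order", "date"),), lambda d: (d[:10],)),
    # One-to-many: full name to first and last name.
    Rule(("name",),
         (("Order", "shipping_first_name"), ("Order", "shipping_last_name")),
         split_name),
    # Many-to-one: add the three shipping amounts together.
    Rule(("fee", "tax", "discount"),
         (("Order", "shipping_fee"),),
         lambda *parts: (sum(parts),)),
]


def apply_rules(row, rules):
    """Return {model_name: {field: value}} built from one input row."""
    output = {}
    for rule in rules:
        values = rule.func(*(row[key] for key in rule.sources))
        if len(values) != len(rule.targets):
            raise ValueError("rule for %r returned %d values, expected %d"
                             % (rule.sources, len(values), len(rule.targets)))
        for (model, field), value in zip(rule.targets, values):
            output.setdefault(model, {})[field] = value
    return output


row = {"date": "2007-12-22T03:18:59-05:00", "name": "John Smith",
       "fee": 30, "tax": 4, "discount": -5}
mapped = apply_rules(row, RULES)
print(mapped)
```

Grouping the output by model name means a rule could just as well target OrderItem or OrderCharge; turning each inner dict into a model instance is left out of the sketch.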

HTH,
George
 
George Sakkis

name_tranformer = lambda input: dict(
    zip(('first_name', 'last_name'),
        input['name']))

Of course that should read:

name_tranformer = lambda input: dict(
    zip(('first_name', 'last_name'),
        input['name'].split()))
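
One caveat when splitting on whitespace: `zip` stops at the shorter sequence, so a name with more than two words silently loses its tail ("Mary Ann Smith" would give a last name of "Ann"). A hedged sketch of a variant that splits only at the last space follows; `safe_name_transformer` is a hypothetical name, and treating everything before the last space as the first name is an assumption that will not suit every naming convention.

```python
def safe_name_transformer(row):
    # Split once, from the right, so only the final word becomes the last name.
    parts = row["name"].strip().rsplit(None, 1)
    first = parts[0] if parts else ""
    last = parts[1] if len(parts) > 1 else ""
    return {"first_name": first, "last_name": last}


for name in ("John Smith", "Mary Ann Smith", "Cher"):
    print(safe_name_transformer({"name": name}))
```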

George
 
Luke

Thank you, that is very helpful. I will ponder that for a while :)
 
