please help with optimisation of this code - update of given table according to another table

Discussion in 'Python' started by Farraige, Nov 8, 2006.

  1. Farraige

    Farraige Guest

    Hi I need your help...
    I am implementing the method that updates given table (table is
    represented as list of lists of strings) according to other table (some
    kind of merging)...

    This method takes following arguments:
    t1 - table we would like to update
    t2 - table we would like to take data
    from
    keyColumns - list of key indexes e.g. [0,1]
    columnsToBeUpdated - list of column indexes we would like to update in
    our table T1 e.g [2,4]

    Let's say we have a table T1:

    A B C D E
    ---------------
    1 4 5 7 7
    3 4 0 0 0

    and we call a method mergeTable(T1, T2, [0,1], [2,4])

    It means that we would like to update columns C and E of table T1 with
    data from table T2 but only in case the key columns A and B are equal
    in both tables.... I grant that the given key is unique in both tables
    so if I find a row with the same key in table T2 I do merging, stop and
    go to next row in table T1...

    Let's say T2 looks following:

    A B C D E
    ---------------
    2 2 8 8 8
    1 4 9 9 9

    So after execution of our mergeTable method, the table T1 should look
    like :

    A B C D E
    1 4 9 7 9
    3 4 0 0 0

    The 2nd row ['3', '4', '0' ,'0', '0'] didn't change because there was
    no row in table T2 with key = 3 ,4

    The main part of my algorithm now looks something like ...

    merge(t1, t2, keyColumns, columnsToBeUpdated)

    ........

    for row_t1 in t1:
    for row_t2 in t2:
    if [row_t1 for i in keyColumns] == [row_t2[j] for j
    in keyColumns]:
    # the keys are the same
    for colName in columnsToBeUpdated:
    row_t1[colName] = row_t2[colName]

    # go outside the inner loop - we found a row with
    # the same key in the table
    break

    In my algorithm I have 2 for loops and I have no idea how to optimise
    it (maybe with map? )
    I call this method for very large data and the performance is a
    critical issue for me :(

    I will be grateful for any ideas
    Thanks in advance!
     
    Farraige, Nov 8, 2006
    #1
    1. Advertising

  2. In <>, Farraige wrote:

    > Let's say we have a table T1:
    >
    > A B C D E
    > ---------------
    > 1 4 5 7 7
    > 3 4 0 0 0
    >
    > and we call a method mergeTable(T1, T2, [0,1], [2,4])
    >
    > It means that we would like to update columns C and E of table T1 with
    > data from table T2 but only in case the key columns A and B are equal
    > in both tables.... I grant that the given key is unique in both tables
    > so if I find a row with the same key in table T2 I do merging, stop and
    > go to next row in table T1...
    >
    > Let's say T2 looks following:
    >
    > A B C D E
    > ---------------
    > 2 2 8 8 8
    > 1 4 9 9 9
    >
    > So after execution of our mergeTable method, the table T1 should look
    > like :
    >
    > A B C D E
    > 1 4 9 7 9
    > 3 4 0 0 0
    >
    > The 2nd row ['3', '4', '0' ,'0', '0'] didn't change because there was
    > no row in table T2 with key = 3 ,4
    >
    > The main part of my algorithm now looks something like ...
    >
    > merge(t1, t2, keyColumns, columnsToBeUpdated)
    >
    > .......
    >
    > for row_t1 in t1:
    > for row_t2 in t2:
    > if [row_t1 for i in keyColumns] == [row_t2[j] for j
    > in keyColumns]:
    > # the keys are the same
    > for colName in columnsToBeUpdated:
    > row_t1[colName] = row_t2[colName]
    >
    > # go outside the inner loop - we found a row with
    > # the same key in the table
    > break
    >
    > In my algorithm I have 2 for loops and I have no idea how to optimise
    > it (maybe with map? )
    > I call this method for very large data and the performance is a
    > critical issue for me :(


    Just go through the first table once and build a mapping key->row and then
    go through the second table once and look for each row if the key is in
    the mapping. If yes: update columns. This runs in O(2*rows) instead if
    O(rows**2).

    def update_table(table_a, table_b, key_columns, columns_to_be_updated):
    def get_key(row):
    return tuple(row[x] for x in key_columns)

    key2row = dict((get_key(row), row) for row in table_a)
    for row in table_b:
    row_to_be_updated = key2row.get(get_key(row))
    if row_to_be_updated is not None:
    for column in columns_to_be_updated:
    row_to_be_updated[column] = row[column]


    def main():
    table_a = [[1, 4, 5, 7, 7],
    [3, 4, 0, 0, 0]]
    table_b = [[2, 2, 8, 8, 8],
    [1, 4, 9, 9, 9]]
    update_table(table_a, table_b, (0, 1), (2, 4))
    for row in table_a:
    print row

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Nov 8, 2006
    #2
    1. Advertising

  3. On 2006-11-08, Farraige <> wrote:
    >
    > ...
    >
    > The main part of my algorithm now looks something like ...
    >
    > merge(t1, t2, keyColumns, columnsToBeUpdated)
    >
    > .......
    >
    > for row_t1 in t1:
    > for row_t2 in t2:
    > if [row_t1 for i in keyColumns] == [row_t2[j] for j
    > in keyColumns]:
    > # the keys are the same
    > for colName in columnsToBeUpdated:
    > row_t1[colName] = row_t2[colName]
    >
    > # go outside the inner loop - we found a row with
    > # the same key in the table
    > break
    >
    > In my algorithm I have 2 for loops and I have no idea how to optimise
    > it (maybe with map? )
    > I call this method for very large data and the performance is a
    > critical issue for me :(
    >
    > I will be grateful for any ideas


    One idea would be to precompute the list comprehensions in the if test.

    p2 = [[row_t2 for i in keyColums] for row_t2 in t2]
    for row_t1 in t1:
    proj1 = [row_t1 for i in keyColumns]
    for row_t1, proj2 in izip(t2, p2):
    if proj1 == proj2:
    ...


    --
    Antoon Pardon
     
    Antoon Pardon, Nov 8, 2006
    #3
  4. Re: please help with optimisation of this code - update ofgiven table according to another table

    At Wednesday 8/11/2006 07:18, Farraige wrote:

    > for row_t1 in t1:
    > for row_t2 in t2:
    > if [row_t1 for i in keyColumns] == [row_t2[j] for j
    >in keyColumns]:
    > # the keys are the same
    > for colName in columnsToBeUpdated:
    > row_t1[colName] = row_t2[colName]
    >
    > # go outside the inner loop - we found a row with
    > # the same key in the table
    > break
    >
    >In my algorithm I have 2 for loops and I have no idea how to optimise
    >it (maybe with map? )
    >I call this method for very large data and the performance is a
    >critical issue for me :(


    def getkey(row):
    return tuple([row for i in keyColumns])

    Sort both tables using .sort(key=getkey); then traverse both in
    paralell like in classic merge; in 1 pass you get the job done (not
    counting the two initial sorts)


    --
    Gabriel Genellina
    Softlab SRL

    __________________________________________________
    Correo Yahoo!
    Espacio para todos tus mensajes, antivirus y antispam ¡gratis!
    ¡Abrí tu cuenta ya! - http://correo.yahoo.com.ar
     
    Gabriel Genellina, Nov 8, 2006
    #4
  5. Farraige

    Farraige Guest

    Thank you

    Thank you all for your very valuable clues!
    Thanks to you I got nearly 97% perfomance improvement !!!!!!!!!!!!!!! I
    can't believe it :)))
    Once again thank you

    Best wishes
    Farraige
     
    Farraige, Nov 8, 2006
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Lord0
    Replies:
    1
    Views:
    573
    Thomas Weidenfeller
    Apr 19, 2006
  2. chiara
    Replies:
    6
    Views:
    476
    Barry Schwarz
    Oct 6, 2005
  3. Mayank Kaushik
    Replies:
    11
    Views:
    642
  4. 2Barter.net
    Replies:
    0
    Views:
    372
    2Barter.net
    Dec 13, 2006
  5. Casey Hawthorne
    Replies:
    385
    Views:
    5,687
    ng2010
    Apr 4, 2010
Loading...

Share This Page