logic, algorithms, and (Perl) data structures

Discussion in 'Perl Misc' started by ccc31807, Mar 26, 2009.

  1. ccc31807

    ccc31807 Guest

    I've been given an assignment to revive a project that was apparently
    abandoned by the original developers, and I need some help thinking
    through the logic.

    The DB has a Person table with the name, address, etc., of individual
    people. This should have been called the Actors table, but it's too
    late now. The names are identifiers, and I know names are not unique:
    one individual can have several names (J. Smith, Jim Smith, and James
    Smith are all the same person), and one name can be shared by several
    individuals (three people all named Jim Smith), but we are dealing
    with this as I indicate below.

    The DB also has an Accounts table for discrete events. It should have
    been called the Events table, but never mind. This lists events by ID,
    which is a unique identifier. The idea is that an Event (or Account)
    has a primary Actor (Person), with other actors associated with the
    event in one way or another. There are often multiple accounts
    concerning the same person with different ID numbers. For example,
    medical events may record a broken arm, a sprained ankle, and cuts and
    abrasions for the same person in three different records.

    Accounts arrive as CSV files and are inserted into the database
    automatically. Each row contains the data defining the account and
    the primary actor. These rows are not currently split, and all the
    data gets inserted into the Account table. The secondary actors are
    inserted into the Person table.

    I would like to split the account records, inserting the account-
    specific data into the Account table and the person-specific data
    into the Person table. This means associating each account with the
    primary person, which means checking the imported names against the
    Person table. (The fact that we can't uniquely identify the people
    means that the accounts must be worked manually, and the client has a
    staff that already does this. This is a data problem, not a software
    problem, and not one I am responsible for.)
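
    Concretely, the split might look like this (a minimal sketch only;
    Text::CSV is assumed, and the column layout here is made up):

        use strict;
        use warnings;
        use Text::CSV;

        my $csv = Text::CSV->new({ binary => 1 });
        open my $fh, '<', 'accounts.csv' or die "accounts.csv: $!";
        while (my $row = $csv->getline($fh)) {
            # Hypothetical column order: account fields first, then the
            # primary actor's name fields.
            my ($id, $event_data, $last, $first, $middle, $suffix) = @$row;
            my %account = ( id => $id, data => $event_data );
            my %person  = ( last   => $last,   first  => $first,
                            middle => $middle, suffix => $suffix );
            # %account goes to the Account table and %person to the
            # Person table, after the name check described below.
        }
        close $fh;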

    The Person table isn't large now, but we anticipate that it will grow
    to several million records. Each import file has from 20 to 50
    account records, and there can be as many as 50 files per day.

    Each day, I read the Person table into a hash that has elements that
    look like this: %person --> $person{$last}{$first}{$middle}{$suffix}
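
    (Loading that lookup hash is simple with DBI; a sketch, with the
    connection string and column names assumed:)

        use DBI;

        my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'pass',
                               { RaiseError => 1 });
        my %person;
        my $sth = $dbh->prepare(
            'SELECT last, first, middle, suffix FROM Person');
        $sth->execute;
        while (my ($last, $first, $middle, $suffix) = $sth->fetchrow_array) {
            # Mark every known name combination.
            $person{$last}{$first}{$middle}{$suffix} = 1;
        }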

    I split each record into a series of scalars, plus a hash that looks
    like this: %account_person -->
    $account_person{$last}{$first}{$middle}{$suffix}.

    If there isn't a match, the account is inserted into Account and the
    person is inserted into Person. If there is a match, the record is
    flagged for manual handling.
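
    (The check itself is just a nested exists(); a sketch, with
    flag_for_review() and the insert_*() routines as placeholders:)

        # %person is the lookup built from the Person table above;
        # each $rec holds one parsed import row.
        for my $rec (@import_records) {
            my ($l, $f, $m, $s) = @{$rec}{qw(last first middle suffix)};
            if (exists $person{$l}{$f}{$m}{$s}) {
                flag_for_review($rec);   # possible duplicate: manual handling
            }
            else {
                insert_account($rec);
                insert_person($rec);
                $person{$l}{$f}{$m}{$s} = 1;   # keep the lookup current
            }
            # Note: exists() on a nested hash autovivifies the
            # intermediate levels; harmless here, but worth knowing.
        }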

    I've been thinking about Schwartzian Transforms and Guttman-Rosler
    Transforms, and am wondering if this is the best way to handle this.
    I'm not concerned with real-time processing since this is handled
    off-line, but I'd like to make it as efficient as I can.
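
    (For concreteness, the Schwartzian Transform shape I mean is
    decorate, sort, undecorate; here, sorting records by last name:)

        my @sorted =
            map  { $_->[1] }               # undecorate
            sort { $a->[0] cmp $b->[0] }   # sort on the extracted key
            map  { [ $_->{last}, $_ ] }    # decorate with the key
            @records;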

    Thanks, CC.
     
    ccc31807, Mar 26, 2009
    #1

  2. smallpond

    smallpond Guest

    On Mar 26, 9:39 am, ccc31807 <> wrote:

    >
    > I've been thinking about Schwartzian Transforms and Guttman-Rosler
    > Transforms, and am wondering if this is the best way to handle this.
    > I'm not concerned with real-time processing since this is handled
    > off-line, but I'd like to make it as efficient as I can.
    >
    > Thanks, CC.



    My advice:
    1) define the operations you want to do
    2) lay out tables to simplify the operations
    3) write the SQL needed to perform the operations
    4) write the perl glue
    5) test, test, test, test, test
    ....
    97) worry about the efficiency of the perl code
     
    smallpond, Mar 26, 2009
    #2

  3. ccc31807

    ccc31807 Guest

    On Mar 26, 11:00 am, smallpond <> wrote:
    > My advice:
    > 1) define the operations you want to do
    > 2) lay out tables to simplify the operations
    > 3) write the SQL needed to perform the operations
    > 4) write the perl glue
    > 5) test, test, test, test, test
    > ...
    > 97) worry about the efficiency of the perl code


    Absolutely great advice. Two points:

    3.5) write the user interface. Users must interact with the system,
    and I find that writing the interface before I write the code (Perl
    glue) gives me both the input and the output that the code needs to
    handle.

    More importantly, people can be active in different capacities in
    different events, and I'm drowning in complexity. I'd like to be able
    to have ONE (1) table with all my people data. The prior developers
    have about ten tables representing different capacities, with the
    problem that if a person's address or email changes, you've got to
    chase that person over ten (worst case) different tables. My way out
    is to match the incoming names against the existing names, which won't
    tag duplicates but will at least tag uniques. I'll let the client sort
    out the duplicates, and he understands that this isn't a software
    problem.

    Truthfully, this is the first application that I have been involved
    with that has to track people in this way, and I'm having a problem
    dealing with it. Thanks for your help.

    CC
     
    ccc31807, Mar 26, 2009
    #3
  4. Uri Guttman

    Uri Guttman Guest

    >>>>> "c" == ccc31807 <> writes:

    c> I've been thinking about Schwartzian Transforms and Guttman-Rosler
    c> Transforms, and am wondering if this is the best way to handle this.
    c> I'm not concerned with real-time processing since this is handled
    c> off-line, but I'd like to make it as efficient as I can.

    as smallpond has said, worry about efficiency after you get it working
    correctly. but in any case if you do need easier and faster sorting, use
    Sort::Maker which has both. and you can select the sort style you want
    and even print out the generated code for ejimication.
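
    something like this (a quick sketch of the make_sorter interface;
    check the docs for the real details):

        use Sort::Maker;

        # build an ST-style sorter on last name, then first name.
        my $sorter = make_sorter(
            qw( ST ),
            string => '$_->{last}',
            string => '$_->{first}',
        );
        my @sorted = $sorter->( @records );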

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Free Perl Training --- http://perlhunter.com/college.html ---------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Mar 26, 2009
    #4
  5. smallpond

    smallpond Guest

    On Mar 26, 11:20 am, ccc31807 <> wrote:
    > On Mar 26, 11:00 am, smallpond <> wrote:
    >
    > > My advice:
    > > 1) define the operations you want to do
    > > 2) lay out tables to simplify the operations
    > > 3) write the SQL needed to perform the operations
    > > 4) write the perl glue
    > > 5) test, test, test, test, test
    > > ...
    > > 97) worry about the efficiency of the perl code

    >
    > Absolutely great advice. Two points:
    >
    > 3.5) write the user interface. Users must interact with the system,
    > and I find that writing the interface before I write the code (Perl
    > glue) gives me both the input and the output that the code needs to
    > handle.
    >
    > More importantly, people can be active in different capacities in
    > different events, and I'm drowning in complexity. I'd like to be able
    > to have ONE (1) table with all my people data. The prior developers
    > have about ten tables representing different capacities, with the
    > problem that if a person's address or email changes, you've got to
    > chase that person over ten (worst case) different tables. My way out
    > is to match the incoming names against the existing names, which won't
    > tag duplicates but will at least tag uniques. I'll let the client sort
    > out the duplicates, and he understands that this isn't a software
    > problem.
    >
    > Truthfully, this is the first application that I have been involved
    > with that has to track people in this way, and I'm having a problem
    > dealing with it. Thanks for your help.
    >
    > CC


    My point is that the database design will determine 99% of the
    system's stability and efficiency. Ten tables are OK as long
    as they are linked properly - i.e., not by selecting on non-unique
    strings, but by using unique keys. Solve that problem first.

    You are correct that if you have multiple tables indexed by the same
    thing (a person), then they should be combined, especially in any
    environment where you have to worry about multiple processes doing
    updates, like a website.
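
    For example (a sketch only, MySQL-flavored, and not your actual
    schema): give Person a surrogate key and have Account reference it:

        # Link tables by a surrogate integer key, never by name strings.
        $dbh->do(q{
            CREATE TABLE Person (
                person_id INTEGER PRIMARY KEY AUTO_INCREMENT,
                last      VARCHAR(64),
                first     VARCHAR(64),
                middle    VARCHAR(64),
                suffix    VARCHAR(16)
            )
        });
        $dbh->do(q{
            CREATE TABLE Account (
                account_id INTEGER PRIMARY KEY AUTO_INCREMENT,
                person_id  INTEGER NOT NULL REFERENCES Person(person_id),
                event_data TEXT
            )
        });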
     
    smallpond, Mar 26, 2009
    #5
