Complex data structures and variable scope

Discussion in 'Perl Misc' started by Henry Law, Sep 9, 2005.

  1. Henry Law

    I wonder if someone can help me with working out where to define my
    variables so as to give the desired result here...

    Personnel data is to be read from a file. Attributes for each
    employee are presented as name/value pairs; some pairs occur once
    only, but each person can have more than one qualification. The data
    is to be accumulated in a hash, keyed by the person's name; each
    person's hash contains their attributes, plus a reference to an array
    containing their qualifications. This enables me (using other
    programming not relevant here) to look up any employee by name and
    present their attributes and qualifications.
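For reference, the target structure described above (a hash keyed by name, each value a hash reference whose `quals` entry is an array reference) can be written out literally. This is a sketch using the values from the sample data further down; `$who` and the printing loop are illustrative only and are not part of the original program.

```perl
use strict;
use warnings;

# The intended result, written out by hand (values from the sample data).
my %people = (
    alice => {
        name       => 'alice',
        occupation => 'analyst',
        born       => 'aberdeen',
        quals      => [ 'a1', 'a2' ],
    },
    bob => {
        name       => 'bob',
        occupation => 'baker',
        status     => 'broken',
        quals      => [ 'b1', 'b2' ],
    },
);

# Look up one employee by name, then present attributes and qualifications.
my $who   = $people{alice};
my @attrs = grep { $_ ne 'quals' } keys %$who;
for my $attr ( sort @attrs ) {
    print "$attr: $who->{$attr}\n";
}
print "quals: @{ $who->{quals} }\n";
```

Each person is reached with a single hash lookup, and the qualifications are dereferenced with `@{ ... }`.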

    Here's a runnable program which shows where I've got to. It gives the
    wrong results and I'm trying to work out how to fix it.

    --------------- <start of example> -----------------
    #! C:\Perl\bin\perl.exe

    use strict;
    use warnings;
    use Data::Dumper;

    my %people;
    my %attrs;
    my @quals;

    while (my $rec = <DATA>) {
        #my %attrs;
        #my @quals;
        chomp $rec;
        if ($rec) {
            my ($name,$value) = split /:/, $rec;
            if ($name eq "qual") {
                push @quals,$value;
                next;
            }
            $attrs{$name} = $value;
        } else { # End of person
            $attrs{quals} = \@quals;
            $people{$attrs{name}} = \%attrs;
            #undef %attrs;
            #undef @quals;
        }
    }
    print Dumper(%people);

    __END__
    name:alice
    occupation:analyst
    born:aberdeen
    qual:a1
    qual:a2

    name:bob
    occupation:baker
    status:broken
    qual:b1
    qual:b2
    ---------------- <end of example> ------------------

    This results in
    $VAR1 = 'alice';
    $VAR2 = {
              'quals' => [
                           'a1',
                           'a2',
                           'b1',
                           'b2'
                         ],
              'status' => 'broken',
              'name' => 'bob',
              'born' => 'aberdeen',
              'occupation' => 'baker'
            };

    I can see what's wrong: %attrs and @quals have scope outside the
    file-read loop, so when the program gets to Bob's records they either
    add to or replace Alice's. But I can't define them inside the read
    loop because they then disappear and reappear with each record (you
    can see where I tried this in the comments). I've also tried
    undef-ing the arrays at the end of each person (also as shown),
    without getting the desired result.

    Can someone make some suggestions? I'm prepared to structure the
    whole program differently but the data format is a given.
    --

    Henry Law <>< Manchester, England
     
    Henry Law, Sep 9, 2005
    #1

  2. Anno Siegel

    Henry Law <> wrote in comp.lang.perl.misc:
    > I wonder if someone can help me with working out where to define my
    > variables so as to give the desired result here...
    >
    > Personnel data is to be read from a file. Attributes for each
    > employee are presented as name/value pairs; some pairs occur once
    > only, but each person can have more than one qualification. The data
    > is to be accumulated in a hash, keyed by the person's name; each
    > person's hash contains their attributes, plus a reference to an array
    > containing their qualifications. This enables me (using other
    > programming not relevant here) to look up any employee by name and
    > present their attributes and qualifications.
    >
    > Here's a runnable program which shows where I've got to. It gives the
    > wrong results and I'm trying to work out how to fix it.


    I haven't completely analyzed your code. The two remarks I added are
    inessential to your problem.

    > --------------- <start of example> -----------------
    > #! C:\Perl\bin\perl.exe
    >
    > use strict;
    > use warnings;
    > use Data::Dumper;
    >
    > my %people;
    > my %attrs;
    > my @quals;
    >
    > while (my $rec = <DATA>) {
    >     #my %attrs;
    >     #my @quals;
    >     chomp $rec;
    >     if ($rec) {
    >         my ($name,$value) = split /:/, $rec;
    >         if ($name eq "qual") {
    >             push @quals,$value;
    >             next;
    >         }
    >         $attrs{$name} = $value;
    >     } else { # End of person
    >         $attrs{quals} = \@quals;
    >         $people{$attrs{name}} = \%attrs;
    >         #undef %attrs;
    >         #undef @quals;


    The normal way to clear an aggregate is by setting it to (), so the
    commented code should be

    %attrs = ();
    @quals = ();

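    Undef'ing does work as expected (it empties them too), but it also
    releases their storage outright, which is normally not what you want.
    Note, too, that neither form cures the actual problem: \%attrs and
    \@quals always refer to the same two variables, so clearing them
    after storing the references also clears the data already stored for
    the previous person.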
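To see why clearing shared variables cannot fix the problem on its own, here is a small demonstration (a sketch, not code from the thread). It stores references to a shared `%attrs` and `@quals`, clears them, and shows that the stored entry is emptied as well. It then shows a different technique from the paragraph-mode restructuring discussed below: storing copies built with the anonymous constructors `{ %attrs }` and `[ @quals ]`, which survive the clearing. `%copies` is an illustrative name.

```perl
use strict;
use warnings;

my %people;
my %attrs;
my @quals;

# Pitfall: store references to the shared aggregates, then clear them.
%attrs = ( name => 'alice' );
@quals = ( 'a1', 'a2' );
$attrs{quals} = \@quals;
$people{ $attrs{name} } = \%attrs;
%attrs = ();    # also empties the hash that $people{alice} points to
@quals = ();
print( exists $people{alice}{name} ? "alice kept\n" : "alice lost\n" );

# Alternative: store copies instead of references to the shared variables.
# (Meanwhile $people{alice} still points at %attrs, so it briefly holds
# bob's data below: the original bug in miniature.)
my %copies;
%attrs = ( name => 'bob' );
@quals = ( 'b1', 'b2' );
$copies{ $attrs{name} } = { %attrs, quals => [@quals] };  # new hash, new array
%attrs = ();
@quals = ();
print "bob's quals: @{ $copies{bob}{quals} }\n";
```

The copy approach keeps the line-wise loop intact but copies each person's data once; building a fresh `my %person` per person, as suggested in the thread, avoids the copy.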

    >     }
    > }
    > print Dumper(%people);


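    That dump is a bit misleading. Data::Dumper is meant to be given
    references; passing %people itself flattens the hash into a list of
    keys and values, and each element is dumped as a separate $VARn. So
    that should be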

    print Dumper \ %people;
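A minimal sketch of the difference, using a one-entry hash. `Sortkeys` is set only to make the key order stable; `$flat` and `$whole` are illustrative names.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order, for readability

my %people = ( alice => { name => 'alice' } );

# Passing the hash flattens it into a list ('alice', {...}),
# so Dumper prints two unrelated variables, $VAR1 and $VAR2.
my $flat = Dumper(%people);

# Passing a reference dumps the whole structure as a single $VAR1.
my $whole = Dumper( \%people );

print $flat, "----\n", $whole;
```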

    > __END__
    > name:alice
    > occupation:analyst
    > born:aberdeen
    > qual:a1
    > qual:a2
    >
    > name:bob
    > occupation:baker
    > status:broken
    > qual:b1
    > qual:b2
    > ---------------- <end of example> ------------------
    >
    > This results in
    > $VAR1 = 'alice';
    > $VAR2 = {
    >           'quals' => [
    >                        'a1',
    >                        'a2',
    >                        'b1',
    >                        'b2'
    >                      ],
    >           'status' => 'broken',
    >           'name' => 'bob',
    >           'born' => 'aberdeen',
    >           'occupation' => 'baker'
    >         };


    With the corrected dump the result looks a little more plausible:

    $VAR1 = {
              'alice' => {
                           'status' => 'broken',
                           'born' => 'aberdeen',
                           'name' => 'bob',
                           'quals' => [
                                        'a1',
                                        'a2',
                                        'b1',
                                        'b2'
                                      ],
                           'occupation' => 'baker'
                         }
            };

    though that's still not what you want.

    > I can see what's wrong: %attrs and @quals have scope outside the
    > file-read loop, so when the program gets to Bob's records they either
    > add to or replace Alice's. But I can't define them inside the read
    > loop because they then disappear and reappear with each record (you
    > can see where I tried this in the comments). I've also tried
    > undef-ing the arrays at the end of each person (also as shown),
    > without getting the desired result.


    One difficulty is that you are reading the input line-wise but
    processing it by paragraphs. This is always a bit tricky, because
    some passes through the loop are different from the others. Reading
    the file paragraph-wise simplifies things, because each pass through
    the loop processes one person. In particular, you can now declare a
    lexical hash %person in the loop body. A fresh %person is created on
    each pass through the loop, so each reference stored in %people
    points to a different hash (which wouldn't work if each person
    required multiple passes).

    my %people;
    $/ = ''; # paragraph mode
    while ( <DATA> ) {
        my %person;
        for ( split /\n/ ) { # process all lines for one person
            my ( $key, $val) = split /:/;
            if ( $key eq 'qual' ) {
                push @{ $person{ quals}}, $val;
            } else {
                $person{ $key} = $val;
            }
        }
        $people{ $person{ name}} = \ %person;
    }

    print Dumper \ %people;

    __DATA__
    (etc)
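For experimenting, here is a self-contained variant of the same paragraph-mode loop. It assumes the data is held in a string (`$input`) opened as an in-memory filehandle instead of DATA, and it makes two small changes that are not in the code above: `local $/` confines paragraph mode to one block, and `split` is given a limit of 2 so that a value containing a colon stays intact.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample input held in a string so the sketch runs on its own.
my $input = <<'END';
name:alice
occupation:analyst
born:aberdeen
qual:a1
qual:a2

name:bob
occupation:baker
status:broken
qual:b1
qual:b2
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my %people;
{
    local $/ = '';    # paragraph mode, restored when this block exits
    while ( my $para = <$fh> ) {
        my %person;   # a fresh hash for each paragraph, i.e. each person
        for my $line ( split /\n/, $para ) {
            my ( $key, $val ) = split /:/, $line, 2;
            if ( $key eq 'qual' ) {
                push @{ $person{quals} }, $val;
            }
            else {
                $person{$key} = $val;
            }
        }
        $people{ $person{name} } = \%person;
    }
}
close $fh;

$Data::Dumper::Sortkeys = 1;
print Dumper( \%people );
```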

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Sep 9, 2005
    #2

  3. Henry Law

    On 9 Sep 2005 10:05:19 GMT, -berlin.de (Anno
    Siegel) wrote:

    >Reading
    >the file paragraph-wise simplifies things because each loop processes
    >one person.


    That's the key suggestion; I hadn't even considered trying paragraphs.

    >In particular, you can now declare a lexical hash %person
    >in the loop body. It will be re-used each time through the loop (which
    >wouldn't work if each person required multiple passes).


    .... which is precisely what I need.

    > my %people;

    .... followed by code that does exactly what I want.

    > __DATA__
    > (etc)


    Thank you, Anno. How good it is that you're in a European time zone,
    so I don't have to wait for Sinan, Paul, etc. to wake up :)
    --

    Henry Law <>< Manchester, England
     
    Henry Law, Sep 9, 2005
    #3
  4. A. Sinan Unur

    Henry Law <> wrote in
    news::

    > --------------- <start of example> -----------------
    > #! C:\Perl\bin\perl.exe


    I have nothing useful to add to Anno's response, but I am just going to
    point out that you do not need the shebang line above on Windows.

    It is convenient to use

    #!/usr/bin/perl

    in case you later want to move the script to a *nix system. In setups I
    have worked with, that tends to be symlinked to the system default perl.

    Note that on Windows you can still specify switches (such as -w) on
    that shebang line; perl will respect them.


    Hope this helps.

    Sinan
    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Sep 9, 2005
    #4
  5. Anno Siegel

    Henry Law <> wrote in comp.lang.perl.misc:
    > On 9 Sep 2005 10:05:19 GMT, -berlin.de (Anno
    > Siegel) wrote:
    >
    > >Reading
    > >the file paragraph-wise simplifies things because each loop processes
    > >one person.

    >
    > That's the key suggestion; I hadn't even considered trying paragraphs.


    The key is really to have the loop operate one person at a time. Then
    "my %person" recreates %person in sync with how it is consumed into
    %people. Paragraph mode is a convenient way to do that, given the
    data format.
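A small sketch of that behaviour (the names are illustrative): a hash declared with `my` inside the loop body is a different hash on each pass, as long as a reference to the previous one is still being held.

```perl
use strict;
use warnings;

my %people;
for my $name (qw(alice bob)) {
    my %person = ( name => $name );    # 'my' executes on every pass
    # %people still holds a reference to the previous pass's hash,
    # so perl gives this pass a new hash instead of clearing the old one.
    $people{$name} = \%person;
}

# Two distinct hashes, each keeping its own data.
print "distinct\n" if $people{alice} != $people{bob};
print "$people{alice}{name} $people{bob}{name}\n";    # alice bob
```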

    An alternative would be an inner loop that builds one person (well,
    a %person, which is somewhat less of a challenge). The outer loop
    would only ever read the first line of each person, like this:

    my %people;
    while ( <DATA> ) {
        my %person;
        while ( defined ) {
            chomp;
            last unless length;
            my ( $key, $val) = split /:/;
            if ( $key eq 'qual' ) {
                push @{ $person{ quals}}, $val;
            } else {
                $person{ $key} = $val;
            }
            $_ = <DATA>;
        }
        $people{ $person{ name}} = \ %person;
    }
    __DATA__
    ...

    It could be streamlined so that all input is done in the inner loop.
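A sketch of one way that streamlined version might look. It assumes the input is in a string (`$input`) read through an in-memory filehandle; the outer loop only checks for end of file, the inner loop does all the reading, and the `exists` guard (an addition, not in the code above) skips the empty record that consecutive blank lines would otherwise produce.

```perl
use strict;
use warnings;

my $input = "name:alice\nqual:a1\nqual:a2\n\nname:bob\nqual:b1\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my %people;
until ( eof($fh) ) {
    my %person;
    # All reading happens here; a blank line or end of file ends a person.
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        last unless length $line;
        my ( $key, $val ) = split /:/, $line, 2;
        if ( $key eq 'qual' ) {
            push @{ $person{quals} }, $val;
        }
        else {
            $person{$key} = $val;
        }
    }
    # Skip empty records, e.g. from consecutive blank lines.
    $people{ $person{name} } = \%person if exists $person{name};
}
close $fh;
```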

    Anno
     
    Anno Siegel, Sep 9, 2005
    #5