perl - data structure build to transpose data

Discussion in 'Perl Misc' started by shree, Aug 29, 2004.

  1. shree

    shree Guest

    Hi,

    I have been asked to transpose a data file extracted from an Excel
    report and saved as a .txt file. It lists time (MonthYear) in the
    header (first row). The data consists of blocks of 3 lines per
    supplier. In the example extract shown below, for Jan-04, total items
    supplied by supplier 1 were 1000, of which 200 were defective,
    giving a defect ratio of 20%. I need to read in this data file and
    output a file whose format I can best illustrate with the example shown
    below. Please note that the output file's last column shows the MonthID. If
    the data were to begin with Feb-04 and go until July-04, instead of
    Jan-04 to Mar-04 as shown below, then Feb-04 would be 1, Mar-04 2, and
    so on.

    Anyway, I'm struggling to work out how to build a data structure to
    transform the data into the desired output file. Any pointers or code
    snippets would be greatly appreciated, and I thank you in advance.

    Best wishes,
    Shree

    Sample Data file in.dat
    Jan-04 Feb-04 Mar-04
    Supp1 % 20.00% 10.17% 7.14%
          Defects 200 122 100
          Total 1000 1200 1400
    Supp2 % 3.00% 1.82% 1.90%
          Defects 60 40 40
          Total 2000 2200 2100

    Desired Output file out.txt
    Supp1 % 20.00% Jan-04 1
    Supp1 Defects 200 Jan-04 1
    Supp1 Total 1000 Jan-04 1
    Supp1 % 10.17% Feb-04 2
    Supp1 Defects 122 Feb-04 2
    Supp1 Total 1200 Feb-04 2
    Supp1 % 7.14% Mar-04 3
    Supp1 Defects 100 Mar-04 3
    Supp1 Total 1400 Mar-04 3
    Supp2 % 3.00% Jan-04 1
    Supp2 Defects 60 Jan-04 1
    Supp2 Total 2000 Jan-04 1
    Supp2 % 1.82% Feb-04 2
    Supp2 Defects 40 Feb-04 2
    Supp2 Total 2200 Feb-04 2
    Supp2 % 1.90% Mar-04 3
    Supp2 Defects 40 Mar-04 3
    Supp2 Total 2100 Mar-04 3
    shree, Aug 29, 2004
    #1

  2. shree

    wfsp Guest

    "shree" <> wrote in message
    news:...
    <snip question>


    This produces the output you indicated. If you do need a structure (e.g. a
    hash of hashes) you could build it instead of using the for loop.
    perlreftut, perldsc and perllol provide everything you need to construct
    complex data structures.

    #!/bin/perl5

    use strict;
    use warnings;

    # The first line holds the month names.
    chomp( my $header = <DATA> );
    my @months = split ' ', $header;
    my $month = $#months;    # index of the last month
    # Each supplier occupies three lines: %, Defects, Total.
    while ( ! eof(DATA) ){
        my @pc = split ' ', <DATA>;
        my $supplier = shift @pc;
        my @defects = split ' ', <DATA>;
        my @total = split ' ', <DATA>;
        # Element 0 of each array is the row label; 1..n are the values.
        for (my $i=1;$i<=$month+1;$i++){
            my $month = $months[$i-1];
            print $supplier, "\t",
                  $pc[0], "\t",
                  $pc[$i], "\t",
                  $month, "\t",
                  $i, "\n",
                  $supplier, "\t",
                  $defects[0], "\t",
                  $defects[$i], "\t",
                  $month, "\t",
                  $i, "\n",
                  $supplier, "\t",
                  $total[0], "\t",
                  $total[$i], "\t",
                  $month, "\t",
                  $i, "\n";
        }
    }

    __DATA__
    Jan-04 Feb-04 Mar-04
    Supp1 % 20.00% 10.17% 7.14%
    Defects 200 122 100
    Total 1000 1200 1400
    Supp2 % 3.00% 1.82% 1.90%
    Defects 60 40 40
    Total 2000 2200 2100
    wfsp, Aug 29, 2004
    #2
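wfsp mentions that a hash of hashes could replace the print loop. A minimal sketch of that idea, assuming the input is already in memory as an array of lines (the names `@lines`, `%data` and `@suppliers` are illustrative, not from the thread). Each value is stored as `$data{$supplier}{$type}[$i]`, and the structure is then walked in the order of the desired output.

```perl
use strict;
use warnings;

# Sample input held in an array so the sketch is self-contained
# (the posts in this thread read the same lines from DATA).
my @lines = (
    "Jan-04 Feb-04 Mar-04\n",
    "Supp1 % 20.00% 10.17% 7.14%\n",
    "Defects 200 122 100\n",
    "Total 1000 1200 1400\n",
    "Supp2 % 3.00% 1.82% 1.90%\n",
    "Defects 60 40 40\n",
    "Total 2000 2200 2100\n",
);

my @months = split ' ', shift @lines;

# %data is a hash of hashes of arrays:
#   $data{$supplier}{$type}[$i] holds the value for $months[$i]
my ( %data, @suppliers );
while ( @lines >= 3 ) {    # a trailing incomplete group is ignored here
    my ( $pc, $defects, $total ) = splice @lines, 0, 3;
    my ( $supplier, @pc_row ) = split ' ', $pc;    # @pc_row = ('%', values)
    push @suppliers, $supplier;                    # keep input order
    for my $row ( \@pc_row, [ split ' ', $defects ], [ split ' ', $total ] ) {
        my ( $type, @values ) = @$row;
        $data{$supplier}{$type} = \@values;
    }
}

# Walk the structure in the order the desired output asks for.
my @out;
for my $supplier (@suppliers) {
    for my $i ( 0 .. $#months ) {
        for my $type ( '%', 'Defects', 'Total' ) {
            push @out, join "\t", $supplier, $type,
                $data{$supplier}{$type}[$i], $months[$i], $i + 1;
        }
    }
}
print "$_\n" for @out;
```

Because `split ' '` skips leading whitespace, this sketch works whether or not the Defects and Total rows are indented.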

  3. shree

    wfsp Guest

    <snip question>
    > This produces the output you indicated. If you do need a structure (e.g. a
    > hash of hashes) you could build it instead of using the for loop.
    > [...]
    > my $month = $#months;
    > [...]
    > for (my $i=1;$i<=$month+1;$i++){
    > my $month = $months[$i-1];
    > [...]

    Poor choice of variable names!
    I should have said:
    > my $month_count = $#months;

    and:
    > for (my $i=1;$i<=$month_count+1;$i++){

    It works as is but looking at it again it's clearer this way.
    wfsp, Aug 29, 2004
    #3
  4. wfsp <> wrote:

    > I should have said:
    >> my $month_count = $#months;


    >> for (my $i=1;$i<=$month_count+1;$i++){

    > It works as is but looking at it again it's clearer this way.



    foreach my $i ( 1 .. @months )

    would be clearer yet.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Aug 29, 2004
    #4
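Tad's loop works because the range operator evaluates `@months` in scalar context, which yields the number of elements, so `$i` is already the 1-based MonthID. A short sketch, with the month list hard-coded for illustration:

```perl
use strict;
use warnings;

my @months = qw(Jan-04 Feb-04 Mar-04);

# 1 .. @months counts from 1 to the element count (3 here),
# whereas $#months is the last index (2 here).
my @pairs;
foreach my $i ( 1 .. @months ) {
    push @pairs, "$months[$i - 1] $i";    # 1-based ID back to 0-based index
}
print "$_\n" for @pairs;
```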
  5. (shree) wrote:

    > [sample data and desired output snipped]


    Yet another way:

    use strict;
    use warnings;

    chomp( my @months = split ' ', <DATA> );
    while (not eof DATA) {

        my (%supplier, $supplier_name);
        for ( 1..3 ) {
            # split /\s+/ (not ' ') keeps the leading empty field on the
            # indented Defects and Total lines, so $name is '' there
            my ($name, $type, @data) = split /\s+/, <DATA>;
            $supplier_name = $name if $name;
            @{$supplier{$type}}{@months} = @data;
        }

        my $month_num;
        for my $month ( @months ) {
            $month_num++;
            for my $type (sort keys %supplier) {
                print join( "\t",
                    $supplier_name,
                    $type,
                    $supplier{$type}{$month},
                    $month,
                    $month_num
                ),
                "\n";
            }
        }
    }


    __DATA__
    Jan-04 Feb-04 Mar-04
    Supp1 % 20.00% 10.17% 7.14%
          Defects 200 122 100
          Total 1000 1200 1400
    Supp2 % 3.00% 1.82% 1.90%
          Defects 60 40 40
          Total 2000 2200 2100
    David K. Wall, Aug 29, 2004
    #5
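The key line in David's solution is the hash slice `@{$supplier{$type}}{@months} = @data;`. This sketch isolates it on one hard-coded input line and compares it with the equivalent element-by-element loop; `%long` is an illustrative name only.

```perl
use strict;
use warnings;

my @months = qw(Jan-04 Feb-04 Mar-04);
my %supplier;

my ( $name, $type, @data ) = split /\s+/, "Supp2 % 3.00% 1.82% 1.90%\n";

# Hash slice through a reference: $supplier{$type} springs into existence
# as a hash ref, and each month in @months is paired with the value in the
# same position of @data.
@{ $supplier{$type} }{@months} = @data;

# The same assignment written out element by element, for comparison.
my %long;
$long{$type}{ $months[$_] } = $data[$_] for 0 .. $#months;

print "$_ => $supplier{$type}{$_}\n" for @months;
```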
  6. shree

    Anno Siegel Guest

    shree <> wrote in comp.lang.perl.misc:
    > [...]
    > Anyway, I'm struggling on thoughts of how to build a data structure to
    > transform the data into the desired output file. Any pointers, code
    > snippets will be greatly appreciated and I thank you in advance.


    It would be nice to see a snippet of your code, or even of your
    thoughts about the problem. Just dumping the problem description
    *you* got to the newsgroup is frowned upon. You haven't even begun
    an analysis.

    > Sample Data filein.dat
    > Jan-04 Feb-04 Mar-04
    > Supp1 % 20.00% 10.17% 7.14%
    > Defects 200 122 100
    > Total 1000 1200 1400
    > Supp2 % 3.00% 1.82% 1.90%
    > Defects 60 40 40
    > Total 2000 2200 2100


    So the first data line is special and gives you the months to expect.
    Assuming the data is in DATA, get the list of months and generate the
    MonthIDs like this (all untested):

    my ( @months, %month_id);
    @months = split for ( scalar <DATA> );
    @month_id{ @months} = 1 .. @months;

    Also set an output format at this point, you'll need it later:

    my $ofmt = "%-5s %-7s %-7s %-6s %2d\n";

    You can save yourself the effort of setting this up and use my
    module Text::Table instead, but the format will do.

    Now you can process the following lines in groups of three:

    line: while ( 1 ) {
        my ( $supp, %supp_data);
        for ( scalar <DATA> ) { # get one line in $_
            last line unless defined; # regular end of file
            ( $supp, undef, @{ $supp_data{ '%'}}{ @months}) = split;
        }
        for ( scalar <DATA> ) {
            die "data error 2" unless defined;
            ( undef, @{ $supp_data{ Defects}}{ @months}) = split;
        }
        for ( scalar <DATA> ) {
            die "data error 3" unless defined;
            ( undef, @{ $supp_data{ Total}}{ @months}) = split;
        }

    There should probably be data checks in a real program, besides the
    end-of-file test I provided. Anyway, now you have collected all data
    for one supplier and can print it in any format you want, for instance
    this:

        for my $month ( @months) {
            for ( qw( % Defects Total) ) {
                printf $ofmt, $supp, $_, $supp_data{ $_}->{ $month},
                    $month, $month_id{ $month};
            }
        }
    }


    > Desired Ouput file out.txt


    [snipped]

    Anno
    Anno Siegel, Aug 29, 2004
    #6
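Anno's `%month_id` hash also covers the case raised in the question, where the data starts at a later month. A minimal sketch, assuming a hypothetical Feb-04 to Jul-04 header, plus his `$ofmt` format applied to the first line of the original sample:

```perl
use strict;
use warnings;

# Hypothetical header for the question's case where the data starts at Feb-04.
my @months = split ' ', "Feb-04 Mar-04 Apr-04 May-04 Jun-04 Jul-04\n";

# Hash slice: the first month listed gets ID 1, whatever its name.
my %month_id;
@month_id{@months} = 1 .. @months;

# Anno's fixed-width format, applied to the Supp1 / % / Jan-04 values
# from the original sample.
my $ofmt = "%-5s %-7s %-7s %-6s %2d\n";
my $row  = sprintf $ofmt, 'Supp1', '%', '20.00%', 'Jan-04', 1;

print "$_ => $month_id{$_}\n" for @months;
print $row;
```

The IDs depend only on position in the header, so no calendar arithmetic is needed.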
  7. shree

    Anno Siegel Guest

    bowsayge <> wrote in comp.lang.perl.misc:
    > shree said to us:
    >
    > [...]
    > > Anyway, I'm struggling on thoughts of how to build a data structure to
    > > transform the data into the desired output file. Any pointers, code
    > > snippets will be greatly appreciated and I thank you in advance.

    > [...]
    >
    > This isn't pretty, but it is one way of doing it.
    >
    > local $_;
    > my ($current, %supp, @months);
    >
    > while ($_ = <STDIN>) {
    >     chomp;
    >     if (@months < 1) {
    >         s/(\w{3}-\d{2})/push @months, $1; ''/eg;
    >     } else {
    >         if (/^([a-zA-Z0-9]+)\s+\%\s+/) {
    >             my $sn = $1;
    >             push @{$supp{suppliers}}, $sn;
    >             s/^.*?%\s+//;
    >             s/([\d\.\%]+)/push @{$supp{"$sn,\%"}}, $1; ''/eg;
    >             $current = $sn;
    >         } elsif (/^\s+Defects/) {
    >             s/^\D+//;
    >             s/(\d+)\s*/push @{$supp{"$current,defects"}}, $1; ''/eg;
    >         } elsif (/^\s+Total/) {
    >             s/^\D+//;
    >             s/(\d+)\s*/push @{$supp{"$current,total"}}, $1; ''/eg;
    >         }
    >     }
    > }
    >
    > foreach my $sn (@{$supp{suppliers}}) {
    >     foreach my $no (0..$#months) {
    >         my $pref = $supp{"$sn,%"};
    >         my $dref = $supp{"$sn,defects"};
    >         my $tref = $supp{"$sn,total"};
    >         my $suffix = "\t$months[$no]\t@{[ $no + 1 ]}\n";
    >         print "$sn\t\%\t$pref->[$no]$suffix";
    >         print "$sn\tDefects\t$dref->[$no]$suffix";
    >         print "$sn\tTotal\t$tref->[$no]$suffix";
    >     }
    > }
    >
    > __END__


    I'll believe you, for one because I know you test your programs :)

    > EXPLANATION:
    > The list of months is grabbed from the first line.
    >
    > Then a hash is created that contains a list of supplier names. The hash also
    > is built up to contain the defect percentages, the number of defects and
    > the totals.
    >
    > When it's time to create the output, the program iterates over the
    > list of suppliers. For each supplier, the program iterates over the
    > months, outputting the various statistics for that month.


    I haven't analyzed your program to the last statement, but I have
    some remarks.

    It is much more general than necessary, in that it could read the
    input data in any sequence and produce the right output. Even the
    title line (which defines the expected months) could be buried
    anywhere, if I'm not mistaken.

    I interpret the OP's sample data to say that there is a title line
    and then a sequence of groups of three, all formatted alike. It
    is easier to read the file that way, expecting from each line a
    given format. You can also handle each supplier as soon as you have
    read the three lines, so you don't have to keep everything in memory.
    With your approach, you will have to do that.

    I'm also not too happy about your way to do serious data processing in
    an s///e expression. This approach can be powerful, but it's hard to
    follow, and it's not needed here. The data is far better split() (on
    white space) first. Then the fields can be processed as needed.

    The rule (known as Randal's Rule) is: if you know what to keep, use
    a match; if you know what to throw away, use split. "Know" can
    be translated as "know the simpler regex for". Here, the default
    split on white space is the obvious choice.

    > The program doesn't do exactly what you want, since it gets input from
    > STDIN and outputs to STDOUT, but you can easily adjust it.


    That's a minor point. Example programs on Usenet (in Perl) routinely
    print to STDOUT, and read from DATA or STDIN.

    > Now watch someone convert this into a one-liner :)


    Hardly. I have posted another solution (before I saw yours), that
    takes the three-lines-at-a-time approach. You will note that it
    takes some effort to deal with end-of-file correctly. That is typical
    for this way of reading a file in groups of n lines and is a drawback.

    Anno
    Anno Siegel, Aug 29, 2004
    #7
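Randal's Rule applied to this data, as a sketch with the inputs hard-coded for illustration: split and a list-context match extract the same month names from the header, and split separates a data row into its labels and values.

```perl
use strict;
use warnings;

my $header = "Jan-04 Feb-04 Mar-04\n";
my $row    = "Supp1 % 20.00% 10.17% 7.14%\n";

# You know what to throw away (the whitespace): use split.
my @by_split = split ' ', $header;

# You know what to keep (tokens shaped like Mon-YY): use a match.
# In list context, m//g returns every captured group it finds.
my @by_match = $header =~ /(\w{3}-\d{2})/g;

# Fields of a data row: split on whitespace, then peel off the two labels.
my ( $supplier, $type, @values ) = split ' ', $row;

print "@by_split\n@by_match\n$supplier $type @values\n";
```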
  8. -berlin.de (Anno Siegel) wrote:

    > It is much more general than necessary, in that it could read the
    > input data in any sequence and produce the right output. Even the
    > title line (which defines the expected months) could be buried
    > anywhere, if I'm not mistaken.


    That's a valid criticism of my post as well, since it will produce the
    "correct" output as long as the three lines for title/percent, defects, and
    total are all grouped together. I'll admit I was thinking of this as a
    strength, because another line could be added to a supplier record with
    minimal changes to the code. But in your post you mentioned data checks,
    something I hadn't considered since I assumed the data was okay. It's good to
    see another viewpoint.

    Maybe if it were MY data I would have been more careful. :)
    David K. Wall, Aug 30, 2004
    #8
  9. shree

    Anno Siegel Guest

    David K. Wall <> wrote in comp.lang.perl.misc:
    > (shree) wrote:


    [specifications]

    > Yet another way:


    Shree got lucky, undeservedly.

    > use strict;
    > use warnings;
    >
    > chomp( my @months = split ' ', <DATA> );
    > while (not eof DATA) {
    >
    > my (%supplier, $supplier_name);
    > for ( 1..3 ) {


    This could use

    die "data error $_" if eof;

    > my ($name, $type, @data) = split /\s+/, <DATA>;
    > $supplier_name = $name if $name;
    > @{$supplier{$type}}{@months} = @data;
    > }


    Ah, nice common format for all data records. BTW, the default
    split on ' ' wouldn't do, because it skips initial white space,
    suppressing leading empty fields. Did I ever mention that split()
    is too clever for its own good? :)

    This combines part of bowsayge's flexibility WRT line sequence
    with sequential processing in groups of three. The lines could
    be permuted within each supplier and would end up in the right
    place.

    > my $month_num;
    > for my $month ( @months ) {
    > $month_num++;
    > for my $type (sort keys %supplier) {
    > print join( "\t",
    > $supplier_name,
    > $type,
    > $supplier{$type}{$month},
    > $month,
    > $month_num
    > ),
    > "\n";
    > }
    > }
    > }


    I would... umm, did use a pre-assigned hash %month_num (or somesuch),
    instead of counting $month_num each time. It takes some clutter out
    of the print loop.

    > __DATA__


    [snipped]

    I notice that you used the exact same data structure that I used to store
    the data for each supplier. I wouldn't be amazed if you'd come up with it
    independently, it seems inherent to the problem. I haven't analyzed
    bowsayge's, but aside from some (unnecessary) trick with appending
    key parts, it appears similar.

    Anno
    Anno Siegel, Aug 30, 2004
    #9
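The difference between the two forms of split is easy to see on a single row. This sketch assumes, as David's and Anno's code does, that a Defects row starts with whitespace where the supplier name would be.

```perl
use strict;
use warnings;

# A Defects row with an empty supplier-name column, i.e. leading whitespace.
my $line = "      Defects 200 122 100\n";

my @awk   = split ' ', $line;     # special pattern ' ': leading whitespace is skipped
my @regex = split /\s+/, $line;   # a real regex: the leading empty field is kept

# Both forms drop the trailing empty field created by the final "\n".
my ( $name, $type, @data ) = @regex;    # $name is '', $type is 'Defects'

printf "split ' '    gives %d fields: %s\n", scalar @awk,   join '|', @awk;
printf "split /\\s+/ gives %d fields: %s\n", scalar @regex, join '|', @regex;
```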
  10. I wrote:

    > my (%supplier, $supplier_name);
    > for ( 1..3 ) {
    > my ($name, $type, @data) = split /\s+/, <DATA>;
    > $supplier_name = $name if $name;
    > @{$supplier{$type}}{@months} = @data;
    > }


    Looking at this again after responding to Anno, I recall just why I used a
    for loop: my original (unposted) solution handled each line separately, but
    when I looked at it I realized I was repeating basically the same code three
    times. I didn't like that, so I condensed it.

    (Can you tell it's Sunday evening and I'm bored? :)
    David K. Wall, Aug 30, 2004
    #10
  11. shree

    Anno Siegel Guest

    David K. Wall <> wrote in comp.lang.perl.misc:
    > -berlin.de (Anno Siegel) wrote:
    >
    > > It is much more general than necessary, in that it could read the
    > > input data in any sequence and produce the right output. Even the
    > > title line (which defines the expected months) could be buried
    > > anywhere, if I'm not mistaken.

    >
    > That's a valid criticism of my post as well, since it will produce the
    > "correct" output as long as the three lines for title/percent, defects, and
    > total are all grouped together. I'll admit I was thinking of this as a
    > strength, because another line could be added to a supplier record with
    > minimal changes to the code. But in your post you mentioned data checks,
    > something I hadn't considered since I assumed the data was okay. It's good to
    > see another viewpoint.


    Your assumption may well be justified. I seem to remember the data
    is produced by some other program, some spreadsheet or the like.
    Whatever it is, it won't be in the habit of producing incomplete data or
    changing the format at whim.

    I don't think my criticism applies as much as it does to bowsayge's.
    If I got it right, the lines could be in any permutation, including
    the title line. That is way too much generality.

    Allowing for permutations in each group of three (or n) doesn't
    disturb the sequential processing of groups, which is my main
    objection to the generality. Unneeded generality is a bonus
    if it comes at no cost, as it does with your solution. In fact,
    it enables the processing of all lines in a loop. My solution,
    which expects a fixed sequence, has to unroll the loop.

    > Maybe if it were MY data I would have been more careful. :)


    I've been slightly wondering about the purpose of this exercise.
    It seems to put the data back into a rawer, more redundant form,
    one it might have had before it entered the spreadsheet.

    But it makes a nice example.

    Anno
    Anno Siegel, Aug 30, 2004
    #11
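Processing each group of n lines in one loop, with the end-of-file check Anno suggested, might look like the sketch below. It reads from an in-memory filehandle instead of DATA, uses a hypothetical `$group_size` variable for n, and swaps Supp2's Defects and Total rows on purpose to show that order within a group does not matter.

```perl
use strict;
use warnings;

# Sample input with indented Defects/Total rows; Supp2's last two rows
# are deliberately swapped.
my $input = join '', map { "$_\n" } (
    'Jan-04 Feb-04 Mar-04',
    'Supp1 % 20.00% 10.17% 7.14%',
    '      Defects 200 122 100',
    '      Total 1000 1200 1400',
    'Supp2 % 3.00% 1.82% 1.90%',
    '      Total 2000 2200 2100',
    '      Defects 60 40 40',
);

# An in-memory filehandle (core Perl since 5.8) stands in for DATA.
open my $fh, '<', \$input or die "cannot open in-memory file: $!";

my $group_size = 3;    # hypothetical name: lines per supplier record
my @months = split ' ', scalar <$fh>;
my %by_supplier;

until ( eof $fh ) {
    my ( $supplier, %record );
    for my $n ( 1 .. $group_size ) {
        # A file that ends mid-record is a data error, not a normal end.
        die "incomplete record: line $n of $group_size missing\n" if eof $fh;
        my ( $name, $type, @data ) = split /\s+/, scalar <$fh>;
        $supplier = $name if length $name;
        @{ $record{$type} }{@months} = @data;
    }
    $by_supplier{$supplier} = \%record;
}
close $fh;

for my $supp ( sort keys %by_supplier ) {
    for my $type ( sort keys %{ $by_supplier{$supp} } ) {
        print join( "\t", $supp, $type, $by_supplier{$supp}{$type}{'Jan-04'} ), "\n";
    }
}
```

Adding a fourth line per supplier would only mean changing `$group_size`, which is the flexibility David describes.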
  12. -berlin.de (Anno Siegel) wrote:

    > David K. Wall <> wrote in comp.lang.perl.misc:
    >> (shree) wrote:

    >
    > [specifications]
    >
    >> Yet another way:

    >
    > Shree got lucky, undeservedly.


    I'm bored.

    >> while (not eof DATA) {
    >>
    >> my (%supplier, $supplier_name);
    >> for ( 1..3 ) {

    >
    > This could use
    >
    > die "data error $_" if eof;


    Yup.

    >
    >> my ($name, $type, @data) = split /\s+/, <DATA>;
    >> $supplier_name = $name if $name;
    >> @{$supplier{$type}}{@months} = @data;
    >> }

    >
    > Ah, nice common format for all data records. BTW, the default
    > split on ' ' wouldn't do, because it skips initial white space,
    > suppressing leading empty fields. Did I ever mention that split()
    > is too clever for its own good? :)


    Yeah. Not using the default split on whitespace was deliberate: I *wanted*
    that empty field. :)

    > This combines part of bowsayge's flexibility WRT line sequence
    > with sequential processing in groups of three. The lines could
    > be permuted within each supplier and would end up in the right
    > place.


    I've already admitted that in another post. Our posts must have crossed
    during propagation through Usenet.

    > I would... umm, did use a pre-assigned hash %month_num (or somesuch),
    > instead of counting $month_num each time. It takes some clutter out
    > of the print loop.


    That was nice. Wish I'd thought of it.

    > I notice that you used the exact same data structure that I used to
    > store the data for each supplier. I wouldn't be amazed if you'd come up
    > with it independently, it seems inherent to the problem.


    Yeah, it was independent: it seemed the natural way to do it.

    > I haven't
    > analyzed bowsayge's, but aside from some (unnecessary) trick with
    > appending key parts, it appears similar.


    I didn't actually read his; I didn't feel like digging into the s/// stuff.
    It just gave me the urge to write something easier to read. :)
    David K. Wall, Aug 30, 2004
    #12
  13. shree

    Anno Siegel Guest

    David K. Wall <> wrote in comp.lang.perl.misc:
    > -berlin.de (Anno Siegel) wrote:
    >
    > > David K. Wall <> wrote in comp.lang.perl.misc:
    > >> (shree) wrote:

    > >
    > > [specifications]
    > >
    > >> Yet another way:

    > >
    > > Shree got lucky, undeservedly.

    >
    > I'm bored.


    To summarize (time for bed here, so I'm in a summarizing mood) and
    to take the chance to plug Text::Table for real, I've put the parts
    together. It's still far from a one-liner, but it's about as compact
    as it gets while trying to keep it readable.

    Anno

    use Text::Table;
    my $tb = Text::Table->new( ( '&left') x 5);
    my ( @months, %month_id);
    @months = split for ( scalar <DATA> );
    @month_id{ @months} = 1 .. @months;
    while ( ! eof DATA ) {
        my ( $supp_name, %supp_data);
        for ( 1 .. 3 ) {
            die "bad data $_" if eof;
            my ( $name, $type, @data) = split /\s+/, <DATA>;
            $supp_name = $name if $name;
            @{$supp_data{$type}}{@months} = @data;
        }
        for my $month ( @months) {
            for ( sort keys %supp_data ) {
                $tb->add( $supp_name, $_, $supp_data{ $_}->{ $month},
                          $month, $month_id{ $month});
            }
        }
    }
    print $tb;

    __DATA__
    Jan-04 Feb-04 Mar-04
    Supp1 % 20.00% 10.17% 7.14%
          Defects 200 122 100
          Total 1000 1200 1400
    Supp2 % 3.00% 1.82% 1.90%
          Defects 60 40 40
          Total 2000 2200 2100
    Anno Siegel, Aug 30, 2004
    #13
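If Text::Table is not installed, a rough core-Perl substitute for left-aligned columns is to measure each column and build a sprintf format. This is not Text::Table's API, only a sketch of the same idea using `List::Util`'s `max`; the rows are copied from the sample output.

```perl
use strict;
use warnings;
use List::Util qw(max);

# A few output rows from the thread's sample data.
my @rows = (
    [ 'Supp1', '%',       '20.00%', 'Jan-04', 1 ],
    [ 'Supp1', 'Defects', '200',    'Jan-04', 1 ],
    [ 'Supp1', 'Total',   '1000',   'Jan-04', 1 ],
);

# Width of each column = length of its longest cell.
my @width = map {
    my $col = $_;
    max map { length $_->[$col] } @rows;
} 0 .. $#{ $rows[0] };

# Left-justify every column and separate columns with one space.
my $fmt = join( ' ', map { "%-${_}s" } @width ) . "\n";
my @out = map { sprintf $fmt, @$_ } @rows;
print @out;
```

Unlike the fixed `$ofmt` in Anno's earlier post, the widths here adapt to the data, at the cost of holding all rows in memory before printing.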
  14. shree

    shree Guest

    -berlin.de (Anno Siegel) wrote in message news:<cgtifu$44j$-Berlin.DE>...
    > [snipped]


    Hello All,

    Let me begin by saying a big 'thank you' to all; I really
    appreciate your help and the time you took. I tried out the examples
    you guys shared and every one of them gave the expected result. I
    apologize for not posting my attempts earlier; I didn't, primarily
    because I felt I was clueless.

    I was wondering if anyone can suggest a Perl book or website that's
    filled with illustrative examples of problems involving complex data
    structures. To me, this is the hardest concept to grasp.

    Thank you,
    Shree
    shree, Aug 30, 2004
    #14
  15. [ please do not top-post. TIA ]

    shree wrote:
    >
    > I was wondering if anyone can suggest a perl book or website thats
    > filled with illustrative examples of problems involving complex data
    > structures. To me, this is the hardest concept to grasp.


    You should probably check out this book and see if it meets your requirements:
    http://www.oreilly.com/catalog/maperl/index.html


    John
    --
    use Perl;
    program
    fulfillment
    John W. Krahn, Aug 30, 2004
    #15
  16. "John W. Krahn" <> wrote:

    > shree wrote:
    >>
    >> I was wondering if anyone can suggest a perl book or website thats
    >> filled with illustrative examples of problems involving complex data
    >> structures. To me, this is the hardest concept to grasp.

    >
    > You should probably check out this book and see if it meets your
    > requirements: http://www.oreilly.com/catalog/maperl/index.html


    On a less advanced level, you (shree) should probably read -- if you haven't
    already -- these parts of the Perl docs: perldsc, perllol, perlreftut, and
    perlref. (Did I miss any?) "The Perl Cookbook" is full of useful tips; you
    might like it, too. I'd also recommend "Learning Perl Objects, References &
    Modules", even though it's not exactly what you asked for.

    See http://learn.perl.org/ for these and other recommendations.

    Oh, and possibly search the Google archives of comp.lang.perl.misc for such
    phrases as "hash of hashes", "hash of arrays", "array of hashes", and so on.
    Lots of good stuff in the archives -- and lots of crap, too, some of it
    posted by me.
    David K. Wall, Aug 31, 2004
    #16
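For the three shapes David suggests searching for, here is a small sketch using the Supp1 totals from the sample data; the variable names are illustrative only.

```perl
use strict;
use warnings;

my @months = qw(Jan-04 Feb-04 Mar-04);

# Hash of hashes: supplier => { month => total }
my %hoh = ( Supp1 => { 'Jan-04' => 1000, 'Feb-04' => 1200, 'Mar-04' => 1400 } );

# Hash of arrays: supplier => [ totals in the same order as @months ]
my %hoa = ( Supp1 => [ 1000, 1200, 1400 ] );

# Array of hashes: one record per output line; +{ ... } tells Perl the
# braces build an anonymous hash rather than open a block.
my @aoh = map {
    +{
        supplier => 'Supp1',
        type     => 'Total',
        value    => $hoa{Supp1}[$_],
        month    => $months[$_],
        id       => $_ + 1,
    }
} 0 .. $#months;

print join( "\t", @{$_}{qw(supplier type value month id)} ), "\n" for @aoh;
```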