extracting properties of companies with a tag for company number

Discussion in 'Perl Misc' started by Vumani Dlamini, Nov 1, 2003.

  1. I would like to extract properties of companies from a huge text data
    set. The data is structured as follows;

    ##### data #########
    Area=3706
    Company=101
    PROPdes1=1 # description/type of property
    PROPpri1=2 # public/private
    PROPemp1=54 # number of employees
    PROPdes2=6
    PROPpri2=2
    PROPemp2=23
    ###################

    I would like to create data like,
    3706|101|1|1|2|54
    3706|101|2|6|2|23

    where column 3 corresponds to the property tag, attached to each
    variable corresponding to a particular property.

    There are a lot more properties per company in my data set and thus I
    opted to loop over that tag; but my code gives errors where those tags
    are. Am not sure what I am missing.

    ##### Perl script ######
    use strict;
    open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
    my ($Area , $Comp, $i, $Pdes, $Ppri, $Pemp);
    open PRIVATE, ">c:/.../private.txt";
    while (<DATA>){
    if (/Area=(\d+)/) {
    $Area = $1;
    }
    elsif (/Company=(\d+)/) {
    $Comp = $1;
    }
    # Loop over properties by the same company (not more than 5 in
    this data set)
    for ($i = 1; $i<= 5;$i++){
    # Each of the variable has a postfix for the property
    number
    elsif (/PROPdes($i)=(\d+)/) { # ERROR OCCURS
    $Pdes = $1;
    }
    elsif (/PROPpri($i)=(\d+)/) { # ERROR OCCURS
    $Ppri = $1;
    }
    elsif (/PROPemp(\d+)c=(\d+)/) {
    print PRIVATE "$Area$Comp$i$Pdes$Ppri$1\n";
    }
    }
    }
    ##### Perl script ######


    Thanks, Vumani
     
    Vumani Dlamini, Nov 1, 2003
    #1

  2. Bob Walton (Guest)

    Vumani Dlamini wrote:

    > I would like to extract properties of companies from a huge text data
    > set. The data is structured as follows;
    >
    > ##### data #########
    > Area=3706
    > Company=101
    > PROPdes1=1 # description/type of property
    > PROPpri1=2 # public/private
    > PROPemp1=54 # number of employees
    > PROPdes2=6
    > PROPpri2=2
    > PROPemp2=23
    > ###################
    >
    > I would like to create data like,
    > 3706|101|1|1|2|54
    > 3706|101|2|6|2|23
    >
    > where column 3 corresponds to the property tag, attached to each
    > variable corresponding to a particular property.
    >
    > There are a lot more properties per company in my data set and thus I
    > opted to loop over that tag; but my code gives errors where those tags
    > are. Am not sure what I am missing.
    >
    > ##### Perl script ######
    > use strict;
    > open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
    > my ($Area , $Comp, $i, $Pdes, $Ppri, $Pemp);
    > open PRIVATE, ">c:/.../private.txt";
    > while (<DATA>){
    > if (/Area=(\d+)/) {
    > $Area = $1;
    > }
    > elsif (/Company=(\d+)/) {
    > $Comp = $1;
    > }
    > # Loop over properties by the same company (not more than 5 in
    > this data set)
    > for ($i = 1; $i<= 5;$i++){
    > # Each of the variable has a postfix for the property
    > number
    > elsif (/PROPdes($i)=(\d+)/) { # ERROR OCCURS
    > $Pdes = $1;
    > }
    > elsif (/PROPpri($i)=(\d+)/) { # ERROR OCCURS
    > $Ppri = $1;
    > }
    > elsif (/PROPemp(\d+)c=(\d+)/) {
    > print PRIVATE "$Area$Comp$i$Pdes$Ppri$1\n";
    > }
    > }
    > }
    > ##### Perl script ######

    ....


    Well, for starters, the code you supplied doesn't compile, even after
    the wrapped commentary is fixed (hint: fix that too). Fix the
    compilation problem up and try again. Most folks here don't like to
    waste their time guessing at what your real code might have been. It
    would also be helpful to place your sample input data after a __END__
    line and omit the open DATA,..., and, instead of opening an output file,
    set the PRIVATE filehandle so it outputs to STDOUT. That way anyone can
    cut/paste/run your code with no further fussing, and you'll get more and
    better responses :).

    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
     
    Bob Walton, Nov 1, 2003
    #2

  3. Vumani Dlamini wrote:
    >
    > elsif (/PROPdes($i)=(\d+)/) { # ERROR OCCURS


    Should be:

    if (/PROPdes$i=(\d+)/) { # ERROR OCCURS

    You may not start a conditional construct with 'elsif'.
    No parentheses surrounding the $i variable, or else you capture the
    value of $i at the next line, which is not what you want.

    Your code includes a couple of other bugs, but this should be enough
    to help you fix them by yourself.
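
    For readers following along, here is a minimal sketch of what the
    loop might look like with those fixes applied: the chain starts with
    'if', $i is interpolated without capturing parentheses so that $1 is
    the value after '=', the stray 'c' in the PROPemp pattern is gone,
    and the fields are joined with '|'. The helper name extract_rows and
    the in-memory sample are assumptions made for illustration; this is
    not the poster's final script.

```perl
use strict;
use warnings;

# Hypothetical helper: reads "key=value" lines from a filehandle and
# returns one "Area|Company|tag|des|pri|emp" string per property.
sub extract_rows {
    my ($fh) = @_;
    my ($area, $comp, $pdes, $ppri);
    my @rows;
    while (my $line = <$fh>) {
        if ($line =~ /^Area=(\d+)/) {
            $area = $1;
        }
        elsif ($line =~ /^Company=(\d+)/) {
            $comp = $1;
        }
        else {
            # Not more than 5 properties per company, as stated in the post.
            for my $i (1 .. 5) {
                # ${i} is interpolated into the pattern, not captured,
                # so $1 is always the number after '='.
                if ($line =~ /^PROPdes${i}=(\d+)/) {
                    $pdes = $1;
                }
                elsif ($line =~ /^PROPpri${i}=(\d+)/) {
                    $ppri = $1;
                }
                elsif ($line =~ /^PROPemp${i}=(\d+)/) {
                    # PROPemp is the last field of a property: emit the row.
                    push @rows, join '|', $area, $comp, $i, $pdes, $ppri, $1;
                }
            }
        }
    }
    return @rows;
}

# Sample data from the original post, read through an in-memory filehandle
# so the sketch runs without any external file.
my $sample = <<'END_SAMPLE';
Area=3706
Company=101
PROPdes1=1 # description/type of property
PROPpri1=2 # public/private
PROPemp1=54 # number of employees
PROPdes2=6
PROPpri2=2
PROPemp2=23
END_SAMPLE

open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
print "$_\n" for extract_rows($in);
```

    Run against the sample data, this prints 3706|101|1|1|2|54 and
    3706|101|2|6|2|23. If the parentheses around $i are kept instead,
    the value moves to $2, which is the change Vumani describes in the
    next post.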

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Nov 2, 2003
    #3
  4. > You may not start a conditional construct with 'elsif'.
    > No parentheses surrounding the $i variable, or else you capture the
    > value of $i at the next line, which is not what you want.


    This did the trick. Just changed the first 'elsif' to 'if' and
    everything worked. Also had to change the captured variable to $2.

    > Your code includes a couple of other bugs, but this should be enough
    > to help you fix them by yourself.


    Maybe, I don't seem to know exactly how to ask the questions, but I
    felt this time I had a lot of detail???

    Thanks a lot.


    Vumani
     
    Vumani Dlamini, Nov 2, 2003
    #4
  5. Vumani Dlamini wrote:
    >> You may not start a conditional construct with 'elsif'. No
    >> parentheses surrounding the $i variable, or else you capture the
    >> value of $i at the next line, which is not what you want.

    >
    > This did the trick. Just changed the first 'elsif' to 'if' and
    > everything worked. Also had to change the captured variable to $2.


    I doubt that the last elsif statement matched:

    > elsif (/PROPemp(\d+)c=(\d+)/) {
    ----------------------^
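
    A quick way to check this is to try both patterns against one of the
    sample lines. The snippet below is only an illustrative sketch; the
    variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = 'PROPemp1=54 # number of employees';

# Pattern as posted: it requires a literal 'c' before '=', which never
# occurs in the sample data, so the match fails.
my $as_posted = $line =~ /PROPemp(\d+)c=(\d+)/ ? 1 : 0;

# Same pattern without the stray 'c': $1 is the property tag and
# $2 is the number of employees.
my ($tag, $employees) = $line =~ /PROPemp(\d+)=(\d+)/;

print "as posted matched: $as_posted\n";
print "fixed: tag=$tag employees=$employees\n";
```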

    >> Your code includes a couple of other bugs, but this should be
    >> enough to help you fix them by yourself.

    >
    > Maybe, I don't seem to know exactly how to ask the questions, but I
    > felt this time I had a lot of detail???


    Personally I think that the level of detail is fine. Bob gave you some
    good advice, even if I think he missed that the fact that the code
    didn't compile was the reason why you asked for help.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
     
    Gunnar Hjalmarsson, Nov 2, 2003
    #5
  6. Bob Walton (Guest)

    Gunnar Hjalmarsson wrote:

    > Vumani Dlamini wrote:

    ....
    > Personally I think that the level of detail is fine. Bob gave you some
    > good advice, even if I think he missed that the fact that the code
    > didn't compile was the reason why you asked for help.
    >


    Yeah, these days I go into auto-rant mode when posted code doesn't
    compile, assuming the poster retyped it instead of copying and pasting.
    I think this is the first posting I've seen where the question was
    actually what was causing a compilation error. My auto-rant could, of
    course, have been avoided if the poster had stated which error he was
    getting.

    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
     
    Bob Walton, Nov 5, 2003
    #6
  7. Vumani Dlamini wrote:
    >
    > I would like to extract properties of companies from a huge text data
    > set. The data is structured as follows;
    >
    > ##### data #########
    > Area=3706
    > Company=101
    > PROPdes1=1 # description/type of property
    > PROPpri1=2 # public/private
    > PROPemp1=54 # number of employees
    > PROPdes2=6
    > PROPpri2=2
    > PROPemp2=23
    > ###################
    >
    > I would like to create data like,
    > 3706|101|1|1|2|54
    > 3706|101|2|6|2|23
    >
    > where column 3 corresponds to the property tag, attached to each
    > variable corresponding to a particular property.
    >
    > There are a lot more properties per company in my data set and thus I
    > opted to loop over that tag; but my code gives errors where those tags
    > are. Am not sure what I am missing.


    This seems to do what you want:

    #!/usr/bin/perl
    use warnings;
    use strict;

    open DATA, 'c:/../properties.txt' or die "Unable to open c:/../properties.txt: $!";
    open PRIVATE, '>c:/.../private.txt' or die "Unable to open c:/.../private.txt: $!";

    my %data;
    my @head = qw( Area Company );
    my @rest = qw( record PROPdes PROPpri PROPemp );

    while ( <DATA> ) {
    my ( $name, $record, $num ) = /(\S+?)(\d+)?=(\d+)/ or next;
    $data{ $name } = $num;
    $data{ record } = $record if defined $record;
    if ( keys( %data ) == @head + @rest ) {
    print PRIVATE join( '|', @data{ @head, @rest } ), "\n";
    delete @data{ @rest };
    }
    }

    __END__
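
    The single pattern in the script above handles both kinds of line
    because the non-greedy (\S+?) leaves any trailing digits for the
    optional (\d+)? group. The following sketch shows how a few lines
    are split; the helper split_line is hypothetical and only wraps the
    same pattern for demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern /(\S+?)(\d+)?=(\d+)/.
# Returns (name, record, value), or an empty list if the line has no
# "name=digits" part. record is undef when the name has no numeric suffix.
sub split_line {
    my ($line) = @_;
    my ($name, $record, $value) = $line =~ /(\S+?)(\d+)?=(\d+)/
        or return;
    return ($name, $record, $value);
}

for my $line ('Area=3706', 'PROPdes2=6', 'PROPemp12=54', '##### data #########') {
    my @parts = split_line($line);
    if (@parts) {
        my ($name, $record, $value) = @parts;
        my $shown = defined $record ? $record : '(none)';
        print "name=$name record=$shown value=$value\n";
    }
    else {
        print "skipped: $line\n";
    }
}
```

    Lines such as Area=3706 come back with an undefined record, which is
    why the script only updates $data{record} when $record is defined.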


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Nov 9, 2003
    #7
