extracting properties of companies with a tag for company number

Vumani Dlamini

I would like to extract properties of companies from a huge text data
set. The data is structured as follows:

##### data #########
Area=3706
Company=101
PROPdes1=1 # description/type of property
PROPpri1=2 # public/private
PROPemp1=54 # number of employees
PROPdes2=6
PROPpri2=2
PROPemp2=23
###################

I would like to create output like this:
3706|101|1|1|2|54
3706|101|2|6|2|23

where column 3 is the property number, i.e. the numeric suffix attached
to each variable name belonging to that property.

There are many more properties per company in my data set, so I opted
to loop over that number, but my code gives errors on the lines that use
it. I am not sure what I am missing.

##### Perl script ######
use strict;
open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
my ($Area , $Comp, $i, $Pdes, $Ppri, $Pemp);
open PRIVATE, ">c:/.../private.txt";
while (<DATA>){
if (/Area=(\d+)/) {
$Area = $1;
}
elsif (/Company=(\d+)/) {
$Comp = $1;
}
# Loop over properties by the same company (not more than 5 in
this data set)
for ($i = 1; $i<= 5;$i++){
# Each of the variable has a postfix for the property
number
elsif (/PROPdes($i)=(\d+)/) { # ERROR OCCURS
$Pdes = $1;
}
elsif (/PROPpri($i)=(\d+)/) { # ERROR OCCURS
$Ppri = $1;
}
elsif (/PROPemp(\d+)c=(\d+)/) {
print PRIVATE "$Area$Comp$i$Pdes$Ppri$1\n";
}
}
}
##### Perl script ######


Thanks, Vumani
 
Bob Walton

Well, for starters, the code you supplied doesn't compile, even after
the wrapped commentary is fixed (hint: fix that too). Fix the
compilation problem up and try again. Most folks here don't like to
waste their time guessing at what your real code might have been. It
would also be helpful to place your sample input data after a __END__
line and omit the open DATA,..., and, instead of opening an output file,
set the PRIVATE filehandle so it outputs to STDOUT. That way anyone can
cut/paste/run your code with no further fussing, and you'll get more and
better responses :).
 
Gunnar Hjalmarsson

Vumani said:
elsif (/PROPdes($i)=(\d+)/) { # ERROR OCCURS

Should be:

if (/PROPdes$i=(\d+)/) { # ERROR OCCURS

You may not start a conditional construct with 'elsif'.
No parentheses surrounding the $i variable, or else you capture the
value of $i at the next line, which is not what you want.

Your code includes a couple of other bugs, but this should be enough
to help you fix them by yourself.
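
To see the effect of the parentheses in isolation, here is a minimal sketch. The sample line follows the format in the question, and the variable names ($line, $num_a, $val_a, $val_b) are made up for illustration only. With ($i) in the pattern, the interpolated property number becomes capture group 1 and the value moves to group 2.

```perl
use strict;
use warnings;

my $line = 'PROPdes2=6';   # sample line in the format from the question
my $i    = 2;              # property number being looked for

# Parentheses around $i create an extra capture group:
# $1 holds the property number and the value ends up in $2.
my ($num_a, $val_a) = $line =~ /PROPdes($i)=(\d+)/;

# Without the parentheses the value is the only capture, so it is $1.
my ($val_b) = $line =~ /PROPdes$i=(\d+)/;

print "with parens:    \$1=$num_a \$2=$val_a\n";
print "without parens: \$1=$val_b\n";
```

Either form works as long as the code reads the capture that actually holds the value.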
 
Vumani Dlamini

Gunnar said:
You may not start a conditional construct with 'elsif'.
No parentheses surrounding the $i variable, or else you capture the
value of $i at the next line, which is not what you want.

This did the trick. Just changed the first 'elsif' to 'if' and
everything worked. Also had to change the captured variable to $2.

Gunnar said:
Your code includes a couple of other bugs, but this should be enough
to help you fix them by yourself.

Maybe, I don't seem to know exactly how to ask the questions, but I
felt this time I had a lot of detail???

Thanks a lot.


Vumani
 
Gunnar Hjalmarsson

Vumani said:
This did the trick. Just changed the first 'elsif' to 'if' and
everything worked. Also had to change the captured variable to $2.

I doubt that the last elsif statement matched:
elsif (/PROPemp(\d+)c=(\d+)/) {
--------------------^
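
A quick check of that pattern against a sample line from the question illustrates the doubt. This is only a sketch; the variable names ($line, $posted_matches, $num, $emp) are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'PROPemp1=54 # number of employees';

# Pattern as posted: requires a literal 'c' between the number and '='.
my $posted_matches = $line =~ /PROPemp(\d+)c=(\d+)/ ? 1 : 0;

# Same pattern without the stray 'c'.
my ($num, $emp) = $line =~ /PROPemp(\d+)=(\d+)/;

print "posted pattern matches: $posted_matches\n";
print "property $num has $emp employees\n";
```

Since the data has no 'c' after the property number, the posted pattern never matches and the print statement in that branch is never reached.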


Vumani said:
Maybe, I don't seem to know exactly how to ask the questions, but I
felt this time I had a lot of detail???

Personally I think that the level of detail is fine. Bob gave you some
good advice, even if I think he missed that the fact that the code
didn't compile was the reason why you asked for help.
 
Bob Walton

Gunnar said:
Vumani Dlamini wrote: ....
Personally I think that the level of detail is fine. Bob gave you some
good advice, even if I think he missed that the fact that the code
didn't compile was the reason why you asked for help.

Yeah, these days I go into auto-rant mode when posted code doesn't
compile, assuming the poster retyped it instead of copying and pasting.
I think this is the first posting I've seen where the question was
actually what was causing a compilation error. My auto-rant could, of
course, have been avoided if the poster had stated what error he was
getting.
 
John W. Krahn

This seems to do what you want:

#!/usr/bin/perl
use warnings;
use strict;

open DATA, 'c:/../properties.txt' or die "Unable to open c:/../properties.txt: $!";
open PRIVATE, '>c:/.../private.txt' or die "Unable to open c:/.../private.txt: $!";

my %data;
my @head = qw( Area Company );                    # kept across a company's properties
my @rest = qw( record PROPdes PROPpri PROPemp );  # reset after each property

while ( <DATA> ) {
    # Split "NAME<number>=VALUE" into name, optional property number and value.
    my ( $name, $record, $num ) = /(\S+?)(\d+)?=(\d+)/ or next;
    $data{ $name } = $num;
    $data{ record } = $record if defined $record;
    # Once every field is present, print the row and clear the property fields.
    if ( keys( %data ) == @head + @rest ) {
        print PRIVATE join( '|', @data{ @head, @rest } ), "\n";
        delete @data{ @rest };
    }
}

__END__
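
For anyone who wants to try the approach without the input file, here is a minimal sketch of the same logic reading the sample records from a string. The in-memory filehandle and the names $sample, $in and @rows are additions for illustration and are not part of the script above; the rows are collected in an array and then printed to STDOUT.

```perl
use strict;
use warnings;

# Sample records from the question, held in a string instead of a file.
my $sample = <<'END_SAMPLE';
Area=3706
Company=101
PROPdes1=1 # description/type of property
PROPpri1=2 # public/private
PROPemp1=54 # number of employees
PROPdes2=6
PROPpri2=2
PROPemp2=23
END_SAMPLE

my @head = qw( Area Company );
my @rest = qw( record PROPdes PROPpri PROPemp );
my ( %data, @rows );

# Open the string as an in-memory, read-only filehandle.
open my $in, '<', \$sample or die "Cannot read sample data: $!";
while ( my $line = <$in> ) {
    my ( $name, $record, $num ) = $line =~ /(\S+?)(\d+)?=(\d+)/ or next;
    $data{$name}  = $num;
    $data{record} = $record if defined $record;
    if ( keys(%data) == @head + @rest ) {
        push @rows, join '|', @data{ @head, @rest };
        delete @data{@rest};
    }
}
close $in;

print "$_\n" for @rows;
```

On the sample above this prints 3706|101|1|1|2|54 and 3706|101|2|6|2|23, one per line. A row is only emitted once all three PROP lines of a property have been seen, so the sketch assumes every property in the input is complete.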


John
 
