extracting text data in the presence of a "look-up" file: Is it possible?

Discussion in 'Perl Misc' started by Vumani Dlamini, Jan 7, 2004.

  1. This problem follows up on a couple of problems I sent to the list 2
    months back. The data is structured as follows:

    ##### data #########
    Area=3706
    Company=101
    PROPdes=1 # description/type of property
    PROPpri=2 # public/private
    PROPemp=54 # number of employees
    PROPdes=6
    PROPpri=2
    PROPemp=23
    Company=106
    PROPdes=4
    PROPpri=2
    PROPemp=56
    Area=3709
    Company=116
    PROPdes=9
    PROPpri=1
    PROPemp=200
    ###################

    And the data set created is:
    3706|101|1|2|054
    3706|101|6|2|023
    3706|106|4|2|056
    3709|116|9|1|200

    using the following Perl script:
    ##### Perl script ######
    use strict;
    use warnings;
    open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
    my ($Area , $Comp, $Pdes, $Ppri, $Pemp);
    open PRIVATE, ">c:/.../private.txt";
    while (<DATA>){
        if (/Area=(\d+)/) {
            $Area = $1;
        }
        elsif (/Company=(\d+)/) {
            $Comp = $1;
        }
        elsif (/PROPdes=(\d+)/) {
            $Pdes = $1;
        }
        elsif (/PROPpri=(\d+)/) {
            $Ppri = $1;
        }
        elsif (/PROPemp=(\d+)/) {
            print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";
        }
    }
    }
    ##### Perl script ######

    I now have an "area text file" that identifies the companies to be
    extracted: each row in the "area text file" holds an area code. I would
    like to extract only the companies in areas listed in the "area text
    file".

    If, among the areas in the "area text file", I am only interested in
    those with more than 10 companies, is it possible to write a script
    that uses all this information?

    Thanks a lot, again.

    Vumani Dlamini

    PS: My previous posts related to this problem can be found here:
    http://groups.google.nl/groups?hl=
    http://groups.google.nl/groups?q=
    http://groups.google.nl/groups?q=
    Vumani Dlamini, Jan 7, 2004
    #1

  2. Tore Aursand

    Re: extracting text data in the presence of a "look-up" file: Is it possible?

    On Wed, 07 Jan 2004 11:43:09 -0800, Vumani Dlamini wrote:
    > [...]
    > And the data set created is;
    > 3706|101|1|2|054
    > 3706|101|6|2|023
    > 3706|106|4|2|056
    > 3709|116|9|1|200
    >
    > [...]
    >
    > I now have a "area text file" with specific companies that have to be
    > extracted, with each row in the "area text file" having a code for an
    > area. I would like to extract companies only in areas listed in the
    > "area text file".
    >
    > If within the areas in the "area text file" I am only interested in
    > areas with more than 10 companies, is it possible to write a script
    > which utilizes all this information?


    If I understand your problem correctly, you could use a hash to do that:

    my %areas = ();
    while ( <DATA> ) {
        chomp;
        my ($area, $company, @tmp) = split( /\Q|\E/ );
        push( @{$areas{$area}}, $company );
    }

    foreach ( keys %areas ) {
        if ( @{$areas{$_}} > 10 ) {
            print "Area $_ has more than 10 companies\n";
        }
    }
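    A caveat about the loop above: it pushes one entry per output row, so a
    company with several properties is counted several times (with the
    sample rows, area 3706 gets three entries although only companies 101
    and 106 are present). A minimal sketch that counts distinct companies
    with a hash of hashes instead, assuming the pipe-separated format shown
    above; the sub name areas_with_many_companies is made up for
    illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a threshold and pipe-separated rows such as
# "3706|101|1|2|054" and returns the area codes (numerically sorted) that
# contain more than $min *distinct* companies.
sub areas_with_many_companies {
    my ($min, @rows) = @_;
    my %companies;    # area code => { company code => 1 }
    for my $row (@rows) {
        my ($area, $company) = split /\|/, $row;
        next unless defined $company;       # skip blank or malformed rows
        $companies{$area}{$company} = 1;    # a repeated company is stored once
    }
    return sort { $a <=> $b }
           grep { keys %{ $companies{$_} } > $min } keys %companies;
}

# The four sample rows from the original post.
my @rows = ('3706|101|1|2|054', '3706|101|6|2|023',
            '3706|106|4|2|056', '3709|116|9|1|200');

# Area 3706 has two distinct companies (101 and 106), area 3709 has one.
print "$_\n" for areas_with_many_companies(1, @rows);
```

    With a threshold of 2, the sketch returns nothing, whereas counting raw
    rows would still report area 3706.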


    --
    Tore Aursand
    "Writing is a lot like sex. At first you do it because you like it.
    Then you find yourself doing it for a few close friends and people you
    like. But if you're any good at all, you end up doing it for money."
    -- Unknown
    Tore Aursand, Jan 7, 2004
    #2

  3. Vumani Dlamini wrote:

    > This problem follows up on a couple of problems I sent to the list 2
                                                                    ^^^^^^^^

    This is not a mailing list.

    This is a Usenet newsgroup.


    > using the following Perl script;



    I kinda doubt that.

    The following is not a Perl script at all! It has a syntax error.

    Please be careful to post your _real_ code.


    > ##### Perl script ######
    > use strict;
    > use warnings;
    > open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";
                                                                        ^^^
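    I think you meant "$!\n" instead of "$\n" there: $\ is the output
    record separator (normally undefined), while $! holds the error text.

    (If so, then why are you putting the newline there? A message ending
    in "\n" stops die() from appending the script name and line number.)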
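    A small sketch of both points, using a made-up path that is assumed not
    to exist: $! carries the reason open() failed, and a trailing "\n"
    decides whether die() appends the location.

```perl
use strict;
use warnings;

# Hypothetical path that is assumed not to exist, so open() fails.
my $path = 'no/such/dir/properties.txt';

unless (open my $fh, '<', $path) {
    # $! now holds the system's error text, e.g. "No such file or directory".
    my $reason = "$!";

    # Message ends in "\n": die does NOT append the location.
    eval { die "Unable to open '$path': $reason\n" };
    print "with newline:    $@";

    # No trailing "\n": die appends " at SCRIPT line N."
    eval { die "Unable to open '$path': $reason" };
    print "without newline: $@";
}
```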


    > open PRIVATE, ">c:/.../private.txt";



    You should always, yes *always*, check the return value from open():

    open PRIVATE, '>c:/.../private.txt' or
        die "could not open '>c:/.../private.txt' $!";

    You did it earlier, why did you stop?
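    The same check written with a three-argument open() and a lexical
    filehandle, which also avoids clashing with Perl's special DATA handle.
    This is only a sketch: to stay runnable it writes to an in-memory
    scalar rather than to c:/.../private.txt.

```perl
use strict;
use warnings;

# To keep the sketch runnable anywhere it writes to an in-memory "file"
# (a reference to a scalar); a real script would pass a path here instead.
my $buffer = '';
open my $private, '>', \$buffer
    or die "Could not open output: $!";
print {$private} join('|', 3706, 101, 1, 2, '054'), "\n";
close $private or die "Could not close output: $!";

print $buffer;    # 3706|101|1|2|054
```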


    > while (<DATA>){



    DATA is a special filehandle (Perl connects it to the text after
    __END__ or __DATA__ in the script), so you should choose some other name.


    > if (/Area=(\d+)/) {
    > $Area = $1;
    > }
    > elsif (/Company=(\d+)/) {
    > $Comp = $1;
    > }
    > elsif (/PROPdes=(\d+)/) {
    > $Pdes = $1;
    > }
    > elsif (/PROPpri=(\d+)/) {
    > $Ppri = $1;
    > }
    > elsif (/PROPemp=(\d+)/) {
    > print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";
    > }
    > }
    > }
      ^
    What does that curly match up with?


    > I now have a "area text file"



    Maybe you do and maybe you don't.

    If the open() failed, then there _is no_ file...


    > If within the areas in the "area text file" I am only interested in
    > areas with more than 10 companies, is it possible to write a script
    > which utilizes all this information?



    Yes.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Jan 7, 2004
    #3
  4. Jay Tilton

    Tore Aursand wrote:

    : my ($area, $company, @tmp) = split( /\Q|\E/ );
                                               ^^^^^
    An unusual style choice. Am I overlooking an advantage it has over
    saying split( /\|/ ) or split( /[|]/ )?
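    A quick check that the three spellings split the same way, and of why
    the escape is needed at all (the sample line is taken from the data set
    in the original post):

```perl
use strict;
use warnings;

my $line = '3706|101|1|2|054';

my @quoted  = split /\Q|\E/, $line;   # \Q...\E quotes the metacharacter
my @escaped = split /\|/,    $line;   # backslash escape
my @class   = split /[|]/,   $line;   # | is literal inside a character class
print join(',', @quoted), "\n";       # 3706,101,1,2,054

# Unescaped, | is alternation between two empty patterns, which matches
# the empty string, so split breaks the line into single characters.
my @chars = split /|/, $line;
print scalar(@chars), " pieces\n";    # 16 pieces
```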
    Jay Tilton, Jan 8, 2004
    #4
  5. Tore Aursand

    Re: extracting text data in the presence of a "look-up" file: Is it possible?

    On Thu, 08 Jan 2004 08:29:09 +0000, Jay Tilton wrote:
    >> my ($area, $company, @tmp) = split( /\Q|\E/ );
    >                                       ^^^^^
    > An unusual style choice. Am I overlooking an advantage that has over
    > saying split( /\|/ ) or split( /[|]/ ) ?


    No advantages that I know of. I've made my editor (FTE) highlight \Q\E in
    a special way, so...


    --
    Tore Aursand
    "To cease smoking is the easiest thing I ever did. I ought to know,
    I've done it a thousand times." -- Mark Twain
    Tore Aursand, Jan 8, 2004
    #5
  6. On 7 Jan 2004 11:43:09 -0800, (Vumani Dlamini)
    wrote:

    >This problem follows up on a couple of problems I sent to the list 2
    >months back. The data is structured as follows;

    [snip]
    >And the data set created is;

    [snip]
    >using the following Perl script;
    >##### Perl script ######
    >use strict;
    >use warnings;
    >open DATA, "c:/../properties.txt" or die "Unable to open file:$\n";


    Probably not a very good idea to call it "DATA": no harm done here, but
    sooner or later you may end up needing Perl's own DATA filehandle...

    >open PRIVATE, ">c:/.../private.txt";


    aren't we checking here, eh?!?
    ;-)

    [snip]

    > elsif (/PROPemp=(\d+)/) {
    > print PRIVATE "$Area$Comp$Pdes$Ppri$1\n";


    This doesn't seem consistent with the "data set created" that I snipped
    above: that data set has '|' separators and a zero-padded employee
    count, while this print statement produces neither.
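    A sketch of the difference, using the first record of the sample data.
    The three-digit zero padding is an assumption inferred from "054" in
    the posted data set:

```perl
use strict;
use warnings;

# Field values of the first record in the sample data.
my ($area, $comp, $pdes, $ppri, $pemp) = (3706, 101, 1, 2, 54);

# Interpolating the fields back to back, as the posted print statement does:
my $run_together = "$area$comp$pdes$ppri$pemp";
print "$run_together\n";    # 37061011254

# Assumption: the employee count is zero-padded to three digits ("054").
my $record = join '|', $area, $comp, $pdes, $ppri, sprintf('%03d', $pemp);
print "$record\n";          # 3706|101|1|2|054
```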

    Here's how I'd do it anyway:

    #!/usr/bin/perl -l

    use strict;
    use warnings;

    die "Usage: $0 <infile> <outfile>\n" unless @ARGV == 2;

    my ($data,$priv);
    open $data, '<', $_ or die "Unable to open `$_': $!\n" for shift;
    open $priv, '>', $_ or die "Unable to open `$_': $!\n" for shift;
    select $priv;

    my %props;
    while (<$data>) {
        chomp;
        warn("Input data mismatch"), next unless /^(\w+)=(\d+)\s*$/;
        $props{$1}=$2;
        if ($1 eq 'PROPemp') {
            no warnings 'uninitialized';
            local $,='|';
            print map $props{$_},
              qw/Area Company PROPdes PROPpri PROPemp/;
        }
    }

    __END__

    This does basically the same as your own script, only, IMHO, in a
    slightly more Perlish and more maintainable way.

    >I now have a "area text file" with specific companies that have to be
    >extracted, with each row in the "area text file" having a code for an
    >area. I would like to extract companies only in areas listed in the
    >"area text file".


    Oh, but then just add the following line as the first statement of the
    'if' block:

    next unless in $props{'Area'}, @Areas;

    Of course it is up to you to write a suitable 'in' sub (see a recent
    thread on the subject too!) or substitute suitable code, and populate
    @Areas. But that shouldn't be a problem...
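    One way to fill in those blanks, sketched under a few assumptions: the
    "area text file" holds one numeric area code per line, a hash lookup
    stands in for the 'in' sub, and the "more than 10 companies" threshold
    is passed in as a parameter. The sub names load_areas and extract are
    made up for illustration, and in-memory strings stand in for the real
    files:

```perl
use strict;
use warnings;

# Hypothetical helper: read one area code per line into a lookup hash,
# which replaces the 'in' sub with a single hash lookup.
sub load_areas {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        $wanted{$1} = 1 if $line =~ /^\s*(\d+)\s*$/;   # skip blank/odd lines
    }
    return \%wanted;
}

# Hypothetical helper: parse the key=value data and return "a|b|c|d|e" rows
# whose Area is wanted and has more than $min distinct companies.
sub extract {
    my ($in, $wanted, $min) = @_;
    my (%props, %rows, %companies);
    while (my $line = <$in>) {
        next unless $line =~ /^(\w+)=(\d+)/;   # ignores trailing "# ..." notes
        $props{$1} = $2;
        next unless $1 eq 'PROPemp';           # PROPemp completes a record
        my $area = $props{Area};
        next unless defined $area && $wanted->{$area};   # the 'in' test
        $companies{$area}{ $props{Company} } = 1;
        push @{ $rows{$area} },
            join '|', map { $props{$_} } qw(Area Company PROPdes PROPpri PROPemp);
    }
    my @out;
    for my $area (sort { $a <=> $b } keys %rows) {
        # Keep the area only if it has enough distinct companies.
        push @out, @{ $rows{$area} } if keys %{ $companies{$area} } > $min;
    }
    return @out;
}

# Usage sketch with in-memory files; real code would open the
# "area text file" and the properties file by path instead.
my $area_text = "3706\n";
my $data_text = join '', map "$_\n", qw(
    Area=3706 Company=101 PROPdes=1 PROPpri=2 PROPemp=54
    PROPdes=6 PROPpri=2 PROPemp=23
    Company=106 PROPdes=4 PROPpri=2 PROPemp=56
    Area=3709 Company=116 PROPdes=9 PROPpri=1 PROPemp=200
);

open my $area_fh, '<', \$area_text or die "Cannot read area list: $!";
open my $data_fh, '<', \$data_text or die "Cannot read data: $!";
my $wanted = load_areas($area_fh);
# The original question asks for "more than 10"; 1 suits the tiny sample.
print "$_\n" for extract($data_fh, $wanted, 1);
```

    Like the script above, this prints the employee count unpadded; use
    sprintf('%03d', ...) on that field if the "054" form is needed.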


    Michele
    --
    # This prints: Just another Perl hacker,
    seek DATA,15,0 and print q... <DATA>;
    __END__
    Michele Dondi, Jan 9, 2004
    #6
Similar Threads
  1. a_srivathsan. Replies: 2, Views: 3,378. a_srivathsan, Sep 8, 2004
  2. |{evin: MSN Presence info? (|{evin, Jan 27, 2004, in forum: ASP .Net) Replies: 6, Views: 394. |{evin, Jan 28, 2004
  3. Bomb Diggy. Replies: 17, Views: 1,171. Roedy Green, Aug 29, 2003
  4. Replies: 7, Views: 559. Tom Anderson, Nov 9, 2005
  5. Phoe6. Replies: 11, Views: 379. Tim Roberts, Dec 10, 2005