suggestions on intelligent processing of data sets in a file

Discussion in 'Perl Misc' started by alt.testing@{g}mail.com, May 9, 2007.

  1. Hi all,
    I am writing a script to parse files, and insert data into mysql.
    The task is simple enough with files containing "standard" fields.
    However, there are many files for which this is not the case.
    Some of the files even vary in the number of fields therein.

    Example: (fields are email, name, postcode, phone)
    , Firstname Lastname
    , Firstname Lastname, 2004, 0412 321 512
    , Firstname Lastname, 0412 321 512


    Now, other than the obvious and easy solution of breaking up the files
    into chunks that are "known" and consistent in themselves, in terms of
    data fields, I want to build a mechanism that can:

    1. Autodetect the number of fields line by line and build the data
    structure as it goes.
    2. Verify (or guess) the "type" of each field using a regex.

    I don't mind using modules, but would prefer to use ones shipped as
    standard. Else, build my own, as I really want to start a bit of "OO",
    and this could be a good start.

    I have a feeling that creating a class, and building some methods
    that can create objects (each respective to a different set) that
    reference/manipulate the actual data structures (or something similar)
    might be a good approach. This way operations can actually be built on
    the fly? Mind you, I've not yet created a module, so this is my first
    time. Best approach, or something else, perhaps?
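
    A minimal sketch of the class-based idea described above, using only core Perl. The package name FieldGuesser, its method names, the rule order, and the sample address user@example.com are all assumptions made for illustration, not part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical class: holds an ordered list of [ field name, regex ] rules.
package FieldGuesser;

sub new {
    my ( $class, @rules ) = @_;
    return bless { rules => [@rules] }, $class;
}

# Return the name of the first rule that matches, or undef if none does.
sub guess {
    my ( $self, $value ) = @_;
    for my $rule ( @{ $self->{rules} } ) {
        my ( $field, $regex ) = @$rule;
        return $field if $value =~ $regex;
    }
    return;
}

# Split one line on commas and build a record hash, field by field.
sub parse_line {
    my ( $self, $line ) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    my %record;
    for my $part ( split /\s*,\s*/, $line ) {
        next unless length $part;    # skip empty fields
        my $field = $self->guess($part);
        $record{$field} = $part if defined $field;
    }
    return \%record;
}

package main;

# The rules are tried in this order, so the stricter ones come first.
my $guesser = FieldGuesser->new(
    [ email    => qr/^[^\@\s]+\@[^\@\s]+\.[^\@\s]+$/ ],
    [ postcode => qr/^\d{4}$/ ],
    [ phone    => qr/^[\d\s]{8,}$/ ],
    [ name     => qr/^[A-Za-z][A-Za-z .'-]*$/ ],
);

my $record = $guesser->parse_line(
    'user@example.com, Firstname Lastname, 2004, 0412 321 512');
print "$_ = $record->{$_}\n" for sort keys %$record;
```

    Each file layout could get its own object with its own rule list, so the number of fields per line never has to be known in advance.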

    Could anyone suggest some things that I might try?

    tia


    Full Context (some rough ideas as a starting point)
    ===============================================================================
    #!/usr/bin/perl

    use strict;
    use warnings;

    use DBI;

    my $email_index;
    my $name_index;
    my $location_index;
    my $mobile_index;


    my $input_file = $ARGV[0];
    my @working_data_array;
    my $email;
    my $mobile;
    my $name;
    my $location;
    my $counter;

    my $email_regex = qr/^ *[a-zA-Z0-9_.-]+\@[a-zA-Z0-9_.-]+\.[a-zA-Z0-9_.-]+/;
    my $mobile_regex = qr/^ *[04][0-9 ]{8,12}/;
    my $name_regex = qr/^ *[a-zA-Z -]+/;
    my $location_regex = qr/^ *[a-zA-Z0-9 ]+/;

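    set_indexes();

    open( my $in_fh, '<', $input_file )
        or die "Cannot open '$input_file': $!";

    while ( <$in_fh> ) {
        next unless /@/;    # only lines that contain an email address
        chomp;
        @working_data_array = split /\s*,\s*/;

        # Columns not named on the command line stay undefined.
        $email    = field_at($email_index);
        $name     = field_at($name_index);
        $location = field_at($location_index);
        $mobile   = field_at($mobile_index);

        print join( ', ', map { defined $_ ? $_ : '' }
                $email, $name, $location, $mobile ), "\n";
    }

    close $in_fh;

    exit;

    # Return the field in the given column, or undef if the column is unknown.
    sub field_at {
        my ($index) = @_;
        return defined $index ? $working_data_array[$index] : undef;
    }

    # Map the field names that follow the file name on the command line to
    # column indexes. $ARGV[0] is the input file, so names start at $ARGV[1].
    sub set_indexes {
        for $counter ( 1 .. $#ARGV ) {
            $email_index    = $counter - 1 if $ARGV[$counter] =~ /email/;
            $name_index     = $counter - 1 if $ARGV[$counter] =~ /name/;
            $location_index = $counter - 1 if $ARGV[$counter] =~ /location/;
            $mobile_index   = $counter - 1 if $ARGV[$counter] =~ /mobile/;
        }
    }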
     
    alt.testing@{g}mail.com, May 9, 2007
    #1

  2. alt.testing@{g}mail.com <alt.testing@{g}mail.com> wrote:

    > Some of the files even vary in the number of fields therein.
    >
    > Example: (fields are email, name, postcode, phone)
    > , Firstname Lastname
    > , Firstname Lastname, 2004, 0412 321 512
    > , Firstname Lastname, 0412 321 512



    > I want to build a mechanism that can:
    >
    > 1. Autodetect the number of fields and "line-by-line" respectively
    > build the data structure as it goes.
    > 2. Verify (or guess the "type" of field using regex)



    ------------------------
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    while ( <DATA> ) {
        chomp;
        my %record;
        foreach my $part ( split /,\s*/ ) {
            if ( $part =~ /^\d+$/ ) # all digits
                { $record{postcode} = $part }
            elsif ( $part =~ /^[\d\s]+$/ ) # digits with spaces
                { $record{phone} = $part }
            elsif ( $part =~ /@/ ) # contains at-sign
                { $record{email} = $part }
            else
                { $record{name} = $part }
        }
        print Dumper \%record;
    }

    __DATA__
    , Firstname Lastname
    , Firstname Lastname, 2004, 0412 321 512
    , Firstname Lastname, 0412 321 512
    ------------------------
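
    Since the original script is meant to load these records into MySQL through DBI, one possible next step is to turn each %record into an INSERT with placeholders, so records with missing fields need no special casing. This is only a sketch: the table name contacts, the column list, and the helper insert_statement_for are hypothetical and would have to match the real schema.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table and column names; adjust to the real schema.
my $table   = 'contacts';
my @columns = qw(email name postcode phone);

# Build an INSERT with placeholders for the fields a record actually has.
# Returns the SQL string followed by the bind values, or an empty list
# if the record has none of the known fields.
sub insert_statement_for {
    my ($record) = @_;
    my @present = grep { defined $record->{$_} } @columns;
    return unless @present;
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table, join( ', ', @present ), join( ', ', ('?') x @present );
    return ( $sql, @{$record}{@present} );
}

my ( $sql, @bind ) = insert_statement_for(
    { name => 'Firstname Lastname', phone => '0412 321 512' } );
print "$sql\n";
print join( ' | ', @bind ), "\n";

# With a connected DBI handle this would be used roughly as follows
# ($dsn, $user and $password are placeholders):
#   my $dbh = DBI->connect( $dsn, $user, $password, { RaiseError => 1 } );
#   $dbh->do( $sql, undef, @bind );
```

    Only names from the fixed @columns list ever reach the SQL text; the values themselves go in as bind parameters.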


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, May 9, 2007
    #2

  3. On Wed, 9 May 2007 06:05:40 -0500, Tad McClellan
    <> wrote:

    >alt.testing@{g}mail.com <alt.testing@{g}mail.com> wrote:
    >
    >> Some of the files even vary in the number of fields therein.
    >>
    >> Example: (fields are email, name, postcode, phone)
    >> , Firstname Lastname
    >> , Firstname Lastname, 2004, 0412 321 512
    >> , Firstname Lastname, 0412 321 512

    >
    >
    >> I want to build a mechanism that can:
    >>
    >> 1. Autodetect the number of fields and "line-by-line" respectively
    >> build the data structure as it goes.
    >> 2. Verify (or guess the "type" of field using regex)

    >
    >
    >------------------------
    >#!/usr/bin/perl
    >use warnings;
    >use strict;
    >use Data::Dumper;
    >
    >while ( <DATA> ) {
    > chomp;
    > my %record;
    > foreach my $part ( split /,\s*/ ) {
    > if ( $part =~ /^\d+$/ ) # all digits
    > { $record{postcode} = $part }
    > elsif ( $part =~ /^[\d\s]+$/ ) # digits with spaces
    > { $record{phone} = $part }
    > elsif ( $part =~ /@/ ) # contains at-sign
    > { $record{email} = $part }
    > else
    > { $record{name} = $part }
    > }
    > print Dumper \%record;
    >}
    >
    >__DATA__
    >, Firstname Lastname
    >, Firstname Lastname, 2004, 0412 321 512
    >, Firstname Lastname, 0412 321 512
    >------------------------


    thanks Tad
     
    alt.testing@{g}mail.com, May 14, 2007
    #3