Breaking into tokens based on white space

Discussion in 'Perl Misc' started by j2ee@att.net, Jul 15, 2004.

  1. Guest

    I have a file which has these 3 columns (for example)

    Name Size1 Size2
    + abc_p.h 12345 432
    *unknown
    + dfe_e_io.h 210989 123
    + dfx_e_io.c 210912 1290 and so on, up to 500 entries.

    I have to retrieve Name (the file names) and Size1 and store them in an array.

    Then I have to retrieve Name and Size2 and store them in another array.
    My solution:
    I checked whether each line in the file matched a file name using a regular
    expression. If there is a match, I store the file name and Size1 in array1
    using a substr operation.
    The problem is that I hardcoded the starting position and length of the
    string in the substr operation, so my code only works for names of a given
    length, say 20. If a name is longer than 20 characters, my code won't work.
    Can you tell me if there is a generic way of writing a regular expression that
    matches the name in my file, then Size1, and stores them in an array? Special
    case: the name column may contain an unwanted string like *unknown, which
    should be ignored.

    Let me know if you need clarifications. Thanks..
     
    , Jul 15, 2004
    #1

  2. wrote in
    news::

    > I have a file which has these 3 columns (for example)
    >
    > Name Size1 Size2
    > + abc_p.h 12345 432
    > *unknown
    > + dfe_e_io.h 210989 123
    > + dfx_e_io.c 210912 1290 and so on, up to 500 entries.


    Why do you repeatedly post the same message? If you need a clarification or
    have further questions about replies to your earlier posts on this topic,
    you should post those comments in the same thread.

    --
    A. Sinan Unur
     
    A. Sinan Unur, Jul 15, 2004
    #2

  3. Paul Lalli Guest

    On Thu, 15 Jul 2004 wrote:

    > I have a file which has these 3 columns (for example)
    >
    > Name Size1 Size2
    > + abc_p.h 12345 432
    > *unknown
    > + dfe_e_io.h 210989 123
    > + dfx_e_io.c 210912 1290 and so on, up to 500 entries.
    >
    > I have to retrieve Name (file names) and size1 and store it in an array
    >
    > Then I have to retrieve name and size2 and store it in another array


    What do you mean by 'array' here? How are you storing both the size and
    the name in the array? Are you sure you don't want hashes? More to the
    point, are you sure you don't want a multi-dimensional hash for the two
    sizes?
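
    To illustrate the multi-dimensional hash idea, here is a minimal sketch. The key names size1 and size2 and the variable %sizes are made up for this example; only the file names and numbers come from the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout (names chosen for this sketch):
#   $sizes{$file_name}{size1} and $sizes{$file_name}{size2}
my %sizes;
$sizes{'abc_p.h'}    = { size1 => 12345,  size2 => 432 };
$sizes{'dfe_e_io.h'} = { size1 => 210989, size2 => 123 };

# Both sizes for a file are reachable from a single structure.
for my $name (sort keys %sizes) {
    print "$name: size1=$sizes{$name}{size1} size2=$sizes{$name}{size2}\n";
}
```

    With this layout there is no need to keep two parallel arrays in step; each file name maps to both of its sizes.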

    > My solution:
    > I checked if each line in the file matched the file name using a regular
    > expression. If there is a match then store those filenames and size1 in array1
    > using substr operation.


    Why? Why are you parsing the line once to see if it matched, and a second
    time to pull the values out?
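
    A match in list context returns the captured groups, so the test and the extraction can happen in one step. A small sketch, assuming file names contain no whitespace; parse_line is a helper name invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (name, size1, size2) for a "+ name n n" line,
# or an empty list when the line does not have that shape.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^\+\s+(\S+)\s+(\d+)\s+(\d+)\s*$/;
}

for my $line ('+ abc_p.h 12345 432', '*unknown') {
    my @fields = parse_line($line);
    if (@fields) {
        print "matched: @fields\n";
    }
    else {
        print "skipped: $line\n";
    }
}
```

    No substr call and no hardcoded offsets are involved, so the length of the name does not matter.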

    > But the problem is I hardcoded the values of starting position and
    > length of the string in the substr operation. So my code will work only for a
    > given length of string, e.g. say 20. If a name is of length > 20, my code
    > won't work.
    > Can you tell if there is a generic way of writing regular expression that
    > matches the name in my file, and then size1 and stores them in an array? Special
    > cases: in the name column you may have some unwanted string like *unknown which
    > should be ignored.


    You should perhaps read up on regular expressions (perldoc perlre) and
    search for the section on capturing parentheses.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my %files;    # file name => [size1, size2]

    # UNTESTED
    while (<DATA>) {
        # Only lines of the form "+ name size1 size2" match; the header
        # and lines such as "*unknown" fail the match and are skipped.
        if (/^\+ (\S+)\s+(\d+)\s+(\d+)\s*$/) {
            push @{ $files{$1} }, $2, $3;    # add size1 and size2 to file's array
        }
    }

    # You never said what you wanted to do with these arrays...
    print "Size 1:\n\n";
    print "$_ => $files{$_}[0]\n" for sort keys %files;
    print "\nSize 2:\n\n";
    print "$_ => $files{$_}[1]\n" for sort keys %files;

    __DATA__
    Name Size1 Size2
    + abc_p.h 12345 432
    *unknown
    + dfe_e_io.h 210989 123
    + dfx_e_io.c 210912 1290
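
    Since the columns are separated by white space, split is another option. The following is only a sketch of that alternative, not the approach used above: it assumes file names never contain spaces, and the names @lines, %size1 and %size2 are invented for the example (the data is inlined instead of read from a file).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines from the question, inlined for this sketch.
my @lines = (
    'Name Size1 Size2',
    '+ abc_p.h 12345 432',
    '*unknown',
    '+ dfe_e_io.h 210989 123',
    '+ dfx_e_io.c 210912 1290',
);

my (%size1, %size2);    # file name => size, one hash per size column
for my $line (@lines) {
    # split ' ' breaks on runs of white space and ignores leading white space.
    my ($flag, $name, $s1, $s2) = split ' ', $line;

    # Skip the header, "*unknown", and anything not shaped like "+ name n n".
    next unless defined $s2 && $flag eq '+';
    next unless $s1 =~ /^\d+$/ && $s2 =~ /^\d+$/;

    $size1{$name} = $s1;
    $size2{$name} = $s2;
}

print "$_ => $size1{$_}\n" for sort keys %size1;
```

    Two hashes keyed by name are closest to the "two arrays" in the question; the single %files structure above holds the same information in one place.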




    Paul Lalli
     
    Paul Lalli, Jul 15, 2004
    #3
