Breaking into tokens based on white space

J

j2ee

I have a file with these three columns (for example):

Name Size1 Size2
+ abc_p.h 12345 432
*unknown
+ dfe_e_io.h 210989 123
+ dfx_e_io.c 210912 1290

and so on, up to 500 entries.

I have to retrieve Name (the file names) and Size1 and store them in an array.

Then I have to retrieve Name and Size2 and store them in another array.
My solution:
I checked whether each line in the file matched a file name using a regular
expression. If there is a match, I store the file name and Size1 in array1
using a substr operation.
But the problem is that I hardcoded the starting position and
length of the string in the substr operation, so my code works only for a
given string length, e.g. 20. If a name is longer than 20 characters, my code
won't work.
Can you tell me if there is a generic way of writing a regular expression that
matches the name in my file and then Size1, and stores them in an array? Special
case: the Name column may contain an unwanted string like *unknown, which
should be ignored.

Let me know if you need clarification. Thanks.
 
A

A. Sinan Unur

(e-mail address removed) wrote in
I have a file which has these 3 columns (for example)

Name Size1 Size2
+ abc_p.h 12345 432
*unknown
+ dfe_e_io.h 210989 123
+ dfx_e_io.c 210912 1290

and so on, up to 500 entries.

Why do you repeatedly post the same message? If you need clarification or
have further questions about replies to your earlier posts on this
topic, you should post those comments in the same thread.
 
P

Paul Lalli

I have a file which has these 3 columns (for example)

Name Size1 Size2
+ abc_p.h 12345 432
*unknown
+ dfe_e_io.h 210989 123
+ dfx_e_io.c 210912 1290

and so on, up to 500 entries.

I have to retrieve Name (the file names) and Size1 and store them in an array.

Then I have to retrieve Name and Size2 and store them in another array.

What do you mean by 'array' here? How are you storing both the size and
the name in the array? Are you sure you don't want hashes? More to the
point, are you sure you don't want a multi-dimensional hash for the two
sizes?
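
To illustrate the multi-dimensional hash suggested here, a minimal sketch follows. The key names size1 and size2 and the @rows sample are assumptions made for the example; they are not taken from the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical sample rows, copied from the data shown in the question.
my @rows = (
    [ 'abc_p.h',    12345,  432 ],
    [ 'dfe_e_io.h', 210989, 123 ],
);

# One hash keyed by file name; each value is an inner hash holding both sizes.
my %files;
for my $row (@rows) {
    my ($name, $size1, $size2) = @$row;
    $files{$name} = { size1 => $size1, size2 => $size2 };
}

# Either size can then be looked up by file name.
print "$_: size1=$files{$_}{size1} size2=$files{$_}{size2}\n"
    for sort keys %files;
```

With this layout there is no need for two parallel arrays: $files{'abc_p.h'}{size1} and $files{'abc_p.h'}{size2} both come from the same entry.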
My solution:
I checked whether each line in the file matched a file name using a regular
expression. If there is a match, I store the file name and Size1 in array1
using a substr operation.

Why? Why are you parsing the line once to see if it matches, and a second
time to pull the values out?
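
A single match can do both jobs, because a successful match in list context returns the captured groups. The sketch below assumes a helper named parse_line (hypothetical, introduced only for this example) and the line format shown in the question.

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper: in list context it returns
# (name, size1, size2) for a data line, or an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    return $line =~ /^\+\s+(\S+)\s+(\d+)\s+(\d+)\s*$/;
}

for my $line ("+ abc_p.h 12345 432\n", "*unknown\n") {
    # The list assignment is false when the match fails, so the same
    # expression validates the line and extracts its fields.
    if (my ($name, $size1, $size2) = parse_line($line)) {
        print "name=$name size1=$size1 size2=$size2\n";
    }
}
```

No substr call or fixed column width is involved, so the length of the file name does not matter.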
But the problem is that I hardcoded the starting position and
length of the string in the substr operation, so my code works only for a
given string length, e.g. 20. If a name is longer than 20 characters, my code
won't work.
Can you tell me if there is a generic way of writing a regular expression that
matches the name in my file and then Size1, and stores them in an array? Special
case: the Name column may contain an unwanted string like *unknown, which
should be ignored.

You should perhaps read up on regular expressions (perldoc perlre) and
search for the section on capturing parentheses.

#!/usr/bin/perl
use strict;
use warnings;

my %files;    # file name => [size1, size2]

# UNTESTED
while (<DATA>) {
    # Match "+ name size1 size2"; the header line and "*unknown" do not match.
    if (/^\+ (\S+)\s+(\d+)\s+(\d+)\s*$/) {
        push @{ $files{$1} }, $2, $3;    # add size1 and size2 to the file's array
    }
}

# You never said what you wanted to do with these arrays...
print "Size 1:\n\n";
print "$_ => $files{$_}[0]\n" for keys %files;
print "\nSize 2:\n\n";
print "$_ => $files{$_}[1]\n" for keys %files;


__DATA__
Name Size1 Size2
+ abc_p.h 12345 432
*unknown
+ dfe_e_io.h 210989 123
+ dfx_e_io.c 210912 1290




Paul Lalli
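
As the thread title suggests, the same fields can also be pulled out by breaking each line into whitespace-separated tokens instead of matching one regular expression. This is a sketch of that alternative, not code from any poster; the names @size1, @size2, and $text are assumptions, and it builds the two arrays of name/size pairs the question asked for.

```perl
use strict;
use warnings;

# Sample input copied from the question (held in a string for the example).
my $text = <<'END';
Name Size1 Size2
+ abc_p.h 12345 432
*unknown
+ dfe_e_io.h 210989 123
+ dfx_e_io.c 210912 1290
END

my (@size1, @size2);    # each element is a [name, size] pair

for my $line (split /\n/, $text) {
    # split ' ' skips leading whitespace and splits on runs of whitespace.
    my @tok = split ' ', $line;

    # Keep only rows of the form: + name digits digits
    next unless @tok == 4
        && $tok[0] eq '+'
        && $tok[2] =~ /^\d+$/
        && $tok[3] =~ /^\d+$/;

    push @size1, [ $tok[1], $tok[2] ];
    push @size2, [ $tok[1], $tok[3] ];
}

print "$_->[0] $_->[1]\n" for @size1;
```

The token-count and digit checks are what skip the header line and entries such as *unknown; a name of any length works because nothing depends on column positions.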
 
