Newbie question: most efficient way to search fields of this file

Discussion in 'Perl Misc' started by martin, Apr 14, 2006.

  1. martin

    martin Guest

    Hi, I am quite new to Perl and need to filter a file and print the
    results into another file. I have a .csv file that is created by a
    Perl script. The format of the file is 'M' lines (rows) and 'N'
    columns. The first row and the first column are headers, or labels.
    All fields are comma separated and hold numeric values, with the
    exception of the header row and the label column shown below, which
    hold string values.

    Also, the number of rows and columns could be as large as 1000 each.


    For a given column, for example col4label, I need to extract the
    corresponding value for some of the rows and then print them to a
    different file.


    col1label, col2label, col3label, col4label, ..., colNlabel
    row1hdr, int1, int2, ..., ... , intN-1
    row2hdr, ... , ....,
    .... ,... , ... , ... , ...., ...
    rowMhdr ...., ..., ... , ..., ....

    For example, if I were to extract and display the results for column 4
    and rows 2, 5, and 10, the output should look something like this:

    col1label col4label
    row2hdr field value for 2,4
    row5hdr field value for 5,4
    row10hdr field value for 10,4

    What is the most efficient way to do this? Is there a built-in
    function in Perl that does it? How can I script this in Perl? Should
    I read the file into an array line by line, or simply grep the file
    line by line for patterns matching row2hdr, row5hdr, and row10hdr,
    then count four fields until I reach the corresponding value and
    store it for display?

    Also, is it best to use Perl or Unix shell scripting for this? Any
    input appreciated.

    thanks. martin
    martin, Apr 14, 2006
    #1

  2. martin

    Guest

    "martin" <> wrote:
    > Hi, I am quite new to perl and need to filter/process a file and print
    > into another file the processed file's results. I have .csv file that
    > is created by a perl script. The fomat of the file is
    > 'M' lines or rows, and 'N' columns. the first row and the first column
    > are headers or labels.
    > all the fields are comma separated and are numeric values with the
    > excpetion of header and row columns shown below which are string
    > values, or labels.
    >
    > Also the number of rows and columns could be as large as 1000 each.
    >
    > I need to extract for a given column, example col4hdr, the
    > corresponding value for some of the rows and then print them in a
    > different file.
    >
    > col1label, col2label, col3label, col4label, ..., colNlabel
    > row1hdr, int1, int2, ..., ... , intN-1
    > row2hdr, ... , ....,
    > ... ,... , ... , ... , ...., ...
    > rowMhdr ...., ..., ... , ..., ....
    >
    > for example: if I were to extract and display the results for column4
    > and rows 2, 5, and 10, the output should like something like this
    >
    > col1label col4label
    > row2hdr field value for 2,4
    > row5hdr field value for 5,4
    > row10hdr field value for 10,4


    Do you recognize row 2 because the label is row2hdr, or because it is
    2nd line after the header?

    > What is the most efficient way to do this,


    Efficient in your time? (depends on how good you are in every possible
    language). Efficient on the computer's time? Probably assembly language.


    > is there a built in
    > function in perl that does it,


    No, not directly.

    > how can I perl script this? should I turn the file into an array line
    > by line,


    No. There is no reason to slurp it (if that is what you meant), although
    it probably wouldn't hurt much for 1000 lines of 1000 numbers. And
    certainly no reason to split every line, including the ones you don't care
    about, if that is what you meant.

    > or simply grep the file line by line for patterns maching
    > rows2hdr, row5hdr, row10hdr and then count 4 fileds till I extract the
    > corresponding value and store it for display.


    Something like that, sure.

    Assuming the list of rows is in order, something like this:

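    my $col=4;
    my @rowlist=(2,5,10);
    $col--; # convert to 0-started array
    # the file name is an assumption; open whatever file holds the data
    open my $fh, '<', 'myfile.csv' or die "Cannot open myfile.csv: $!";
    foreach my $line (@rowlist) {
        # read ahead to the wanted row label; die if the file runs out
        defined ($_=<$fh>) or die until /^row${line}hdr,/;
        chomp;
        print "row${line}hdr,", (split/,/)[$col], "\n"
    };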


    > and also is it best to use perl or unix shell scripting for this. any
    > input appreciated.


    As someone who doesn't remember how to shell script anything more
    interesting than a loop, I'd have to say that, for me at least, Perl is
    better than shell scripting for this.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , Apr 14, 2006
    #2

  3. Re: Newbie question: most efficient way to search fields of this file

    martin wrote:
    > Hi, I am quite new to perl and need to filter/process a file and print
    > into another file the processed file's results. I have .csv file that
    > is created by a perl script. The fomat of the file is
    > 'M' lines or rows, and 'N' columns. the first row and the first column
    > are headers or labels.
    > all the fields are comma separated and are numeric values with the
    > excpetion of header and row columns shown below which are string
    > values, or labels.
    >
    > Also the number of rows and columns could be as large as 1000 each.
    >
    >
    > I need to extract for a given column, example col4hdr, the
    > corresponding value for some of the rows and then print them in a
    > different file.
    >
    >
    > col1label, col2label, col3label, col4label, ..., colNlabel
    > row1hdr, int1, int2, ..., ... , intN-1
    > row2hdr, ... , ....,
    > ... ,... , ... , ... , ...., ...
    > rowMhdr ...., ..., ... , ..., ....
    >
    > for example: if I were to extract and display the results for column4
    > and rows 2, 5, and 10, the output should like something like this
    >
    > col1label col4label
    > row2hdr field value for 2,4
    > row5hdr field value for 5,4
    > row10hdr field value for 10,4
    >
    > What is the most efficient way to do this, is there a built in
    > function in perl that does it,
    > how can I perl script this? should I turn the file into an array line
    > by line, or simply grep the file line by line for patterns maching
    > rows2hdr, row5hdr, row10hdr and then count 4 fileds till I extract the
    > corresponding value and store it for display.
    >
    > and also is it best to use perl or unix shell scripting for this. any
    > input appreciated.


    I wouldn't go to shell-hell for this... :)

    A new module, File::Tabular, has lots of functionality for this if
    you anticipate more complexity ... or maybe DBD::CSV.

    Or, if you need only simple operations Tie::File might help in
    framing a data structure that you could expand by splitting the
    lines.

    Or, if you can't be bothered with all that... something like
    this might work:

    perl -F, -lane 'push( @{$line[$i++]}, @F );
    END{ print @{$line[$_]}[0,3] for (0, 1, 4, 9) }' csv_file > out
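
    To make the Tie::File suggestion above a bit more concrete, here is a minimal sketch. The sample data, the temporary file, and the chosen line numbers are all made up for illustration; only the lines that are actually indexed get split.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Write a small made-up CSV to a temporary file so the sketch is runnable.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "name,a,b,c\n", "apple,1,2,3\n", "orange,4,5,6\n", "tomato,7,8,9\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

# Each array element is one line of the file, fetched on demand and
# already stripped of its line ending.
tie my @lines, 'Tie::File', $tmp_name or die "Cannot tie $tmp_name: $!";

# Line 0 is the header row; pick fields 0 and 3 from lines 0, 1 and 3.
my @picked = map { [ (split /,/, $lines[$_])[0, 3] ] } 0, 1, 3;
print join(' ', @$_), "\n" for @picked;

untie @lines;
```

    Note that this still addresses rows by position, so it only fits the case where the row numbers are known in advance.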


    hth,
    --
    Charles DeRykus
    Charles DeRykus, Apr 14, 2006
    #3
  4. martin

    martin Guest

    Thanks, I should have clarified one point with respect to your
    question:

    " Do you recognize row 2 because the label is row2hdr, or because it is
    > 2nd line after the header?"


    I used 'row2hdr' simply as a label, so the number 2 there, or any other
    digit, is not the point. The row and column labels are strings and
    could just as well have been "apple", "orange", "tomato". So what needs
    to be done is to reference entries through string labels, without
    relying on numeric characters. I don't want to use row and column
    numbers because the position of rows could change; I prefer to index
    entries by matching row labels and column labels.

    By efficiency I meant a combination of both, leaning towards execution
    time. Of course, I was not referring to assembly.

    tx

    martin

    martin, Apr 14, 2006
    #4
  5. Re: Newbie question: most efficient way to search fields of this file

    martin wrote:
    > Hi, I am quite new to perl and need to filter/process a file and print
    > into another file the processed file's results.


    <snip>

    > What is the most efficient way to do this,


    Which ways have you considered? You can always use the Benchmark module
    to compare them...
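
    As a rough illustration of that, here is a sketch that compares three ways of pulling one field out of a line with the core Benchmark module. The sample line, the column index, and the sub names are made up; the timings will differ from machine to machine, so no results are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample line: a row label followed by 999 integers.
my $line = join ',', 'row2hdr', 1 .. 999;
my $col  = 3;    # 0-based index of the wanted column

# Candidate 1: split the whole line, then index into the list.
sub by_split { (split /,/, $line)[$col] }

# Candidate 2: split with a LIMIT so later fields are never separated.
sub by_split_limit { (split /,/, $line, $col + 2)[$col] }

# Candidate 3: skip $col fields with a regex and capture the next one.
sub by_regex {
    my ($val) = $line =~ /^(?:[^,]*,){$col}([^,]*)/;
    return $val;
}

# Run each candidate for at least one CPU second and print a
# comparison table of the measured rates.
cmpthese(-1, {
    split       => \&by_split,
    split_limit => \&by_split_limit,
    regex       => \&by_regex,
});
```

    All three subs return the same field, so the table only reflects the cost of getting there.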

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Apr 14, 2006
    #5
  6. martin

    Xicheng Jia Guest

    martin wrote:
    > Hi, I am quite new to perl and need to filter/process a file and print
    > into another file the processed file's results. I have .csv file that
    > is created by a perl script. The fomat of the file is
    > 'M' lines or rows, and 'N' columns. the first row and the first column
    > are headers or labels.
    > all the fields are comma separated and are numeric values with the
    > excpetion of header and row columns shown below which are string
    > values, or labels.
    >
    > Also the number of rows and columns could be as large as 1000 each.
    >
    >
    > I need to extract for a given column, example col4hdr, the
    > corresponding value for some of the rows and then print them in a
    > different file.
    >
    >
    > col1label, col2label, col3label, col4label, ..., colNlabel
    > row1hdr, int1, int2, ..., ... , intN-1
    > row2hdr, ... , ....,
    > ... ,... , ... , ... , ...., ...
    > rowMhdr ...., ..., ... , ..., ....
    >
    > for example: if I were to extract and display the results for column4
    > and rows 2, 5, and 10, the output should like something like this
    >
    > col1label col4label
    > row2hdr field value for 2,4
    > row5hdr field value for 5,4
    > row10hdr field value for 10,4
    >
    > What is the most efficient way to do this, is there a built in
    > function in perl that does it,
    > how can I perl script this? should I turn the file into an array line
    > by line, or simply grep the file line by line for patterns maching
    > rows2hdr, row5hdr, row10hdr and then count 4 fileds till I extract the
    > corresponding value and store it for display.


    > and also is it best to use perl or unix shell scripting for this. any
    > input appreciated.

    You should handle this in line mode, and it can easily be done on the
    command line. I don't think Perl will be better than awk, though:

    perl -F, -anle 'print join"\t", @F[0,3] if $. =~ /^(1|2|5|10)$/' myfile.csv

    awk -F, 'NR ~ /^(1|2|5|10)$/ {print $1,$3}' myfile.csv

    Xicheng
    Xicheng Jia, Apr 14, 2006
    #6
  7. martin

    Xicheng Jia Guest

    Xicheng Jia wrote:
    > martin wrote:
    > > Hi, I am quite new to perl and need to filter/process a file and print
    > > into another file the processed file's results. I have .csv file that
    > > is created by a perl script. The fomat of the file is
    > > 'M' lines or rows, and 'N' columns. the first row and the first column
    > > are headers or labels.
    > > all the fields are comma separated and are numeric values with the
    > > excpetion of header and row columns shown below which are string
    > > values, or labels.
    > >
    > > Also the number of rows and columns could be as large as 1000 each.
    > >
    > >
    > > I need to extract for a given column, example col4hdr, the
    > > corresponding value for some of the rows and then print them in a
    > > different file.
    > >
    > >
    > > col1label, col2label, col3label, col4label, ..., colNlabel
    > > row1hdr, int1, int2, ..., ... , intN-1
    > > row2hdr, ... , ....,
    > > ... ,... , ... , ... , ...., ...
    > > rowMhdr ...., ..., ... , ..., ....
    > >
    > > for example: if I were to extract and display the results for column4
    > > and rows 2, 5, and 10, the output should like something like this
    > >
    > > col1label col4label
    > > row2hdr field value for 2,4
    > > row5hdr field value for 5,4
    > > row10hdr field value for 10,4
    > >
    > > What is the most efficient way to do this, is there a built in
    > > function in perl that does it,
    > > how can I perl script this? should I turn the file into an array line
    > > by line, or simply grep the file line by line for patterns maching
    > > rows2hdr, row5hdr, row10hdr and then count 4 fileds till I extract the
    > > corresponding value and store it for display.

    >
    > == and also is it best to use perl or unix shell scripting for this.
    > any
    > == input appreciated.>
    >
    > You should handle this by line-mode and it can be easily done on the
    > command line. I dont think perl will be better than awk though:
    >
    > perl -F, -anle 'print join"\t", @F[0,3] if $. =~ /^(1|2|5|10)$/'
    > myfile.csv
    >

    > awk -F, 'NR ~ /^(1|2|5|10)$/ {print $1,$3}' myfile.csv

    Change $3 to $4 in the awk command above.

    In fact, if regex speed is a consideration, awk may be better than
    Perl, since many awk implementations use a DFA regex engine, which is
    typically faster than Perl's backtracking NFA.

    Xicheng
    Xicheng Jia, Apr 14, 2006
    #7
  8. martin

    Anno Siegel Guest

    martin <> wrote in comp.lang.perl.misc:
    > Thanks, I should have clarified one point. with respect to your
    > question
    >
    > " Do you recognize row 2 because the label is row2hdr, or because it is
    > > 2nd line after the header?"

    >
    > I used 'row2hdr' simplay as a label, so number 2 there or any other
    > digit is not the point. The row and column labels are strings, and for
    > that matter could have been "apple", "orange", "tomato". so what needs
    > to be done is to reference entries through string labels, without
    > numerical characters. I don't want to use number in rows and strings
    > because the position of rows could change; and prefer to index entries
    > through matching row labels and column labels.
    >
    > and by efficiency I meant a combination of both, leaning towards time
    > of execution. of course I was not referring to assembly though.


    You have essentially two problems then (which is good, it's called
    problem separation). One is to select the lines to process, the other
    is to select from those lines the columns requested.

    Assume the requirements in two variables, for instance:

    my @lines = qw( row2hdr rowMhdr);
    my @cols = qw( col1label colNlabel col2label);

    We want to select all lines that begin with one of the given headers.
    That is a job for a regular expression, so let's compile one:

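    my $line_re = do {
    my $re = join '|', map { quotemeta }
        sort { length( $b) <=> length( $a) } @lines;
    # group the alternatives so that ^ anchors every one of them, and
    # require the separator so a label cannot match as a mere prefix
    qr/^(?:$re)\s*,/;
    };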

    We sort long alternatives ahead of short ones in case one is a prefix
    of another. Now $line_re will match the lines that interest us and no
    others.

    Column selection cannot be done using strings alone, because the header
    strings are nowhere around when we look at a data line. Instead, we
    translate the set of column headers into a set of integer indices that
    point to the columns we want. For the translation we use the first line
    of the data file, containing the actual headers for this set of data:

    my @sel = do {
    chomp( my $h_line = <DATA>);
    my @headers = split /\s*,\s*/, $h_line;
    my %col_of_header = map { shift( @headers) => $_ } 0 .. $#headers;
    my @bad = grep !defined $col_of_header{ $_}, @cols;
    die "bad header(s): @bad" if @bad;
    @col_of_header{ @cols};
    };

    Now we're ready to print the header line and the selected data:

    print join( ', ', @cols), "\n";
    while ( <DATA> ) {
    next unless /$line_re/;
    chomp; # keep the newline out of the last field
    my @recs = split /\s*,\s*/;
    print join( ', ', @recs[ @sel]), "\n";
    }

    __DATA__
    col1label, col2label, col3label, col4label, ..., colNlabel
    row1hdr, int1, int2, ..., ... , intN-1
    row2hdr, ... , ...., xxxx, ...., yyyy, zzzz
    ... ,... , ... , ... , ...., ...
    rowMhdr, ...., ..., ... , ..., ...., corner
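
    The two steps above (select rows by label, translate column labels into indices) can also be wrapped in one sub. This is only a sketch under some assumptions: the sub name select_cells and the sample data are made up, and it swaps the compiled regex for a hash lookup on the first field, which is a different technique from the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the selected cells as a list of array refs.
sub select_cells {
    my ($fh, $rows, $cols) = @_;
    my %want_row = map { $_ => 1 } @$rows;

    # Map each header label to its column index using the first line.
    my $h_line = <$fh>;
    die "empty input" unless defined $h_line;
    chomp $h_line;
    my @headers = split /\s*,\s*/, $h_line;
    my %col_of_header;
    @col_of_header{@headers} = 0 .. $#headers;
    my @bad = grep { !exists $col_of_header{$_} } @$cols;
    die "bad header(s): @bad" if @bad;
    my @sel = @col_of_header{@$cols};

    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        # Peel off only the row label first; split the line on demand.
        my ($label) = $line =~ /^\s*([^,]*?)\s*,/ or next;
        next unless $want_row{$label};
        my @recs = split /\s*,\s*/, $line;
        push @out, [ @recs[@sel] ];
    }
    return \@out;
}

# Usage with an in-memory file (the labels and numbers are made up):
my $csv = "name, a, b, c\napple, 1, 2, 3\norange, 4, 5, 6\ntomato, 7, 8, 9\n";
open my $fh, '<', \$csv or die "Cannot open in-memory file: $!";
my $rows = select_cells($fh, [qw(apple tomato)], [qw(name c)]);
print join(', ', @$_), "\n" for @$rows;
```

    Because rows are matched by exact label, their position in the file does not matter, and lines that are not wanted are never split.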

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
    Anno Siegel, Apr 14, 2006
    #8
  9. martin

    martin Guest

    Hi everyone, thanks for all the valuable input. I need to read a bit to
    digest. The last reply brought another question, that I will read a bit
    on before doing a posting. Thanks again. Martin
    martin, Apr 15, 2006
    #9
  10. martin

    Xicheng Jia Guest

    martin wrote:
    > Thanks, I should have clarified one point. with respect to your
    > question
    >
    > " Do you recognize row 2 because the label is row2hdr, or because it is
    > > 2nd line after the header?"

    >

    > I used 'row2hdr' simply as a label, so number 2 there or any other
    > digit is not the point. The row and column labels are strings, and for
    > that matter could have been "apple", "orange", "tomato". so what needs
    > to be done is to reference entries through string labels, without
    > numerical characters. I don't want to use number in rows and strings
    > because the position of rows could change; and prefer to index entries
    > through matching row labels and column labels.

    This kind of text search can be done in Perl like this:

    perl -lne ' @x = split/\s*,\s*/;
    print join ",", @x[0, 3] if $.==1 || /^(row1hdr|row2hdr)/' myfile.csv

    or in awk:

    awk -F' *, *' 'NR==1 || $1 ~ /^(row1hdr|row2hdr)/ {print $1,$4}' myfile.csv

    ( awk's -F option on the command-line is more powerful than perl's )

    Xicheng

    > and by efficiency I meant a combination of both, leaning towards time
    > of execution. of course I was not referring to assembly though.
    >
    > tx
    >
    > martin
    Xicheng Jia, Apr 15, 2006
    #10
