File parsing

Sree

Hi all,

I have a file in the below format.

## COMP1: MAIN: SUB1: OK
## COMP1: MAIN: SUB2: N/A
## COMP2: MAIN: SUB1: OK

and I have to print in the output file as below.

1. COMP1:
1.1 MAIN: SUB1: OK
1.2 MAIN: SUB2: N/A
2. COMP2
2.1 MAIN: SUB1: OK


I am learning Perl; please help me out with this.

regards
Sree
 
jesper

open(FILE,"/yourpath");
while ($line = readline(FILE)) {
@parse = split($line,'\s');
print "1. $parse[1]:\n";
print "1.1 $parse[2]:\t"; #\t for <TAB>
etc,
}
close(FILE);
 
Martin Kissner

Since I am learning Perl, too, this is good practice for me.
I am sure this can be optimized.
I'd appreciate any suggestions from the regulars.

#!/bin/perl

use warnings;
use strict;

my $file="file";
my ($line, @list);
my $top ="";
my ($digit1, $digit2) = 0;

open (FILE, $file) or die "Can not open $file: $!";
while ( $line = <FILE>)
{
$digit2++;
$line =~s/##//;
@list = split (/ / , $line);
unless ( $top eq $list[1])
{
$digit1++;
print "$digit1. $list[1]\n";
}
$top = $list[1];
print "\t$digit1.$digit2. $list[2]\t$list[3]\t$list[4]";
}
 
Arndt Jonasson

Martin Kissner said:
Since I am learning Perl, too, this is good practice for me.
I am sure this can be optimized.
I'd appreciate any suggestions from the regulars.

#!/bin/perl

use warnings;
use strict;

my $file="file";
my ($line, @list);
my $top ="";
my ($digit1, $digit2) = 0;

open (FILE, $file) or die "Can not open $file: $!";
while ( $line = <FILE>)
{
$digit2++;
$line =~s/##//;
@list = split (/ / , $line);
unless ( $top eq $list[1])
{
$digit1++;
print "$digit1. $list[1]\n";
}
$top = $list[1];
print "\t$digit1.$digit2. $list[2]\t$list[3]\t$list[4]";

I suggest four changes:

1) Read from "<>" and remove the $file variable. Then the program can
get its input specified on the command line, either as files or STDIN.

2) Remove the $line variable and use the implicit behaviour of $_.

3) Remove the second period from
print "\t$digit1.$digit2. $list[2]\t$list[3]\t$list[4]";

4) Reset $digit2 to 1 at some appropriate point.

Changes 3 and 4 are needed to make the output conform to what the OP
wanted.

This line:
my ($digit1, $digit2) = 0;
doesn't set both variables to 0, which maybe you thought it does. It
sets only $digit1. Apparently $digit2++ does not give a warning when
$digit2 is undef, although $digit2=$digit2+1 does.
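
The list-assignment point can be checked with a small sketch. The variable names $first, $second, $counter and $sum are made up for the illustration, and the warning handler is only there to count warnings; this is not part of anyone's posted script.

```perl
use strict;
use warnings;

# List assignment pairs values with variables by position:
# only $first receives 0, $second stays undef.
my ($first, $second) = 0;
print defined $second ? "second is defined\n" : "second is undef\n";

# Give every variable its own value instead.
my ($digit1, $digit2) = (0, 0);
print "$digit1 $digit2\n";

# Collect warnings to compare ++ with explicit addition on undef.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $counter;
$counter++;          # ++ on undef is exempt from the "uninitialized" warning
my $sum;
$sum = $sum + 1;     # addition on undef warns under "use warnings"

print scalar(@warnings), " warning(s) caught\n";
```

Writing (0, 0) on the right-hand side keeps the declaration on one line; two separate "my $x = 0;" statements would do the same job.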
 
Martin Kissner

Arndt Jonasson wrote :
I suggest four changes:

1) Read from "<>" and remove the $file variable. Then the program can
get its input specified on the command line, either as files or STDIN.

Good practice for me.
I give filenames on the command line, but how can I read from STDIN?

Where do I find anything about "<>" in the docs?

2) Remove the $line variable and use the implicit behaviour of $_.

Okay, this works well.

3) Remove the second period from
print "\t$digit1.$digit2. $list[2]\t$list[3]\t$list[4]";

Didn't read closely enough :)

4) Reset $digit2 to 1 at some appropriate point.

Of course - I overlooked this.

Changes 3 and 4 are needed to make the output conform to what the OP
wanted.

This line:
my ($digit1, $digit2) = 0;
doesn't set both variables to 0, which maybe you thought it does.

Thanks for the hint; I know how to do it right now, though it's not
needed any more.

Here's my code:

#!/bin/perl

use warnings;
use strict;

my ($line, $digit2 ,@list);
my $top ="";
my $digit1 = 0;

open (FILE, <>) or die "Can not open : $!\n";
while (<FILE>)
{
$digit2++;
$_ =~s/##//;
@list = split (/ / );
unless ( $top eq $list[1])
{
$digit1++;
$digit2=1;
print "$digit1. $list[1]\n";
}
$top = $list[1];
print "\t$digit1.$digit2 $list[2]\t$list[3]\t$list[4]";
}
 
Tad McClellan

jesper said:
open(FILE,"/yourpath");


You should always, yes *always*, check the return value from open():

open(FILE, '/yourpath') or die "could not open '/yourpath' $!";

@parse = split($line,'\s');


Have you read the documentation for the split()?

It doesn't look like you have...
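
For reference, split() takes the pattern first and the string second. A minimal sketch, assuming one of the OP's sample lines as input ($line and @fields are illustrative names):

```perl
use strict;
use warnings;

my $line = "## COMP1: MAIN: SUB1: OK\n";

# Pattern first, string second. The special pattern ' ' splits on runs of
# whitespace and ignores leading whitespace, so the trailing newline
# does not produce an extra field.
my @fields = split ' ', $line;

print scalar(@fields), " fields\n";
print join('|', @fields), "\n";
```

With this input the fields are '##', 'COMP1:', 'MAIN:', 'SUB1:' and 'OK', so $fields[1] is the component name.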
 
Arndt Jonasson

Martin Kissner said:
[...]

I give filenames on the command line, but how can I read from STDIN?

Where do I find anything about "<>" in the docs.
perldoc -q "<>", perldoc -q "Diamond-Operator" and others didn't give me
the desired output.

[...]

Here's my code:

open (FILE, <>) or die "Can not open : $!\n";
while (<FILE>)

But it doesn't work, does it? I meant just read from <>, like this:
while (<>)

The "diamond" operator <> is described in perlop and perlopentut.
You read from STDIN by using the normal Unix syntax:

./myperlscript.pl < file
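
Putting the suggestions so far together, here is one possible sketch of the whole task. The sub name outline and the in-memory sample are assumptions made for the demo, so that the code runs without an input file; it is not the code anyone in the thread posted.

```perl
use strict;
use warnings;

# Turn lines like "## COMP1: MAIN: SUB1: OK" into a numbered outline.
# Takes an already opened filehandle and returns the output lines.
sub outline {
    my ($fh) = @_;
    my ($top, $major, $minor) = ('', 0, 0);
    my @out;
    while (my $line = <$fh>) {
        next unless $line =~ /^##/;
        my (undef, $comp, @rest) = split ' ', $line;
        next unless defined $comp;
        if ($comp ne $top) {        # new component: bump the first number
            $top = $comp;
            $major++;
            $minor = 0;             # and restart the second number
            push @out, "$major. $comp\n";
        }
        $minor++;
        push @out, "$major.$minor @rest\n";
    }
    return @out;
}

# Demo on an in-memory filehandle holding the OP's sample data.
my $sample = <<'END';
## COMP1: MAIN: SUB1: OK
## COMP1: MAIN: SUB2: N/A
## COMP2: MAIN: SUB1: OK
END
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print outline($fh);
close $fh;
```

In a real script the body of the while loop would sit directly inside "while (<>) { ... }", so the input could come from files named on the command line or from STDIN. Note that the heading keeps the trailing colon exactly as it appears in the input.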
 
Tad McClellan

Martin Kissner said:
Arndt Jonasson wrote :
I give filenames on the command line, but how can I read from STDIN?


If you want <> to read from STDIN after all of the command line files,
then supply a final argument of '-'.

If you want to read from STDIN independent of what is in @ARGV,
read from <STDIN> explicitly instead of <>.

Where do I find anything about "<>" in the docs.


The "I/O Operators" section in perlop.pod:

Here's my code:

open (FILE, <>) or die "Can not open : $!\n";


The diamond operator does *input*.

You are reading the 2nd argument from a file (or from STDIN).

Do you mean to be reading the file name from the _contents_ of
some file named on the command line? Does the name of the file
end with a newline?

That is what that code does.


It is unclear to me what you _want_ to do.

Go through all the lines from all the files named on the command line?

use the diamond operator and _no_ open()
the diamond operator handles open()ing the files for you

while ( <> ) {


Go through all the lines from all the files named on the command line
but with an explicit open()?

foreach my $fname ( @ARGV ) {
open FILE, $fname or die...
while ( <FILE> ) {


Go through all the lines from a single file?

open FILE, 'somefile' or die ...
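
The explicit-open variant above is only outlined, so here is a fuller sketch of it. It assumes a lexical filehandle and the three-argument form of open(); the sub name read_all and the temporary file are made up so the example can run without real command-line arguments.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read every line of every named file, with an explicit open() per file.
sub read_all {
    my @names = @_;
    my @lines;
    for my $fname (@names) {
        open my $fh, '<', $fname or die "Cannot open '$fname': $!";
        while (my $line = <$fh>) {
            push @lines, $line;
        }
        close $fh;
    }
    return @lines;
}

# Demo: a temporary file stands in for a file named on the command line.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "## COMP1: MAIN: SUB1: OK\n";
close $tmp;

print read_all($tmpname);
```

In a script you would call read_all(@ARGV). The plain "while (<>)" loop does the same work with less code, so the explicit form mainly pays off when you want your own error message per file.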
 
Arndt Jonasson

Tad McClellan said:
It is unclear to me what you _want_ to do.

The original poster (who is not Martin) only said this:
"I have a file in the below format."
so it's not well defined where the input comes from.
 
Martin Kissner

Arndt Jonasson wrote :
Martin Kissner said:
[...]

I give filenames on the command line, but how can I read from STDIN?

Where do I find anything about "<>" in the docs.
perldoc -q "<>", perldoc -q "Diamond-Operator" and others didn't give me
the desired output.

[...]

Here's my code:

open (FILE, <>) or die "Can not open : $!\n";
while (<FILE>)

But it doesn't work, does it? I meant just read from <>, like this:

It does.
But now I see that I got you wrong.
I started the script without "< file" and then gave the filename on
the command line. I was already asking myself what the sense of that
would be.
I have learned something from that anyway.

while (<>)

I tried this before.
It gave me errors and strange output.
Calling the script with "< file" of course makes this work, too.

Thanks for the advice.
 
Martin Kissner

Tad McClellan wrote :
The diamond operator does *input*.

You are reading the 2nd argument from a file (or from STDIN).

Do you mean to be reading the file name from the _contents_ of
some file named on the command line? Does the name of the file
end with a newline?

That is what that code does.


It is unclear to me what you _want_ to do.

I used the question from the OP for my own practice.
Things became a little clearer to me by trying out the different
possibilities.

Thank you for your feedback.
 
John W. Krahn

This appears to be close to what you want:

#!/usr/bin/perl
use warnings;
use strict;


my %seen;

while ( <DATA> ) {
next unless /^##/;
my @nums = /\d+/g;
my ( undef, $first, @fields ) = split;

unless ( $seen{ $first }++ ) {
print "$nums[0]. $first\n";
%seen = ( $first => 1 );
}

print join( "\t", '', join( '.', @nums ), @fields ), "\n";
}


__DATA__
## COMP1: MAIN: SUB1: OK
## COMP1: MAIN: SUB2: N/A
## COMP2: MAIN: SUB1: OK



John
 
