Extract text from a file and write to another.

Discussion in 'Perl Misc' started by Sesa Woruban, Feb 1, 2004.

  1. Sesa Woruban

    Sesa Woruban Guest

    Hiya,

    I'm very new to Perl and my brain is dead. I'm trying to create a
    simple programme that will extract the pertinent lines (only those
    with a : in them) from a plain flat text file (that represents my
    inbox) and write only those lines to another text file. This is what
    I've got so far:

    use strict;
    use warnings;

    my $key;
    my %hash;
    my $infile = '/home/sesaworu/mail/sesaworuban.net/test/input'; # store the file
    my $outfile = '>/home/sesaworu/mail/sesaworuban.net/test/output.txt';
    open (INFILE, $infile) or die "cannot open $infile: $!"; # opens the file
    open (OUTFILE, $outfile) or die "cannot open $outfile: $!"; # opens the file

    while()
    {
    chomp;
    $key='',next if /^\s*$/;
    if(/([\w\s]+):(.*)/){
    $key=$1;
    push @{$hash{$key}},$2;
    }
    else
    {
    push @{$hash{$key}},$_ if $key;
    }
    }
    for(sort keys %hash)
    {
    print OUTFILE "$_ : ".join("\n",@{$hash{$_}})."\n";
    }

    How does that look?

    Cheers
    Sesa
     
    Sesa Woruban, Feb 1, 2004
    #1

  2. Ben Morrow

    Ben Morrow Guest

    (Sesa Woruban) wrote:
    > use strict;
    > use warnings;


    Good.

    > my $key;
    > my %hash;


    Give this a better name, like %headers.

    > my $infile = '/home/sesaworu/mail/sesaworuban.net/test/input'; #
    > store the file


    There's no need for that comment: it is perfectly clear from the code.

    > my $outfile = '>/home/sesaworu/mail/sesaworuban.net/test/output.txt';
    > open (INFILE, $infile) or die "cannot open $infile: $!"; # opens the
    > file


    I would recommend using a lexical FH:

    open my $IN, $infile or die "cannot open $infile: $!";

    > open (OUTFILE, $outfile) or die "cannot open $outfile: $!"; # opens
    > the file
    >
    > while()
    > {


    You mean
    while (<INFILE>) {

    > chomp;
    > $key='',next if /^\s*$/;


    Use undef rather than ''... it's a better representation of 'no
    value'.

    > if(/([\w\s]+):(.*)/){


    This is a mail message, right? In which case, don't you mean something
    more like /(.+?) \s* : \s* (.*)/x ? '-' is not included in \w. Also,
    do you know that a continuation header line can (and often will) contain
    ':': it is the whitespace at the start which marks it as a
    continuation?
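
    A minimal sketch of the parsing described above, using only core Perl. The
    sub name parse_headers and the sample header lines are made up for
    illustration; they are not from the original post. It treats any line that
    starts with whitespace as a continuation of the previous header (even if it
    contains a colon) and uses undef, tested with defined, for "no current
    header".

```perl
use strict;
use warnings;

# Parse "Name: value" lines into a hash of array refs.
# parse_headers is a hypothetical helper, not part of any module.
sub parse_headers {
    my (@lines) = @_;
    my %headers;
    my $key;    # undef means "not inside a header"

    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^\s*$/) {        # blank line: forget the current header
            undef $key;
            next;
        }
        if ($line =~ /^\s+(.*)/) {     # leading whitespace: continuation line
            $headers{$key}[-1] .= " $1" if defined $key;
            next;
        }
        if ($line =~ /^(.+?) \s* : \s* (.*)/x) {
            $key = $1;
            push @{ $headers{$key} }, $2;
        }
    }
    return \%headers;
}

# Made-up sample input: the second line continues the first and
# contains colons of its own.
my $h = parse_headers(
    "Received: from a.example by b.example;\n",
    "\tSun, 1 Feb 2004 10:00:00 +0000\n",
    "X-Mailer: demo\n",
);
print "$_ => $h->{$_}[0]\n" for sort keys %$h;
```

    Testing with defined rather than plain truth also avoids silently dropping
    a header whose name happens to be "0". For real mailboxes a CPAN module is
    still likely to be the safer choice.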

    If this is a mailbox, you would be better off using one of the Mail::
    modules on CPAN to parse it.

    > $key=$1;
    > push @{$hash{$key}},$2;
    > }
    > else
    > {
    > push @{$hash{$key}},$_ if $key;
    > }
    > }
    > for(sort keys %hash)
    > {
    > print OUTFILE "$_ : ".join("\n",@{$hash{$_}})."\n";
    > }


    Ben

    --
    If I were a butterfly I'd live for a day, / I would be free, just blowing away.
    This cruel country has driven me down / Teased me and lied, teased me and lied.
    I've only sad stories to tell to this town: / My dreams have withered and died.
    <=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>=<=>=<=> (Kate Rusby)
     
    Ben Morrow, Feb 1, 2004
    #2

  3. Sesa Woruban wrote:
    > I'm very new to Perl and my brain is dead. I'm trying to create a
    > simple programme that will extract the pertinent lines (only those
    > with a : in them) from a plain flat text file (that represents my
    > inbox) and write only those lines to another text file. This is what
    > I've got so far:
    >
    > use strict;
    > use warnings;


    Good and good!

    > my $key;
    > my %hash;
    > my $infile = '/home/sesaworu/mail/sesaworuban.net/test/input'; #
    > store the file
    > my $outfile = '>/home/sesaworu/mail/sesaworuban.net/test/output.txt';


    Personally I find this misleading. Intuitively I would assume that $outfile
    contains the file name of the output file. However, it doesn't. It also
    contains a weird additional chevron.

    > open (INFILE, $infile) or die "cannot open $infile: $!"; # opens the
    > file
    > open (OUTFILE, $outfile) or die "cannot open $outfile: $!"; # opens
    > the file


    And here I am missing the chevron, which would indicate that you are opening
    the file for writing. No big deal, it works the way you coded it, but a few
    months from now it will confuse you and it will cost extra time to figure out
    why the chevron is missing here.
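
    One way to keep the file name and the open mode apart is the three-argument
    form of open, combined with lexical filehandles. The sketch below is only an
    illustration: the scratch directory, the file names inside it and the sample
    lines are assumptions chosen so that it runs anywhere, not the paths from
    the original post.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch paths, so the sketch does not touch real mail files.
my $dir     = tempdir(CLEANUP => 1);
my $infile  = "$dir/input";
my $outfile = "$dir/output.txt";    # file name only, no '>' prefix

# Create a small sample input file.
open my $SEED, '>', $infile or die "cannot create $infile: $!";
print {$SEED} "From: someone\n", "no colon here\n";
close $SEED or die "cannot close $infile: $!";

# The mode is a separate argument, so it is visible at the open call.
open my $IN,  '<', $infile  or die "cannot open $infile: $!";
open my $OUT, '>', $outfile or die "cannot open $outfile: $!";

# Plain copy, just to show both handles in use.
while (my $line = <$IN>) {
    print {$OUT} $line;
}

close $IN;
close $OUT or die "cannot close $outfile: $!";
```

    With this form $outfile holds nothing but the file name, so the error
    message also reports the name without a stray chevron.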

    > while()


    While what? I guess you meant
    while (<INFILE>)

    > {


    Spaces for indentation are cheap, use them liberally.

    > chomp;
    > $key='',next if /^\s*$/;
    > if(/([\w\s]+):(.*)/){
    > $key=$1;
    > push @{$hash{$key}},$2;
    > }
    > else
    > {
    > push @{$hash{$key}},$_ if $key;
    > }
    > }


    No idea what this loop body is supposed to do. It looks way too complicated
    to me.
    A simple single-line
    { print OUTFILE if (/:/); }
    appears to be all you would need according to your spec above. "Copy a line
    if it contains a colon sign".
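
    Spelled out as a small runnable sketch, the "copy a line if it contains a
    colon" spec could look like this. The sub name copy_colon_lines and the
    sample text are made up for illustration, and in-memory filehandles stand
    in for the real input and output files.

```perl
use strict;
use warnings;

# Copy every line that contains a colon from one filehandle to another.
# copy_colon_lines is a hypothetical helper name.
sub copy_colon_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} $line if $line =~ /:/;
    }
    return;
}

# In-memory filehandles replace the real files for this demonstration.
my $input  = "From: alice\nHello there\nSubject: test\n";
my $output = '';
open my $IN,  '<', \$input  or die "cannot open in-memory input: $!";
open my $OUT, '>', \$output or die "cannot open in-memory output: $!";

copy_colon_lines($IN, $OUT);
close $IN;
close $OUT;

print $output;
```

    Run as is, this prints the "From: alice" and "Subject: test" lines and
    drops "Hello there". Swapping the in-memory handles for handles opened on
    real files leaves the loop unchanged.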

    > for(sort keys %hash)
    > {
    > print OUTFILE "$_ : ".join("\n",@{$hash{$_}})."\n";
    > }


    Well, why? Your spec doesn't say anything about sorting.

    jue
     
    Jürgen Exner, Feb 1, 2004
    #3
  4. Sesa Woruban wrote:

    > I'm trying to create a
    > simple programme that will extract the pertinent lines (only those
    > with a : in them) from a plain flat text file (that represents my
    > inbox) and write only those lines to another text file.



    perl -ne 'print if /:/' infile >outfile


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Feb 1, 2004
    #4
  5. Sesa Woruban

    Sesa Woruban Guest

    > If this is a mailbox, you would be better off using one of the Mail::
    > modules on CPAN to parse it.


    Possibly, and I went to have a look at these modules but I think the
    learning curve for those will be even higher than just getting this
    simple programme cobbled together... unless you can give me a quick
    intro?

    Cheers
    Sesa
     
    Sesa Woruban, Feb 2, 2004
    #5