Could someone give me suggestions with my code?

Discussion in 'Perl Misc' started by Jonathan Clark, Jul 12, 2008.

  1. Hello, all.

    I'm re-learning perl (again), and have put myself into a while loop which
    i've just broken out of ( while $problemSolved != 1 {thinkOfSolution
    ();trySolution();} ), I'd like to post my code, complete with example
    data, for you to look over, and give some suggestions on how to improve
    my style.

    Most of the hard stuff was figuring out the regexps I needed for this
    task. It's essentially a two-rexexp solution at the moment, but was
    wondering if it could be done.

    I've created this with the help of perlfaq, other perldocs, Beginning
    Perl, and reading this newsgroup. I did some copy-pasting from the
    perldoc IIRC, for the 'framework' of the program.

    Oh, yes, the purpose of this program is to strip headers from newsgroup
    messages so that I can feed the files to my megahal bot later :)

    ------- headerstrip.pl --------

    #!/usr/bin/perl
    use warnings;
    use strict;
    undef $/; #read that this allows the whole file to be read in, rather
    than as lines or paragraphs.

    my $file = "@ARGV"; #Get the filename passed from the command line


    open my $in, '<', $file or die "Can't read old file: $!";
    open my $out, '>', "$file.new" or die "Can't write new file: $!";
    #Standard file openings, opens the original for reading, and the new file
    #(with .new extension added) to be changed...


    while( <$in> )
    {
    s/^Path(.*?)^Xref//gsm; #clears out most of the headers,
    from the Path line to the end of the word Xref, where Xref starts on it's
    own line
    s/^: news.*\n//; #removes the rest of that Xref line
    print $out $_; #writes file
    }

    close $out; #And, close the file handle
    --------- EOF ------------

    \

    Now, I'd like to be able to pass multiple files to it, and I can tell
    from my limited knowledge of perl that a) it isn't gonna work yet, and b)
    I'll probably have to stick my $file [...] close $out; into a while loop,
    most likely with @ARGV (or a 'copy' of it) as the iterator... Or would
    that be a for loop? hmm... :)

    I'll leave you Perl gurus to pointing out my mistakes, and await your
    suggestions :)

    --
    Clarjon1


    Bingo, gas station, hamburger with a side order of airplane noise,
    and you'll be Gary, Indiana. - Jessie in the movie "Greaser's Palace"
     
    Jonathan Clark, Jul 12, 2008
    #1
    1. Advertisements

  2. So far, so good. :D
    Slurping is okay for small files but doesn't scale. It should be fine
    for newsgroup postings.
    If the user specifies multiple arguments this will concatenate them. You
    only want one argument.

    my $file = shift @ARGV;

    You might want to throw an error if the user doesn't specify a file at
    all. For example:

    Good x 3. You're using three-arg open(), checking its return value, and
    including the error in any message.

    Since you're slurping the file you don't need a loop at all.

    I suspect the /g isn't necessary there as it appears you're trying to
    remove all the headers in a single pass.

    Since you aren't using $1 and don't need any grouping you can get rid of
    the parentheses.

    s/^Path.*?^Xref//sm;
    I think you can merge the two regular expressions into one.
    [Note: untested]

    s/^Path.*?^Xref:[^\n]*\n/sm;

    Are you trying to delete *all* message headers? What if there's
    something after Xref?

    I'd process the file (without undef'ing $/) this way:

    while (<$in>) {
    next if /^[\w-]+:/ .. /^(?![\w-]+:)/; # skip header lines
    print $out $_; # print first line of message
    This isn't strictly necessary, although I prefer to explicitly close
    handles. Be consistent though. Either close both $in and $out or neither.
    You need a loop. The idiomatic loop for perl is a "foreach" loop.

    foreach my $file (@ARGV) {
    next unless -f $file; # skip missing or non-text files
    # (open files)
    # (process file)
    }


    -mjc
     
    Michael Carman, Jul 12, 2008
    #2
    1. Advertisements

  3. Jonathan Clark

    Uri Guttman Guest

    MC> So far, so good. :D

    MC> Slurping is okay for small files but doesn't scale. It should be fine
    MC> for newsgroup postings.

    the definition of small files has grown considerably. it used to mean
    <1k and today 1mb can be condsidered small. most common text files
    (sources, *ML, docs, configs, etc.) are easily slurped. and slurping if
    done right is faster. of course doing it right means using
    File::Slurp. it is cleaner code than undefing $/ and it is faster too.



    MC> Good x 3. You're using three-arg open(), checking its return value,
    MC> and including the error in any message.

    a good extra is to put the file name in the (improved) error string.

    open my $in, '<', $file or die "Can't open $file: $!";
    open my $out, '>', "$file.new" or die "Can't create $file.new: $!";

    MC> Since you're slurping the file you don't need a loop at all.

    MC> $_ = <$in>;

    use File::Slurp ;

    # no need to call open and <>. also no need to touch $/
    my $post = read_file( $file ) ;

    MC> Are you trying to delete *all* message headers? What if there's
    MC> something after Xref?

    then delete up to the first blank line (2 newlines or crlf
    pairs). someone else commented on this already

    MC> I'd process the file (without undef'ing $/) this way:

    MC> while (<$in>) {
    MC> next if /^[\w-]+:/ .. /^(?![\w-]+:)/; # skip header lines

    what about continuation header lines?

    uri
     
    Uri Guttman, Jul 12, 2008
    #3
  4. Jonathan Clark

    Dr.Ruud Guest

    Uri Guttman schreef:
    millibits were already small.
     
    Dr.Ruud, Jul 12, 2008
    #4
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.