Matching spaces at start of line

Discussion in 'Perl Misc' started by Paul Jones, Nov 24, 2005.

  1. Paul Jones

    Paul Jones Guest

    Perhaps someone can help me with a little perl problem.

    I want to take an message and write it out to a file, but I only
    want specific header information. I have included what I have
    got below (ripped from a mail2news script on the web), however
    the comments header is often split over multiple lines (usually
    4) and I want to include them all. The 2-4th lines of the header
    all begin with spaces (usually 6 - perhaps its a tab).

    The program does exactly what I want of it, except for missing
    the lines beginning with a space.




    open (INEWS, "| $program $options") ||
    die "$program: can't run $news_poster_program\n";

    # header loop
    while (<STDIN>) {
    last if /^$/;

    s/(?i)^Date/Date/;
    s/(?i)^From/From/;
    s/(?i)^Subject/Subject/;
    s/(?i)^Newsgroups/Newsgroups/;
    s/(?i)^Comments/Comments/;

    print INEWS

    if /^(Date|From|Subject|Newsgroups|Comments):/i;
    $saw_newsgroup |= ( $+ eq 'Newsgroups' );

    }

    die "$program: didn't get newsgroup from headers\n"
    unless $saw_newsgroup;

    print INEWS "\n";

    print INEWS while <STDIN>; # gobble rest of message
     
    Paul Jones, Nov 24, 2005
    #1
    1. Advertising

  2. Paul Jones

    Dr.Ruud Guest

    Paul Jones schreef:

    > Perhaps someone can help me with a little perl problem.
    >
    > I want to take an message and write it out to a file, but I only
    > want specific header information. I have included what I have
    > got below (ripped from a mail2news script on the web), however
    > the comments header is often split over multiple lines (usually
    > 4) and I want to include them all. The 2-4th lines of the header
    > all begin with spaces (usually 6 - perhaps its a tab).
    >
    > The program does exactly what I want of it, except for missing
    > the lines beginning with a space.



    use strict;
    use warnings;

    > open (INEWS, "| $program $options") ||
    > die "$program: can't run $news_poster_program\n";
    >
    > # header loop
    > while (<STDIN>) {
    > last if /^$/;


    You process line-by-line, but some header fields are folded. Reading the
    whole header at once will make things simpler, you won't even need the
    while-loop. See perldoc perlvar, specifically $/, the input record
    separator.

    local $/ = '';

    When the header is in $_, you can use 's/\n\s+/ /g' to unfold the folded
    header fields.
    (err, that \s needs to be [[:blank:]])

    After that you can remove the header fields that you don't want:

    s/\n(?!(Date|From|Subject|Newsgroups|Comments):).*//gi


    > s/(?i)^Date/Date/;
    > s/(?i)^From/From/;
    > s/(?i)^Subject/Subject/;
    > s/(?i)^Newsgroups/Newsgroups/;
    > s/(?i)^Comments/Comments/;


    It is better to include the ':' too:

    s/^date:/Date:/mi;


    Basically:

    #!/usr/bin/perl

    use strict;
    use warnings;

    { local ($/, $\) = ('', "\n\n");

    # read the header
    $_ = <STDIN>;

    # unfold header fields
    s/\n[[:blank:]]+/ /g;

    # remove unwanted header fields
    s/\n(?!(Date|From|Subject|etc.):).*//gi;

    # standardize header field names
    s/^date:/Date:/mi;
    # etc.

    print;

    # read and print the body
    undef $/;
    undef $\;
    $_ = <STDIN>;
    print;
    }

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Nov 24, 2005
    #2
    1. Advertising

  3. Paul Jones <> wrote:

    > the comments header is often split over multiple lines (usually
    > 4) and I want to include them all. The 2-4th lines of the header
    > all begin with spaces (usually 6 - perhaps its a tab).



    > print INEWS
    >
    > if /^(Date|From|Subject|Newsgroups|Comments):/i;



    if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Nov 24, 2005
    #3
  4. Tad McClellan wrote:
    > Paul Jones <> wrote:
    >
    >>the comments header is often split over multiple lines (usually
    >>4) and I want to include them all. The 2-4th lines of the header
    >>all begin with spaces (usually 6 - perhaps its a tab).

    >
    >
    >> print INEWS
    >>
    >> if /^(Date|From|Subject|Newsgroups|Comments):/i;

    >
    >
    > if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;


    That is going to print from every header that is folded, not just the Comments
    header. :-(


    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Nov 25, 2005
    #4
  5. Paul Jones wrote:
    > Perhaps someone can help me with a little perl problem.
    >
    > I want to take an message and write it out to a file, but I only
    > want specific header information. I have included what I have
    > got below (ripped from a mail2news script on the web), however
    > the comments header is often split over multiple lines (usually
    > 4) and I want to include them all. The 2-4th lines of the header
    > all begin with spaces (usually 6 - perhaps its a tab).
    >
    > The program does exactly what I want of it, except for missing
    > the lines beginning with a space.
    >
    >
    >
    > open (INEWS, "| $program $options") ||
    > die "$program: can't run $news_poster_program\n";
    >
    > # header loop
    > while (<STDIN>) {
    > last if /^$/;
    >
    > s/(?i)^Date/Date/;
    > s/(?i)^From/From/;
    > s/(?i)^Subject/Subject/;
    > s/(?i)^Newsgroups/Newsgroups/;
    > s/(?i)^Comments/Comments/;
    >
    > print INEWS
    >
    > if /^(Date|From|Subject|Newsgroups|Comments):/i;
    > $saw_newsgroup |= ( $+ eq 'Newsgroups' );
    >
    > }
    >
    > die "$program: didn't get newsgroup from headers\n"
    > unless $saw_newsgroup;
    >
    > print INEWS "\n";
    >
    > print INEWS while <STDIN>; # gobble rest of message


    Here is one way to do it:


    open INEWS, '|-', split( ' ', "$program $options" )
    or die "$program: can't run $news_poster_program: $!";

    my $saw_newsgroup;
    while ( <STDIN> ) {
    last if /^$/;
    if ( s/^(comments:)/\u\L$1/i ) {
    my $line = <>;
    if ( $line =~ /^\s/ ) {
    $_ .= $line;
    }
    else {
    print INEWS;
    $_ = $line;
    }
    redo;
    }
    elsif ( s/^(newsgroups:)/\u\L$1/i ) {
    $saw_newsgroup++;
    }
    elsif ( ! s/^(date|from|subject):/\u\L$1:/i ) {
    next;
    }
    print INEWS;
    }

    die "$program: didn't get newsgroup from headers\n"
    unless $saw_newsgroup;

    print INEWS "\n";
    print INEWS while <STDIN>;

    close INEWS
    or warn $! ? "Error closing $program: $!"
    : "Exit status $? from $program";

    __END__



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Nov 25, 2005
    #5
  6. Paul Jones

    Anno Siegel Guest

    John W. Krahn <> wrote in comp.lang.perl.misc:
    > Tad McClellan wrote:
    > > Paul Jones <> wrote:
    > >
    > >>the comments header is often split over multiple lines (usually
    > >>4) and I want to include them all. The 2-4th lines of the header
    > >>all begin with spaces (usually 6 - perhaps its a tab).

    > >
    > >
    > >> print INEWS
    > >>
    > >> if /^(Date|From|Subject|Newsgroups|Comments):/i;

    > >
    > >
    > > if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;

    >
    > That is going to print from every header that is folded, not just the Comments
    > header. :-(


    I think it should, for those headers that are to be printed at all. I
    know that OP singled out the Comments header, but that's probably just
    because it's the only one that *has* continuation lines with any regularity.

    I read the problem as "Print all valid headers ( Date, From, ...) and
    their continuation lines, suppress all others, including their continuation
    lines". Could be done like this:

    my $saw_newsgroup;
    my $valid_header;
    while (<DATA>) {
    last if /^$/;

    if ( /^\S/ ) {
    $valid_header =
    s{^(Date|From|Subject|Newsgroups|Comments):}
    {\u\L$1:}i;
    $saw_newsgroup ||= $valid_header && $1 eq 'Newsgroups';
    }
    print if $valid_header;
    }
    die "didn't get newsgroup from headers\n" unless $saw_newsgroup;

    print "\n";
    print while <DATA>; # gobble rest of message

    Besides adding the logic to control $valid_header, I have used the
    s/// operation that normalizes header spelling to determine if
    we have a valid header. This way, the list of valid headers appears
    only once in the code, while the original code had it twice. A
    maintainer will appreciate that.

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Nov 25, 2005
    #6
  7. Paul Jones

    Paul Jones Guest

    In article <>
    "Dr.Ruud" <> wrote:
    >
    > Paul Jones schreef:
    >
    > > Perhaps someone can help me with a little perl problem.
    > >
    > > I want to take an message and write it out to a file, but I only
    > > want specific header information. I have included what I have
    > > got below (ripped from a mail2news script on the web), however
    > > the comments header is often split over multiple lines (usually
    > > 4) and I want to include them all. The 2-4th lines of the header
    > > all begin with spaces (usually 6 - perhaps its a tab).
    > >
    > > The program does exactly what I want of it, except for missing
    > > the lines beginning with a space.

    >
    >
    > use strict;
    > use warnings;
    >
    > > open (INEWS, "| $program $options") ||
    > > die "$program: can't run $news_poster_program\n";
    > >
    > > # header loop
    > > while (<STDIN>) {
    > > last if /^$/;

    >
    > You process line-by-line, but some header fields are folded. Reading the
    > whole header at once will make things simpler, you won't even need the
    > while-loop. See perldoc perlvar, specifically $/, the input record
    > separator.
    >



    <snip great advice>



    > Basically:


    Thanks very much!

    Unfortunately I chopped my script down a bit to post it on here,
    thinking that as some of it is in essence just repeated, or so I
    thought, I needent bother you with it all (it's really about 2-
    3x the length of what I posted). Unfortunately my first attempt
    at expanding what you have given me failed to work fairly
    miserably!

    I will try a little longer at getting it to work, but if I still
    have problems I'll post the whole thing and maybe you can point
    out what I'm doing wrong. I'm very new to perl and all I know is
    what I've managed to get from the web or from other peoples
    annotated scripts.
     
    Paul Jones, Nov 25, 2005
    #7
  8. Paul Jones

    Paul Jones Guest

    In article <dm6slf$se2$-Berlin.DE>
    -berlin.de (Anno Siegel) wrote:
    >
    > John W. Krahn <> wrote in comp.lang.perl.misc:
    > > Tad McClellan wrote:
    > > > Paul Jones <> wrote:
    > > >
    > > >>the comments header is often split over multiple lines (usually
    > > >>4) and I want to include them all. The 2-4th lines of the header
    > > >>all begin with spaces (usually 6 - perhaps its a tab).
    > > >
    > > >
    > > >> print INEWS
    > > >>
    > > >> if /^(Date|From|Subject|Newsgroups|Comments):/i;
    > > >
    > > >
    > > > if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;

    > >
    > > That is going to print from every header that is folded, not just the Comments
    > > header. :-(

    >
    > I think it should, for those headers that are to be printed at all. I
    > know that OP singled out the Comments header, but that's probably just
    > because it's the only one that *has* continuation lines with any regularity.
    >
    > I read the problem as "Print all valid headers ( Date, From, ...) and
    > their continuation lines, suppress all others, including their continuation
    > lines". Could be done like this:


    Perhaps a better way of approaching the problem would have been
    to print all headers except a list that should be excluded.

    Anyway, maybe I should have just listed the entire requirements
    of what I wanted, rather than show you what I've got and ask how
    to do it better. So here we go:

    The message is routed via Exim4 in it's transport phase.

    mail2news_transport:
    debug_print = "T: mail2news for $local_part@$domain"
    driver = pipe
    command = /usr/local/bin/mail2news.pl
    user = mail

    It's then picked up by the perl script.

    The perl script starts with:



    $news_poster_program = "/usr/bin/rnews";
    $news_poster_options = "-r localhost";

    open (INEWS, "| $news_poster_program $news_poster_options") ||
    die "$program: can't run $news_poster_program\n";



    Then it goes on to work on the headers. I would like to exclude:

    Control:
    Supersedes:
    Authorised:
    Received:

    (Note Received: can go over multiple lines)

    Certain headers need to be supplied or it fails:

    From:
    Newsgroups:
    Date:


    Some are required but can be created if absent:

    Subject:
    Message-ID:
    Organization:

    (I think I've got that bit sorted)

    ($sec,$min,$hour,$mday,$mon,$year)=localtime(time);
    $madeupid = "\<$year$mon$mday.$hour$min$sec.$$\@myhost.com\>";

    print INEWS "Subject: Untitled\n" unless $saw_subject;
    printf INEWS "Message-ID: %s\n", $madeupid unless $saw_msgid;
    print INEWS "Organization: Unknown\n" unless $saw_organization;


    Some must be created (removing any that already exist):

    print INEWS "Path: posted\n";


    Others should be printed, unless excluded.

    One final thing - it would be nice to grab the IP address from
    the first Received header to add it to an NNTP-Posting-Host:
    header.

    This is basically a mail2news posting perl script. The ones
    already out there dind't seem to work as well as I liked, so I
    tried fiddling with them to improve them. I've got it working,
    however it could be (much) better (because I know next to
    nothing about PERL).

    Anyway, thanks for the helps so far. I'll keep tinkering. But I
    would be happy to receive any more tips!
     
    Paul Jones, Nov 25, 2005
    #8
  9. Anno Siegel wrote:
    > John W. Krahn <> wrote in comp.lang.perl.misc:
    >>Tad McClellan wrote:
    >>>Paul Jones <> wrote:
    >>>
    >>>>the comments header is often split over multiple lines (usually
    >>>>4) and I want to include them all. The 2-4th lines of the header
    >>>>all begin with spaces (usually 6 - perhaps its a tab).
    >>>
    >>>> print INEWS
    >>>>
    >>>> if /^(Date|From|Subject|Newsgroups|Comments):/i;
    >>>
    >>> if /^(Date|From|Subject|Newsgroups|Comments):/i or /^\s/;

    >>That is going to print from every header that is folded, not just the Comments
    >>header. :-(

    >
    > I think it should, for those headers that are to be printed at all. I
    > know that OP singled out the Comments header, but that's probably just
    > because it's the only one that *has* continuation lines with any regularity.
    >
    > I read the problem as "Print all valid headers ( Date, From, ...) and
    > their continuation lines, suppress all others, including their continuation
    > lines". Could be done like this:
    >
    > my $saw_newsgroup;
    > my $valid_header;
    > while (<DATA>) {
    > last if /^$/;
    >
    > if ( /^\S/ ) {
    > $valid_header =
    > s{^(Date|From|Subject|Newsgroups|Comments):}
    > {\u\L$1:}i;
    > $saw_newsgroup ||= $valid_header && $1 eq 'Newsgroups';


    If the original header is not exactly 'Newsgroups', for example 'newsgroups'
    or 'NewsGroups', then that expression will fail. Perhaps:

    $saw_newsgroup ||= $valid_header && lc( $1 ) eq 'newsgroups';

    > }
    > print if $valid_header;
    > }
    > die "didn't get newsgroup from headers\n" unless $saw_newsgroup;
    >
    > print "\n";
    > print while <DATA>; # gobble rest of message
    >
    > Besides adding the logic to control $valid_header, I have used the
    > s/// operation that normalizes header spelling to determine if
    > we have a valid header. This way, the list of valid headers appears
    > only once in the code, while the original code had it twice. A
    > maintainer will appreciate that.



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Nov 26, 2005
    #9
  10. Paul Jones

    Anno Siegel Guest

    John W. Krahn <> wrote in comp.lang.perl.misc:
    > Anno Siegel wrote:


    [...]

    > > $valid_header =
    > > s{^(Date|From|Subject|Newsgroups|Comments):}
    > > {\u\L$1:}i;
    > > $saw_newsgroup ||= $valid_header && $1 eq 'Newsgroups';

    >
    > If the original header is not exactly 'Newsgroups', for example 'newsgroups'
    > or 'NewsGroups', then that expression will fail. Perhaps:
    >
    > $saw_newsgroup ||= $valid_header && lc( $1 ) eq 'newsgroups';
    >
    > > }
    > > print if $valid_header;
    > > }
    > > die "didn't get newsgroup from headers\n" unless $saw_newsgroup;


    You are right, I noticed that bug myself after posting. I thought of the
    same fix too.

    It is remarkable (and comforting) how carefully code is read that is
    posted here.

    Anno
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Nov 26, 2005
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. John B. Matthews
    Replies:
    4
    Views:
    671
    John B. Matthews
    Sep 12, 2008
  2. Roedy Green
    Replies:
    3
    Views:
    627
  3. Tomasz Chmielewski

    [CSS] how can I show spaces as spaces?

    Tomasz Chmielewski, Sep 5, 2009, in forum: HTML
    Replies:
    21
    Views:
    1,610
    Jan C. Faerber
    Sep 10, 2009
  4. Tomasz Chmielewski

    how can I show spaces as spaces, part 2

    Tomasz Chmielewski, Sep 9, 2009, in forum: HTML
    Replies:
    14
    Views:
    878
    Jan C. Faerber
    Sep 10, 2009
  5. johkar
    Replies:
    2
    Views:
    2,903
    Mayeul
    Dec 10, 2009
Loading...

Share This Page