Any idea why this code doesn't remove all the blank lines?

Discussion in 'Perl Misc' started by Jack Wang, Feb 14, 2008.

  1. Jack Wang

    Jack Wang Guest

    This is the code I've written so far.

    #!/usr/bin/perl
    my $result = "";
    while (<>){
    if (/---START---/../--END\s---/){
    $result.=$_;
    }
    }
    $text="";
    $result=~m/^---START---(.*)--END\s---$/s;
    $text.=$1;
    $text =~ s/\n+/\n/g;
    print $text;

    This is the text that it should handle (shortened, ........ represents
    more data).

    ---START---

    1342A 1O B10/B11
    1003 1O B45/Z46
    1094 1O F39/F40
    1416 1O G37/G38
    1007 1O Z33/A34
    ..........................

    .............................
    .............................
    .....stuff here..........
    .....................

    4105 4L F31/F32
    .......................
    ......................

    --END ---


    I want to extract the data between ---START--- and --END ---,
    removing any blank lines. However, the above-mentioned program
    outputs everything correctly except that it leaves a blank line at
    the top, and I can't figure out why. Thanks for any help!
     
    Jack Wang, Feb 14, 2008
    #1

  2. Jack Wang wrote:
    > This is the code I've written so far.
    >
    > #!/usr/bin/perl


    use warnings;
    use strict;

    > my $result = "";
    > while (<>){
    > if (/---START---/../--END\s---/){


    next unless /\S/;
    next if /---START---/ || /--END\s---/;

    > $result.=$_;
    > }
    > }
    > $text="";
    > $result=~m/^---START---(.*)--END\s---$/s;
    > $text.=$1;
    > $text =~ s/\n+/\n/g;
    > print $text;




    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
     
    John W. Krahn, Feb 14, 2008
    #2

  3. Jack Wang

    Guest

    Jack Wang <> wrote:


    > $result=~m/^---START---(.*)--END\s---$/s;
    > $text.=$1;
    > $text =~ s/\n+/\n/g;


    ....

    > However, the above-mentioned program
    > outputs everything correctly except that it leaves a blank line at
    > the top, and I can't figure out why. Thanks for any help!


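    You get a blank line either when there are two \n in a row, or when
    the string has a single \n at the beginning. Your substitution handles
    the first case but not the second: a leading run of \n is collapsed to
    one \n, which still shows up as a blank line at the top.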

    Either don't capture them in the first place:

    $result=~m/^---START---\n*(.*)--END\s---$/s;

    Or remove the leading newline explicitly afterwards:

    $text =~ s/\n+/\n/g;
    $text =~ s/^\n+//;
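
A minimal sketch of the second fix, showing why collapsing alone still leaves a newline at the top. The sample string and the helper name `strip_blank_lines` are made up for illustration and are not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of newlines, then drop the
# single newline that may be left at the very start of the string.
sub strip_blank_lines {
    my ($text) = @_;
    $text =~ s/\n+/\n/g;    # "\n\n\n" becomes "\n"; a leading "\n" stays
    $text =~ s/\A\n//;      # remove the newline left at the top
    return $text;
}

# Made-up sample: roughly what (.*) captures right after ---START---
my $captured = "\n\nline one\n\n\nline two\n\n";

# Collapsing alone is not enough when the string starts with newlines.
( my $collapsed = $captured ) =~ s/\n+/\n/g;
print "collapse only: ",
    ( $collapsed =~ /\A\n/ ? "still starts with a newline" : "clean" ), "\n";

print strip_blank_lines($captured);
```

Using `\A` rather than `^` anchors the second substitution to the start of the string even if the `/m` modifier is added later.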

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    The costs of publication of this article were defrayed in part by the
    payment of page charges. This article must therefore be hereby marked
    advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
    this fact.
     
    , Feb 14, 2008
    #3
  4. Jack Wang

    Uri Guttman Guest

    >>>>> "JWK" == John W Krahn <> writes:

    JWK> Jack Wang wrote:
    >> This is the code I've written so far.
    >> #!/usr/bin/perl


    JWK> use warnings;
    JWK> use strict;

    >> my $result = "";
    >> while (<>){
    >> if (/---START---/../--END\s---/){


    JWK> next unless /\S/;
    JWK> next if /---START---/ || /--END\s---/;

    you can use the return value of .. to eliminate the redundancy of those
    regexes:

    if ( my $range_num = /---START---/ .. /--END\s---/ ) {

    next if $range_num == 1 || $range_num =~ /e/i ;
    }


    i would even drop the block:

    my $range_num = /---START---/ .. /--END\s---/ ;
    next unless $range_num ;
    next if $range_num == 1 || $range_num =~ /e/i ;
    next unless /\S/ ;
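
A small self-contained sketch of the sequence numbers that `..` returns in scalar context, applied to a made-up list of lines instead of `<>`. The sub name `between_markers` and the sample data are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the non-blank lines strictly between
# the ---START--- and --END --- marker lines.
sub between_markers {
    my @lines = @_;
    my @kept;
    for (@lines) {
        # In scalar context .. returns an empty string outside the
        # range, 1 on the line matching the left pattern, then 2, 3, ...
        # and a number with "E0" appended on the line matching the
        # right pattern.
        my $seq = /---START---/ .. /--END\s---/;
        next unless $seq;          # outside the range
        next if $seq == 1;         # the ---START--- line itself
        next if $seq =~ /E0\z/;    # the --END --- line itself
        next unless /\S/;          # blank line
        push @kept, $_;
    }
    return @kept;
}

my @sample = (
    "junk\n", "---START---\n", "\n",
    "1342A 1O B10/B11\n", "\n", "--END ---\n", "junk\n",
);
print between_markers(@sample);
```

Checking `$seq` for truth first keeps the numeric comparison from ever seeing the empty string, so the code stays quiet under `use warnings`.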

    but my favorite way is so much faster and shorter (untested):

    use File::Slurp ;

    my $text = read_file( \*STDIN ) ;
    while( $text =~ m/^---START---\n(.+?)^--END\s---$/msg ) {

    my $result = $1 ;

    # do newline and other cleanup here

    $result =~ tr/\n//s ;   # squeeze runs of newlines into one
    $result =~ s/\A\n// ;   # drop the newline left at the top

    print $result ;
    }

    can't get much simpler than that.
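
The slurp-and-match idea can also be sketched with core Perl only. This is an illustrative variant rather than the code posted above: it works on a made-up sample string instead of STDIN, and the sub name `extract_blocks` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return every START/END block found in $text,
# with blank lines removed from each block.
sub extract_blocks {
    my ($text) = @_;
    my @blocks;
    # A scalar-context //g match advances through the string one match
    # per iteration; (.*?) is non-greedy, so each block stops at the
    # first end marker that follows it.
    while ( $text =~ /^---START---\n(.*?)^--END\s---$/msg ) {
        my $block = $1;
        $block =~ tr/\n//s;     # squeeze runs of newlines into one
        $block =~ s/\A\n//;     # drop a newline left at the top
        push @blocks, $block;
    }
    return @blocks;
}

# Made-up sample input in the shape described in the question.
my $input = <<'END_SAMPLE';
header
---START---

1342A 1O B10/B11

4105 4L F31/F32

--END ---
trailer
END_SAMPLE

print extract_blocks($input);
```

To process real input, the sample string could be replaced by slurping a filehandle, for example `my $input = do { local $/; <STDIN> };`.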

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Architecture, Development, Training, Support, Code Review ------
    ----------- Search or Offer Perl Jobs ----- http://jobs.perl.org ---------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Feb 14, 2008
    #4
  5. On Thu, 14 Feb 2008 13:09:08 -0800, Jack Wang wrote:

    > I want to extract the data between ---START--- and --END ---,
    > removing any blank lines. However, the above-mentioned program
    > outputs everything correctly except that it leaves a blank line at
    > the top, and I can't figure out why. Thanks for any help!


    Because you ask it to?

    Your problem can be shortened to:
    $ perl -e '$t="\ntest\n\ntest\n"; $t=~ s/\n+/\n/g; print "t=$t\n"'

    This does exactly the same thing: it leaves the first empty line. Why?
    Because you replace the newline there with a newline.

    Try:
    $ perl -e '$t="\ntest\n\ntest\n"; $t=~ s/\n+/x/g; print "t=$t\n"'

    And you'll see what I mean.

    You probably want to add:
    $text =~ s/^\n//;
    to achieve what you want.

    Some stylistic issues:

    > #!/usr/bin/perl


    use strict;
    use warnings;

    > my $result = "";
    > while (<>){
    > if (/---START---/../--END\s---/){
    > $result.=$_;
    > }
    > }


    Indentation helps for readability.

    > $text="";
    > $result=~m/^---START---(.*)--END\s---$/s;
    > $text.=$1;


    Useless use of concatenation. Change to:

    $result=~m/^---START---(.*)--END\s---$/s;
    my $text = $1;

    > $text =~ s/\n+/\n/g;
    > print $text;


    HTH,
    M4
     
    Martijn Lievaart, Feb 14, 2008
    #5
  6. Try this:

    while (<>) {
        #
        # Grab the lines between these two lines (exclusive)
        #
        my $sequence = /---START---/ ... /--END\s---/;
        next unless $sequence > 1;      # Excludes left-hand pattern
        next if $sequence =~ /E0$/;     # Excludes right-hand pattern

        next if /^\s*$/;                # Skip blank lines
        print;
    }

     
    Mario D'Alessio, Feb 15, 2008
    #6