Splitting a multi-line text

Discussion in 'Perl Misc' started by bingfeng, Oct 20, 2008.

  1. bingfeng

    bingfeng Guest

    Hello,
    Assume I have the following string:
    my $cmds = <<DOC
      __begin {
         abc;
         def;
         {foo;bar}
      } __end;
      __begin {
         cde;
      } __end;
      abc;
      bad;
    DOC
    ;

    I want to split it into an array: the first item is "__begin {
    abc;
    def;
    {foo;bar}
    } __end", the second item is "__begin {
    cde;
    } __end", and the third is "abc" and the fourth is "bad".

    split obviously cannot be used here, so I tried the following regex:
    my @lines = ($cmds =~ /__begin.*?__end|[^;]+/sg);

    but it does not work at all. How can I do this?

    regards,
    bingfeng
     
    bingfeng, Oct 20, 2008
    #1

  2. bingfeng wrote:
    > split obviously cannot be used here, so I use following regex:
    > my @lines = ($cmds =~ /__begin.*?__end|[^;]+/sg);
    >
    > but it does not work at all. so how can I do with this?


    my @lines = $cmds =~ /^\s*__begin(?s:.*?)__end;$|^\s*\S+;$/mg;
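
    For readers following along, here is a commented sketch of the regex above, written with the /x modifier so that each part can be annotated. The sample input (including its indentation) is an assumption made for illustration.

```perl
use strict;
use warnings;

# Assumed sample input: the text from the first post, with leading indentation.
my $cmds = join "\n",
    '  __begin {',
    '     abc;',
    '     def;',
    '     {foo;bar}',
    '  } __end;',
    '  __begin {',
    '     cde;',
    '  } __end;',
    '  abc;',
    '  bad;',
    '';

my @lines = $cmds =~ /
      ^ \s* __begin (?s: .*? ) __end ; $   # a whole block, up to the first "__end;" that ends a line
    |
      ^ \s* \S+ ; $                        # or a line holding one whitespace-free statement
/mgx;

print "[$_]\n" for @lines;
```

    Note that each item keeps its leading whitespace and its trailing ';', and that \S+ only accepts a statement with no whitespace inside it.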



    John
    --
    Perl isn't a toolbox, but a small machine shop where you
    can special-order certain sorts of tools at low cost and
    in short order. -- Larry Wall
     
    John W. Krahn, Oct 20, 2008
    #2

  3. bingfeng

    Dr.Ruud Guest

    bingfeng wrote:

    > split obviously cannot be used here, so I use following regex:
    > my @lines = ($cmds =~ /__begin.*?__end|[^;]+/sg);
    >
    > but it does not work at all. so how can I do with this?


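    my @blocks;

    # First pull out the __begin ... __end blocks, then the remaining statements.
    for my $re ( qr/__begin.*?__end/s, qr/^[^;]+/m ) {
        # Blank out each match with spaces so that later offsets stay valid.
        while ($cmds =~ s/\s*($re)\s*;/" " x ($+[0] - $-[0])/es) {
            # $-[1] is where the captured text starts; $-[0] would also count
            # the whitespace (and earlier blanked-out matches) in front of it.
            push @blocks, [ $-[1], $1 ];
        }
    }

    # Print the pieces in their original order.
    for (sort { $a->[0] <=> $b->[0] } @blocks) {
        print "<--\n", $_->[1], "\n-->\n";
    }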


    In the end, $cmds will still have the same length, but will contain only
    whitespace (and any unmatched content).

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Oct 20, 2008
    #3
  4. bingfeng <> wrote:
    >I want to split it into an array, the first item is "__begin {
    > abc;
    > def;
    > {foo;bar}
    > } __end", the second item is "__begin {
    > cde;
    > } __end", and the third is "abc" and the fourth is "bad".


    This sounds suspiciously like an X-Y problem to me. Are you reading this
    text from a file? If yes, then if a block starts with '__begin{' you can
    read that block until the '__end;' token is reached.
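
    A minimal sketch of that line-by-line approach, assuming each block opens on a line starting with '__begin' and closes on a line ending in '__end;'. The helper name read_statements and the sample text are hypothetical, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read ';'-terminated statements
# from a filehandle, keeping each __begin ... __end; block in one piece.
sub read_statements {
    my ($fh) = @_;
    my ( @items, @block );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( @block or $line =~ /^\s*__begin\b/ ) {
            push @block, $line;                     # inside a block: keep collecting
            next unless $line =~ /__end\s*;\s*$/;   # block closes on a line ending in "__end;"
            my $text = join "\n", @block;
            $text =~ s/^\s+//;                      # trim leading whitespace
            $text =~ s/\s*;\s*$//;                  # and the terminating ';'
            push @items, $text;
            @block = ();
        }
        else {
            # a plain line may hold one or more ';'-terminated statements
            for my $stmt ( split /;/, $line ) {
                $stmt =~ s/^\s+|\s+$//g;
                push @items, $stmt if length $stmt;
            }
        }
    }
    return @items;
}

# Demo on an in-memory filehandle; the sample text is assumed.
my $text = join "\n",
    '  __begin {', '     abc sss;', '     def;', '     {foo;bar}', '  } __end;',
    '  __begin {', '     cde;', '  } __end;',
    '  abc kkk;', '  bad fde;', '';
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @items = read_statements($fh);
close $fh;

print "[$_]\n" for @items;
```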

    And yes, you can use split() if you do it in two steps:
    First split on '__end;'. Then, in the second step, restore the now missing
    '__end;' for those items that have a leading '__begin{', and split the
    others again at ';'.
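
    The two-step split could look roughly like the sketch below. The helper name split_statements and the sample text are hypothetical, and the code assumes every '__begin' is closed by a matching '__end;'. It restores '__end' without the semicolon, since that is the form the original question asks for.

```perl
use strict;
use warnings;

# Hypothetical helper sketching the two-step split; assumes every
# "__begin" is closed by a matching "__end;".
sub split_statements {
    my ($text) = @_;
    my @items;
    # Step 1: split on the block terminator, which removes it from the pieces.
    for my $chunk ( split /__end\s*;/, $text ) {
        # A chunk is optional plain statements followed by at most one block.
        my ( $plain, $block ) = $chunk =~ /\A(.*?)(__begin.*)?\z/s;
        # Step 2a: split the plain part again at ';'.
        for my $stmt ( split /;/, $plain ) {
            $stmt =~ s/^\s+|\s+$//g;
            push @items, $stmt if length $stmt;
        }
        # Step 2b: put back the "__end" that step 1 removed from the block.
        push @items, $block . '__end' if defined $block;
    }
    return @items;
}

# Assumed sample input.
my $cmds = join "\n",
    '  __begin {', '     abc sss;', '     def;', '     {foo;bar}', '  } __end;',
    '  __begin {', '     cde;', '  } __end;',
    '  abc kkk;', '  bad fde;', '';

my @items = split_statements($cmds);
print "[$_]\n" for @items;
```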

    jue
     
    Jürgen Exner, Oct 20, 2008
    #4
  5. bingfeng

    bingfeng Guest

    On Oct 20, 7:36 pm, "John W. Krahn" <> wrote:
    > my @lines = $cmds =~ /^\s*__begin(?s:.*?)__end;$|^\s*\S+;$/mg;


    Thank you, John, it works very well. You helped me save some hours!
     
    bingfeng, Oct 20, 2008
    #5
  6. bingfeng

    Guest

    On Mon, 20 Oct 2008 02:42:03 -0700 (PDT), bingfeng <> wrote:

    >split obviously cannot be used here, so I use following regex:
    >my @lines = ($cmds =~ /__begin.*?__end|[^;]+/sg);

                                             ^^
    my @lines = ($cmds =~ /__begin.*?__end|[^\s;]+/sg);

    You were on the right track. However, because each line starts with
    whitespace, [^;]+ gets to match first and takes everything before the next
    ';', which means it grabs '  __begin { .. abc', then the next piece, then the
    next. '__begin.*?__end' is never matched. By also excluding whitespace in
    the character class, [^\s;], begin and end have a chance.
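
    To see the difference, here is a small sketch that runs both regexes on the same text and prints what each one captures. The sample input and its indentation are assumed to match the first post.

```perl
use strict;
use warnings;

# Assumed input: the sample from the first post, with leading indentation.
my $cmds = join "\n",
    '  __begin {', '     abc;', '     def;', '     {foo;bar}', '  } __end;',
    '  __begin {', '     cde;', '  } __end;',
    '  abc;', '  bad;', '';

# Original attempt: the leading spaces are not ';', so [^;]+ matches them
# and runs on through "__begin" up to the first ';'.
my @before = $cmds =~ /__begin.*?__end|[^;]+/sg;

# Excluding whitespace too: a match can no longer start on the indentation,
# so the regex engine moves on until "__begin" itself can match.
my @after = $cmds =~ /__begin.*?__end|[^\s;]+/sg;

print "before: [$before[0]]\n";
print "after:  [$_]\n" for @after;
```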

    sln

    --------------------

    use strict;
    use warnings;

    my $cmds = <<DOC
      __begin {
         abc;
         def;
         {foo;bar}
      } __end;
      __begin {
         cde;
      } __end;
      abc;
      bad;
    DOC
    ;

    my @lines = ($cmds =~ /__begin.*?__end|[^\s;]+/sg);

    for (my $i = 0; $i < @lines; $i++) {
        print "\n\$lines[$i] = \n\n\"$lines[$i]\"\n";
    }

    __END__

    output:

    $lines[0] =

    "__begin {
         abc;
         def;
         {foo;bar}
      } __end"

    $lines[1] =

    "__begin {
         cde;
      } __end"

    $lines[2] =

    "abc"

    $lines[3] =

    "bad"
     
    , Oct 20, 2008
    #6
  7. bingfeng

    bingfeng Guest

    On Oct 21, 1:27 am, wrote:
    >                                          ^^
    > my @lines = ($cmds =~ /__begin.*?__end|[^\s;]+/sg);
    >
    > You were on the right track. [^;] however is first to match all before ';',
    > which means it grabs the'   __begin { .. abc;' then the next, then next..
    > '__begin.*?__end' is never matched. By including not whitespace, [^\s;] in
    > the character class, begin and end have a chance.
    >

    You are right. Thanks for your explanation. My sample was
    oversimplified: a standalone statement may contain several words
    separated by spaces, as in the following test input:
    my $cmds = <<DOC
      __begin {
         abc sss;
         def;
         {foo;bar}
      } __end;
      __begin {
         cde;
      } __end;
      abc kkk;
      bad fde;
    DOC
    ;

    Your solution gives the following Dumper result:
    $VAR1 = '__begin {
         abc sss;
         def;
         {foo;bar}
      } __end';
    $VAR2 = '__begin {
         cde;
      } __end';
    $VAR3 = 'abc';
    $VAR4 = 'kkk';
    $VAR5 = 'bad';
    $VAR6 = 'fde';

    That's not what I want. Apart from John's solution, I have no other
    solution. Thank you.

     
    bingfeng, Oct 21, 2008
    #7
  8. bingfeng

    Guest

    On Mon, 20 Oct 2008 20:59:35 -0700 (PDT), bingfeng <> wrote:

    >you solution gives following Dumper result:
    >$VAR1 = '__begin {
    > abc sss;
    > def;
    > {foo;bar}
    > } __end';
    >$VAR2 = '__begin {
    > cde;
    > } __end';
    >$VAR3 = 'abc';
    >$VAR4 = 'kkk';
    >$VAR5 = 'bad';
    >$VAR6 = 'fde';
    >


    That's too bad. You made a good attempt, and I gave
    you credit by saying you almost had it right the first time.
    And the regex was altered only slightly from what you yourself tried.

    I didn't write a regex for you, because if I did that, you could
    always come back and say, for example:

    #>You are right. Thanks for your explanation. My sample is some
    #>oversimple. the standalone sentence may contain other word and space,
    #>with following test message ...

    But you didn't say that in the first place.

    >that's not what I want. Apart from John's solution, I have no other
    >solution.
    >

    ^^^^^^^^^^^^^
    Think again ... you just invalidated his regex.

    my @lines = $str =~ /^\s*__begin(?s:.*?)__end;$|^\s*\S+;$/mg;

    $lines[0] =
    "  __begin {
         abc sss;
         def;
         {foo;bar}
      } __end;"

    $lines[1] =
    "  __begin {
         cde;
      } __end;"

    What are you going to do now?
    We're still in the extremely simple stage.
    In fact, the more you add, the simpler it gets.

    sln

    -------------------------------

    Version 2

    #################
    # Misc Parse 2
    #################

    use strict;
    use warnings;

    # the old
    my $cmd1 = <<DOC1
      __begin {
         abc;
         def;
         {foo;bar}
      } __end;
      __begin {
         cde;
      } __end;
      abc;
      bad;
    DOC1
    ;

    # the new
    my $cmds2 = <<DOC2
      __begin {
         abc sss;
         def;
         {foo;bar}
      } __end;
      __begin {
         cde;
      } __end;
      abc kkk;
      bad fde;
    DOC2
    ;

    my $str = $cmds2;

    my @lines = ($str =~ /\s*(__begin.*?__end|.*?);/sg);

    for (my $i = 0; $i < @lines; $i++) {
        print "\n\$lines[$i] = \n\n\"$lines[$i]\"\n";
    }

    __END__

    output:

    $lines[0] =

    "__begin {
         abc sss;
         def;
         {foo;bar}
      } __end"

    $lines[1] =

    "__begin {
         cde;
      } __end"

    $lines[2] =

    "abc kkk"

    $lines[3] =

    "bad fde"
     
    , Oct 21, 2008
    #8
  9. bingfeng

    bingfeng Guest



    Wow, I have to admit your regex is simpler, easier to understand, and
    elegant. I'll study what you said carefully. Anyway, thank you very
    much.
     
    bingfeng, Oct 21, 2008
    #9