understanding regexp, Text::ParseWords

Discussion in 'Perl Misc' started by ccc31807, Nov 5, 2010.

  1. ccc31807

    ccc31807 Guest

    This is copied from Text::ParseWords. It appears in the function
    parse_line(delimiter, boolean, string). I understand most of this, but
    need some help understanding some of it. This appears in a loop:
    while (length($line)) {
    and parses a line with this call:
    my ($f, $m, $l) = parse_line(',', 0, $line);
    where $line will be like this:
    "Barack","Hussein","Obama"
    I have numbered the lines for reference.

    <quote>
    # This pattern is optimised to be stack conservative on older perls.
    # Do not refactor without being careful and testing it on very long strings.
    # See Perl bug #42980 for an example of a stack busting input.
    1 $line =~ s/^
    2 (?:
    # double quoted string
    3 (") # $quote
    4 ((?>[^\\"]*(?:\\.[^\\"]*)*))" # $quoted
    5 | # --OR--
    # single quoted string
    6 (') # $quote
    7 ((?>[^\\']*(?:\\.[^\\']*)*))' # $quoted
    8 | # --OR--
    # unquoted string
    9 ( # $unquoted
    10 (?:\\.|[^\\"'])*?
    11 )
    # followed by
    12 ( # $delim
    13 \Z(?!\n) # EOL
    14 | # --OR--
    15 (?-x:$delimiter) # delimiter
    16 | # --OR--
    17 (?!^)(?=["']) # a quote
    18 )
    )//xs or return; # extended layout
    my ($quote, $quoted, $unquoted, $delim) = (($1 ? ($1,$2) : ($3,$4)), $5, $6);
    </quote>

    Thanks, CC.
     
    ccc31807, Nov 5, 2010
    #1

  2. ccc31807

    Guest

    On Fri, 5 Nov 2010 07:41:10 -0700 (PDT), ccc31807 wrote:

    >This is copied from Text::ParseWords. It appears in the function
    >parse_line(delimiter, boolean, string). I understand most of this, but
    >need some help understanding some of it. This appears in a loop:
    > while (length($line)) {
    >and parses a line with this call:
    > my ($f, $m, $l) = parse_line(',', 0, $line);
    >where $line will be like this:
    > "Barack","Hussein","Obama"
    >I have numbered the lines for reference.
    >


    What is it you want to understand about it?
    It's basically three alternatives: a double-quoted string, a single-quoted
    string, or an unquoted run followed by a delimiter. Each pass through the
    loop peels one such chunk off the front of the line.

    -sln

    -------------------------
    use strict;
    use warnings;

    my @lines = (
        q{ "Barack", "Hussein", "Obama" },
        q{ "Bar'a'ck", "test", hello, "Hussein", 'Obama" },
        q{ 'Bar'a'ck", "test", hello, "Hussein", 'Obama" },
    );

    my $delimiter = ',';
    print "\n";

    for my $line (@lines) {
        print "** start line = [$line]\n\n";
        while (length($line)) {

            # Peel one chunk off the front of $line; stop if nothing matches.
            $line =~ s/^
                (?:
                    # double quoted string
                    (")                             # $quote
                    ((?>[^\\"]*(?:\\.[^\\"]*)*))"   # $quoted
                |   # --OR--
                    # single quoted string
                    (')                             # $quote
                    ((?>[^\\']*(?:\\.[^\\']*)*))'   # $quoted
                |   # --OR--
                    # unquoted string
                    (                               # $unquoted
                        (?:\\.|[^\\"'])*?
                    )
                    # followed by
                    (                               # $delim
                        \Z(?!\n)                    # EOL
                    |   # --OR--
                        (?-x:$delimiter)            # delimiter
                    |   # --OR--
                        (?!^)(?=["'])               # a quote
                    )
                )//xs or last;                      # extended layout

            my ($quote, $quoted, $unquoted, $delim) = (($1 ? ($1,$2) : ($3,$4)), $5, $6);

            # Groups that did not take part in the match are undef; show them as empty.
            for ($quote, $quoted, $unquoted, $delim) { $_ = '' unless defined $_ }

            print "quote= <$quote> quoted= <$quoted> unquoted= <$unquoted> delim= <$delim>\n";
            print " <$line>\n";
        }
        print "end line = [$line]\n",'-'x20,"\n\n";
    }

    __END__
    Output:

    ** start line = [ "Barack", "Hussein", "Obama" ]

    quote= <> quoted= <> unquoted= < > delim= <>
    <"Barack", "Hussein", "Obama" >
    quote= <"> quoted= <Barack> unquoted= <> delim= <>
    <, "Hussein", "Obama" >
    quote= <> quoted= <> unquoted= <> delim= <,>
    < "Hussein", "Obama" >
    quote= <> quoted= <> unquoted= < > delim= <>
    <"Hussein", "Obama" >
    quote= <"> quoted= <Hussein> unquoted= <> delim= <>
    <, "Obama" >
    quote= <> quoted= <> unquoted= <> delim= <,>
    < "Obama" >
    quote= <> quoted= <> unquoted= < > delim= <>
    <"Obama" >
    quote= <"> quoted= <Obama> unquoted= <> delim= <>
    < >
    quote= <> quoted= <> unquoted= < > delim= <>
    <>
    end line = []
    --------------------

    ** start line = [ "Bar'a'ck", "test", hello, "Hussein", 'Obama" ]

    quote= <> quoted= <> unquoted= < > delim= <>
    <"Bar'a'ck", "test", hello, "Hussein", 'Obama" >
    quote= <"> quoted= <Bar'a'ck> unquoted= <> delim= <>
    <, "test", hello, "Hussein", 'Obama" >
    quote= <> quoted= <> unquoted= <> delim= <,>
    < "test", hello, "Hussein", 'Obama" >
    quote= <> quoted= <> unquoted= < > delim= <>
    <"test", hello, "Hussein", 'Obama" >
    quote= <"> quoted= <test> unquoted= <> delim= <>
    <, hello, "Hussein", 'Obama" >
    quote= <> quoted= <> unquoted= <> delim= <,>
    < hello, "Hussein", 'Obama" >
    quote= <> quoted= <> unquoted= < hello> delim= <,>
    < "Hussein", 'Obama" >
    quote= <> quoted= <> unquoted= < > delim= <>
    <"Hussein", 'Obama" >
    quote= <"> quoted= <Hussein> unquoted= <> delim= <>
    <, 'Obama" >
    quote= <> quoted= <> unquoted= <> delim= <,>
    < 'Obama" >
    quote= <> quoted= <> unquoted= < > delim= <>
    <'Obama" >
    end line = ['Obama" ]
    --------------------

    ** start line = [ 'Bar'a'ck", "test", hello, "Hussein", 'Obama" ]

    quote= <> quoted= <> unquoted= < > delim= <>
    <'Bar'a'ck", "test", hello, "Hussein", 'Obama" >
    quote= <'> quoted= <Bar> unquoted= <> delim= <>
    <a'ck", "test", hello, "Hussein", 'Obama" >
    quote= <> quoted= <> unquoted= <a> delim= <>
    <'ck", "test", hello, "Hussein", 'Obama" >
    quote= <'> quoted= <ck", "test", hello, "Hussein", > unquoted= <> delim= <>
    <Obama" >
    quote= <> quoted= <> unquoted= <Obama> delim= <>
    <" >
    end line = [" ]
    --------------------
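
    For comparison, the same kind of input can be handed straight to
    parse_line instead of to a copy of its regex. This is only a minimal
    sketch: it assumes the delimiter is passed as a plain pattern string
    (',' rather than a /,/ match, which would be evaluated against $_
    before the call) and that the second argument is 0 so the quotes are
    dropped. The variable names are mine, for illustration.

    ```perl
    use strict;
    use warnings;
    use Text::ParseWords qw(parse_line);

    # Same shape of input as in the original question.
    my $line = q{"Barack","Hussein","Obama"};

    # Delimiter pattern as a string, keep = 0 (strip the quotes).
    my @fields = parse_line(',', 0, $line);

    print join('|', @fields), "\n";    # Barack|Hussein|Obama
    ```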
     
    , Nov 5, 2010
    #2

  3. ccc31807

    ccc31807 Guest

    On Nov 5, 1:57 pm, wrote:
    > What is it you want to understand about it?


    Line 2 -- the (?: construct
    Lines 4, 7, 10 -- same thing
    Line 13 -- \Z(?!\n)
    Line 15 -- (?-x:$delimiter)
    ($delimiter would be the comma character)
    Line 17 -- (?!^)(?=["'])
    (I take ["'] to mean either a single quote or a double quote)

    Thanks, CC.
     
    ccc31807, Nov 5, 2010
    #3
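
    The constructs asked about in the post above can each be tried in
    isolation. What follows is a rough sketch rather than anything taken
    from Text::ParseWords: the test strings, the variable names and the
    space used as a delimiter are all my own assumptions, picked so that
    each construct's effect is visible on a tiny input.

    ```perl
    use strict;
    use warnings;

    # Line 2: (?:...) groups without capturing, so it uses up no $1, $2, ... slot.
    my @caps = ('ab' =~ /(?:a)(b)/);                # only 'b' is captured

    # Lines 4 and 7: (?>...) is an atomic group. Once it has matched it never
    # gives characters back, which cuts down on backtracking.
    my $plain  = ('aaa' =~ /^a*a\z/)     ? 1 : 0;   # a* gives one 'a' back: matches
    my $atomic = ('aaa' =~ /^(?>a*)a\z/) ? 1 : 0;   # atomic a* keeps all three: fails

    # Line 13: \Z also matches just before a trailing newline; adding (?!\n)
    # restricts it to the very end of the string.
    my $z_plain  = ("abc\n" =~ /abc\Z/)       ? 1 : 0;   # matches
    my $z_strict = ("abc\n" =~ /abc\Z(?!\n)/) ? 1 : 0;   # fails

    # Line 15: (?-x:...) switches /x off inside the group, so whitespace in an
    # interpolated delimiter still counts. A single space is used here purely
    # for the demo (the thread uses a comma).
    my $delimiter = ' ';
    my $x_on  = ('a b' =~ /^a ${delimiter} b\z/x)     ? 1 : 0;   # space ignored: fails
    my $x_off = ('a b' =~ /^a (?-x:$delimiter) b\z/x) ? 1 : 0;   # space kept: matches

    # Line 17: (?!^)(?=["']) consumes nothing. It succeeds at any position
    # other than the start of the string where the next character is a quote.
    my $s = q{"ab'c"};
    my @quote_pos;
    push @quote_pos, pos($s) while $s =~ /(?!^)(?=["'])/g;

    print "caps=@caps plain=$plain atomic=$atomic\n";
    print "z_plain=$z_plain z_strict=$z_strict\n";
    print "x_on=$x_on x_off=$x_off\n";
    print "quote positions=@quote_pos\n";
    ```

    As far as I can tell, the (?!^) on line 17 is there so that an empty
    unquoted piece cannot match right in front of a quote at the start of
    what is left of the line; without it the substitution could succeed
    without removing anything.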
