Text::ParseWords

Discussion in 'Perl Misc' started by ccc31807, Mar 30, 2010.

  1. ccc31807

    ccc31807 Guest

    See the script and output below. The problem is that DATA contains a
    single quote in the name O'Toole. Is there any way to get this to
    work? Or do I have to roll my own?

    Or (horrors) do I have to munge DATA to escape every single quote?

    Thanks, CC.

    ---------------------script---------------------
    use strict;
    use warnings;
    use Text::ParseWords;

    while (<DATA>)
    {
        chomp;
        #my ($id, $first, $last, $csz) = split /,/;
        my ($id, $first, $last, $csz) = parse_line(',', 0, $_);
        #my ($id, $first, $last, $csz) = quotewords(',', 0, $_);
        ###my ($id, $first, $last, $csz) = shellwords(',', 1, $_); never works
        ###my ($id, $first, $last, $csz) = nested_quotewords(',', 1, $_); never works
        print "$id, $first, $last, $csz\n";
    }

    exit(0);

    __DATA__
    1234,John,Smith,"New York, NY"
    2345,Karl,Tomas,"Boston, MA"
    98765,Sean,O'Toole,"Dublin, Ireland"
    34567,Lewis,Uberville,"Nashville, TN"

    ---------------output---------------------------------

    D:\PerlLearn\ParseWords>perl test_1.plx
    1234, John, Smith, New York, NY
    2345, Karl, Tomas, Boston, MA
    Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
    Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
    Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
    Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
    , , ,
    34567, Lewis, Uberville, Nashville, TN
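
    The ", , ," line comes from parse_line giving up on record 3: the
    apostrophe in O'Toole opens a single-quoted string that never closes,
    so parse_line returns an empty list and all four variables are undef
    (hence the four warnings). A minimal check of that behaviour, using
    only the core Text::ParseWords module:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# A balanced record and the record with the stray apostrophe from DATA.
my $good = q{1234,John,Smith,"New York, NY"};
my $bad  = q{98765,Sean,O'Toole,"Dublin, Ireland"};

my @good_fields = parse_line(',', 0, $good);
my @bad_fields  = parse_line(',', 0, $bad);

# The unmatched ' makes parse_line return an empty list, which is why
# every variable in the original loop ended up undef.
print scalar(@good_fields), " fields vs ", scalar(@bad_fields), " fields\n";
```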
     
    ccc31807, Mar 30, 2010
    #1

  2. ccc31807 <> wrote:
    >See the script and output below. The problem is that DATA contains a
    >single quote in the name O'Toole. Is there any way to get this to
    >work? Or do I have to roll my own?
    >
    >Or (horrors) do I have to munge DATA to escape every single quote?
    >
    >Thanks, CC.
    >
    >---------------------script---------------------
    >use Text::ParseWords;

    [...]
    >
    >__DATA__
    >1234,John,Smith,"New York, NY"
    >2345,Karl,Tomas,"Boston, MA"
    >98765,Sean,O'Toole,"Dublin, Ireland"
    >34567,Lewis,Uberville,"Nashville, TN"


    This looks like a standard CSV format. Is there a specific reason why
    you are not using one of the existing CSV modules to parse this data?
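
    For data this simple, what a CSV module does can be approximated with
    core Perl alone, which also covers the "roll my own" route from the
    first post. A minimal sketch, assuming RFC 4180-style double-quote
    quoting and no embedded newlines; split_csv_line is a hypothetical
    helper name, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV record into fields with core Perl.
# Assumes a field is either "..." (with "" standing for a literal double
# quote) or a run of non-comma characters. Apostrophes have no special
# meaning here, so O'Toole passes through unchanged.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    pos($line) = 0;
    while (1) {
        if ($line =~ /\G"((?:[^"]|"")*)"/gc) {    # quoted field
            (my $field = $1) =~ s/""/"/g;         # un-double embedded quotes
            push @fields, $field;
        }
        elsif ($line =~ /\G([^,]*)/gc) {          # unquoted field (may be empty)
            push @fields, $1;
        }
        last unless $line =~ /\G,/gc;             # no comma follows: record done
    }
    return @fields;
}

my @records = (
    q{1234,John,Smith,"New York, NY"},
    q{98765,Sean,O'Toole,"Dublin, Ireland"},
);
my @parsed = map { [ split_csv_line($_) ] } @records;
print join(' | ', @$_), "\n" for @parsed;
```

    The loop alternates "field, then comma" using pos() and /gc, so a
    failed attempt at a quoted field leaves the position untouched for the
    unquoted branch.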

    jue
     
    Jürgen Exner, Mar 30, 2010
    #2

  3. ccc31807

    ccc31807 Guest

    On Mar 30, 10:48 am, Jürgen Exner <> wrote:
    > This looks like a standard CSV format. Is there a specific reason why
    > you are not using one of the existing CSV modules to parse this data?


    This runs on a server that isn't mine. I provided the script, and the
    user who runs the script noticed the error (and it is an error). I am
    constrained by the Perl distribution on this particular machine, which
    is ActiveState 5.8.something, which includes Text::ParseWords.

    In desperation I had done what Tad suggested, replacing each
    apostrophe with \' (s/'/\\'/g), but I thought it was a hack. It worked
    well enough, but I still don't like it, which is why I posted this
    morning. At least someone else thinks it's a viable solution, which is
    a small comfort.
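
    A sketch of that escaping workaround (Tad's s/'/\\'/g) with the core
    Text::ParseWords module. With $keep set to 0, parse_line removes the
    backslash again, so the third field of record 3 comes back as plain
    O'Toole:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my @lines = (
    q{1234,John,Smith,"New York, NY"},
    q{98765,Sean,O'Toole,"Dublin, Ireland"},
);

my @parsed;
for my $line (@lines) {
    # Escape every apostrophe so parse_line no longer treats it as an
    # opening single quote: O'Toole becomes O\'Toole.
    (my $escaped = $line) =~ s/'/\\'/g;
    # With $keep = 0, parse_line strips the backslashes again.
    push @parsed, [ parse_line(',', 0, $escaped) ];
}
print join(' | ', @$_), "\n" for @parsed;
```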

    Thanks for the suggestions, Tad, Ben, and jue.

    CC.
     
    ccc31807, Mar 30, 2010
    #3
  4. ccc31807 <> wrote:
    >On Mar 30, 10:48 am, Jürgen Exner <> wrote:
    >> This looks like a standard CSV format. Is there a specific reason why
    >> you are not using one of the existing CSV modules to parse this data?

    >
    >This runs on a server that isn't mine. I provided the script, and the
    >user who runs the script noticed the error (and it is an error). I am
    >constrained by the Perl distribution on this particular machine, which
    >is ActiveState 5.8.something which includes Text::ParseWords.


    Then I would (in this order)
    - try (with the help of that user) to persuade the admin of that machine
    to install the module
    - have that user install the module in his user space
    - ship the module together with my script to be copied into the same
    directory and loaded from there
    - include (at least the relevant portion of) the module verbatim as
    source code in my script
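
    The "ship the module with the script" option usually comes down to
    telling perl to look in the script's own directory. A sketch using the
    core FindBin module and the lib pragma; the module named in the
    commented-out line (Text::CSV_PP, the pure-Perl backend of Text::CSV)
    is only an example and assumes its .pm file and any dependencies have
    been copied beside the script:

```perl
use strict;
use warnings;

# Find the directory this script was started from and put it on @INC,
# so modules copied next to the script (e.g. Text/CSV_PP.pm under that
# directory) can be loaded by an ordinary 'use' placed after these lines.
use FindBin qw($Bin);
use lib $Bin;

# use Text::CSV_PP;   # example only: works once the file has been shipped

print "Extra module directory: $Bin\n";
```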

    jue
     
    Jürgen Exner, Mar 30, 2010
    #4
  5. ccc31807

    ccc31807 Guest

    On Mar 30, 12:02 pm, Jürgen Exner <> wrote:
    > Then I would (in this order)
    > - try (with the help of that user) to persuade the admin of that machine
    > to install the module


    I have discovered that, to the usual Windows admin, the command 'ppm'
    is as terrifying as the command 'brick_server'. ;-)

    > - have that user install the module in his user space


    I don't think that the user has privileges to install software, but
    this is a good idea.

    > - ship the module together with my script to be copied into the same
    > directory and loaded from there


    Good idea.

    > - include (at least the relevant portion of) the module verbatim as
    > source code in my script


    Also a good idea. I often try to do stuff the hard way, mostly as a
    learning exercise, and I have been known to shamelessly copy code from
    other people, including modules shipped with Perl. I've wondered about the
    ethics of this, but my conscience is eased by the facts that (1) I
    don't claim authorship, (2) I don't make commercial use of the
    software, and (3) the source is freely available for appropriate uses.
    Unfortunately, I find some of the code is above my present ability to
    understand (which is why I do this as a learning exercise, and yes, I
    do learn from it.)

    CC.
     
    ccc31807, Mar 30, 2010
    #5
  6. John Bokma

    John Bokma Guest

    ccc31807 <> writes:

    > On Mar 30, 12:02 pm, Jürgen Exner <> wrote:


    [ Missing Perl module ]

    >> - have that user install the module in his user space

    >
    > I don't think that the user has privileges to install software, but
    > this is a good idea.


    A user can *always* install a module in a directory he has access to.

    >> - ship the module together with my script to be copied into the same
    >> directory and loaded from there

    >
    > Good idea.
    >
    >> - include (at least the relevant portion of) the module verbatim as
    >> source code in my script

    >
    > Also a good idea.


    No. It's an option, but there is a reason why it's listed last.

    > I often try to do stuff the hard way, mostly as a
    > learning exercise, and I have been known to shamelessly copy code from
    > other people, including modules shipped with Perl. I've wondered about the
    > ethics of this, but my conscience is eased by the facts that (1) I
    > don't claim authorship, (2) I don't make commercial use of the
    > software, and (3) the source is freely available for appropriate uses.
    > Unfortunately, I find some of the code is above my present ability to
    > understand (which is why I do this as a learning exercise, and yes, I
    > do learn from it.)


    It's called cargo cult coding, at least that's how it sounds. While it's
    not bad to copy a piece of code verbatim out of a context that you can't
    use directly, at least make sure you understand what it's doing.

    --
    John Bokma j3b

    Hacking & Hiking in Mexico - http://johnbokma.com/
    http://castleamber.com/ - Perl & Python Development
     
    John Bokma, Mar 30, 2010
    #6
  7. ccc31807

    Guest

    On Tue, 30 Mar 2010 09:47:24 -0500, Tad McClellan <> wrote:

    >ccc31807 <> wrote:
    >> See the script and output below. The problem is that DATA contains a
    >> single quote in the name O'Toole. Is there any way to get this to
    >> work? Or do I have to roll my own?
    >>
    >> Or (horrors) do I have to munge DATA to escape every single quote?

    >
    >
    >Where is the horror in that?
    >
    >
    >> ---------------------script---------------------
    >> use strict;
    >> use warnings;
    >> use Text::ParseWords;
    >>
    >> while (<DATA>)
    >> {
    >> chomp;
    >> #my ($id, $first, $last, $csz) = split /,/;

    >
    >
    > s/'/\\'/g; # that doesn't seem horrible to me...
    > # unless you have single-quoted 'strings' in DATA

    ^^^^
    98765,Sean,O'Toole,"O'Dublin, Ireland"

    That's a big restriction there, hardly a real workaround.

    It's too bad, though: with a little extra work,
    Text::ParseWords could have got it right.
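
    Tad's caveat about single-quoted 'strings' in DATA can be checked
    directly. The record below is hypothetical, made up for illustration:
    its second field is meant to be the single-quoted string 'a,b'.
    Escaping every apostrophe turns those quotes into literal characters,
    so the embedded comma now splits the field:

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical record: field 2 is deliberately single-quoted.
my $line = q{1,'a,b',c};

# Unescaped: parse_line honours the quotes and keeps a,b together.
my @as_is = parse_line(',', 0, $line);

# Escaped the way the workaround does it: the quotes become literal
# characters and the comma inside them splits the field in two.
(my $escaped = $line) =~ s/'/\\'/g;
my @escaped = parse_line(',', 0, $escaped);

print join(' | ', @as_is), "\n";
print join(' | ', @escaped), "\n";
```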

    -sln

    ========================
    Output:

    c:\temp>perl parse_line.pl

    1234, John, Smith, "New York, NY"
    2345, Karl, Tomas, "Boston, MA"
    98765, Sean, O'Toole, "Dublin, Ireland"
    34567, Lewis, Uberville, "Nashville, TN"

    c:\temp>

    ## parse_line.pl
    ##
    use strict;
    use warnings;
    #use Text::ParseWords;  # not needed: parse_line() is defined below

    my $PERL_SINGLE_QUOTE = 0;

    print "\n";
    while (<DATA>)
    {
        chomp;
        my ($id, $first, $last, $csz) = parse_line(',', 1, $_);
        print "$id, $first, $last, $csz\n";
    }

    exit(0);

    ## -----------------------------------------
    ## sub parse_line()
    ## Copyright @ 4/30/2010, by sln
    ## All rights reserved
    ## -----------------------------------------
    sub parse_line {
        my($delimiter, $keep, $line) = @_;
        my($word, @pieces);

        no warnings 'uninitialized'; # we will be testing undef strings

        while (length($line)) {
            # This pattern is optimised to be stack conservative on older perls.
            # Do not refactor without being careful and testing it on very long strings.
            # See Perl bug #42980 for an example of a stack busting input.
            $line =~ s/^
                (?:
                    (?:
                        # double quoted string
                        (")                                # $quote
                        ((?>[^\\"]*(?:\\.[^\\"]*)*))"      # $quoted
                    |   # --OR--
                        # single quoted string
                        (')                                # $quote
                        ((?>[^\\']*(?:\\.[^\\']*)*))'      # $quoted
                    |   # --OR--
                        # unquoted string
                        (                                  # $unquoted
                            (?:\\.|[^\\"'])*?
                        )
                        # followed by
                        (                                  # $delim
                            \Z(?!\n)                       # EOL
                        |   # --OR--
                            (?-x:$delimiter)               # delimiter
                        |   # --OR--
                            (?!^)(?=["'])                  # a quote
                        )
                    )
                |   # --OR--
                    (['"])                                 # $unquoted quote
                )
            //xs or return; # extended layout
            # $5 can legitimately be "0" or "", so test definedness, not truth
            my ($quote, $quoted, $unquoted, $delim) =
                (($1 ? ($1,$2) : ($3,$4)), (defined $5 ? $5 : $7), $6);

            return() unless( defined($quote) || length($unquoted) || length($delim));

            if ($keep) {
                $quoted = "$quote$quoted$quote";
            }
            else {
                $unquoted =~ s/\\(.)/$1/sg;
                if (defined $quote) {
                    $quoted =~ s/\\(.)/$1/sg if ($quote eq '"');
                    $quoted =~ s/\\([\\'])/$1/g if ( $PERL_SINGLE_QUOTE && $quote eq "'");
                }
            }
            $word .= substr($line, 0, 0); # leave results tainted
            $word .= defined $quote ? $quoted : $unquoted;

            if (length($delim)) {
                push(@pieces, $word);
                push(@pieces, $delim) if ($keep eq 'delimiters');
                undef $word;
            }
            if (!length($line)) {
                push(@pieces, $word);
            }
        }
        return(@pieces);
    }

    __DATA__
    1234,John,Smith,"New York, NY"
    2345,Karl,Tomas,"Boston, MA"
    98765,Sean,O'Toole,"Dublin, Ireland"
    34567,Lewis,Uberville,"Nashville, TN"
     
    , Mar 30, 2010
    #7
Similar Threads
  1. leo
    Replies:
    1
    Views:
    297
    Bob Lehmann
    Dec 5, 2005
  2. Keith A. Clay

    issue with Text::ParseWords

    Keith A. Clay, Jun 23, 2005, in forum: Perl Misc
    Replies:
    0
    Views:
    105
    Keith A. Clay
    Jun 23, 2005
  3. tsotsi

    Text::ParseWords::parse_line bug?

    tsotsi, Jul 28, 2006, in forum: Perl Misc
    Replies:
    2
    Views:
    202
    -berlin.de
    Jul 30, 2006
  4. howa

    Text::ParseWords

    howa, Nov 11, 2008, in forum: Perl Misc
    Replies:
    1
    Views:
    92
    Darren Dunham
    Nov 14, 2008
  5. ccc31807

    understanding regexp, Text::ParseWords

    ccc31807, Nov 5, 2010, in forum: Perl Misc
    Replies:
    2
    Views:
    162
    ccc31807
    Nov 5, 2010