Text::ParseWords

ccc31807

See the script and output below. The problem is that DATA contains a
single quote in the name O'Toole. Is there any way to get this to
work? Or do I have to roll my own?

Or (horrors) do I have to munge DATA to escape every single quote?

Thanks, CC.

---------------------script---------------------
use strict;
use warnings;
use Text::ParseWords;

while (<DATA>)
{
    chomp;
    #my ($id, $first, $last, $csz) = split /,/;
    my ($id, $first, $last, $csz) = parse_line(',', 0, $_);
    #my ($id, $first, $last, $csz) = quotewords(',', 0, $_);
    ###my ($id, $first, $last, $csz) = shellwords(',', 1, $_); never works
    ###my ($id, $first, $last, $csz) = nested_quotewords(',', 1, $_); never works
    print "$id, $first, $last, $csz\n";
}

exit(0);

__DATA__
1234,John,Smith,"New York, NY"
2345,Karl,Tomas,"Boston, MA"
98765,Sean,O'Toole,"Dublin, Ireland"
34567,Lewis,Uberville,"Nashville, TN"

---------------output---------------------------------

D:\PerlLearn\ParseWords>perl test_1.plx
1234, John, Smith, New York, NY
2345, Karl, Tomas, Boston, MA
Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
Use of uninitialized value in concatenation (.) or string at test_1.plx line 13, <DATA> line 3.
, , ,
34567, Lewis, Uberville, Nashville, TN
 
Jürgen Exner

ccc31807 said:
See the script and output below. The problem is that DATA contains a
single quote in the name O'Toole. Is there any way to get this to
work? Or do I have to roll my own?

Or (horrors) do I have to munge DATA to escape every single quote?

Thanks, CC.

---------------------script---------------------
use Text::ParseWords; [...]

__DATA__
1234,John,Smith,"New York, NY"
2345,Karl,Tomas,"Boston, MA"
98765,Sean,O'Toole,"Dublin, Ireland"
34567,Lewis,Uberville,"Nashville, TN"

This looks like a standard CSV format. Is there a specific reason why
you are not using one of the existing CSV modules to parse this data?

jue
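
For illustration, here is a minimal sketch of the CSV-module approach. It assumes Text::CSV is available (one widely used CPAN module; the post above does not name a specific one) and simply reports the fact if it is not installed. The two sample lines are taken from the DATA section.

```perl
use strict;
use warnings;

my @lines = (
    q{1234,John,Smith,"New York, NY"},
    q{98765,Sean,O'Toole,"Dublin, Ireland"},
);

my @rows;
if (eval { require Text::CSV; 1 }) {
    # Assumption: Text::CSV is installed; it is not a core module.
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot create Text::CSV object\n";
    for my $line (@lines) {
        if ($csv->parse($line)) {
            push @rows, [ $csv->fields ];   # one array ref per record
        }
        else {
            warn "Could not parse: $line\n";
        }
    }
    print join(' | ', @$_), "\n" for @rows;
}
else {
    print "Text::CSV is not installed here\n";
}
```

Because the module's default quote character is the double quote, the apostrophe in O'Toole is treated as ordinary data and needs no escaping.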
 
ccc31807

This looks like a standard CSV format. Is there a specific reason why
you are not using one of the existing CSV modules to parse this data?

This runs on a server that isn't mine. I provided the script, and the
user who runs the script noticed the error (and it is an error). I am
constrained by the Perl distribution on this particular machine, which
is ActiveState 5.8.something, which includes Text::ParseWords.

In desperation I had done what Tad suggested, escaping each apostrophe
with a backslash (s/'/\\'/g), but I thought it was a hack. It worked well
enough, but I still don't like it, which is why I posted this morning.
At least someone else thinks it's a viable solution, which is a small
comfort.

Thanks for the suggestions, Tad, Ben, and jue.

CC.
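
A minimal sketch of the escaping workaround described above, using only the core Text::ParseWords module. The helper name parse_csv_line is hypothetical, and the sample line comes from the DATA section. It also shows why the original script printed empty fields: parse_line() sees the bare apostrophe as the start of a single-quoted string that never closes, and returns an empty list.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = q{98765,Sean,O'Toole,"Dublin, Ireland"};

# Unescaped: the unmatched apostrophe makes parse_line() give up,
# so the list assignment in the original script got no values.
my @raw = parse_line(',', 0, $line);

# Hypothetical helper: backslash-escape every apostrophe on a copy of
# the line, so parse_line() treats it as a literal character.
sub parse_csv_line {
    my ($text) = @_;
    (my $escaped = $text) =~ s/'/\\'/g;
    return parse_line(',', 0, $escaped);
}

my @fields = parse_csv_line($line);

print "unescaped: ", scalar(@raw), " fields\n";
print "escaped:   ", join(' | ', @fields), "\n";
```

With $keep set to 0, as here, parse_line() strips the backslash again, so the name comes back as O'Toole.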
 
Jürgen Exner

ccc31807 said:
This runs on a server that isn't mine. I provided the script, and the
user who runs the script noticed the error (and it is an error). I am
constrained by the Perl distribution on this particular machine, which
is ActiveState 5.8.something, which includes Text::ParseWords.

Then I would (in this order)
- try (with the help of that user) to persuade the admin of that machine
to install the module
- have that user install the module in his user space
- ship the module together with my script to be copied into the same
directory and loaded from there
- include (at least the relevant portion of) the module verbatim as
source code in my script

jue
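
The third option in the list above (shipping the module alongside the script) might look like the following sketch. The directory name lib is an assumption; FindBin and lib are both core modules, so nothing extra has to be installed for this part.

```perl
use strict;
use warnings;
use FindBin qw($Bin);   # $Bin holds the directory containing this script
use lib "$Bin/lib";     # assumed layout: a lib/ directory next to the script

# Any module copied under lib/ with its normal path (for example
# lib/Some/Module.pm, a hypothetical name) can now be loaded with a
# plain "use Some::Module;" placed after the two lines above.
my $found = grep { $_ eq "$Bin/lib" } @INC;
print "private lib dir on \@INC: ", ($found ? "yes" : "no"), "\n";
```

This only works as-is for pure-Perl modules; a module with compiled parts would need to be built for the target machine.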
 
ccc31807

Then I would (in this order)
- try (with the help of that user) to persuade the admin of that machine
to install the module

I have discovered that, to the usual Windows admin, the command 'ppm'
is as terrifying as the command 'brick_server'. ;-)
- have that user install the module in his user space

I don't think that the user has privileges to install software, but
this is a good idea.
- ship the module together with my script to be copied into the same
directory and loaded from there

Good idea.
- include (at least the relevant portion of) the module verbatim as
source code in my script

Also a good idea. I often try to do stuff the hard way, mostly as a
learning exercise, and I have been known to shamelessly copy code from
other people, including PM shipped with Perl. I've wondered about the
ethics of this, but my conscience is eased by the facts that (1) I
don't claim authorship, (2) I don't make commercial use of the
software, and (3) the source is freely available for appropriate uses.
Unfortunately, I find some of the code is above my present ability to
understand (which is why I do this as a learning exercise, and yes, I
do learn from it.)

CC.
 
John Bokma

ccc31807 said:
On Mar 30, 12:02 pm, Jürgen Exner wrote:

[ Missing Perl module ]
I don't think that the user has privileges to install software, but
this is a good idea.

A user can *always* install a module in a directory he has access to.
Good idea.


Also a good idea.

No. It's an option, but there is a reason why it's listed last.
I often try to do stuff the hard way, mostly as a
learning exercise, and I have been known to shamelessly copy code from
other people, including PM shipped with Perl. I've wondered about the
ethics of this, but my conscience is eased by the facts that (1) I
don't claim authorship, (2) I don't make commercial use of the
software, and (3) the source is freely available for appropriate uses.
Unfortunately, I find some of the code is above my present ability to
understand (which is why I do this as a learning exercise, and yes, I
do learn from it.)

It's called cargo cult coding, or at least that's how it sounds. While it's
not bad to copy a piece of code verbatim out of a context that you can't
use directly, at least make sure you understand what it's doing.
 
sln

Where is the horror in that?




s/'/\\'/g; # that doesn't seem horrible to me...
# unless you have single-quoted 'strings' in DATA
^^^^
98765,Sean,O'Toole,"O'Dublin, Ireland"

That's a big restriction there, hardly a workaround solution.

It's too bad, though. With a little extra work,
they could have got it right.

-sln

========================
Output:

c:\temp>perl parse_line.pl

1234, John, Smith, "New York, NY"
2345, Karl, Tomas, "Boston, MA"
98765, Sean, O'Toole, "Dublin, Ireland"
34567, Lewis, Uberville, "Nashville, TN"

c:\temp>

## parse_line.pl
##
use strict;
use warnings;

my $PERL_SINGLE_QUOTE = 0;

#use Text::ParseWords;  # not loaded: parse_line() is defined locally below

print "\n";
while (<DATA>)
{
    chomp;
    my ($id, $first, $last, $csz) = parse_line(',', 1, $_);
    print "$id, $first, $last, $csz\n";
}

exit(0);


## -----------------------------------------
## sub parse_line()
## Copyright @ 4/30/2010, by sln
## All rights reserved
## -----------------------------------------
sub parse_line {
    my($delimiter, $keep, $line) = @_;
    my($word, @pieces);

    no warnings 'uninitialized'; # we will be testing undef strings

    while (length($line)) {
        # This pattern is optimised to be stack conservative on older perls.
        # Do not refactor without being careful and testing it on very long strings.
        # See Perl bug #42980 for an example of a stack busting input.
        $line =~ s/^
            (?:
                (?:
                    # double quoted string
                    (")                             # $quote
                    ((?>[^\\"]*(?:\\.[^\\"]*)*))"   # $quoted
                |   # --OR--
                    # single quoted string
                    (')                             # $quote
                    ((?>[^\\']*(?:\\.[^\\']*)*))'   # $quoted
                |   # --OR--
                    # unquoted string
                    (                               # $unquoted
                        (?:\\.|[^\\"'])*?
                    )
                    # followed by
                    (                               # $delim
                        \Z(?!\n)                    # EOL
                    |   # --OR--
                        (?-x:$delimiter)            # delimiter
                    |   # --OR--
                        (?!^)(?=["'])               # a quote
                    )
                )
            |   # --OR--
                (['"])                              # $unquoted quote
            )
        //xs or return; # extended layout
        my ($quote, $quoted, $unquoted, $delim) = (($1 ? ($1,$2) : ($3,$4)), ($5 ? $5 : $7), $6);

        return() unless( defined($quote) || length($unquoted) || length($delim));

        if ($keep) {
            $quoted = "$quote$quoted$quote";
        }
        else {
            $unquoted =~ s/\\(.)/$1/sg;
            if (defined $quote) {
                $quoted =~ s/\\(.)/$1/sg if ($quote eq '"');
                $quoted =~ s/\\([\\'])/$1/g if ( $PERL_SINGLE_QUOTE && $quote eq "'");
            }
        }
        $word .= substr($line, 0, 0); # leave results tainted
        $word .= defined $quote ? $quoted : $unquoted;

        if (length($delim)) {
            push(@pieces, $word);
            push(@pieces, $delim) if ($keep eq 'delimiters');
            undef $word;
        }
        if (!length($line)) {
            push(@pieces, $word);
        }
    }
    return(@pieces);
}

__DATA__
1234,John,Smith,"New York, NY"
2345,Karl,Tomas,"Boston, MA"
98765,Sean,O'Toole,"Dublin, Ireland"
34567,Lewis,Uberville,"Nashville, TN"
 
