Question about Damian Conway's "Perl Best Practices"

usenet

In "Perl Best Practices," (great book, BTW!) Damian Conway recommends:

"Use Text::CSV_XS to extract complex variable-width fields"

He supports his recommendation (in part) with this narrative:
Using split to extract variable-width fields is efficient and easy,
provided those fields _really_are_ always delimited by a simple
separator. More often, though... it becomes necessary to
extend the format rules to cope with human vagaries (such as
ignoring whitespace around commas) [and he goes on to discuss
how ugly the regexp can get as the requirements morph]

However, in the usage examples he provides, he doesn't show how the
Text::CSV_XS module helps simplify that particular vagary (ignoring
whitespace around commas), and I don't see it discussed in the module's
perldocs.

I thought maybe I could tell the module that the quote_char could be a
whitespace, but this actually throws a runtime error.

Kindly consider this code (adapted from Damian's book); how would I
make efficient use of the Text::CSV_XS module to ignore the whitespace
around the second line of __DATA__, as Damian suggests I can do?

#!/usr/bin/perl
use strict; use warnings;
use Text::CSV_XS;

my $csv_format = Text::CSV_XS->new({
    sep_char => q{:},       # fields are colon-delimited
    #quote_char => q{ },    # whitespace throws runtime error!!!
});

while (my $record = <DATA>) {
    $csv_format->parse($record);    # error-checking omitted

    my ($user, $uid, $gid) = $csv_format->fields();
    print map {"'$_'\n"} ($user, $uid, $gid);
}

__DATA__
sshd:123:321::/var/empty:/usr/bin/ksh
apache : 789 : 987:: /home/apache: /usr/bin/ksh

Thanks!
 
damian

David said:
In "Perl Best Practices," (great book, BTW!)

Thank-you. :)

Damian Conway recommends:

He supports his recommendation (in part) with this narrative:
Using split to extract variable-width fields is efficient and easy,
provided those fields _really_are_ always delimited by a simple
separator. More often, though... it becomes necessary to
extend the format rules to cope with human vagaries (such as
ignoring whitespace around commas) [and he goes on to discuss
how ugly the regexp can get as the requirements morph]

However, in the usage examples he provides, he doesn't show how the
Text::CSV_XS module helps simplify that particular vagary (ignoring
whitespace around commas), and I don't see it discussed in the module's
perldocs.

My apologies. I obviously wasn't clear enough in that recommendation. I
don't at any point claim that Text::CSV_XS supports optional whitespace
around commas, because as far as I know, it doesn't.

The argument I (attempt to ;-) put forward is actually:

* If you start with a regex, the temptation is to extend the regex
to handle ever-more-complex (and unmaintainable) field
specifications

* If you start with the CSV specification instead, there's no such
temptation and no need to maintain the parsing code

The line that (I had hoped) makes that distinction clear is in the
middle of page 159:

"As soon as your record format goes beyond a simple
separator...consider whether you can respecify your
data format and rewrite your code to use Text::CSV_XS..."

The implication being that you should abandon complex formats parsed
with regexes, and just go with the standard CSV parsed with Text::CSV_XS.

Kindly consider this code (adapted from Damian's book); how would I
make efficient use of the Text::CSV_XS module to ignore the whitespace
around the second line of __DATA__, as Damian suggests I can do?

I don't suggest you can do that. Text::CSV_XS can't do that.
However, Text_CSV_XS-plus-regexes *can* do it very easily:

while (my $record = <DATA>) {
    $csv_format -> parse($record); #error-checking omitted

    use List::MoreUtils qw( apply );
    my ($user, $uid, $gid)
        = apply { s s/\A\s+ | \s+\z}{}gxms }
            $csv_format->fields();

    print map {"'$_'\n"} ($user, $uid, $gid);
}

Sorry for the confusion,

Damian
 
Anno Siegel

In "Perl Best Practices," (great book, BTW!) Damian Conway recommends:

He supports his recommendation (in part) with this narrative:
Using split to extract variable-width fields is efficient and easy,
provided those fields _really_are_ always delimited by a simple
separator. More often, though... it becomes necessary to
extend the format rules to cope with human vagaries (such as
ignoring whitespace around commas) [and he goes on to discuss
how ugly the regexp can get as the requirements morph]

However, in the usage examples he provides, he doesn't show how the
Text::CSV_XS module helps simplify that particular vagary (ignoring
whitespace around commas), and I don't see it discussed in the module's
perldocs.

Hey, taking the newsgroup to task for Damian's careless promises?

I believe discussing spaces around commas is simply wrong in the context
of CSV. In CSV, a blank is a blank, whether or not it's adjacent to a
comma. There are many common formats where spaces around commas (and/or
other operators) are expected, but CSV is not one of them. The format
description in Text::CSV* (under CAVEATS) supports this.

Submit it to PBP's errata page[1]. I don't think there is a simple way
to make Text::CSV (of any provenance) comply.

Ignoring the unfortunate CSV context, the underlying tendency is
of course right: If there is a (reputable) module that parses some
format you have to parse, use it instead of making a homebrew.

[snip]

Anno

[1] (http://www.oreilly.com/cgi-bin/errata.form/perlbp).
 
damian

Anno said:
Hey, taking the newsgroup to task for Damian's careless promises?

Please read what I actually wrote in the book. It was not a careless
promise. It was not a promise of any kind.

And please don't submit it as an erratum. It isn't.

Damian
 
usenet

I don't suggest you can do that. Text::CSV_XS can't do that.
However, Text_CSV_XS-plus-regexes *can* do it very easily:
use List::MoreUtils qw( apply );
my ($user, $uid, $gid)
= apply { s s/\A\s+ | \s+\z}{}gxms }
$csv_format->fields();

Thanks for clearing that up, Damian (ain't it great when you ask a
question about a book, and the _author_ shoots back a detailed reply?
And in under an hour? Try _that_ with C++)

FWIW, IMHO the discussion about "non-builtin builtins" (p.170-174) is
worth the price of the book alone. I cringe at how much time I've
wasted (and how much crap code I've probably written) by neglecting
modules such as List::MoreUtils (as used in Damian's solution above).

But, if I may ask another question...
= apply { s s/\A\s+ | \s+\z}{}gxms }

Err, that doesn't quite parse ( maybe s{\A\s+ | \s+\z}{}gxms ). Of
course, this is just a bit of throw-away code in a newsgroup, but if
writing a "real" program (using the book's guidelines) would it be
preferable to do it like this:

use Regexp::Common qw /whitespace/;
and then...
apply { s/$RE{ws}{crop}// } $csv_format->fields();
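
A note on the flags in that last line: a crop-style pattern is an alternation of a "leading whitespace" branch and a "trailing whitespace" branch, so a substitution without /g stops after the first match. The usage example in the Regexp::Common documentation applies $RE{ws}{crop} with //g for that reason. A minimal core-Perl sketch of the difference, using a hand-written pattern in place of Regexp::Common (the helper names crop_once and crop_all are hypothetical, for illustration only):

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; neither comes from the thread.

# Without /g the substitution stops after the first match,
# so only the leading run of whitespace is removed here.
sub crop_once {
    my ($text) = @_;
    $text =~ s{\A\s+ | \s+\z}{}xms;
    return $text;
}

# With /g both the leading and the trailing run are removed.
sub crop_all {
    my ($text) = @_;
    $text =~ s{\A\s+ | \s+\z}{}gxms;
    return $text;
}

print "'", crop_once('  789  '), "'\n";
print "'", crop_all('  789  '),  "'\n";
```

The same reasoning would presumably apply to the apply { ... } block: it needs the /g flag if both ends of each field are to be trimmed.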
 
Anno Siegel

Please read what I actually wrote in the book. It was not a careless
promise. It was not a promise of any kind.

Sorry. You probably didn't see the smiley I didn't write.

Under the heading "Use Text::CSV_XS to extract complex variable-width fields"
the first problem you introduce is that of blanks surrounding the separator.
The reader may easily conclude that the module solves that problem. It
takes a rather careful reading to see that this is not actually in the text.

And please don't submit it as an erratum. It isn't.

Okay.

Anno
 
Dr.Ruud

Damian:
Anno Siegel:

Please read what I actually wrote in the book. It was not a careless
promise. It was not a promise of any kind.

Heheh, now you too are reading something in there that isn't supposed to
be in there. How I love language and its games.

And please don't submit it as an erratum. It isn't.

Or (to someone else) not anymore, with the (improved)
"Text_CSV_XS-plus-regexes" example around.
 
damian

David said:
if I may ask another question...


Err, that doesn't quite parse ( maybe s{\A\s+ | \s+\z}{}gxms ).

Yes, indeed. I dashed that code off just a little *too* fast. ;-)

Damian
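
For reference, here is a minimal runnable sketch of the loop with the substitution corrected to s{\A\s+ | \s+\z}{}gxms, as agreed above. It assumes Text::CSV_XS and List::MoreUtils are installed from CPAN. The helper name trimmed_fields is hypothetical, and the records are held in an array rather than read from __DATA__ so the snippet stands alone:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;
use List::MoreUtils qw( apply );

my $csv_format = Text::CSV_XS->new({
    sep_char => q{:},    # fields are colon-delimited
});

# Hypothetical helper: let the module split the record,
# then trim leading and trailing whitespace from each field.
sub trimmed_fields {
    my ($record) = @_;
    chomp $record;
    $csv_format->parse($record)
        or die "Cannot parse record: $record\n";

    # apply works on copies, so the parser's own fields are not modified
    return apply { s{\A\s+ | \s+\z}{}gxms } $csv_format->fields();
}

my @records = (
    "sshd:123:321::/var/empty:/usr/bin/ksh\n",
    "apache : 789 : 987:: /home/apache: /usr/bin/ksh\n",
);

for my $record (@records) {
    my ($user, $uid, $gid) = trimmed_fields($record);
    print map { "'$_'\n" } ($user, $uid, $gid);
}
```

The division of labour is the one described in the thread: the module handles the record format, and a single small regex handles the one human vagary the format does not cover.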
 
damian

Anno said:
Sorry. You probably didn't see the smiley I didn't write.

And I apologize for snapping at you.

Under the heading "Use Text::CSV_XS to extract complex variable-width fields"
the first problem you introduce is that of blanks surrounding the separator.
The reader may easily conclude that the module solves that problem. It
takes a rather careful reading to see that this is not actually in the text.

This, I readily concede.

It's an occupational hazard for a writer: you can never tell (until the
book's in print and widely distributed, when it's too late) which parts
are going to require excessively careful reading. Even when you have 27
people review the manuscript (as we did with PBP) you can never get the
readability perfect, since 27 is a very poor approximation for the
eventual tens of thousands of readers. Especially since those 27 *were*
deliberately reading it excessively carefully. ;-)

Damian
 
Anno Siegel

And I apologize for snapping at you.

It was a flippant remark, and I didn't stop to think how it would look
to you as the author. I might have said it differently, or not at all,
if I had.

Apologies, hugs and smileys all around :)

Anno
 
