Question about Damian Conway's "Perl Best Practices"

usenet · Jan 17, 2006

In "Perl Best Practices," (great book, BTW!) Damian Conway recommends:

Use Text::CSV_XS to extract complex variable-width fields

He supports his recommendation (in part) with this narrative:

Using split to extract variable-width fields is efficient and easy,
provided those fields _really_are_ always delimited by a simple
separator. More often, though... it becomes necessary to
extend the format rules to cope with human vagaries (such as
ignoring whitespace around commas) [and he goes on to discuss
how ugly the regexp can get as the requirements morph]

However, in the usage examples he provides, he doesn't show how the
Text::CSV_XS module helps simplify that particular vagary (ignoring
whitespace around commas), and I don't see it discussed in the module's
perldocs.

I thought maybe I could tell the module that the quote_char could be a
whitespace, but this actually throws a runtime error.

Kindly consider this code (adapted from Damian's book); how would I
make efficient use of the Text::CSV_XS module to ignore the whitespace
around the second line of __DATA__, as Damian suggests I can do?

#!/usr/bin/perl
use strict; use warnings;
use Text::CSV_XS;

my $csv_format = Text::CSV_XS->new({
    sep_char => q{:},       # fields are colon-delimited
    #quote_char => q{ },    # whitespace throws runtime error!!!
});

while (my $record = <DATA>) {
    $csv_format->parse($record);    # error-checking omitted

    my ($user, $uid, $gid) = $csv_format->fields();
    print map {"'$_'\n"} ($user, $uid, $gid);
}

__DATA__
sshd:123:321::/var/empty:/usr/bin/ksh
apache : 789 : 987:: /home/apache: /usr/bin/ksh

Thanks!

damian · Jan 17, 2006

David said:
In "Perl Best Practices," (great book, BTW!)

Thank you.

Damian Conway recommends:

He supports his recommendation (in part) with this narrative:

Using split to extract variable-width fields is efficient and easy,
provided those fields _really_are_ always delimited by a simple
separator. More often, though... it becomes necessary to
extend the format rules to cope with human vagaries (such as
ignoring whitespace around commas) [and he goes on to discuss
how ugly the regexp can get as the requirements morph]

However, in the usage examples he provides, he doesn't show how the
Text::CSV_XS module helps simplify that particular vagary (ignoring
whitespace around commas), and I don't see it discussed in the module's
perldocs.

My apologies. I obviously wasn't clear enough in that recommendation. I
don't at any point claim that Text::CSV_XS supports optional whitespace
around commas, because as far as I know, it doesn't.

The argument I (attempt to ;-) put forward is actually:

* If you start with a regex, the temptation is to extend the regex
to handle ever-more-complex (and unmaintainable) field
specifications

* If you start with the CSV specification instead, there's no such
temptation and no need to maintain the parsing code
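
As an illustration of the first point (a sketch that is not taken from the book or from this post), here is how a split-based parser tends to grow as the format loosens. The sample record is the padded line from the original question; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Same shape as the second __DATA__ line in the question.
my $record = 'apache : 789 : 987:: /home/apache: /usr/bin/ksh';

# Step 1: a plain separator. Simple, but the padding stays in the fields.
my @plain = split /:/, $record;

# Step 2: the separator pattern absorbs optional whitespace around each colon.
my @padded = split /\s*:\s*/, $record;

# Step 3: whitespace at either end of the whole record needs yet another rule.
(my $stripped = "  $record  ") =~ s/\A\s+|\s+\z//g;
my @trimmed = split /\s*:\s*/, $stripped;

print "'$_'\n" for @trimmed[0 .. 2];
```

Each new requirement (quoted fields, escaped separators, and so on) would add another rule of this kind, which is the maintenance burden the first bullet describes.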

The line that (I had hoped) makes that distinction clear is in the
middle of page 159:

"As soon as your record format goes beyond a simple
separator...consider whether you can respecify your
data format and rewrite your code to use Text::CSV_XS..."

The implication being that you should abandon complex formats parsed
with regexes, and just go with the standard CSV parsed with Text::CSV_XS.

Kindly consider this code (adapted from Damian's book); how would I
make efficient use of the Text::CSV_XS module to ignore the whitespace
around the second line of __DATA__, as Damian suggests I can do?

I don't suggest you can do that. Text::CSV_XS can't do that.
However, Text::CSV_XS-plus-regexes *can* do it very easily:

while (my $record = <DATA>) {
    $csv_format->parse($record);    # error-checking omitted

    use List::MoreUtils qw( apply );
    my ($user, $uid, $gid)
        = apply { s s/\A\s+ | \s+\z}{}gxms }
              $csv_format->fields();

    print map {"'$_'\n"} ($user, $uid, $gid);
}

Sorry for the confusion,

Damian

Anno Siegel · Jan 17, 2006

In "Perl Best Practices," (great book, BTW!) Damian Conway recommends:

He supports his recommendation (in part) with this narrative:

Using split to extract variable-width fields is efficient and easy,
provided those fields _really_are_ always delimited by a simple
separator. More often, though... it becomes necessary to
extend the format rules to cope with human vagaries (such as
ignoring whitespace around commas) [and he goes on to discuss
how ugly the regexp can get as the requirements morph]

However, in the usage examples he provides, he doesn't show how the
Text::CSV_XS module helps simplify that particular vagary (ignoring
whitespace around commas), and I don't see it discussed in the module's
perldocs.

Hey, taking the newsgroup to task for Damian's careless promises?

I believe discussing spaces around commas is simply wrong in the context
of CSV. In CSV, a blank is a blank, whether or not it's adjacent to a
comma. There are many common formats where spaces around commas (and/or
other operators) are expected, but CSV is not one of them. The format
description in Text::CSV* (under CAVEATS) supports this.

Submit it to PBP's errata page[1]. I don't think there is a simple way
to make Text::CSV (of any provenance) comply.

Ignoring the unfortunate CSV context, the underlying tendency is
of course right: If there is a (reputable) module that parses some
format you have to parse, use it instead of making a homebrew.

[snip]

Anno

[1] http://www.oreilly.com/cgi-bin/errata.form/perlbp

damian · Jan 17, 2006

Anno said:
Hey, taking the newsgroup to task for Damian's careless promises?

Please read what I actually wrote in the book. It was not a careless
promise. It was not a promise of any kind.

And please don't submit it as an erratum. It isn't.

Damian

usenet · Jan 17, 2006

I don't suggest you can do that. Text::CSV_XS can't do that.
However, Text::CSV_XS-plus-regexes *can* do it very easily:
use List::MoreUtils qw( apply );

my ($user, $uid, $gid)
= apply { s s/\A\s+ | \s+\z}{}gxms }
$csv_format->fields();

Thanks for clearing that up, Damian (ain't it great when you ask a
question about a book, and the _author_ shoots back a detailed reply?
And in under an hour? Try _that_ with C++.)

FWIW, IMHO the discussion about "non-builtin builtins" (p.170-174) is
worth the price of the book alone. I cringe at how much time I've
wasted (and how much crap code I've probably written) by neglecting
modules such as List::MoreUtils (as used in Damian's solution above).

But, if I may ask another question...

= apply { s s/\A\s+ | \s+\z}{}gxms }

Err, that doesn't quite parse ( maybe s{\A\s+ | \s+\z}{}gxms ). Of
course, this is just a bit of throw-away code in a newsgroup, but if
writing a "real" program (using the book's guidelines) would it be
preferable to do it like this:

use Regexp::Common qw( whitespace );
and then...
apply { s/$RE{ws}{crop}//g } $csv_format->fields();

Anno Siegel · Jan 18, 2006

Please read what I actually wrote in the book. It was not a careless
promise. It was not a promise of any kind.

Sorry. You probably didn't see the smiley I didn't write.

Under the heading "Use Text::CSV_XS to extract complex variable-width fields"
the first problem you introduce is that of blanks surrounding the separator.
The reader may easily conclude that the module solves that problem. It
takes a rather careful reading to see that this is not actually in the text.

And please don't submit it as an erratum. It isn't.

Okay.

Anno

Dr.Ruud · Jan 18, 2006

Damian:

Anno Siegel:

Please read what I actually wrote in the book. It was not a careless
promise. It was not a promise of any kind.

Heheh, now you too are reading something in there that isn't supposed to
be in there. How I love language and its games.

And please don't submit it as an erratum. It isn't.

Or (to someone else) not anymore, with the (improved)
"Text::CSV_XS-plus-regexes" example around.

damian · Jan 18, 2006

David said:
if I may ask another question...

Err, that doesn't quite parse ( maybe s{\A\s+ | \s+\z}{}gxms ).

Yes, indeed. I dashed that code off just a little *too* fast. ;-)

Damian
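
For readers following along, a minimal sketch that combines the original script with the corrected substitution might look like this. It assumes Text::CSV_XS and List::MoreUtils are installed from CPAN; the helper name trimmed_fields is hypothetical, and the sample lines are the two records from the question (held in an array here rather than in __DATA__).

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV_XS;
use List::MoreUtils qw( apply );

my $csv_format = Text::CSV_XS->new({
    sep_char => q{:},    # fields are colon-delimited
});

# Hypothetical helper: parse one record, then trim every field.
sub trimmed_fields {
    my ($record) = @_;
    chomp $record;

    $csv_format->parse($record)
        or die "Could not parse record: $record\n";

    # apply() works on copies, so the parsed fields themselves are untouched.
    return apply { s{\A\s+ | \s+\z}{}gxms } $csv_format->fields();
}

my @records = (
    "sshd:123:321::/var/empty:/usr/bin/ksh\n",
    "apache : 789 : 987:: /home/apache: /usr/bin/ksh\n",
);

for my $record (@records) {
    my ($user, $uid, $gid) = trimmed_fields($record);
    print map {"'$_'\n"} ($user, $uid, $gid);
}
```

The module still does the splitting; the regex only cleans up each field afterwards, so the parsing rules themselves stay out of hand-written code.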

damian · Jan 18, 2006

Anno said:
Sorry. You probably didn't see the smiley I didn't write.

And I apologize for snapping at you.

Under the heading "Use Text::CSV_XS to extract complex variable-width fields"
the first problem you introduce is that of blanks surrounding the separator.
The reader may easily conclude that the module solves that problem. It
takes a rather careful reading to see that this is not actually in the text.

This, I readily concede.

It's an occupational hazard for a writer: you can never tell (until the
book's in print and widely distributed, when it's too late) which parts
are going to require excessively careful reading. Even when you have 27
people review the manuscript (as we did with PBP) you can never get the
readability perfect, since 27 is a very poor approximation for the
eventual tens of thousands of readers. Especially since those 27 *were*
deliberately reading it excessively carefully. ;-)

Damian

Anno Siegel · Jan 19, 2006

And I apologize for snapping at you.

It was a flippant remark, and I didn't stop to think how it would look
to you as the author. I might have said it differently, or not at all,
if I had.

Apologies, hugs and smileys all around

Anno
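
A note for later readers: the replies above describe the module as it stood when this thread was written. More recent Text::CSV_XS documentation lists an allow_whitespace constructor attribute that strips spaces and tabs around the separator. The following is only a sketch under the assumption that the installed release supports that attribute; on a release that does not, new() is expected to fail and the script dies at that point.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Assumption: the installed Text::CSV_XS is recent enough to document
# allow_whitespace; the release discussed in this thread was not.
my $csv_format = Text::CSV_XS->new({
    sep_char         => q{:},    # fields are colon-delimited
    allow_whitespace => 1,       # drop spaces/tabs around the separator
}) or die "Cannot create parser: " . Text::CSV_XS->error_diag();

# The padded record from the original question.
my $record = 'apache : 789 : 987:: /home/apache: /usr/bin/ksh';

$csv_format->parse($record)
    or die "Could not parse record: $record\n";

my ($user, $uid, $gid) = $csv_format->fields();
print map {"'$_'\n"} ($user, $uid, $gid);
```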
