Question about Damian Conway's "Perl Best Practices"

Discussion in 'Perl Misc' started by usenet@DavidFilmer.com, Jan 17, 2006.

  1. Guest

    In "Perl Best Practices," (great book, BTW!) Damian Conway recommends:

    >>> Use Text::CSV_XS to extract complex variable-width fields (p.158)


    He supports his recommendation (in part) with this narrative:

    >>> Using split to extract variable-width fields is efficient and easy,
    >>> provided those fields _really_are_ always delimited by a simple
    >>> separator. More often, though... it becomes necessary to
    >>> extend the format rules to cope with human vagaries (such as
    >>> ignoring whitespace around commas) [and he goes on to discuss
    >>> how ugly the regexp can get as the requirements morph]


    However, in the usage examples he provides, he doesn't show how the
    Text::CSV_XS module helps simplify that particular vagary (ignoring
    whitespace around commas), and I don't see it discussed in the module's
    perldocs.

    I thought maybe I could tell the module that the quote_char could be a
    whitespace, but this actually throws a runtime error.

    Kindly consider this code (adapted from Damian's book); how would I
    make efficient use of the Text::CSV_XS module to ignore the whitespace
    around the second line of __DATA__, as Damian suggests I can do?

    #!/usr/bin/perl
    use strict; use warnings;
    use Text::CSV_XS;

    my $csv_format = Text::CSV_XS->new({
        sep_char => q{:}, #fields are colon-delimited
        #quote_char => q{ }, #whitespace throws runtime error!!!
    });

    while (my $record = <DATA>) {
        $csv_format -> parse($record); #error-checking omitted

        my ($user, $uid, $gid) = $csv_format -> fields();
        print map {"'$_'\n"} ($user, $uid, $gid);
    }

    __DATA__
    sshd:123:321::/var/empty:/usr/bin/ksh
    apache : 789 : 987:: /home/apache: /usr/bin/ksh

    Thanks!
    --
    http://DavidFilmer.com
     
    , Jan 17, 2006
    #1
  2. Guest

    David Filmer wrote:

    > In "Perl Best Practices," (great book, BTW!)


    Thank you. :)


    > Damian Conway recommends:
    >
    > >>> Use Text::CSV_XS to extract complex variable-width fields (p.158)

    >
    > He supports his recommendation (in part) with this narrative:
    >
    > >>> Using split to extract variable-width fields is efficient and easy,
    > >>> provided those fields _really_are_ always delimited by a simple
    > >>> separator. More often, though... it becomes necessary to
    > >>> extend the format rules to cope with human vagaries (such as
    > >>> ignoring whitespace around commas) [and he goes on to discuss
    > >>> how ugly the regexp can get as the requirements morph]

    >
    > However, in the usage examples he provides, he doesn't show how the
    > Text::CSV_XS module helps simplify that particular vagary (ignoring
    > whitespace around commas), and I don't see it discussed in the module's
    > perldocs.


    My apologies. I obviously wasn't clear enough in that recommendation. I
    don't at any point claim that Text::CSV_XS supports optional whitespace
    around commas, because as far as I know, it doesn't.

    The argument I (attempt to ;-) put forward is actually:

    * If you start with a regex, the temptation is to extend the regex
    to handle ever-more-complex (and unmaintainable) field
    specifications

    * If you start with the CSV specification instead, there's no such
    temptation and no need to maintain the parsing code

    The line that (I had hoped) makes that distinction clear is in the
    middle of page 159:

    "As soon as your record format goes beyond a simple
    separator...consider whether you can respecify your
    data format and rewrite your code to use Text::CSV_XS..."

    The implication being that you should abandon complex formats parsed
    with regexes, and just go with the standard CSV parsed with Text::CSV_XS.


    > Kindly consider this code (adapted from Damian's book); how would I
    > make efficient use of the Text::CSV_XS module to ignore the whitespace
    > around the second line of __DATA__, as Damian suggests I can do?


    I don't suggest you can do that. Text::CSV_XS can't do that.
    However, Text::CSV_XS-plus-regexes *can* do it very easily:

    while (my $record = <DATA>) {
        $csv_format -> parse($record); #error-checking omitted

        use List::MoreUtils qw( apply );
        my ($user, $uid, $gid)
            = apply { s s/\A\s+ | \s+\z}{}gxms }
                  $csv_format->fields();

        print map {"'$_'\n"} ($user, $uid, $gid);
    }

    Sorry for the confusion,

    Damian
     
    , Jan 17, 2006
    #2
  3. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > In "Perl Best Practices," (great book, BTW!) Damian Conway recommends:
    >
    > >>> Use Text::CSV_XS to extract complex variable-width fields (p.158)

    >
    > He supports his recommendation (in part) with this narrative:
    >
    > >>> Using split to extract variable-width fields is efficient and easy,
    > >>> provided those fields _really_are_ always delimited by a simple
    > >>> separator. More often, though... it becomes necessary to
    > >>> extend the format rules to cope with human vagaries (such as
    > >>> ignoring whitespace around commas) [and he goes on to discuss
    > >>> how ugly the regexp can get as the requirements morph]

    >
    > However, in the usage examples he provides, he doesn't show how the
    > Text::CSV_XS module helps simplify that particular vagary (ignoring
    > whitespace around commas), and I don't see it discussed in the module's
    > perldocs.


    Hey, taking the newsgroup to task for Damian's careless promises?

    I believe discussing spaces around commas is simply wrong in the context
    of CSV. In CSV, a blank is a blank, whether or not it's adjacent to a
    comma. There are many common formats where spaces around commas (and/or
    other operators) are expected, but CSV is not one of them. The format
    description in Text::CSV* (under CAVEATS) supports this.

    Submit it to PBP's errata page[1]. I don't think there is a simple way
    to make Text::CSV (of any provenance) comply.

    Ignoring the unfortunate CSV context, the underlying tendency is
    of course right: If there is a (reputable) module that parses some
    format you have to parse, use it instead of making a homebrew.

    [snip]

    Anno

    [1] (http://www.oreilly.com/cgi-bin/errata.form/perlbp).
    --
    If you want to post a followup via groups.google.com, don't use
    the broken "Reply" link at the bottom of the article. Click on
    "show options" at the top of the article, then click on the
    "Reply" at the bottom of the article headers.
     
    Anno Siegel, Jan 17, 2006
    #3
  4. Guest

    Anno Siegel wrote:

    > Hey, taking the newsgroup to task for Damian's careless promises?


    Please read what I actually wrote in the book. It was not a careless
    promise. It was not a promise of any kind.

    And please don't submit it as an erratum. It isn't.

    Damian
     
    , Jan 17, 2006
    #4
  5. Guest

    wrote:
    > I don't suggest you can do that. Text::CSV_XS can't do that.
    > However, Text::CSV_XS-plus-regexes *can* do it very easily:
    > [snip]
    > use List::MoreUtils qw( apply );


    > my ($user, $uid, $gid)
    > = apply { s s/\A\s+ | \s+\z}{}gxms }
    > $csv_format->fields();


    Thanks for clearing that up, Damian (ain't it great when you ask a
    question about a book, and the _author_ shoots back a detailed reply?
    And in under an hour? Try _that_ with C++)

    FWIW, IMHO the discussion about "non-builtin builtins" (p.170-174) is
    worth the price of the book alone. I cringe at how much time I've
    wasted (and how much crap code I've probably written) by neglecting
    modules such as List::MoreUtils (as used in Damian's solution above).

    But, if I may ask another question...

    > = apply { s s/\A\s+ | \s+\z}{}gxms }


    Err, that doesn't quite parse ( maybe s{\A\s+ | \s+\z}{}gxms ). Of
    course, this is just a bit of throw-away code in a newsgroup, but if
    writing a "real" program (using the book's guidelines) would it be
    preferable to do it like this:

    use Regexp::Common qw /whitespace/;
    and then...
    apply { s/$RE{ws}{crop}//g } $csv_format->fields();

    --
    http://DavidFilmer.com
     
    , Jan 17, 2006
    #5
  6. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > Anno Siegel wrote:
    >
    > > Hey, taking the newsgroup to task for Damian's careless promises?

    >
    > Please read what I actually wrote in the book. It was not a careless
    > promise. It was not a promise of any kind.


    Sorry. You probably didn't see the smiley I didn't write.

    Under the heading "Use Text::CSV_XS to extract complex variable-width fields"
    the first problem you introduce is that of blanks surrounding the separator.
    The reader may easily conclude that the module solves that problem. It
    takes a rather careful reading to see that this is not actually in the text.

    > And please don't submit it as an erratum. It isn't.


    Okay.

    Anno

     
    Anno Siegel, Jan 18, 2006
    #6
  7. Dr.Ruud Guest

    Damian:
    > Anno Siegel:


    >> Hey, taking the newsgroup to task for Damian's careless promises?

    >
    > Please read what I actually wrote in the book. It was not a careless
    > promise. It was not a promise of any kind.


    Heheh, now you too are reading something in there that isn't supposed to
    be in there. How I love language and its games.


    > And please don't submit it as an erratum. It isn't.


    Or (to someone else) not anymore, with the (improved)
    "Text::CSV_XS-plus-regexes" example around.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
     
    Dr.Ruud, Jan 18, 2006
    #7
  8. Guest

    David Filmer wrote:

    > if I may ask another question...
    >
    > > = apply { s s/\A\s+ | \s+\z}{}gxms }

    >
    > Err, that doesn't quite parse ( maybe s{\A\s+ | \s+\z}{}gxms ).


    Yes, indeed. I dashed that code off just a little *too* fast. ;-)

    Damian
     
    , Jan 18, 2006
    #8
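
For reference, here is a dependency-free sketch of the trimming step using the corrected substitution agreed on above. It is only an illustration, under two assumptions that are not from the thread: `trim_fields` is a hypothetical helper name, and a plain `split` on `:` stands in for `$csv_format->fields()` so the snippet runs without Text::CSV_XS or List::MoreUtils installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return trimmed copies of the
# given strings, leaving the caller's data untouched.
sub trim_fields {
    my @fields = @_;                      # work on a copy
    s{\A\s+ | \s+\z}{}gxms for @fields;   # corrected form of the substitution
    return @fields;
}

# Second record from the original __DATA__ section.
my $record = "apache : 789 : 987:: /home/apache: /usr/bin/ksh\n";
chomp $record;

# Stand-in (assumption) for parse() followed by fields():
# a plain split on the separator, which keeps the surrounding blanks.
my @raw = split /:/, $record;

my ($user, $uid, $gid) = trim_fields(@raw);

print map { "'$_'\n" } ($user, $uid, $gid);
```

Run as written, this prints 'apache', '789' and '987' on separate lines, whereas the untrimmed first field in `@raw` is still 'apache ' with its trailing blank. The `/g` flag matters here: without it, only the leading whitespace would be removed.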
  9. Guest

    Anno Siegel wrote:

    > > Please read what I actually wrote in the book. It was not a careless
    > > promise. It was not a promise of any kind.

    >
    > Sorry. You probably didn't see the smiley I didn't write.


    And I apologize for snapping at you.


    > Under the heading "Use Text::CSV_XS to extract complex variable-width fields"
    > the first problem you introduce is that of blanks surrounding the separator.
    > The reader may easily conclude that the module solves that problem. It
    > takes a rather careful reading to see that this is not actually in the text.


    This, I readily concede.

    It's an occupational hazard for a writer: you can never tell (until the
    book's in print and widely distributed, when it's too late) which parts
    are going to require excessively careful reading. Even when you have 27
    people review the manuscript (as we did with PBP) you can never get the
    readability perfect, since 27 is a very poor approximation for the
    eventual tens of thousands of readers. Especially since those 27 *were*
    deliberately reading it excessively carefully. ;-)

    Damian
     
    , Jan 18, 2006
    #9
  10. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > Anno Siegel wrote:
    >
    > > > Please read what I actually wrote in the book. It was not a careless
    > > > promise. It was not a promise of any kind.

    > >
    > > Sorry. You probably didn't see the smiley I didn't write.

    >
    > And I apologize for snapping at you.


    It was a flippant remark, and I didn't stop to think how it would look
    to you as the author. I might have said it differently, or not at all,
    if I had.

    Apologies, hugs and smileys all around :)

    Anno
     
    Anno Siegel, Jan 19, 2006
    #10