CSV parsing by Paul Hsieh and Michael B. Allen?

Ben Pfaff

Michael B Allen said:
However, they're not doing anyone any good with this code (at least the
code in the link posted by websnarf [1]). In fact I think they're probably
doing more harm than good. Modern code must be reentrant or it's basically
useless. The cited example desperately needs an object to hold the state
of the parse. And for a CSV parser a state machine is clearly the only
way to go. The code may be good from a language lawyer perspective
but the overall organisation and design is just awful. For the sake of C
programmers everywhere I hope the rest of the book isn't anything like it.

[1] http://cm.bell-labs.com/cm/cs/tpop/csvgetline2.c

In the book, Kernighan and Pike go on to improve the final C
version they present (presumably what is at the URL above) into a
C++ implementation that encapsulates the operation of the library
into a class. They also pose an exercise (Exercise 4-8)
encouraging readers to write a C version of the improved C++
code.

I suppose I should have looked at the book more carefully before
suggesting that its code be used as a CSV parser. I wouldn't
recommend the book's final C version as a prepackaged CSV parser.
In my defense, I suspect that I was at home, not in my Stanford
office, when I posted the suggestion; my copy of the book is at
the office.
 
Michael B Allen

Ben Pfaff said:
In the book, Kernighan and Pike go on to improve the final C
version they present (presumably what is at the URL above) into a
C++ implementation that encapsulates the operation of the library
into a class. They also pose an exercise (Exercise 4-8)
encouraging readers to write a C version of the improved C++
code.

Ok. Well that makes a little more sense but I still don't think there
should be any example like this in a new book from K&P. I think all code
listings should strive to be as close to production code as reasonably
possible. I realize authors have space constraints and they're trying
to be concise but in these cases they should have something in the
code listing caption that clearly indicates that the code is insecure,
contrived or not optimal in some way (e.g. "error handling has been
omitted for brevity").

Mike
 
Chris Dollin

Michael said:
Given websnarf's tirade, I took the trouble to reread chapter 4 of tPoP
on the train over the weekend. Almost everything I've seen him complain
about [in the code] is addressed in the text, either directly or
indirectly. One can disagree with some of their decisions, but they /are/
decisions, with contexts and reasons and tradeoffs, not mindless
just-do-thiss.

tPoP is not perfect, but it's a useful and illuminating book
nevertheless.

I hate to encourage bad behavior but usenet karma be damned I have to
agree with websnarf about this.

I have the greatest respect for Kernighan and Pike. C and UNIX are
still great application platforms today and I think they will be for a
long time. The simplicity and elegance of C and UNIX is quite frankly
unmatched by anything.

However, they're not doing anyone any good with this code (at least the
code in the link posted by websnarf [1]). In fact I think they're probably
doing more harm than good. Modern code must be reentrant or it's basically
useless. The cited example desperately needs an object to hold the state
of the parse. And for a CSV parser a state machine is clearly the only
way to go. The code may be good from a language lawyer perspective
but the overall organisation and design is just awful. For the sake of C
programmers everywhere I hope the rest of the book isn't anything like it.

"Almost everything I've seen him complain about [in the code] is addressed
in the text, either directly or indirectly.", from Chris Dollin's post
above, works for what you say, too.

As it happens, if I were writing a book like that, it wouldn't have occurred
to me to have the state static -- but on the other hand, /starting/ with it
static allows one to make a pedagogical point.
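
To make the static-versus-object point concrete, here is a minimal sketch of the design argued for above: a character-driven state machine whose whole parse state lives in a caller-owned struct, so two parses can run side by side. It is not the code from tPoP or from the URL cited in the thread. Every name in it (csv_parser, csv_init, csv_feed, CSV_FIELD_MAX) is hypothetical, and it assumes comma separators, "" as the escape for a quote inside a quoted field, and '\n' as the record terminator.

```c
#include <stddef.h>

/* All names here are hypothetical, chosen for illustration only. */
#define CSV_FIELD_MAX 256

enum csv_state { FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED };
enum csv_event { CSV_NONE, CSV_FIELD, CSV_RECORD };

/* The entire parse state. No statics, so each parser is independent. */
struct csv_parser {
    enum csv_state state;
    char field[CSV_FIELD_MAX]; /* valid right after CSV_FIELD/CSV_RECORD */
    size_t len;
};

void csv_init(struct csv_parser *p)
{
    p->state = FIELD_START;
    p->len = 0;
    p->field[0] = '\0';
}

/* Append one character. Overlong fields are silently truncated here;
   real code would report that as an error. */
static void put(struct csv_parser *p, char c)
{
    if (p->len + 1 < CSV_FIELD_MAX)
        p->field[p->len++] = c;
}

/* Terminate the current field and get ready for the next one. */
static enum csv_event finish(struct csv_parser *p, enum csv_event ev)
{
    p->field[p->len] = '\0';
    p->len = 0;
    p->state = FIELD_START;
    return ev;
}

/* Feed one input character. Returns CSV_FIELD when a field has ended,
   CSV_RECORD when the last field of a record has ended, else CSV_NONE.
   Read p->field before the next call, which starts overwriting it. */
enum csv_event csv_feed(struct csv_parser *p, char c)
{
    switch (p->state) {
    case FIELD_START:
        if (c == '"') { p->state = QUOTED; return CSV_NONE; }
        /* fall through */
    case UNQUOTED:
        if (c == ',')  return finish(p, CSV_FIELD);
        if (c == '\n') return finish(p, CSV_RECORD);
        put(p, c);
        p->state = UNQUOTED;
        return CSV_NONE;
    case QUOTED:
        if (c == '"') p->state = QUOTE_IN_QUOTED;
        else put(p, c);
        return CSV_NONE;
    case QUOTE_IN_QUOTED:
        if (c == '"') { put(p, '"'); p->state = QUOTED; return CSV_NONE; }
        if (c == ',')  return finish(p, CSV_FIELD);
        if (c == '\n') return finish(p, CSV_RECORD);
        put(p, c);              /* lenient: text after a closing quote */
        p->state = UNQUOTED;
        return CSV_NONE;
    }
    return CSV_NONE;
}
```

A caller would call csv_init once, then pass each input character to csv_feed and copy p->field out whenever the return value is not CSV_NONE. Handling of '\r', of end of input inside a quoted field, and of overlong fields has been left out for brevity, which is exactly the kind of omission the thread says a listing should flag.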
 
Richard Bos

CBFalconer said:
I shall eagerly watch the newsgroups for such. However, I plan to
eat and sleep in the interim.

I shall not even bother to look for it.

Richard
 
Chris Dollin

Michael said:
On Mon, 30 Apr 2007 10:54:31 -0700


Ok. Well that makes a little more sense but I still don't think there
should be any example like this in a new book from K&P.

tPoP isn't a new book; the copyright date is 1999.

I don't think it's entirely fair to evaluate the code without the
context in which it appears.

--
"I just wonder when we're going to have to sit down and re-evaluate
our decision-making paradigm." /Sahara/

Hewlett-Packard Limited registered no: 690597 England
registered office: Cain Road, Bracknell, Berks RG12 1HN
 
