vi regex to preserve interior commas in CSV string

ccc31807

I have a CSV file with rows that look like this:

ROW EXAMPLE 1
fieldA,fieldB,fieldC,George,Washington,President,"1600 Pennsylvania
Avenue, Washington, D.C. 55554",202-555-1212,fieldX,fieldY,fieldZ<EOL>

I want to preserve the interior commas from the double quoted portion
of the string, perhaps turning it into something like this:

ROW EXAMPLE 2
'fieldA','fieldB','fieldC','George','Washington,President','1600
Pennsylvania Avenue, Washington, D.C.
55554','202-555-1212,','fieldX',;fieldY',;fieldZ'<EOL>

where each value is single quoted and the values are separated by
commas.

I wanted (past tense) to do this with one vi (vim) regex and spent
about an hour trying different ones before giving up in failure. I used
three different regexes to do what I wanted, so I don't have the
problem anymore, but I still have the question.

What one regex can I use (in vi/vim) to transform EXAMPLE 1 to EXAMPLE
2?

Thanks, CC.
 
ccc31807

How do you determine that the comma in "Washington,President"
is not a field separator?

Are you missing some double quotes in your ROW EXAMPLE 1?

Or are you missing some single quotes in your ROW EXAMPLE 2?

The file is an Excel file saved as a DOS-CSV file. Excel qualifies
text fields with double quotes if and only if the text field contains
a comma.

Obviously, you can write a Perl script to munge the data file, either
by using one of the CSV modules or by hand rolling your own. However,
this was such a simple task that I didn't want to go to the trouble of
writing such a script.

My solution was to (1) replace every comma between double quotes with
an asterisk, (2) replace every remaining comma with apostrophe, comma,
apostrophe, and (3) replace every asterisk with a comma again.
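For comparison, here is a rough Python rendering of those three substitutions. It is a sketch, not the vi commands actually used: `three_step` and `QUOTED` are illustrative names, steps 2 and 3 use plain string replacement rather than regexes, and it assumes the data never contains a literal asterisk and that double quotes are balanced and never escaped.

```python
import re

# Matches one double-quoted span. Assumes quotes are balanced and unescaped.
QUOTED = re.compile(r'"[^"]*"')


def three_step(line: str) -> str:
    # Step 1: inside each double-quoted span, turn commas into asterisks.
    # Assumption: the data contains no literal asterisks of its own.
    line = QUOTED.sub(lambda m: m.group(0).replace(",", "*"), line)
    # Step 2: every comma left is a field separator; make it ','.
    line = line.replace(",", "','")
    # Step 3: restore the protected commas.
    return line.replace("*", ",")


print(three_step('fieldA,"1600 Pennsylvania Avenue, Washington, D.C.",fieldZ'))
```

Like the three substitutions described, it leaves the double quotes in place and does not add the opening and closing apostrophes around the whole line; those would take further substitutions.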

vi/vim should be able to do this with one regex rather than three.

CC.
 
ccc31807

So then, you are missing some single quotes in your ROW EXAMPLE 2.

I made a mistake when typing.

I have discovered that vi will do a lot of what I write little Perl
scripts to do, such as convert the output of Excel to a format that I
can use. I'm just not that good (yet) with vi regexes.

Thanks, CC.
 
ccc31807

What was your Perl question?

I didn't have a Perl question. I posted to comp.editors and cross-posted
to clpm because of my high regard for the knowledge, intelligence, and
experience of the denizens of clpm, you included, Ben.

I have also noted from time to time that people sometimes post on
both comp.editors and clpm, and that some of them use vi/vim and have a
deep knowledge of it. I had hoped to catch the eye of some of these
people and maximize the chance of getting an answer.

Sorry if this offends you, but it's not a flame or a troll, and the
subject of the post clearly discloses that it's a vi question.

Thanks, CC.
 
ccc31807

All abuses of Usenet are offensive.

I did not abuse Usenet.
Your abuse here displays a rather profound selfishness on your part.

Actually, your abuse displays a rather profound selfishness on your
part.

As an aside, permit me to say that the regex engines in Perl and vi/vim
have a number of substantial differences, and those of us who write
Perl using vi/vim struggle daily with them. In my own case, I sometimes
spend much more time constructing a regular expression that vi accepts
than I would have spent either doing the edits by hand and then finding
and correcting all the inadvertent errors, or writing a Perl script to
do the same thing. To give a real-life example, I frequently find
myself doing something like this:

STATEMENT 1 is an SQL query
select $id, $one, $two, $three, $four ... from TABLE where $twenty = 'twenty';

STATEMENT 2 is a hash assignment
$hash{$id} = { one => $one, two => $two, ... };

STATEMENT 3 is an output statement
$row = qq("$id","$hash{$id}{one}","$hash{$id}{two}"...\n);

Using vi, I only have to type STATEMENT 1, and can transform that
statement into STATEMENT 2 and STATEMENT 3, and similar statements,
with just one command, and do so instantly and without any errors. It
saves a great deal of time and effort. Proficiency in a programming
language includes mastery of a programming environment, and many
Perlistas use vi/vim as their primary programming environment. If you
are ignorant of vi/vim, I can see how you might perceive my post as
abusive of clpm. I would be very surprised if, knowing vi/vim and
having struggled with vi regexes, you perceive my post as abusive.
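The vi command itself isn't shown here. Purely as an illustration of the text mapping from STATEMENT 1 to STATEMENTS 2 and 3, a hypothetical Python helper `derive_statements` could perform the same rewrite, assuming the first selected column is the hash key and every column is a plain Perl scalar such as `$one`.

```python
import re


def derive_statements(select_sql: str) -> tuple[str, str]:
    """Hypothetical helper: build STATEMENT 2 and 3 from a STATEMENT 1 query."""
    # Grab the column list between SELECT and FROM.
    m = re.search(r"select\s+(.*?)\s+from\b", select_sql, re.IGNORECASE | re.DOTALL)
    if m is None:
        raise ValueError("no SELECT ... FROM column list found")
    # Assumption: each column is a Perl scalar like $one; the first is the key.
    names = [c.strip().lstrip("$") for c in m.group(1).split(",")]
    key, rest = names[0], names[1:]

    # STATEMENT 2: $hash{$id} = { one => $one, ... };
    pairs = ", ".join(f"{n} => ${n}" for n in rest)
    hash_stmt = f"$hash{{${key}}} = {{ {pairs} }};"

    # STATEMENT 3: $row = qq("$id","$hash{$id}{one}",...\n);
    # The \n is emitted literally so that Perl interprets it.
    fields = [f'"${key}"'] + [f'"$hash{{${key}}}{{{n}}}"' for n in rest]
    row_stmt = f"$row = qq({','.join(fields)}\\n);"
    return hash_stmt, row_stmt


for stmt in derive_statements("select $id, $one, $two from TABLE where $twenty = 'twenty';"):
    print(stmt)
```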

You don't need to be so quick to judge.
Stating that a post is off-topic does not give license to post it
to non-related newsgroups.

No, but it at least indicates to people that they should not read the
post if off topic posts offend them. In this case, knowing that the
post concerned a vi regex to preserve interior commas, and that you
would be offended by reading a non-Perl question on clpm, you read it
and took offense anyway. This says a lot more about you than it does
about me.

And yes, I'm cross-posting this reply to comp.editors, where members
frequently mention Perl, to give those who don't regularly read clpm
the opportunity to look at this exchange.

CC.
 
ccc31807

Making an off-topic post is clearly an abuse, so yes you did.

As I said, proficiency in a programming language requires mastery of
some programming environment. Over the years the vi editor has been
the programming environment of choice for a number of different
purposes. Perl programming is one; I personally also use it for
programs that I write in Common Lisp, HTML, LaTeX, SQL, and Java, and I
also use it a lot for plain data files.

There is a penumbra (if you will) of topics that fit more or less well
into the interests of those who use Perl. These include both editors
and regular expressions. I asked about both of these. Clearly, if I
had made a post concerning religion, politics, sex, or a multitude of
other things I would have committed an abuse.

I will bet you money that if you asked 1,000 people who use, have
used, or would like to use Perl whether my topic was legitimate for
Perl users to discuss, anywhere from ten percent to ninety percent
would find the topic agreeable.

And maybe the real test is this: if I had asked the question about a
Perl regex rather than a vi regex, which I very well could have done,
not even you would have found it abusive in any way. It wasn't the
subject matter of the post that apparently has your panties in a wad,
but the manner of expression.

My question concerned the construction of a regular expression. It's
immaterial whether I wanted to use it in a Perl script, or as an ex
command, or for any other purpose. I don't think regular expressions
are off topic in clpm.

CC.
 
Keith Keller

[Followup-To ignored]

Quoth ccc31807:
I did not abuse Usenet.

I, perhaps, wouldn't go so far as 'abuse'; but your post was definitely
off-topic, which is definitely rude.

This OP has had the same bad posting habits for many years; I am
surprised that people still respond to anything he posts beyond
corrections of blatant untruths.

--keith
 
Jürgen Exner

Ben Morrow said:
How entirely fascinating.

What was your Perl question?

Oh, come on! For the first time in his life he discovered a feature that
distinguishes a real editor from a toy and you chastise him?
Cut the guy some slack, at some point we were young and inexperienced,
too.

Of course, the Perl script will do the same thing again and again and
again automatically, while for vi or any other editor you either have
to retype the possibly very complex substitute command or save it as a
macro if you ever want to use it again.

jue
 
Jürgen Exner

Ben Morrow said:
People asking general regular expression questions here 'because Perl
uses them and there isn't anywhere else to ask' is one of the things
the regulars tend to react particularly badly to, since it happens a lot
more often than it should.

And even worse: Perl regular expressions are quite different from e.g.
vi or emacs or .Net or pickYourFavourite regular expressions. Therefore
asking about Perl REs and hoping to be able to use the Perl solution in
a different system is quite naive.

jue
 
Jürgen Exner

Ben Morrow said:
[Followup-To ignored]

Quoth ccc31807:
If you
are ignorant of vi/vim, I can see how you might perceive my post as
abusive of clpm. I would be very surprised if, knowing vi/vim and
having struggled with vi regexes, you perceive my post as abusive.

How is vi special wrt. REs? REs are supported in any standard editor.
No, not in Notepad, but that doesn't qualify as an editor in the first
place.

jue
 
ccc31807

'In order to learn Perl one must eat; therefore cookery is on-topic in
clpmisc.'

Eating is common across every field of human endeavor. Mastery of a
programming environment seems common only in programming.

'Rudeness' is much a matter of convention. I certainly did not
perceive myself as rude in my original post. I'm sorry if others did,
but I don't control how others feel.

In any case, my comment about penumbras stands. There is no bright
line between things that are 'clearly' off topic and those that are
'clearly' on topic. Moreover, I strongly disagree that IN THE GENERAL
CASE questions relating to regular expressions are ALWAYS off topic. I
just searched clpm for regex questions and found thousands of them. If
I hadn't mentioned vi, probably no one would have had hurt feelings,
but I can't figure out why mentioning vi caused such an uproar.

CC.
 
Antony Scriven

I have a CSV file with rows that look like this:

[...]

What one regex can I use (in vi/vim) to transform EXAMPLE 1 to EXAMPLE
2?

1) Why?
2) Newlines are also valid within a CSV field. Use a proper parser.
--Antony
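A minimal sketch of the parser route, using Python's standard `csv` module instead of a regex: `csv.reader` splits only on commas outside double quotes, strips the quotes, and keeps a newline that falls inside a quoted field. `requote` is a hypothetical helper name, and doubling embedded apostrophes (SQL style) is an assumption about what the single-quoted output needs.

```python
import csv
import io


def requote(text: str) -> list[str]:
    """Parse CSV text and re-emit each record with single-quoted fields."""
    records = []
    # newline="" disables newline translation, so quoted line breaks survive.
    for fields in csv.reader(io.StringIO(text, newline="")):
        # Assumption: an apostrophe inside a field is escaped by doubling it.
        quoted = ("'" + f.replace("'", "''") + "'" for f in fields)
        records.append(",".join(quoted))
    return records


sample = 'fieldA,"1600 Pennsylvania\nAvenue, Washington, D.C. 55554",202-555-1212\n'
for record in requote(sample):
    print(record)
```

Because the quoted field spans two physical lines, a line-by-line regex would see an unbalanced quote on each line, while the reader returns it as one field.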
 
Antony Scriven

I have a CSV file with rows that look like this:

[...]

What one regex can I use (in vi/vim) to transform EXAMPLE
1 to EXAMPLE 2?

And why does it have to be *one* regexp? --Antony
 
Rui Maciel

Antony said:
1) Why?
2) Newlines are also valid within a CSV field. Use a proper parser.


The CSV format isn't formally defined and generally consists of a series of
lines, each one comprising a set of fields separated by commas and
terminated by an end-of-line symbol. This means that in some
implementations newlines are not valid within a CSV field. From the
example presented by CC, it doesn't appear that his CSV documents
include any end-of-line symbol in any field.

Regarding your suggestion to use a "proper parser", I believe we can agree
that it would be a bit excessive for this application. After all, the
purpose of this thread is to help someone "massage" a text file in order to
tweak the file format, which would be a once-in-a-lifetime thing.


Rui Maciel
 
Charlton Wilbur

JE> Oh, come on! For the first time in his life he discovered a
JE> feature that distinguishes a real editor from a toy and you
JE> chastise him? Cut the guy some slack, at some point we were
JE> young and inexperienced, too.

Most people grow out of it. Some people manage to remain inexperienced,
if not young, for decades. The OP has a long posting history here.

Charlton
 
ccc31807

And why does it have to be *one* regexp? --Antony

Because I can already do it in three steps, I just wondered whether it
could be done in one.

I found this question interesting: how to replace a character except
when it appears between two delimiting characters (here, a pair of
double quotes). I feel reasonably sure that it can be done in one
statement, but I don't know what that statement is.

CC.
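One common single-regex approach, sketched here in Python, uses a lookahead: a comma is a separator only when an even number of double quotes follows it on the same line. `SEPARATOR` and `one_step` are illustrative names; the sketch assumes balanced, unescaped quotes and no quoted field spanning lines.

```python
import re

# A comma separates fields only when it is followed by an even number of
# double quotes before the end of the line, i.e. it is not inside "...".
SEPARATOR = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def one_step(line: str) -> str:
    """Turn every separator comma into ',' in a single substitution."""
    return SEPARATOR.sub("','", line)


print(one_step('fieldA,"1600 Pennsylvania Avenue, Washington, D.C.",fieldZ'))
```

Each match rescans the rest of the line, so the cost is quadratic per line, which is fine for short rows. The same idea should carry over to vim's own syntax, roughly `:%s/,\ze\%([^"]*"[^"]*"\)*[^"]*$/','/g`, though that spelling is untested here. Like the three-step version, it neither strips the double quotes nor adds the outer apostrophes.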
 
