vi regex to preserve interior commas in CSV string

Discussion in 'Perl Misc' started by ccc31807, Dec 13, 2011.

  1. ccc31807

    ccc31807 Guest

    I have a CSV file with rows that look like this:

    ROW EXAMPLE 1
    fieldA,fieldB,fieldC,George,Washington,President,"1600 Pennsylvania
    Avenue, Washington, D.C. 55554",202-555-1212,fieldX,fieldY,fieldZ<EOL>

    I want to preserve the interior commas from the double quoted portion
    of the string, perhaps turning it into something like this:

    ROW EXAMPLE 2
    'fieldA','fieldB','fieldC','George','Washington,President','1600
    Pennsylvania Avenue, Washington, D.C.
    55554','202-555-1212,','fieldX',;fieldY',;fieldZ'<EOL>

    where each value is single quoted and the values are separated by
    commas.

    I wanted (past tense) to do this with one vi (vim) regex and spent
    about an hour trying different ones before giving up in failure. I used
    three different regexes to do what I wanted, so I don't have the
    problem anymore, but I still have the question.

    What one regex can I use (in vi/vim) to transform EXAMPLE 1 to EXAMPLE
    2?

    Thanks, CC.
     
    ccc31807, Dec 13, 2011
    #1

  2. ccc31807

    ccc31807 Guest

    On Dec 13, 1:18 pm, Tad McClellan <> wrote:
    >
    > How do you determine that the comma in "Washington,President"
    > is not a field separator?
    >
    > Are you missing some double quotes in your ROW EXAMPLE 1 ?
    >
    > Or are you missing some single quotes in your ROW EXAMPLE 2 ?


    The file is an Excel file saved as a DOS-CSV file. Excel qualifies
    text fields with double quotes if and only if the text field contains
    a comma.
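
    For comparison, Python's csv writer in its default minimal-quoting
    mode follows a similar rule: a field is wrapped in double quotes only
    when it needs to be (it contains the delimiter, a double quote, or a
    line break). A small sketch, where the helper name write_row is made
    up for illustration:

```python
import csv
import io


def write_row(fields):
    """Serialize one row with minimal quoting and return it without the line ending."""
    buf = io.StringIO(newline="")
    # QUOTE_MINIMAL is the default; it is spelled out here for clarity.
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(fields)
    return buf.getvalue().rstrip("\r\n")


if __name__ == "__main__":
    print(write_row(["George", "1600 Pennsylvania Avenue, Washington, D.C. 55554"]))
```

    Only the address field comes out quoted; the plain field is left bare.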

    Obviously, you can write a Perl script to munge the data file, either
    by using one of the CSV modules or by hand rolling your own. However,
    this was such a simple task that I didn't want to go to the trouble of
    writing such a script.

    My solution was to (1) replace every comma between double quotes with
    an asterisk, (2) replace every remaining comma with apostrophe, comma,
    apostrophe, and (3) change every asterisk back to a comma.
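
    The same three steps can be sketched outside the editor. This is a
    Python illustration, not the vi commands actually used. It assumes
    the placeholder character never occurs in the data and that fields
    contain no escaped double quotes or line breaks. Dropping the double
    quotes and adding the apostrophes at both ends of the line are extra
    assumptions needed to reach the single-quoted layout. The function
    name three_step is hypothetical.

```python
import re


def three_step(line, placeholder="*"):
    """Single-quote every field of one CSV line using a placeholder round trip."""
    # Step 1: inside each double-quoted run, hide the commas behind the
    # placeholder (and drop the double quotes, an added assumption).
    protected = re.sub(
        r'"([^"]*)"',
        lambda m: m.group(1).replace(",", placeholder),
        line,
    )
    # Step 2: every remaining comma is a separator; turn it into ','
    # and close off both ends of the line (also an added assumption).
    quoted = "'" + protected.replace(",", "','") + "'"
    # Step 3: restore the hidden commas.
    return quoted.replace(placeholder, ",")


if __name__ == "__main__":
    print(three_step('fieldA,"1600 Pennsylvania Avenue, Washington, D.C. 55554",fieldZ'))
```

    If the data could contain an asterisk, a different placeholder would
    have to be passed in.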

    vi/vim should be able to do this with one regex rather than three.

    CC.
     
    ccc31807, Dec 13, 2011
    #2

  3. ccc31807

    ccc31807 Guest

    On Dec 13, 5:32 pm, Tad McClellan <> wrote:
    > So then, you are missing some single quotes in your ROW EXAMPLE 2.


    I made a mistake when typing.

    I have discovered that vi will do a lot of what I write little Perl
    scripts to do, such as convert the output of Excel to a format that I
    can use. I'm just not that good (yet) with vi regexes.

    Thanks, CC.
     
    ccc31807, Dec 14, 2011
    #3
  4. ccc31807

    ccc31807 Guest

    On Dec 14, 5:16 pm, Ben Morrow <> wrote:
    > What was your Perl question?


    Didn't have a Perl question. I posted to comp.editors and cross posted
    to clpm because of my high regard for the knowledge, intelligence, and
    experience of the denizens of clpm, you included, Ben.

    I have also noted from time to time that people sometimes post on
    comp.editors and clpm, and that some of them use vi/vim and have a deep
    knowledge of it. I had hoped to catch the eye of some of these people
    and maximize the chance of getting an answer.

    Sorry if this offends you, but it's not a flame or a troll, and the
    subject of the post clearly discloses that it's a vi question.

    Thanks, CC.
     
    ccc31807, Dec 15, 2011
    #4
  5. ccc31807

    ccc31807 Guest

    On Dec 15, 10:17 am, Tad McClellan <> wrote:
    > All abuses of Usenet are offensive.


    I did not abuse Usenet.

    > Your abuse here displays a rather profound selfishness on your part.


    Actually, your abuse displays a rather profound selfishness on your
    part.

    As an aside, permit me to say that the regex engines in Perl and vi/
    vim have a number of substantial differences, and those of us who
    write Perl using vi/vim struggle daily with the differences. In my own
    case, I sometimes spend much more time trying to construct a regular
    expression compatible with vi than I would have spent either doing the
    edits by hand and then finding and correcting all the inadvertent
    errors, or by writing a Perl script that would do the same thing. To
    give a real life example, I frequently find myself doing something
    like this:

    STATEMENT 1 is an SQL query
    select $id, $one, $two, $three, $four ... from TABLE with $twenty eq
    'twenty';

    STATEMENT 2 is a hash assignment
    $hash{$id} = { one => $one, two => $two, ... };

    STATEMENT 3 is an output statement
    $row = qq("$id","$hash{$id}{one}","$hash{$id}{two}"...\n);

    Using vi, I only have to type STATEMENT 1, and can transform that
    statement into STATEMENT 2 and STATEMENT 3, and similar statements,
    with just one command, and do so instantly and without any errors. It
    saves a great deal of time and effort. Proficiency in a programming
    language includes mastery of a programming environment, and many
    Perlistas use vi/vim as their primary programming environment. If you
    are ignorant of vi/vim, I can see how you might perceive my post as
    abusive of clpm. I would be very surprised if, knowing vi/vim and
    having struggled with vi regexes, you perceive my post as abusive.

    You don't need to be so quick to judge.

    > Stating that a post is off-topic does not give license to post it
    > to non-related newsgroups.


    No, but it at least indicates to people that they should not read the
    post if off topic posts offend them. In this case, knowing that the
    post concerned a vi regex to preserve interior commas, and that you
    would be offended by reading a non-Perl question on clpm, you read it
    and took offense anyway. This says a lot more about you than it does
    about me.

    And yes, I'm cross posting this reply to comp.editors, because members
    frequently mention Perl there, to give those who don't regularly read
    clpm the opportunity to look at this exchange.

    CC.
     
    ccc31807, Dec 15, 2011
    #5
  6. ccc31807

    ccc31807 Guest

    On Dec 15, 12:59 pm, Tad McClellan <> wrote:
    > >> All abuses of Usenet are offensive.

    >
    > > I did not abuse Usenet.

    >
    > Making an off-topic post is clearly an abuse, so yes you did.


    As I said, proficiency in a programming language requires mastery of
    some programming environment. The vi editor has been used over the
    years as the programming environment of choice for a number of
    different purposes. Perl programming is one; I personally also use
    it for programs that I write in Common Lisp, HTML, LaTeX, SQL, and
    Java, and I use it a lot for plain data files.

    There is a penumbra (if you will) of topics that fit more or less well
    into the interests of those who use Perl. These include both editors
    and regular expressions. I asked about both of these. Clearly, if I
    had made a post concerning religion, politics, sex, or a multitude of
    other things I would have committed an abuse.

    I will bet you money that if you asked 1,000 people who use, have
    used, or would like to use, Perl, if my topic was legitimate for Perl
    users to discuss, anywhere from ten percent to ninety percent would
    find the topic agreeable.

    And maybe the real test is this: if I had asked the question about a
    PERL re rather than a VI re, which I very well could have done, not
    even you would have found it abusive in any way. It wasn't the subject
    matter of the post that apparently has your panties in a wad, but the
    manner of expression.

    My question concerned the construction of a regular expression. It's
    immaterial whether I wanted to use it in a Perl script, or as an ex
    command, or for any other purpose. I don't think regular expressions
    are off topic in clpm.

    CC.
     
    ccc31807, Dec 15, 2011
    #6
  7. Keith Keller

    Keith Keller Guest

    On 2011-12-15, Ben Morrow <> wrote:
    > [Followup-To ignored]
    >
    > Quoth ccc31807 <>:
    >> On Dec 15, 10:17 am, Tad McClellan <> wrote:
    >> > All abuses of Usenet are offensive.

    >>
    >> I did not abuse Usenet.

    >
    > I, perhaps, wouldn't go so far as 'abuse'; but your post was definitely
    > off-topic, which is definitely rude.


    This OP has had the same bad posting habits for many years; I am
    surprised that people still respond to anything he posts beyond
    corrections of blatant untruths.

    --keith



    --
    -francisco.ca.us
    (try just my userid to email me)
    AOLSFAQ=http://www.therockgarden.ca/aolsfaq.txt
    see X- headers for PGP signature information
     
    Keith Keller, Dec 15, 2011
    #7
  8. Ben Morrow <> wrote:
    >
    >Quoth ccc31807 <>:
    >>
    >> I have discovered that vi will do a lot of what I write little Perl
    >> scripts to do, such as convert the output of Excel to a format that I
    >> can use. I'm just not that good (yet) with vi regexes.

    >
    >How entirely fascinating.
    >
    >What was your Perl question?


    Oh, come on! For the first time in his life he discovered a feature that
    distinguishes a real editor from a toy and you chastise him?
    Cut the guy some slack, at some point we were young and inexperienced,
    too.

    Of course, the Perl script will do the same thing again and again and
    again automatically while for vi or any other editor you either have to
    retype the maybe very complex substitute command or save it as a macro
    if you ever want to use it again.

    jue
     
    Jürgen Exner, Dec 16, 2011
    #8
  9. Ben Morrow <> wrote:
    >People asking general regular expression questions here 'because Perl
    >uses them and there isn't anywhere else to ask' is one of the things
    >the regulars tend to react particularly badly to, since it happens a lot
    >more often than it should.


    And even worse: Perl regular expressions are quite different from e.g.
    vi or emacs or .Net or pickYourFavourite regular expressions. Therefore
    asking about Perl REs and hoping to be able to use the Perl solution in
    a different system is quite naive.

    jue
     
    Jürgen Exner, Dec 16, 2011
    #9
  10. Ben Morrow <> wrote:
    >[Followup-To ignored]
    >
    >Quoth ccc31807 <>:
    >> If you
    >> are ignorant of vi/vim, I can see how you might perceive my post as
    >> abusive of clpm. I would be very surprised if, knowing vi/vim and
    >> having struggled with vi regexes, you perceive my post as abusive.


    How is vi special wrt. REs? REs are supported in any standard editor.
    No, not in Notepad, but that doesn't qualify as an editor in the first
    place.

    jue
     
    Jürgen Exner, Dec 16, 2011
    #10
  11. ccc31807

    ccc31807 Guest

    On Dec 15, 6:22 pm, Ben Morrow <> wrote:
    > 'In order to learn Perl one must eat; therefore cookery is on-topic in
    > clpmisc.'


    Eating is common across every field of human endeavor. Mastery of a
    programming environment seems common only in programming.

    'Rudeness' is much a matter of convention. I certainly did not
    perceive myself as rude in my original post. I'm sorry if others did,
    but I don't control how others feel.

    In any case, my comment about penumbras stands. There is no bright
    line between things that are 'clearly' off topic and those that are
    'clearly' on topic. In any case, I strongly disagree that IN THE
    GENERAL CASE questions relating to regular expressions are ALWAYS off
    topic. I just did a search for regex questions in clpm, and found
    thousands of them. If I hadn't mentioned vi probably no one would have
    had hurt feelings, but I can't figure out why mentioning vi caused
    such an uproar.

    CC.
     
    ccc31807, Dec 16, 2011
    #11
  12. On Dec 13, 3:59 pm, ccc31807 wrote:

    > I have a CSV file with rows that look like this:
    >
    > ROW EXAMPLE 1
    > fieldA,fieldB,fieldC,George,Washington,President,"1600 Pennsylvania
    > Avenue, Washington, D.C. 55554",202-555-1212,fieldX,fieldY,fieldZ<EOL>
    >
    > I want to preserve the interior commas from the double quoted portion
    > of the string, perhaps turning it into something like this:
    >
    > ROW EXAMPLE 2
    > 'fieldA','fieldB','fieldC','George','Washington,President','1600
    > Pennsylvania Avenue, Washington, D.C.
    > 55554','202-555-1212,','fieldX',;fieldY',;fieldZ'<EOL>
    >
    > where each value is single quoted and the values are separated by
    > commas.
    >
    > I wanted (past tense) to do this with one vi (vim) regex and spent
    > about an hour trying different ones before giving up in failure. I used
    > three different regexes to do what I wanted, so I don't have the
    > problem anymore, but I still have the question.
    >
    > What one regex can I use (in vi/vim) to transform EXAMPLE 1 to EXAMPLE
    > 2?
    >
    > Thanks, CC.


    1) Why?
    2) Newlines are also valid within a CSV field. Use a proper parser.
    --Antony
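
    To illustrate point 2, here is a minimal sketch using Python's
    standard csv module as the parser. It is an illustration only, not
    something posted in this thread. The function name requote is made
    up, and apostrophes inside a field are not escaped in this sketch.

```python
import csv
import io


def requote(text):
    """Parse CSV text and return each record with every field single-quoted."""
    # newline="" lets the csv module see line breaks unchanged, so a
    # line break inside a double-quoted field stays part of that field.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [",".join("'" + field + "'" for field in row) for row in reader]


if __name__ == "__main__":
    for record in requote('a,"b, c",d\nx,"two\nlines",z\n'):
        print(record)
```

    The second record in the usage example spans two physical lines,
    which a line-at-a-time regex would split in the wrong place.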
     
    Antony Scriven, Dec 18, 2011
    #12
  13. On Dec 13, 3:59 pm, ccc31807 wrote:

    > I have CSV file with rows that looks like this:
    >
    > [...]
    >
    > What one regex can I use (in vi/vim) to transform EXAMPLE
    > 1 to EXAMPLE 2?


    And why does it have to be *one* regexp? --Antony
     
    Antony Scriven, Dec 18, 2011
    #13
  14. Rui Maciel

    Rui Maciel Guest

    Antony Scriven wrote:

    > 1) Why?
    > 2) Newlines are also valid within a CSV field. Use a proper parser.



    The CSV format isn't formally defined and generally consists of a series of
    lines, each comprising a set of fields separated by commas and
    terminated by an end-of-line symbol. This means that newlines are not
    valid within a CSV field in some implementations. From the example
    presented by CC, it doesn't appear that his CSV document includes an
    end-of-line symbol in any field.

    Regarding your suggestion to use a "proper parser", I believe we can agree
    that it would be a bit excessive for this application. After all, the
    purpose of this thread is to help someone "massage" a text file in order to
    tweak the file format, which would be a "once in a lifetime" thing.


    Rui Maciel
     
    Rui Maciel, Dec 19, 2011
    #14
  15. >>>>> "JE" == Jürgen Exner <> writes:

    JE> Oh, come on! For the first time in his life he discovered a
    JE> feature that distinguishes a real editor from a toy and you
    JE> chastise him? Cut the guy some slack, at some point we were
    JE> young and inexperienced, too.

    Most people grow out of it. Some people manage to remain inexperienced,
    if not young, for decades. The OP has a long posting history here.

    Charlton

    --
    Charlton Wilbur
     
    Charlton Wilbur, Dec 20, 2011
    #15
  16. ccc31807

    ccc31807 Guest

    On Dec 18, 3:26 pm, Antony Scriven <> wrote:
    > And why does it have to be *one* regexp? --Antony


    Because I can do it in three steps. I just wondered if it could be
    done in one step.

    I found this question interesting: how to replace a character except
    when it appears between two characters. I feel reasonably sure that it
    can be done in one statement, but I don't know what that statement is.
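
    One common way to express "replace a character except between two
    delimiters" in a single substitution is to match the delimited runs
    first and hand them back unchanged, so that only the leftover
    occurrences are replaced. The sketch below shows the idea in Python
    rather than in vi, whose regex syntax differs. It assumes
    double-quoted runs never contain an escaped double quote, and the
    name replace_outside_quotes is hypothetical.

```python
import re

# Alternation order matters: a whole quoted run is tried before a bare comma,
# so commas inside quotes are consumed as part of the run.
_TOKEN = re.compile(r'"[^"]*"|,')


def replace_outside_quotes(line, replacement):
    """Replace commas that are not inside a double-quoted run, in one pass."""
    def swap(match):
        text = match.group(0)
        # A bare comma is a separator; anything else is a quoted run to keep.
        return replacement if text == "," else text

    return _TOKEN.sub(swap, line)


if __name__ == "__main__":
    print(replace_outside_quotes('a,"b, c",d', "','"))
```

    This handles only the separators; the quotes at the start and end of
    the line would still need to be added separately.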

    CC.
     
    ccc31807, Dec 20, 2011
    #16