Regexp Guru Needed

Discussion in 'Ruby' started by James Edward Gray II, Oct 30, 2005.

  1. We're having a discussion on Ruby Core about how to speed up CSV.
    I'm trying to tune a Regexp that matches CSV fields. However, I'm
    seeing something I don't expect. Can someone explain this to me,
    please?

    >> ",".scan(/(?:^|,)(?:"()"|([^",]*))/)

    => [[nil, ""]]

    That's a simplified version of what I'm messing with. My question
    is, why does it only match once, when I expect two matches?

    The first match should be right at the beginning, and is basically
    (?:^ ... )(?: ... ([^",]*)). The second match should begin at the
    comma, being (?: ... ,)(?: ... ([^",]*)). What am I missing?
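A small Ruby sketch of the behaviour described above, using the simplified pattern from the question. The sample lines `"a,b"` and `",a"` are arbitrary additions for comparison, and the outputs in the comments assume Ruby's standard `String#scan`:

```ruby
# The simplified field pattern from the question: a field starts either at the
# beginning of the line (^) or after a comma, and is either "" or unquoted text.
FIELD = /(?:^|,)(?:"()"|([^",]*))/

p ",".scan(FIELD)    # => [[nil, ""]]             one match, though there are two empty fields
p "a,b".scan(FIELD)  # => [[nil, "a"], [nil, "b"]] two matches when the first field is non-empty
p ",a".scan(FIELD)   # => [[nil, ""]]             the "a" field is lost too
```

The failure only shows up when the first field is empty, which is the case the rest of the thread explains.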

    James Edward Gray II
     
    James Edward Gray II, Oct 30, 2005
    #1

  2. On 10/30/05, James Edward Gray II <> wrote:
    >
    > We're having a discussion on Ruby Core about how to speed up CSV.
    > I'm trying to tune a Regexp that matches CSV fields. However, I'm
    > seeing something I don't expect. Can someone explain this to me,
    > please?
    >
    > >> ",".scan(/(?:^|,)(?:"()"|([^",]*))/)
    >
    > => [[nil, ""]]
    >
    > That's a simplified version of what I'm messing with. My question
    > is, why does it only match once, when I expect two matches?
    >
    > The first match should be right at the beginning, and is basically
    > (?:^ ... )(?: ... ([^",]*)). The second match should begin at the
    > comma, being (?: ... ,)(?: ... ([^",]*)). What am I missing?
    >


    I'm not pretending to be a regexp guru, but nonetheless:

    When the portion of the string that scan matched has length 0, scan still
    moves forward one character before trying again. This is to prevent it from
    going into an infinite loop. Consider your example: the regexp matches at the
    start of the string, and matches 0 characters. If Ruby did not move forward
    one character before the next match, the regexp would match at the start of
    the string again in exactly the same way, and still not have consumed any of
    the string.
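To make the bump-along concrete, here is a minimal sketch of a hypothetical `scan_like` helper that re-implements the advancing rule with `Regexp#match(str, pos)`. It is an approximation of `String#scan` for patterns with capture groups, not its actual implementation:

```ruby
# Hypothetical helper: mimics how String#scan advances through a string.
# After a zero-length match it moves one character forward ("bump-along");
# otherwise it resumes exactly where the previous match ended.
def scan_like(re, str)
  results = []
  pos = 0
  while pos <= str.length && (m = re.match(str, pos))
    results << m.captures
    # An empty match would repeat forever at the same spot, so skip one char.
    pos = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  results
end

re = /(?:^|,)(?:"()"|([^",]*))/
p scan_like(re, ",")  # => [[nil, ""]]
```

On `","` it takes one zero-length match at index 0, advances to index 1 (the end of the string), and finds nothing more, which is the `[[nil, ""]]` from the question.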

    My suggestion would be to have two regexps, one to strip off the beginning
    of the CSV line, and one to split the remainder into parts.
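One possible reading of the two-regexp suggestion, as a sketch: `FIRST_FIELD`, `NEXT_FIELD` and `split_fields` are hypothetical names, and quoted fields are simplified so they cannot contain escaped `""` quotes:

```ruby
# One pattern for the first field (anchored at the very start of the line) and
# one for every following field (each introduced by a comma). Quoted fields are
# simplified: embedded "" escapes are not handled, and nothing is validated.
FIRST_FIELD = /\A(?:"([^"]*)"|([^",]*))/
NEXT_FIELD  = /,(?:"([^"]*)"|([^",]*))/

# Hypothetical helper: split one CSV line into its field values.
def split_fields(line)
  first = FIRST_FIELD.match(line)  # always matches, possibly with an empty field
  quoted, plain = first.captures
  rest = line[first.end(0)..-1]    # whatever follows the first field
  [quoted || plain] + rest.scan(NEXT_FIELD).map { |q, u| q || u }
end

p split_fields(",")         # => ["", ""]
p split_fields(',a,"b,c"')  # => ["", "a", "b,c"]
```

Because the first pattern is anchored with `\A` and every later field must begin with a comma, the leading empty field no longer competes with the comma for the same position.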

    Peter

     
    Peter Vanbroekhoven, Oct 30, 2005
    #2

  3. On Oct 29, 2005, at 8:04 PM, Peter Vanbroekhoven wrote:

    > I'm not pretending to be a regexp guru, but nonetheless:
    >
    > scan moves forward one character even if the portion of the string
    > that it matched has length 0.


    I am aware of the infamous "bump-along", but doesn't 0 + 1 == 1? I
    expected that to put it on the comma, which would work just fine.

    James Edward Gray II
     
    James Edward Gray II, Oct 30, 2005
    #3
  4. On Oct 29, 2005, at 10:18 PM, James Edward Gray II wrote:

    > I am aware of the infamous "bump-along", but doesn't 0 + 1 == 1? I
    > expected that to put it on the comma, which would work just fine.


    Never mind. I get how dumb I'm being now. There's only one
    character, at index 0. Duh. Thanks for the lesson.
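The arithmetic can be checked directly with `Regexp#match` and a start position (a sketch; `re` is the simplified pattern from the question). The zero-length match at index 0 leaves the comma unconsumed, and the bump to index 1 lands past it:

```ruby
# The simplified pattern from the question.
re = /(?:^|,)(?:"()"|([^",]*))/
line = ","

p line.length             # => 1: the comma is the only character, at index 0
m = re.match(line, 0)     # ^ is tried first and succeeds, so the comma branch never runs
p [m.begin(0), m.end(0)]  # => [0, 0]: zero-length match, the comma is not consumed
p re.match(line, 1)       # => nil: 0 + 1 == 1 is already the end of the string
```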

    James Edward Gray II
     
    James Edward Gray II, Oct 30, 2005
    #4
  5. If you google for "CSV regexp" you get a lot of hits. This one looks promising:

    http://www.codeguru.com/columns/DotNetTips/article.php/c8153/

    Warren Seltzer


    -----Original Message-----
    From: James Edward Gray II [mailto:]
    Sent: Sunday, October 30, 2005 2:27 AM
    To: ruby-talk ML
    Subject: Regexp Guru Needed
    ...
     
    Warren Seltzer, Oct 30, 2005
    #5
  6. On Oct 30, 2005, at 3:43 AM, Warren Seltzer wrote:

    > If you google for "CSV regexp" you get a lot of hits. This one
    > looks promising:
    >
    > http://www.codeguru.com/columns/DotNetTips/article.php/c8153/


    Thanks.

    Just FYI, the main expression we are working with is:

    /\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/

    From Mastering Regular Expressions (2nd Edition).
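A sketch of the expression in use with `String#scan`. The sample line and the `gsub` step that turns `""` back into `"` are assumptions for illustration, not part of the thread:

```ruby
# The expression quoted above. \G pins each match to where the previous scan
# step left off; group 1 captures a quoted field (with "" as an escaped quote),
# group 2 an unquoted one. The atomic groups (?>...) prevent backtracking.
MRE_FIELD = /\G(?:^|,)(?:"((?>[^"]*)(?>""[^"]*)*)"|([^",]*))/

line = 'a,"b ""q"" c",,d'
fields = line.scan(MRE_FIELD).map do |quoted, plain|
  # Assumed post-processing: collapse the "" escape back to a single quote.
  quoted ? quoted.gsub('""', '"') : plain
end
p fields  # => ["a", "b \"q\" c", "", "d"]
```

In Ruby, a line whose first field is empty, such as `",x"`, still gives only `[[nil, ""]]` with this expression, for the zero-length bump-along reason explained earlier in the thread.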

    James Edward Gray II
     
    James Edward Gray II, Oct 30, 2005
    #6
  7. James Edward Gray II wrote:
    >
    > From Mastering Regular Expressions (2nd Edition).


    Check out RegexBuddy. Worth getting access to Win32 just for this if
    you're a Mac guy needing to debug some REs.

    --Steve
     
    Stephen Waits, Nov 2, 2005
    #7
  8. Stephen Waits <> wrote:
    > James Edward Gray II wrote:
    > >
    > > From Mastering Regular Expressions (2nd Edition).

    >
    > Check out RegexBuddy. Worth getting access to Win32 just for this if
    > you're a Mac guy needing to debug some REs.


    Or http://www.weitz.de/regex-coach/ - it's the best one I've seen, and
    has Linux and Windows ports (sadly no Mac version).

    martin
     
    Martin DeMello, Nov 2, 2005
    #8
  9. On Nov 2, 2005, at 4:42 AM, Martin DeMello wrote:

    > Stephen Waits <> wrote:
    >> Check out RegexBuddy. Worth getting access to Win32 just for this if
    >> you're a Mac guy needing to debug some REs.

    >
    > Or http://www.weitz.de/regex-coach/ - it's the best one I've seen,
    > and has Linux and Windows ports (sadly no Mac version).


    Thanks for the link, Martin. I hadn't found it before. I tried it
    out, and it's a nice "free" alternative to RegexBuddy; however, it
    pales in comparison to what RB can do. I do wish RB were a little
    cheaper - I've bought much richer software for less money.

    --Steve
     
    Stephen Waits, Nov 2, 2005
    #9
