Need to parse SQL statements...use regular expression?

Discussion in 'Perl Misc' started by Justin F, Mar 5, 2004.

  1. Justin F

    Justin F Guest

    I'm developing a tool that will be used to write queries against a database.
    The word 'GO' is used as a batch terminator...for example, two queries
    separated by 'GO' would need to be sent to the database separately. I can
    can accomplish this if 'GO' appears on it's own line, but that won't always
    be the case...it may be in a comment or contained in a string, in which case
    I don't want to match.

    Here are some examples of when I DON'T want to match 'GO':
    On a commented line ('--' marks a commented line):
    select * from users
    -- don't match this go
    select * from users2

    Within a comment block ('/*' opens the block, and '*/' closes it):
    select * from users
    /*
    don't match this go
    */
    select * from users2

    Within a string (single qoutes delimit a string):
    select 'don't match this go'

    DO match these two:
    select * from users go select * from users2

    select * from users
    go
    select * from users2


    I really don't know much about regular expressions...can this be
    accomplished with one? If so, what would the expression look like?

    -Justin
     
    Justin F, Mar 5, 2004
    #1
    1. Advertising

  2. Justin F

    gnari Guest

    "Justin F" <> wrote in message news:jTS1c.10175$m4.5448@okepread03...
    > I'm developing a tool that will be used to write queries against a

    database.
    > The word 'GO' is used as a batch terminator...for example, two queries
    > separated by 'GO' would need to be sent to the database separately. I can
    > can accomplish this if 'GO' appears on it's own line, but that won't

    always
    > be the case...it may be in a comment or contained in a string, in which

    case
    > I don't want to match.


    [snip examples]

    > I really don't know much about regular expressions...can this be
    > accomplished with one? If so, what would the expression look like?


    first: is there any reason for the 'go' as SQL terminator ?
    it is quite usual to use ';' for this. one problem is that 'go' is
    a valid SQL object name (table/column/...)

    I think that it is not possible to do this with a regex, unless you add
    a few restrictions to the allowed use of the comments, because you
    need to deal with so many cases:
    terminator in single quotes
    terminator in double quotes
    terminator in comments
    nested comments of all combinations
    comments inside quoted strings
    quotes in comments
    quotes in quoted strings

    you probably are better off doing some basic parsing instead of
    using a regex. (or many)
    who knows, maybe one of the many SQL modules on CPAN
    can help you.

    gnari
     
    gnari, Mar 5, 2004
    #2
    1. Advertising

  3. Justin F wrote:

    > I'm developing a tool that will be used to write queries against a
    > database. The word 'GO' is used as a batch terminator...


    PL/I allowed reserved words to be identifiers in the 1960s but it never
    caught on. I wonder why ...

    A blank line or semicolon, is the normal terminator in most command line sql
    interface.
    What are you going to do about about DDL statements - create table, drop
    index, etc. Do you wan to support various RDBMS vendors & implementations?

    To do what you want you may need a full parser (maybe LALR(1)) with
    disambiguation rules. That's a lot more to learn than regular expressions.

    gtoomey
     
    Gregory Toomey, Mar 5, 2004
    #3
  4. Justin F

    Justin F Guest

    "gnari" <> wrote in message
    news:c29fdn$j8e$...
    > "Justin F" <> wrote in message

    news:jTS1c.10175$m4.5448@okepread03...
    > > I'm developing a tool that will be used to write queries against a

    > database.
    > > The word 'GO' is used as a batch terminator...for example, two queries
    > > separated by 'GO' would need to be sent to the database separately. I

    can
    > > can accomplish this if 'GO' appears on it's own line, but that won't

    > always
    > > be the case...it may be in a comment or contained in a string, in which

    > case
    > > I don't want to match.

    >
    > [snip examples]
    >
    > > I really don't know much about regular expressions...can this be
    > > accomplished with one? If so, what would the expression look like?

    >
    > first: is there any reason for the 'go' as SQL terminator ?
    > it is quite usual to use ';' for this. one problem is that 'go' is
    > a valid SQL object name (table/column/...)
    >
    > I think that it is not possible to do this with a regex, unless you add
    > a few restrictions to the allowed use of the comments, because you
    > need to deal with so many cases:
    > terminator in single quotes
    > terminator in double quotes
    > terminator in comments
    > nested comments of all combinations
    > comments inside quoted strings
    > quotes in comments
    > quotes in quoted strings
    >
    > you probably are better off doing some basic parsing instead of
    > using a regex. (or many)
    > who knows, maybe one of the many SQL modules on CPAN
    > can help you.
    >
    > gnari


    The reason 'GO' is the terminator is because it's a Microsoft SQL server
    I'll be going against. From the replies I've gotten it looks like I'll have
    to pursue a different avenue for parsing. Thanks for the replies.

    BTW, what is CPAN?
     
    Justin F, Mar 5, 2004
    #4
  5. On Fri, 05 Mar 2004 06:33:04 -0600, Justin F wrote:

    > The reason 'GO' is the terminator is because it's a Microsoft SQL server
    > I'll be going against. From the replies I've gotten it looks like I'll have
    > to pursue a different avenue for parsing. Thanks for the replies.


    I didn't get that impression. I got the impression that your data that
    you're using needs to be re-thought :) It would be a simple task if the
    'GO' were on a single line.

    But ... since you don't have control over your data coming in, you need to
    fashion a (series of) rather complex regular expression(s).

    Or ... another option you *may* be able to examine is substituting each
    instance of 'GO' into a newline character. Then, you *might* be able to
    use a single regular expression to extract the SQL statements. A module
    from ye olde CPAN might be able to help.

    This, of course, is just off the top of my head and untested :)

    > BTW, what is CPAN?


    Comprehensive Perl Archive Network - it is the central repository for Perl
    modules. `perldoc CPAN` for more information.

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    Documentation is like sex: when it is good, it is very, very
    good; and when it is bad, it is better than nothing. -- Dick
    Brandon
     
    James Willmore, Mar 5, 2004
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Neil Zanella
    Replies:
    8
    Views:
    1,196
    mfmehdi
    Oct 20, 2006
  2. VSK
    Replies:
    2
    Views:
    2,307
  3. Harry George
    Replies:
    6
    Views:
    385
    Bart Nessux
    Feb 23, 2004
  4. Vince
    Replies:
    12
    Views:
    751
    Martin Gregorie
    Jan 21, 2008
  5. Neil
    Replies:
    32
    Views:
    1,277
    Tom Anderson
    Aug 13, 2009
Loading...

Share This Page