Need to parse SQL statements...use regular expression?

J

Justin F

I'm developing a tool that will be used to write queries against a database.
The word 'GO' is used as a batch terminator...for example, two queries
separated by 'GO' would need to be sent to the database separately. I can
can accomplish this if 'GO' appears on it's own line, but that won't always
be the case...it may be in a comment or contained in a string, in which case
I don't want to match.

Here are some examples of when I DON'T want to match 'GO':
On a commented line ('--' marks a commented line):
select * from users
-- don't match this go
select * from users2

Within a comment block ('/*' opens the block, and '*/' closes it):
select * from users
/*
don't match this go
*/
select * from users2

Within a string (single qoutes delimit a string):
select 'don't match this go'

DO match these two:
select * from users go select * from users2

select * from users
go
select * from users2


I really don't know much about regular expressions...can this be
accomplished with one? If so, what would the expression look like?

-Justin
 
G

gnari

Justin F said:
I'm developing a tool that will be used to write queries against a database.
The word 'GO' is used as a batch terminator...for example, two queries
separated by 'GO' would need to be sent to the database separately. I can
can accomplish this if 'GO' appears on it's own line, but that won't always
be the case...it may be in a comment or contained in a string, in which case
I don't want to match.

[snip examples]
I really don't know much about regular expressions...can this be
accomplished with one? If so, what would the expression look like?

first: is there any reason for the 'go' as SQL terminator ?
it is quite usual to use ';' for this. one problem is that 'go' is
a valid SQL object name (table/column/...)

I think that it is not possible to do this with a regex, unless you add
a few restrictions to the allowed use of the comments, because you
need to deal with so many cases:
terminator in single quotes
terminator in double quotes
terminator in comments
nested comments of all combinations
comments inside quoted strings
quotes in comments
quotes in quoted strings

you probably are better off doing some basic parsing instead of
using a regex. (or many)
who knows, maybe one of the many SQL modules on CPAN
can help you.

gnari
 
G

Gregory Toomey

Justin said:
I'm developing a tool that will be used to write queries against a
database. The word 'GO' is used as a batch terminator...

PL/I allowed reserved words to be identifiers in the 1960s but it never
caught on. I wonder why ...

A blank line or semicolon, is the normal terminator in most command line sql
interface.
What are you going to do about about DDL statements - create table, drop
index, etc. Do you wan to support various RDBMS vendors & implementations?

To do what you want you may need a full parser (maybe LALR(1)) with
disambiguation rules. That's a lot more to learn than regular expressions.

gtoomey
 
J

Justin F

gnari said:
I'm developing a tool that will be used to write queries against a database.
The word 'GO' is used as a batch terminator...for example, two queries
separated by 'GO' would need to be sent to the database separately. I can
can accomplish this if 'GO' appears on it's own line, but that won't always
be the case...it may be in a comment or contained in a string, in which case
I don't want to match.

[snip examples]
I really don't know much about regular expressions...can this be
accomplished with one? If so, what would the expression look like?

first: is there any reason for the 'go' as SQL terminator ?
it is quite usual to use ';' for this. one problem is that 'go' is
a valid SQL object name (table/column/...)

I think that it is not possible to do this with a regex, unless you add
a few restrictions to the allowed use of the comments, because you
need to deal with so many cases:
terminator in single quotes
terminator in double quotes
terminator in comments
nested comments of all combinations
comments inside quoted strings
quotes in comments
quotes in quoted strings

you probably are better off doing some basic parsing instead of
using a regex. (or many)
who knows, maybe one of the many SQL modules on CPAN
can help you.

gnari

The reason 'GO' is the terminator is because it's a Microsoft SQL server
I'll be going against. From the replies I've gotten it looks like I'll have
to pursue a different avenue for parsing. Thanks for the replies.

BTW, what is CPAN?
 
J

James Willmore

The reason 'GO' is the terminator is because it's a Microsoft SQL server
I'll be going against. From the replies I've gotten it looks like I'll have
to pursue a different avenue for parsing. Thanks for the replies.

I didn't get that impression. I got the impression that your data that
you're using needs to be re-thought :) It would be a simple task if the
'GO' were on a single line.

But ... since you don't have control over your data coming in, you need to
fashion a (series of) rather complex regular expression(s).

Or ... another option you *may* be able to examine is substituting each
instance of 'GO' into a newline character. Then, you *might* be able to
use a single regular expression to extract the SQL statements. A module
from ye olde CPAN might be able to help.

This, of course, is just off the top of my head and untested :)
BTW, what is CPAN?

Comprehensive Perl Archive Network - it is the central repository for Perl
modules. `perldoc CPAN` for more information.

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
Documentation is like sex: when it is good, it is very, very
good; and when it is bad, it is better than nothing. -- Dick
Brandon
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,534
Members
45,007
Latest member
obedient dusk

Latest Threads

Top