Query 10 times slower (and results reordered!) from ASP.NET page than console app.

W

wizofaus

I have a chunk of code which is essentially

IDbCommand cmd = db.CreateCommand();
cmd.CommandText = "SELECT X, Y, Count(*) FROM Foo WHERE Z = 1 GROUP BY
X, Y";
using (IDataReader reader = cmd.ExecuteReader())
while (reader.Read())
{
// grab values from query
}

The database is SQL server (MSDE 2000), and "Foo" has well over a
million records. The query however only runs about 12 rows.

When I compile this code into a Windows console application, and run
it, it completes the query in under 2 seconds. But when I run exactly
the same code from an ASP.NET class, it takes over 20 seconds!
Furthermore, even though the values returned are exactly the same, the
ORDER of the rows is quite different: specifically, under ASP.NET the
rows are all ordered first by Y then by X, even though there's no
logical reason they should be. I actually tried adding an "ORDER BY Y,
X" to the end, which did cause the console app version to print out the
rows in the same order, but made no difference to the execution speed.

I've tried using an OdbcConnection, and OleDbConnection and an
SqlConnection - neither make any difference there either. I've made
sure both are compiled in release mode, and restarted IIS, and made
sure that my ASP.NET application is doing nothing else except this one
query.

Any suggestions most welcome!

Thanks

Dylan
 
C

Chris Lim

Use Profiler to compare execution plans. It sounds like different
indexes are being used, hence the different ordering as well. The
Console app's query may be using a better index than the ASP.NET one
(which may or may not be using an index at all).
 
J

Jeff Dillon

No need to cross-post so much!

Does it still take 20 seconds on the "second" invocation of the same ASP.NET
page?

Jeff
 
W

wizofaus

Chris said:
Use Profiler to compare execution plans. It sounds like different
indexes are being used, hence the different ordering as well. The
Console app's query may be using a better index than the ASP.NET one
(which may or may not be using an index at all).
That wouldn't make sense - it's exactly the same database.

The problem, believe it or not, was the case of the word "by" in the
GROUP BY clause.

If you use uppercase 'BY' the query takes 20 seconds, if you use "By",
it takes 2 seconds.

Pretty close to the damn weirdest bug I've come across any time
recently, especially because it happens under all three IDbConnection
providers (but not using the command-line osql tool).
 
W

wizofaus

That wouldn't make sense - it's exactly the same database.

The problem, believe it or not, was the case of the word "by" in the
GROUP BY clause.

If you use uppercase 'BY' the query takes 20 seconds, if you use "By",
it takes 2 seconds.

Pretty close to the damn weirdest bug I've come across any time
recently, especially because it happens under all three IDbConnection
providers (but not using the command-line osql tool).

Did some more poking around, the above isn't quite true. Firstly, it
only happens with OdbcDbConnection and OleDbConnection - not
SqlConnection. So I'm fairly sure it's not an SQL server problem per
se.

Furthermore, it specifically only occurs when I'm using Parameters that
need their types converting, and using "IN" for matching:

So the following code:

IDbCommand cmd = db.CreateCommand();
cmd.CommandText = "SELECT X, Y, Count(*) FROM Foo WHERE Z IN (?) GROUP
BY X, Y";
IDbDataParameter param = cmd.CreateParameter();
param.Value = "1";
cmd.Parameters.Add(param);
using (IDataReader reader = cmd.ExecuteReader())
while (reader.Read())
{
// grab values
}

Will reproduce the problem.

Change the query to be "WHERE Z = ?", or change param.Value to be just
1 (instead of the string literal "1"), or change "GROUP BY" to "GROUP
By" and the problem goes away.

Odd, huh.
 
A

Aaron Bertrand [SQL Server MVP]

Furthermore, even though the values returned are exactly the same, the
ORDER of the rows is quite different

Well, you don't have an ORDER BY clause, so SQL Server is free to return the
rows in any row it chooses. There is nothing wrong with even the same piece
of code running multiple times and returning a different order every time...
neglecting to have an ORDER BY clause is basically telling SQL Server that
you don't care about the order. So, long story short, if you want/expect a
certain order, SAY SO.
: specifically, under ASP.NET the
rows are all ordered first by Y then by X, even though there's no
logical reason they should be.

You could say the same for the ordering no matter what order they came out:
without an order by clause, there is no logical orderto expect. SQL Server
will simply return the rows in whatever order it deems most efficient.
sure both are compiled in release mode, and restarted IIS

And are you only testing the ASP.Net page once after restarting IIS? See,
the first time you load an ASP.Net page in IIS, it does a bunch of
background activity with it. Running Profiler might point out how much of
that time IIS is spent doing its work with the ASP.Net file, as opposed to
running the query.

Remember too that there is more overhead in the display time used by the
ASP.Net page to render results to HTML than there would be for a console app
returning plain text to the command line. And if you are running the
console app on the same machine as SQL Server, and the web site is on a
different machine, you need to factor network latency in there as well.

A
 
A

Aaron Bertrand [SQL Server MVP]

Use Profiler to compare execution plans. It sounds like different
That wouldn't make sense - it's exactly the same database.

The problem, believe it or not, was the case of the word "by" in the
GROUP BY clause.

If you use uppercase 'BY' the query takes 20 seconds, if you use "By",
it takes 2 seconds.

That's not exactly why. And I think Chris was more on track than you might
think "makes sense."

If you clear out the procedure cache and the buffers, you will find that
whether you use By or BY the first time makes no difference. The problem is
that SQL Server stores execution plans for ad hoc SQL statements, and treats
the plan with case sensitivity (and even for whitespace differences). So it
might be that when you ran one version you got a certain plan because of
data size or who knows what, and when you ran a version with a different
casing, a different plan got stored (again, who knows why, we don't know
enough about your system).

There's a guaranteed way around this: STOP USING AD HOC SQL. Is there any
reason you're not using stored procedures?

Another way around this is to use consistent casing and spacing/indenting
for T-SQL keywords. You'd be amazed how much more readable and maintainable
your code will become if you pick a convention and stick with it.

A
 
W

wizofaus

Aaron said:
That's not exactly why. And I think Chris was more on track than you might
think "makes sense."

If you clear out the procedure cache and the buffers, you will find that
whether you use By or BY the first time makes no difference. The problem is
that SQL Server stores execution plans for ad hoc SQL statements, and treats
the plan with case sensitivity (and even for whitespace differences). So it
might be that when you ran one version you got a certain plan because of
data size or who knows what, and when you ran a version with a different
casing, a different plan got stored (again, who knows why, we don't know
enough about your system).
Might be, but my additional investigation seemed to reveal that it was
a bug in the Odbc and OleDb providers, given that the case of "By"
wasn't the only thing causing the problem. I have to admit I didn't
try chaning the case of other parts of the query, but it was absolutely
consistent: I could change it to "BY", run it 3 times, and have it take
20 seconds each time, then change it to "By" and it would run at < 2
seconds.
There's a guaranteed way around this: STOP USING AD HOC SQL. Is there any
reason you're not using stored procedures?

Yup, we need to support Access, Oracle, MySql etc. etc.

I certainly want to look at using more parameterized views, which would
probably help in this case. But I actually build up the SQL statements
dynamically based on a number of user-configurable parameters -
including even which fields are selected/grouped on, so it's hard to
see how to avoid some amount of ad hoc SQL.
Another way around this is to use consistent casing and spacing/indenting
for T-SQL keywords. You'd be amazed how much more readable and maintainable
your code will become if you pick a convention and stick with it.

I always use all upper-case - I've actually no idea how I ended up with
the mixed case "By", but it was a good job I did, or I might never have
found the problem.
 
C

Chris Lim

Might be, but my additional investigation seemed to reveal that it was
a bug in the Odbc and OleDb providers, given that the case of "By"
wasn't the only thing causing the problem. I have to admit I didn't
try chaning the case of other parts of the query, but it was absolutely
consistent: I could change it to "BY", run it 3 times, and have it take
20 seconds each time, then change it to "By" and it would run at < 2
seconds.

Strange problem. Still, using Profiler you could at least see where the
delay is and it might give some more clues as to what's actually going
on.
 
A

Aaron Bertrand [SQL Server MVP]

wasn't the only thing causing the problem. I have to admit I didn't
try chaning the case of other parts of the query, but it was absolutely
consistent: I could change it to "BY", run it 3 times, and have it take
20 seconds each time, then change it to "By" and it would run at < 2
seconds.

You could run BY 8000 times and By 8000 times and it would still execute the
same in each case. Unless you follow my earlier advice to clear the
procedure cache and drop the buffers, which will get rid of both plans and
you may see that the behavior was due to a bad plan stored for one of the
cases, not due to a bug in ODBC or due to the case itself.

A
 
A

Aaron Bertrand [SQL Server MVP]

wasn't the only thing causing the problem. I have to admit I didn't
try chaning the case of other parts of the query, but it was absolutely
consistent: I could change it to "BY", run it 3 times, and have it take
20 seconds each time, then change it to "By" and it would run at < 2
seconds.

You could run BY 8000 times and By 8000 times and it would still execute the
same in each case. Unless you follow my earlier advice to clear the
procedure cache and drop the buffers, which will get rid of both plans and
you may see that the behavior was due to a bad plan stored for one of the
cases, not due to a bug in ODBC or due to the case itself.

A
 
W

wizofaus

Aaron said:
You could run BY 8000 times and By 8000 times and it would still execute the
same in each case. Unless you follow my earlier advice to clear the
procedure cache and drop the buffers, which will get rid of both plans and
you may see that the behavior was due to a bad plan stored for one of the
cases, not due to a bug in ODBC or due to the case itself.
Possible, but it wouldn't explain why a) it didn't happen when using
the .NET class "SqlConnection" and b) it didn't happen when using an
integer C# datatype for the parameter.

BTW, I have another question concerning efficiency - I ended up having
to delete all those records from the "Foo" table on my development
machine because it was causing even unrelated queries, i.e. queries
that were pulling records from that table, but that explicitly filtered
out almost of them in the WHERE clause via a key field, to run too
slow. Now that key field was marked with a foreign key constraint to
another table, but I hadn't explicitly created an index on it (mainly
because I thought the foreign key constraint would implicitly define
one, as it does under Access and even Oracle, I believe). Should I
reasonably expect a query like

SELECT Count(*) FROM Foo WHERE Z = 2

To run at much the same speed no matter how many records there are in
Foo that have Z <> 2, provided there is an index on Z?
 
A

Aaron Bertrand [SQL Server MVP]

Possible, but it wouldn't explain why a) it didn't happen when using
the .NET class "SqlConnection" and b) it didn't happen when using an
integer C# datatype for the parameter.

Changing the case from GROUP BY to GROUP By does not magically change the
way the query works, unless you had a bad plan cached for one case.

Maybe in your scenario the settings for "SqlConnection" or some other
environmental thing prevented the cached plan from being used.

Did you bother running a profiler trace to see what is going on? How about
examining the plan for both queries (copied verbatim) in Query Analyzer /
Management Studio?

I have no idea how to reproduce your scenario, so I don't know what else to
tell you, except that chaning BY to By alone is not the primary / sole
culprit for the difference you're seeing.

A
 
W

wizofaus

Aaron said:
Changing the case from GROUP BY to GROUP By does not magically change the
way the query works, unless you had a bad plan cached for one case.

Or there's something odd in the Odbc/OleDb drivers that somehow picks
up on the case of the query. Anyway, having deleted and re-inserted
all the rows from the table, I can't reproduce the problem anymore. I
suppose the "bad plan" theory makes the most sense, although it
concerns me how it happened in the first place.
Maybe in your scenario the settings for "SqlConnection" or some other
environmental thing prevented the cached plan from being used.

Did you bother running a profiler trace to see what is going on? How about
examining the plan for both queries (copied verbatim) in Query Analyzer /
Management Studio?

I'm only using MSDE at this point - I don't have access to a full blown
SQL installation, so I don't believe I have all those tools (and I
certainly don't pretend to be familiar with them).
 
W

wizofaus

BTW, I have another question concerning efficiency - I ended up having
to delete all those records from the "Foo" table on my development
machine because it was causing even unrelated queries, i.e. queries
that were pulling records from that table, but that explicitly filtered
out almost of them in the WHERE clause via a key field, to run too
slow. Now that key field was marked with a foreign key constraint to
another table, but I hadn't explicitly created an index on it (mainly
because I thought the foreign key constraint would implicitly define
one, as it does under Access and even Oracle, I believe). Should I
reasonably expect a query like

SELECT Count(*) FROM Foo WHERE Z = 2

To run at much the same speed no matter how many records there are in
Foo that have Z <> 2, provided there is an index on Z?

Further on this, I determined that

a) sql server definitely does create an implicit index when creating a
foreign key constraint (you can see it in the sysindexes table)
but
b) adding an explicit index definitely speeds up queries that filter
out most of the records - by a factor or 4 or 5!
however
c) adding the explicit index makes no difference to queries that do
need to process most of the records

To clarify:

Table 'foo' has over 1.5 million rows with Z = 1, and about 2000 with Z
= 2

Z has a foreign key constraint to another table.

The query

SELECT X, Y, Count(*) FROM Foo WHERE Z = 1 GROUP BY X, Y

Takes about the same time (just under 2 seconds) whether or not there
is an explicit index (using CREATE INDEX) on Z, but the query

SELECT X, Y, Count(*) FROM Foo WHERE Z = 2 GROUP BY X, Y

Takes about 5 times longer (200ms vs 40ms) when there is no explicit
index on Z.

Note that both queries return the same number of rows (12).
Does this seem normal?
 
R

Roy Harvey

Further on this, I determined that

a) sql server definitely does create an implicit index when creating a
foreign key constraint (you can see it in the sysindexes table)

No, it does not. It creates index statistics, which is what you were
seeing, but not an index.
but
b) adding an explicit index definitely speeds up queries that filter
out most of the records - by a factor or 4 or 5!

Yes, having an index that fits a selective query is way faster, but
remember this is comparing to no index at all.
however
c) adding the explicit index makes no difference to queries that do
need to process most of the records

The optimizer knows better to use an index - and thus go jumping all
over the table - when it has to look at most of the table in any case.

Roy Harvey
Beacon Falls, CT
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,007
Latest member
obedient dusk

Latest Threads

Top