Query 10 times slower (and results reordered!) from ASP.NET page than console app.

wizofaus · Oct 25, 2006

I have a chunk of code which is essentially

IDbCommand cmd = db.CreateCommand();
cmd.CommandText = "SELECT X, Y, Count(*) FROM Foo WHERE Z = 1 GROUP BY
X, Y";
using (IDataReader reader = cmd.ExecuteReader())
while (reader.Read())
{
// grab values from query
}

The database is SQL server (MSDE 2000), and "Foo" has well over a
million records. The query however only runs about 12 rows.

When I compile this code into a Windows console application, and run
it, it completes the query in under 2 seconds. But when I run exactly
the same code from an ASP.NET class, it takes over 20 seconds!
Furthermore, even though the values returned are exactly the same, the
ORDER of the rows is quite different: specifically, under ASP.NET the
rows are all ordered first by Y then by X, even though there's no
logical reason they should be. I actually tried adding an "ORDER BY Y,
X" to the end, which did cause the console app version to print out the
rows in the same order, but made no difference to the execution speed.

I've tried using an OdbcConnection, and OleDbConnection and an
SqlConnection - neither make any difference there either. I've made
sure both are compiled in release mode, and restarted IIS, and made
sure that my ASP.NET application is doing nothing else except this one
query.

Any suggestions most welcome!

Thanks

Dylan

Chris Lim · Oct 25, 2006

Use Profiler to compare execution plans. It sounds like different
indexes are being used, hence the different ordering as well. The
Console app's query may be using a better index than the ASP.NET one
(which may or may not be using an index at all).

Jeff Dillon · Oct 25, 2006

No need to cross-post so much!

Does it still take 20 seconds on the "second" invocation of the same ASP.NET
page?

Jeff

wizofaus · Oct 25, 2006

Chris said:
Use Profiler to compare execution plans. It sounds like different
indexes are being used, hence the different ordering as well. The
Console app's query may be using a better index than the ASP.NET one
(which may or may not be using an index at all).

That wouldn't make sense - it's exactly the same database.

The problem, believe it or not, was the case of the word "by" in the
GROUP BY clause.

If you use uppercase 'BY' the query takes 20 seconds, if you use "By",
it takes 2 seconds.

Pretty close to the damn weirdest bug I've come across any time
recently, especially because it happens under all three IDbConnection
providers (but not using the command-line osql tool).

wizofaus · Oct 25, 2006

[email protected] said:
That wouldn't make sense - it's exactly the same database.

The problem, believe it or not, was the case of the word "by" in the
GROUP BY clause.

If you use uppercase 'BY' the query takes 20 seconds, if you use "By",
it takes 2 seconds.

Pretty close to the damn weirdest bug I've come across any time
recently, especially because it happens under all three IDbConnection
providers (but not using the command-line osql tool).

Did some more poking around, the above isn't quite true. Firstly, it
only happens with OdbcDbConnection and OleDbConnection - not
SqlConnection. So I'm fairly sure it's not an SQL server problem per
se.

Furthermore, it specifically only occurs when I'm using Parameters that
need their types converting, and using "IN" for matching:

So the following code:

IDbCommand cmd = db.CreateCommand();
cmd.CommandText = "SELECT X, Y, Count(*) FROM Foo WHERE Z IN (?) GROUP
BY X, Y";
IDbDataParameter param = cmd.CreateParameter();
param.Value = "1";
cmd.Parameters.Add(param);
using (IDataReader reader = cmd.ExecuteReader())
while (reader.Read())
{
// grab values
}

Will reproduce the problem.

Change the query to be "WHERE Z = ?", or change param.Value to be just
1 (instead of the string literal "1"), or change "GROUP BY" to "GROUP
By" and the problem goes away.

Odd, huh.

Aaron Bertrand [SQL Server MVP] · Oct 25, 2006

Furthermore, even though the values returned are exactly the same, the

ORDER of the rows is quite different

Well, you don't have an ORDER BY clause, so SQL Server is free to return the
rows in any row it chooses. There is nothing wrong with even the same piece
of code running multiple times and returning a different order every time...
neglecting to have an ORDER BY clause is basically telling SQL Server that
you don't care about the order. So, long story short, if you want/expect a
certain order, SAY SO.

: specifically, under ASP.NET the
rows are all ordered first by Y then by X, even though there's no
logical reason they should be.

You could say the same for the ordering no matter what order they came out:
without an order by clause, there is no logical orderto expect. SQL Server
will simply return the rows in whatever order it deems most efficient.

sure both are compiled in release mode, and restarted IIS

And are you only testing the ASP.Net page once after restarting IIS? See,
the first time you load an ASP.Net page in IIS, it does a bunch of
background activity with it. Running Profiler might point out how much of
that time IIS is spent doing its work with the ASP.Net file, as opposed to
running the query.

Remember too that there is more overhead in the display time used by the
ASP.Net page to render results to HTML than there would be for a console app
returning plain text to the command line. And if you are running the
console app on the same machine as SQL Server, and the web site is on a
different machine, you need to factor network latency in there as well.

A

Aaron Bertrand [SQL Server MVP] · Oct 25, 2006

Use Profiler to compare execution plans. It sounds like different

That wouldn't make sense - it's exactly the same database.

The problem, believe it or not, was the case of the word "by" in the
GROUP BY clause.

If you use uppercase 'BY' the query takes 20 seconds, if you use "By",
it takes 2 seconds.

That's not exactly why. And I think Chris was more on track than you might
think "makes sense."

If you clear out the procedure cache and the buffers, you will find that
whether you use By or BY the first time makes no difference. The problem is
that SQL Server stores execution plans for ad hoc SQL statements, and treats
the plan with case sensitivity (and even for whitespace differences). So it
might be that when you ran one version you got a certain plan because of
data size or who knows what, and when you ran a version with a different
casing, a different plan got stored (again, who knows why, we don't know
enough about your system).

There's a guaranteed way around this: STOP USING AD HOC SQL. Is there any
reason you're not using stored procedures?

Another way around this is to use consistent casing and spacing/indenting
for T-SQL keywords. You'd be amazed how much more readable and maintainable
your code will become if you pick a convention and stick with it.

A

wizofaus · Oct 25, 2006

Aaron said:
That's not exactly why. And I think Chris was more on track than you might
think "makes sense."

If you clear out the procedure cache and the buffers, you will find that
whether you use By or BY the first time makes no difference. The problem is
that SQL Server stores execution plans for ad hoc SQL statements, and treats
the plan with case sensitivity (and even for whitespace differences). So it
might be that when you ran one version you got a certain plan because of
data size or who knows what, and when you ran a version with a different
casing, a different plan got stored (again, who knows why, we don't know
enough about your system).

Might be, but my additional investigation seemed to reveal that it was
a bug in the Odbc and OleDb providers, given that the case of "By"
wasn't the only thing causing the problem. I have to admit I didn't
try chaning the case of other parts of the query, but it was absolutely
consistent: I could change it to "BY", run it 3 times, and have it take
20 seconds each time, then change it to "By" and it would run at < 2
seconds.

There's a guaranteed way around this: STOP USING AD HOC SQL. Is there any
reason you're not using stored procedures?

Yup, we need to support Access, Oracle, MySql etc. etc.

I certainly want to look at using more parameterized views, which would
probably help in this case. But I actually build up the SQL statements
dynamically based on a number of user-configurable parameters -
including even which fields are selected/grouped on, so it's hard to
see how to avoid some amount of ad hoc SQL.

Another way around this is to use consistent casing and spacing/indenting
for T-SQL keywords. You'd be amazed how much more readable and maintainable
your code will become if you pick a convention and stick with it.

I always use all upper-case - I've actually no idea how I ended up with
the mixed case "By", but it was a good job I did, or I might never have
found the problem.

Chris Lim · Oct 25, 2006

Might be, but my additional investigation seemed to reveal that it was
a bug in the Odbc and OleDb providers, given that the case of "By"
wasn't the only thing causing the problem. I have to admit I didn't
try chaning the case of other parts of the query, but it was absolutely
consistent: I could change it to "BY", run it 3 times, and have it take
20 seconds each time, then change it to "By" and it would run at < 2
seconds.

Strange problem. Still, using Profiler you could at least see where the
delay is and it might give some more clues as to what's actually going
on.

Aaron Bertrand [SQL Server MVP] · Oct 25, 2006

wasn't the only thing causing the problem. I have to admit I didn't

try chaning the case of other parts of the query, but it was absolutely
consistent: I could change it to "BY", run it 3 times, and have it take
20 seconds each time, then change it to "By" and it would run at < 2
seconds.

You could run BY 8000 times and By 8000 times and it would still execute the
same in each case. Unless you follow my earlier advice to clear the
procedure cache and drop the buffers, which will get rid of both plans and
you may see that the behavior was due to a bad plan stored for one of the
cases, not due to a bug in ODBC or due to the case itself.

A

Aaron Bertrand [SQL Server MVP] · Oct 25, 2006

wasn't the only thing causing the problem. I have to admit I didn't

try chaning the case of other parts of the query, but it was absolutely
consistent: I could change it to "BY", run it 3 times, and have it take
20 seconds each time, then change it to "By" and it would run at < 2
seconds.

You could run BY 8000 times and By 8000 times and it would still execute the
same in each case. Unless you follow my earlier advice to clear the
procedure cache and drop the buffers, which will get rid of both plans and
you may see that the behavior was due to a bad plan stored for one of the
cases, not due to a bug in ODBC or due to the case itself.

A

wizofaus · Oct 25, 2006

Aaron said:
You could run BY 8000 times and By 8000 times and it would still execute the
same in each case. Unless you follow my earlier advice to clear the
procedure cache and drop the buffers, which will get rid of both plans and
you may see that the behavior was due to a bad plan stored for one of the
cases, not due to a bug in ODBC or due to the case itself.

Possible, but it wouldn't explain why a) it didn't happen when using
the .NET class "SqlConnection" and b) it didn't happen when using an
integer C# datatype for the parameter.

BTW, I have another question concerning efficiency - I ended up having
to delete all those records from the "Foo" table on my development
machine because it was causing even unrelated queries, i.e. queries
that were pulling records from that table, but that explicitly filtered
out almost of them in the WHERE clause via a key field, to run too
slow. Now that key field was marked with a foreign key constraint to
another table, but I hadn't explicitly created an index on it (mainly
because I thought the foreign key constraint would implicitly define
one, as it does under Access and even Oracle, I believe). Should I
reasonably expect a query like

SELECT Count(*) FROM Foo WHERE Z = 2

To run at much the same speed no matter how many records there are in
Foo that have Z <> 2, provided there is an index on Z?

Aaron Bertrand [SQL Server MVP] · Oct 25, 2006

Possible, but it wouldn't explain why a) it didn't happen when using

the .NET class "SqlConnection" and b) it didn't happen when using an
integer C# datatype for the parameter.

Changing the case from GROUP BY to GROUP By does not magically change the
way the query works, unless you had a bad plan cached for one case.

Maybe in your scenario the settings for "SqlConnection" or some other
environmental thing prevented the cached plan from being used.

Did you bother running a profiler trace to see what is going on? How about
examining the plan for both queries (copied verbatim) in Query Analyzer /
Management Studio?

I have no idea how to reproduce your scenario, so I don't know what else to
tell you, except that chaning BY to By alone is not the primary / sole
culprit for the difference you're seeing.

A

wizofaus · Oct 25, 2006

Aaron said:
Changing the case from GROUP BY to GROUP By does not magically change the
way the query works, unless you had a bad plan cached for one case.

Or there's something odd in the Odbc/OleDb drivers that somehow picks
up on the case of the query. Anyway, having deleted and re-inserted
all the rows from the table, I can't reproduce the problem anymore. I
suppose the "bad plan" theory makes the most sense, although it
concerns me how it happened in the first place.

Maybe in your scenario the settings for "SqlConnection" or some other
environmental thing prevented the cached plan from being used.

Did you bother running a profiler trace to see what is going on? How about
examining the plan for both queries (copied verbatim) in Query Analyzer /
Management Studio?

I'm only using MSDE at this point - I don't have access to a full blown
SQL installation, so I don't believe I have all those tools (and I
certainly don't pretend to be familiar with them).

wizofaus · Oct 25, 2006

BTW, I have another question concerning efficiency - I ended up having
to delete all those records from the "Foo" table on my development
machine because it was causing even unrelated queries, i.e. queries
that were pulling records from that table, but that explicitly filtered
out almost of them in the WHERE clause via a key field, to run too
slow. Now that key field was marked with a foreign key constraint to
another table, but I hadn't explicitly created an index on it (mainly
because I thought the foreign key constraint would implicitly define
one, as it does under Access and even Oracle, I believe). Should I
reasonably expect a query like

SELECT Count(*) FROM Foo WHERE Z = 2

To run at much the same speed no matter how many records there are in
Foo that have Z <> 2, provided there is an index on Z?

Further on this, I determined that

a) sql server definitely does create an implicit index when creating a
foreign key constraint (you can see it in the sysindexes table)
but
b) adding an explicit index definitely speeds up queries that filter
out most of the records - by a factor or 4 or 5!
however
c) adding the explicit index makes no difference to queries that do
need to process most of the records

To clarify:

Table 'foo' has over 1.5 million rows with Z = 1, and about 2000 with Z
= 2

Z has a foreign key constraint to another table.

The query

SELECT X, Y, Count(*) FROM Foo WHERE Z = 1 GROUP BY X, Y

Takes about the same time (just under 2 seconds) whether or not there
is an explicit index (using CREATE INDEX) on Z, but the query

SELECT X, Y, Count(*) FROM Foo WHERE Z = 2 GROUP BY X, Y

Takes about 5 times longer (200ms vs 40ms) when there is no explicit
index on Z.

Note that both queries return the same number of rows (12).
Does this seem normal?

Roy Harvey · Oct 26, 2006

Further on this, I determined that

a) sql server definitely does create an implicit index when creating a
foreign key constraint (you can see it in the sysindexes table)

No, it does not. It creates index statistics, which is what you were
seeing, but not an index.

but
b) adding an explicit index definitely speeds up queries that filter
out most of the records - by a factor or 4 or 5!

Yes, having an index that fits a selective query is way faster, but
remember this is comparing to no index at all.

however
c) adding the explicit index makes no difference to queries that do
need to process most of the records

The optimizer knows better to use an index - and thus go jumping all
over the table - when it has to look at most of the table in any case.

Roy Harvey
Beacon Falls, CT

Simple sproc executes 240 times slower when called from asp.net thanwhen called from QA	7	Mar 19, 2009
Running FTPS from the ASP.NET page	1	May 29, 2009
posting daqta to an asp.net page from a console application	1	Apr 29, 2004
Form Query and GridView for ASP.NET 2.0	1	Feb 18, 2006
Query Active Directory from ASP.NET	0	Jan 16, 2004
Leaking Database Connections from ASP.NET 2 Web App	4	Oct 17, 2006
ASP.Net with Indexing Services returns zero results	1	Jul 1, 2003
Out-of-memory error: sample app	3	Sep 19, 2006

Query 10 times slower (and results reordered!) from ASP.NET page than console app.

wizofaus

Chris Lim

Jeff Dillon

wizofaus

wizofaus

Aaron Bertrand [SQL Server MVP]

Aaron Bertrand [SQL Server MVP]

wizofaus

Chris Lim

Aaron Bertrand [SQL Server MVP]

Aaron Bertrand [SQL Server MVP]

wizofaus

Aaron Bertrand [SQL Server MVP]

wizofaus

wizofaus

Roy Harvey

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads