Thanks for your suggestion of trying the UNION query. I understand
all the points you made about scalability and resource utilization. I
have revised my article and posted a section on this technique:
http://www.somacon.com/aspdocs/
The result is that the union query method is 10 times faster than the
sub-query method for a 3000 record ASP report.
I thought you might like to hear about how I implemented the technique
in a real program. My current client has a "patient ledger" function
in their medical billing software, which is built on ASP/IIS. This
requires records from three different tables to be displayed: charges,
payments, and invoices. If you can believe it, I originally used
three separate queries to retrieve the records into a redimmed
VBScript array. Then I used a VBScript implementation of Quicksort to
order the records. This version was overall slow, and did not
correctly order the payment lines in relation to the invoice lines.
So upon revisiting the issue to create a similar "patient statement"
function, I spent several hours analyzing the relationships between
the resulting records. At first, I thought more complex sorting would
be needed, and for awhile started implementing my own sorting
algorithms. I quickly realized VBScript has no native implementation
of linked lists, making an efficient sorting algorithm next to
impossible to write from scratch. So I ditched the idea of separate
queries and tried to come up with a union query that did all the
ordering for me. The final query contained 14 columns, 4 of which
were used simply for ordering and row identification. Two of the
three select queries had 5 null columns. This query was several times
faster even for small record sets, and correctly ordered all the
lines, which is quite marvelous. Despite the query's complexity (its
length is ~2800 characters), the page loads almost instantly on a dual
Xeon 2.4 server with SQL Server. The major problem I had with
converting the page from 3 separate queries to a union query was
making sure all the column data types were identical, but this could
have been avoided by better preparation. SQL Server is much more
picky about the data types than Jet SQL. SQL Server also could not
handle an ntext field type in the union, which had to be converted to
nvarchar.
Anyway, thanks for the suggestions.
Shailesh
Chris Hohmann said:
Shailesh Humbad said:
I wanted to try a real example, so I tried printing Orders from the
Northwind sample database. The "Orders" table contains each order, and
"Order Details" contains the items for each order. I printed one line
detailing the order, and then multiple lines for each item in that
order. There are about 800 orders and 2000 items, so 2800 lines to be
printed. Using the UNION ALL query took 0.31 seconds. Using a prepared
command object and second recordset as in complextable3.asp took 2.77
seconds. Cool! Looks like I'll have to make another update to the
article. One thing that needs to be analyzed is what happens to
performance if the number of fields needed from each query is very
different. For example, suppose I needed 15 fields for each Order, but
only 3 fields for each item. The record for each item would contain 12
blank fields. I suspect that this won't be too much of a problem though.
Shailesh
Empirical testing would tell the tale. I will however throw out some
conjecture. If you're using an empty strings or NULL as filler and the
data profile is correct (VARCHAR vs. CHAR in SQL Server), then from a
size standpoint, very little additional data is rendered. The most
important thing to note here is that the performance advantage of the
union method over multiple recordsets improves as the result set is
scaled. Your test of 2000/800 records demonstrates a performance factor
of 9x. If you scale your result set to 20000/8000 you would see a
startling jump in that performance factor. In fact, it's likely that the
multiple recordset method would timeout at those numbers while the union
method would happily go about its business. There's an "economies of
scale" issue that comes into play for the union method. It is beyond the
scope of this thread, but the very indexes you referred to in your prior
post would allow the union query to perform better on a record for
record basis, for larger result sets. Finally, an issue that has not be
touched upon is the resource utilization footprint of each method.
Specifically, memory usage, threads, handles, registers, etc... Again,
empirical testing would tell the tale, but the disparity between the
union method and the multiple recordset method would be even more
extreme than simple performance. Consider the fact that in the 2000/800
example, 801 recordsets are being instantiated compared to the one (1)
recordset instantiated in the union method. 800 x [resource overhead for
a recordset]. Scary.
HTH
-Chris