Recordsets vs. Arrays?


SABmore

Is there a performance advantage to parsing through a recordset versus using
an array?

I'm currently trying to populate a listbox by returning data from my
database, then either parsing through the recordset until I reach EOF, or
putting the data into an array.

Thanks for your assistance.
 

Bob Barrows [MVP]

SABmore said:
Is there a performance advantage to parsing through a recordset versus
using an array?

I'm currently trying to populate a listbox by returning data from my
database, then either parsing through the recordset until I reach
EOF, or putting the data into an array.

Thanks for your assistance.

The definitive answer is here:
http://www.aspfaq.com/show.asp?id=2467

Bob Barrows
PS. My preference for listboxes is to use GetString
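[For illustration, a minimal sketch of the GetString approach -- the
connection string, table and column names below are placeholders, not
anything from this thread:]

<%
' Sketch only: substitute a real connection string and query.
Dim cn, rs, sOptions
Set cn = Server.CreateObject("ADODB.Connection")
cn.Open "placeholder connection string"
Set rs = cn.Execute( _
    "SELECT '<option value=""' + Code + '"">' + Description + '</option>' " & _
    "FROM SomeTable ORDER BY Description")
If Not rs.EOF Then sOptions = rs.GetString(, , "", vbCrLf)
rs.Close : Set rs = Nothing
cn.Close : Set cn = Nothing
Response.Write "<select name=""item"">" & vbCrLf & sOptions & "</select>"
%>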
 

Dave Anderson

Bob said:
http://www.aspfaq.com/show.asp?id=2467

PS. My preference for listboxes is to use GetString

Is there a way to insert the SELECTED attribute when you do this?



--
Dave Anderson

Unsolicited commercial email will be read at a cost of $500 per message. Use
of this email address implies consent to these terms. Please do not contact
me directly or ask me to contact you directly for assistance. If your
question is worth asking, it's worth posting.
 

Bob Barrows [MVP]

Dave said:
Is there a way to insert the SELECTED attribute when you do this?

Here's an example of a function I did to get a list of facilities from our
AS400 for use in various pages:


Function FacilityDropdown(pName, pDefault, pAllowAll)
    dim cn, rs, strsql, sOptions, sHTML, i
    sHTML = Session("Fac_HTML")
    if len(sHTML) = 0 then
        '|| is the concatenation operator in DB2 SQL
        strsql = "SELECT '<OPTION value=""' || Facility || '"">' ||" & _
            " Facility || '&nbsp;-&nbsp;' || Description || '</option>' " & _
            "FROM Facilities where Record_ID='MF' and Facility <> '61' " & _
            "and Facility between 'AA' and '99' ORDER BY Facility"
        ODBCDatasource = "censored"
        set cn = server.createobject("ADODB.CONNECTION")
        set rs = Server.CreateObject("adodb.recordset")
        cn.open ODBCDatasource
        set rs = cn.Execute(strsql, , 1)
        if not rs.eof then sOptions = rs.GetString(, , "", vbCrLf)
        rs.Close : set rs = nothing
        cn.Close : set cn = nothing
        sHTML = "<SELECT style=""font-family:Lucida Console"">" & _
            vbCrLf & sOptions & vbCrLf & "</SELECT>"
        Session("Fac_HTML") = sHTML
    end if
    sHTML = replace(sHTML, "<SELECT", "<SELECT name=""" & _
        pName & """")
    if pAllowAll then
        if InStr(sHTML, "value=""ALL""") = 0 then
            i = instr(sHTML, "<OPT")
            if i > 0 then
                sHTML = left(sHTML, i - 1) & _
                    "<OPTION value=""ALL"">ALL</OPTION>" & vbCrLf & _
                    mid(sHTML, i)
            end if
        end if
    elseif InStr(sHTML, """ALL""") > 0 then
        sHTML = replace(sHTML, _
            "<OPTION value=""ALL"">ALL</OPTION>" & vbCrLf, "")
    end if
    if len(pDefault) > 0 then
        if InStr(sHTML, pDefault & """ SELECTED") = 0 then
            sHTML = replace(sHTML, " SELECTED", "")
            sHTML = replace(sHTML, _
                "value=""" & pDefault & """>", _
                "value=""" & pDefault & """ SELECTED>")
        end if
    else
        if InStr(sHTML, " SELECTED") > 0 then
            sHTML = replace(sHTML, " SELECTED", "")
        end if
    end if
    FacilityDropdown = sHTML
end function
 

Dave Anderson

Bob said:
Yes, with a Replace statement.

I guess this also requires that you know whether your data might
require Server.HTMLEncode().

My personal preference is to save user data as entered, so I always use
Server.HTMLEncode on attribute values populated from user-submitted values.
This would render GetString() somewhat less uniformly useful.

Which, combined with your technique, leads to a question -- does SQL Server
2005 offer HTMLEncode functionality?



--
Dave Anderson

Unsolicited commercial email will be read at a cost of $500 per message. Use
of this email address implies consent to these terms. Please do not contact
me directly or ask me to contact you directly for assistance. If your
question is worth asking, it's worth posting.
 

Bob Barrows [MVP]

Dave said:
I guess this also requires that you know whether your data might
require Server.HTMLEncode().

My personal preference is to save user data as entered, so I always
use Server.HTMLEncode on attribute values populated from
user-submitted values. This would render GetString() somewhat less
uniformly useful.

I'm not sure I follow. Rather than guess, I think I'll wait until you look
at the example I posted. Maybe if you relate your reservation to the
example, I'll understand your point better.
Which, combined with your technique, leads to a question -- does SQL
Server 2005 offer HTMLEncode functionality?

I don't know for sure, but given the ability to use the CLR, I would be
surprised if it didn't.

Bob Barrows
 

Bob Barrows [MVP]

Dave said:
I guess this also requires that you know whether your data might
require Server.HTMLEncode().

My personal preference is to save user data as entered, so I always
use Server.HTMLEncode on attribute values populated from
user-submitted values. This would render GetString() somewhat less
uniformly useful.

Oh wait, I see what you're getting at. If the Description data in my example
required encoding, I would not be able to use GetString. Unless...
I need to think about this.

Bob
 

Chris Hohmann

Dave Anderson said:
Is there a way to insert the SELECTED attribute when you do this?

Here's what I used. DBCFS is for single-select controls and DBLSS is for
multi-select controls:

<%
Function DBCFS(strConn, strSQL, varSelectedKey)
    'I/O:
    '--> strConn : Connection String
    '--> strSQL : SQL Statement OR "Table" Name
    '--> varSelectedKey : Variant that identifies which option should be selected
    'Notes:
    'The function expects strSQL to return at least two (2) columns.
    'Column 1 will be used to populate the value attribute of the option tag
    'Column 2 will be used to populate the content of the option tag, i.e. what gets displayed

    'Determine command type
    Dim re, lngOptions
    Set re = New RegExp
    re.Pattern = "^\s*(SELECT|EXEC)"
    re.IgnoreCase = True
    If re.Test(strSQL) Then
        lngOptions = &H1 'Command Text
    Else
        lngOptions = &H2 'Table
    End If

    'Get the data
    Dim conn, rs, arr
    Set conn = Server.CreateObject("ADODB.Connection")
    conn.Open strConn
    Set rs = conn.Execute(strSQL, , lngOptions)
    If Not rs.EOF Then arr = rs.GetRows()
    rs.Close : Set rs = Nothing
    conn.Close : Set conn = Nothing

    'Build the option tags
    Dim j, o
    o = ""
    If IsArray(arr) Then
        For j = 0 To UBound(arr, 2)
            o = o & "<option value=""" & Server.HTMLEncode(arr(0, j)) & """"
            If arr(0, j) = varSelectedKey Then
                o = o & " selected"
            End If
            o = o & ">" & Server.HTMLEncode(arr(1, j)) & "</option>" & vbCrLf
        Next
    Else
        o = o & "<option>[No Option Data]</option>"
    End If
    DBCFS = o
End Function

Function DBLSS(strConn, strSQL)
    'I/O:
    '--> strConn : Connection String
    '--> strSQL : SQL Statement OR "Table" Name
    'Notes:
    'The function expects strSQL to return at least three (3) columns.
    'Column 1 will be used to populate the value attribute of the option tag
    'Column 2 will be used to populate the content of the option tag, i.e. what gets displayed
    'Column 3 determines if the selected attribute should be set for the option tag

    'Determine command type
    Dim re, lngOptions
    Set re = New RegExp
    re.Pattern = "^\s*(SELECT|EXEC)"
    re.IgnoreCase = True
    If re.Test(strSQL) Then
        lngOptions = &H1 'Command Text
    Else
        lngOptions = &H2 'Table
    End If

    'Get data
    Dim conn, rs, arr
    Set conn = Server.CreateObject("ADODB.Connection")
    conn.Open strConn
    Set rs = conn.Execute(strSQL, , lngOptions)
    If Not rs.EOF Then arr = rs.GetRows()
    rs.Close : Set rs = Nothing
    conn.Close : Set conn = Nothing

    'Build option tags
    Dim j, o
    o = ""
    If IsArray(arr) Then
        For j = 0 To UBound(arr, 2)
            o = o & "<option value=""" & Server.HTMLEncode(arr(0, j)) & """"
            If arr(2, j) Then
                o = o & " selected"
            End If
            o = o & ">" & Server.HTMLEncode(arr(1, j)) & "</option>"
        Next
    Else
        o = o & "<option>" & strSQL & "</option>"
    End If
    DBLSS = o
End Function
%>
 

Mark J. McGinty

Bob Barrows said:

I think use of the word "definitive" here is something of an overstatement.

For one thing, there is an optimization technique available to the recordset
test that was not presented, that being explicit creation of field objects,
which will improve field value access times that rival the array method.
And even if field access is minutely slower, the overhead of allocating
space for a complete, sometimes redundant copy of the data as an array can
make the recordset method faster overall, depending on number of records,
cursor, etc.
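
[A rough sketch of the field-object technique being described, assuming rs
is an already-open recordset with at least two columns; the names are made
up for illustration:]

' Grab explicit Field object references once, before the loop, then read
' .Value through them instead of resolving rs("ColumnName") on every pass.
Dim fldCode, fldDesc, sRow
Set fldCode = rs.Fields(0)
Set fldDesc = rs.Fields(1)
Do Until rs.EOF
    sRow = fldCode.Value & " - " & fldDesc.Value
    ' ... use sRow here ...
    rs.MoveNext
Loop
Set fldCode = Nothing : Set fldDesc = Nothing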

Yet another optimization technique overlooked is to access the fields using
their ordinal positions rather than their names. This isn't quite as fast
as creating field objects, but is considerably faster than using names. The
article approached this concept in the GetRows section, except that it
talked about using a dictionary object as a lookup for field positions. It
didn't bother to delve into the performance hit *that* would incur -- for
sure there would be COM object access overhead... simple constants would be
much faster.

For another thing, the GetString test doesn't properly belong in the set,
it's too dissimilar from the others. Yes it can be faster [depending on
cursor] because it makes far fewer allocations, and avoids the overhead of
iteration in ASP code. But if you need to process something in each row
[and said processing cannot be performed in the SQL statement] it's not an
option.

Further, depending on the provider and cursor used to source it, GetString
can actually be quite a bit slower than iterating. (In principle I agree
that calling GetString on a recordset opened with a cursor other than a
firehose is usually nearly insane, but there are plausible scenarios.)

Almost lastly, this is in essence a comparison of field data access
techniques, not a test of connection pooling or query plan cache... point
being that including object creation/opening time in the results tends to
cloud the issue, as well as subject the test to potential skew from
transient server-side factors.

And lastly, the difference between the winner and the loser is 31.25 ms.
The resolution of Timer is going to be 15 ms by default on an SMP system, 10
ms on a single CPU system, so relative conclusions such as method A is x%
faster than method B aren't going to tend to be highly accurate. The test
should instead have an inner loop that redundantly accesses the fields of
each row [n] times, because that's what's really at the core of this
benchmark.

Of course this would incur the loop overhead that the article was so
concerned with, but that, I believe, is infinitesimal: a quick test on my
system indicates that it is less than 0.0002 ms per iteration (using a for
loop.) Hmm, that doesn't consider calling MoveNext and testing EOF, which
could be consequential, but since we're talking about an inner loop here,
not relevant. Here's the code I used:

dim t, i, t2 : i = 0 : t = Timer
for i = 0 to 1000000 : next : t2 = Timer
Response.Write Left(CStr(((Timer - t) * 1000000.0) / CDbl(i)), 5) & _
" microseconds per iteration"


Maybe I'll construct an enhanced version of the drag-race... not today
though. :)


-Mark
 

Bob Barrows [MVP]

Mark said:
(using a for loop.) Hmm, that doesn't consider calling MoveNext and
testing EOF, which could be consequential, but since we're talking

My understanding has always been that the entire process of moving through a
recordset is very processor-intensive given all the "stuff" that has to go
on under-the-hood. I have never seen a test result showing a recordset loop
to be faster than using GetRows. I'll be interested to see what you come up
with.
 

Bob Barrows [MVP]

Mark said:
I think use of the word "definitive" here is something of an
overstatement.

For one thing, there is an optimization technique available to the
recordset test that was not presented, that being explicit creation
of field objects, which will improve field value access times that
rival the array method. And even if field access is minutely slower,
the overhead of allocating space for a complete, sometimes redundant
copy of the data as an array can make the recordset method faster
overall, depending on number of records, cursor, etc.

I'm doing some tests now in VB6 that seem to bear out what you are saying.

Of course, this does not address the benefit of being able to discard the
cursor and close the connection immediately after doing the GetRows, but
some of that benefit can be achieved by using a client-side cursor and
disconnecting it. I may need to do some rethinking here. Thanks for jumping
in. I've never seen any data to refute the "arrays are faster" contention.
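
[The disconnect step being referred to looks roughly like this -- a sketch
only, with a placeholder connection string:]

Const adUseClient = 3
Dim cn, rs
Set cn = Server.CreateObject("ADODB.Connection")
cn.Open "placeholder connection string"
Set rs = Server.CreateObject("ADODB.Recordset")
rs.CursorLocation = adUseClient            ' client-side (static) cursor
rs.Open "SELECT Code, Description FROM SomeTable", cn
Set rs.ActiveConnection = Nothing          ' disconnect the recordset
cn.Close : Set cn = Nothing                ' release the connection immediately
' rs can still be iterated (or passed to GetRows/GetString) from here on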

Bob Barrows
 

Chris Hohmann

Mark J. McGinty said:
I think use of the word "definitive" here is something of an
overstatement.

Agreed. The article's intent is not to dictate which approach is best.
That's stated both at the beginning and end of the article. I do however
appreciate Bob's sentiment.

For one thing, there is an optimization technique available to the
recordset test that was not presented, that being explicit creation of
field objects, which will improve field value access times that rival the
array method.

This is very interesting. I use the WITH construct to create a pointer to
the Recordset.Fields collection to minimize object model traversal, but it
makes sense that pointers to the Field objects themselves would be even
better. I reworked my example to use explicit field object references and it
did improve recordset iteration performance. However...

And even if field access is minutely slower, the overhead of allocating
space for a complete, sometimes redundant copy of the data as an array can
make the recordset method faster overall, depending on number of records,
cursor, etc.

Unfortunately, I could not verify this assertion. I tested 1, 10, 100, 1000,
10000 rows and as the number of rows increased, the measure by which
GetString/GetRows outperformed recordset iteration increased as well. Do you
have an example of where recordset iteration outperforms GetString/GetRows?
If so, could you post it here?

Yet another optimization technique overlooked is to access the fields
using their ordinal positions rather than their names. This isn't quite
as fast as creating field objects, but is considerably faster than using
names.

I do use ordinal positions in the article.

The article approached this concept in the GetRows section, except that it
talked about using a dictionary object as a lookup for field positions.
It didn't bother to delve into the performance hit *that* would incur --
for sure there would be COM object access overhead... simple constants
would be much faster.

Agreed, constants would be better than a dictionary object. Better still
would be to use ordinal positions and include a field name map in a comment
section.
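
[A sketch of that ordinal-position-plus-comment-map idea, assuming rs is an
open recordset returning the hypothetical columns listed in the comment:]

' Field map for the query feeding this recordset:
'   0 = Code, 1 = Description, 2 = Price
Dim arr, j
If Not rs.EOF Then arr = rs.GetRows()
If IsArray(arr) Then
    For j = 0 To UBound(arr, 2)
        Response.Write arr(0, j) & " - " & arr(1, j) & ": " & arr(2, j) & "<br>"
    Next
End If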

For another thing, the GetString test doesn't properly belong in the set,
it's too dissimilar from the others. Yes it can be faster [depending on
cursor] because it makes far fewer allocations, and avoids the overhead of
iteration in ASP code. But if you need to process something in each row
[and said processing cannot be performed in the SQL statement] it's not an
option.

I disagree. The basis of the article was to compare various ways of
retrieving and display recordset data. And in that context, GetString
qualifies. I do address shortcomings of the GetString approach including
those you cited.

Further, depending on the provider and cursor used to source it, GetString
can actually be quite a bit slower than iterating. (In principle I agree
that calling GetString on a recordset opened with a cursor other than a
firehose is usually nearly insane, but there are plausible scenarios.)

Again, I am unable to verify this assertion. Could you provide an example of
recordset iteration outperforming GetString?

Almost lastly, this is in essence a comparison of field data access
techniques, not a test of connection pooling or query plan cache... point
being that including object creation/opening time in the results tends to
cloud the issue, as well as subject the test to potential skew from
transient server-side factors.

I included object creation/opening time because none of the methods exists
in isolation. They all require the retrieval of a recordset. In fact,
presenting the performance data without taking into account the cost of
retrieving the recordset would actually unfairly skew the result in favor of
GetString/GetRows. Perhaps a good middle ground would be to include timing
data for recordset retrieval separately and then allow the reader to decide
if it should factor into their decision making process.

And lastly, the difference between the winner and the loser is 31.25 ms.
The resolution of Timer is going to be 15 ms by default on an SMP system,
10 ms on a single CPU system, so relative conclusions such as method A is
x% faster than method B aren't going to tend to be highly accurate.

I think that reporting timing data for recordset retrieval separately would
address these concerns.

The test should instead have an inner loop that redundantly accesses the
fields of each row [n] times, because that's what's really at the core of
this benchmark.

Of course this would incur the loop overhead that the article was so
concerned with, but that, I believe, is infinitesimal: a quick test on my
system indicates that it is less than 0.0002 ms per iteration (using a for
loop.) Hmm, that doesn't consider calling MoveNext and testing EOF, which
could be consequential, but since we're talking about an inner loop here,
not relevant. Here's the code I used:

dim t, i, t2 : i = 0 : t = Timer
for i = 0 to 1000000 : next : t2 = Timer
Response.Write Left(CStr(((Timer - t) * 1000000.0) / CDbl(i)), 5) & _
" microseconds per iteration"

I don't feel I was "so concerned" with loop overhead. It was one of many
considerations, but it was not overriding. You're correct, the inner loop
does not consider the MoveNext or EOF calls. However, I do believe it's
relevant because the inner loop is not representative of how data from a
recordset is accessed. Normally, there is a single pass on the recordset and
occasionally there are multiple passes, but I am hard pressed to imagine a
scenario where a field is redundantly accessed for each row. In such cases,
the field value would be stored in a local variable.

Maybe I'll construct an enhanced version of the drag-race... not today
though. :)

I look forward to it.
 

Mark J. McGinty

Chris,

I'll respond in more detail over the weekend, but a couple of quick items:

Do you think that testing VBS script in a scripting control (hosted in a VB
app) would be an "apples to apples" test, compared to VBS in ASP? Reason
being it would be possible to obtain more precise timing data by calling
QueryPerformanceCounter. There would be some call overhead involved
(comparable to calling COM, to make the call out to its host from within the
script control), but I'm thinking it can be isolated and subtracted from the
results.

Not sure whether they use the same core script processing engine -- surely
there are some threadedness differences in ASP's implementation. Maybe I
should just create a quickie COM object and call it from ASP... that would
probably be just as viable.

The other thing is that some of my observations/tests were made/performed on
server boxes under some load, and/or having been up for months and thus
subject to a fair degree of memory fragmentation. To me memory allocation
performance is one of ASP's weakest aspects. Depending on circumstances it
can fluctuate wildly. (In the same vein, though not specific to ASP, if the
system is starved for RAM, and has to do frequent paging ops, anything
involving a large allocation will perform badly and scale even worse. )

Also (as you mentioned in your article) the way a given technique scales is
(as pool players would say) the "money ball" in real life ASP. What runs
fast as a single instance when the machine is otherwise idle may degrade
rapidly under load. I'm fairly certain it's exactly that allocation
performance that can cause the array approach to scale negatively. Long
story short, I'll do my best to make it reproducible.

Lastly, as for the real life occurrence of an inner loop to re-read the same
data multiple times, I agree completely that such constructs won't be found
in sane production code. The purpose was strictly an attempt to get more
accurate timing data given the resolution of available timers. I realize it
doesn't always work out right -- some old C/C++ compilers might even
optimize the redundancy away as "loop invariant"... Nobody said profiling
was easy! :)

The point of it all is accuracy when making comparative judgments. If one
op is timed at 15 ms, and another is timed at 30 ms using 15 ms timer
resolution, it can't be said conclusively that the first took half as long
as the second, it can only be said that the first took less than 15 ms, and
the second took more than 15 ms but less than 30 ms.


-Mark



Chris Hohmann said:
Mark J. McGinty said:
I think use of the word "definitive" here is something of an
overstatement.
Agreed. The article's intent is not to dictate which approach is best.
That's stated both at the beginning and end of the article. I do however
appreciate Bob's sentiment.

For one thing, there is an optimization technique available to the
recordset test that was not presented, that being explicit creation of
field objects, which will improve field value access times that rival the
array method.

This is very interesting. I use the WITH construct to create a pointer to
the Recordset.Fields collection to minimize object model traversal, but it
makes sense that pointers to the Field objects themselves would be even
better. I reworked my example to use explicit field object references and
it did improve recordset iteration performance. However...

And even if field access is minutely slower, the overhead of allocating
space for a complete, sometimes redundant copy of the data as an array
can make the recordset method faster overall, depending on number of
records, cursor, etc.

Unfortunately, I could not verify this assertion. I tested 1, 10, 100,
1000, 10000 rows and as the number of rows increased, the measure by which
GetString/GetRows outperformed recordset iteration increased as well. Do
you have an example of where recordset iteration outperforms
GetString/GetRows? If so, could you post it here?

Yet another optimization technique overlooked is to access the fields
using their ordinal positions rather than their names. This isn't quite
as fast as creating field objects, but is considerably faster than using
names.

I do use ordinal positions in the article.

The article approached this concept in the GetRows section, except that
it talked about using a dictionary object as a lookup for field
positions. It didn't bother to delve into the performance hit *that*
would incur -- for sure there would be COM object access overhead...
simple constants would be much faster.

Agreed, constants would be better than a dictionary object. Better still
would be to use ordinal positions and include a field name map in a
comment section.

For another thing, the GetString test doesn't properly belong in the set,
it's too dissimilar from the others. Yes it can be faster [depending on
cursor] because it makes far fewer allocations, and avoids the overhead
of iteration in ASP code. But if you need to process something in each
row [and said processing cannot be performed in the SQL statement] it's
not an option.

I disagree. The basis of the article was to compare various ways of
retrieving and display recordset data. And in that context, GetString
qualifies. I do address shortcomings of the GetString approach including
those you cited.

Further, depending on the provider and cursor used to source it,
GetString can actually be quite a bit slower than iterating. (In
principle I agree that calling GetString on a recordset opened with a
cursor other than a firehose is usually nearly insane, but there are
plausible scenarios.)

Again, I am unable to verify this assertion. Could you provide an example
of recordset iteration outperforming GetString?

Almost lastly, this is in essence a comparison of field data access
techniques, not a test of connection pooling or query plan cache... point
being that including object creation/opening time in the results tends to
cloud the issue, as well as subject the test to potential skew from
transient server-side factors.

I included object creation/opening time because none of the methods exists
in isolation. They all require the retrieval of a recordset. In fact,
presenting the performance data without taking into account the cost of
retrieving the recordset would actually unfairly skew the result in favor
of GetString/GetRows. Perhaps a good middle ground would be to include
timing data for recordset retrieval separately and then allow the reader
to decide if it should factor into their decision making process.

And lastly, the difference between the winner and the loser is 31.25 ms.
The resolution of Timer is going to be 15 ms by default on an SMP system,
10 ms on a single CPU system, so relative conclusions such as method A is
x% faster than method B aren't going to tend to be highly accurate.

I think that reporting timing data for recordset retrieval separately
would address these concerns.

The test should instead have an inner loop that redundantly accesses the
fields of each row [n] times, because that's what's really at the core of
this benchmark.

Of course this would incur the loop overhead that the article was so
concerned with, but that, I believe, is infinitesimal: a quick test on
my system indicates that it is less than 0.0002 ms per iteration (using a
for loop.) Hmm, that doesn't consider calling MoveNext and testing EOF,
which could be consequential, but since we're talking about an inner loop
here, not relevant. Here's the code I used:

dim t, i, t2 : i = 0 : t = Timer
for i = 0 to 1000000 : next : t2 = Timer
Response.Write Left(CStr(((Timer - t) * 1000000.0) / CDbl(i)), 5) & _
" microseconds per iteration"

I don't feel I was "so concerned" with loop overhead. It was one of many
considerations, but it was not overriding. You're correct, the inner loop
does not consider the MoveNext or EOF calls. However, I do believe it's
relevant because the inner loop is not representative of how data from a
recordset is accessed. Normally, there is a single pass on the recordset
and occasionally there are multiple passes, but I am hard pressed to
imagine a scenario where a field is redundantly accessed for each row. In
such cases, the field value would be stored in a local variable.

Maybe I'll construct an enhanced version of the drag-race... not today
though. :)

I look forward to it.
 

Mark J. McGinty

I went ahead and created a COM object that calls QueryPerformanceFrequency
and QueryPerformanceCounter, making available a timer mechanism that has an
effective resolution equal to the system CPU frequency. It returns results
as a double in your choice of seconds, microseconds or nanoseconds.

The call to the timer itself seems to have an overhead of about 3-6
microseconds (on my system, in its current state, your mileage may vary.)
(Somehow I was thinking COM overhead was more than that...)

To verify its accuracy I executed the following script:

set hrt = CreateObject("HighResTimer.HRTimer")
hrt.StartTimer
t = Timer
for i = 0 to 10000000 : Next
t2 = Timer
WScript.Echo hrt.Seconds
WScript.Echo t2 - t

The results were within a few milliseconds of each other -- to be expected,
as the outer timer construct's results include overhead incurred by the
inner one. Results were also consistent at 100M and 1M iterations (much
below 1M is beyond the resolution of Timer.)

I've made this component available with C++ source at the following address:
http://www.databoundzone.com/HighResTimer.zip. (DLL is included, must be
registered in the standard way, sources require VS6 or better to build.)
Not counting the part that was generated by the IDE, it's all of 71 lines
(not including 13 lines of white space and 31 lines of file description and
disclaimer.)

Hmm, it occurs to me I've needed this component for a very long time...
thanks for inspiring it! :)

In case you're wondering, databoundzone.com is a domain I registered, that
hopefully will soon be the home of a website dedicated to databound UI
elements in VB and HTML, the magic and mystery of persistence provider XML,
and ADO in general.

Lastly I have done some ADO profiling as part of testing this component:
for test data I used master.information_schema.Tables. I found that the
bare call to GetRows takes between 25% and 50% less time than GetString
(without taking iteration into account.)

However, when all physical memory has been allocated, and the memory manager
must virtualize new allocations, I observed GetString taking as much as 88
times longer than GetRows! I believe it's because GetRows allocates a
series of smaller buffers, while GetString must allocate a contiguous
buffer.

For a table with all fields of a character type, GetRows must surely consume
more total memory (space for pointers to array elements as well as the data
itself) though it would be possible for GetRows to consume less total
memory, if most of its fields were numeric (as binary representations of
most numbers take less space than text representations of the same numbers.)
But I don't think the total is of much significance; I think the size of the
largest contiguous piece required is the issue.

I will do some more testing tomorrow.

-Mark
 

Bob Barrows [MVP]

Mark said:
I went ahead and created a COM object that calls
QueryPerformanceFrequency and QueryPerformanceCounter, making
available a timer mechanism that has an effective resolution equal to
the system CPU frequency. It returns results as a double in your
choice of seconds, microseconds or nanoseconds.

The call to the timer itself seems to have an overhead of about 3-6
microseconds (on my system, in its current state, your mileage may
vary.) (Somehow I was thinking COM overhead was more than that...)

To verify its accuracy I executed the following script:

set hrt = CreateObject("HighResTimer.HRTimer")
hrt.StartTimer
t = Timer
for i = 0 to 10000000 : Next
t2 = Timer
WScript.Echo hrt.Seconds
WScript.Echo t2 - t

The results were within a few milliseconds of each other -- to be
expected, as the outer timer construct's results include overhead
incurred by the inner one. Results were also consistent at 100M and
1M iterations (much below 1M is beyond the resolution of Timer.)

I've made this component available with C++ source at the following
address: http://www.databoundzone.com/HighResTimer.zip. (DLL is
included, must be registered in the standard way, sources require VS6
or better to build.) Not counting the part that was generated by the
IDE, it's all of 71 lines (not including 13 lines of white space and
31 lines of file description and disclaimer.)

Hmm, it occurs to me I've needed this component for a very long
time... thanks for inspiring it! :)

In case you're wondering, databoundzone.com is a domain I registered,
that hopefully will soon be the home of a website dedicated to
databound UI elements in VB and HTML, the magic and mystery of
persistence provider XML, and ADO in general.

Lastly I have done some ADO profiling as part of testing this
component: for test data I used master.information_schema.Tables. I
found that the bare call to GetRows takes between 25% to 50% less
time than GetString (without taking iteration into account.)

However, when all physical memory has been allocated, and the memory
manager must virtualize new allocations, I observed GetString taking
as much as 88 times longer than GetRows! I believe it's because
GetRows allocates a series of smaller buffers, while GetString must
allocate a contiguous buffer.

For a table with all fields of a character type, GetRows must surely
consume more total memory (space for pointers to array elements as
well as the data itself) though it would be possible for GetRows to
consume less total memory, if most of its fields were numeric (as
binary representations of most numbers take less space than text
representations of the same numbers.) But I don't think the total is
of much significance, I think the size of the largest contiguous
piece required is the issue.

I will do some more testing tomorrow.

-Mark

All I can say is: wow!
I wish I had the time to spend on something like this (I'm barely keeping my
head above water). Thanks for taking interest.

Regarding your results, I've always been surprised that, given the poor
string-handling inherent in vb/vba/vbscript, GetString has always
outperformed GetRows in every test I've ever run or read about. I think your
results have helped resolve this apparent contradiction: the initial call to
the function vs. the handling of the data after the call. The array is
loaded by GetRows very quickly, but looping through the array to build an
output string takes longer than simply writing the result of GetString. That
definitely makes sense now.

As far as the issue of disconnecting the recordset goes, my initial reaction
was that I would have to give up using a firehose cursor (only client-side
static cursors can be disconnected); but now that I think about it, I
realize that this concern is illusory. By transferring the data into another
structure (the array), one is, in essence, creating a quasi client-side,
static cursor. When ADO builds a client-side static cursor, it uses a
firehose cursor to populate the client-side cursor structure.

So, all that remains is to see if looping through a disconnected recordset
(efficiently, using the Field objects to grab the data, or the undocumented
Collect property) can perform as well as an array loop. I look forward to
your results.


Bob Barrows
 

Chris Hohmann

Mark J. McGinty said:
Chris,

I'll respond in more detail over the weekend, but a couple of quick items:

Do you think that testing VBS script in a scripting control (hosted in a
VB app) would be an "apples to apples" test, compared to VBS in ASP?
Reason being it would be possible to obtain more precise timing data by
calling QueryPerformanceCounter. There would be some call overhead
involved (comparable to calling COM, to make the call out to its host from
within the script control), but I'm thinking it can be isolated and
subtracted from the results.

Not sure whether they use the same core script processing engine -- surely
there are some threadedness differences in ASP's implementation. Maybe I
should just create a quickie COM object and call it from ASP... that would
probably be just as viable.

Yes, I think creating a COM object and calling it from ASP would be better.

The other thing is that some of my observations/tests were made/performed
on server boxes under some load, and/or having been up for months and thus
subject to a fair degree of memory fragmentation. To me memory allocation
performance is one of ASP's weakest aspects. Depending on circumstances
it can fluctuate wildly. (In the same vein, though not specific to ASP,
if the system is starved for RAM, and has to do frequent paging ops,
anything involving a large allocation will perform badly and scale even
worse. )

Also (as you mentioned in your article) the way a given technique scales
is (as pool players would say) the "money ball" in real life ASP. What
runs fast as a single instance when the machine is otherwise idle may
degrade rapidly under load. I'm fairly certain it's exactly that
allocation performance that can cause the array approach to scale
negatively. Long story short, I'll do my best to make it reproducible.

I've been giving this a lot of thought since this thread began. So much so
that I looked up some of your past posts on the topic of GetRows.
Specifically, there's this thread:
http://groups-beta.google.com/group..._frm/thread/aea6333dc48e0ff5/f9a7c0bfa3fa2c5a

You make a very interesting point about the memory allocation requirements
for GetRows. I had never considered the fact that the values themselves are
variants and as such take up much more memory than would be expected.
However, the field values in a Recordset are also stored as variants. I
believe the reason a Recordset requires less memory is that only a "page"
worth of data is loaded at any one time. To that end, one could simulate
this in GetString/GetRows by pulling data in chunks equivalent to the page
size of the recordset. In this way you could reduce the memory footprint of
the GetString/GetRows approach, yet still be able to sidestep the reentrant
object model traversal required for recordset iteration.
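
[Roughly what that chunked GetRows idea might look like; the chunk size of
100 is arbitrary and rs is assumed to be an open recordset:]

Const CHUNK_SIZE = 100                 ' stand-in for the recordset's page size
Dim arr, j
Do Until rs.EOF
    arr = rs.GetRows(CHUNK_SIZE)       ' fetch at most CHUNK_SIZE rows, advancing the cursor
    For j = 0 To UBound(arr, 2)
        ' ... process arr(0, j), arr(1, j), ... here ...
    Next
Loop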

Lastly, as for the real life occurrence of an inner loop to re-read the
same data multiple times, I agree completely that such constructs won't be
found in sane production code. The purpose was strictly an attempt to get
more accurate timing data given the resolution of available timers. I
realize it doesn't always work out right -- some old C/C++ compilers might
even optimize the redundancy away as "loop invariant"... Nobody said
profiling was easy! :)

The point of it all is accuracy when making comparative judgments. If one
op is timed at 15 ms, and another is timed at 30 ms using 15 ms timer
resolution, it can't be said conclusively that the first took half as long
as the second, it can only be said that the first took less than 15 ms,
and the second took more than 15 ms but less than 30 ms

Agreed. The COM timer sounds like a great idea. Having said that, I believe
that instead of introducing an inner loop, the timer calls should be moved
outside the loop. Something like:

Start Timer
Call GetString function 10,000 times
End Timer

Start Timer
Call GetRows function 10,000 times
End Timer

Start Timer
Call Recordset iteration 10,000 times
End Timer
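
[In plain VBScript with the built-in Timer, that harness might look like the
sketch below; GetStringTest, GetRowsTest and IterationTest are hypothetical
wrappers around the three approaches being compared:]

Dim t, i, results(2)

t = Timer
For i = 1 To 10000 : Call GetStringTest() : Next
results(0) = Timer - t

t = Timer
For i = 1 To 10000 : Call GetRowsTest() : Next
results(1) = Timer - t

t = Timer
For i = 1 To 10000 : Call IterationTest() : Next
results(2) = Timer - t

Response.Write "GetString: " & results(0) & "s, GetRows: " & results(1) & _
    "s, Iteration: " & results(2) & "s"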
 

Chris Hohmann

Mark J. McGinty said:
Lastly I have done some ADO profiling as part of testing this component:
for test data I used master.information_schema.Tables. I found that the
bare call to GetRows takes between 25% to 50% less time than GetString
(without taking iteration into account.)

However, when all physical memory has been allocated, and the memory
manager must virtualize new allocations, I observed GetString taking as
much as 88 times longer than GetRows! I believe it's because GetRows
allocates a series of smaller buffers, while GetString must allocate a
contiguous buffer.

Actually, I think the reason GetString, and any string concatenation for
that matter, falls down is that it uses so many redundant buffers. For
example:

string1 & string2 & string3 & string4 & string5 ...

translates into the following:
Create copy of string1
Append copy of string2 to copy of string1
Create copy of string1+string2
Append copy of string3 to copy of string1+string2
..
..
..

None of the intermediate buffers gets cleaned up until after the
concatenation operation is completed. This translates to O(n^2) ("order n
squared") performance based on the number of concatenations. Here's an
article that goes into some more detail about that:
http://www.sql-server-performance.com/string_concat_article.asp
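
[The usual way to sidestep that quadratic behavior in VBScript is to collect
the pieces in an array and Join them once at the end -- a sketch, not from
the article:]

' Instead of s = s & piece inside a loop (each & copies the whole string so far):
Dim parts(), k, n, s
n = 10000
ReDim parts(n - 1)
For k = 0 To n - 1
    parts(k) = "<option>" & k & "</option>"
Next
s = Join(parts, vbCrLf)   ' one allocation for the final string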

For a table with all fields of a character type, GetRows must surely
consume more total memory (space for pointers to array elements as well as
the data itself) though it would be possible for GetRows to consume less
total memory, if most of it's fields were numeric (as binary
representations of most numbers take less space than text representations
of the same numbers.) But I don't think the total is of much significance,
I think the size of the largest contiguous piece required is the issue.

I don't think it could ever take less space. As you yourself pointed out in
the past, data values stored in VB Safe Arrays are stored as variants, so
the binary representations do not come into play.

I will do some more testing tomorrow.

I'm looking forward to it.
 

Bob Barrows [MVP]

Chris said:
Agreed. The COM timer sounds like a great idea. Having said that, I
believe that instead introducing an inner loop, the timer calls
should be moved outside the loop. Something like:

Start Timer
Call GetString function 10,000 times
End Timer

Start Timer
Call GetRows function 10,000 times
End Timer

Start Timer
Call Recordset iteration 10,000 times
End Timer

That's exactly what I did last Thurs in my VB tests. I was using the API
GetTickCount function to do the timing. My results were rather surprising to
me. Again, I'm looking forward to Mark's results.
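
[For reference, the GetTickCount approach in VB6 looks roughly like this;
resolution is typically 10-16 ms, which is why the work is repeated inside
the timed region. The routine being measured is a placeholder:]

Private Declare Function GetTickCount Lib "kernel32" () As Long

Private Sub TimeIt()
    Dim t As Long, i As Long
    t = GetTickCount
    For i = 1 To 10000
        ' ... call the routine being measured here ...
    Next
    Debug.Print "Elapsed: " & (GetTickCount - t) & " ms"
End Sub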

Bob Barrows
 

Chris Hohmann

Bob Barrows said:
That's exactly what I did last Thurs in my VB tests. I was using the API
GetTickCount function to do the timing. My results were rather surprising to
me. Again, I'm looking forward to Mark's results.

Can you post your code?
 
