SubQuerying Vs Joining

A

Awah Teh

Which is most effective (consider the query below for an example)?
Please give me some explanation (FMI:) as to why one is faster over the
other.


--Both Queries are designed to get the users information of users that
received passes in the Year of 2003

--** ***************
--** QUERY 1
--** ***************
Select * from users where user_id in (select user_id from
users_site_passes where date_pass_issued >= 'January 1, 2003')


--** ***************
--** QUERY 2
--** ***************
Select users.* from users, users_site_passes where users.user_id =
users_site_passes.user_id and users_site_passes.date_pass_issued >= 'January
1, 2003'

Thanks In Advance
A-

--
Awah Teh
Chief Executive Officer
DigicentriQ Technologies, LLC
(e-mail address removed)
www.digicentriq.com
877 675 4742
805 732 9421
 
A

Andrew J. Kelly

Awah,

First off the one that is most likely to be fastest is not shown. This
would be an EXISTS statement.

Select * from users AS b where EXISTS (select * from
users_site_passes AS a where a.User_id = b.User_ID AND a.date_pass_issued
= 'January 1, 2003')

Exists will stop looking after the first match is found where as the other
two might do more work. To answer your original question though they may in
fact be the same. Sometimes the optimizer will create similar plans on
queries such as these. Here are some general comments though:

Get in the habit of using the ANSI JOIN syntax like this:

Select users.* from users INNER JOIN users_site_passes
ON users.user_id = users_site_passes.user_id and
users_site_passes.date_pass_issued

When dealing with DATE strings you should use the ANSI syntax as well to
avoid issues where sql server can misinterpret the date. It goes like
this: 'yyyymmdd' Always has 8 chars and no dashes, slashes etc.
 
R

Rich Dillon

Awah,

I assume that you have a one-to-many relationship here; that <user_id> is a
key for <users> and that <users_site_passes> may have several rows for any
one user. In that case your queries aren't equivelant in the first place.
These three are.

SELECT *
FROM users
WHERE user_id IN (
SELECT user_id
FROM users_site_passes
WHERE date_pass_issued >= '2003-01-01');

SELECT *
FROM users AS u
WHERE EXISTS (
SELECT *
FROM users_site_passes
WHERE user_id=u.user_id AND date_pass_ussued >= '2003-01-01');

SELECT DISTINCT u.*
FROM users AS u JOIN users_site_passes AS p ON u.user_id=p.user_id
WHERE p.date_pass_issued >= '2003-01-01';

Since these three queries define the same result, an ideal optimizer would
produce the same execution plan for each. There's no good reason that it
should produce a better plan for one than for the others. In practice, SQL
Server often does. Have a look at the execution plans in QA to see which
works out best in your environment. Keep in mind, though, that the
difference you see isn't the necessary result of any law of nature but,
rather, an accident of the current state of SQL Server development and of
the particulars of your environment.


Hope that helps,
Rich
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,768
Messages
2,569,574
Members
45,051
Latest member
CarleyMcCr

Latest Threads

Top