Select qryA.columnA, qryA.columnB
From table as qryA
LEFT JOIN
(Select columnA
From table
where columnB = 'sec999') as qryB
ON qryA.columnA = qryB.columnA
Where qryB.columnA is null
Is how I would do it.
Or try the MINUS keyword (EXCEPT in most other dialects) instead of "left join <tableB>... where <tableB>.<key> is null" if your version of SQL supports it.
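Both approaches can be sanity-checked side by side. Here's a quick sketch using Python's sqlite3 (SQLite spells MINUS as EXCEPT; the table name `t` and the sample rows are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (columnA TEXT, columnB TEXT);
INSERT INTO t VALUES ('a1', 'sec001'), ('a1', 'sec999'),
                     ('a2', 'sec002'), ('a3', 'sec999');
""")

# Anti-join: keep rows whose columnA never appears alongside 'sec999'.
left_join = con.execute("""
    SELECT qryA.columnA, qryA.columnB
    FROM t AS qryA
    LEFT JOIN (SELECT columnA FROM t WHERE columnB = 'sec999') AS qryB
      ON qryA.columnA = qryB.columnA
    WHERE qryB.columnA IS NULL
""").fetchall()

# Set-difference version (EXCEPT here, MINUS on Oracle).
except_q = con.execute("""
    SELECT columnA FROM t
    EXCEPT
    SELECT columnA FROM t WHERE columnB = 'sec999'
""").fetchall()

print(left_join)  # [('a2', 'sec002')]
print(except_q)   # [('a2',)]
```

Note that EXCEPT/MINUS operates on whole-row sets and deduplicates, so it returns the distinct columnA values rather than the original rows.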
I'd personally do it like so (obviously syntax varies a little between products):
select columnA, columnB from table
where columnA not in (select distinct columnA from table where columnB = 'sec999')
That's pretty much equivalent and much easier to read.
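Against the same made-up sample table as above (four rows, two columnA values tainted by 'sec999'), the NOT IN version gives the same answer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (columnA TEXT, columnB TEXT);
INSERT INTO t VALUES ('a1', 'sec001'), ('a1', 'sec999'),
                     ('a2', 'sec002'), ('a3', 'sec999');
""")

# Exclude every row whose columnA ever appears with 'sec999'.
rows = con.execute("""
    SELECT columnA, columnB FROM t
    WHERE columnA NOT IN
      (SELECT DISTINCT columnA FROM t WHERE columnB = 'sec999')
""").fetchall()

print(rows)  # [('a2', 'sec002')]
```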
I have a habit of avoiding the IN syntax if the subquery could possibly return a large result. I found that in MSSQL 2000 it performed very slowly, but they did something in later versions so that there's very little difference in performance. (This is just an experience thing; I haven't actually read anything that says to use one over the other.)
NOT IN will work, but a correlated subquery using NOT EXISTS will often give you much better performance. Like so:
SELECT ColumnA FROM table t WHERE NOT EXISTS
(SELECT 1 FROM table WHERE ColumnA = t.ColumnA AND ColumnB = 'sec999');
Especially if you're working with a large data set, using a non-correlated subquery and a NOT IN clause often forces your database to test each value in the outer query against every value in the subquery's result set before it can positively say "No, this value is not in that set". Say your table had a million rows, and a quarter-million of them have "sec999" in ColumnB. With the NOT IN query, for every row in your table the query compares the ColumnA value to the set of 250,000 values returned by the subquery. When ColumnA does exist in the subquery set, it will stop the comparisons once it's found a match, but in many cases it will have to make all 250,000 comparisons before it can affirm that ColumnA doesn't exist in the subquery set. Worst case scenario in this example would be upwards of 200 billion compare operations, I think.
With the correlated subquery, your RDBMS can pull out that arbitrary "1" value quickly, especially if you're working with indexed columns. So using the million-row table example above, instead of having to make up to 250,000 comparisons for each row in the table, the RDBMS just has to run the correlated subquery once per row and see if the result set is empty or not. Exactly how big your table needs to be before you'll really notice the performance improvement is going to depend on your DB server's I/O, CPU and memory speed, and probably your flavor of RDBMS. On the 9i, 10g or 11g Oracle systems I work with, using relatively recent hardware, I find on any table with more than a couple thousand rows a NOT EXISTS query will be noticeably faster than a NOT IN equivalent.
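The NOT EXISTS form can be checked against the same made-up sample data used above (whether it actually beats NOT IN will depend on your optimizer; on tiny in-memory SQLite data the point is just that the results agree):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (columnA TEXT, columnB TEXT);
INSERT INTO t VALUES ('a1', 'sec001'), ('a1', 'sec999'),
                     ('a2', 'sec002'), ('a3', 'sec999');
""")

# Correlated subquery: for each outer row, probe for a 'sec999'
# row sharing its columnA; keep the row only if none exists.
not_exists = con.execute("""
    SELECT o.columnA, o.columnB FROM t AS o
    WHERE NOT EXISTS
      (SELECT 1 FROM t
       WHERE t.columnA = o.columnA AND t.columnB = 'sec999')
""").fetchall()

print(not_exists)  # [('a2', 'sec002')]
```

With an index on (columnA, columnB), each probe becomes an index seek, which is where the per-row cost savings described above come from.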
I thank you for this information and will try to remember it when working on my database.
Select columnA, columnB
from table
where columnB != 'sec999'
? I thought != exists.
e: oooh I get what you're saying now.