So, let's say I have the following table
ColumnA | Column B
ValueA | sec999
ValueA | sec000
ValueB | sec000
I want to return all rows that don't have sec999 in ColumnB. But, since ValueA has sec999 and sec000, I don't want that to appear in my results.
So pretty much, the above code should just return
ValueB | sec000
Select columnA, columnB
from table
where columnB <> 'sec999'
that code returns
ValueA | sec000
ValueB | sec000
but, I just want
ValueB | sec000
how would I do it?
Posts
Select columnA, columnB
from table
where columnB != 'sec999'
? I thought != exists.
e: oooh I get what you're saying now.
Is how I would do it.
Or try the minus keyword instead of "left join <tableB>... where <tableB>.<key> is null" if your version of sql supports it.
That's pretty much equivalent and much easier to read.
I have a habit of avoiding the IN syntax if the subquery could possibly return a large result. I found that in mssql 2000 it performed very slowly, but they did something in later versions that made it so there's very difference in performance. (This is just an experience thing, I haven't actually read anything that says use one over the other).
Especially if you're working with a large data set, using a non-correlated subquery and a NOT IN clause often forces your database to test each value in the outer query against every value in the subquery's result set before it can positively say "No, this value is not in that set". Say your table had a million rows, and a quarter-million of them have "sec999" in ColumnB. With the NOT IN query, for every row in your table the query compares the ColumnA value to the set of 250,000 values returned by the subquery. When ColumnA does exist in the subquery set, it will stop the comparisons once it's found a match, but in many cases it will have to make all 250,000 comparisons before it can affirm that ColumnA doesn't exists in the subquery set. Worst case scenario in this example would be upwards of 200 trillion compare operations, I think.
With the correlated subquery, your RDBMS can pull out that arbitrary "1" value quickly, especially if you're working with indexed columns. So using the million-row table example above, instead of having to make up to 250,000 comparisons for each row in the table, the RDBMS just has to run the correlated subquery once per row and see if the result set is empty or not. Exactly how big your table needs to be before you'll really notice the performance improvement is going to depend on your DB server's I/O, CPU and memory speed, and probably your flavor of RDBMS. On the 9i, 10g or 11g Oracle systems I work with, using relatively recent hardware, I find on any table with more than a couple thousand rows a NOT EXISTS query will be noticeably faster than a NOT IN equivalent.
I thank you for this information and will try to remember this on my database.