SQL problem

JeiceJeice regular
edited February 2010 in Help / Advice Forum
So, let's say I have the following table

ColumnA  |   Column B
ValueA      |   sec999
ValueA      |   sec000
ValueB      |    sec000


I want to return all rows that don't have sec999 in ColumnB. But, since ValueA has sec999 and sec000, I don't want that to appear in my results.

So pretty much, the above code should just return

ValueB | sec000
Select columnA, columnB
from table
where columnB <> 'sec999'

that code returns

ValueA | sec000
ValueB | sec000

but, I just want
ValueB | sec000

how would I do it?

Jeice on

Posts

  • PooshlmerPooshlmer Registered User regular
    edited February 2010
    Look up NOT IN

    Pooshlmer on
  • urahonkyurahonky Cynical Old Man Registered User regular
    edited February 2010
    Can't you do

    Select columnA, columnB
    from table
    where columnB != 'sec999'

    ? I thought != exists.

    e: oooh I get what you're saying now.

    urahonky on
  • exmelloexmello Registered User regular
    edited February 2010
    Select columnA, columnB
    From table as qryA
    LEFT JOIN
    (Select columnA, columnB
    From table
    where columnB <> 'sec999') as qryB
    ON qryA.columnA = qryB.columnA
    Where gryB.columnA is null
    

    Is how I would do it.

    Or try the minus keyword instead of "left join <tableB>... where <tableB>.<key> is null" if your version of sql supports it.

    exmello on
  • JHunzJHunz Registered User regular
    edited February 2010
    I'd personally do it like so (obviously syntax varies a little between products):
    select columnA, columnB from table
    where columnA not in (select distinct columnA from table where columnB = 'sec999')
    

    JHunz on
    bunny.gif Gamertag: JHunz. R.I.P. Mygamercard.net bunny.gif
  • exmelloexmello Registered User regular
    edited February 2010
    JHunz wrote: »
    I'd personally do it like so (obviously syntax varies a little between products):
    select columnA, columnB from table
    where columnA not in (select distinct columnA from table where columnB = 'sec999')
    

    That's pretty much equivalent and much easier to read.

    I have a habit of avoiding the IN syntax if the subquery could possibly return a large result. I found that in mssql 2000 it performed very slowly, but they did something in later versions that made it so there's very difference in performance. (This is just an experience thing, I haven't actually read anything that says use one over the other).

    exmello on
  • vonPoonBurGervonPoonBurGer Registered User regular
    edited February 2010
    NOT IN will work, but a correlated subquery using NOT EXISTS will often give you much better performance. Like so:
    SELECT ColumnA FROM table t WHERE NOT EXISTS
    (SELECT 1 FROM table WHERE ColumnA = t.ColumnA AND ColumnB = 'sec999');
    

    Especially if you're working with a large data set, using a non-correlated subquery and a NOT IN clause often forces your database to test each value in the outer query against every value in the subquery's result set before it can positively say "No, this value is not in that set". Say your table had a million rows, and a quarter-million of them have "sec999" in ColumnB. With the NOT IN query, for every row in your table the query compares the ColumnA value to the set of 250,000 values returned by the subquery. When ColumnA does exist in the subquery set, it will stop the comparisons once it's found a match, but in many cases it will have to make all 250,000 comparisons before it can affirm that ColumnA doesn't exists in the subquery set. Worst case scenario in this example would be upwards of 200 trillion compare operations, I think.

    With the correlated subquery, your RDBMS can pull out that arbitrary "1" value quickly, especially if you're working with indexed columns. So using the million-row table example above, instead of having to make up to 250,000 comparisons for each row in the table, the RDBMS just has to run the correlated subquery once per row and see if the result set is empty or not. Exactly how big your table needs to be before you'll really notice the performance improvement is going to depend on your DB server's I/O, CPU and memory speed, and probably your flavor of RDBMS. On the 9i, 10g or 11g Oracle systems I work with, using relatively recent hardware, I find on any table with more than a couple thousand rows a NOT EXISTS query will be noticeably faster than a NOT IN equivalent.

    vonPoonBurGer on
    Xbox Live:vonPoon | PSN: vonPoon | Steam: vonPoonBurGer
  • JHunzJHunz Registered User regular
    edited February 2010
    Yeah, not exists is probably significantly better (in large-dataset cases). That's what I get for posting SQL at 2 AM.

    JHunz on
    bunny.gif Gamertag: JHunz. R.I.P. Mygamercard.net bunny.gif
  • Gilbert0Gilbert0 North of SeattleRegistered User regular
    edited February 2010
    NOT IN will work, but a correlated subquery using NOT EXISTS will often give you much better performance. Like so:
    SELECT ColumnA FROM table t WHERE NOT EXISTS
    (SELECT 1 FROM table WHERE ColumnA = t.ColumnA AND ColumnB = 'sec999');
    

    Especially if you're working with a large data set, using a non-correlated subquery and a NOT IN clause often forces your database to test each value in the outer query against every value in the subquery's result set before it can positively say "No, this value is not in that set". Say your table had a million rows, and a quarter-million of them have "sec999" in ColumnB. With the NOT IN query, for every row in your table the query compares the ColumnA value to the set of 250,000 values returned by the subquery. When ColumnA does exist in the subquery set, it will stop the comparisons once it's found a match, but in many cases it will have to make all 250,000 comparisons before it can affirm that ColumnA doesn't exists in the subquery set. Worst case scenario in this example would be upwards of 200 trillion compare operations, I think.

    With the correlated subquery, your RDBMS can pull out that arbitrary "1" value quickly, especially if you're working with indexed columns. So using the million-row table example above, instead of having to make up to 250,000 comparisons for each row in the table, the RDBMS just has to run the correlated subquery once per row and see if the result set is empty or not. Exactly how big your table needs to be before you'll really notice the performance improvement is going to depend on your DB server's I/O, CPU and memory speed, and probably your flavor of RDBMS. On the 9i, 10g or 11g Oracle systems I work with, using relatively recent hardware, I find on any table with more than a couple thousand rows a NOT EXISTS query will be noticeably faster than a NOT IN equivalent.

    I thank you for this information and will try to remember this on my database.

    Gilbert0 on
Sign In or Register to comment.