Our new Indie Games subforum is now open for business in G&T. Go and check it out, you might land a code for a free game. If you're developing an indie game and want to post about it, follow these directions. If you don't, he'll break your legs! Hahaha! Seriously though.
Our rules have been updated and given their own forum. Go and look at them! They are nice, and there may be new ones that you didn't know about! Hooray for rules! Hooray for The System! Hooray for Conforming!

Oracle SQL help

urahonkyurahonky Registered User regular
edited February 2011 in Help / Advice Forum
Hey guys so I need some help. I'm working with Oracle (which, to me, feels more of a pain in the ass than MySQL but it's what the instructor is using) and we're on the topic of subqueries, which are my bane.

This is my main query:
SELECT c_last, SUM(inv_price)
FROM customer, inventory
WHERE c_id IN (SELECT UNIQUE c_id
               FROM orders
               WHERE o_id=1)
AND
INV_PRICE IN (SELECT UNIQUE INV_PRICE
              FROM inventory
              WHERE inv_id IN (SELECT order_line.inv_id
                               FROM order_line
                               WHERE o_id=1))
GROUP BY c_last;

Gets the last name of the customer, but the sum is way off. The sum is saying 879.68. Now I took apart the last part, and just ran:
SELECT UNIQUE INV_PRICE
              FROM inventory
              WHERE inv_id IN (SELECT order_line.inv_id
                               FROM order_line
                               WHERE o_id=1);

And the only two things returned are:

259.99
59.99

Why the hell am I getting 879.68? I'm at my wits end...

urahonky on
Games completed recently: Resident Evil 4: HD Edition, Typing of the Dead: Overkill, Sleeping Dogs, Dishonored, On the Rain Precipice of Darkness 1-4.

Posts

  • DocDoc Registered User, ClubPA regular
    edited February 2011
    I'd bet ten cents that inv_price also a column name in one of those tables.

  • urahonkyurahonky Registered User regular
    edited February 2011
    Well I tore apart my query statement and found out for some reason it was returning a bunch of different numbers, and unique helped out. But I have a question... If I need to get the sum of something, and there's a quantity involved, how do I do that?

    In the order_line table there is a row: OL_QUANTITY and one of them has a 2 instead of 1... How do I run it twice?

    e: Unfortunately you'd lose the 10 cents :P It's only in the inventory table.

    Games completed recently: Resident Evil 4: HD Edition, Typing of the Dead: Overkill, Sleeping Dogs, Dishonored, On the Rain Precipice of Darkness 1-4.
  • DocDoc Registered User, ClubPA regular
    edited February 2011
    Well in any case, the sum is summing all of your inv_prices in that table, not just the unique ones, because that's what you are telling it to do.

  • urahonkyurahonky Registered User regular
    edited February 2011
    Yeah I found that out the hard way, when I took apart my query again. I guess this is why you test each query before you place it in as a subquery. That way you could see it.

    Games completed recently: Resident Evil 4: HD Edition, Typing of the Dead: Overkill, Sleeping Dogs, Dishonored, On the Rain Precipice of Darkness 1-4.
  • VThornheartVThornheart Registered User regular
    edited February 2011
    Also, I wonder...

    it looks like your query is doing a cross join on customer and inventory.

    This may end up resulting in multiple duplicate rows with INV_PRICE, even if the inner query to select the price is UNIQUE...

    Think of it this way:

    if you have 1 customer and 2 inventory items and do a cross join, you're going to get 2 rows back: one for each combination of customer and inventory item.

    If you have 2 customers and 2 inventory items and do a cross join, you're going to get 4 rows back: again, one for each combination of customer and inventory item.

    If you have 2 customers and 2 inventory items and do an inner join on whatever relates them, you're going to only get rows back for customers that have bought those inventory items: it could be 4, if each customer bought each item. But more than likely, it will be less than that. Depending on the way the database is storing info, it could be *more* rows (for example, it looks like they can likely place orders multiple times for the same item, in which case multiple results may return).

    Now, the WHERE clauses look like they're singling out only rows from customers who've placed an order AND inventory items that were part of an order...

    ... BUT there's nothing in that query making sure that the order the customer placed that was pulled from that cross join is actually related to the order of the inventory item pulled from that cross join.

    As a result, you're going to get some bizzare mixes, as it counts orders for items that a customer never made (as long as that customer made SOME order SOMEWHERE, and that item was ordered SOMETIME by SOMEONE... doesn't matter who).

    Just a minute, let me see if I can provide more helpful advice. It may help if we had the schema for the tables involved in the question, can you provide that?

    EDIT: Also, it looks like those subqueries have a specific order number in them... was that intentional, or was that just for testing?

    EDIT 2: I think the real problem in your query is that the subqueries aren't doing what you're hoping that they'll do... they're not providing the full chain of relations between a customer and the inventory items they've purchased. I'm making a follow up post right now to try and explain, but I wanted to put this here really quick so my previous response isn't too confusing.

    $14,276 for Child's Play from the Cookie Brigade at PAX Prime 2011!!!
    Join the Cookie Brigade Forum for PAX Prime 2011 now! We need your help! If you decide to join, sign up at the Cookie Brigade website!
  • VThornheartVThornheart Registered User regular
    edited February 2011
    Let's start by breaking apart the sub queries and seeing what you'd really get out of them.
    (QUERY_1)
    SELECT order_line.inv_id
                                  FROM order_line
                                  WHERE o_id=1
    
    This is saying "Give me all order lines specifically from the Order with an ID of 1". I don't know if you necessarily want to be that restrictive, but let's just go with it for analysis purposes as that's what it was.
    (QUERY_2)
    SELECT UNIQUE INV_PRICE
                  FROM inventory
                  WHERE inv_id IN (QUERY_1))
    
    This is saying "give me the inventory price for each inventory item found, where we have found an order line for that inventory item that was in Order #1... and make sure that duplicate inventory prices are only returned once"

    At this point its beginning to look questionable: what if two inventory items have the same price?
    (QUERY_3)
    SELECT UNIQUE c_id
                  FROM orders
                  WHERE o_id=1
    
    This is saying "give me the Customer ID from Order #1". You don't need unique here necessarily, because Order #1 can only have one Customer ID... if you truly meant this to be restricted to just one order.
    SELECT c_last, SUM(inv_price)
    FROM customer, inventory
    WHERE c_id IN (QUERY_3)
    AND
    INV_PRICE IN (QUERY_2)
    GROUP BY c_last;
    
    This is saying "for each combination of customer and inventory, where the Customer was the customer of Order #1 and the price of the inventory item was one of the prices from Order #1, sum those prices.

    Do you see where this goes wrong?

    There's a few problems:

    1) Using the inventory price to identify the inventory item in the WHERE clause is problematic, as two items may share the same inventory price. Use the ID.
    2) The "FROM customer, inventory" is causing all sorts of chaos here that makes this both more complicated and more buggy than it needs to be: if this query ever works, it will likely be highly situational as a result. What is happening here is a cross join (not sure if they've covered that in the class yet?), which means that every combination of customer and inventory is being examined. This means that, when you get the resultant rows, you'll be getting one row not just for the customer who happened to place Order #1, but for every inventory item that happens to have the inventory price returned from QUERY_2. This may, by coincidence, be the number of rows that you want: but it will only be coinciental and situational.
    3) The moment you pull out the Order #1 specific clauses, these queries will fail entirely: suddenly you'll have a row returned for every customer who's ever placed an order, and every item that has ever been ordered, regardless of who ordered it or what order the item was a part of.
    4) (Just for reference) the UNIQUE clauses probably aren't actually helpful here, as the IN clause doesn't care whether it matches once, twice, or a million times: all its concerned with is whether the value passed in from the current row exists in the set provided. Its a red herring.

    With these three problems combined (and possibly more that I'm not seeing at the moment), you're going to want to potentially examine a different, easier solution that better fits the nature of the problem you're trying to solve.

    Have they talked about inner joins yet, by any chance? That would be the best approach, but if they haven't mentioned it you may not be allowed to use it.

    Let us know... I'll spoiler my inner join recommendation in case it isn't relevant due to what they've taught so far.
    Spoiler:

    My personal advice is, if at all possible, avoid using subqueries to dictate the relationships between tables in a query: and also be very careful of cross joins.

    Using subqueries to establish relationships between Order and Customer, for example, is both confusing and ultimately incorrect unless you do a very specific subquery as you did: if you were to have returned all Orders to make the query more generic, the IN clause wouldn't have been able to actually establish the link between Customers and Orders at all: only the knowledge that a customer had placed an order at some point in time.

    Cross joins are often dangerous, and often not what a person actually needs. When you are starting at point A (Customers, in this case), and want to get to point B (inventory prices), you need to examine the specific relationships that exist in every table that leads you directly between points A and B, and establish those relationships in the query such that a single row will result for each logical combination of A and B, and no extra or illogical rows will exist. A cross join combines every possible combination, however, which means that you've got a much harder task in your WHERE clause at that point to weed out the irrelevant rows.

    You can do it manually, wiring up the ID relationships in the WHERE clause: but INNER JOIN exists as a nice shortcut for that, and subqueries in the WHERE clause are not going to do the job of properly weeding out the "bad" combinations created by the cross join for the reasons stated above.

    I hope that this helps... please let me know if something's confusing or you need more information... and let us know if they've already covered JOINs and thus we can talk about them a bit more freely.

    $14,276 for Child's Play from the Cookie Brigade at PAX Prime 2011!!!
    Join the Cookie Brigade Forum for PAX Prime 2011 now! We need your help! If you decide to join, sign up at the Cookie Brigade website!
  • urahonkyurahonky Registered User regular
    edited February 2011
    Wow, holy crap VThorn, you're amazing. Thank you... We have learned about joins and stuff, but there's an exam on Tuesday and this is a perfect review.

    Games completed recently: Resident Evil 4: HD Edition, Typing of the Dead: Overkill, Sleeping Dogs, Dishonored, On the Rain Precipice of Darkness 1-4.
  • VThornheartVThornheart Registered User regular
    edited February 2011
    No problem, I'm glad to help! =)

    Just remember (I always try to keep this in mind when I make queries) the real essence of them... this applies to most basic types of "I have a specific question that needs a specific answer" queries, but not to like data mining kind of queries or other craziness:

    SELECT
    (*Define what you want in your results here: and remember [as in your question about handling quantity] that you can create new fields here that perform mathematical operations on already existing fields *)
    FROM
    (*Define the needed "chain of relationships" between all the tables from which you must obtain results here: 99 times out of 100, this will either be a single table or a set of JOINs, and almost always INNER JOINs. Only OUTER JOINs in very specific scenarios like when you want results even if data doesn't exist in a table you're joining with. Almost never the implied Cross Join (implied when you simply do "FROM x, y"), unless you truly want every possible combination of the two rows*)
    WHERE
    (*Define what results should be chosen or ignored from among the previously defined relationships: Remember than an IN clause with a subquery should be used for weeding out or including a subset of results, but not for defining relationships between specific data. Think of "IN" as a "filter", and not the establishment of a relation, and it will help!*)
    (OPTIONAL: GROUP BY if your results need to be grouped by one or more fields)

    Have at it, you're got this! =)

    $14,276 for Child's Play from the Cookie Brigade at PAX Prime 2011!!!
    Join the Cookie Brigade Forum for PAX Prime 2011 now! We need your help! If you decide to join, sign up at the Cookie Brigade website!
Sign In or Register to comment.