sql server 2005 database tuning advisor yields 0% improvement (+ query optimization)

lujluj Registered User new member
edited June 2009 in Help / Advice Forum
I've been trying to speed up a set of queries in SQL Server 2005 to no avail.

The script I've written combines the use of both static tables which contain many rows of relational data and temporary tables which are created/destroyed within a single execution of the script.

For statements that only deal with temporary tables the advisor says "Statement does not reference any tables" (which makes sense I suppose). The advisor completes with "success" on all steps but unfortunately reports an "estimated improvement" of 0%.

I tried passing the script as a workload, I tried creating a trace file, both yield the same result.

It may be doing its job the way it is supposed to; that is, there may not be any way to help the performance.

However, if there is anything to suggest I try, please let me know.


On the other hand...

If the dta is in fact doing its job right, then perhaps the statement which I believe gives me the biggest hit in performance can be optimized:

(The static portion of the database is pretty simple; it contains relationships between Words, Titles, Articles, and Categories. The ArtCat table stores the relationship between Article IDs and Category IDs, TitWord table stores the relationship between Title IDs and Word IDs, etc... The only temporary table in use is #QueryWords which contains WIDs of interest.)
WHILE @i > 1
BEGIN
	INSERT INTO #Quads
	SELECT DISTINCT #QueryWords.WID, TACs.TID, TACs.AID, TACs.CID FROM
		(SELECT TAs.TID, TAs.AID, ArtCat.CID FROM
			(SELECT Ts.TID, TA.AID FROM
				(SELECT DISTINCT TitWord.TID FROM TitWord INNER JOIN #QueryWords ON TitWord.WID = #QueryWords.WID) AS Ts
				INNER JOIN TitArt AS TA ON Ts.TID = TA.TID
				WHERE
				(
					(SELECT COUNT(*) FROM TitWord WHERE TitWord.TID = Ts.TID AND TitWord.WID IN (SELECT WID FROM #QueryWords)) +
					(SELECT COUNT(*) FROM ArtWord WHERE ArtWord.AID = TA.AID AND ArtWord.WID IN (SELECT WID FROM #QueryWords)
						AND ArtWord.WID NOT IN
						(SELECT DISTINCT TW.WID FROM TitWord AS TW INNER JOIN #QueryWords AS QW ON TW.WID = QW.WID WHERE TW.TID = Ts.TID)
					)
				) = @i
			) AS TAs		
			INNER JOIN ArtCat ON TAs.AID = ArtCat.AID
		) AS TACs
		CROSS JOIN #QueryWords;
	
	IF @@ROWCOUNT > 0
		BREAK
	
	SET @i = @i - 1;
END


Any help is much appreciated.

luj on

Posts

  • exmelloexmello Registered User regular
    edited June 2009
    Take a look at the estimated query plan and get an idea about where the biggest chunk of execution time is going.

    Someone after me might reply with a superstisious premature optimization about changing count(*) to count(id) or something. The trick to performance optimization is finding out the single change that will change your query by the greatest order of magnitude, not shaving .0001ms off a 10 second query.

    Things that pop out at me are: That loop, how big is @i? That cross join, you do understand what a cross join is, right? I could explain why that is most likely bad, but you would be better off googling it and finding out yourself.

    Edit: The more I look at this, the more curious I am at what it is trying to accomplish. It is very verbose for such a simple relationship, but I don't know what to think without actually knowing what this procedure is trying to accomplish. Can you tell me what the goal of this code is? There may a much simpler solution.

    exmello on
  • lujluj Registered User new member
    edited June 2009
    Thanks exmello, I'll look into query plan analysis, or whatnot

    @i is typically between 1 and 5... sometimes as large as 7; not very large.

    The cross join is a necessary evil. For each intermediate T-A-C triple/row, a row exists in the final result with the triple + a query word for all query words (best explanation of my understanding of the definition of a cross join).

    Also, I began to wonder if a UNION would suffice; instead of counting those in TitWord then counting those in ArtWord that aren't in TitWord, count those in the union of the two. Thinking about it now, the amount of processing might be the similar. I don't know how to trust the optimizer yet. Either way, I'll try that too.

    Edit: the goal of the code is to get T-A pairs which encompass @i distinct WIDs in #QueryWords (through the relationships TitWord and ArtWord), create T-A-C triples through the ArtCat relationship, then end with W-T-A-C quads by simply cross-joining.

    It is quite verbose. I like explicit joins over the implicit joins (especially for queries this large) in order to spare people in the future (or now) from trying to decipher it. I inherited code that contained implicit joins and that changed my whole perspective. Indenting also helps.

    luj on
  • exmelloexmello Registered User regular
    edited June 2009
    I'm still thinking about this, it's fun trying to think outside of my own business domain. I don't have any idea for the query as a whole yet, but have you thought of wording

    SELECT COUNT(*) FROM ArtWord WHERE ArtWord.AID = TA.AID AND ArtWord.WID IN (SELECT WID FROM #QueryWords)
    AND ArtWord.WID NOT IN
    (SELECT DISTINCT TW.WID FROM TitWord AS TW INNER JOIN #QueryWords AS QW ON TW.WID = QW.WID WHERE TW.TID = Ts.TID)

    as a LEFT JOIN WHERE ____ IS NULL? Try that, union to the other Count(), and see what kind of difference that makes.

    Edit: Also, I don't know if Sql Server makes this optimization itself in the back-end, but are all of the inner DISTINCTs really necessary?

    Edit 2: Is there some way to bring everything in the WHERE (_____) = @i up from a WHERE clause into a join, and carry it up through the nested levels and don't filter @i = until the top level? Depending on how many Titles and TitArt you have, that may help the indexes do their job better.

    exmello on
  • lujluj Registered User new member
    edited June 2009
    I do think the count is the problem with the quickness of the execution. Queries took about 5 seconds before the WHERE statement was added, whereas they can now take 20 seconds to a minute and a half.

    Unfortunately, I don't know how a left-join would work in this case.

    I think I understand where you're coming from though: I've been joining on #QueryWords up until the count statement, where I for some reason say "WHERE WID IN (SELECT WID FROM #QueryWords" -- this accomplishes the same task. This might be because the TitWord and ArtWord tables are 100 000s of rows (maybe even a million).

    @ROWCOUNT > 0 BREAK.

    If there were some way to compare the unique set of WIDs from both TitWord and ArtWord (where the TID = Ts.TID and the AID = TA.AID, remember this is an insert) in less than 6 selects + 1 join, I'm sure there would be a nice improvement.

    A union would get rid of the INNER JOIN and I think a single SELECT statement, but the syntax is unacceptable to accomplish this:
    WHERE
        (SELECT COUNT(*) FROM
            (SELECT WID FROM TitWord WHERE TitWord.TID = Ts.TID AND WID IN (SELECT WID FROM #QueryWords)
                UNION
                SELECT WID FROM ArtWord WHERE ArtWord.AID = TA.AID AND WID IN (SELECT WID FROM #QueryWords)
            )
        ) = @i
    

    is incorrect syntax.

    Feel free to let me know what you think about my above rambling.

    Edit: I need the result of the SELECT to be distinct. I'm under the impression, in SQL Server, if there are two rows with the same A column, a "SELECT A FROM..." will produce duplicates (let me know if I'm wrong on this). Whether it is true or not, I'm explicit on my need for it to be distinct.

    luj on
  • Gilbert0Gilbert0 North of SeattleRegistered User regular
    edited June 2009
    You do know about the distinct keyword right? I haven't looked at the query at all but that could help you.

    select distinct A from ....

    http://www.sql-tutorial.com/sql-distinct-sql-tutorial/

    Gilbert0 on
  • exmelloexmello Registered User regular
    edited June 2009
    luj wrote: »
    I need the result of the SELECT to be distinct. I'm under the impression, in SQL Server, if there are two rows with the same A column, a "SELECT A FROM..." will produce duplicates (let me know if I'm wrong on this). Whether it is true or not, I'm explicit on my need for it to be distinct.

    I'm pretty sure the results would be the same if you left the distinct out until the top level. WHERE 3 NOT IN(1,2,3) is the same as WHERE 3 NOT IN(1,2,3,3). I seem to recall distinct forcing additional processing time, but in this case it may or may not matter. Everything really just relies on what your data is like.

    exmello on
Sign In or Register to comment.