So I have this database in SQL Server 2005. My queries are running slowly. After checking the execution plan, it seems the problem comes from one specific column, where I search for a specific value. That table is basically a custom-built dictionary of English words, and I'm searching for a specific word. It scans the entire table looking for that word, which is too slow. The solution, IMO, is to build an index.
Problem is, the column is of type nvarchar(max). I made it that way to accommodate words of any possible length. SQL Server apparently only allows index keys up to 900 bytes, and since nvarchar stores two bytes per character, the widest indexable column is nvarchar(450). So I need to convert the column, something like this:

alter table Dictionary add Word900 nvarchar(450)

update Dictionary set Word900 = cast(Word as nvarchar(450))

alter table Dictionary drop column Word

(I can't make Word900 a computed column, since SQL Server won't let me drop the Word column it would depend on.)
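Then I'd build the index itself, presumably something like this (the index name is just what I'd pick, nothing official):

```sql
-- Index the narrowed column so word lookups become index seeks
-- instead of full table scans.
create index IX_Dictionary_Word900 on Dictionary (Word900)
```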
I see three problems with this:
1) Won't it be wasteful of memory space? I mean, won't all words be fixed to 900 bytes, even if they take less than that?
2) How will that affect my queries? Will I need to pad each word I'm searching for with spaces to get them up to 900 bytes to compare with the Word900 column?
3) Obviously, words that are more than 900 characters in length will be truncated. How do I know if there are any? Is there a way to search the Word column to return the longest word?
As a bonus question: do you know of some other, undocumented way of making an index on an nvarchar(max) column?
Thanks in advance for your help.
Varchar and nvarchar are very different from char in that they are NOT padded with spaces.
http://msdn.microsoft.com/en-us/library/ms176089.aspx
Nvarchar is functionally identical to varchar, except it stores data as Unicode instead of ASCII. If it's just English words in your database and you have no plans to support multibyte languages, you should use varchar instead. Nvarchar uses twice as much storage space because Unicode characters are stored as two bytes each, instead of the usual one byte per character for ASCII.
Depends what you're comparing them against. If the word you're searching for has no whitespace padding, and the matching value in the table also has no whitespace padding that you've explicitly added, then you can probably just do a straight comparison. It's easy to test, though: run an exact-match query and a trimmed-match query, and if the exact match returns no results but the trimmed one does, you know the values in your dictionary table have some space padding. Chances are both queries will return the same results; the database isn't going to mangle your data on insert.
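A sketch of the kind of test pair I mean (the word 'example' is just a placeholder):

```sql
-- Exact comparison: the first query to try.
select * from Dictionary where Word = N'example'

-- Trimmed comparison: if only this one returns rows,
-- the stored values carry trailing spaces.
select * from Dictionary where rtrim(Word) = N'example'
```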
In Oracle I would do the following:
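Something along these lines (assuming your table and column names):

```sql
-- Oracle: LENGTH() returns the character count of each word;
-- MAX() finds the longest one.
select max(length(Word)) from Dictionary
```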
I think this would be the equivalent in SQL Server, but it could be off. But seriously, what language has words longer than 900 characters?? A realistic, safe limit for English words would probably be something like 50 characters unless you're talking about chemical names. Even the longest German word is only 64 characters.
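The SQL Server version of that check might look like this (a sketch; note that LEN counts characters, so double it to get bytes for an nvarchar column):

```sql
-- T-SQL: LEN() returns the character count (trailing spaces excluded).
select max(len(Word)) from Dictionary

-- To also see which word it is:
select top 1 Word, len(Word) as WordLength
from Dictionary
order by len(Word) desc
```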
My dictionary is built from an online source that includes chemical and scientific names, as well as typos and missing spaces that wrongly merge multiple words together. So I can't assume the maximum length will be the length of the longest word in the language. After running your query, it turns out the longest word in my database is 255 characters long, or 510 bytes since it's an nvarchar.
So I guess I'll try converting to nvarchar(255) (that's 510 bytes, well under the 900-byte index limit) and see what happens. Thanks!
I'm too lazy to look it up, but I think you only need datalength when you care about byte counts, like for text or binary columns; for counting characters in string columns, len is the usual choice.
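For what it's worth, here's how the two functions differ on an nvarchar value (a quick illustration, not from the thread):

```sql
-- LEN counts characters; DATALENGTH counts bytes
-- (two bytes per character for nvarchar).
select len(N'apple') as CharCount,        -- 5
       datalength(N'apple') as ByteCount  -- 10
```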