SQL Index and variable conversion

Richy · September 2009

So I have this database in SQL Server 2005. My queries are running slowly. After checking with the query optimizer, it seems that the problem comes from one specific column, where I search for a specific value. That table is basically a custom-built dictionary of English words, and I'm searching for a specific word. It seems to scan the entire table looking for that word, which is too slow. The solution, IMO, is to build an index.

Problem is, the column is of type nvarchar(max). I made it that way to accommodate words of any possible length. SQL Server apparently only allows indexes on columns with a maximum size of 900 bytes. so I need to convert the column to nvarchar(900).

alter table Dictionary add Word900 as cast(Word as nvarchar(900))
alter table Dictionary drop column Word

I see three problems with this:

1) Won't it be wasteful of memory space? I mean, won't all words be fixed to 900 bytes, even if they take less than that?

2) How will that affect my queries? Will I need to pad each word I'm searching for with spaces to get them up to 900 bytes to compare with the Word900 column?

3) Obviously, words that are more than 900 characters in length will be truncated. How do I know if there are any? Is there a way to search the Word column to return the longest word?

As a bonus question: do you know of some other, undocumented way of making an index on an nvarchar(max) column?

Thanks in advance for your help.

exmello · September 2009

First things first, read up on varchar and nvarchar.

They are very different than char in that they are NOT padded with spaces.

http://msdn.microsoft.com/en-us/library/ms176089.aspx

Use varchar when the sizes of the column data entries vary considerably.

vonPoonBurGer · September 2009

Richy wrote:

1) Won't it be wasteful of memory space? I mean, won't all words be fixed to 900 bytes, even if they take less than that?

The "var" in varchar is short for "variable", as in variable length. You use a varchar when you want to store text but you're not sure how long the text will be. Which is almost always. A column defined as varchar(900) will store up to 900 characters, but if you put the word "foo" in that column, it only uses three characters' worth of storage.

Nvarchar is functionally identical to varchar, except it stores data in unicode instead of ASCII. If it's just English words in your database and you have no plans to support multibyte languages you should use varchar instead. Nvarchar uses twice as much storage space because unicode characters are two bytes each instead of the usual one byte per character for ASCII.

Richy wrote:

2) How will that affect my queries? Will I need to pad each word I'm searching for with spaces to get them up to 900 bytes to compare with the Word900 column?

Depends what you're comparing them against. If the word you're searching for has no whitespace padding, and the matching value in the table also has no whitespace padding that you've explicitly added, then you can probably just do a straight comparison. It's easy to test though:

select * from dictionary where word900 = 'word';
select * from dictionary where ltrim(rtrim(word900)) = 'word';

If the first query returns no results but the second one does, you know that the values in your dictionary table have some space padding. Chances are both queries will return the same results. The database isn't going to mangle your data on insert.

Richy wrote:

3) Obviously, words that are more than 900 characters in length will be truncated. How do I know if there are any? Is there a way to search the Word column to return the longest word?

In Oracle I would do the following:

select max(length(word900)) from dictionary;

I think this would be the equivalent in SQL Server, but it could be off:

select max(datalength(word900)) from dictionary;

But seriously, what language has words longer than 900 characters in length?? A realistic, safe limit for English words would probably be something like 50 characters unless you're talking about chemical names. Even the longest German word is only 64 characters.

Richy · September 2009

Thanks for the explanations, and for the "datalength" command I didn't know.

My dictionary is built from an online source, that does include chemical and scientific names, as well as typos and missing spaces that wrongly merge multiple words together. So I cannot assume that the maximum length will be the length of the longest word in the language. After checking with your command, it seems the longest word in my database is 255 characters long, or 510 bytes since it's an nvarchar.

So I guess I'll try converting to nvarchar(510) and see what happens. Thanks!

exmello · September 2009

I just use LEN( ) in sql server.

I'm too lazy to look it up, but I think you only use datalength for comparing text or binary columns.

vonPoonBurGer · September 2009

exmello wrote: »

I just use LEN( ) in sql server.

I'm too lazy to look it up, but I think you only use datalength for comparing text or binary columns.

Yeah, I'm not really familiar with SQL Server and in my Googling I read that LEN() wasn't appropriate for us with text. What I didn't realize was that "text" is actually a SQL Server datatype (the Oracle equivalent would probably be LONG).

Penny Arcade

Quick Links

SQL Index and variable conversion

Posts