# On Hashing functions and the development thereof

Registered User regular
edited February 2008
I'm assigned to develop an implementation of a hash-like data structure that uses an array of queue objects.

I need to find a way to hash the key and figure out which queue in which to store the value.

My first thought was just to take the first 8 bits or whatever, convert that to an integer, and mod it by the array size and be done with it, but I need a C++ function that grabs those first 8 bits from a given index. What's the best way to implement that in C++? Is there an easy function I can call that will just read that binary information?

Legoman05 on

## Posts

• Registered User regular
edited February 2008
Are the keys strings? Just add up the total value of their ASCII chars.
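A minimal sketch of that idea, assuming the keys are std::strings (the function name and signature are mine, not from the thread):

```cpp
#include <cstddef>
#include <string>

// Sum the character values of the key, then mod by the table size
// to pick which queue the value lands in.
std::size_t hashKey(const std::string& key, std::size_t tableSize) {
    std::size_t sum = 0;
    for (unsigned char c : key)
        sum += c;               // accumulate raw byte values
    return sum % tableSize;
}
```

Note that anagrams ("abc", "cba") collide under this scheme, which is what the later posts address.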

Lewisham on
• Registered User regular
edited February 2008
int randombits = 3254654;
// shift the top 8 bits of a 32-bit int down into the low byte,
// truncate to a char, then widen back to an int
int i = static_cast<int>(static_cast<char>(randombits >> 24));

Just look up bit operators. Also look up the endianness of whatever system you're on.
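Mask-and-shift is an equivalent way to write the cast trick above; these helper names are mine, for illustration:

```cpp
#include <cstdint>

// Bits 24..31 of a 32-bit value, shifted down into the low byte.
std::uint32_t topByte(std::uint32_t value) {
    return (value >> 24) & 0xFFu;
}

// Bits 0..7: just mask off everything above the low byte.
std::uint32_t lowByte(std::uint32_t value) {
    return value & 0xFFu;
}
```

Using unsigned types sidesteps the sign-extension surprises you can get when shifting or casting through a (possibly signed) char.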

<< not just for streams!

falsedef on
• Registered User regular
edited February 2008
Legoman05 wrote: »
My first thought was just to take the first 8 bits or whatever, convert that to an integer, and mod it by the array size and be done with it, but I need a C++ function that grabs those first 8 bits from a given index.

If I'm reading this right, you want to know how to index a string? Just use [], as in arrays.
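In other words, indexing the string already gives you the "first 8 bits" as a numeric value (the helper name here is mine):

```cpp
#include <string>

// key[0] is the first character of the key -- its byte value is
// exactly the "first 8 bits" the OP asked about, ready to mod.
unsigned char firstByte(const std::string& key) {
    return static_cast<unsigned char>(key[0]);
}
```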

Janin on
• Registered User regular
edited February 2008
Depending on the length of your keys (which I assume are strings) you might find it practical to do it like this:

Assume the key has characters c1, c2, ... , cN.

base hash value = 3*c1 + 3^2*c2 + 3^3*c3 + ... + 3^N * cN

The 3 here is just a fairly small base; it doesn't have to be 3.

Then actual location = base hash value % TABLE_SIZE

Keep in mind you don't want to be doing 100000-character key values by this method. But it should ensure that even if you have similar keys, collisions should be minimized and you won't have every key taking up space in the same queue. That would be the problem with taking the first N bits as an integer and modding it by the table size: every single key that shares the same first N bits is going to be in the same queue. So if your data set is, say, a bunch of names, then the bucket containing "John Smith" (for example) will be burdensomely slow to deal with.
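A sketch of that scheme, assuming string keys (the function name and the choice of std::size_t are mine; unsigned overflow just wraps, which is harmless here since we mod at the end):

```cpp
#include <cstddef>
#include <string>

// hash = 3*c1 + 3^2*c2 + ... + 3^N*cN, then mod TABLE_SIZE.
std::size_t polyHash(const std::string& key, std::size_t tableSize) {
    std::size_t hash = 0;
    std::size_t power = 3;          // 3^1 for the first character
    for (unsigned char c : key) {
        hash += power * c;          // weight each char by its power of 3
        power *= 3;                 // advance to the next power
    }
    return hash % tableSize;
}
```

Because each position gets a different weight, "abc" and "cba" now hash differently, unlike a plain character sum.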

Cold Koala on