On Hashing functions and the development therof

Legoman05Legoman05 Registered User regular
edited February 2008 in Help / Advice Forum
I'm assigned to develop an implementation of a hash-like data structure, that uses an array of queue objects.

I need to find a way to hash the key and figure out which queue in which to store the value.

My first though was just to take the first 8 bits or whatever, convert that to an integer, and modulus it by the array size and be done with it, but I need a C++ function that grabs those first 8 bits from a given index. What's the best way to implement that in c++? Is there an easy function I can call that will just read that binary information?

Legoman05 on

Posts

  • LewishamLewisham Registered User regular
    edited February 2008
    Are the keys strings? Just add up the total value of their ASCII chars.

    Not sure about your actual questions.

    Lewisham on
  • falsedeffalsedef Registered User regular
    edited February 2008
    int randombits = 3254654;
    int i = static_cast<int>(static_cast<char>(randombits >> 24));
    

    Just look up bit operators. Also look up the endianness of whatever system you're on.

    << not just for streams!

    falsedef on
  • JaninJanin Registered User regular
    edited February 2008
    Legoman05 wrote: »
    My first though was just to take the first 8 bits or whatever, convert that to an integer, and modulus it by the array size and be done with it, but I need a C++ function that grabs those first 8 bits from a given index.

    If I'm reading this right, you want to know how to index a string? Just use [], as in arrays.

    Janin on
    [SIGPIC][/SIGPIC]
  • Cold KoalaCold Koala Registered User regular
    edited February 2008
    Depending on the length of your keys (which I assume are strings) you might find it practical to do it like this:

    Assume the key has characters c1, c2, ... , cN.

    base hash value = 3*c1 + 3^2*c2 + 3^3*c3 + ... + 3^N * cN

    Where 3 is a fairly small number, it doesn't have to be 3.

    Then actual location = base hash value % TABLE_SIZE

    Keep in mind you don't want to be doing 100000-character key values by this method. But it should ensure that even if you have similar keys, collisions should be minimized and you won't have every key taking up space in the same queue. That would be the problem with taking the first N bits as an integer and modding it by the table size: every single key that shares the same first N bits is going to be in the same queue. So if your data set is, say, a bunch of names, then the bucket containing "John Smith" (for example) will be burdensomely slow to deal with.

    Cold Koala on
Sign In or Register to comment.