The mapping of keys to indices of a hash table is called as hash function.It is the usually the composition of two maps:
- Hash Code Map - Maps keys to integers when they are not integers.
- Compression Map - Maps wide range of integers to size of hash table.If S is size of table then integers need to brought to range 0 to [S -1]
Hash Code Maps
Integer Cast - Reinterpret the bits of number as integer for numeric type with 32 bits or less.
Component sum - For numeric type with more than 32 bits,we can chunk into group of 32 bits and then interpret as integers.Component sum is not a good choice for strings as it leads to more number of collisions.For e.g. if I want to convert 'cusp' to integer value firstly ASCII value will be fetched and then added up together to obtain the sum and integer value will be obtained.But there are huge no of possibilities that many words of english dictionary have the same sum.If sum is same,multiple keys are mapped to same location resulting in high number of collisions.One of the technique's used to overcome this problem is Polynomial accumulation
For strings of english language,combine the ASCII values(ao, a1 ,a2,..an-1) and view them as coefficients of a polynomial.
ao + a1x + a2x2 + .an-1xn-1
The integer value is computed with Horner's rule for certain fixed value of x.The choice of x=33,37,39 or 41 gives at most 6 collisions on vocabulary of 50,000 words,
ao + x(a1+ x(a2 + .x(an-2 +x an-1)))
Use the remainder method - Use the modulo method for mapping high range of integers to size of table.If h is the hash function,k is the key and s is the size of table then use
h(k) = k mod s
It uses two hash functions h1 and h2.h1(k) is the position in the table to check for key k and h2 determines the offset from first position we use when searching again for k.Offset is how much distance has to covered while mapping the key to slot.The offset for linear probing is always 1,as we jumped one location ahead to store the element in the table.
Consider a series of number 18,41,22,44,59,32,31,73 to be stored in hash table.
h1(k) = k mod 13(Fig 1)
h2(k) = 8 - (k mod 8)
Hashing with non-integer keys
We can find some technique/way to covert keys to integers.For e.g if we have a mobile number like 98134-93789,we can remove hyphen store it like 9813493789 or if we have a string then we can add ASCII values and then use hash functions on integer values.
Features of Good Hash Function
- It is faster to compute.
- Distribution of keys is uniform throughout the table.
- It minimizes the probability of collisions.(No matter how good hash function we use,there is some probability of collisions in the slots).