Hashing is a method and useful technique to implement dictionaries.This method is used to perform searching,insertion and deletion at a faster rate.A function called Hash Function is used to compute and return position of the record instead of searching with comparisons.The data is stored in array called as Hash table.The Hash Function maps keys into positions in a hash table.The mapping of keys to indices of a hash table is known as Hash Function.The major requirement of hash function is to map equal keys to equal indices.
Given a key, the algorithm computes an index that suggests where the entry can be found:
index = f(key, array_size)
The value of index is determined by 2 steps
- hash = hash_func(key)
- index = hash % array_size
A hash function is purely a user choice and a good hash function has following features :
- It is quick to compute.
- Keys are distributed uniformly throughout the table.
- The load factor is defined by t= n/m where m is the slots/buckets and n is the elements in m of hash table t.
Say we want to perform certain operations on salaries of employes from employee id's 51000 to 52000.In order to access the data at a quicker rate we can use hashing(Fig 1 and Fig 2).
- Now keys(k) are are id's from 51000 to 52000, i.e the array_size would be 1000.
- Say table is the array.
- We choose a Hash Function
hash = k - 51000 index = hash % array_size
When two or more values are mapped at same location/bucket by the hash function,then collision arises.
Say we have 100 students of subject CS501 of batch 2014 and the roll no of first student is like 2014 CS501210,second is like 2014 CS501211 and the list goes on till 2104 CS501310.
- There are 100 students in class so we create a hash table of size 100.
- Say we chose a hash function of last two digits.
- 2014 CS501210 is stored at 10 location.
- Where will 2014 CS501310 go?Again at same location i.e at 10 ?
- A concept called collision arises when two or more entries are mapped at same location.
In chaining,an array of tables/links is set up indexed by the keys to the list of items with the same key.Say we have a hash function which computed modulo 5 of any series and say hash table has 6 locations in it.If two or more keys are mapped into same location we will keep on adding to the linked list.By example given below 1 key is mapped to location 0,2 keys are mapped to location 1,1 key is mapped to location 2,2 keys to location 3,3keys to location 4 and no keys are mapped to location 5.The hash values ending at 0 are stored at 0 location,ending at 1 is stored at first location and the list goes on in the similar pattern..Thus ,the problem of collision is resolved by chaining.
It is a method to resolve collision.The item will be stored at next available slot in the table if there is clash,assuming that the table is already not full.It uses less memory than chaining as links are not required but is slower than chaining as one has to move across the table for long time.
Consider Fig 4.Say a list of numbers has to be stored in a hash table of size 8 and hash function is h(k) = k%8.The no. in series are 48,26,43,61,86,10,4 ,13. Mapping is done and numbers are stored in the table.10 and 26 have same remainder(2),so 10 goes to next empty slot.i.e. at location 4,Number 4 was to be stored at location 4 but since it is occupied it will go next empty slot.i.e. at 7 and lastly 13 and 61 have same remainder (5) but 61 is already occupied so 13 will go to location 1.