Wednesday, January 27, 2016

HASHING & HASH COLLISSION in SQL SERVER

HASHING:
Hashing is the mechanism/algorithm to generate hash key values for provided inputs, there is different SQL server algorithms/in build functions to generate hash values.
HASHBYTES is the function to generate hash values using algorithms (MD2 | MD4 | MD5 | SHA | SHA1 | SHA2_256 | SHA2_512)
 
We also use CHECKSUM & BINARY_CHECKSUM to generate hash values.

 

HASH COLLISSION:  When hashing mechanism generate same Hash values for not very similar inputs.

Execute below queries for examples and watch result marked in particular fashion.
Examples:
DECLARE @V Varchar(88) ='AAADB'
DECLARE @V1 Varchar(88) = 'AAAAAAAAAAAAAAAAAAADB'
SELECT CHECKSUM(@V) AS HashValue,@V AS InputString
UNION ALL
SELECT CHECKSUM(@V1) AS HashValue,@V1 AS ChangedInputString

SET @V = 'DBAAADB'
SET @V1 = 'DBAAAAAAAAAAAAAAAAAAADB'
SELECT  CHECKSUM(@V) AS CHECKSUM_HashValue, @V AS  InputString
UNION ALL
SELECT  CHECKSUM(@V1) AS CHECKSUM_HashValue, @V1 AS ChangedInputString

SET @V = 'DBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADB'
SET @V1 = 'DBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADB'

SELECT  HASHBYTES('SHA', @V) AS SHA_HASHBYTES_HashValue, @V AS InputString
UNION ALL
SELECT  HASHBYTES('SHA', @V1) AS SHA_HASHBYTES_HashValue, @V1 AS ChangedInputString

SELECT  HASHBYTES('SHA1', @V) AS SHA1_HASHBYTES_HashValue, @V AS InputString
UNION ALL
SELECT  HASHBYTES('SHA1', @V1) AS SHA1_HASHBYTES_HashValue, @V1 AS ChangedInputString

SELECT HASHBYTES('SHA2_256', @V) AS SHA2_HASHBYTES_HashValue, @V AS InputString
UNION ALL
SELECT HASHBYTES('SHA2_256', @V1) AS SHA2_HASHBYTES_HashValue, @V1 AS ChangedInputString

SELECT HASHBYTES('SHA2_512', @V) AS SHA2_512_HASHBYTES_HashValue, @V AS InputString
UNION ALL
SELECT HASHBYTES('SHA2_512', @V1) AS SHA2_512_HASHBYTES_HashValue , @V1 AS ChangedInputString 

Queries Output:

From above result set it is very much clear hashing functions/algorithms generate same hash values for different input strings which result to hash collision.