Abstract
In the talk, we shall discuss quality measures for hash functions used
in data structures and algorithms, and survey positive and negative
results. (This talk is not about cryptographic hash functions.)
For the analysis of algorithms involving hash functions, it is often
convenient to assume the hash functions used behave fully randomly; in
some cases there is no analysis known that avoids this assumption. In
practice, one needs to get by with weaker hash functions that can be
generated by randomized algorithms. A wellstudied range of applications concern realizations of dynamic dictionaries (linear
probing, chained hashing, dynamic perfect hashing, cuckoo hashing and
its generalizations) or Bloom filters and their variants.
A particularly successful and useful means of classification are
Carter and Wegman's universal or kwise independent classes,
introduced in 1977. A natural and widely used approach to analyzing
an algorithm involving hash functions is to show that it works if a
sufficiently strong universal class of hash functions is used, and to
substitute one of the known constructions of such classes. This
invites research into the question of just how much independence in
the hash functions is necessary for an algorithm to work. Some recent
analyses that gave impossibility results constructed rather artificial
classes that would not work; other results pointed out natural, widely
used hash classes that would not work in a particular application.
Only recently it was shown that under certain assumptions on some
entropy present in the set of keys even 2wise independent hash
classes will lead to strong randomness properties in the hash values.
The negative results show that these results may not be taken as
justification for using weak hash classes indiscriminately, in
particular for key sets with structure.
When stronger independence properties are needed for a theoretical
analysis, one may resort to classic constructions. Only in 2003 it
was found out how full randomness can be simulated using only linear
space overhead (which is optimal). The "splitandshare" approach
can be used to justify the full randomness assumption in some
situations in which full randomness is needed for the analysis to go
through, like in many applications involving multiple hash functions
(e.g., generalized versions of cuckoo hashing with multiple hash
functions or larger bucket sizes, load balancing, Bloom filters and
variants, or minimal perfect hash function constructions).
For practice, efficiency considerations beyond constant factors are
important. It is not hard to construct very efficient 2wise
independent classes. Using kwise independent classes for constant k
bigger than 3 has become feasible in practice only by new
constructions involving tabulation. This goes together well with the
quite new result that linear probing works with 5independent hash
functions.
Recent developments suggest that the classification of hash function
constructions by their degree of independence alone may not be
adequate in some cases. Thus, one may want to analyze the behavior of
specific hash classes in specific applications, circumventing the
concept of kwise independence. Several such results were recently
achieved concerning hash functions that utilize tabulation. In
particular if the analysis of the application involves using
randomness properties in graphs and hypergraphs (generalized cuckoo
hashing, also in the version with a "stash", or load balancing), a
hash class combining kwise independence with tabulation has turned
out to be very powerful.
BibTeX  Entry
@InProceedings{dietzfelbinger:LIPIcs:2012:3388,
author = {Martin Dietzfelbinger},
title = {{On Randomness in Hash Functions (Invited Talk)}},
booktitle = {29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012)},
pages = {2528},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783939897354},
ISSN = {18688969},
year = {2012},
volume = {14},
editor = {Christoph D{\"u}rr and Thomas Wilke},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2012/3388},
URN = {urn:nbn:de:0030drops33884},
doi = {10.4230/LIPIcs.STACS.2012.25},
annote = {Keywords: Algorithms, hash functions, randomized algorithms, data structures, graphs, hypergraphs}
}
Keywords: 

Algorithms, hash functions, randomized algorithms, data structures, graphs, hypergraphs 
Collection: 

29th International Symposium on Theoretical Aspects of Computer Science (STACS 2012) 
Issue Date: 

2012 
Date of publication: 

24.02.2012 