Hash function

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, though there are some hash functions that support variable length output.^[1] The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes.^[2] The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing.

Hash functions and their associated hash tables are used in data storage and retrieval applications to access data in a small and nearly constant time per retrieval. They require an amount of storage space only fractionally greater than the total space required for the data or records themselves. Hashing is a computationally and storage space-efficient form of data access that avoids the non-constant access time of ordered and unordered lists and structured trees, and the often exponential storage requirements of direct access of state spaces of large or variable-length keys.

Use of hash functions relies on statistical properties of key and function interaction: worst-case behaviour is intolerably bad but rare, and average-case behaviour can be nearly optimal (minimal collision).^[3]^: 527

Hash functions are related to (and often confused with) checksums, check digits, fingerprints, lossy compression, randomization functions, error-correcting codes, and ciphers. Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently. The hash function differs from these concepts mainly in terms of data integrity.

A hash function takes a key as an input, which is associated with a datum or record and used to identify it to the data storage and retrieval application. The keys may be fixed length, like an integer, or variable length, like a name. In some cases, the key is the datum itself. The output is a hash code used to index a hash table holding the data or records, or pointers to them.

A hash function may be considered to perform three functions:

Convert variable-length keys into fixed-length (usually machine-word-length or less) values, by folding them by words or other units using a parity-preserving operator like ADD or XOR.

Scramble the bits of the key so that the resulting values are uniformly distributed over the keyspace.

Map the key values into ones less than or equal to the size of the table.
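The three functions (folding, scrambling, and range reduction) can be illustrated with a minimal Python sketch. The function names, the 4-byte word size, and the particular mixing steps are assumptions chosen for illustration and do not come from any specific library; the sketch also uses zero-based slot indices.

```python
def fold_key(key: bytes, word_bytes: int = 4) -> int:
    """Fold a variable-length key into one fixed-size word using XOR."""
    folded = 0
    for i in range(0, len(key), word_bytes):
        chunk = key[i:i + word_bytes]
        folded ^= int.from_bytes(chunk, "little")
    return folded


def scramble(word: int, bits: int = 32) -> int:
    """Mix the bits of the folded word (an illustrative mixer, not a standard one)."""
    mask = (1 << bits) - 1
    word &= mask
    word ^= word >> 16
    word = (word * 0x9E3779B9) & mask  # odd multiplier, assumed for illustration
    word ^= word >> 15
    return word


def reduce_to_table(word: int, table_size: int) -> int:
    """Map the scrambled word to a slot index in the range [0, table_size)."""
    return word % table_size


def simple_hash(key: bytes, table_size: int) -> int:
    """Apply the three steps in order: fold, scramble, reduce."""
    return reduce_to_table(scramble(fold_key(key)), table_size)
```

Because XOR is commutative, plain folding maps keys that differ only in the order of their words to the same value (for example `b"abcdefgh"` and `b"efghabcd"`), which is one reason a scrambling step alone cannot repair a weak folding step.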

A good hash function satisfies two basic properties: 1) it should be very fast to compute; 2) it should minimize duplication of output values (collisions). Hash functions rely on generating favourable probability distributions for their effectiveness, reducing access time to nearly constant. High table loading factors, pathological key sets and poorly designed hash functions can result in access times approaching linear in the number of items in the table. Hash functions can be designed to give the best worst-case performance,^{[Notes 1]} good performance under high table loading factors, and in special cases, perfect (collisionless) mapping of keys into hash codes. Implementation is based on parity-preserving bit operations (XOR and ADD), multiply, or divide. A necessary adjunct to the hash function is a collision-resolution method that employs an auxiliary data structure like linked lists, or systematic probing of the table to find an empty slot.
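As a rough sketch of a collision-resolution method built on an auxiliary structure, the following Python class chains colliding entries in per-slot lists. The class and method names are hypothetical, Python lists stand in for linked lists, and Python's built-in `hash` is assumed as the hash function.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining entries per slot."""

    def __init__(self, slots: int = 8) -> None:
        # One (initially empty) chain per table slot.
        self._buckets = [[] for _ in range(slots)]

    def _index(self, key) -> int:
        # Reduce the hash code to a slot index in [0, number of slots).
        return hash(key) % len(self._buckets)

    def put(self, key, value) -> None:
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))  # new key: extend the chain

    def get(self, key, default=None):
        # Only the chain of one slot is scanned, not the whole table.
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

    def load_factor(self) -> float:
        # Number of stored items divided by the number of slots.
        return sum(len(b) for b in self._buckets) / len(self._buckets)
```

With a fixed number of slots, lookups stay fast only while the chains stay short; as the load factor grows or the keys cluster in a few slots, each lookup degrades toward a linear scan, as described above.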

Integrity check: Identical hash values for two files imply, with high probability, that the files are equal, providing a reliable means to detect file modifications.

Key derivation: Minor input changes result in a random-looking output alteration, known as the diffusion property. Thus, hash functions are valuable for key derivation functions.

Message Authentication Codes (MACs): Through the integration of a confidential key with the input data, hash functions can generate MACs ensuring the genuineness of the data, such as in HMACs.

Password storage: A password's hash value does not expose the password itself, which is why servers should store password hashes, and store them securely, rather than the passwords themselves.

Signatures: Message hashes are signed rather than the whole message.
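Several of these uses can be sketched with Python's standard `hashlib` and `hmac` modules. The choice of SHA-256, the use of PBKDF2 for password hashing, the iteration count, and the helper names are assumptions made for illustration, not recommendations taken from the text.

```python
import hashlib
import hmac
import os


def sha256_hex(data: bytes) -> str:
    """Integrity check: digest of the data, to be compared against a known value."""
    return hashlib.sha256(data).hexdigest()


def make_tag(key: bytes, message: bytes) -> bytes:
    """MAC: combine a secret key with the message using HMAC-SHA-256."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Password storage: salted, iterated hash (iteration count is illustrative)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


message = b"transfer 100 units to account 42"
key = os.urandom(32)   # secret shared by sender and receiver
salt = os.urandom(16)  # stored alongside the password hash

tag = make_tag(key, message)
stored_hash = hash_password("correct horse", salt)
```

A one-byte change to `message` produces an unrelated digest and an invalid tag, which is the diffusion behaviour the list above relies on.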

16: a = 9E37₁₆ = 40503₁₀

32: a = 9E3779B9₁₆ = 2654435769₁₀

48: a = 9E3779B97F4B₁₆ = 173961102589771₁₀^{[Notes 5]}

64: a = 9E3779B97F4A7C15₁₆ = 11400714819323198485₁₀
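Constants of this form are commonly used as multipliers in multiplicative (Fibonacci) hashing, where each value is an odd integer close to $2^{w}/\varphi$ for word size $w$ and golden ratio $\varphi$. The surrounding text does not state this, so the reading is an assumption. Under it, a minimal Python sketch multiplies the key by $a$ modulo $2^{w}$ and keeps the top bits as the table index. The function name and the `table_bits` parameter are hypothetical.

```python
# Multipliers listed above, keyed by word size in bits.
MULTIPLIERS = {
    16: 0x9E37,
    32: 0x9E3779B9,
    48: 0x9E3779B97F4B,
    64: 0x9E3779B97F4A7C15,
}


def fibonacci_hash(key: int, word_bits: int, table_bits: int) -> int:
    """Return an index in [0, 2**table_bits) for a table of 2**table_bits slots.

    Assumes 0 <= table_bits <= word_bits and a word size listed in MULTIPLIERS.
    """
    a = MULTIPLIERS[word_bits]
    mask = (1 << word_bits) - 1
    product = (a * (key & mask)) & mask         # multiply modulo 2**word_bits
    return product >> (word_bits - table_bits)  # keep the most significant bits
```

Taking the high-order bits of the product, rather than the low-order ones, is the usual design choice here, because the high bits depend on every bit of the key.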

Analysis

The worst-case result for a hash function can be assessed in two ways: theoretical and practical. The theoretical worst case is the probability that all keys map to a single slot. The practical worst case is the expected longest probe sequence (hash function + collision resolution method). This analysis considers uniform hashing, that is, any key will map to any particular slot with probability $1/m$, a characteristic of universal hash functions.

While Knuth worries about adversarial attack on real time systems,^[23] Gonnet has shown that the probability of such a case is "ridiculously small". His representation was that the probability of $k$ of $n$ keys mapping to a single slot is $\frac{e^{-\alpha}\alpha^{k}}{k!}$, where $\alpha$ is the load factor, $n/m$.^[24]
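To get a feel for how small these probabilities are, the expression can be evaluated numerically. The Python sketch below reads the formula as the Poisson approximation to the number of keys landing in one given slot under uniform hashing, and compares it with the exact binomial probability. The function names are hypothetical.

```python
import math


def poisson_slot_probability(k: int, n: int, m: int) -> float:
    """Poisson estimate that a given slot receives exactly k of n keys (m slots)."""
    alpha = n / m  # load factor
    return math.exp(-alpha) * alpha ** k / math.factorial(k)


def binomial_slot_probability(k: int, n: int, m: int) -> float:
    """Exact probability under uniform hashing: each key hits the slot with 1/m."""
    p = 1 / m
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)
```

For a table with 1000 keys in 1000 slots, `poisson_slot_probability(10, 1000, 1000)` is on the order of $10^{-7}$, so even a chain of ten keys in one particular slot is already very unlikely under these assumptions.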

History

The term hash offers a natural analogy with its non-technical meaning (to chop up or make a mess out of something), given how hash functions scramble their input data to derive their output.^[25]^: 514 In his research for the precise origin of the term, Donald Knuth notes that, while Hans Peter Luhn of IBM appears to have been the first to use the concept of a hash function in a memo dated January 1953, the term itself would only appear in published literature in the late 1960s, in Herbert Hellerman's Digital Computer System Principles, even though it was already widespread jargon by then.^[25]^{: 547–548}
