## The Message Length Cost of Perfectly Injective Hashing

### April 4, 2019

Modular arithmetic is (the) central pillar of number theory. Number theorists being ‘Lord of the Rings’, if you will. Modular arithmetic can be characterised as sacrificing operational precision of the most significant digit.

Floating point operations however, involve loss of precision at -least- significant numerical placeholder.

Floating point numbers are quite rightly regarded as mathematical ‘hacks’. Unpredictable numerical compromises following from the infinite Reals being shoehorned into finite representation. A distasteful reminder of reality best left to computer engineers, not appropriate to the uncompromising nature of pure math. Donald Knuth’s volumes on representations of number systems might be unkindly summed as, ‘thar be dragons!’

That being said, well conditioned floating point operations compliant with IEEE 754, will reliably return precision within +/- 1 decimal unit in the last place, ULP.

The PRNG-Hash function presented on these pages –here-, harvests pseudo-randomness from floating point imprecision over a correlated hash of a binary string-file. From its state dependent forward-feed construction, it satisfies strict avalanche criteria. It is a PRNG-expanding function that takes a fixed length message and returns an expanded string comprised of the original message interleaved with error correcting data. In addition to the expanded string, it also returns a correlated numerical message-digest which, in tandem with the binary file, can be inverted to reconstruct the original binary file.

Discussion has so-far centred around cryptographic utility but I wish to present this work in a different light, as a universal indexing function.

A conventional perfect-hash of a fixed-length message, relies upon expanding a set S of n elements to a range of *O*(*n*) indices, and then mapping each index to a range of hash values. Obviously, the hash function itself requires storage space *O*(*n*) to store k, p, and all of the second-level linear modular functions.

What then, is the message length cost of an injective but message-expanding function?

The ratio of error-free to imprecise inversion-recursion is asymptotic of 1:1.666

Which means total combined hashed message length of, on average, x 2.666 the original.

Plus the floating point digest.

From consideration of strict avalanche criteria, we may state that the function’s hash & digest are injective for a message of (n) length.

What point therefore, is there to this construction?

The answer lies in the economy of the indexing cost and the ability to self-construct non-archetypal strings from a reference hash-string without recourse to another specific hash file, using only the reference archetype & an initialisation vector plus terminal digest. This process will be elaborated in future posts.