A Womanium Global Media Project Initiative.

Collisions in hash functions

A collision in a hash function is when two different inputs produce the same output hash. This is a problem because it violates the fundamental premise of a hash function, which is to produce a unique output for each unique input. Collisions can be exploited by attackers to create fraudulent transactions or gain unauthorized access to secure systems, making them a critical concern in the design and implementation of hash functions.

To avoid collisions, hash functions use complex mathematical algorithms to ensure that each input produces a unique output. The quality of a hash function is typically measured by its collision resistance, which refers to its ability to prevent collisions from occurring. Strong collision resistance is important for the security of systems that rely on hash functions, such as digital signatures and cryptographic hash functions used in encryption.

The BHT Algorithm

The BHT algorithm ( Citation: , & al., , & (n.d.). Quantum Algorithm for the Collision Problem. ) is a generic quantum collision-finding algorithm developed by Brassard, Høyer, and Tapp. The goal is to find a collision of a random function F:{0,1}n{0,1}nF : \{ 0, 1 \}^n \rightarrow \{0, 1\}^n.

Grover’s Algorithm and Generalization

Given a function F:X{0,1}F: X \rightarrow \{0,1\} such that there exists a unique x0Xx_0 \in X so that F(x0)=1F(x_0) = 1. The task is to find x0x_0. We know that this requires O(N)O(\sqrt{N}) evaluations of FF where, N=XN = |X| ( Citation: , (n.d.). A fast quantum mechanical algorithm for database search. ) .

To generalize this, consider an arbitrary function F:XYF: X \rightarrow Y. Now, given some y0Yy_0 \in Y, find xXx \in X such that F(x)=y0F(x) = y_0 provided such xx exists. If there are tt number of solutions i.e. there are tt number of such xx, then it requires O(N/t)O(\sqrt{N/t}) evaluations of FF. From now onwards, we denote this generalization by Grover(F,y0F, y_0). This will be handy later.

What is qRAM?

qRAM is a quantum analogue of RAM which allows to efficiently access stored data in quantum superpositions.

Let there be a list of classical data L={x0,x1x2m1}L = \{ x_0, x_1 \dots x_{2^m -1} \} where xi{0,1}nx_i\in \{0,1\}^n for each ii. Then, the qRAM for LL is modeled as an unitary operator UqRAM(L)U_{\text{qRAM}}(L) defined by

UqRAM(L):iyiyxi \begin{equation} U_{\text{qRAM}}(L): \ket{i} \otimes \ket{y} \rightarrow \ket{i} \otimes \ket{y \oplus x_i} \end{equation}

for i{0,1}mi\in \{0,1\}^m and y{0,1}ny\in \{0,1\}^n .

In this context, having qRAM available means that we have a quantum gate capable of realizing the unitary operation (1) for a list LL of classical data in addition to access to basic quantum gates ( Citation: & , & (n.d.). Finding Hash Collisions with Quantum Computers by Using Differential Trails with Smaller Probability than Birthday Bound. ) .

BHT Algorithm

The algorithm consists of 2 steps

Step 1

  1. Start by selecting an arbitrary subset KXK \subseteq X of cardinality k=2n/3k = 2^{n/3}. Create a table LL where each item in LL holds a unique pair (x,F(x))(x, F (x)) with xKx \in K.
  2. Sort LL according to the second entry in each item of LL.
  3. Verify whether LL contains any collisions, meaning check if there are distinct elements (x0,F(x0)),(x1,F(x1))L(x_0, F (x_0)), (x_1, F (x_1)) \in L such that F(x0)=F(x1)F (x_0) = F (x_1). If so, proceed to Step 2.3. If not, the 2n/32^{n/3} pairs of LL are stored in qRAM. The implication is that they can be accessed in quantum superposition.

Step 2

  1. Calculate x1=Grover(H,1)x_1 = \textbf{Grover}(H, 1) where H:X0,1H : X \rightarrow {0, 1} denotes the function defined by H(x)=1H(x) = 1 iff there exists x0Kx_0 \in K such that (x0,F(x))L(x_0 , F (x)) \in L but xx0x \not = x_0. (It’s worth noting that if such x0x_0 exists, it is unique, as we have already verified the absence of collisions in LL).
  2. Search (x0,F(x1))L(x_0 , F (x_1 )) \in L.
  3. Output the collision set {x0,x1}\{x_0 , x_1 \}.

Collision Complexity

For the BHT algorithm, the step 1 involves making queries in time O(2n/3)O(2^{n/3}). Step 2 uses Grover’s search and runs in O(2n/L)=O(2n/3)O\left(\sqrt{2^n/|L|}\right) = O(2^{n/3}). Therefore, BHT finds a collision in time O~(2n/3)\tilde{O}(2^{n/3}) by making O(2n/3)O(2^{n/3}) quantum queries, provided that there is access to exponentially large quantum random access memory (qRAM).

Classical Setting

In the classical setting, the generic attack complexity to find collisions against an n-bit hash function is O(2n/2)O(2^{n/2}) due to the birthday paradox. As a result, any dedicated attack that finds collisions with less than O(2n/2)O(2^{n/2}) complexity is considered as a meaningful attack.

Quantum Setting

In the quantum setting, the complexity of a generic attack depends on the specific assumptions or model of the underlying quantum machines. Regardless of the model, the lower bound of the query complexity is proven to be Ω(2n/3)\Omega(2^{n/3}) ( Citation: , (n.d.). A Note on the Quantum Collision and Set Equality Problems. ) and there is an attack matching this bound if O(2n/3)O(2^{n/3}) qubits are available which is the BHT algorithm. By the same analogy, any dedicated attack with less than O(2n/3)O(2^{n/3}) quantum complexity should be regarded as a meaningful attack. However, comparing dedicated attacks with the generic attack complexity in the quantum setting requires careful consideration because the generic attack complexity depends on the model of the quantum computations. We’ll see this in the next sections.

Impact of qRAM on BHT

We consider a model of a small quantum computer of polynomial size and a qRAM that allows us to access exponentially many classical data in quantum superposition. Here we do not\textbf{not} consider any parallelized computations. In this model, the best collision-finding algorithm is BHT because we can implement it with qRAM (of size O(2n/3)O(2^{n/3})) and its time complexity matches the tight quantum query complexity Θ(2n/3)\Theta (2^{n/3} ).

Time and memory tradeoff compared to classical algorithms

BHT achieves a time complexity O~(2n/3)\tilde{O}(2^{n/3}), however, it requires a large qRAM of size O~(2n/3)\tilde{O}(2^{n/3} ) as well. This raises concerns since the availability of such qRAM in the future is uncertain, especially in the case of small quantum computers.

Alternatively, one can use a quantum circuit of size O(2n/3)O(2^{n/3}), but it comes with the trade-off of using an exponentially large number of qubits. This leads to questions about parallelization. In terms of time-space complexity, BHT is less efficient than the classical parallel rho method ( Citation: , (n.d.). Cost analysis of hash collisions: Will quantum computers make SHARCS obsolete. ) ( Citation: & , & (n.d.). Parallel collision search with application to hash functions and discrete logarithms. ) . When PP classical processors are available, the parallel rho method finds a collision in time O(2n/2/P)O(2^{n/2}/P). This means that if a quantum computer of size 2n/32^{n/3} is available, but qRAM is not, simply running the parallel rho method on the quantum computer can find a collision in time 2n/62^{n/6} which is much faster than BHT.

In the classical setting, the rho method finds a memory-less collision in time O(2n/2)O(2^{n/2}), which matches the classical tight bound for query complexity. However, there is no known any memory-less\textbf{memory-less} quantum collision finding algorithm that matches the optimal query complexity 2n/32^{n/3}.

Let SS denote the size of computational resources required for a quantum algoithm and SS is the maximum size of quantum computers and classical memory. Let TT denote its time complexity. Then the trade-off TS=2n/2T \cdot S = 2^{n/2} given by the parallel rho method is the best one even in the quantum setting.

Outlook

In the quantum setting, comparing dedicated attacks with the generic attack complexity requires caution since the generic attack complexity depends on the model of the quantum computations. As we have seen, the BHT cannot outperform the classical computations, considering that each qubit can behave as either processor or memory (by running 2n/32^{n/3} processors in parallel, collisions can be found in time O(2n/6)O(2^{n/6}) even with classical machines). However, if a quantum computer of polynomial size with exponentially large qRAM is available, BHT is the best collision attack. It is challenging to predict which model is more likely to be realized in the future, and it would be valuable to discuss the advantages of attacks in various models with different generic attack complexities.

References

Bernstein (n.d.)
(n.d.). Cost analysis of hash collisions: Will quantum computers make SHARCS obsolete.
Brassard, Hoyer & Tapp (n.d.)
, & (n.d.). Quantum Algorithm for the Collision Problem.
Hosoyamada & Sasaki (n.d.)
& (n.d.). Finding Hash Collisions with Quantum Computers by Using Differential Trails with Smaller Probability than Birthday Bound.
Grover (n.d.)
(n.d.). A fast quantum mechanical algorithm for database search.
Oorschot & Wiener (n.d.)
& (n.d.). Parallel collision search with application to hash functions and discrete logarithms.
Zhandry (n.d.)
(n.d.). A Note on the Quantum Collision and Set Equality Problems.