# Harrow Hassidim Lloyd (HHL) Algorithm

(01/22/2021)

Written by Bikash's Quantum for the Strangeworks Platform

Multiple linear regression, one of the most fundamental supervised learning algorithms, assumes an imperative role in the field of machine learning. In 2009, Harrow et al. [Phys. Rev. Lett. 103, 150502 (2009)] showed that their algorithm could be used to sample the solution of a linear system Ax=bAx=b exponentially faster than any existing classical algorithm. Remarkably, any multiple linear regression problem can be reduced to a linear system of equations problem. However, finding a practical and efficient quantum circuit for the quantum algorithm in terms of elementary gate operations is still an open topic. Here we put forward a 7​-qubit quantum circuit design, based on an earlier work by Cao et al. [Mol. Phys. 110, 1675 (2012)], to solve a 3-variable regression problem, utilizing only basic quantum gates. Furthermore, we demonstrate the circuit implementation of the given algorithm after designing the circuits in Qiskit and Cirq platform.

## Introduction

Quantum algorithms running on quantum computers aim at quickly and efficiently solving several important computational problems faster than classical algorithms running on classical computers. One key way in which quantum algorithms differ from classical algorithms is that they utilize quantum mechanical phenomena such as superposition and entanglement, which allow us to work in exponentially large Hilbert spaces with only polynomial overheads. This in turn, in some cases, allows for exponential speed-ups in terms of algorithmic complexity.

In today's world, machine learning is primarily concerned with the development of low-error models in order to make accurate predictions possible by learning and inferring from training data. It borrows heavily from the field of statistics in which linear regression is one of the flagship tools. The theory of multiple linear regression or more generally multivariate linear regression was largely developed in the field of statistics in the pre-computer era. It is one of the most well understood, versatile and straightforward techniques in any statistician's toolbox. It is also an important and practical supervised learning algorithm. Supervised learning is where one has some labelled input data samples {xi,yi}i=1N\{\mathbf{x}_i,\mathbf{y}_i\}_{i=1}^{N}​  (where xi\mathbf{x}_i's are the feature vectors and yi\mathbf{y}_i's are the corresponding labels) and then based on some criteria (which might depend on the context) chooses a mapping from input set X\mathbf{X}​ to the output set Y\mathbf{Y}​. And that mapping can help to predict the probable output corresponding to an input lying outside of the training data sets. Multiple linear regression is similar in the sense that given some training samples one identifies a closely-fitting hyperplane depending on the specific choice of a loss function (the most common one being a quadratic loss function based on the least squares method). Interestingly, any multiple regression problem can be converted into an equivalent system of linear equations problem or more specifically, a Quantum Linear Systems Problem (QLSP) problem.

Suppose, that we are given a system of NN​ linear equations with NN​ unknowns, which can be expressed as Ax=b\mathbf{A}\mathbf{x}=\mathbf{b}​. Now, what we are interested in, is: given a matrix A∈CN×N\mathbf{A} \in \mathbb{C}^{N\times N}​ with a vector b∈CN\mathbf{b}\in \mathbb{C}^N​, find the solution x∈CN\mathbf{x} \in \mathbb{C}^N​ satisfying Ax=b\mathbf{Ax=b}​ (which is A−1b\mathbf{{A}^{-1}b}​ if A\mathbf{A}​ is invertible), or else return a flag if no solution exists. This is known as the Linear Systems Problem (LSP). However, we will consider only a special case of this general problem, in form of the Quantum Linear Systems Problem (QLSP).

The quantum version of the LSP problem, that is, the QLSP can be expressed as:

Let A\mathbf{A}​ be a N×NN\times N​ Hermitian matrix with a spectral norm bounded by unity and a known condition number κ\kappa​. The quantum state on ⌈log⁡N⌉\lceil{\log N\rceil}​ qubits ∣b⟩|{b}\rangle​ can be given by

∣b⟩:=∑ibi∣i⟩∣∣∑ibi∣i⟩∣∣|{b}\rangle :=\frac{\sum_i b_i|{i}\rangle}{||\sum_i b_i |{i}\rangle||}​​

and ∣x⟩|{x}\rangle​ by

∣x⟩:=∑ixi∣i⟩∣∣∑ixi∣i⟩∣∣|{x}\rangle :=\frac{\sum_i x_i|{i}\rangle}{||\sum_i x_i |{i}\rangle||}​​

where bi,xib_i,x_i​ are respectively the ith\text{i}^{\text{th}}​ component of vectors b\mathbf{b}​ and x\mathbf{x}​. Given the matrix A\mathbf{A}​ (whose elements are accessed by an oracle) and the state ∣b⟩|{b}\rangle​, an output state ∣x~⟩|{\tilde{x}}\rangle​ is such that ∣∣∣x~⟩−∣x⟩∣∣2≤ϵ|||{\tilde{x}}\rangle-|{x}\rangle||_2\leq \epsilon​, with some probability Ω(1)\Omega(1)​ (practically at least 12\frac{1}{2}​) along with a binary flag indicating success or failure.

In 2009, Harrow, Hassidim and Lloyd put forward a quantum algorithm (popularly known as the HHL algorithm) to obtain information about the solution x\mathbf{x}​ of certain classes of linear systems Ax=b\mathbf{A}\mathbf{x}=\mathbf{b}​. As we know, algorithms for finding the solutions to linear systems of equations play an important role in engineering, physics, chemistry, computer science, and economics apart from other areas. However, experimentally implementing the HHL algorithm for solving an arbitrary system of linear equations to a satisfactory degree of accuracy, remains an infeasible task even today, due to several physical and theoretical restrictions imposed by the algorithm and the currently accessible hardware. It is therefore imperative to extract as much practical benefit as we can from the algorithm. We present an application (in the context of multiple regression) of a modified version of the earlier circuit which was meant for implementing the HHL algorithm for a 4×44\times 4​ linear system on real quantum computers. This circuit requires only 7 qubits and it should be simple enough to experimentally verify it, one gets access to quantum computers with quantum logic gates with sufficiently low error rates.

Although the HHL solves the QLSP for all such matrices A\mathbf{A}​ or

[0A†A0]\left[ {\begin{array}{cc}0 & \mathbf{A}^\dagger \\\mathbf{A} & 0 \end{array} } \right], it can be efficiently implemented only when they are sparse and well-conditioned (the sparsity condition may be slightly relaxed). In this context, efficient means at most poly-logarithmic in system size. A N×NN\times N​ matrix is called ss​-sparse if it has at most ss​ non-zero entries in any row or column. We call it simply sparse if it has at most poly(log⁡N)\text{poly}(\log N)​ entries per row. We generally call a matrix well-conditioned when its singular values lie between the reciprocal of its condition number (1κ\frac{1}{\kappa}​) and 1. Condition number κ\kappa​ of a matrix is the ratio of largest to smallest singular value and is undefined when the smallest singular value of A\mathbf{A}​is 00​. For Hermitian matrices the magnitude of the eigenvalues are equal to the magnitudes of the respective singular values.

At this point it is important to reiterate that unlike the output A−1b\mathbf{A}^{-1}\mathbf{b}​ of a classical linear system solver, the output copy of ∣x~⟩|{\tilde{x}}\rangle​ does not provide access to the coordinates of A−1b\mathbf{A}^{-1}\mathbf{b}​. Nevertheless, it allows for sampling from the solution vectors like ⟨x~∣M∣x~⟩\langle{\tilde{x}}| M |{\tilde{x}}\rangle​, where MM​ is a quantum- mechanical operator. This is one main difference between solutions of the LSP and solutions of the QLSP. We should also keep in mind that reading out the elements of ∣x~⟩|{\tilde{x}}\rangle​ in itself takes O(N)\mathcal{O}(N)​ time. Thus, a solution to QLSP might be useful only in applications where just samples from the vector ∣x~⟩|{\tilde{x}}\rangle​ are needed.

## The Harrow Hassidim Lloyd Algorithm

The HHL algorithm consists of three major steps which we will briefly discuss one by one. Initially, we begin with a Hermitian matrix A\mathbf{A}​ and an input state ∣b⟩|{b}\rangle​ corresponding to our specific system of linear equations [Fig. 1]. The assumption that A\mathbf{A}​ is Hermitian may be dropped without loss of generality since we can instead solve the linear system of equations given by [0A†A0]y=[b0]\left[{\begin{array}{cc}0 & \mathbf{A}^{\dagger} \\ \mathbf{A} & 0\end{array}}\right]\mathbf{y} = \left[\begin{array}{c}\mathbf{b} \\ 0\end{array}\right]​ which has the unique solution y=[0x]\mathbf{y}=\left[\begin{array}{c}0 \\ \mathbf{x}\end{array}\right]​ when A\mathbf{A}​ is invertible. This transformation does not alter the condition number (ratio of the magnitudes of the largest and smallest eigenvalues) of $\mathbf{A}$. However, in the case our original matrix A\mathbf{A}​ is not Hermitian, the transformed system with the new matrix [0A†A0]\left[\begin{array}{c}0 & \mathbf{A}^\dagger \\ \mathbf{A} & 0\end{array}\right]​ needs oracle access to the non-zero entries of the rows and columns of A\mathbf{A}​.

Here iAti\mathbf{A}t​ and −iAt-i\mathbf{A}t​ commute and hence eiAte−iAt=eiAt−iAt=e0=Ie^{i\mathbf{A}t}e^{-i\mathbf{A}t}=e^{i\mathbf{A}t-i\mathbf{A}t}=e^{\mathbf{0}}=\mathbf{I}​. Moreover, eiAte^{i\mathbf{A}t}​ shares all its eigenvectors with A\mathbf{A}​, while it's eigenvalues are eiλjte^{i\lambda_jt}​ if the eigenvalues of A\mathbf{A}​ are taken to be λj\lambda_j​. Suppose that ∣uj⟩|{u_j}\rangle​ are the eigenvectors of A\mathbf{A}​ and λj\lambda_j​ are the corresponding eigenvalues. We recall that we assumed all the eigenvalues to be of magnitude less than 1 (spectral norm is bounded by unity). As the eigenvalues λj\lambda_j​ are of the form 0.a1a1a3⋯0.a_1a_1a_3\cdots​ in binary, we use ∣λj⟩|{\lambda_j}\rangle​ to refer to ∣a1a2a3⋯⟩|{a_1a_2a_3\cdots}\rangle​. We know from the spectral theorem that every Hermitian matrix has an orthonormal basis of eigenvectors. So, in this context, A\mathbf{A}​ can be re-written as ∑jλj∣uj⟩⟨uj∣\sum_j \lambda_j|{u_j}\rangle\langle{u_j}|​ (via eigendecomposition of A\mathbf{A}​) and ∣b⟩|{b}\rangle​ as ∑jβj∣u⟩j\sum_j \beta_j |{u}\rangle_j​.

## Phase Estimation

The quantum phase estimation algorithm performs the mapping (∣0⟩⊗n)C∣u⟩I∣0⟩S↦∣φ~⟩C∣u⟩I∣0⟩S\left(|{0}\rangle^{\otimes n}\right)^C |{u}\rangle^I |{0}\rangle^S \mapsto |{\tilde \varphi}\rangle^C|{u}\rangle^I|{0}\rangle^Swhere ∣u⟩|{u}\rangle​ is an eigenvector of a unitary operator UU​ with an unknown eigenvalue ei2πφe^{i2\pi \varphi}​. φ~\tilde{\varphi}​ is a tt​-bit approximation of φ\varphi​, where tt​ is the number of qubits in the clock register. The superscripts on the kets indicate the names of the registers which store the corresponding states. In the HHL algorithm the input register begins with a superposition of eigenvectors instead i.e., ∣b⟩=∑jβj∣uj⟩|{b}\rangle = \sum_j\beta_j|{u_j}\rangle​ instead of a specific eigenvector ∣u⟩|{u}\rangle​, and for us the unitary operator is eiAte^{i\mathbf{A}t}​. So the phase estimation circuit performs the mapping

(∣0⟩⊗n)C∣b⟩I↦(∑j=1Nβj∣uj⟩I∣λ~jt0⟩2πC)\left(|{0}\rangle^{\otimes n}\right)^C |{b}\rangle^I \mapsto \left(\sum_{j=1}^{N}\beta_j|{u_j}\rangle^I|{\frac{\tilde\lambda_jt_0}\rangle {2\pi}}^C\right)

where λ~j\tilde\lambda_j​'s are the binary representations of the eigenvalues of A\mathbf{A}​ to a tolerated precision. To be more explicit, here λ~j\tilde{\lambda}_j​ is represented as b1b2b3⋯btb_1b_2b_3\cdots b_t​ (tt​ is the number of qubits in the clock register) if the actual binary equivalent of λj\lambda_j​ is of the form λ=0.b1b2b3⋯\lambda=0.b_1b_2b_3\cdots​. To avoid the factor of 2π2\pi​ in the denominator, the evolution time t0t_0​ is generally chosen to be 2π2\pi​. However, t0t_0​ may also be used to normalize A\mathbf{A}​ (by re-scaling t0t_0​) in case the spectral norm of AA​ exceeds 1. Additionally, an important factor in the performance of the algorithm is the condition number κ\kappa​. As κ\kappa​ grows, A\mathbf{A}​ tends more and more towards a non-invertible matrix, and the solutions become less and less stable. Matrices with large condition numbers are said to be ill-conditioned. The HHL algorithm generally assumes that the singular values of A\mathbf{A}​ lie between 1/κ1/\kappa​ and 1, which ensures that the matrices we have to deal with are well-conditioned. Nonetheless, there are methods to tackle ill-conditioned matrices and those have been thoroughly discussed in the paper by Lloyd et al. It is worth mentioning that in this step the clock register-controlled Hamiltonian simulation gate UU​ can be expressed as ∑k=0T−1∣τ⟩⟨τ∣C⊗eiAτt0/T\sum_{k=0}^{T-1} |{\tau}\rangle\langle{\tau}|^C \otimes e^{i\mathbf{A}\tau t_0/T}​, where T=2tT=2^t​ (tt​ is the number of qubits in the clock register) and evolution time t0=O(κ/ϵ)t_0 = \mathcal{O}(\kappa/\epsilon)​. Interestingly, choosing t0=O(κ/ϵ)t_0 = \mathcal{O}(\kappa/\epsilon)​ can at maximum error cause an error of magnitude ϵ\epsilon​ in the final state.

## Controlled Rotation (R(λ~−1)R(\tilde{\lambda}^{-1})​)

A clock register controlled σy\sigma_y​-rotation of the ancilla qubit produces a normalized state of the form

∑j=1Nβj∣uj⟩I∣λ~j⟩C(1−C2λ~j2∣0⟩+Cλ~j∣1⟩)S\sum_{j=1}^{N}\beta_j|{u_j}\rangle^I|{\tilde\lambda_j}\rangle^C\left(\sqrt{1-\frac{C^2}{\tilde\lambda_j^2}}|{0}\rangle+\frac{C}{\tilde\lambda_j}|{1}\rangle\right)^S​​

These rotations, conditioned on respective λ~j\tilde{\lambda}_j​, can be achieved by the application of the

exp⁡(−iθσy)=[cos(θ)−sin(θ)sin(θ)cos(θ)]\exp(-i\theta\sigma_y) = \left[\begin{array}{cc} cos(\theta) & - sin(\theta) \\ sin(\theta) & cos(\theta)\end{array}\right]

operators where θ=cos−1(Cλ~j)\theta = cos^{-1}\left(\dfrac{C}{\tilde \lambda_j}\right)​. CC​ is a scaling factor to prevent the controlled rotation from being unphysical. That is, practically $C<\lambda \min C$​ (minimum eigenvalue of A\mathbf{A}​) is a safe choice, which may be more formally stated as C=O(1/κ)C = \mathcal{O}(1/\kappa)​.

## Uncomputation

In the final step, the inverse quantum phase estimation algorithm sets back the clock register to (∣0⟩⊗n)C(|{0}\rangle^{\otimes n})^C​ and leaves the remaining state as

∑j=1Nβj∣uj⟩I(1−C2λ~j2∣0⟩+Cλ~j∣1⟩)S\sum_{j=1}^{N}\beta_j|{u_j}\rangle^I\left(\sqrt{1-\frac{C^2}{\tilde\lambda_j^2}}|{0}\rangle+\frac{C}{\tilde\lambda_j}|{1}\rangle\right)^S​​

Postselecting on the ancilla ∣1⟩S|{1}\rangle^S​ gives the final state C∑j=1N(βjλj)∣uj⟩IC\sum_{j=1}^{N}(\frac{\beta_j}{\lambda_j})|{u_j}\rangle^I​. The inverse of the Hermitian matrix A\mathbf{A}​ can be written as ∑j1λj∣uj⟩⟨uj∣\sum_j \frac{1}{\lambda_j}|{u_j}\rangle\langle{u_j}|​, and hence A−1∣b⟩\mathbf{A}^{-1}|b\rangle​ matches ∑j=1Nβjλj~∣uj⟩I\sum_{j=1}^{N}\frac{\beta_j}{\tilde{\lambda_j}}|{u_j}\rangle^I​. This outcome state, in the standard basis, is component-wise proportional to the exact solution x\mathbf{x}​ of the system Ax=b\mathbf{Ax = b}​.

## Linear Regression Utilizing the HHL

Linear regression models a linear relationship between a scalar response variable and one or more feature variables. Given a nn​-unit data set {yi,xi1,⋯,xip}i=1n\{y_i,x_{i1},\cdots,x_{ip}\}_{i=1}^{n}​, a linear regression model assumes that the relationship between the dependent variable yy​ and a set of pp​ attributes i.e., x={x1,⋯,xp}\mathbf{x} = \{x_{1},\cdots,x_{p}\}​ is linear. Essentially, the model takes the form

yi=β0+β1x1+⋯+βpxip+ϵi=xiTβ+ϵiy_i = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_{ip} + \epsilon_i = \mathbf{x}_i^T \pmb{\beta} + \epsilon_i

where ϵi\epsilon_i​ is the noise or error term. Here ii​ ranges from 1 to nn​. xiT\mathbf{x}_i^T​ denotes the transpose of the column matrix xi\mathbf{x}_i​. And xiTβ\mathbf{x}_i^T\pmb{\beta}​ is the inner product between vectors xi\mathbf{x}_i​ and β\pmb{\beta}​. These nn​ equations may be more compactly represented in the matrix notation, as y=Xβ+ϵ\mathbf{y=X\pmb{\beta}+\pmb{\epsilon}}​. Now, we will consider a simple example with 3 feature variables and a bias β0\beta_0​. Say our data sets are {−18+182,−2,12,−12}\{-\frac{1}{8} + \frac{1}{8\sqrt{2}},-\sqrt{2},\frac{1}{\sqrt{2}},-\frac{1}{2}\}​, {38−382,−2,−12,12}\{\frac{3}{8} - \frac{3}{8\sqrt{2}},-\sqrt{2},-\frac{1}{\sqrt{2}},\frac{1}{2}\}​, {−18−182,2,−12,−12}\{-\frac{1}{8} - \frac{1}{8\sqrt{2}},\sqrt{2},-\frac{1}{\sqrt{2}},-\frac{1}{2}\}​ and {38+382,2,12,12}\{\frac{3}{8} + \frac{3}{8\sqrt{2}},\sqrt{2},\frac{1}{\sqrt{2}},\frac{1}{2}\}​.

Plugging in these data sets we get the linear system:

β0−2β1+12β2−12β3=−18+182\beta_0 -\sqrt{2} \beta_1 + \frac{1}{\sqrt{2}} \beta_{2} - \frac{1}{2} \beta_3 = -\frac{1}{8} + \frac{1}{8\sqrt{2}}​​

β0−2β1−12β2+12β3=38−382\beta_0 -\sqrt{2} \beta_1 - \frac{1}{\sqrt{2}} \beta_2 + \frac{1}{2} \beta_3 = \frac{3}{8} -\frac{3}{8\sqrt{2}}

β0+2β1−12β2−12β3=−18−182\beta_0 + \sqrt{2} \beta_1 - \frac{1}{\sqrt{2}} \beta_2 - \frac{1}{2} \beta_3 = - \frac{1}{8} - \frac{1}{8\sqrt{2}}

β0+2β1+12β2+12β3=38+382\beta_0 + \sqrt{2} \beta_1 + \frac{1}{\sqrt{2}} \beta_2 + \frac{1}{2} \beta_3 = \frac{3}{8} + \frac{3}{8\sqrt{2}}

To estimate β\pmb{\beta}​ we will use the popular least squares method, which minimizes the residual sum of squares ∑i=1N(yi−xiβi)2\sum_{i=1}^{N}(y_i - \mathbf{x}_i\pmb{\beta}_i)^2​. If X\mathbf{X}​ is positive definite (and in turn has full rank) we can obtain a unique solution for the best fit β^\pmb{\hat{\beta}}​, which is (XTX)−1XTy\mathbf{(X^TX)^{-1}X^Ty}​. It is possible that all the columns of X\mathbf{X}​ are not linearly independent and by extension X\mathbf{X}​ is not full rank. This kind of a situation might occur if two or more of the feature variables are perfectly correlated. Then XTX\mathbf{X^TX}​ would be singular and β^\hat{\beta}​ wouldn't be uniquely defined. Nevertheless, there exist techniques like filtering to resolve the non-unique representations by reducing the redundant features. Rank deficiencies might also occur if the number of features pp​ exceed the number of data sets NN​. If we estimate such models using regularization, then redundant columns should not be left out. The regulation takes care of the singularities. More importantly, the final prediction might depend on which columns are left out.

Equations (1)-(4) may be expressed in the matrix notation as:

[−2112−12−21−1212−2−11212211212][β1β0β2β3]=[−18+18238−38218+18238+382]\left[\begin{array}{cccc}-\sqrt{2} & 1 & \frac{1}{\sqrt{2}} & -\frac{1}{2} \\-\sqrt{2} & 1 & -\frac{1}{\sqrt{2}} & \frac{1}{2} \\-\sqrt{2} & -1 & \frac{1}{\sqrt{2}} & \frac{1}{2} \\\sqrt{2} & 1 & \frac{1}{\sqrt{2}} & \frac{1}{2}\end{array}\right] \left[\begin{array}{c}\beta_1 \\ \beta_0 \\ \beta_2 \\ \beta_3 \end{array}\right] = \left[\begin{array}{c}-\frac{1}{8} + \frac{1}{8\sqrt{2}} \\ \frac{3}{8} - \frac{3}{8\sqrt{2}} \\ \frac{1}{8} + \frac{1}{8\sqrt{2}} \\ \frac{3}{8} + \frac{3}{8\sqrt{2}} \end{array}\right]

Note that unlike common convention, our representation of X\mathbf{X}​ does not contain a column full of 1's corresponding the bias term. This representation is used simply because of the convenient form which we obtain for XTX\mathbf{X}^T\mathbf{X}​. The final result remains unaffected as long as y=Xβ\mathbf{y=X\pmb{\beta}}​ represents the same linear system.

Now

XTX=14[1595−39153−55315−9−3−5−915]\mathbf{X}^T\mathbf{X} = \frac{1}{4} \left[\begin{array}{cccc}15 & 9 & 5 & -3 \\9 & 15 & 3 & -5 \\5 & 3 & 15 & -9 \\-3 & -5 & -9 & 15\end{array}\right]

and

XTy=[12121212]\mathbf{X}^T\mathbf{y} = \left[\begin{array}{c}\frac{1}{2} \\ \\\frac{1}{2} \\ \\\frac{1}{2} \\ \\\frac{1}{2}\end{array}\right].

Thus we need to solve for β^\pmb{\hat{\beta}} from XTXβ^=XTy\mathbf{X}^T\mathbf{X}\pmb{\hat{\beta}}=\mathbf{X}^T\mathbf{y}​ .

Quantum circuit for solving a 4×44\times 4​ system of linear equations Ax=b\mathbf{Ax=b}​. The top qubit (∣s⟩|s\rangle​) is the ancilla qubit. The four qubits in the middle (∣j0⟩,∣j1⟩,∣j2⟩|j_0\rangle, |j_1\rangle, |j_2\rangle​ and ∣j3⟩|j_3\rangle​) stand for the Clock register CC​. The two qubits at the bottom (∣q0⟩|q_0\rangle​ and ∣q1⟩|q_1\rangle​) form the Input register II​ and two Hadamard gates are applied on them to initialize the state ∣b⟩|{b}\rangle​.

## Quantum Circuit

Having discussed the general idea behind the HHL algorithm in the above Section and its possible application in drastically speeding up multiple regression, we now move on to the quantum circuit design meant to solve the 4×44\times 4​ linear system which we have already encountered i.e., XTXβ^=XTy\mathbf{X}^T\mathbf{X}\boldsymbol{\hat{\beta}}=\mathbf{X}^T\mathbf{y}​. For sake of convenience we will now denote XTX\mathbf{X}^T\mathbf{X}​ with A\mathbf{A}​, β^\boldsymbol{\hat{\beta}}​ with x\mathbf{x}​ and XTy\mathbf{X}^T\mathbf{y}​ with b\mathbf{b}​. The circuit requires only 7 qubits, with 4 qubits in the clock register, 2 qubits in the input register and remaining 1 as an ancilla qubit. At this point it is imperative to mention that we specifically chose the form of the regression data points in the previous Section such that A\mathbf{A}​ turns out to be Hermitian, has four distinct eigenvalues of the form λi=2i−1\lambda_i = 2^{i-1}​ and b\mathbf{b}​ has a convenient form which can be efficiently prepared by simply using two Hadamard gates.

A=14[1595−39153−55315−9−3−5−915]\mathbf{A} = \frac{1}{4} \left[\begin{matrix}15 & 9 & 5 & -3 \\9 & 15 & 3 & -5 \\5 & 3 & 15 & -9 \\-3 & -5 & -9 & 15\end{matrix}\right]

A\mathbf{A}​ is a Hermitian matrix with eigenvalues λ1=1,λ2=2,λ3=4\lambda_1= 1,\lambda_2 = 2, \lambda_3 = 4​ and λ4=8\lambda_4 = 8​. The corresponding eigenvectors encoded in quantum states ∣uj⟩|u_j\rangle​ may be expressed as

∣u1⟩=−∣00⟩−∣01⟩−∣10⟩+∣11⟩|u_1\rangle = -|00\rangle -|01\rangle - |10\rangle + |11\rangle​​

∣u2⟩=+∣00⟩+∣01⟩−∣10⟩+∣11⟩|u_2\rangle = +|00\rangle +|01\rangle - |10\rangle + |11\rangle​​

∣u3⟩=+∣00⟩−∣01⟩+∣10⟩+∣11⟩|u_3\rangle = +|00\rangle - |01\rangle + |10\rangle + |11\rangle​​

∣u4⟩=−∣00⟩+∣01⟩+∣10⟩+∣11⟩|u_4\rangle = -|00\rangle + |01\rangle + |10\rangle + |11\rangle​​

Also, b=[12121212]T\mathbf{b} = \left[\begin{matrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \end{matrix}\right]^T​ can be written as ∑j=1j=4βj∣uj⟩\sum_{j=1}^{j=4}\beta_j|u_j\rangle​ where βj=12\beta_j = \frac{1}{2}

We will now trace through the quantum circuit in Fig. (2). ∣q0⟩|q_0\rangle​ and ∣q1⟩|q_1\rangle​ are the input register qubits which are initialized to a combined quantum state

∣b⟩=12∣00⟩+12∣01⟩+12∣10⟩+12∣11⟩|b\rangle = \frac{1}{2}|00\rangle+\frac{1}{2}|01\rangle+\frac{1}{2}|10\rangle+\frac{1}{2}|11\rangle

which is basically the state-encoded format of b\mathbf{b}​. This is followed by the quantum phase estimation step which involves a Walsh-Hadamard transform on the clock register qubits ∣j0⟩,∣j1⟩,∣j2⟩,∣j3⟩|j_0\rangle,|j_1\rangle,|j_2\rangle,|j_3\rangle​, a clock register controlled unitary gates U20,U21,U22U^{2^0},U^{2^1},U^{2^2}​ and U23U^{2^3} where U=exp⁡(iAt/16)U=\exp(i\mathbf{A}t/16)​ and an inverse quantum Fourier transform on the clock register. As discussed in the above Section, this step would produce the state 12∣0001⟩C∣u1⟩I+12∣0010⟩C∣u2⟩I+12∣0100⟩C∣u3⟩I+12∣1000⟩C∣u4⟩I\frac{1}{2}|0001\rangle^C|u_1\rangle^I + \frac{1}{2}|0010\rangle^C|u_2\rangle^I+\frac{1}{2}|0100\rangle^C|u_3\rangle^I+\frac{1}{2}|1000\rangle^C|u_4\rangle^I​, which is essentially the same as ∑j=1Nβj∣uj⟩I∣λ~jt02π⟩C\sum_{j=1}^{N}\beta_j|{u_j}\rangle^I|{\frac{\tilde\lambda_jt_0}{2\pi}\rangle }^C​, assuming t0=2πt_0=2\pi​. Also, in this specific example ∣λ~j⟩=∣λj⟩|\tilde\lambda_j\rangle = |\lambda_j\rangle​, since the 4 qubits in the clock register are sufficient to accurately and precisely represent the 4 eigenvalues in binary. As far as the endianness of the combined quantum states is concerned we must keep in mind that in our circuit ∣q0⟩|q_0\rangle​ is the most significant qubit (MSQ) and ∣q3⟩|q_3\rangle​ is the least significant qubit (LSQ).

Next is the R(λ~−1)R(\tilde\lambda^{-1})​ rotation step. We make use of an ancilla qubit ∣s⟩|s\rangle​ (initialized in the state ∣0⟩|0\rangle​), which gets phase shifted depending upon the clock register's state. Let us take an example clock register state ∣0100⟩C=∣0⟩q0C⊗∣1⟩q1C⊗∣0⟩q2C⊗∣0⟩q3C|0100\rangle^C = |0\rangle_{q_0}^C\otimes|1\rangle_{q_1}^C\otimes|0\rangle_{q_2}^C\otimes|0\rangle_{q_3}^C​ (binary representation of the eigenvalue corresponding to ∣u3⟩|u_3\rangle​, that is 4). In this combined state, ∣q1⟩|q_1\rangle​  is in the state ∣1⟩|1\rangle​ while ∣q0⟩,∣q2⟩|q_0\rangle,|q_2\rangle​ and ∣q3⟩|q_3\rangle​ are all in the state ∣0⟩|0\rangle​. This state will only trigger the Ry(8π4.2r)R_y(\frac{8\pi}{4.2^r})​ rotation gate, and none of the other phase shift gates. Thus, we may say that the smallest eigenvalue states in C cause the largest ancilla rotations. Using linearity arguments, it is clear that if the clock register state had instead been ∣b⟩|b\rangle​, as in our original example, the final state generated by this rotation step would be  ∑j=1Nβj∣uj⟩I∣λ~j⟩C((1−C2/λ~j2)1/2∣0⟩+Cλ~j∣1⟩)S\sum_{j=1}^{N}\beta_j|{u_j}\rangle_I|{\tilde\lambda_j}\rangle^C((1-C^2/\tilde\lambda_j^2)^{1/2}|{0}\rangle+\frac{C}{\tilde\lambda_j}|{1}\rangle)^S where C=8π/2rC=8\pi/2^r​. For this step, an apriori knowledge of the eigenvalues of A\mathbf{A}​ was necessary to design the gates.

Then, as elaborated above, the inverse phase estimation step essentially reverses the quantum phase estimation step. The state produced by this step, conditioned on obtaining $|1\rangle$ in ancilla is 8π2r∑j=1j=4122j−1∣uj⟩\frac{8\pi}{2^r}\sum_{j=1}^{j=4}\frac{\frac{1}{2}}{2^{j-1}}|u_j\rangle​. Upon writing in the standard basis and normalizing, it becomes 1340(−∣00⟩+7∣01⟩+11∣10⟩+13∣11⟩)\frac{1}{\sqrt{340}}(-|00\rangle + 7|01\rangle + 11|10\rangle + 13|11\rangle)​. This is proportional to the exact solution of the system x=132[−171113]T\mathbf{x} = \frac{1}{32} \left[\begin{matrix}-1 & 7 & 11 & 13\end{matrix}\right]^T​.

The Group Leaders Optimization Algorithm is employed to approximately decompose the exp⁡(iAt2k)\exp(\frac{i\mathbf{A}t}{2^k})​ gates in the Hamiltonian simulation step into elementary quantum gates.

## Quantum Circuit Simulation

from qiskit import QuantumCircuit, execute
from strangeworks.qiskit import get_backend
from math import pi

qc = QuantumCircuit(7, 2)

qc.h(1)
qc.h(2)
qc.h(3)
qc.h(4)
qc.h(5)
qc.h(6)

#application of exp(iAt/16) operator
qc.h(6)
qc.ccx(1,5,6)
qc.h(6)
qc.cu3(0.196,-pi/2,pi/2,1,6)
qc.cu3(pi/2,pi/2,-pi/2,1,6)
qc.u1(0.379,1)

qc.cu3(0.981,-pi/2,pi/2,1,5)
qc.u1(0.589,1)
qc.ccx(1,5,6)
qc.cu3(0.196,-pi/2,pi/2,1,5)
qc.ccx(1,5,6)
qc.h(6)
qc.ccx(1,5,6)
qc.h(6)

#application of exp(iAt/8) operator
qc.h(6)
qc.ccx(2,5,6)
qc.h(6)
qc.cu3(1.963,-pi/2,pi/2,2,6)
qc.cu3(pi/2,pi/2,-pi/2,2,6)
qc.u1(1.115,2)

qc.cu3(1.963,-pi/2,pi/2,2,5)
qc.u1(2.615,2)
qc.ccx(2,5,6)
qc.cu3(0.178,-pi/2,pi/2,2,5)
qc.ccx(2,5,6)
qc.h(6)
qc.ccx(2,5,6)
qc.h(6)

#application of exp(iAt/4) operator
qc.h(6)
qc.ccx(3,5,6)
qc.h(6)
qc.cu3(-0.785,-pi/2,pi/2,3,6)
qc.cu3(pi/2,pi/2,-pi/2,3,6)
qc.u1(1.017,3)

qc.cu3(3.927,-pi/2,pi/2,3,5)
qc.u1(2.517,3)
qc.ccx(3,5,6)
qc.cu3(2.356,-pi/2,pi/2,3,5)
qc.ccx(3,5,6)
qc.h(6)
qc.ccx(3,5,6)
qc.h(6)

#application of exp(iAt/2) operator
qc.h(6)
qc.ccx(4,5,6)
qc.h(6)
qc.cu3(-9.014*10**(-9),-pi/2,pi/2,4,6)
qc.cu3(pi/2,pi/2,-pi/2,4,6)
qc.u1(-0.750,4)

qc.cu3(1.571,-pi/2,pi/2,4,5)
qc.u1(0.750,4)
qc.ccx(4,5,6)
qc.cu3(-1.571,-pi/2,pi/2,4,5)
qc.ccx(4,5,6)
qc.h(6)
qc.ccx(4,5,6)
qc.h(6)

#Applying Inverse Fourier Transform
qc.h(1)
qc.cu1(pi/2,2,1)
qc.cu1(pi/4,3,1)
qc.cu1(pi/8,4,1)
qc.h(2)
qc.cu1(pi/2,3,2)
qc.cu1(pi/4,4,2)
qc.h(3)
qc.cu1(pi/2,4,3)
qc.h(4)

#applying control-Ry operations
qc.cu3(8*pi/2,0,0,1,0)
qc.cu3(4*pi/2,0,0,2,0)
qc.cu3(2*pi/2,0,0,3,0)
qc.cu3(pi/2,0,0,4,0)

#Applying Quantum Fourier Transform
qc.h(4)
qc.cu1(-pi/2,4,3)
qc.h(3)
qc.cu1(-pi/4,4,2)
qc.cu1(-pi/2,3,2)
qc.h(2)
qc.cu1(-pi/8,4,1)
qc.cu1(-pi/4,3,1)
qc.cu1(-pi/2,2,1)
qc.h(1)

#application of exp(-iAt/2) operator
qc.h(6)
qc.ccx(4,5,6)
qc.h(6)
qc.ccx(4,5,6)
qc.cu3(-1.571,-pi/2,pi/2,4,5)
qc.ccx(4,5,6)
qc.u1(0.750,4)
qc.cu3(1.571,-pi/2,pi/2,4,5)
qc.u1(-0.750,4)
qc.cu3(pi/2,pi/2,-pi/2,4,6)
qc.cu3(-9.014*10**(-9),-pi/2,pi/2,4,6)
qc.h(6)
qc.ccx(4,5,6)
qc.h(6)

#application of exp(-iAt/4) operator

qc.h(6)
qc.ccx(3,5,6)
qc.h(6)
qc.ccx(3,5,6)
qc.cu3(2.356,pi/2,-pi/2,3,5)
qc.ccx(3,5,6)
qc.u1(-2.517,3)
qc.cu3(3.927,pi/2,-pi/2,3,5)
qc.u1(-1.017,3)
qc.cu3(pi/2,3*pi/2,-3*pi/2,3,6)
qc.cu3(-0.785,pi/2,-pi/2,3,6)
qc.h(6)
qc.ccx(3,5,6)
qc.h(6)

#application of exp(-iAt/8) operator

qc.h(6)
qc.ccx(2,5,6)
qc.h(6)
qc.ccx(2,5,6)
qc.cu3(0.178,pi/2,-pi/2,2,5)
qc.ccx(2,5,6)
qc.u1(-2.615,2)
qc.cu3(1.963,pi/2,-pi/2,2,5)
qc.u1(-1.115,2)
qc.cu3(pi/2,3*pi/2,-3*pi/2,2,6)
qc.cu3(1.963,pi/2,-pi/2,2,6)
qc.h(6)
qc.ccx(2,5,6)
qc.h(6)

#Applying exp(-iAt/16) operator
qc.h(6)
qc.ccx(1,5,6)
qc.h(6)
qc.ccx(1,5,6)
qc.cu3(0.196,pi/2,-pi/2,1,5)
qc.ccx(1,5,6)
qc.u1(0.589,1)
qc.cu3(0.981,pi/2,-pi/2,1,5)
qc.u1(0.379,1)
qc.cu3(pi/2,3*pi/2,-3*pi/2,1,6)
qc.cu3(0.196,pi/2,-pi/2,1,6)
qc.h(6)
qc.ccx(1,5,6)
qc.h(6)

qc.measure([5,6], [0,1])

# get a Strangeworks backend
backend = get_backend("qasm_simulator")

# execute circuit
execute(qc, backend, shots=8192)



## Execution results

After executing the circuits on the Strangeworks backend, the outcomes 00, 01, 10, and 11 are obtained with 22%, 25%, 27%, and 26% which clearly shows the required solution state.

## Conclusion

To conclude, we have noticed that any multiple regression problem can be reduced to an equivalent linear system of equations problem. This allows us to employ the general quantum speed-up techniques like the HHL algorithm. However, for the HHL we need low error low cost circuits for the Hamiltonian simulation and controlled rotation steps to get accurate results. Although methods like Trotter-Suzuki decomposition and the Solvay-Kitaev algorithm can help to decompose Hamiltonian simulation, but except in certain cases they provide neither minimum cost nor efficient gate sequences. In most practical scenarios we would prefer a low cost gate sequence which approximates the unitary operator well, rather than a high cost exact decomposition. This is where stochastic genetic algorithms like the GLOA comes into play. However, we noted earlier, the GLOA turns out to be useful only in those cases where the matrix exponentials can be efficiently calculated. Otherwise the speedup provided by the HHL would be nullified. One might also argue that the large time taken to find the correct set of gates for the HHL, given a particular A\mathbf{A}​, renders the quantum speed-up achieved useless. That is indeed true for smaller datasets but one should recall that the discussed technique reveals its true speed-up only for large datasets. This is because we restrict the number of gates to be used for the GLOA to a maximum of 20. So while the number and size of the data sets can be arbitrarily large, only a limited number of gates will ever be used for approximating the Hamiltonian simulation step of the HHL.

## References

Sanchayan Dutta, Adrien Suau, Sagnik Dutta, Suvadeep Roy, Bikash K. Behera and Prasanta K. Panigrahi, Demonstration of a Quantum Circuit Design Methodology for Multiple Regression, [arXiv:1811.01726 ]