 # Quantum Computing, Beyond Qubits – Part 3: AI, Optimization, and Quantum Annealing

## 1. Overview

Perhaps a great way of starting this article would be by defining the high-level concepts behind combinatorial optimization and quantum annealing.

Let’s tackle Combinatorial Optimization first.

#### Combinatorial Optimization

Combinatorial optimization is a subfield of mathematics that deals with specific optimization problems. These are problems where a large but finite number of discrete solutions exist, with one particular solution being the best.

The best (or optimal) solution satisfies a certain condition, such as the cost function’s minimization (or maximization) associated with this problem. We will have a bit more to say about this later on.

The most popular example of Combinatorial Optimization is the Travelling Salesman Problem. In this story, a salesman has to visit several cities using the shortest path only once.

To find the shortest path, we would need to try them all. This is a perfect example of a hard problem where the number of computations required to solve this problem rises exponentially with its size.

Now the question becomes: Is there a way to find the shortest path to this problem without visiting all of them?

The answer turns out to be YES. You can find an exact by using a quantum annealing technique that runs on a quantum computer.

Before discussing quantum annealing, we must dissect its classical counterpart, simulated annealing.

#### Simulated Annealing

Simulated Annealing is a probabilistic method of searching many solutions by randomly jumping from one solution to the next and selecting the better candidate. At the beginning of the simulation, it is possible to select a worse solution in the neighbourhood of the current one with some probability p. This allows the exploration of farther regions of the search space for better solutions. As time passes, the probability of selecting a worse solution decreases such that, at some point $T_{max}$ in the future, it becomes null, and the current candidate is selected as the solution to the problem.

Simulated Annealing does not guarantee an optimal solution, but it usually gets you close enough. For most practical situations, this happens to be alright.

But what if that’s not enough?

It turns out that using quantum wizardry, T. Kadowaki and H. Nishimori proved in a 1998 research paper that quantum annealing techniques guarantee exact solutions and run in shorter times.

The big surprise, however, was when, on May 11, 2011, D-Wave Systems, a Canadian-based company, announced D-Wave One, “the world’s first commercially available quantum computer”, and turned Kadowaki and Nishimori’s idea into a real working device.

In this article, we will explore this fascinating topic in some depth. We will use some of the notions we explored in Part 1 and Part 2 on quantum computing.

We will try to make sense of how it works and its real-world practical implications, of which there are more than a few.

#### Do Not Be Intimidated by the Equations!

We will keep the mathematical equations in this text to an absolute minimum. There will also be many accompanying explanations whenever these are present. In most cases, they are very simple and should not be difficult to comprehend.

## 3. Combinatorial Optimization

### 3.1 What is Combinatorial Optimization

Combinatorial optimization is a subfield of mathematical optimization that deals with specific types of problems.

For these problems, the following applies:

• The problem itself can be defined as the search for an optimum (maximum or minimum) value $x_{opt}$ of an objective or cost function: $f(x): A \rightarrow \mathbb{R}$

Where: $f(x_{opt}) = f_{opt}$

• The search space is discrete and finite: $x_{j, min} \leq x_j \leq x_{j, max}, \quad j \in J$

• And finally, some linear constraints might apply to the inputs: $l_{min} \leq Bx \leq l_{max}, \quad B_{ij} \in \mathbb{R}^{m \times n}$

The tricky thing about combinatorial optimization problems is that they are hard to solve.

The only way we know today to find guaranteed, exact solutions to these problems is by trying all the possible inputs.

If you know the maximum or minimum of the cost function beforehand, you may be able to stop the search once you hit that number without trying out all the possibilities. That, however, is not always the case.

These problems are called intractable, where solutions cannot be determined efficiently (i.e. by anything other than brute force).

### 3.2 Example: Problem of Resource Allocation

Let’s take an example of a combinatorial optimization problem that anybody in the software project delivery world is familiar with: resource allocation.

So we have a team of skilled software engineers and several tasks that must be performed.

#### Resource Allocation Problem 1

The problem of resource allocation can be stated as follows: given the number of resources we have and the tasks that need to be completed, how best do we allocate these resources so that the whole set of tasks can be finished in the least number of days possible?

We can also formulate the problem in a slightly different way:

#### Resource Allocation Problem 2

An alternative formulation of the resource allocation problem can be as follows: given the tasks that need to be completed, what is the minimum number of resources required to finish the tasks in a predefined number of days?

We can make the problem slightly harder with a third formulation.

#### Resource Allocation Problem 3

Given a pool of skilled resources and several tasks to be completed, what is the combination of resource usage and timeline that would minimize the project’s overall cost? In this case, the project cost is calculated as the number of resources used times and how many hours they have spent on the project.

The solution to the problem of resource allocation needs to satisfy certain constraints.

#### Constraint 1: Resource Headcount Limit

There would be a maximum number of resources available for this project. If you use all the available resources, you might finish on time, but the project cost might be higher than it could be.

Some tasks have dependencies on other tasks and, therefore, cannot be started unless their dependencies have been completed. Other tasks can be run in parallel without any issues.

#### Constraint 3: Resource Specialization

Some resources are better than others at certain tasks due to specific skills they might possess. This needs to be taken into account in the optimization process.

#### Constraint 4: Balancing Usage

Ideally, if a fixed number of resources is available for the project, we would like to have all of them utilized no less than, say, 60% and no more than 110%. For example, can we get them to do something else for highly specialized resources that might need to wait until certain dependencies are completed in the meantime?

### 3.3 Combinatorial Optimization Solutions

Problems such as resource allocation with smaller sizes can typically be solved by brute force. It’s only beyond a certain size threshold that such exercises become intractable.

Fortunately, and for most practical cases, we do not require an exact solution; a good approximation will do.

Google has developed and open-sourced a number of these algorithms which they refer to as solvers.

These solvers provide approximate solutions for the below list of optimization problems:

1. Linear optimization
2. Constraint optimization
3. Bin packing
4. Network flows
5. Scheduling
6. Routing

A metaheuristic called Simulated Annealing is at the heart of some of these algorithms.

In the next section, we look at Simulated Annealing in some depth. Next, we take a deep dive into the quantum version of this metaheuristic and see when and how it can perform better than its classical counterpart.

## 4. Simulated Annealing

### 4.1 What is Simulated Annealing

Simulated annealing is a probabilistic sampling method for finding the global optimum of a given cost function.

It is referred to as a metaheuristic. Let’s talk a bit about heuristics and metaheuristics.

#### What’s a Heuristic

heuristic (from Greek εὑρίσκω “I find, discover”) is a technique designed for solving difficult problems. A problem is said to be difficult if there is no guarantee that an exact solution actually exists or if its exact solution can take too long to discover. A heuristic would then be applied to find an approximate solution good enough for practical purposes. In essence, a heuristic would sacrifice precision and optimality for speed.

#### What’s a Metaheuristic

metaheuristic is a technique for finding heuristics. That may seem confusing but not by much. Metaheuristics explore a set of solutions to a certain optimization problem by making certain assumptions about that problem. As we shall see with simulated annealing, those assumptions can help reduce the search space size to be explored.

Simulated annealing comes from annealing in metallurgy, a technique that involves heating and controlled cooling of a material to alter its physical properties.

We will not worry too much about the historical origins of the name as it’s not required to understand how the algorithm works.

The problems solved by simulated annealing are very similar to the resource allocation problem: an objective function (the cost or timeframe of a project) of many variables (resources, skillsets, tasks, dependencies), subject to linear constraints (limited number of resources, skillsets, etc.).

In 1983, Kirkpatrick, Gelatt Jr., and Vecchi proposed this approach for solving The Traveling Salesman problem. They also gave it its current name, simulated annealing.

### 4.2 How Simulated Annealing Works

#### 4.2.1 Simulated Annealing Algorithm

To understand how simulated annealing works, which is quite simple and somehow more intuitive than it suggests, let us look at a pseudo-code that runs it.

First, we define the system’s state at any given time as $s \in E$ where $E$ is the universe of all possible states of the system.

For example, going back to the resource allocation problem, we can think of s as any project plan that does not violate our constraints on the resources, tasks, skillsets, etc.

We kick-off the algorithm by selection a state $s_0$ from $E$

Step 1: Let $s = s_0$

Step 2: For $k = 0 \rightarrow k_{max}$:

— Step 2a: Update temperature: $T \leftarrow temperature( 1 - \dfrac{k+1}{k_{max}})$

Step 2b: If $P(E(s), E(s_{new}), T) \geq random(0, 1)$: $s \leftarrow s_{new}$

Step 3: Output: the final state s

The general idea behind the algorithm is quite simple. It runs something like this:

2. For a predetermined number of iterations k
• Look for a better solution by randomly exploring the universe of possible solutions E

Now the exploration process is not totally random, as we shall describe later.

To fully appreciate the gain in performance that we achieve in this new exploration method, we need to understand three of its elements:

1. The neighbour finding function
2. The Temperature T
3. Probability P of switching to a new state

Let’s look at these in turn.

#### 4.2.2 The Neighbour Function

The gain in search speed that the simulated annealing heuristic provides comes from the assumption that we make about the structure of the search space.

The assumption is this: if a state with a lower cost function than the current state exists, chances are the new state is in the vicinity of the current candidate.

#### Exploiting the Structure of Search Spaces

If the search space structure was completely random, no heuristic that would provide better results than brute force would exist. We will use our knowledge of the structure of the search space to design an algorithm that would put that information to good use.

To achieve this, we need a function (which we call neighbour() in our pseudo-code) to find a new state that happens to be in the vicinity of our current candidate.

To give you a feel of how that works, let’s revisit the resource allocation problem once more.

The neighbour selection goes like this: when choosing an alternative project plan, we can start with the current one and permutate two variables (resources or two tasks) while maintaining the rest in their current positions.

Of course, the structure of the search space can also be deceptive: if the local minimum is too deep, chances are that the algorithm will get stuck inside.

#### 4.2.3 Temperature

Because this is an iterative approach, we need to make sure it converges and finally stops, and this is exactly what the Temperature‘s role is supposed to help with.

To understand how this works, let’s have a look at some of the generic properties of the Temperature function:

• The Temperature is a function of the step: $T = T( 1 - \dfrac{k+1}{k_{max}})$

• The rate of change of the Temperature is negative with respect to k. When the process reaches $k_{max}$ we expect to see: $T(k \rightarrow k_{max}) < \epsilon$

So how does the temperature help us find the optimum solution?

The decreasing temperature impacts the probability of accepting a worse state. It makes jumping to a worse state at lower temperatures more difficult.

We will discuss the probability of acceptance in the next subsection.

#### 4.2.4 Probability of Acceptance

The probability of replacing the current state with a new state is determined by the Acceptance Probability Function: $P(s = s_{new}) = P(E_s, E_{s_{new}}, T)$

To understand how it works, we make the following observations:

• The probability P must be greater than 0 even when the new state has higher energy than the current one. This prevents the algorithm from getting stuck at a local minimum. $P(s = s_{new}) > 0, \quad \forall s, s_{new}$

• The Acceptance Probability Function depends on the annealing schedule through the following rule: if the temperature tends to 0, the probability of accepting a worse state must also tend to 0 $P_{T \rightarrow 0} (s=s_{new} | E_{s_{new}} > E_s) \rightarrow 0$

• If the new state’s energy is smaller than the current one, the probability of accepting it should be greater than 0, regardless of the temperature. $E(s_{new}) < E(s) \Rightarrow P(E_s, E_{s_{new}}, T) > 0, \quad \forall T$

## 5. Quantum Annealing

### 5.1 What is Quantum Annealing

#### 5.1.1 The Quantum Annealing Model

We will use the 1998 research paper published by Tadashi Kadowaki and Hidetoshi Nishimori to get an idea of the theoretical model they have used to introduce the current version of quantum annealing.

Let’s have a look at an excerpt from the abstract.

We introduce quantum fluctuations into the simulated annealing process of optimization problems, aiming at faster convergence to the optimal state. Quantum fluctuations cause transitions between states and thus play the same role as thermal fluctuations in the conventional approach. The idea is tested by the transverse Ising model, in which the transverse field is a function of time similar to the temperature in the conventional method. The goal is to find the ground state of the diagonal part of the Hamiltonian with high accuracy as quickly as possible.

— Quantum Annealing in the Transverse Ising Model, 1998

Instead of thermal fluctuations, we now have quantum fluctuations, which, using quantum tunnelling, can change the system’s state from one search space region to another, overcoming energy barriers.

Playing the role of the temperature in the classical model, they now introduced a transverse magnetic field based on the Ising model.

#### 5.1.2 The Ising Model

Things can become quite technical quickly, but we will endeavour to keep things as simple as possible without losing the essence.

The Ising model can be considered a network of discrete variables representing magnetic dipole moments of spins in one of two states: +1 or −1.

The spins are usually arranged in a crystal lattice, allowing each spin to interact with its neighbours.

Neighbouring spins with the same value have lower energy than those with differing values.

Under this condition, i.e. neighbours with similar spin values have lower energies, we can write the overall energy of the system (or Hamiltonian) as follows: $\mathcal{H}(s) = -\displaystyle\sum_{i,j} J_{ij} \sigma_i \sigma_j -\displaystyle\sum_{i} h_i \sigma_i$

Where s is the current state of the system determined by the following:

• Values of the spins $\sigma_i$ at each vertex i
• The interactions $J_{ij}$ between two neighbouring vertices i and j
• The transverse magnetic field’s interaction $h_{i}$ with vertex i
• And finally, $\mu$ the magnetic moment

#### 5.1.3 Coding the Objective Function

At this stage, we should know what quantum annealing looks like. We still need to figure out one important thing.

How do we communicate a problem to the quantum computer, or how best to encode the objective function on the quantum computer?

This is a relatively easy problem to understand. Implementation is a different story, but we will let the engineers at D-Wave Systems worry about that.

In the next paragraphs, we will use some D-Wave terminologies.

We must encode the objective function into a Quadratic Unconstrained Binary Optimization (or QUBO) problem to communicate the problem to the quantum computer.

The form of a QUBO is as follows: $F(s) = \displaystyle\sum_{i,j} b_{ij} q_i q_j +\displaystyle\sum_{i} a_i q_i$

Where:

• The $a_i$ are called biases
• The $b_{ij}$ are called couplings
• And the $q_i$ are quantum bits (or qubits)

You have probably already noticed the resemblance of this formulation to the Hamiltonian of an Ising model.

In the next section, we look at how this problem will be solved on the quantum computer.

#### 5.1.4 Finding the Ground State

Given that the system has been prepared in such a way that its energy state can be identified with the objective function, all we need to do now is let the system evolve on its own for some time (typically of the order of a few microseconds on the D-Wave Annealer) to get the solution.

Why? Because, by definition, the system’s energy is not yet at its lowest, and like all physical systems, it will tend to evolve towards its lowest energy state.

This evolution follows the Schrödinger equation, equivalent to Newton’s equation of motion for quantum systems.

Once it’s at the minimum level, we would measure the values of the qubits to find our solution.

Here is how it all works.

#### Step 1: Preparing the System

During this initial step, the system is prepared in a superposition of all states. The energy level is high because the spins are set randomly (remember, the Hamiltonian, as described in the Ising model, is lower when neighbouring states have similar spins).

Also, because of its ability to have many qubits placed into superposition, the quantum computer can store many states.

The transverse field has been applied at this stage.

#### Step 2: Quantum Annealing Starts

Once the system is ready, we kick off the quantum annealing process by slowly reducing the strength of the transverse field (which is what kept the qubits at that high energy level).

At the same time, we start applying the couplings and the biases. Both of these changes will allow the energy landscape to start forming.

We now have qubit quantum states possessing lower energy states. We let the system evolve following the Schrodinger equation. This is similar to letting a ball slide on a slope.

#### Step 3: Measuring the Result

The process is repeated until the system settles down on its lowest energy state.

The state of the qubits is measured to obtain the solution to our optimization problem.

### 5.2 Is Quantum Annealing Superior to Simulated Annealing?

Yes, in some cases.

The research paper by the Japanese scientists we mentioned in the previous section showed the result of an interesting numerical experiment that they did.

The following excerpt from their paper explains the experiment and the results.

We have proposed the idea of quantum annealing (QA) […] in contrast to the usual thermal transitions in simulated annealing (SA). The idea was tested in the transverse Ising model obeying the time-dependent Schrödinger equation. […] The numerical results on the probability to find the system in the ground state were compared with the corresponding probability derived from the numerical solution of the master equation representing the SA processes. We have found that QA shows convergence to the optimal (ground) state with larger probability than SA in all cases if the same annealing schedule is used.

— Quantum Annealing in the Transverse Ising Model, 1998

Here is what the above excerpt is trying to say.

The authors proposed a mathematical model for a Quantum Computing algorithm (D-Wave did not exist then) that could solve a typical combinatorial optimization problem.

Next, they took an example of a fixed-size system for which the ground state (or optimal solution was known in advance).

Then, they tried to demonstrate the superiority of Quantum Annealing by comparing the performance of the latter against simulated annealing on that particular system.

For quantum and simulated annealing, they solved the equations numerically. Finally, they compared the results.

Now, if you look at the graph above, you will find that QA reached an optimum result almost 100% of the time. That’s pretty impressive compared to only 60%, which SA showed.

### 5.3 When Is Quantum Annealing Superior

To understand why Quantum Annealing can be superior and under what conditions, we need to dive deeper into the quantum tunnelling effects underlying its inner workings.

So how does quantum tunnelling works?

#### Quantum Tunnelling

Quantum tunnelling describes the phenomenon whereby a particle can suddenly appear on the other side of a barrier. The reason why this happens is due to the nature of quantum mechanical particles and phenomena. More precisely, the properties of a quantum mechanical system are generally described by a wavefunction. The latter is not so useful on its own. Still, its squared amplitude, calculated at a certain position, gives the probability of finding that particle at that position when a measurement is performed. Quantum tunnelling occurs because the wavefunction, on the other side of the barrier, is found to be non-zero.

In simulated annealing, thermal transition probabilities are responsible for the algorithm’s ability to jump to different states. The probability is found to be proportional to the following term: $e^{-{\frac {\Delta }{k_{B}T}}}}$

Where T is the temperature, kB is the Boltzmann constant, and $\Delta = E_{s_{new}} - E_s$ is the height of the barrier.

The result is that for very high barriers, it is extremely difficult for thermal fluctuations to move the system from a deep local minimum to a different state.

However, in 1989 Ray, Chakrabarti & Chakrabarti stated that the quantum tunnelling probability through the same barrier depends not only on the height of the barrier but also on its width: $e^{-{\frac {{\sqrt {\Delta }}w}{\Gamma }}}}$

It is great that the probability of advancing to a new state now depends also on the width of the barrier an not just its height.

Why? If $w \ll \sqrt {\Delta}$, quantum fluctuations can extricate them out of the local minimum.

To exploit this property of quantum tunnelling, the energy landscape of the cost function we are trying to optimize should present narrow barriers between local minima.

## 6. Quantum Annealing and AI

Now that we have a real physical device that can perform quantum computations and solve very hard optimization problems through quantum annealing, what has it been useful for?

How far can it advance research or applications that use Artificial Intelligence (AI)?

To appreciate the significance of this technology in research that leverages AI, we just need to look at who is currently using it and to what ends.

Here is a list of success stories from D-Wave customers:

1. Multiverse Computing: Optimizing Financial Portfolios with Quantum Computing
2. Groovenauts and Mitsubishi EstateCreating Sustainable Cities through Waste Collection Optimization
3. Menten AI Battles COVID-19 with Quantum Peptide Therapeutics

But the list goes on, although this time from Wikipedia, to include Lockheed Martin, the University of Southern California, Google/NASA, and Los Alamos National Lab.

According to Menten CEO Hans Melo, the results have been extremely encouraging, with the hybrid quantum approach often outperforming classical solvers. Menten AI’s unique protein designs have been computationally validated, chemically synthesized, and are being advanced to live-virus testing against COVID-19.

— Menten AI is Reimagining Biology with Quantum-Powered Protein Design

The most interesting of the lot, in my opinion, are problems that lay in the field of biotechnology, and that use quantum computers to solve problems of protein design for use in new drugs.

We are fortunate to live in a day and age where anybody can access state-of-the-art technology online.

Whether it’s D-Wave Systems, Google AI, or IBM’s Q System One, anybody with enough scientific curiosity and a bit of knowledge in physics can enjoy the immense satisfaction you get when something really difficult suddenly starts making sense.