# Quantum Computing, Beyond Qubits – Part 3: AI, Optimization, and Quantum Annealing

#### Georges Lteif

Software Engineer

Last Updated on December 14, 2022.
Subscribe now to stay posted!

## 1. Overview

Perhaps a great way of starting this article would be by defining the high-level concepts behind combinatorial optimization and quantum annealing.

Let’s tackle Combinatorial Optimization first.

#### Combinatorial Optimization

Combinatorial optimization is a subfield of mathematics that deals with specific types of optimization problems. These are problems where a large but finite number of discrete solutions exist, with one particular solution being the best.

The best (or optimal) solution satisfies a certain condition such as the minimization (or maximization) of a cost function associated with this problem. We will have a bit more to say about this later on.

The most popular example of Combinatorial Optimization is the Travelling Salesman Problem. In this particular story, a salesman has to visit a number of cities only once using the shortest path.

To find the shortest path, we would need to try them all. This is a perfect example of hard problems where the number of computations required to solve this problem rises exponentially with its size.

Now the question becomes: Is there a way for finding the shortest path in this problem without having to visit all of them?

The answer turns out to be YES. In fact, you can find an exact by using a technique called quantum annealing that runs on a quantum computer.

Before we can discuss quantum annealing, we will need to dissect its classical counterpart otherwise known as simulated annealing.

#### Simulated Annealing

Simulated Annealing is a probabilistic method of searching a large number of solutions by randomly jumping from one solution to the next and selecting the better candidate. At the beginning of the simulation, it is possible to select a worse solution in the neighborhood of the current one with some probability p. This allows the exploration of farther regions of the search space for better solutions. As time goes by, the probability of selecting a worse solution decreases such that, at some point $T_{max}$ in the future, it becomes null, and the current candidate is selected as the solution to the problem.

Simulated Annealing does not guarantee an optimal solution but it does get you close enough most of the time. For most practical situations, this happens to be alright.

But what if that’s not enough?

It turns out that by using quantum wizardry, T. Kadowaki and H. Nishimori were able to prove in a 1998 research paper that quantum annealing techniques not only guarantee exact solutions but also run do that in shorter times.

The big surprise however was when, on May 11, 2011, D-Wave Systems, a Canadian-based company, announced D-Wave One, “the world’s first commercially available quantum computer”, and turned Kadowaki and Nishimori’s idea into a real working device.

In this article, we will explore this fascinating topic in some depth. We will use some of the notions we explored in Part 1 and Part 2 on quantum computing.

We will try to make sense of how it works, as well as its real-world practical implications, of which there are more than a few.

#### Do Not Be Intimidated by the Equations!

We will keep the mathematical equations in this text to an absolute minimum. There will also be a generous amount of accompanying explanations whenever these are present. In most cases, they are very simple and should not be difficult to comprehend.

## 3. Combinatorial Optimization

### 3.1 What is Combinatorial Optimization

Combinatorial optimization is a subfield of mathematical optimization that deals with specific types of problems.

For these problems, the following applies:

• The problem itself can be defined as the search for an optimum (maximum or minimum) value $x_{opt}$ of an objective or cost function:

$f(x): A \rightarrow \mathbb{R}$

Where:

$f(x_{opt}) = f_{opt}$

• The search space is discrete and finite:

$x_{j, min} \leq x_j \leq x_{j, max}, \quad j \in J$

• And finally, some linear constraints might apply to the inputs:

$l_{min} \leq Bx \leq l_{max}, \quad B_{ij} \in \mathbb{R}^{m \times n}$

The tricky thing about combinatorial optimization problems is that they are hard to solve.

In fact, the only way we know of today to find guaranteed, exact solutions to these problems is by trying out all the possible inputs.

If you know the maximum or minimum of the cost function beforehand, then you may be able to stop the search once you hit that number and without having to try out all the possibilities. That however is not always the case.

These types of problems, where solutions cannot be determined efficiently (i.e. by anything other than brute-force), are called intractable.

### 3.2 Example: Problem of Resource Allocation

Let’s take an example of a combinatorial optimization problem that anybody in the software project delivery world is familiar with: resource allocation.

So we have a team of skilled software engineers and a number of tasks that need to be performed.

#### Resource Allocation Problem 1

The problem of resource allocation can be stated as follows: given the number of resources we have, and the tasks that need to be completed, how best do we allocate these resources so that the whole set of tasks can be finished in the least number of days possible?

We can also formulate the problem in a slightly different way:

#### Resource Allocation Problem 2

An alternative formulation of the problem of resource allocation can be as follows: given the tasks that need to be completed, what is the minimum number of resources required to finish the tasks in a predefined number of days?

We can make the problem slightly harder with a third formulation.

#### Resource Allocation Problem 3

Given a pool of skilled resources and a number of tasks to be completed, what is the combination of resource usage and timeline that would minimize the overall cost of the project? In this case, the cost of the project is calculated as the number of resources used times how many hours they have spent on the project.

The solution to the problem of resource allocation needs to satisfy certain constraints.

#### Constraint 1: Resource Headcount Limit

There would be a maximum number of resources available for this project. If you use all the available resources, you might finish on time but the project cost might be higher than it could be.

Some tasks, by virtue of their nature, have dependencies on other tasks and therefore cannot be started unless their dependencies have been completed. Other tasks can be run in parallel without any issues.

#### Constraint 3: Resource Specialization

Some resources are better than others at certain tasks due to specific skills they might possess. This needs to be taken into account in the optimization process.

#### Constraint 4: Balancing Usage

Ideally, if a fixed number of resources is available for the project, we would like to have all of them utilized no less than, say, 60%, and no more than 110%. For example, for highly specialized resources that might need to wait until certain dependencies are completed, can we get them to do something else in the meantime?

### 3.3 Combinatorial Optimization Solutions

Problems such as resource allocation with smaller sizes can typically be solved by brute force. It’s only beyond a certain size threshold that such exercises become intractable.

Fortunately, and for most practical cases, we do not require an exact solution, a good approximation will do.

Google has developed and open-sourced a number of these algorithms which they refer to as solvers.

These solvers provide approximate solutions for the below list of optimization problems:

1. Linear optimization
2. Constraint optimization
3. Bin packing
4. Network flows
5. Scheduling
6. Routing

At the heart of some of these algorithms is a metaheuristic called Simulated Annealing.

In the next section, we look at Simulated Annealing in some depth. Next, we take a deep dive into the quantum version of this metaheuristic and see when and how it can actually perform better than its classical counterpart.

## 4. Simulated Annealing

### 4.1 What is Simulated Annealing

Simulated annealing is a probabilistic sampling method for finding the global optimum of a given cost function.

It is referred to as a metaheuristic. Let’s talk a bit about heuristics and metaheuristics.

#### What’s a Heuristic

heuristic (from Greek εὑρίσκω “I find, discover”) is a technique designed for solving difficult problems. A problem is said to be difficult if there is no guarantee that an exact solution actually exists, or if its exact solution can take too long to discover. A heuristic would then be applied to find an approximate solution that is good enough for practical purposes. In essence, a heuristic would sacrifice precision and optimality for speed.

#### What’s a Metaheuristic

A metaheuristic is a technique for finding heuristics. That may seem confusing but not by much. Metaheuristics explore a set of solutions to a certain optimization problem by making certain assumptions about that problem. As we shall see with simulated annealing, those assumptions can be helpful in reducing the size of the search space to be explored.

Simulated annealing comes from annealing in metallurgy, a technique that involves heating and controlled cooling of a material to alter its physical properties.

We will not worry too much about the historical origins of the name as it’s not required for understanding how the algorithm works.

The problems solved by simulated annealing are very similar to the resource allocation problem: an objective function (the cost or timeframe of a project) of many variables (resources, skillsets, tasks, dependencies), subject to linear constraints (limited number of resources, skillsets, etc.).

In 1983, Kirkpatrick, Gelatt Jr., Vecchi proposed this approach for solving The Traveling Salesman problem. They also gave it its current name, simulated annealing.

### 4.2 How Simulated Annealing Works

#### 4.2.1 Simulated Annealing Algorithm

To understand how simulated annealing works, which is quite simple and somehow more intuitive than it suggests, let us look at a pseudo-code that runs it.

First, we define the state of the system at any given moment in time as $s \in E$ where $E$ is the universe of all possible states of the system.

For example, and going back to the resource allocation problem, we can think of s as any project plan that does not violate the constraints we have on the resources, tasks, skillsets, etc.

We kick-off the algorithm by selection a state $s_0$ from $E$

Step 1: Let $s = s_0$

Step 2: For $k = 0 \rightarrow k_{max}$:

— Step 2a: Update temperature:

$T \leftarrow temperature( 1 - \dfrac{k+1}{k_{max}})$

Step 2b: If $P(E(s), E(s_{new}), T) \geq random(0, 1)$:

$s \leftarrow s_{new}$

Step 3: Output: the final state s

The general idea behind the algorithm is quite simple. It runs something like this:

2. For a predetermined number of iterations k
• Look for a better solution by randomly exploring the universe of possible solutions E

Now the exploration process is not totally random as we shall describe later.

To fully appreciate the gain in performance that we achieve in this new exploration method, we need to understand three of its elements:

1. The neighbour finding function
2. The Temperature T
3. Probability P of switching to a new state

Let’s look at these in turn.

#### 4.2.2 The Neighbour Function

The gain in search speed that the simulated annealing heuristic provides comes from the assumption that we make about the structure of the search space.

The assumption is this: if a state with a lower cost function than the current state exists, chances are the new state is in the vicinity of the current candidate.

#### Exploiting the Structure of Search Spaces

If the structure of the search space was completely random, no heuristic that would provide better results than brute force would exist. In fact, we will use our knowledge of the structure of the search space to design an algorithm that would put that information to good use.

To achieve this, we need a function (which we called neighbour() in our pseudo-code) to find a new state that happens to be in the vicinity of our current candidate.

To give you a feel of how that works, let’s revisit the resource allocation problem once more.

The neighbor selection goes like this: when choosing an alternative project plan, we can start with the current one and permutate two variables (resources or two tasks) while maintaining the rest in their current positions.

Of course, the structure of the search space can also be deceptive: if the local minimum is too deep, chances are that the algorithm will get stuck inside.

#### 4.2.3 Temperature

Because this is an iterative approach, we need to make sure it converges and finally stops and this is exactly what the Temperature‘s role is supposed to help with.

To understand how this works, let’s have a look at some of the generic properties of the Temperature function:

• The Temperature is a function of the step:

$T = T( 1 - \dfrac{k+1}{k_{max}})$

• The rate of change of the Temperature is negative with respect to k. When the process reaches $k_{max}$ we expect to see:

$T(k \rightarrow k_{max}) < \epsilon$

So how does the temperature help us find the optimum solution?

In fact, the decreasing temperature has an impact on the probability of accepting a worse state. It makes it more difficult to jump to a worse state at lower temperatures.

We will discuss the probability of acceptance in the next subsection.

#### 4.2.4 Probability of Acceptance

The probability of replacing the current state with a new state is determined by the Acceptance Probability Function:

$P(s = s_{new}) = P(E_s, E_{s_{new}}, T)$

To understand how it works, we make the following observations:

• The probability P must be greater than 0 even when the new state has a higher energy that the current one. This prevents the algorithm from getting stuck at at local minimum.

$P(s = s_{new}) > 0, \quad \forall s, s_{new}$

• The Acceptance Probability Function depends on the annealing schedule through the following rule: if the temperature tends to 0, the probability of accepting a worse state must also tend to 0

$P_{T \rightarrow 0} (s=s_{new} | E_{s_{new}} > E_s) \rightarrow 0$

• If the energy of the new state is the smaller the current one, the probability of accepting it should be greater than 0 regardless of the temperature.

$E(s_{new}) < E(s) \Rightarrow P(E_s, E_{s_{new}}, T) > 0, \quad \forall T$

## 5. Quantum Annealing

### 5.1 What is Quantum Annealing

#### 5.1.1 The Quantum Annealing Model

We will use the 1998 research paper published by Tadashi Kadowaki and Hidetoshi Nishimori to get an idea of the theoretical model they have used to introduce the current version of quantum annealing.

Let’s have a look at an excerpt from the abstract.

We introduce quantum fluctuations into the simulated annealing process of optimization problems, aiming at faster convergence to the optimal state. Quantum fluctuations cause transitions between states and thus play the same role as thermal fluctuations in the conventional approach. The idea is tested by the transverse Ising model, in which the transverse field is a function of time similar to the temperature in the conventional method. The goal is to find the ground state of the diagonal part of the Hamiltonian with high accuracy as quickly as possible.
Quantum Annealing in the Transverse Ising Model, 1998

Instead of thermal fluctuations, we now have quantum fluctuations which, by the use of quantum tunneling, can change the system’s state from one region of the search space to another overcoming energy barriers.

Playing the role of the temperature in the classical model, they now introduced a transverse magnetic field based on the Ising model.

#### 5.1.2 The Ising Model

Things can become quite technical very quickly, but we will endeavor to keep things as simple as possible without losing the essence.

The Ising model can be thought of as a network of discrete variables that represent magnetic dipole moments of spins that can be in one of two states: +1 or −1.

The spins are usually arranged in a crystal lattice, allowing each spin to interact with its neighbours.

Neighbouring spins with the same value have lower energy than those with differing values.

Under this condition, i.e. neighbours with similar spin values have lower energies, we can write the overall energy of the system (or Hamiltonian) as follows:

$\mathcal{H}(s) = -\displaystyle\sum_{i,j} J_{ij} \sigma_i \sigma_j -\displaystyle\sum_{i} h_i \sigma_i$

Where s is the current state of the system determined by the following:

• Values of the spins $\sigma_i$ at each vertex i
• The interactions $J_{ij}$ between two neighbouring vertices i and j
• The transverse magnetic field’s interaction $h_{i}$ with vertex i
• And finally, $\mu$ the magnetic moment

#### 5.1.3 Coding the Objective Function

At this stage, we should have some idea of what quantum annealing looks like. We still need to figure out one important thing.

How do we communicate a problem to the quantum computer, or how best to encode the objective function on the quantum computer?

In fact, this is a relatively easy problem to understand. Implementation though is a different story but we will let the engineers at D-Wave Systems worry about that.

In the next paragraphs, we will be using some of the D-Wave terminologies.

To communicate the problem to the quantum computer, we need to encode the objective function into what is called a Quadratic Unconstrained Binary Optimization (or QUBO) problem.

The form of a QUBO is as follows:

$F(s) = \displaystyle\sum_{i,j} b_{ij} q_i q_j +\displaystyle\sum_{i} a_i q_i$

Where:

• The $a_i$ are called biases
• The $b_{ij}$ are called couplings
• And the $q_i$ are quantum bits (or qubits)

You have probably already noticed the resemblance of this formulation to the Hamiltonian of an Ising model.

In the next section, we look at how this problem is going to be solved on the quantum computer.

#### 5.1.4 Finding the Ground State

Given that the system has been prepared in such a way that its energy state can be identified with the objective function, all we need to do now is let the system evolve on its own for some time (typically of the order of a few microseconds on the D-Wave Annealer) to get the solution.

Why? Because, by definition, the energy of the system is not yet at its lowest, and like all physical systems, it will have the tendency to evolve towards its lowest energy state.

This evolution follows the Schrödinger equation, which is the equivalent of Newton’s equation of motion for quantum systems.

Once its at the minimum level, we would measure the values of the qubits to find our solution.

Here is how it all works.

#### Step 1: Preparing the System

During this initial step, the system is prepared in a superposition of all states. The energy level is high because the spins are set in a random manner (remember the Hamiltonian, as we have described it in the Ising model, is lower when neighboring states have similar spins).

Also, because of its ability to have many qubits placed into superposition, the quantum computer can store an exponentially large number of states.

The transverse field has been applied at this stage.

#### Step 2: Quantum Annealing Starts

Once the system is ready, we kick-off the quantum annealing process by slowly reducing the strength of the transverse field (which is what was keeping the qubits at that high energy level).

At the same time, we start applying the couplings and the biases. Both of these changes will allow the energy landscape to start forming.

We now have qubit quantum states possessing lower energy states. We let the system evolve following the Schrodinger equation. This is similar to letting a ball slide on a slope.

#### Step 3: Measuring the Result

The process is repeated many times until the system settles down on its lowest energy state.

The state of the qubits is measured to obtain the solution to our optimization problem.

### 5.2 Is Quantum Annealing Superior to Simulated Annealing?

Yes, in some cases.

The research paper by the Japanese scientists we mentioned in the previous section showed the result of an interesting numerical experiment that they did.

The following excerpt from their paper explains the experiment and the results.

We have proposed the idea of quantum annealing (QA) […] in contrast to the usual thermal transitions in simulated annealing (SA). The idea was tested in the transverse Ising model obeying the time-dependent Schrödinger equation. […] The numerical results on the probability to find the system in the ground state were compared with the corresponding probability derived from the numerical solution of the master equation representing the SA processes. We have found that QA shows convergence to the optimal (ground) state with larger probability than SA in all cases if the same annealing schedule is used.
Quantum Annealing in the Transverse Ising Model, 1998

Here is what the above excerpt is trying to say.

The authors proposed a mathematical model for a Quantum Computing algorithm (D-Wave did not exist at that time) that could solve a typical combinatorial optimization problem.

Next, they took an example of a fixed-size system, for which the ground state (or optimal solution was known in advance).

Then, they tried to demonstrate the superiority of Quantum Annealing by comparing the performance of the latter against simulated annealing on that particular system.

For quantum and simulated annealing, they solved the equations numerically. Finally, they compared the results.

Now, if you look at the graph above, you will find that QA reached an optimum result almost 100% of the time. That’s pretty impressive when compared to only 60% which is what SA showed.

### 5.3 When Is Quantum Annealing Superior

To understand why Quantum Annealing can be superior, and under what conditions, we need to dive a bit deeper into the quantum tunneling effects which underlie its inner workings.

So how does quantum tunneling actually works?

#### Quantum Tunnelling

Quantum tunneling describes the phenomenon whereby a particle can suddenly appear on the other side of a barrier. The reason why this happens is due to the nature of quantum mechanical particles and phenomena. More precisely, the properties of a quantum mechanical system are generally described by a wavefunction. The latter is not so useful on its own but its squared amplitude, calculated at a certain position, gives the probability of finding that particle at that position when a measurement is performed. Quantum tunneling occurs because the wavefunction, on the other side of the barrier, is found to be non-zero.

In simulated annealing, thermal transition probabilities are responsible for the algorithm’s ability to jump to different states. The probability is found to be proportional to the following term:

${\displaystyle e^{-{\frac {\Delta }{k_{B}T}}}}$

Where T is the temperature, kB is the Boltzmann constant, and $\Delta = E_{s_{new}} - E_s$ is the height of the barrier.

The result is that for very high barriers, it is extremely difficult for thermal fluctuations alone to move the system, from a deep local minimum to a different state.

However, in 1989 Ray, Chakrabarti & Chakrabarti stated that the quantum tunneling probability through the same barrier depends not only on the height of the barrier but also on its width:

${\displaystyle e^{-{\frac {{\sqrt {\Delta }}w}{\Gamma }}}}$

It is great that the probability of advancing to a new state now depends also on the width of the barrier an not just its height.

Why? If $w \ll \sqrt {\Delta}$, quantum fluctuations will be able to extricate the out of the local minimum.

To be able to exploit this property of quantum tunneling, the energy landscape of the cost function that we are trying to optimize should present narrow barriers between local minima.

## 6. Quantum Annealing and AI

Now that we have a real physical device that can perform quantum computations and solve very hard optimization problems through quantum annealing, what exactly has it been useful for?

How far can it advance research or applications that use Artificial Intelligence (AI)?

To appreciate the significance of this technology in research that leverages AI, we just need to look at who is currently using it and to what ends.

Here is a list of success stories from D-Wave customers:

But the list goes on, although this time from Wikipedia, to include Lockheed Martin, University of Southern California, Google/NASA, and Los Alamos National Lab.

According to Menten CEO Hans Melo, the results have been extremely encouraging, with the hybrid quantum approach often outperforming classical solvers. Menten AI’s unique protein designs have been computationally validated, chemically synthesized, and are being advanced to live-virus testing against COVID-19.
Menten AI is Reimagining Biology with Quantum-Powered Protein Design

The most interesting of the lot, in my opinion, are problems that lay in the field of biotechnology, and that use quantum computers to solve problems of protein design for use in new drugs.

We are quite fortunate to live in a day and age where anybody can obtain access to state-of-the-art technology through the internet.

Whether it’s D-Wave Systems, Google AI, or IBM’s Q System One, anybody with enough scientific curiosity, and a bit of knowledge in physics, can enjoy the immense satisfaction that you get during that moment when something really difficult suddenly starts making sense.