Modelling the Spread of COVID-19: Part 2: Model Definition
Starting the Project
When COVID-19 was declared a pandemic, I struggled to find solid answers. Like the rest of the world, I wanted to understand how this crisis would unfold. The following questions were particularly troublesome:
- What would be an accurate estimate of the case fatality ratio, or how many people would die?
- What is the correct value of the Basic Reproduction Number (R0), or how contagious is the disease?
- Is the lockdown effective? If so, for how long, and what is the correct process for easing the restrictions?
- Will there be a second wave?
- When will herd immunity be reached?

Having some academic background in computer simulation and over 15 years of experience as a software engineer, I tried to put this knowledge into practice by building a model of the virus spread, hoping that it would help me understand the dynamics of the disease and perhaps give me some insight into the questions above.
The aim of the exercise is to simulate a real-world scenario in which a virus is introduced into a wholly susceptible population and to analyze the possible outcomes under the prevailing conditions (lockdown, social distancing, a shortened recovery period through treatment, etc.).
This article explains how the model was built, the assumptions made, and the rationale behind the key decisions. Having no formal education in medicine, biology, or infectious diseases, I was excited to see what the results would look like and how they would compare to the prevailing models in the literature.
Selecting the Programming Language
This was probably the easiest decision to be made; Python was the first contender for the following reasons:
- I have had the chance to familiarize myself with the language over a number of years
- The availability of a myriad of open source libraries for data science and data analysis makes it ideal for this type of exercise. This is extremely convenient as it would allow the developer to focus on implementing the business logic rather than developing the basic tools required to support that logic
- A high-level language such as Python allows for faster code development, which is extremely beneficial in this case.
I used Microsoft Visual Studio Code as my IDE. The code is available on GitLab.
Basic Considerations and Constraints
To produce meaningful results, the model needs to tick a few boxes.
1. Structure of the Physical System
The Physical System (see Part 1) can be conceptualized as a set of highly connected, frequently interacting components. In this context, these components will be referred to as “clusters”. Clusters are defined as groups of closely connected agents that we will call “particles”.
Individual clusters can represent geographical regions (households, cities, suburbs, regions, states, countries, etc.) or professional sectors (public transport, the national healthcare system, food chains, and the many institutions supplying essential goods and services). Each cluster contains a number of interacting particles with specific properties.
The diagram below shows how a system can be decomposed into clusters, and clusters into particles.

The following diagram shows an example of a physical system composed of different geographical suburbs (in this case A, B, C, D, and E). Each suburb constitutes a cluster in the system. The interaction between suburbs appears as overlapping regions (in this case EA, AB, BC, ABE, etc.).

We can also look at the system in a slightly different way: the clusters now group people who work in the same sector. In this case, not all clusters are interconnected; the Public Transport cluster for example is only connected to schools and food chains, etc.

This representation is versatile enough to simulate many different real-life scenarios, so no assumptions need to be made about the nature of each cluster: a cluster is simply an arbitrary grouping of people, and its nature is irrelevant to the outcome of the simulation. What matters is the number of clusters in the system, the number of people in each cluster, and the nature and strength of the interactions between clusters.
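To make the decomposition concrete, here is a minimal sketch of how systems, clusters, and particles might be represented in Python. The class names (`Particle`, `Cluster`, `System`) and the `contact` dictionary are hypothetical illustrations, not taken from the author's GitLab code. Interactions are stored as directed pairs of cluster names, so clusters that do not interact simply have no entry.

```python
from dataclasses import dataclass, field


@dataclass
class Particle:
    age: float
    state: str = "susceptible"  # e.g. "susceptible", "infectious", "immune", "dead"


@dataclass
class Cluster:
    name: str  # any grouping: a suburb, a school, public transport...
    particles: list[Particle] = field(default_factory=list)

    def count(self, state: str) -> int:
        """Number of particles in this cluster that are in the given state."""
        return sum(p.state == state for p in self.particles)


@dataclass
class System:
    clusters: dict[str, Cluster]
    # contact[(a, b)]: fraction of particles in cluster a that meet particles from b.
    # A missing pair means the two clusters do not interact.
    contact: dict[tuple[str, str], float] = field(default_factory=dict)

    def neighbours(self, name: str) -> list[str]:
        """Clusters that cluster `name` has a non-zero contact fraction with."""
        return [b for (a, b), w in self.contact.items() if a == name and w > 0]
```

For the sector view described earlier, Public Transport would only have non-zero `contact` entries towards Schools and Food Chains.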
2. Cluster Interactions
Clusters A and B interact with one another when particles from Cluster A come into close contact with particles from Cluster B. The result is the transfer of infection from B to A.

We define the cross-contamination from cluster B on cluster A as the total number of new infections in cluster A generated by close contact between a fraction of the particles in A and a random sample of particles from B:

$$\Delta I_{A \leftarrow B}(n) = L(n)\,\sigma\,\alpha_{AB}\,S_A(n)\,\frac{I_B(n)}{N_B}$$

where:
- $\Delta I_{A \leftarrow B}(n)$ is the number of new infections in A resulting from its interaction with particles from cluster B at time $n$.
- $S_A(n)$ is the number of susceptible particles in A.
- $I_B(n)/N_B$ is the proportion of infected particles in B, i.e. the number of infected particles divided by the population size of B.
- $\alpha_{AB}$ is the fraction of particles from cluster A that come into contact with particles from B. It is a random number between 0 and 1; the higher it is, the stronger the interaction between A and B.
- $\sigma$ is a number between 0 and 1 reflecting the degree of social distancing. It reduces the strength of the interaction between the two clusters during disease transmission.
- $L(n)$ is the Lockdown Function. If a lockdown is enforced, there is an 80% chance that the interaction is severed, in which case $L(n) = 0$; in the remaining 20% of cases, $L(n) = 1$. Without a lockdown, $L(n) = 1$. The 80/20 split was chosen as an example and could be any other split.
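The sketch below computes the cross-contamination of cluster A by cluster B, reading it as the product L · σ · α_AB · S_A · (I_B / N_B) of the quantities defined above. The function and parameter names are hypothetical, and the 80% severing probability is the example value from the text.

```python
import random


def lockdown_factor(lockdown: bool, p_sever: float = 0.8) -> int:
    """Lockdown function L: 0 if the interaction is severed today, 1 otherwise."""
    if lockdown and random.random() < p_sever:
        return 0
    return 1


def cross_infections(susceptible_a: int, infected_b: int, population_b: int,
                     alpha_ab: float, sigma: float, lockdown: bool) -> float:
    """Expected new infections in A caused by contact with B during one time step."""
    if population_b == 0:
        return 0.0
    prevalence_b = infected_b / population_b  # I_B / N_B
    return lockdown_factor(lockdown) * sigma * alpha_ab * susceptible_a * prevalence_b
```

With 500 susceptible particles in A, 20 infected out of 200 in B, α_AB = 0.5 and σ = 0.4, the result is 10 new infections without a lockdown. Under a lockdown each day yields either 0 or 10, so averaged over many days the expected cross-contamination is scaled by 0.8 × 0 + 0.2 × 1 = 0.2.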
3. Cluster Properties
A cluster property that is highly relevant to this exercise is age distribution. It is now well established that age plays a major role in the spread and dynamics of COVID-19, as evidenced by the role of children and by the mortality rates observed in different age groups. In our simulations, a slightly modified version of the log-normal distribution has been used to generate a particle's age. The distribution parameters can be adjusted to skew the distribution towards an older population.
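A minimal sketch of age sampling with Python's standard `random.lognormvariate`. The text does not say how the log-normal was modified, so capping ages at 100 is only an assumed stand-in; `mu = log(35)` (a median age of 35) and `sigma = 0.6` are illustrative values, not the model's.

```python
import math
import random


def sample_age(mu: float = math.log(35), sigma: float = 0.6, max_age: float = 100.0) -> float:
    """Draw one age from a log-normal distribution, capped at max_age.

    The median of the distribution is exp(mu); the cap is an assumed
    substitute for the unspecified modification.
    """
    return min(random.lognormvariate(mu, sigma), max_age)
```

Raising `mu` (for example to `math.log(50)`) shifts the whole distribution towards an older population, which is the kind of skew described above.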

Another important property of a cluster is its population size. As we shall see later, the size and number of clusters in a system have a direct impact on the dynamics of the spread of the disease when it comes to lockdowns. The model allows the average cluster size to be specified; the actual size of each cluster follows a Gaussian distribution centered on that average, with a variance equal to one quarter of the average.
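The sizing rule can be sketched as follows. Since the text specifies the variance (average / 4) and `random.gauss` expects a standard deviation, the code takes a square root. Rounding to an integer and enforcing a minimum size of 1 are our assumptions; the function name is hypothetical.

```python
import math
import random


def sample_cluster_sizes(n_clusters: int, average: float) -> list[int]:
    """Cluster sizes drawn from Normal(average, variance = average / 4), at least 1."""
    std_dev = math.sqrt(average / 4)  # random.gauss takes a standard deviation, not a variance
    return [max(1, round(random.gauss(average, std_dev))) for _ in range(n_clusters)]
```

For an average size of 100, the variance is 25 and the standard deviation is 5, so almost all clusters fall between roughly 85 and 115 particles.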
4. Particle Interactions
Two particles in the same cluster are assumed to be in close contact at all times. An infectious particle generates R0 new infections on average over the duration of its infection. These new infections all happen within the cluster, by randomly selecting particles and infecting them. If a targeted particle happens to be immune, dead, or already infected, nothing changes. As more particles become infected, immune, or dead, fewer of these attempts succeed, so the effective reproduction number falls below R0 and the disease peters out.
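The random-target mechanism explains why spread slows down on its own. In the sketch below (a hypothetical helper, with states encoded as single letters), each infection attempt picks a particle uniformly at random and only succeeds if that particle is susceptible. The success rate is therefore roughly the susceptible fraction S/N, and each spreader produces about R0 · S/N new infections.

```python
import random


def spread_from(states: list[str], attempts: int) -> int:
    """Make `attempts` infection attempts on randomly chosen particles.

    Only susceptible particles ("S") become infected ("I"); any other
    state (immune, dead, already infected) is left unchanged.
    Returns the number of new infections actually produced.
    """
    new = 0
    for _ in range(attempts):
        target = random.randrange(len(states))
        if states[target] == "S":
            states[target] = "I"
            new += 1
    return new
```

With an illustrative R0 of 3 in a cluster where only 25% of particles are still susceptible, a spreader infects about 3 × 0.25 = 0.75 particles on average, below 1, so the outbreak dies out.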
The number of new infections resulting from close contact between particles within a cluster can be defined as follows:

$$\Delta I_A(n) = \beta\,\sigma\,I_A(n)\,S_A(n)$$

where:
- $\beta$ is the contact rate, a measure of how transmissible the virus is.
- $I_A(n)$ is the number of infections in cluster A at time $n$.
- $S_A(n)$ is the number of susceptible particles in cluster A at time $n$.
- $\sigma$ is the social distancing factor, a number between 0 and 1. We have chosen three values for $\sigma$ based on the severity of the social distancing applied: 0.1 for high, 0.4 for medium, and 1 when no social distancing measures are applied.
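A sketch of the within-cluster term, reading it as the product β · σ · I_A(n) · S_A(n) of the quantities defined above and using the three social distancing levels from the text. Capping the result at the number of susceptible particles is our own safeguard, and the names are hypothetical.

```python
SOCIAL_DISTANCING = {"high": 0.1, "medium": 0.4, "none": 1.0}


def internal_infections(beta: float, infected: int, susceptible: int,
                        level: str = "none") -> float:
    """Expected new infections inside one cluster: beta * sigma * I_A(n) * S_A(n)."""
    sigma = SOCIAL_DISTANCING[level]
    # A cluster cannot produce more new infections than it has susceptible particles.
    return min(beta * sigma * infected * susceptible, float(susceptible))
```

With β = 0.002, 10 infected and 500 susceptible particles, this gives 10 new infections without social distancing, 4 under medium distancing, and 1 under high distancing.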
5. Particle Properties
One particle property that is relevant to this exercise is age. The following rules have been applied, for two age thresholds $a_1 < a_2$:
- If the particle's age is below $a_1$, the particle is immune.
- If the particle's age is above $a_2$, the particle is susceptible.
- If the particle's age is between $a_1$ and $a_2$, there is a 50% chance that it is immune.
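These rules translate directly into a small function. The parameter names `a_low` and `a_high` stand for the lower and upper age thresholds and are hypothetical; the particle-state list later in this article suggests 30 for the upper threshold, while the lower threshold used below (10) is purely illustrative.

```python
import random


def initial_state(age: float, a_low: float, a_high: float, p_immune: float = 0.5) -> str:
    """Initial particle state from its age.

    Immune below a_low, susceptible above a_high, and immune with
    probability p_immune (50% in the article) in between.
    """
    if age < a_low:
        return "immune"
    if age > a_high:
        return "susceptible"
    return "immune" if random.random() < p_immune else "susceptible"
```

For example, `initial_state(20, a_low=10, a_high=30)` returns "immune" about half of the time.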
A crucial assumption was made about the duration of the epidemic: it is short enough that the age structure of the population does not change noticeably. We have also assumed that no new particles are created for the duration of the epidemic.
6. Dynamic States of Clusters
Clusters can be in one of three states: susceptible (healthy), infected, or immune. A susceptible cluster has no infections but a substantial number of susceptible particles. An immune cluster has no susceptible particles, and an infected cluster has at least one infected particle.

A cluster starts in the Susceptible state, which is especially true for a novel coronavirus. This does not mean that all the particles in the cluster are susceptible; rather, the ratio of susceptible to immune particles is high enough for an outbreak to occur if one or more particles become infected.
If some particles become infected and an outbreak follows (an initial exponential spread followed by a sharp decrease in the number of new cases), the cluster's state becomes Infected. From this state, two scenarios are possible:
- If the effective reproduction number falls below 1 through social distancing or lockdowns, the disease peters out, but the cluster is not yet immune, so it returns to the Susceptible state.
- If all the particles in the cluster are infected and either die or recover, the cluster becomes immune and remains in this state indefinitely.
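The cluster states can be derived from particle counts, as in the hypothetical sketch below. This is a simplification: the text describes a susceptible cluster as having a "substantial" number of susceptible particles, whereas here any non-zero count qualifies. Because no new particles are created and immune particles never get infected, a cluster that reaches the Immune state stays there.

```python
from enum import Enum


class ClusterState(Enum):
    SUSCEPTIBLE = "susceptible"
    INFECTED = "infected"
    IMMUNE = "immune"


def next_cluster_state(susceptible: int, infected: int) -> ClusterState:
    """Derive a cluster's state from its current particle counts."""
    if infected > 0:
        return ClusterState.INFECTED      # at least one infected particle
    if susceptible == 0:
        return ClusterState.IMMUNE        # nobody left to infect
    return ClusterState.SUSCEPTIBLE       # outbreak over, susceptibles remain
```

An Infected cluster whose outbreak dies out with susceptible particles left returns `SUSCEPTIBLE`, matching the first scenario above; one that runs out of susceptible particles returns `IMMUNE`, matching the second.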
7. Dynamic States of Particles
Particles are a bit more complex and can be in one of the following states:
- Susceptible: the particle is healthy but susceptible to the disease. A particle can start in this state if it is older than 30. From this state, it moves to Incubating once it contracts the virus.
- Incubating: the particle has been infected but has not yet shown any symptoms and is not contagious. It remains in this state for a number of days before symptoms start to show. The incubation period has been set to 5 days, in line with the general consensus.
- Infectious: the particle is now contagious. If it shows no symptoms (the asymptomatic case), it remains in this state until full recovery. If it does show symptoms, it may move into self-isolation, depending on the sanitary policies applied, or become immune once a sufficient period has passed. During this state, virus shedding is at its maximum at the beginning and declines steadily until it reaches zero on complete recovery.
- Self-isolating: the particle has moved into self-isolation but is still infectious in 10% of its interactions. This brings the model closer to real-world scenarios. The 10% figure is arbitrary; it could be any number between 0 and 100%.
- Immune: the particle has recovered and is now immune to the disease. Children and adults below a certain age can also start in this state. Immune particles are assumed not to contract the virus and are therefore never contagious. The recovery period is estimated at around 20 days; we have used a log-normal distribution centered around 20 days to introduce some randomness into the recovery period.
- Hospitalized: the particle may require hospitalization depending on its age. Once hospitalized, it can either recover or die. The death rates and the number of days to ICU admission were configured in the model based on data from the CDC website.
- Dead: the particle has died as a result of being infected with the disease. Death rates are directly impacted by age, and by the availability and quality of medical care.
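The timing rules above can be sketched as an infectiousness profile. The 5-day incubation period and the 10% leak during self-isolation come from the text. The linear decline of shedding is an assumption (the text only says it declines steadily), and so is the log-normal shape parameter of 0.2 for the recovery period, whose median is set to 20 days. All names are hypothetical.

```python
import math
import random

INCUBATION_DAYS = 5    # not contagious during incubation (from the article)
ISOLATION_LEAK = 0.10  # self-isolating particles stay infectious in 10% of interactions


def sample_recovery_days(median: float = 20.0, sigma: float = 0.2) -> float:
    """Recovery period (days since infection) from a log-normal with the given median."""
    return random.lognormvariate(math.log(median), sigma)


def infectiousness(days_since_infection: float, recovery_days: float,
                   isolating: bool = False) -> float:
    """Relative infectiousness on a given day, between 0 and 1.

    Zero while incubating and after recovery; in between it starts at its
    maximum and declines linearly (assumed shape) to zero at recovery.
    """
    if days_since_infection < INCUBATION_DAYS or days_since_infection >= recovery_days:
        return 0.0
    infectious_span = recovery_days - INCUBATION_DAYS
    weight = 1.0 - (days_since_infection - INCUBATION_DAYS) / infectious_span
    return weight * (ISOLATION_LEAK if isolating else 1.0)
```

For a 20-day recovery period, infectiousness is 0 for the first 5 days, peaks at 1 on day 5, halves by day 12.5, and reaches 0 on day 20; a self-isolating particle carries one tenth of that value.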
The diagram below shows the movement of a particle between its different states.

The following diagram shows the infectiousness of the virus during the different periods.

8. Time Evolution
At each turn or epoch (representing a full day), the following occurs:
- The number of new infections in each cluster is calculated from internal and inter-cluster cross-contamination.
- A random selection of particles from each cluster is infected. The state of immune, dead, or already infected particles is not altered.
- Particles that have completed their recovery period move to the recovered compartment.
- If self-isolation is enabled, infected particles move into self-isolation after a delay equal to 20% of their recovery period, which is itself a random number centered around 20 days.
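The daily epoch can be sketched as a single `step` function. This is a minimal illustration under several simplifying assumptions, not the author's implementation: incubation, hospitalization, deaths, and lockdowns are left out; self-isolating particles count with a 10% weight when computing infections; and the recovery period is drawn from a log-normal with a median of 20 days and an assumed shape parameter of 0.2. `Particle`, `stochastic_round`, and `step` are hypothetical names.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Particle:
    state: str = "S"  # "S" susceptible, "I" infectious, "Q" self-isolating, "R" recovered
    days_infected: int = 0
    recovery_days: float = 0.0


def stochastic_round(x: float) -> int:
    """Round x up with probability equal to its fractional part (unbiased)."""
    whole = math.floor(x)
    return whole + (random.random() < x - whole)


def step(clusters: dict[str, list[Particle]], contact: dict[tuple[str, str], float],
         beta: float, sigma: float, self_isolation: bool) -> dict[str, int]:
    """Advance every cluster by one day and return the number of particles per state."""
    # Snapshot before anyone changes state; self-isolating particles count at 10%.
    weight = {"I": 1.0, "Q": 0.1}
    infectious = {k: sum(weight.get(p.state, 0.0) for p in ps) for k, ps in clusters.items()}
    susceptible = {k: [p for p in ps if p.state == "S"] for k, ps in clusters.items()}

    # 1. New infections per cluster: internal term plus cross-contamination from other clusters.
    new_cases = {}
    for a in clusters:
        s_a = len(susceptible[a])
        expected = beta * sigma * infectious[a] * s_a
        for b, ps_b in clusters.items():
            if b != a and ps_b:
                expected += sigma * contact.get((a, b), 0.0) * s_a * infectious[b] / len(ps_b)
        new_cases[a] = min(s_a, stochastic_round(expected))

    # 2. Infect a random selection of susceptible particles; other states are never picked.
    for a, n in new_cases.items():
        for p in random.sample(susceptible[a], n):
            p.state, p.days_infected = "I", 0
            p.recovery_days = random.lognormvariate(math.log(20), 0.2)

    # 3-4. Recover particles whose recovery period is over; optionally isolate after 20% of it.
    for ps in clusters.values():
        for p in ps:
            if p.state in ("I", "Q"):
                p.days_infected += 1
                if p.days_infected >= p.recovery_days:
                    p.state = "R"
                elif self_isolation and p.state == "I" and p.days_infected >= 0.2 * p.recovery_days:
                    p.state = "Q"

    return {s: sum(p.state == s for ps in clusters.values() for p in ps) for s in "SIQR"}
```

Calling `step` once per simulated day and storing the returned counts yields daily series of susceptible, infectious, isolating, and recovered particles. Stochastic rounding is used instead of plain rounding so that small expected counts (say 0.4 new infections a day) still occasionally produce a case rather than always rounding down to zero.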
The system's evolution is monitored over time. The following data is captured: the total number of infections, the number of infectious particles at any given time, the number of particles in isolation, and the cumulative number of immune or dead particles. In the coming parts of this series, we will run different scenarios and analyze the different measures and their potential impact on these numbers.
Further Reading
The epidemiology literature has a number of models around the topic of infectious diseases. A really simple but quite interesting model to start with is the S-I-R model. You can also check the Wikipedia page on Compartmental Models.
Models can be made as complex as desired: in this article, an SIR model with time delay is elaborated.