Stress, Load, and Performance Testing in Software and IT Systems

1. Overview

Performance is a pivotal element in non-functional requirements of IT and software system design for the following reasons:

  1. Application performance directly impacts user experience and conversion rates. Research consistently shows that users navigate away from slow websites, and Google uses page speed as a ranking factor in search results. High-volume, high-speed software systems at points of service, such as train ticketing or supermarket POS machines, cannot afford to create long queues or turn customers away.
  2. Mission-critical IT systems require almost 100% uptime: crashes or poor responsiveness can severely impact the reputation, clients, and business of the organizations that run them. Payment switches and stock trading platforms are two common examples.

Stress and load testing are now standard parts of implementation, testing, and quality assurance procedures.

This article aims to give the reader a thorough understanding of the stress, load, spike and other performance tests in software applications and systems.

We start by explaining the architecture of an online transaction processing IT system. Next, we discuss the main factors that impact performance. Then, we cover the five performance test profiles before concluding with a guide on preparing, executing, and troubleshooting stress tests.

We hope this guide can help you achieve the best results when stress-testing your system.

3. Online IT systems

3.1 Definition

An online transaction processing (OLTP) system, which this article also refers to simply as an IT system, is a class of software applications typically deployed in multi-tier architectures and geared towards processing large numbers of transactions.

Examples of familiar OLTP systems are e-commerce websites, ATM and point-of-sale networks and switches, airline ticketing websites, Uber, and many others.

3.2 Transactions

A transaction is a set of exchanges between two (or more) parties where services are requested by the first party and provided by the others.

An example of a transaction is online shopping. In this scenario, an online user requests to purchase a consumable product or service. At least three parties in the background work together to fulfil this request.

First, the warehouse management system checks whether the item is in stock and, if so, creates a shipping order to the user’s address. The shipping company is the second party. The third is the billing software system, which ensures that funds from the user’s banking account are transferred to all service-providing parties involved in this transaction.

In 1983, Andreas Reuter and Theo Härder coined the term ACID, which stands for atomicity, consistency, isolation, and durability. These properties characterize the transaction concept initially applied to database systems.

  1. Atomicity. Because a transaction can consist of multiple actions performed by several parties, an IT system must guarantee the atomicity of transactions, i.e., either all actions succeed or none do. If an exception occurs, the actions already applied must be rolled back to ensure data consistency.
  2. Consistency. The IT system must ensure data consistency at all times. When transactions are applied, data changes from one valid state to another.
  3. Isolation. Two transactions applied to the system concurrently should not impact each other. The result must be the same as if the transactions were applied sequentially.
  4. Durability. Once a transaction is committed, it remains committed, even if the system subsequently fails or restarts.
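
Atomicity is the easiest of the four properties to show in a few lines. The sketch below is only an illustration, not how any particular OLTP product implements it: it uses Python's built-in sqlite3 module, and the accounts table, the account names, and the helper functions are hypothetical. A transfer credits the seller and then debits the buyer; if the debit fails, the credit is rolled back as well.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("buyer", 50), ("seller", 0)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Apply credit and debit as one transaction; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # If the debit violates the CHECK constraint, the credit above
            # is rolled back too, so no partial transfer is ever visible.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


conn = make_demo_db()
print(transfer(conn, "buyer", "seller", 80), balances(conn))
print(transfer(conn, "buyer", "seller", 30), balances(conn))
```

The first transfer exceeds the buyer's balance, so it is declined and both balances stay as they were; the second one fits and is committed as a whole.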

The outcome of a transaction can be approved or declined. A transaction can be declined in the following cases:

  1. The request is invalid.
  2. The data integrity is compromised.
  3. The user cannot be authenticated.
  4. A system malfunction has occurred.
  5. One of the parties involved in the transaction is unable to process transactions.

3.3 Multi-tier Architecture

A multi-tier (or n-tier) architecture is an application design pattern consisting of at least three layers: data, business logic, and presentation.

The chief advantage of n-tier architecture is its functional separation of duties. Development, maintenance, upscaling, or other modifications to one layer can be carried out in parallel by different teams without impacting the other layers.

[Figure: Three-Layer Application]

The figure above shows an application with the familiar three layers mentioned earlier.

The same system can be represented using its tiers (instead of layers) as follows:

[Figure: Four-Tier Software Application System]

3.4 Operation Patterns

OLTP systems display specific characteristics that differentiate them from other systems, such as Online Analytical Processing (OLAP) systems:

  1. Short Response Times. IT systems should typically respond quickly to retain a great user experience. In contrast, OLAP systems are geared more toward manipulating large datasets and performing CPU-intensive analysis.
  2. Close to 100% Uptime. Transaction processors are usually mission-critical and need to guarantee an uptime of close to 100%. Single points of failure are eliminated with redundancy and high availability in such applications.
  3. High concurrency. The system is expected to handle a very large number of concurrent users or sessions. Load-balancing is one way of achieving such demanding rates.
  4. Multi-User Access. The system should guarantee access, authentication, and validation for many concurrent users. The system must securely hold and manipulate user data for many users.
  5. High Transaction Volume. IT systems nowadays must be capable of writing, holding, and maintaining massive amounts of transaction data. Databases set up for OLTP use indexing, partitioning, and other advanced disk reading and writing techniques to achieve high performance.
  6. Frequent Data Modification. A distinctive aspect of IT systems is the high number of data modifications. Every transaction must add at least a few records to the persistence layer, in transaction or audit logs.

4. Performance of Online Systems

Various parameters can impact the performance of IT systems. In this section, we discuss some of the most influential ones.

4.1 Infrastructure

A few decades back, data centres consisted of racks, servers, and fibre optic cables to connect them. Applications ran on virtual machines that shared server resources amongst themselves.

Containerization, microservices, and cloud services have now changed the landscape dramatically. Despite running behind abstract layers of hosting, partitioning, and resource-sharing tools, infrastructure remains one of the first factors to consider when tuning performance.

4.2 Application Architecture, Design, and Technology Stack

The tradeoff in every programming language is usually between speed and productivity; high-level languages increase productivity by providing built-in libraries that perform labour-intensive tasks such as memory management on behalf of the developer.

The downside is that these built-in libraries are not optimal for performance and are easily outmatched by manual operations.

Our view, however, might be a bit controversial: because hardware is now cheap compared to development effort, the technology stack is rarely the deciding issue these days.

Prioritizing mature technology stacks that favour productivity and a gentle learning curve can give you an edge over the competition.

An innovative and modern design can compensate for a slow technology stack. In most cases, however, by the time you perform stress tests, the application has already been developed and tested, and there is little you can do in terms of technology stack selection. It is still essential to recognize the difference in performance between high-level and low-level stacks.

4.3 Application Parameters

Enterprise and industry-grade applications typically have a whole set of parameters for performance tuning. Unfortunately, these parameters do not usually get the same attention as other system parameters that impact functionality.

If you miss the stress test or do not take it seriously, your application parameters will not be optimized for the best performance.

5. Stress Testing Profiles

This section describes the five different performance test profiles that can be used.

A profile is defined by:

  1. Duration: how long the test runs.
  2. TPS: the number of transactions per second that hit the system.
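
As a minimal sketch, the two parameters above can be captured in a small data structure that a test harness passes around. The class name LoadProfile, its field names, and the figures used below are hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadProfile:
    """A performance test profile: how long to run and how hard to push."""

    name: str
    duration_s: int  # test duration in seconds
    tps: float       # target transactions per second

    def total_transactions(self) -> int:
        """Number of requests the generator must be able to produce."""
        return round(self.duration_s * self.tps)


# Illustrative figures only: a two-hour run at 500 TPS.
stress = LoadProfile(name="stress", duration_s=2 * 3600, tps=500)
print(stress.total_transactions())
```

Knowing the total number of transactions up front helps size the user data and the disk space needed for logs before the run starts.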

5.1 Stress Test

5.1.1 Objective

The Stress Test ensures the system can handle above-average transaction loads for short durations, typically 1-2 hours.

In real-world scenarios, such variations in load can occur in peak times such as holidays or end-of-month periods. In this scenario, the system is expected to handle the load without affecting response times or approval rates.

5.1.2 Setup Profile

This type of test can be set up as follows.

Parameter | Value
Duration | 1-2 hours
TPS | Variable, matches expected production load at peak times.
Users/Sessions | Variable, matches expected numbers observed at peak times.

Table: Stress Test Setup

5.1.3 Metrics and Success Criteria

The following metrics and success criteria are observed. The expected values, results, and any figures in the table below are just for illustration purposes and will vary from one system to another.

Metric/Success Criteria | Expected Value/Result
Average Response Time | Less than 3 seconds
95% of Response Times | Less than 5 seconds
Approval rates | Greater than 99.99%
System crashes, exceptions, failures | None

Table: Stress Test Metrics and Success Criteria
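
The criteria in the table above can be checked automatically once the response times and transaction outcomes of a run have been collected. The sketch below assumes the response times are available as a list of seconds; the function names are hypothetical, the default thresholds are the illustrative ones from the table, and the 95th percentile uses the nearest-rank method, which is only one of several common definitions.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate(response_times_s, approved, total,
             max_avg_s=3.0, max_p95_s=5.0, min_approval=0.9999):
    """Compare a run against the (illustrative) success criteria."""
    avg = mean(response_times_s)
    p95 = percentile(response_times_s, 95)
    approval = approved / total
    return {
        "avg_s": avg,
        "p95_s": p95,
        "approval_rate": approval,
        "passed": avg < max_avg_s and p95 < max_p95_s
                  and approval > min_approval,
    }


# Hypothetical run: 95 fast responses and 5 slow ones.
result = evaluate([1.0] * 95 + [6.0] * 5, approved=99_995, total=100_000)
print(result)
```

Reporting the 95th percentile next to the average matters because a handful of very slow responses barely moves the average but is exactly what users notice.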

5.2 Load Test

5.2.1 Objective

The Load Test ensures the system can handle average transaction loads for long durations, typically 2-3 days.

The system must run 24/7 for months and years in real-world scenarios without needing a restart. Typical exceptions you might observe when running an IT system for prolonged durations can be memory leaks, disk filling up, or backup failures.

5.2.2 Setup Profile

This type of test can be set up as follows.

Parameter | Value
Duration | 2-3 days
TPS | Variable, matches expected production load at nominal times.
Users/Sessions | Variable, matches numbers observed on production in nominal times.

Table: Load Test Setup

5.2.3 Metrics and Success Criteria

The following metrics and success criteria are observed.

The expected values, results, and any figures in the table below are just for illustration purposes and will vary from one system to another.

Metric/Success Criteria | Expected Value/Result
Average Response Time | Less than 1.5 seconds
95% of Response Times | Less than 3 seconds
Approval rates | Greater than 99.99%
System or process crashes, exceptions, failures | None
Uptime | 100%
Maintenance activities (purging tables and logs files, table partitioning, reporting, and monitoring) | Successful

Table: Load Test Metrics and Success Criteria

Notes:

  1. Under nominal load, you would expect response times to be shorter than at peak times, hence the difference between the numbers in this table and the previous one.

5.3 Spike Test

5.3.1 Objective

The Spike Test ensures the system can handle sudden surges in transaction loads occurring for short durations. Once the spike ends, the system resumes operations in nominal mode.

Higher-than-average spikes in transaction load occasionally occur in real-world scenarios. For example, a long-running report may cause a bottleneck, and the requests held up behind it are released in a burst once the report completes. IT systems should respond to these spikes in demand without any incident.

5.3.2 Setup Profile

This type of test can be set up as follows.

Parameter | Value
Duration | 1 hour
TPS | Variable, matches expected production load at normal times.
Users/Sessions | Variable, matches numbers observed at normal times.
Spike Count | 5 distributed randomly over the test period
Spike Height | 2x the peak TPS

Table: Spike Test Setup
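
A spike profile like the one in the table can be turned into a per-second target rate for the transaction generator. The sketch below is one possible way to do it, under a few assumptions that are not in the table: each spike lasts 60 seconds, and the test period is split into equal slots with one spike placed at a random offset inside each slot, so that spikes never overlap. The function name and the TPS figures are hypothetical.

```python
import random


def spike_schedule(duration_s, nominal_tps, peak_tps, spike_count=5,
                   spike_len_s=60, spike_factor=2.0, seed=None):
    """Return a list with the target TPS for every second of the test."""
    rng = random.Random(seed)
    schedule = [nominal_tps] * duration_s
    slot = duration_s // spike_count  # one spike per slot, no overlaps
    if spike_len_s > slot:
        raise ValueError("spikes do not fit into the test duration")
    for i in range(spike_count):
        # Random start inside the slot, leaving room for the full spike.
        start = i * slot + rng.randrange(slot - spike_len_s + 1)
        for second in range(start, start + spike_len_s):
            schedule[second] = peak_tps * spike_factor
    return schedule


# Illustrative figures: 1 hour, 100 TPS nominal, 300 TPS production peak.
schedule = spike_schedule(3600, nominal_tps=100, peak_tps=300, seed=1)
print(max(schedule), sum(1 for tps in schedule if tps > 100))
```

Passing a fixed seed makes the spike positions reproducible, which helps when comparing two runs of the same test.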

5.3.3 Metrics and Success Criteria

The following metrics and success criteria are observed. The expected values, results, and any figures in the table below are just for illustration purposes and will vary from one system to another.

Metric/Success Criteria | Expected Value/Result
Average Response Time | Less than 1.5 seconds
95% of Response Times | Less than 3 seconds
Approval rates | Greater than 99.99%
System or process crashes, exceptions, failures | None

Table: Spike Test Metrics and Success Criteria

5.4 Breaking Point Test

5.4.1 Objective

The Breaking Point Test is used to determine the system’s breaking point. Although not a very common test, it can be helpful to identify the point beyond which the system stops responding.

Another advantage of running such a test would be identifying the first one or two components that will fail and the consequences of those failures in terms of user experience, for example. This can help communicate valuable messages to end-users.

The Breaking Point Test is more of an academic exercise as you typically run your production systems well below the breaking point.

5.4.2 Setup Profile

This type of test can be set up as follows.

Parameter | Value
Duration | Variable, until the system stops responding
TPS | Variable, ramps up until the system stops responding
Users/Sessions | Variable, ramps up with TPS

Table: Breaking Point Test Setup

5.4.3 Metrics and Success Criteria

There are no metrics associated with this type of performance test. A few observations, however, may apply.

  1. The definition of a breaking point may vary. For example, we can define the breaking point as that beyond which 50% of transactions do not receive a response.
  2. Ideally, the breaking point should be as far above the expected peak load as possible.
  3. The observations collected from running the test multiple times should be consistent.
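
The ramp-up and the 50% definition from the first observation can be sketched as a simple loop. Everything here is hypothetical: measure stands for whatever your tooling uses to run a short burst at a given TPS and report the fraction of requests that received a response, and simulated_system is a toy stand-in with an assumed capacity of 900 TPS.

```python
def find_breaking_point(measure, start_tps, step_tps, max_tps, threshold=0.5):
    """Ramp the load up and return the first TPS level at which the
    fraction of unanswered transactions reaches `threshold`.

    `measure(tps)` must return the fraction of requests that received a
    response at that load. Returns None if the threshold is never reached.
    """
    tps = start_tps
    while tps <= max_tps:
        unanswered = 1.0 - measure(tps)
        if unanswered >= threshold:
            return tps
        tps += step_tps
    return None


def simulated_system(tps, capacity_tps=900):
    """Toy stand-in for a real system: answers at most `capacity_tps`."""
    return min(1.0, capacity_tps / tps)


print(find_breaking_point(simulated_system, 250, 250, 5000))
```

Repeating the ramp several times and comparing the returned values is a quick way to check the consistency mentioned in the third observation.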

5.5 Failover Test

5.5.1 Objective

Failover Tests are performed to ensure a cluster can handle the additional load if a node in that cluster suddenly goes down.

[Figure: Four-tier Application with Webserver Cluster]

Clusters are, by definition, composed of several nodes, and a load balancer distributes the incoming requests evenly among the different nodes.

Nodes in a cluster can be brought down either due to failures or maintenance purposes, and additional nodes could be added if extra processing power is required.

The cluster is expected to be dynamic and flexible, and the purpose of the Failover Test is to ensure that that is the case.

5.5.2 Setup Profile

This type of test can be set up as follows.

Parameter | Value
Duration | Usually completed in less than an hour, depending on scenarios to be tested.
TPS | Nominal TPS rates
Users/Sessions | Nominal user sessions
Setup | System setup in cluster mode
Node Management | Nodes can be brought down or up on demand

Table: Failover Test Setup

5.5.3 Metrics and Success Criteria

The following metrics and success criteria are observed. The expected values, results, and any figures in the table below are just for illustration purposes and will vary from one system to another.

Metric/Success Criteria | Expected Value/Result
System crashes, exceptions, failures | None (aside from the controlled ones)
Response Times | Follow nominal patterns.
Data Inconsistency | Rare. Happens only for in-flight transactions. The system offers manual correction utilities for invalid values.
Approval rates | Close to 100%. Losing in-flight transactions when a node is shut down is expected.
Load Handling | Depending on the specific requirements, the cluster should comfortably bear the nominal load even when one of its nodes is down.
Load Balancing | Load balancers must detect failed nodes quickly and automatically reroute transaction requests to active nodes in the cluster.

Table: Failover Test Metrics and Success Criteria
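
The rerouting behaviour expected from the load balancer can be illustrated with a toy round-robin balancer. This is a simplified sketch, not the algorithm of any real load balancer: the class, its method names, and the node names are hypothetical, and health detection is reduced to explicit mark_down and mark_up calls.

```python
from collections import Counter


class RoundRobinBalancer:
    """Distribute requests evenly over the nodes that are currently up."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._healthy = set(self._nodes)
        self._next = 0

    def mark_down(self, node):
        self._healthy.discard(node)

    def mark_up(self, node):
        if node in self._nodes:
            self._healthy.add(node)

    def route(self):
        """Return the next healthy node, skipping nodes that are down."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if node in self._healthy:
                return node
        raise RuntimeError("no healthy nodes available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
balancer.mark_down("web2")  # simulate a node failure
print(Counter(balancer.route() for _ in range(100)))
```

With one of three nodes down, each remaining node carries half of the load instead of a third, which is the extra capacity the Load Handling criterion asks the cluster to have.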

6. Guide to Running Stress Tests

This section will provide a detailed guide on preparing, executing, and troubleshooting a stress test. We will also give some guidelines on preparing a stress test report.

6.1 Prerequisites

Before conducting performance tests, the following prerequisites should be ready.

6.1.1 Environment and Infrastructure

The test environment, the system infrastructure, and how it’s set up should be identical to production. The architecture, OS platforms, and deployment methods (cloud, local) must be similar if you want maximum reliability from the data you get. Even the network layout and connections between the different tiers in the system should be the same.

Due to the setup’s complexity, the results you get in one environment might not necessarily hold in a different one. The ideal setup would be as close as possible to production.

6.1.2 Application Version

The application version you use in a performance test should be close to the one used on production. This requirement means that the ideal timing of performance tests should be close to the end of the UAT. You also want the application settings and parameters to be similar to production.

6.1.3 User Data

The user data should be prepared in amounts and compositions similar to production so that caching and record-locking effects do not distort the results.

Most layers in an IT system use caching to enhance performance. If your test environment contains small amounts of data, queries and query results will be retrieved from the cache instead of files on disk. Small amounts of user data will exaggerate the performance impact of caching, and your results will look rosier than they would be in production.

Small user datasets also distort results under high concurrency. Tools that generate transaction requests select user data randomly from the available datasets. If the datasets are small, the chance that the same user is picked for two concurrent transactions is high. Using the same data simultaneously leads to record locking, which brings performance down.
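
The effect of dataset size on collisions can be estimated with the same arithmetic as the classic birthday problem. The sketch below assumes that every concurrent transaction picks its user uniformly at random and independently of the others; the function name and the figures are only illustrative.

```python
def collision_probability(users, concurrent):
    """Probability that at least two of `concurrent` transactions pick the
    same user when each picks uniformly at random from `users` records."""
    if concurrent > users:
        return 1.0
    p_all_distinct = 1.0
    for i in range(concurrent):
        p_all_distinct *= (users - i) / users
    return 1.0 - p_all_distinct


# Illustrative comparison with 100 concurrent transactions.
for users in (1_000, 100_000, 10_000_000):
    print(users, round(collision_probability(users, 100), 4))
```

With 100 concurrent transactions, a dataset of a thousand users makes a collision almost certain at any given moment, while millions of users make it rare, which is why the test data volume should resemble production.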

6.1.4 Production Statistics

Statistical data is collected before performance tests to generate representative mixes of production transaction types and numbers. Any expected future growth should be factored in.

Complex business logic applies to services of different types, with each transaction type taking separate paths in the system and consuming distinct amounts of resources.

To obtain meaningful results from performance tests, the transaction mix and the TPS rates should constitute a representative sample of what you usually observe in production.

To get this right, gather sample data from a typical day in your business life and make a statistical snapshot of what happened. Count the different transaction types and replicate the significant contributors. You don’t necessarily need to represent every service your system offers, just the ones with the most considerable volume.
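
As a sketch of the statistical snapshot described above, the following code counts the transaction types in a sample, keeps the largest contributors, and draws test transactions according to the resulting mix. The function names, the transaction type names, and the 95% coverage cut-off are assumptions made for the illustration.

```python
import random
from collections import Counter


def transaction_mix(sample_types, coverage=0.95):
    """Return weights for the most frequent transaction types that together
    cover at least `coverage` of the sampled volume."""
    counts = Counter(sample_types)
    total = sum(counts.values())
    kept, covered = {}, 0
    for tx_type, count in counts.most_common():
        if covered / total >= coverage:
            break  # the remaining types are minor contributors
        kept[tx_type] = count
        covered += count
    # Renormalise so that the kept weights sum to 1.
    return {tx_type: count / covered for tx_type, count in kept.items()}


def generate_types(mix, n, seed=None):
    """Draw `n` transaction types at random according to the mix."""
    rng = random.Random(seed)
    return rng.choices(list(mix), weights=list(mix.values()), k=n)


# Hypothetical sample taken from one typical business day.
sample = (["purchase"] * 70 + ["refund"] * 20
          + ["balance"] * 8 + ["pin_change"] * 2)
mix = transaction_mix(sample)
print(mix)
print(generate_types(mix, 5, seed=0))
```

In this sample the rarely used pin_change type is left out of the mix, in line with the advice to replicate only the significant contributors.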

6.1.5 Transaction Generation

Use transaction generation tools that can create accurate representations of production transactions. This requirement applies to data fields, values, transaction type mix, cryptographic data, and message transfer protocol (HTTP or HTTPS, for example).

If you use dummy data, you will inevitably have to turn off validation on the IT system, making the performance test less representative of the production behaviour you want it to reflect.

6.2 Preparation

Before you kick off a performance test, here are a few things to consider.

  1. Start from a fresh environment for every run. Using containers and DevOps tools makes this task easy to accomplish.
  2. Make sure nobody else is using the system.
  3. Upgrade the OS, web server and application, and patch the database if necessary.
  4. Save the new image to your repository.

6.3 Execution

When executing the performance test, ensure you can immediately determine if something seriously wrong happens. Sophisticated stress testing tools or monitoring applications (like Zabbix or Nagios) should alert you when certain events get triggered.

6.4 Troubleshooting

By far, the most commonly encountered issues in performance tests are slow response times and the system's inability to handle additional load. Crashes, failures, and exceptions may also happen, but in our experience they are less frequent.

In this subsection, we will focus on some of the most influential factors that can impact the performance of a system under load and which inevitably produce slowness in responses.

6.4.1 Computing Power

Today, the IBM Power series server can be fitted with 240 CPU cores. System admins can share these CPUs among tens or hundreds of virtual machines (VMs), and it is possible to have a VM running on a fraction (like 0.1, for example) of a CPU. If your performance tests experience issues, check the number of CPUs allocated to your VM.

Because CPU power can be shared (you may be allocated 4 CPUs, but they will be shared with other VMs should these need them), check the system parameters to verify how much CPU power you are entitled to. Ensure they match the vendor recommendations for the load you are running.

Tips:

  1. Use the top or htop commands for a real-time view of CPU and RAM usage and processing info.
  2. Use the lscpu command to list your VM’s CPU details.
  3. The first thing to check when running a stress test is the CPU usage while the application is down. If nothing else is consuming your resources, which is what you would expect, you should see only a small figure, in the order of a few percent.

6.4.2 Memory

RAM should rarely be an issue. At any rate, you should always check that the available RAM matches the vendor recommendations for the amount of load you are running.

Tips:

  1. Memory caching makes it tricky to tell how much memory is actually in use. The top command may show that memory is nearly full while the application runs fine: the OS caches data in RAM but frees it up when required.
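
On Linux, the difference between free and actually available memory can be read from /proc/meminfo, where MemFree excludes the cache and MemAvailable estimates how much memory applications could still obtain. The sketch below parses text in that format; the helper names and the sample figures are hypothetical, and the sample string stands in for the real file so that the example runs anywhere.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Express free and available memory as percentages of the total."""
    total = info["MemTotal"]
    return {
        "free_pct": 100 * info["MemFree"] / total,
        "available_pct": 100 * info["MemAvailable"] / total,
    }


# Made-up sample; on a Linux host, read the text from /proc/meminfo instead.
SAMPLE = """MemTotal:       16000000 kB
MemFree:          400000 kB
MemAvailable:    9600000 kB
Cached:          8500000 kB
"""
print(memory_summary(parse_meminfo(SAMPLE)))
```

In the sample, very little memory is free, yet most of it is still available because the cache can be reclaimed, which is the situation the tip above describes.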

6.4.3 Disk I/O Speed

Disks come in various specs, and I/O speed will directly impact the performance of your application. For example, a good system admin will install and run the database filesystem on fast disks.

Sometimes faster disks are not the solution: the application may be writing unnecessary information (such as extensive log files) to the disk. This problem can often be solved by turning application logging off or reducing log verbosity to an acceptable level.

Tips:

  1. Use disk I/O tools such as iostat or iotop to inspect I/O usage details. If I/O is high, consider reducing unnecessary logging activities or upgrading to faster disks.

6.4.4 Application Parameters

Almost all enterprise applications have various parameters that can be tuned for better performance. Most of these parameters are autotuned by the application without any manual intervention.

What these parameters do depends heavily on how the application is designed. For example, let’s assume that the application uses a client/server model. The clients place all requests on a queue managed by a messaging system (like IBM MQ), and servers pick up those messages from the queue, process them, and place the response back in the queue.

One available option here is to increase the number of concurrent server processes.

Another option is to drop any further message beyond a certain maximum. The process is called throttling, and it prevents the system from choking.
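
Throttling as described above can be sketched with a bounded in-memory queue. This is a simplified illustration built on Python's standard queue module, not the API of IBM MQ or of any other messaging product; the class and method names are hypothetical.

```python
import queue


class ThrottledQueue:
    """Request queue that drops new messages once `max_depth` is reached."""

    def __init__(self, max_depth):
        self._queue = queue.Queue(maxsize=max_depth)
        self.dropped = 0  # number of messages rejected so far

    def offer(self, message):
        """Try to enqueue a message; return False if it was dropped."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def take(self):
        """Remove and return the oldest message (called by a server)."""
        return self._queue.get_nowait()


requests = ThrottledQueue(max_depth=3)
accepted = [requests.offer(n) for n in range(5)]
print(accepted, requests.dropped)
```

Dropping the excess early keeps the queue depth, and therefore the waiting time of accepted requests, bounded. The number of dropped messages is worth tracking during a stress test, since each one shows up as a declined or unanswered transaction.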

Tips:

  1. Thoroughly understand how your application works vis-a-vis performance. The application design will significantly impact what can be done to improve performance.
  2. Familiarize yourself with the application parameters related to performance. You will probably find those in the user manuals.

6.4.5 Web Servers and Database Systems

Enterprise platforms such as databases and web servers autotune most of their performance parameters, and only in rare cases would you need to adjust them manually.

What you can do, however, is ensure:

  1. Table indexes are active and healthy
  2. Queries running on large datasets are perfectly tuned for performance
  3. Large tables are partitioned
  4. Old data is backed up and purged
  5. Heavy reports are run after hours or on a different database server.

Tips:

  1. Familiarize yourself with the database or server parameters directly related to performance.
  2. Monitoring resource consumption (RAM, CPU, Disk and Network I/O) is always a good place to start if you suspect any issues on those platforms.

6.4.6 Network Latency

Network latency is not much of an issue today with ultrafast network devices on the market. However, problems can still happen if one application tier is on a different or slow network.

Tips:

  1. Ping is a very simple but effective tool for inspecting network latency. Between tiers on the same LAN, a round-trip time larger than a few milliseconds usually indicates high latency.
  2. When security rules allow, ensure all your application tiers are running on the same LAN without any routers or firewalls in the middle.

7. Stress Test Report Template

Create a report to share your stress test results with management or other stakeholders. Use our guidelines and best practices for writing technical documentation.

Below is a template to get you started with the stress test report.

  • Overview or Introduction:
    • It can be easily read and understood by anyone.
    • Allows the reader to determine whether this document is helpful/accessible for them.
  • Executive Summary
    • Provides some context and background on why this performance test is being conducted.
    • Write a summary of the performance test results.
  • Requirement Details:
    • List the metrics that will be observed and their target values
    • Provide some background on why these values were chosen (for example, current production load and future growth)
  • Assumptions:
    • In case you have made any decisions that might need justification.
  • Stress Test Setup:
    • Describe the lab environment used for running the test. This includes the hardware, application, and configuration data. Most applications will allow you to export their configuration data. Attach the export to the report for future reference.
    • Explain how the user data was generated and the tables populated in case you have used any data migration tools.
    • Provide information on the setup of any tools used to generate the transaction requests. This includes the transaction types and mix, the user data, and the performance test profile.
  • Performance Test Execution Summary:
    • Describe the performance test execution and whether any problems were encountered and resolved.
  • Stress Test Results:
    • Include all data, graphs, and charts to explain the results obtained.
    • Add any raw files as attachments or tell the reader where to find them.
    • Provide ample analysis of the results.
  • Recommendations:
    • Any system changes that need to be applied are listed in this section.
    • Link the changes to the observations and analysis from the previous section.
  • Risks and Mitigation:
    • Document any potential risks arising from the recommended changes, how to monitor the changes, and how to mitigate any risks.
  • Appendices:
    • Additional information that is too detailed for the main sections, or only loosely related to the topics discussed, can be presented here for completeness and clarity.

8. Final Words

Performance tests are part of the Testing and QA activities. They require good technical knowledge of IT concepts, applications, and tools.

Some final advice on performance tests.

  1. Keep your performance test environment ready.
  2. Use containerization and automation to your advantage.
  3. Include stress testing in your DevOps pipelines if feasible.
  4. Use stress testing to gauge any performance changes to your system after significant changes or upgrades.
  5. Do not substitute performance tests for other types of testing, such as regression or functional tests; they are designed for different purposes and do not provide the same coverage.
