Software Application Design — Part 2: Creating a Great User Experience

1. Introduction

In today’s complex environments, many layers come between the end user and the developer, especially in medium and large software organisations. It has become enormously challenging for developers to distinguish between genuine business needs articulated by customers and what the application managers, product managers, and system architects want.

To make things even worse, the end user’s requirements, vital for their business, are sometimes eclipsed by divisional, departmental, or personal agendas, feuds, idiosyncrasies, and political ambitions.

All these deviations from the ultimate goals, which are providing top-value services at competitive prices, increasing the organisation’s value proposition, and achieving operational excellence, manifest themselves in the product’s design, cost, maintenance, and profitability.

This article will discuss the fundamentals of software application design that allow for a great user experience to emerge. It aims to help developers and software engineers understand what’s essential to the end user and how systems can be designed to suit end-user needs.

Let’s start with a few definitions, but before that, let’s look at what the series covers.

Application Design Series

Part 1

Application design, architecture, classic design models, MVC

Part 2: Creating a Great User Experience (this article)

User-friendliness, accessibility, security, aesthetics, performance

Part 3: A Developer’s Perspective

Modularity, loose coupling, proven design patterns

Part 4: Architecture

Preserving architectural integrity

Part 5: Application Quality Assurance

Extension, Maintenance, Performance

Part 6: A Product Owner’s Guide to Application Design

Extension, Maintenance, Performance

Part 7: A Day in a Product Manager’s Life

Roadmaps, profitability, user satisfaction

Part 8: The View from Operations

Stability, performance, troubleshooting, maintenance

2. Defining User Experience

2.1 What Is Meant by End-Users

In software applications, the term “end user” refers to the ultimate consumer or beneficiary of the software product. An end user is an individual or entity that interacts directly with the software to perform specific tasks, achieve objectives, or obtain desired results.

The end user is typically the person or entity for whom the software is designed to provide value and functionality. The ultimate objective of acquiring such software is solving a business need.

End users can encompass various individuals with varying technical expertise and specific needs. They may include but are not limited to, customers, employees, or any other stakeholders who work with the software to get their jobs done.

Understanding end users’ preferences, expectations, and usability requirements is crucial in designing and developing software applications to ensure user satisfaction and the product’s overall success.

2.2 What Is a Software Service

In software applications, a “service” refers to a modular and self-contained unit of functionality that performs a specific task or set of these. Services in software applications are often designed to operate independently, providing a well-defined and encapsulated functionality that can be utilized by other parts of the application or external applications through defined interfaces or APIs (Application Programming Interfaces).

The use of services allows for a modular and scalable approach to software design, where different components or modules of an application can interact with each other by invoking the services they provide.

These services can encompass various functionalities like:

  • Data processing and manipulation
  • User profile management, such as maintaining user details, authentication data, and access rights
  • Communication with external systems, themselves providing services to other services
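
To make the idea concrete, here is a minimal sketch of such a service in Python. The names (UserProfileService, get_profile, and the in-memory data) are hypothetical; the point is that a self-contained unit of functionality is exposed through a defined interface that other parts of the application call.

    from typing import Protocol

    class UserProfileService(Protocol):
        """Hypothetical service contract: callers depend only on this interface."""
        def get_profile(self, user_id: str) -> dict: ...
        def has_access(self, user_id: str, resource: str) -> bool: ...

    class InMemoryUserProfileService:
        """Self-contained implementation; it could later be swapped for one
        backed by an external system without changing the calling code."""
        def __init__(self) -> None:
            self._profiles = {"u1": {"name": "Alice", "roles": ["admin"]}}

        def get_profile(self, user_id: str) -> dict:
            return self._profiles.get(user_id, {})

        def has_access(self, user_id: str, resource: str) -> bool:
            return "admin" in self.get_profile(user_id).get("roles", [])

    def render_dashboard(service: UserProfileService, user_id: str) -> str:
        # Other modules invoke the service only through its defined interface.
        profile = service.get_profile(user_id)
        return f"Welcome, {profile.get('name', 'guest')}"

    print(render_dashboard(InMemoryUserProfileService(), "u1"))

Because the caller depends only on the interface, the in-memory implementation can be replaced by one that talks to an external system through an API without touching the rest of the application.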

2.3 What Is User Experience

Improving user experience requires changes in the front-end and back-end of an application, covering various areas such as design, aesthetics, usability, availability, reliability, and performance.

In software applications, “user experience” (often abbreviated as UX) refers to a person’s overall experience while interacting with the software. It relates to every aspect of the user’s interaction, including the interface design, usability, accessibility, responsiveness, and overall satisfaction when using the system.

Critical components of user experience in software applications include:

  • Usability (functionality and user-friendliness)
  • Performance (response times and throughput)
  • Visual design and aesthetics
  • Reliability and stability

A positive user experience is crucial for the success of software applications, as it can influence user satisfaction, retention, and the overall perception of the software product.

Designing while focusing on user experience involves understanding user needs, conducting usability testing, and iteratively refining the interface and interactions to meet those needs effectively.

In the following paragraphs, we will explore influential user experience factors in more detail.

3. Response Time and Throughput

3.1 System Response Time

Response time is how long a user waits for a request to complete. This metric is tightly coupled with performance and is crucial for creating a great user experience, especially in web applications.

For example, Google ranking algorithms include page speed as a ranking factor in deciding which pages to show in their results. Their rationale is based on empirical evidence showing that a slow-loading page will drive most users away. While enterprise application users may not be able to abandon a slow application, it might impact their ability to do their jobs efficiently.

3.2 System Throughput

Another performance metric closely related to response time is throughput.

In software applications, throughput refers to the rate at which a system or a specific component can process and deliver a certain amount of work over a given period. It measures the software’s efficiency and capacity to handle and complete tasks, often expressed in transactions, processing cycles, or other relevant units.

Throughput is a key performance metric that assesses the ability of a software application to handle a particular volume of workloads within a specific time frame.

Throughput is impacted by:
  • Algorithm efficiency
  • Hardware performance
  • The overall system architecture

A great user experience demands decent response times, even during peak load times. But what about varying loads? Is it acceptable for response times to degrade on holidays, weekends, or when the number of new users increases significantly? Absolutely not. This opens up a new topic: the scalability of software systems. Scalability requires software systems to handle progressively more load without dramatically increasing cost.

Ideally, we want our software systems to have low response times, low costs, and constant throughput despite occasional but significantly varying loads.
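
As a rough illustration of how these two metrics relate, the Python sketch below measures the response time of a simulated request handler and derives the throughput of a batch of requests. handle_request is a stand-in for real request processing, not code from any particular system.

    import statistics
    import time

    def handle_request() -> None:
        sum(i * i for i in range(10_000))  # stand-in for real request processing

    # Measure per-request response time and the overall throughput of a batch.
    latencies = []
    start = time.perf_counter()
    for _ in range(500):
        t0 = time.perf_counter()
        handle_request()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    print(f"median response time: {statistics.median(latencies) * 1000:.2f} ms")
    print(f"throughput: {len(latencies) / elapsed:.0f} requests/second")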

3.3 Designing Systems for Low Response Times

System response times can be improved through the following:

  • Faster algorithms
  • Better caching
  • Specialized hardware
  • Database optimisations
  • Profiling and optimisation

3.3.1 Faster Algorithms

The speed of algorithms directly influences the response times of software systems. Here are vital factors to consider when examining the impact of algorithm speed on response times:

  • Computational Complexity: Algorithms with lower time complexity execute faster than those with higher time complexity. The Big O notation is commonly used to express the upper bound on the growth rate of an algorithm’s execution time versus its input size. You typically want to have polynomial (rather than exponential) relationships between input size and number of operations. Integer factorisation and matrix multiplication algorithms are prime examples of operations whose performance significantly varies depending on the algorithm used.
  • Data Structures: The choice of data structures can significantly affect algorithm performance. Using appropriate data structures, such as hash tables or balanced trees, can lead to faster data retrieval and manipulation, thereby improving response times.
  • Parallelism and Concurrency: Leveraging parallel processing or concurrency can enhance the overall speed of software systems. Algorithms designed to take advantage of multiple processors or threads can perform computations concurrently, leading to quicker responses. Parallelism, however, introduces significant complexity in the handling and orchestrating of multi-threaded tasks, making troubleshooting a lot more complicated.
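
A small, hedged illustration of the first two points: the Python sketch below compares membership checks on a list (a linear, O(n) scan) against a set (a hash table with roughly O(1) lookups). The data sizes and probe values are arbitrary.

    import time

    # Membership checks: O(n) on a list versus roughly O(1) on a set (hash table).
    items_list = list(range(1_000_000))
    items_set = set(items_list)
    probes = [999_999] * 1_000  # worst case for the list: the element sits at the end

    start = time.perf_counter()
    hits = sum(1 for p in probes if p in items_list)  # linear scan for every probe
    list_time = time.perf_counter() - start

    start = time.perf_counter()
    hits = sum(1 for p in probes if p in items_set)   # hash lookup for every probe
    set_time = time.perf_counter() - start

    print(f"list lookups: {list_time:.3f} s, set lookups: {set_time:.6f} s")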

3.3.2 Better Caching

Caching is a technique used in computing to store and quickly retrieve frequently accessed or computationally expensive data to improve overall system performance. Caching helps reduce the time it takes to access data by storing a copy in a location that can be read quickly (e.g., RAM) rather than fetching it from the slower source (e.g., the filesystem or a remote server).

Caching can be applied in the following cases:

  • Web Browsers: Web browsers store previously downloaded web pages, images, and other resources using browser caches. When a user revisits a webpage, the browser can retrieve these resources from the local cache, reducing the need to download them again from the internet.
  • Databases: Database systems often store frequently accessed query results or records in a cache. This reduces the need to execute complex queries or fetch records from disk whenever a request is made.
  • Operating Systems: Operating systems use caching to store frequently accessed files and application data in memory (RAM). This helps improve the speed of file access and application execution.
  • Content Delivery Networks (CDNs): CDNs use caching to distribute content across multiple servers in different geographical regions. Cached content can be served from a server closer to the user, reducing latency and improving content delivery speed.
  • Application-Level Caching: Some applications implement caching at the application level to store intermediate results or frequently used data. This can include caching the results of computationally expensive operations to avoid redundant calculations.

The basic principle behind caching involves keeping a copy of data in a location that allows quicker access than fetching the data from its sources, such as slower storage or a remote server. Here’s a simplified overview of how caching works:

  • Request for Data: The process begins when a system or application requests specific data. This could be a web page, a database query result, or any other piece of information.
  • Check if Data is in Cache: Before accessing the source (e.g., disk, database, server), the system checks if the required data is already stored in the cache. If the data is present, this is known as a “cache hit.”
  • Cache Hit: If a cache hit occurs, the system retrieves the data directly from the cache.
  • Cache Miss: If there is a cache miss (i.e., the required data is not in the cache), the system fetches the data from the source and stores a copy in the cache for future use. This helps in speeding up subsequent requests for the same data.
  • Eviction (if necessary): Caches have limited space, so when the cache is full and new data needs to be stored, a decision is made on which existing data to remove. This process is governed by a replacement policy, such as Least Recently Used (LRU) or First-In-First-Out (FIFO).
  • Use of Cache Keys: To efficiently manage cached data, each piece of data in the cache is associated with a unique identifier or key. The key is used to look up and retrieve the data quickly.
  • Cache Coherency: In systems with multiple caches, cache coherency mechanisms ensure that all caches have consistent and up-to-date copies of shared data. Protocols like MESI (Modified, Exclusive, Shared, Invalid) help maintain coherence.
  • Write Policies: Caching systems often have detailed policies determining how writes are handled. For example, write-back caches update the cache first and then write to the main memory, while write-through caches update both simultaneously.
  • Expiration and Invalidation: Some caching systems implement expiration policies, where cached data is considered stale after a certain period and needs to be refreshed. Invalidation mechanisms ensure that cached data is invalidated when the corresponding data in the source is updated.
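
The toy cache below sketches this flow (hit, miss, and eviction) with a Least Recently Used replacement policy. fetch_from_database is a hypothetical stand-in for the slower source.

    from collections import OrderedDict

    class LRUCache:
        """Toy cache illustrating the flow above: hit, miss, and LRU eviction."""
        def __init__(self, capacity: int = 2):
            self.capacity = capacity
            self._data = OrderedDict()  # cache keys map to cached values

        def get(self, key, load_from_source):
            if key in self._data:                # cache hit
                self._data.move_to_end(key)      # mark as most recently used
                return self._data[key]
            value = load_from_source(key)        # cache miss: fetch from the slow source
            self._data[key] = value
            if len(self._data) > self.capacity:  # eviction: drop the least recently used
                self._data.popitem(last=False)
            return value

    def fetch_from_database(key):
        print(f"(slow) loading {key} from the source")
        return f"value-of-{key}"

    cache = LRUCache(capacity=2)
    cache.get("a", fetch_from_database)  # miss: loaded from the source
    cache.get("a", fetch_from_database)  # hit: served from the cache
    cache.get("b", fetch_from_database)  # miss
    cache.get("c", fetch_from_database)  # miss: evicts "a", the least recently used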

3.3.3 Specialized Hardware

Specialized hardware is designed to perform specific operations more efficiently than general-purpose hardware. Load balancers, Hardware Security Modules (HSMs), and Graphics Processing Units (GPUs) are examples of specialised hardware. While none of these can run general-purpose programs the way personal computers can, they deliver much higher performance at load balancing, cryptographic functions, or graphical operations.

  • Hardware Security Modules (HSMs) are specialized devices that manage digital keys, perform cryptographic operations, and provide a secure key generation and storage environment. HSMs offload cryptographic operations from the central processor, reducing the burden on the software. They accelerate cryptographic functions, ensuring faster execution compared to software-based implementations.
  • Graphics Processing Units (GPUs) are specialized processors designed to handle parallel processing tasks, primarily associated with rendering graphics in applications like video games and simulations. GPUs excel at parallel processing, making them ideal for computationally intensive tasks. Offloading parallelizable tasks to GPUs can lead to substantial performance improvements in software systems, particularly those involving complex simulations or data processing.
Specialized hardware provides improved performance, but at a cost:

Pros

  • Improved performance in specialized tasks
  • Improved functionality (e.g. security)

Cons

  • Potential vendor lock-in and substantially higher costs than commodity hardware
  • Specific integration efforts required in software design
  • Software must be optimized to take full advantage of the hardware’s capabilities

In some cases, as in payment switches, HSMs are mandated by regulatory rules (PCI). In other cases, where specialized hardware is not mandatory, carefully considering integration challenges and optimization opportunities is crucial to maximize the returns on their investment.

3.3.4 Database Optimisation

Database operations are pretty resource-intensive, and the impacts of poorly written SQL statements or huge non-indexed tables are usually quick to manifest themselves. When performance slumps on production or during a load test, your best bet would be to look for long-running queries on your database.

Database optimization is, therefore, a critical aspect of system design, aiming to enhance database operations’ performance, efficiency, and reliability. Here are vital considerations for optimizing databases in system design:

  • Indexing involves creating data structures to improve the speed of data retrieval operations on a database. Well-designed indexes can significantly enhance query performance. However, over-indexing can lead to increased storage requirements and slower write operations.
  • Normalization and Denormalization: Normalization means organising data to eliminate redundancy and dependency, while denormalization re-introduces redundancy to improve query performance. The choice between normalization and denormalization depends on the use case and query patterns. For example, if you are doing big data analytics, then a denormalized table in memory is one way to go. On the other hand, if you are doing transaction-level processing, then normalization works best.
  • Query Optimization is an exercise that must become second nature to software developers, as a poorly written query can severely impact performance. There are various ways to optimize queries, such as ensuring queries hit indexes for efficient data retrieval, avoiding SELECT * and retrieving only necessary columns, thus reducing query processing time. Developers also must use appropriate join strategies and consider denormalization for performance gains.
  • Statement Caching: Use prepared statements or stored procedures where applicable; the database can cache and reuse their execution plans, improving performance.
  • Partitioning is dividing large tables into smaller, more manageable pieces. Its benefits lie in improving query performance by reducing the amount of data to be scanned. However, partitioning strategies should align with query patterns and access patterns.
  • Database Sharding means distributing a database across multiple instances or servers. Database sharding enables horizontal scaling, distributing the load across multiple database nodes. However, sharding introduces complexity in managing distributed data and ensuring data consistency.
  • Optimizing Storage and File Structures can be done through data compression and by choosing the right storage engine. Compressing data can reduce storage requirements and improve I/O performance at increased processing costs, while different storage engines have varying performance characteristics. Choose the one that aligns best with your workload.
  • Regular Maintenance includes index rebuilding and updating the database table statistics. Periodically rebuild indexes to optimize their performance. Keep database statistics up-to-date to help the query optimizer make informed decisions.

Effective database optimization in system design involves thoughtful schema design, indexing, query optimization, and advanced techniques like caching and partitioning. Regular monitoring and maintenance are crucial to adapting to changing workloads and ensuring sustained performance improvements.
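
As a small, hedged example of two of these points (indexing and avoiding SELECT *), the sketch below uses Python’s built-in sqlite3 module; the orders table and its contents are invented for illustration.

    import sqlite3

    # Two optimisations from the list above: index a frequently filtered column
    # and select only the columns you need instead of SELECT *.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        [(i % 1000, i * 0.1) for i in range(100_000)],
    )

    # Without an index, this filter scans the whole table.
    print(conn.execute(
        "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = 42"
    ).fetchall())

    # With an index, the query planner can seek directly to the matching rows.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    print(conn.execute(
        "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = 42"
    ).fetchall())

    # Retrieve only the needed column rather than SELECT *.
    rows = conn.execute("SELECT total FROM orders WHERE customer_id = 42").fetchall()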

3.3.5 Profiling and Optimisation

Profiling and optimization are crucial aspects of system design, aiming to enhance performance, efficiency, and reliability. How does that work?

In an iterative process, software engineers start by running performance tests (or using profiling tools) to identify bottlenecks, resource usage, and areas for improvement. Optimization steps are then taken to remove the bottlenecks and scale where applicable.

Running performance tests before major database, server, or software updates is very common and should be relatively quick with the tools and infrastructure at our disposal today. Performance tests allow immediate identification of performance regression issues and are critical to ensuring a smooth customer experience. Otherwise, undetected regressions may cause outages and force software updates to be rolled back.
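
For completeness, here is a minimal profiling sketch using Python’s built-in cProfile and pstats modules; slow_sum is an artificial hotspot, not code from any real system.

    import cProfile
    import pstats

    def slow_sum(n: int) -> int:
        # Deliberately inefficient: builds throwaway lists to create a hotspot.
        total = 0
        for i in range(n):
            total += sum(list(range(i % 100)))
        return total

    # Profile the function and print the most expensive calls first.
    profiler = cProfile.Profile()
    profiler.enable()
    slow_sum(50_000)
    profiler.disable()
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)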

3.4 Horizontal and Vertical Scaling

Sooner or later, you must upgrade your systems when the number of users passes a certain threshold and response times become unbearably high. As a first step, you can increase the machine resources your application is running on. This is called vertical scaling.

Vertical scaling means more CPUs, larger memories, and faster and more extensive storage. At some point, though, the machines will be at their technological limits, and vertical scaling will not be an option anymore. This is where horizontal scaling comes in.

Horizontal scaling involves adding more machines (or nodes) to a system to distribute the load across multiple instances. You can have multiple load balancers, web, application, and database servers, each handling a portion of the load.

With a distributed design, we get:

  • Improved redundancy
  • Cost-effective design
  • Easier maintenance

all at the price of a more complex architecture.

The nature of the application and its requirements influence the choice between vertical and horizontal scaling. For example, stateless applications are often more amenable to horizontal scaling.

4. Up-Time and Availability

4.1 Service Availability

Mission-critical systems typically require nearly 100% up-time. Think of hospitals, supermarkets, or train stations where people try to pay with their bank cards but cannot because the system is down.

It is typical for payment, mobile, and energy networks to serve a significant portion of the market, perhaps up to 20-30% in some cases. Even if the payment network is down for less than an hour, massive damage will be caused simply due to the number of people and businesses that depend on its service to complete their transactions.

Customer experience with software systems is therefore highly dependent on their availability. With today’s technology, it is possible to keep a system running 99.999% of the time, a figure typical of many Service Level Agreements (SLAs) today.

4.2 Designing Software Applications for Availability

I remember the buzz when one of the global payment switch providers announced it would support an Active/Active configuration for its payment switches, with two active nodes, two load balancers, and all the relief this would bring to banks and payment processors running their payment switch. The added complexity was an acceptable price to pay, as it slashed service downtime and provided a better customer experience.

Designing a software application for minimal downtime or high availability involves implementing strategies and architectures that minimize the impact of failures and ensure continuous service availability. Here are fundamental principles and practices to achieve high availability:

4.2.1 Redundancy and Fault Tolerance

Redundancy and fault tolerance can be accomplished through a distributed architecture. Design the application with components spread across multiple servers or locations to reduce the impact of a single point of failure. Also, introduce redundancy for critical components, such as load balancers, databases, and application servers, so that backups can take over in case of failure.

4.2.2 Load Balancing

Use load balancers to distribute incoming traffic across multiple servers. This improves performance and ensures that the remaining servers can handle the load if one server fails or needs maintenance.

Load balancers operate at different OSI model layers, with Layer 4 (Transport layer) and Layer 7 (Application layer) being the most common.

  • Layer 4 Load Balancing

Layer 4 Load Balancing operates at the transport layer, focusing on routing decisions based on network and transport layer information. Layer 4 Load Balancing utilizes information from the IP header and transport layer protocols (TCP, UDP) and commonly performs load balancing based on IP addresses and port numbers. This type of load balancing is suitable for distributing traffic evenly across servers but lacks awareness of the application layer context. As we shall see in a second, routing based on context allows for smarter, more sophisticated load-balancing techniques.

The pros of this approach are faster processing (as it doesn’t inspect the application layer) and effectiveness for simple, stateless protocols. Its main disadvantage is a limited understanding of application-specific requirements, which may make it unsuitable for scenarios requiring more granular control over traffic.

  • Layer 7 Load Balancing

Layer 7 Load Balancing operates at the application layer, making routing decisions based on data from the application payload. This works by analyzing the content of the message, making it more application-aware. Layer 7 Load Balancing can distribute traffic based on specific characteristics such as URLs, cookies, or HTTP headers, providing better support for applications with complex routing needs.

  • For example, imagine an online shopper who has already filled their cart with some products but decides to refresh their browser for some reason. In this case, it makes for a much smoother experience if their request is rerouted to the server where their cart is maintained and their session is still active instead of asking them to log in again and redo their shopping list.
  • Another example is routing based on server load. As not all requests require the same amount of server work, some servers will be busier than others, and a better load-balancing technique, in this case, would consist of routing new requests to the least busy servers instead of round-robin or random assignments.

Although more logic must be implemented on the load balancer, making for a more complex setup, the overall solution provided by Layer 7 load balancers will be more efficient.
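
The Python sketch below illustrates the two routing ideas from the examples above. The backend names, connection counts, and cookie layout are all hypothetical; a real Layer 7 load balancer would parse actual HTTP traffic.

    # Hypothetical backend pool with a simple per-server active-connection count.
    backends = {"app-1": 0, "app-2": 0, "app-3": 0}
    sticky_sessions = {}  # session id -> backend holding that session's state

    def route(request: dict) -> str:
        """Pick a backend using application-layer (Layer 7) information."""
        session_id = request.get("cookies", {}).get("session_id")
        # Sticky routing: keep an existing session on the server that holds its cart.
        if session_id and session_id in sticky_sessions:
            return sticky_sessions[session_id]
        # Otherwise choose the least-busy backend instead of round-robin.
        chosen = min(backends, key=backends.get)
        if session_id:
            sticky_sessions[session_id] = chosen
        return chosen

    request = {"path": "/cart", "cookies": {"session_id": "abc123"}}
    backend = route(request)
    backends[backend] += 1  # track the new connection on the chosen server
    print(f"routing {request['path']} to {backend}")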

4.2.3 Database Replication

Database replication, or database clustering, maintains multiple synchronized copies of the database on different machines. This setup enables failover to a standby database in case of a primary database failure. All major database engines provide replication off the shelf.

4.2.4 Automated Monitoring and Alerting

With many servers, applications, and platforms working in concert to provide the services users want, continuous monitoring is pivotal to the early detection of issues. Monitoring and alerting will be discussed thoroughly in the article on operations.

4.2.5 Rolling Deployments and Blue-Green Deployments

When running large IT systems, it is beneficial to implement rolling deployments, where updates are gradually applied to different system parts while maintaining overall service availability. Avoid deploying multiple changes simultaneously, as doing so reduces your chances of quickly identifying which change caused an issue when things go wrong.

Also, maintaining a duplicate environment (green) to which traffic can be seamlessly switched from the production environment (blue) minimizes downtime during updates. This is one significant advantage of distributed systems, where one server can be taken down for maintenance while the others handle the load.

4.2.6 Graceful Degradation

When the availability of your services is at the mercy of third-party providers (telecom companies, payment gateways) or legacy systems prone to downtimes, you need a backup plan, which can come in two flavours:

  • Feature Flagging: Use feature flags to enable or disable specific application features. This allows for graceful degradation by turning off non-critical features during high-impact incidents. For example, high logging levels can consume plenty of resources, and you might want to turn logging off during peak load times.
  • Fallback Mechanisms: Implement fallback mechanisms so that, if a particular service or feature fails, the system can gracefully degrade to a simpler or cached version to maintain partial functionality. For example, an online shopping application paired with a third-party delivery service provider can have a simple but effective fallback mechanism. A simple pricing model can be applied if the delivery service booking system is unavailable.
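
Both flavours can be sketched in a few lines of Python. The flag store, quote_delivery, and the flat-rate fallback pricing below are hypothetical; the point is simply that a flag can switch a non-critical feature off and a fallback can preserve partial functionality when a third party fails.

    # Hypothetical flag store; in practice flags might come from config or a flag service.
    feature_flags = {"verbose_logging": False, "delivery_quotes": True}

    def log(message: str) -> None:
        if feature_flags["verbose_logging"]:  # non-critical feature, switched off under load
            print(message)

    def delivery_quote_from_provider(order_total: float) -> float:
        raise TimeoutError("delivery provider unavailable")  # simulate a third-party outage

    def simple_fallback_quote(order_total: float) -> float:
        return 4.99 if order_total < 50 else 0.0  # simple flat-rate pricing model

    def quote_delivery(order_total: float) -> float:
        if not feature_flags["delivery_quotes"]:
            return simple_fallback_quote(order_total)  # feature disabled via its flag
        try:
            return delivery_quote_from_provider(order_total)
        except TimeoutError:
            log("provider timed out, using fallback pricing")
            return simple_fallback_quote(order_total)  # graceful degradation on failure

    print(quote_delivery(30.0))  # prints 4.99, the fallback price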

4.2.7 Stateless Architecture

Design services to be stateless, where each request from a client contains all the information needed for processing. Stateless services make distributing requests across multiple servers and recovering from failures easier.

There are use cases, however, when stateless architecture is not very user-friendly. To use the online shopping application as an example, a user with a non-empty cart would be pretty frustrated if a connection problem means they must restart their shopping for that evening with an empty cart.

4.2.8 Cloud Services and Multi-Region Deployment

As a software business owner developing and operating your platforms, you probably are better off outsourcing infrastructure problems to specialized service providers who can do a better job than your teams at a lower cost. With the time and money you save, you can focus on developing your product and business.

This outsourcing is made easier by cloud platforms that are increasingly viable, affordable, and distributed across the world’s regions.

  • Cloud Platforms: Leverage cloud services that offer high availability features, such as load balancing, auto-scaling, and distributed data storage.
  • Multi-Region Deployment: Deploy the application in multiple geographic regions to provide redundancy and improve resilience against regional outages.

4.2.9 Disaster Recovery Planning

Insurance does not seem vital until a disaster strikes. Disaster recovery planning in software businesses is crucial for ensuring business continuity and minimizing the impact of unforeseen events.

Here are some fundamental principles to guide effective disaster recovery planning:

  • Risk Assessment: Conduct a comprehensive risk assessment to identify potential threats and vulnerabilities. Understand the specific risks your software business may face, such as hardware failures, data breaches, natural disasters, or cyberattacks.
  • Business Impact Analysis (BIA): Perform a BIA to determine the potential impact of disruptions on critical business functions. Prioritize systems and processes based on their importance to the business, enabling a focused recovery strategy for essential functions.
  • Recovery Time Objective (RTO) and Recovery Point Objective (RPO): Define RTO (the maximum acceptable downtime) and RPO (the maximum acceptable data loss) for each critical system. Establish the recovery goals, allowing for a more precise disaster recovery plan tailored to business needs.
  • Data Backup and Redundancy: Implement regular data backups and consider redundancy in critical systems. Ensure that data is backed up consistently and can be quickly restored. Redundancy provides an additional layer of protection against system failures.
  • Documentation and Communication: Maintain comprehensive documentation of the disaster recovery plan, including roles and responsibilities. Clear documentation aids in effective execution during a crisis. Establish communication protocols to ensure coordination among team members.
  • Testing and Training: Regularly test the disaster recovery plan to identify and address any weaknesses. Conduct drills to familiarize the team with procedures and update the plan based on lessons learned from testing.
  • Offsite Storage: Store backups and critical resources in secure offsite locations. Offsite storage mitigates the risk of losing data due to on-site disasters and facilitates faster recovery.
  • Security Measures: Integrate robust security measures to protect against cyber threats. Cybersecurity is integral to disaster recovery planning. Implement firewalls, encryption, and access controls to safeguard systems and data.
  • Supplier and Vendor Assessment: Evaluate critical suppliers’ and vendors’ disaster recovery capabilities. Ensure that third-party dependencies have adequate resilience measures, minimizing disruptions caused by their potential failures.
  • Regulatory Compliance: Adhere to relevant regulatory requirements in disaster recovery planning. Compliance with industry regulations and standards is essential for avoiding legal consequences and maintaining the trust of stakeholders.

Disaster recovery planning in software businesses demands a systematic and disciplined approach, incorporating risk assessment, comprehensive documentation, and regular testing to enhance preparedness for unforeseen events.

4.2.10 Failure Testing (Chaos Engineering)

Failure testing, often called Chaos Engineering, is a systematic approach to testing a system’s resilience by intentionally introducing controlled failures. Chaos Engineering involves purposefully injecting faults or disturbances into a system to observe its behaviour under adverse conditions.

The primary goal is to uncover weaknesses, potential points of failure, and areas for improvement in a controlled environment.
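
A minimal sketch of the idea, assuming a Python code base: the hypothetical chaos decorator below randomly injects failures and latency into a stand-in downstream call so that the calling code’s resilience (here, a simple retry loop) can be observed.

    import random
    import time

    def chaos(failure_rate: float = 0.2, max_delay: float = 0.5):
        """Hypothetical fault injector: randomly fails or delays the wrapped call."""
        def decorator(func):
            def wrapper(*args, **kwargs):
                if random.random() < failure_rate:
                    raise ConnectionError("injected failure (chaos experiment)")
                time.sleep(random.uniform(0, max_delay))  # injected latency
                return func(*args, **kwargs)
            return wrapper
        return decorator

    @chaos(failure_rate=0.3)
    def fetch_price(product_id: str) -> float:
        return 9.99  # stand-in for a call to a real downstream service

    # Observe how the calling code copes with injected faults, e.g. by retrying.
    for attempt in range(3):
        try:
            print(fetch_price("sku-1"))
            break
        except ConnectionError as exc:
            print(f"attempt {attempt + 1} failed: {exc}")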

Failure testing (Chaos Engineering) is helpful for testing system reliability, as the following shows:

  • Identifying Weaknesses: Failure testing helps identify weaknesses and vulnerabilities in a system. By intentionally causing failures, teams can uncover hidden or latent issues that might not be apparent under normal operating conditions. This proactive approach enables preemptive strengthening of the system.
  • Enhancing Resilience: Chaos Engineering aims to improve system resilience. By exposing a system to controlled failures, engineers can learn how to make the system more robust. This may involve implementing better error handling, redundancy, and failover mechanisms to ensure the system can gracefully recover from disruptions.
  • Real-world Simulation: Failure testing simulates real-world scenarios in a controlled environment. Introducing controlled failures mimics the unpredictability of actual production environments. This provides a more accurate assessment of a system’s performance and reliability under various adverse conditions.
  • Early Issue Detection: Chaos Engineering helps in the early detection of potential issues. By proactively introducing failures during development or testing phases, teams can identify and address issues before they reach production. This reduces the likelihood of unexpected outages or disruptions in a live environment.
  • Optimizing Resource Allocation: Failure testing aids in optimizing resource allocation. Understanding how a system responds to failures allows organizations to allocate resources more efficiently. This may involve scaling resources dynamically or optimizing infrastructure to handle increased loads during failure scenarios.

4.2.11 Stress Testing

Stress testing is a testing technique that evaluates how a system or component performs under extreme or abnormal conditions.

Stress testing involves subjecting the system to conditions beyond its standard operational capacity, such as high user loads, peak traffic, or resource limitations, to assess its stability and identify potential points of failure.
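
A rough sketch of the idea in Python: the snippet below pushes a stand-in request handler with increasing levels of concurrency and reports how per-request latency degrades. A real stress test would target the actual system with dedicated load-testing tooling.

    import concurrent.futures
    import statistics
    import time

    def handle_request() -> float:
        """Stand-in for a real request handler; returns its own latency in seconds."""
        start = time.perf_counter()
        sum(i * i for i in range(20_000))  # simulated work
        return time.perf_counter() - start

    # Push the "system" with increasing concurrency and watch latencies degrade.
    for workers in (1, 8, 32):
        with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
            latencies = list(pool.map(lambda _: handle_request(), range(200)))
        print(f"{workers:>2} workers: "
              f"median={statistics.median(latencies) * 1000:.1f} ms, "
              f"max={max(latencies) * 1000:.1f} ms")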

Why is stress testing helpful for testing system reliability?

  • Capacity Assessment: Stress testing helps assess the system’s capacity limits. By pushing the system to its limits, stress testing reveals the maximum workload it can handle before performance degrades or failure occurs. This information is crucial for capacity planning and ensuring the system can scale to meet user demands.
  • Identification of Weaknesses: Stress testing identifies weaknesses and bottlenecks in a system. Under stress conditions, latent issues that may not be apparent during regular operation become evident. This allows developers to address performance bottlenecks, optimize code, and enhance the system’s efficiency.
  • Performance Degradation Analysis: Stress testing helps analyze how performance degrades under extreme conditions. Understanding the rate at which performance deteriorates provides insights into the system’s behaviour during high-stress situations. This information is valuable for optimizing and fine-tuning the system to maintain acceptable performance levels.
  • Validation of Scalability: Stress testing validates the scalability of a system. Scalability is crucial for systems to handle growing user loads. Stress testing helps determine whether the system can scale horizontally or vertically to accommodate increased demand, ensuring a positive user experience even during peak usage.
  • Risk Mitigation: Stress testing aids in risk mitigation. By proactively identifying and addressing potential points of failure, stress testing reduces the risk of unexpected performance issues in production. This is particularly important for systems where reliability is paramount, such as financial transactions or healthcare applications.
  • Enhanced Reliability and Stability: Stress testing improves system reliability and stability. By systematically exposing the system to stress conditions, developers can implement improvements that make the system more resilient. This leads to a more stable and reliable system, reducing the likelihood of crashes or downtime during periods of increased demand.
  • User Experience Assurance: Stress testing assures a positive user experience under challenging conditions. Ensuring the system maintains acceptable performance levels even when stressed helps guarantee a seamless user experience. This is critical for applications where user satisfaction is closely tied to system responsiveness.
  • Compliance with Service Level Agreements (SLAs): Stress testing helps ensure SLA compliance. Many systems are required to meet specific performance standards outlined in SLAs. Stress testing verifies that the system can consistently meet these standards, avoiding contractual breaches and maintaining customer trust.

5. Software Platform Reliability and Stability

5.1 What Is Platform Stability?

A stable software platform preserves a high standard of service quality during peak load times. Platform stability can be seen in adequate response times and minimal downtimes due to faults or maintenance.

The number of times you breach the Service Level Agreements (SLAs) per unit of time tells you how stable your platform is.

If the end user’s first wish is to have access to their services without interruption (platform availability), their next wish is that this service is stable and of good quality. If your software offers mission-critical services, users want predictability, primarily when their businesses depend on it. When safety is an issue, they also want top reliability, which we discuss next.

5.2 What Is Platform Reliability?

A reliable software platform breaks less often than an unreliable one. What is precisely meant by “broken” in this context depends on the software, the business, the SLAs, and the client. A reliable car will get you from A to B without breakdowns. Reliable software gets you through peak load times without outages. High reliability leads to high availability and stability.

Reliability can be measured by the number of breakdowns per unit of time, cycles, operations, or other units. A “breakdown” can be hard to define in complex solutions involving many platforms, where some have a more immediate and visible impact than others. Here again, the SLA terms might come into play.

5.3 Improving Platform Reliability and Stability

You can improve platform reliability through redundancy, while stability can be enhanced with robust designs. For example, the reliability of a storage device can be vastly improved through a “redundant array of independent disks” or RAID. Its stability can be enhanced through smart disk operating systems and early warning mechanisms.

Improving reliability through redundancy can be expensive, making the solution costlier and more complex; you must carefully weigh the costs and benefits. At some point, the law of diminishing returns starts to apply quite heavily.

6. Visual Design and Aesthetics

6.1 Marketing Made Easy

Aesthetics greatly influence a prospective buyer’s decision to purchase the software. A visually appealing design gives buyers the impression that the software creators took their time putting their product together, trying to create something beautiful. It also provides the impression of modernity and progress.

While aesthetics alone may not help much when software providers respond to a demanding Request for Proposal (RFP), unappealing software products might not even make it to that stage, all else being equal. And while the latter (all else being equal) is rarely true, customers, when given a choice, will almost always prefer a good-looking design over one focusing solely on functionality.

6.2 Aesthetics and Engagement

Software applications that are simple, responsive, consistent in style, and aesthetically pleasing encourage end users to keep using them, explore their functionality, and look for ways to improve them with enhancements or additional functionality.

The opposite is true for applications that are difficult to navigate, inconsistent, slow, or poorly designed. Users will find ways to avoid using them by, for example, creating bespoke front-ends and connecting them to the backend via APIs if these are supported.

6.3 Aesthetics Behind Nielsen’s Usability Heuristics

Nielsen’s Usability Heuristics primarily focus on general principles for usability, and while they touch upon aspects of aesthetics indirectly, they are not explicitly designed to address aesthetic considerations. However, with some creativity, one can provide insights into how some of Nielsen’s Usability Heuristics may relate to design aesthetics:

  • Visibility of System Status: Visual feedback on system status should be designed to be informative and visually pleasing. Using subtle animations or colour changes can contribute to a positive aesthetic experience.
  • Consistency and Standards: Providing aesthetically pleasing and clear navigation controls enhances the user’s sense of control and freedom within the application.
  • Recognition Rather than Recall: A visually intuitive design reduces the cognitive load on users, making it easier for them to recognize and remember elements within the application.
  • Flexibility and Efficiency of Use: Minimalist designs can contribute to the perceived efficiency of use. Well-designed interfaces with a pleasing aesthetic may be perceived as more efficient.

7. User-Friendliness

“User-friendly design” in software applications refers to the intentional and thoughtful creation of interfaces and interactions that prioritize the needs, expectations, and ease of use for the end-users. A user-friendly design aims to provide a positive and efficient experience, allowing users to interact with the software in a way that is intuitive, enjoyable, and effective. Here are key aspects and characteristics associated with user-friendly design:

Factors contributing to what makes a web application user-friendly.

7.1 Intuitiveness

User-friendly interfaces are intuitive, meaning users can easily understand how to navigate and interact with the software without extensive training or guidance.

7.2 Efficiency

Design considerations are made to streamline workflows and tasks, minimizing the number of steps and interactions required to accomplish everyday actions. This efficiency is crucial for user satisfaction and productivity.

7.3 Consistency

Consistent design elements, such as navigation patterns, terminology, and visual styles, contribute to a cohesive and predictable user experience. Users should be able to transfer their knowledge from one part of the application to another.

7.4 Feedback and Communication

User-friendly designs provide clear and timely feedback to users about their actions. This includes visual cues, messages, and responses that help users understand the system’s status and the outcome of their interactions.

7.5 Error Prevention and Recovery

Design considerations include mechanisms to prevent errors and guide users away from mistakes. In the event of errors, user-friendly designs offer clear error messages and pathways for recovery.

7.6 Flexibility and Customization

User-friendly software often allows users to customize their experience to some extent. This may include preferences for layout, themes, or other personalized settings that enhance the user’s comfort and engagement.

7.7 Clarity and Readability

Text, graphics, and other visual elements are designed for clarity and readability. Well-organized information, legible fonts, and appropriate contrast contribute to a user-friendly reading and navigation experience.

7.8 Accessibility

User-friendly designs consider accessibility principles to ensure that the software is usable by individuals with diverse abilities. This includes considerations for users with disabilities, such as providing alternative text for images or supporting screen readers.

7.9 Simplicity and Minimalism

User-friendly interfaces often embrace simplicity and minimalism, avoiding unnecessary complexity. This makes the interface cleaner and reduces the cognitive load on users.

7.10 User Feedback and Iteration

User-friendly design is an iterative process that gathers user feedback and incorporates improvements based on their experiences. Continuous refinement ensures the software stays in tune with user needs and expectations.

8. User Behavior Analytics

User behaviour analytics (UBA) consists of collecting user behaviour data, uncovering insights, and optimising the design.

User behaviour analytics (UBA) improves a web application’s user experience (UX). By collecting and analysing data on how users interact with your website or app, UBA provides valuable insights into their preferences, pain points, and overall behaviour. This information can then be used to make data-driven decisions that directly improve the UX and user satisfaction.

Here are some specific ways UBA helps improve web app UX:

8.1 Identifying Pain Points and Friction

  • Heatmaps: Visualize where users click, scroll, and hover, revealing areas of confusion or difficulty.
  • Session recordings: Watch user sessions to identify actions causing frustration or abandonment.
  • Clickstream analysis: Analyze users’ paths through your app, highlighting areas where they get lost or confused.
  • Error tracking: Identify and address technical issues that hinder user flow and completion of tasks.

8.2 Personalizing the User Experience

  • User segmentation: Divide users into groups based on their behaviour and preferences, allowing for targeted content and features.
  • A/B testing: Test different versions of your app’s design and functionality to see what resonates best with users.
  • Dynamic content: Personalize content and recommendations based on individual user behaviour and preferences.

8.3 Optimizing Conversion Rates

  • Funnel analysis: Identify drop-off points in your conversion funnels, allowing you to pinpoint where users are abandoning the desired path (a minimal sketch follows this list).
  • Form abandonment analysis: Understand why users are not completing forms on your app and address any usability issues.
  • Search analysis: Analyze how users search within your app to improve search functionality and ensure users find what they need.
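
Here is the minimal funnel-analysis sketch mentioned above, in Python. The clickstream events and funnel steps are invented for illustration; real UBA tools would read these from an analytics store.

    # Hypothetical clickstream events: (user_id, funnel_step) pairs.
    events = [
        ("u1", "view_product"), ("u1", "add_to_cart"), ("u1", "checkout"), ("u1", "payment"),
        ("u2", "view_product"), ("u2", "add_to_cart"),
        ("u3", "view_product"), ("u3", "add_to_cart"), ("u3", "checkout"),
        ("u4", "view_product"),
    ]
    funnel = ["view_product", "add_to_cart", "checkout", "payment"]

    # Count the distinct users reaching each step, then report drop-off between steps.
    users_per_step = {step: {user for user, s in events if s == step} for step in funnel}
    for previous, current in zip(funnel, funnel[1:]):
        reached, converted = len(users_per_step[previous]), len(users_per_step[current])
        drop = 100 * (1 - converted / reached) if reached else 0.0
        print(f"{previous} -> {current}: {converted}/{reached} users ({drop:.0f}% drop-off)")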

8.4 Enhancing Engagement

  • Session length analysis: Understand how long users engage with your app and identify ways to increase engagement time.
  • Feature usage analysis: Identify which features are most popular and which are underutilized, enabling you to prioritize development efforts.
  • Push notification optimization: Send timely and relevant push notifications that motivate users to engage with your app.

8.5 Building Better Products

  • Identifying trends and patterns: Gain insights into overall user behaviour trends to inform future product development decisions.
  • Quantifying the impact of changes: Use UBA data to measure the effectiveness of UX improvements and track progress over time.
  • Prioritizing development efforts: Allocate resources to address the most pressing user needs and pain points based on UBA data.

UBA empowers businesses to make data-driven decisions that directly impact the user experience. Understanding how users interact with your web app and addressing their needs can create a more engaging, enjoyable, and successful user experience.

Here are some additional benefits of using UBA for web app UX:

  • Reduced bounce rate: By identifying and addressing user pain points, you can encourage users to stay on your app longer.
  • Increased conversion rates: By optimizing your app’s design and functionality based on user behaviour, you can lead more users to desired actions.
  • Improved user satisfaction: By creating a more user-friendly experience, you can build stronger relationships with your users and boost their overall satisfaction.
  • Competitive advantage: By staying ahead of the curve and constantly improving your UX, you can gain an edge over your competitors.

In conclusion, UBA is a powerful tool that can significantly impact your web application’s success. By leveraging its insights, you can create a user experience that is functional, efficient, enjoyable, and engaging for your target audience.

9. Final Words

This article summarised the main factors that impact user experience and how they influence the design of a software system. The aim is to ensure these factors always have a high priority and visibility at every level of the value generation stream.

