Process Engineering: Essential Concepts from Lean, Agile, and Toyota for Effective Software Development Processes

1. Process Engineering: A Brief Overview

Dr Jeffrey Liker dedicates seven chapters of his seminal book The Toyota Way to explaining Toyota’s lean processes. While Toyota is primarily about automotive and manufacturing, and this website is about software development, the parallels that can be drawn between processes in the two domains are strikingly numerous. On top of that, osmosis between the two industries has gone beyond drawing parallels and sharing abstract ideas, as can be readily seen in practices like flow, kaizen, and kanban that have crossed over body and soul.

In a nutshell, software development processes consist of inputs, outputs, and a transformation that operates on them.

Although this article is about software processes, these cannot be wholly appreciated without looking back at some basic questions like what constitutes a process, how processes are formed, and how they can be optimized. More specifically:

  • What constitutes a process
  • How are processes formed
  • What is waste
  • How is waste interpreted in software development
  • How do Waterfall, Scrum, and Kanban compare on the process level
  • How does the selection of a processing model influence team structure
  • How does team structure influence software design

The answers to these questions will come from The Toyota Way – 14 Management Principles From the World’s Greatest Manufacturer, Melvin Conway’s beautiful paper How Committees Invent, and others which you can find in the reference section. The question of process optimisation will be treated in a separate article.

The reader might wonder why we are troubling ourselves with concepts and methods so remote from software development, the central theme of our discussions. The answer is as follows: I believe ideas of great originality, like kaizen and kanban, have been transformed, twisted, and turned until they are barely recognizable compared to their original forms.

A correct diagnosis and a subsequent strategy and action plan for process improvement must return to first principles. This deep curiosity about where and how processes first originated is the main driver for placing the Toyota Production System at centre stage.

2. What Is a Process

2.1 Defining Processes

Pictures, blueprints, most diagrams, and chemical structural formulae are state descriptions. Recipes, differential equations, equations for chemical reactions are process descriptions. The former characterize the world as sensed; they provide the criteria for identifying objects, often by modeling the objects themselves. The latter characterize the world as acted upon; they provide the means for producing or generating objects having the desired characteristics.

— Herbert A. Simon, The Architecture of Complexity

In his seminal work, The Architecture of Complexity, Nobel-laureate Dr. Herbert Simon defines a process as the means of generating an object with specified qualities and characteristics. In addition to physical, chemical, and biological objects, this definition also applies to organisational processes that aim to produce business and customer value.

Processes at an abstract level consist of a business problem, an ingenious solution, and the engineering skills to make it a real thing.

A process is a structured method to achieve a specific goal, typically transforming raw inputs (material, information) into products with added value. Products are valuable to customers; they solve a well-defined business need. Therefore, processes allow the creation of value out of the following ingredients:

  • Familiarity with customer needs, tastes, and preferences. An essential ingredient to any value-adding process is the intimate knowledge of the business problem we are trying to solve and, more specifically, the customer’s preferred solution design that satisfies generic constraints such as low cost and good quality or technical constraints like size, energy consumption, and performance.
  • Ingenuity and creativity. This ingredient allows the generation of new solutions by radical repurposing of existing solutions or their decomposition and recombination in novel ways. Ingenuity marries technology with concepts to create workable solutions. At the heart of software products is the assumption that computing technology can augment people’s capacity to process, persist, and generate insights from large amounts of data.
  • Engineering skills. This covers the theoretical and practical skills of applying science, mathematics, and technology to design, create, improve, and maintain various systems, structures, machines, devices, and processes. Engineering applies technical solutions to architectural concepts, the outcome of which is a practical, functioning, and efficient software product.

2.2 Why Is Structure Desirable

A structured approach to problem-solving offers the following advantages:

  • Structure eliminates ambiguity. People don’t have to guess what needs to be done; they just follow an established process. We can also argue that an overly structured approach stifles innovation by reducing variation in the system. Either way, ambiguity must remain within acceptable boundaries for people to operate comfortably.
  • Standard processes facilitate integration with other teams, systems, and processes. Integration is smoother if all teams and members (even new ones) speak the same language and have a similar understanding of the software development structure and processes.
  • Processes are memory structures, preserving past lessons and expertise. It’s much more efficient (and less costly) to explain to someone how a process works and what happens if they break it rather than letting them make all sorts of mistakes in the hope that they would learn from them and retain the learned lessons. Process standardisation helps learned lessons propagate across the organisation.

2.3 The Supplier, Input, Processes, Output, Consumer (SIPOC) Model

The Supplier, Input, Processes, Output, Consumer (SIPOC) model is a framework used in business process management and Six Sigma methodologies. It provides a high-level overview of a process, helping to define its key components and stakeholders:

  • Supplier: The supplier represents the entity or individual that provides the inputs or resources needed for the process. Suppliers can be internal or external to the organization.
  • Input: Inputs are the materials, information, or resources required for the process to function. They are typically received from the supplier and serve as the starting point of the process.
  • Process: Processes are the activities or steps that transform the inputs into desired outputs. This section describes how the inputs are handled, manipulated, or transformed during the process.
  • Output: Outputs are the final results or deliverables produced by the process. They represent the outcomes or products generated as a result of the process.
  • Consumer: The consumer refers to the individual, group, or department that receives and utilizes the outputs of the process. Consumers can be internal or external stakeholders who benefit from the process’s results.
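
To make the model concrete, here is a minimal sketch (in Python) of how a SIPOC breakdown might be recorded for a hypothetical feature-delivery process. The structure and every name in it are illustrative assumptions, not part of any SIPOC standard.

```python
from dataclasses import dataclass

@dataclass
class SIPOC:
    """A high-level SIPOC description of a single process (illustrative)."""
    suppliers: list   # entities providing the inputs
    inputs: list      # materials, information, or resources received
    process: list     # ordered steps that transform inputs into outputs
    outputs: list     # deliverables the process produces
    consumers: list   # who receives and uses the outputs

# Hypothetical SIPOC for a feature-delivery process
feature_delivery = SIPOC(
    suppliers=["product owner", "UX team"],
    inputs=["user story", "acceptance criteria", "wireframes"],
    process=["analyse", "design", "implement", "test", "deploy"],
    outputs=["working feature", "release notes"],
    consumers=["end users", "support team"],
)
print(feature_delivery.process)  # ['analyse', 'design', 'implement', 'test', 'deploy']
```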

The SIPOC model is a simplified and highly abstract model of what happens in reality. Processes can be complex, with many inputs, outputs, and steps configured in sequence or iterations, with feedback loops, branching, and other complicated structures.

Reducing complex processes to SIPOC models in an attempt to understand and optimize them destroys many aspects of their value-creation capacity. This happens when small units focus on maximising their productivity locally rather than refining inter-group collaboration. More on this later.

2.4 Efficiency and Effectiveness: Two Essential Metrics for Measuring Process Performance

2.4.1 Process Efficiency

Measuring process efficiency can be done by looking at cycle time, throughput, cost per unit, error rate, productivity, and utilization rate.

Process efficiency refers to how well resources (such as time, money, and materials) are utilized to achieve a desired output. It focuses on:

  • Minimizing waste
  • Reducing redundancies
  • Optimizing the use of resources

In other words, process efficiency is about doing things right. It aims to streamline operations, eliminate unnecessary steps, and increase productivity. Process efficiency can be measured using various metrics, such as the ones below (a small computational sketch follows the list):

  • Cycle time. The time required to complete a process or a specific task within it.
  • Throughput. The number of units or tasks processed within a given time frame.
  • Utilization rate. The extent to which resources are used effectively.
  • Error rate. The frequency of errors or defects in the process.
  • Cost per unit. The cost incurred to produce or deliver a single unit of output.
  • Productivity. The ratio of output to input, indicating the efficiency of resource utilization.
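
As a toy illustration, several of these metrics reduce to simple arithmetic over start/finish timestamps and defect flags. The task records below are invented:

```python
from datetime import datetime

# Hypothetical task records: (started, finished, was_defective)
tasks = [
    (datetime(2023, 5, 1, 9), datetime(2023, 5, 1, 17), False),
    (datetime(2023, 5, 2, 9), datetime(2023, 5, 3, 12), True),
    (datetime(2023, 5, 3, 9), datetime(2023, 5, 3, 15), False),
]

# Cycle time: hours from start to finish of each task
cycle_times = [(end - start).total_seconds() / 3600 for start, end, _ in tasks]
avg_cycle_time = sum(cycle_times) / len(cycle_times)

# Throughput: tasks completed per hour over the observation window
window = (tasks[-1][1] - tasks[0][0]).total_seconds() / 3600
throughput = len(tasks) / window

# Error rate: fraction of tasks that came out defective
error_rate = sum(1 for *_, defective in tasks if defective) / len(tasks)

print(f"cycle time {avg_cycle_time:.1f} h, throughput {throughput:.2f}/h, "
      f"error rate {error_rate:.0%}")
```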

2.4.2 Process Effectiveness

KPIs, compliance, adaptability, outcomes, and customer feedback allow us to measure process effectiveness.

Process effectiveness focuses on the degree to which a process achieves its intended goals or outcomes. It assesses whether the process is capable of delivering the desired results and meeting the requirements and expectations of stakeholders. In essence, process effectiveness is about doing the right things. It emphasizes the alignment between the process and its objectives.

To measure process effectiveness, you can consider the following indicators:

  • Key Performance Indicators (KPIs). These are specific metrics aligned with the strategic goals of the process. For example, customer satisfaction ratings, on-time delivery performance, or adherence to quality standards.
  • Customer feedback. Gathering feedback from customers or stakeholders to assess their satisfaction with the process outcomes and overall experience.
  • Outcome analysis. Evaluating the actual results the process achieves against the intended targets or benchmarks.
  • Compliance. Assessing the extent to which the process adheres to regulatory requirements, industry standards, or internal policies.
  • Adapting to change. A process is effective if it allows the organisation to survive external environmental threats and internal strife.

3. Cost of Inefficient or Ineffective Processes

The following symptoms are usually observed when poorly designed and implemented processes are in play:

  • Widespread confusion and lack of clarity. Inefficient processes often confuse employees, resulting in a lack of clearly defined roles, responsibilities, and expectations. Often, such confusion leads to conflicts, individual or departmental rivalries, and detrimental group behaviour like finger-pointing or passing the buck.
  • Bottlenecks and inefficiencies. Ineffective processes can create bottlenecks and inefficiencies, causing work to pile up in certain areas of the organization while others remain idle. This can result in a significant waste of time and resources.
  • Lack of accountability and ownership. Assigning accountability and ownership for specific tasks or outcomes can be difficult without well-defined processes. This can lead to a lack of responsibility and decreased overall productivity. When such practices persist over long periods, they become part of the norm, and rarely would individuals step up and assume ownership or responsibility for specific outcomes. On the other hand, individuals will be encouraged to claim credit for actions they have not partaken in.
  • Inability to adapt to changing priorities and internal or external pressures. While overly constrained processes stifle innovation and the ability to respond to new challenges, poorly engineered processes can have the same effect as they obstruct quality decision-making, clear communication, and effective problem-solving.
  • Variations of output regarding quality and capacity to deliver. Production variance refers to the variability or deviation from the desired target or specification in key process parameters such as dimensions, tolerances, cycle times, or material properties. In Six Sigma, the adverse effects of production variance are typically described as increased defects, rework, waste, customer complaints, and decreased process capability.
  • Continuous improvement (kaizen) is either ineffective or non-existent. Whatever lessons teams learn and process improvements they make remain local, sporadic, and short-lived; they are not properly assessed, standardized, or propagated to other teams. The net result of such efforts on a larger scale is negligible.
  • Scaling becomes more challenging. Scaling can be done by expansion, imitation, replication, and, for complex systems, by decomposition and recombination. All of these methods assume the original processes being copied or reused are functional. When they are not, scaling challenges are exacerbated.
  • Fire-fighting vs fire prevention. Constant fire-fighting and crisis management are signs of processes breaking down. These require significant management attention, distracting leadership from more important, long-term strategic problems.

4. Process Flow and Organisational Performance

4.1 Flow and Optimal Processing Times

Flow is at the heart of the lean message that shortening the elapsed time from raw material to finished goods (or services) will lead to the best quality, lowest cost, and shortest delivery time. […] A lean expression is that lowering the “water level” of inventory exposes problems (like rocks in the water), and you have to deal with the problems or sink.

— Jeffrey Liker, The Toyota Way

Lowering the water level to expose inefficiencies is a powerful metaphor. It provides insights outside manufacturing if applied beyond issues of excessive inventory. Flow is a hypothesis that positively correlates optimal processing times with quality, low cost, and speed of delivery. Our common sense tells us otherwise, namely that speed is synonymous with haste and that taking shortcuts today defers the problem to the future rather than eliminating it. Think of technical debt, a design or implementation shortcut consciously made to meet deadlines.

If we can imagine processes as entangled ropes on a floor, achieving flow means disentangling them so that each one can be used to its maximum potential. Disentangling processes can drastically increase efficiency through minimal switching of people, machines, and tools, removing bottlenecks, and serializing tasks. Process improvement is a topic we will look at in detail in another article.

4.2 Flow in Software Development

What if we have entangled fishing nets instead of ropes? Is flow conceivable in this context? This question is relevant when we contrast manufacturing with services, particularly software development. In fishing nets, everything is connected to everything else, and disentangling them is far more complicated.

Manufacturing is complicated, while software development can be either complicated or complex. Here, complex and complicated are not the same and refer to two domains in the Cynefin framework. The best way to understand this difference is through the following:

  • Complicated. This is a problem where the solution is not apparent, but given enough information, an expert can identify a best practice. For complicated problems, engineering solutions exist.
  • Complex. In complex problems, the data (or evidence) will support multiple hypotheses, and best practices don’t exist. The approach is to probe the problem with multiple safe-to-fail experiments, amplifying those that work and dampening those that don’t.

It is vital to understand that following established processes and applying process improvement methods cannot occur in complex domains, where the correct process has yet to be established. The complexity that comes with software requires practitioners to correctly identify the domain they are in.

Scrum, Kanban, Extreme Programming, and hybrid models offer different ways to reduce this choking interconnectedness. We will look at these software development frameworks a bit later. Nigel Thurlow has some interesting ideas on flow in software development, which are worth reading. In the meantime, let’s understand where process inefficiencies originate and how methods like flow can remedy them.

5. Production Processes: A Brief History

5.1 Mass Production Thinking

Mass production is a familiar pattern of organisational setups. It’s based on the premise that grouping entities (people, machines, tools) of similar types, roles, or competencies makes production tasks more efficient. Most organisations are set up this way, with separate departments for finance, accounting, engineering, production, sales, support, and human resources. The efficiency of this setup is assumed to come from the following:

  • Scheduling flexibility. If you have a group of product owners or developers, assigning incoming tasks will consist of finding the least busy resource and adding the new task to their backlog. If, on the other hand, you had one product owner, scheduling becomes more challenging as their time would need to be tightly managed to avoid cascading delays. A group of interchangeable technicians allows flexibility in scheduling future loads.
  • Built-in redundancy. Skilled workers can get sick, go on leaves, or move on to a new adventure. A group of employees with shared skillsets makes talent mobility less problematic as people can temporarily switch projects and tasks.
  • Straightforward knowledge management. Group knowledge management is significantly easier as the documentation and process improvement effort is divided amongst the group members. Redundancy again provides an additional layer of security on essential know-how. The opposite is to have a senior expert who is also a single point of failure.
  • Scaling potential. The ability to scale a team with an established culture, mature processes, and accumulated knowledge is much easier than throwing a new resource at a project without a supportive environment. Shared knowledge, tools, and best practices constitute the necessary learning scaffolding for a new member.

However, efficiencies afforded by mass production come at a cost, which can be summarized with the following points:

  • Admin, planning, and coordination overhead. A special team of experts is now required to plan and coordinate the activities of the expert teams. Synchronization challenges increase with scale as efforts from larger, more diverse, geographically distributed teams must be planned and coordinated.
  • Specialization and division of labour. While silos allow expertise to accumulate and emerge, they can create significant barriers to communication and collaboration. Inter-group rivalries may arise, especially when key performance indicators are set for individual teams, fostering competition instead of collaboration.
  • The batch and queue system. Having divided your resources into teams, you must now determine the optimal way of moving material, products-in-progress, and information between teams. The traditional method employed is the batch and queue system, which deserves special attention and will be dealt with in more detail in the next section. In the meantime, it suffices to mention that batching and queueing are significant sources of delay and adept at hiding quality issues, as we shall discuss in a minute.

5.2 The Batch and Queue System

Imagine that you allocate a team of business analysts to a specific client project, and the work requires the analysts to be on-site for workshops. To maximize their time utilisation, you would want them to work on as many new requests as possible during their stay.

The same applies to specialised machinery, where switching from one mode of operation to another introduces costly delays while engineers reconfigure the machines. While the machines are being reconfigured, the workers are either switched to a different task or wait until the reconfiguration is done.

To minimize the overhead associated with switching projects or machinery (context switching) and the overhead of physically moving material or information between teams, we would like the payload to be as large as possible and the movers to make the fewest round trips. This operational pattern is called the batch and queue system.

The batch and queue system requires that requests for new work accumulate to a certain threshold before a batch is moved to the next stage in the production process. The batch waits in a queue while more items are added and only moves to the next stage once it is full.

The batch and queue system is designed to maximize resource and time utilisation at the cost of higher lead times, lower quality, and lack of scheduling flexibility.

  • Higher lead and delivery times. A new item must wait for the batch to be ready before work can start. An item in progress must also wait for other items to be processed in the current stage before it moves to the next.
  • Quality issues. Imagine a set of requirements is ready, and development is scheduled to start soon. Imagine also that a faulty design is introduced in the third requirement, and the development of the remaining requirements is built around that faulty design. The faults will only be detected during testing, and by then, you will have not one but several faults that need to be repaired.
  • Lack of scheduling flexibility. Because of its setup, a batch system makes it harder to shuffle tasks, machines, or resources around when priorities change without disrupting the batch in progress. Small batches quickly free resources so they can be assigned to other tasks, but as we said, they tend to be inefficient regarding resource utilisation.
  • Difficult to optimize. Optimizing a batch and queue system end-to-end requires lowering the number of work-in-progress items to one. A one-piece flow affords optimal lead and delivery times. However, a single work-in-progress item is the least efficient from any individual team’s perspective, as teams typically work best when their batches are the largest.
  • High inventory levels. While not relevant to software, inventory management is crucial for manufacturing. Spare parts and products-in-progress take up space, make things difficult to find, and hide defective items. Toyota has gone to great lengths to minimize inventory levels and organise their warehouses with 5S (Seiri/Sort, Seiton/Set in order, Seiso/Shine, Seiketsu/Standardize, Shitsuke/Sustain).

One-piece flow is how Toyota overcame the problems of the batch and queue system while sustaining low costs. In software, things are a bit more complicated, and a combination of methods (Scrum, Kanban, eXtreme Programming, hybrids) was created to resolve those issues.
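
A back-of-the-envelope sketch shows why batch size dominates delivery time when each stage must finish the whole batch before passing it on. The numbers are invented, echoing the classic lean teaching example of pipelined one-piece flow:

```python
def batch_and_queue(items: int, stages: int, minutes_per_item: float) -> float:
    """Minutes until ALL items are delivered when each stage processes
    the entire batch before handing it to the next stage."""
    return items * stages * minutes_per_item

def one_piece_flow(items: int, stages: int, minutes_per_item: float) -> float:
    """Minutes until ALL items are delivered when items move one at a
    time, so the stages work as a pipeline."""
    return (stages + items - 1) * minutes_per_item

# Hypothetical line: 10 items, 3 stages, 1 minute of work per item per stage
print(batch_and_queue(10, 3, 1))  # 30 minutes; first item ready after 21
print(one_piece_flow(10, 3, 1))   # 12 minutes; first item ready after 3
```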

6. Sequences, Iterations, and Feedback Loops in Production Processes

6.1 Process Anatomy and Dynamics

At a sufficiently high level, any process can be described as a sequence of stages, with the output of the first feeding into the next. The Software Development Lifecycle (SDLC) is a prime example of how this reduction can be applied. The SDLC is usually depicted as a five-stage process of Analysis, Design, Development, Testing, and Deployment.

The reality, however, is much more complex, as a software developer or any other technician would gladly attest. The practitioner and the keen observer will distinguish the following structures and dynamics in a sophisticated process:

  • Sequences. Sequential processes are a type of structure where stages are wholly dependent on their predecessors and cannot be started until the previous stage has been completed. Sequential stages are the easiest to implement, understand, and optimize but are the least efficient, as they do not ensure optimum utilisation of resources. Also, delays in one stage can have immediate downstream impacts on its successors.
  • Buffers, inventories, backlogs. Any stage can have an allocated buffer, inventory, or backlog where items in progress can wait for the next available worker. Buffers, inventories, and backlogs alleviate the waiting problems associated with sequential stages by keeping items ready to be processed in a queue and pulling them out when workers run out of work. They change the dynamic of resource utilisation on the stage level from a push to a pull system.
  • Bottlenecks. Whenever stages differ in processing capacity, a new dynamic can be observed where bottlenecks arise. Lower processing capacity is typically associated with complex or laborious tasks, scarce resources, or faulty process design. Bottlenecks can be spotted when backlog or inventory levels continuously grow until they overflow (see the sketch after this list).
  • Parallel processing. Processes can also be set up to include stages that run in parallel. Parallel processing structures present synchronisation and coordination challenges but can be vastly more efficient than their sequential counterparts. For example, source code merge conflicts are typical side effects of parallelising interdependent implementations.
  • Feedback loops. Feedback loops are mechanisms for relaying information between a process’s forward and backward stages. This information is typically associated with a specific output and, in most cases, is the difference between the real outcome and a target value. They appear when downstream processes detect an error that requires a redesign or a rework to be fixed. They can also be actively set up to collect vital information on a prototype’s utility and value in a real-world scenario.
  • Iterations. Iterative processes attempt to refine a specific output by launching early and fixing later when more information is available from feedback loops. Iterative processes make it difficult to predict timelines since the number of required cycles is not necessarily known in advance. Iterative structures are very flexible and work better under conditions of uncertainty.
  • Value-adding work. Dr Winston Royce’s classical paper on developing large software systems provides compelling arguments on what constitutes value-added work and what doesn’t. Anything that customers are happy to pay for (design, development) is value-added work. In contrast, waste is anything customers push back on. Waste elimination is where cost reductions can be made without hurting the product.
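
The dynamics of buffers and bottlenecks can be demonstrated with a toy discrete-time simulation (all capacities invented): when a downstream stage is slower than its upstream feeder, the buffer between them grows without bound.

```python
from collections import deque

def simulate(ticks: int, cap_upstream: int, cap_downstream: int) -> int:
    """Two sequential stages joined by a buffer; returns the number of
    items stuck in the buffer after `ticks` time steps."""
    buffer = deque()
    for _ in range(ticks):
        buffer.extend(range(cap_upstream))          # upstream pushes its output
        for _ in range(min(cap_downstream, len(buffer))):
            buffer.popleft()                        # downstream pulls what it can
    return len(buffer)

# Balanced capacities: the buffer stays empty
print(simulate(ticks=100, cap_upstream=3, cap_downstream=3))  # 0
# Downstream bottleneck: the buffer grows by 2 items every tick
print(simulate(ticks=100, cap_upstream=5, cap_downstream=3))  # 200
```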

6.2 Interpersonal Processes and Their Impact on Organisational Processes

Interpersonal processes define the interactions of group members working towards a specific goal. They cover the following areas:

  • Communication
  • Functional roles within a group
  • Problem-solving
  • Decision-making
  • Leadership and authority
  • Intergroup processes

Interpersonal processes are crucial in shaping individual choices and judgments, both inherently uncertain facets of human conduct. Although psychologists have discerned recurrent patterns in human behaviour, these patterns are chaotic, though not absolutely random (the difference between chaotic and random is explained in the article on the Brusselator, a mathematical model of chemical reaction dynamics).

When examining production processes, it is tempting to focus exclusively on a process’s static structure (sequential, iterative, or parallel stage setup) or properties (feedback loops, backlogs, flow) and ignore influential aspects of human nature. While this approach is not incorrect, it is incomplete. This incompleteness is best described by a quote from Edgar Schein’s book Process Consultation: Its Role in Organisation Development (a highly recommended read for any manager).

The network of positions and roles which define the formal organisational structure is occupied by people, and those people, in varying degrees, put their own personalities into getting their job done. The effect of this is not only that each role occupant has a certain style of doing his work but that he has certain patterns of relating to other people in the organisation. These patterns become structured, and out of such patterns arise traditions that govern the way members of the organisation relate to each other.

— Edgar Schein, Process Consultation: Its Role in Organisation Development

In other words, production processes can be designed to follow a blueprint, while interpersonal and group processes are emergent properties resulting from the interaction of agents in the system and, therefore, cannot be engineered.

Organisational processes exhibit complex behaviour and cannot be understood solely based on their structure while ignoring interpersonal processes. In that sense, organisations are complex adaptive systems, not machines.

The realisation that organisations are complex systems has wide-ranging implications, from management to performance, continuous improvement, knowledge management, and change management. Complexity levels will also determine whether Operational Excellence is feasible in a specific situation.

6.3 Codifying Complex Processes and the Problem of Extreme Specialisation

Have you ever been asked to document what you knew or the processes you followed for a handover or training session? How much of that knowledge would you have been able to codify in a reasonable amount of time? Probably not much; Dave Snowden, an expert consultant and the creator of the Cynefin framework, caps it at 10%.

Writing down a long list of server names, procedure calls, or product specifications is highly desirable. However, documenting processes or solutions to past problems assumes that future situations closely resemble or replicate historical ones, which is not entirely true. A problem to which we already know the answer is not a problem. Problems worth exploring are novel, complex, and highly dependent on the specific path a system has followed to arrive at its current state, making them unique. How does one cope with issues of novelty?

Software engineers use past expertise, technical and field knowledge, and critical thinking skills to make decisions in complex and novel situations. Their choices today are not textbook solutions but depend on the particularities of their journey of past mistakes, lessons learned, cultural traditions, and skills acquired.

Dealing with uncertainty and novelty brings us to the question of specialisation. While developers who are exceptionally skilled in one programming language have great value if the organisation relies on that technology, they will struggle quite a bit if it adopts a new one. Teams should have an ideal mix of specialists and generalists, the latter combining ideas from various disciplines to solve new problems in novel ways.

7. Identifying and Eliminating Process Waste

7.1 Waste: The Toyota Definition

Muda is a Japanese word meaning “senseless or meaningless”. […] This is why using Japanese words to understand the true meaning is important because muda doesn’t just mean waste. […] It’s actually about the clumsiness or lack of professionalism leading to unnecessary costs.

— Nigel Thurlow, Have we forgotten our key denominator: FLOW? – Lean and Agile Summit 2022

Waste is tightly coupled with manufacturing (at least historically through Toyota and Taiichi Ohno), and sometimes it is difficult to identify the equivalent of waste in highly complex, rapidly evolving industries like software development.

Toyota defined waste as inefficiencies from non-value-adding tasks (muda), overburden of machines and people (muri), and unevenness of load (mura), the latter two increasing non-value-adding tasks.

In manufacturing, waste is mainly associated with excess inventory, moving material between locations, and delays when reconfiguring machines, fixing quality problems, or switching products on assembly lines. In contrast, software development has none of these issues but a score of other problems equally damaging to an organisation’s delivery capabilities.

This section will delve deeper into the three types of waste outlined in Liker’s work, The Toyota Way. Subsequently, we will establish notable connections between these types of waste and the software development sector.

7.2 Non-Value Adding Waste (Muda)

7.2.1 Muda in Manufacturing

This type of waste refers to any activity or process that does not add value to the product or service. Muda can manifest in various forms, such as:

  • Unnecessary movement
  • Overproduction
  • Waiting time
  • Excess inventory
  • Defects
  • Overprocessing

Toyota aims to identify and eliminate these wasteful activities to improve efficiency and reduce costs. One technique that Taiichi Ohno promoted is called the Gemba Walk or simply Gemba. Gemba is a Japanese term that translates to “the real place” or “the actual place.” It refers to the physical location where value is created, such as the factory floor, shop floor, or any other place where the work is performed.

The Gemba Walk involves going to the actual location where the work is being done to observe and understand the process. Ohno, a key figure in the development of the Toyota Production System (TPS), emphasized the importance of observing the Gemba to identify waste, inefficiencies, and opportunities for improvement.

Managers and leaders can gain valuable insights and make more informed decisions to enhance productivity and quality by directly observing the work environment and interacting with the people involved.

7.2.2 Muda in Software Development

In software development, we can think of the following activities as supporting the main effort of analysing a problem and writing the code contributing to the solution.

  • Project management (creating project plans, JIRA tickets, and progress reports)
  • Requirement gathering
  • Documentation
  • Prototyping
  • Meetings (aside from those necessary for producing code)
  • Refactoring
  • Testing
  • Infrastructure maintenance
  • Tool setup and maintenance
  • Redesign
  • Maintaining low-value features

Which of these items have a direct influence on the final shape of the product, and which are waste? Dr Winston Royce defined waste in his famous 1970 paper Managing the Development of Large Software Systems (the same paper that ushered in the Waterfall age while at the same time heavily critiquing it).

Two essential steps are common to all computer program developments, regardless of size or complexity. There is first an analysis step, followed second by a coding step […]. This sort of very simple implementation concept is […] the kind of development effort for which most customers are happy to pay since both steps involve genuinely creative work which directly contributes to the usefulness of the final product. […] Many additional development steps are required [for larger software programs], none contribute as directly to the final product as analysis and coding, and all drive up the development costs.

— Dr Winston Royce, Managing the Development of Large Software Systems

This implementation concept, that analysis and coding are the only truly value-adding activities while the rest (testing, documentation, refactoring, and everything that supports them, from automation infrastructure to CI/CD pipelines) are not, is key to any process optimisation strategy. The management’s job, as per Royce, revolves around the following:

  • Selling the idea of essential but non-value-adding activities to customers who would rather not pay for them and to the developers who would prefer not to implement them.
  • Enforcing compliance on developers to follow software development best practices and uphold the standards of quality, testing, documentation, and automation. As most of these activities can be mechanical and effort-intensive but add little value to the final product, developers are reluctant to do them.
  • Maximizing the ratio of value-added to non-value-added work by maximizing productivity while simultaneously maintaining high-quality standards and low costs.

This approach to performance improvement and optimisation is centred on industrial engineering concepts. We must remember that maximizing productivity through industrial engineering is only one of three ways to manage a business. In Operational Excellence and the Structure of Software Development and Delivery, we listed two other schools of thought on business management: Organizational Theory and Behavioural Science.

Organizational Theory and Behavioural Science have alternative views on what organisations and individuals require to perform, but given we are talking about production processes, we will focus solely on Industrial Engineering.

A simple heuristic that gauges whether an engineer’s effort is value-adding or otherwise would be to ask: How much do decisions this engineer makes during that specific process or task shape the final product? Gemba is one method that allows leaders to answer this question. Any manager wishing to improve a specific process must have firsthand information.

Only quality ethnography (watercooler stories) coupled with statistics, KPIs, and raw data from the Gemba will allow managers to interpret and diagnose situations correctly.

7.3 Unevenness (Mura)

7.3.1 Mura in Manufacturing

Mura refers to waste caused by unevenness or inconsistency in production or workload. It involves variations in demand, workflow, or resources that result in inefficiencies, such as:

  • Bottlenecks
  • Underutilization
  • Excessive stress due to workload fluctuations

Toyota uses several techniques to reduce mura. These techniques are integral to the Toyota Production System (TPS) and lean manufacturing principles. Here are some key techniques employed by Toyota to address mura:

  • Standardized Work. Standardized work establishes clear and consistent procedures for each task and process. It helps eliminate variations (see Thoughts on Six Sigma for Developing Your Software Engineering Processes) caused by different operators or equipment setups.
  • Heijunka (Production Levelling). Toyota managers understood the adverse impacts of spikes and dips in production load and sought to eliminate them with Heijunka, or production levelling. Heijunka aims to minimize fluctuations and avoid overburdening specific processes or resources. It is a method of levelling production by smoothing out the workload and balancing the production schedule.
  • Pull Systems. Toyota implements pull systems, such as Just-in-Time (JIT) production, to reduce mura. Pull systems ensure that production is driven by customer demand, with products or parts being produced only when needed. By synchronizing production with demand, pull systems minimize inventory and the variability caused by fluctuations in customer orders.
  • Takt Time. Takt time is the rate at which products or services must be produced to meet customer demand (a tiny worked example follows below). Toyota uses takt time as a guideline to balance workloads and ensure a smooth production flow. By aligning the production rate with customer requirements, mura is reduced, and processes can operate consistently.
  • Kanban. Kanban is a visual scheduling system that helps manage workflow and inventory levels based on customer demand. It aims to create a smooth and balanced workflow, eliminating the unevenness and fluctuations that can lead to waste and inefficiencies. By implementing Kanban, teams can visualize their work and limit work in progress (WIP) based on available capacity.

Combined with lean manufacturing principles, these techniques help Toyota create a more stable and efficient production system, reducing mura and improving overall productivity, quality, and customer satisfaction.
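
Takt time itself is simple arithmetic: available production time divided by customer demand. A hypothetical shift illustrates it:

```python
def takt_time(available_minutes: float, demanded_units: int) -> float:
    """Minutes each unit may take if production is to exactly match demand."""
    return available_minutes / demanded_units

# Hypothetical shift: 450 minutes of net working time, 90 units demanded
print(takt_time(450, 90))  # 5.0 -> a finished unit must leave every 5 minutes
```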

7.3.2 Mura in Software Development

Have you ever experienced the stress of go-lives, end-of-release cycles, and looming deadlines? We all have, and we are prone to believing that this is the nature of the work and that the world has always been that way.

As one manager once told me, software activities are alternating cycles of “famine or feast”, and from that moment, I kept turning that statement in my head. Fast forward a decade, and a partial answer seems to have emerged, derived once again from the Toyota Production System (TPS). Let’s review some methods of levelling the workload implemented in software and see how they relate to TPS.

  • Release cycles. Delivering through releases is a method for managing schedules, setting stakeholder expectations, and levelling workloads. Release management considers the team’s capacity and the difficulty and size of the tasks and plans future releases accordingly. Release cycles of up to twelve months are typical and work well when ambiguity, uncertainty, and requirement volatility are low. Under conditions of uncertainty, you typically want shorter cycles (sprints of 2-4 weeks).
  • Backlogs. Releases are generally fixed-scope, fixed-schedule plans, making them less adaptable to changing priorities. To solve this problem, Agile frameworks introduced backlogs that can be defined as ordered lists of new features. Instead of a fixed schedule, we now have a dynamic list that product owners manage by shifting items up or down in response to changing priorities. Features completed within the release period are part of that release. Backlogs, therefore, kept the time constraint but removed the scope constraint.
  • The pull system. Instead of assigning work directly to a team with the potential of overloading them and creating mura, the pull system requires developers to pull an item from the backlog once they can take on new work. A vanilla pull system, however, does not consider the skill requirements of a specific task and assumes that any developer can implement any feature in the backlog.
  • Kanban. While the pull system prevents the overallocation of workloads, it does not stop a developer from pulling more than one user story and working on them simultaneously. Ideally, you want developers to perform minimal “context switching”, the inefficient switching between different tasks. Kanban is generally used to limit Work-In-Progress (WIP) so that, at all times, the amount of work allocated to a team matches the team’s capacity to deliver. Kanban ensures a smooth and levelled workload by visually signalling areas of over-allocation (see the sketch after this list).
  • Standardized work. Unlike manufacturing, software delivery is highly dynamic, and standardization can be a double-edged sword. On the one hand, process standardisation allows teams to collaborate effectively and managers to shift resources easily, provides a common language, and ensures that learned lessons are propagated to all teams. On the other hand, standardisation requires effort to ensure compliance, so it comes with a price tag. You also don’t want absolute homogeneity; process variations allow people to use their creativity to explore multiple solutions.

7.4 Overburden (Muri)

7.4.1 Muri in Manufacturing

Muri refers to waste caused by overburdening or straining people, equipment, or processes beyond their capacity. It includes activities that place excessive stress or workload on employees, leading to errors, accidents, or inefficiency.

7.4.2 Muri in Software Development

Overburdening people can happen for any of the following reasons and results in stress, anxiety, absenteeism, low productivity, and low motivation.

  • Unreasonable deadlines
  • Lack of coherence in planning or design
  • Lack of unity of command leading to conflicting visions and expectations
  • Poor technical understanding of the job requirements
  • Underspending on application maintenance and technical debt reduction
  • Low investment in infrastructure, tools, automation, and talent

Overburdening is not a motivation problem; it can affect even well-motivated people.

8. Software Delivery Processes: Anatomy, Efficiency, and Success Stories

In the previous sections, we have described various concepts related to manufacturing processes and how they relate to their software counterpart. While the link between processes in both fields is undeniably there, manufacturing and software (or services in general) have different contexts, and practices in their fields have diverged significantly. Assumptions from manufacturing are valid only when applied at a very high level.

In the following paragraphs, we will focus on the pioneering work in software development methodologies by practitioners like Winston Royce, Melvin Conway, Kent Beck, Ron Jeffries, Ward Cunningham, Jeff Sutherland, and Ken Schwaber. The objective, however, is not to fully describe each methodology or method (entire books have been dedicated to each topic) but to break down the respective methods (Waterfall, Scrum, Kanban, XP) into their essential elements. This return to first principles will allow us to understand how these methods operate on their lowest levels and what issues each resolve or introduce.

8.1 Waterfall

What we all know about the Waterfall method is its five classical stages. What we rarely think about is which of these (Analysis and Development as per Royce) have an impact on the final shape of the product.

The following ideas describe the Waterfall approach, process, and implementation, with its strengths and weaknesses:

  • Waterfall as an assembly line. I believe Waterfall became popular because its processes were easy to explain and understand. At its core, Waterfall is best described as sequential stages that cover requirement gathering, analysis, design, coding, testing, and operations. Under this regime, software development can be done by minimally-interacting teams of specialized competencies (business analysts, system analysts, architects, developers, testers, and operations) that interface only when the product-in-progress moves between stages.
  • Waterfall is linear. Waterfall assumes that each stage in the process can be fully closed before the next begins. To be successful, business requirements must be gathered before analysis, design completed before development, and testing started once a production version of the application is ready. Royce identified this problem in his 1970 paper on managing large software projects. In this paper, he critiques Waterfall for being unable to deal with issues that are not analyzable in advance (mainly performance-related).
  • Value creation and waste. Royce described analysis and coding as the primary processes responsible for value creation. Although the rest are essential to delivery, they contribute minimally to the shape of the final product. This means a significant portion of the investment effort cannot be capitalised or resold. Waterfall success greatly depends on quality documentation, another non-value-adding activity, due to the compartmentalisation of knowledge and resources.
  • Redesigns are costly. Since testing is only started at later stages and because the development of large and numerous features has already been carried out, a design fault can remain hidden for a long time. During this time, more features are created on top of the faulty ones. To paraphrase Royce, one hopes that iterations are limited to successive steps. Sadly, this is not always the case. Design faults, normally not remediable with minor code changes, might mean a 100% overrun.
  • Siloing of development and operations. Managers recognized early on that operations-oriented staff are better and cheaper than developers for running applications in production. This led to the segregation of these competencies into separate silos. While silos are neither good nor evil, they create information barriers and obstruct flow.
  • Prototyping. Although explicitly called out by Royce as the potential solution to avoiding design rework, prototyping never became an integral part of the Waterfall method.

8.2 Scrum

Scrum is a planning tool; it doesn’t make you Agile. But you can use it to achieve some outcomes that might lead to Agility.

— Nigel Thurlow, Have we forgotten our key denominator: FLOW? – Lean and Agile Summit 2022

Jeff Sutherland and Ken Schwaber developed Scrum in the early 1990s. Inspired by Lean Management and the Toyota Production System’s one-piece flow, Scrum presented a significant departure from the old ways of Waterfall. It all started when Sutherland inspected a large Gantt chart for a major project.

Scrum focuses on small changes, frequent deliveries, working software, customer involvement and self-organisation of Scrum teams.

Sutherland saw on that Gantt chart a classical Waterfall implementation with many interdependent tasks, divided into batches of features to be analysed, developed, tested, and deployed as one single delivery. It didn’t take much for him to understand that a delay in any task would have a cascading effect on the rest, especially if it was on the project’s critical path. Given the (typically large) number of tasks and people involved, delays were inevitable, and project failure rates ran above 85%.

Sutherland’s solution, inspired by how pilots landed their planes on swinging aircraft carriers under various weather conditions (and six-legged robots negotiating obstacles, see his Google talk, link in References), was to break down the large project into features that can be delivered in two-week sprints and large silos into smaller self-organising teams.

Scrum’s fundamental assumptions were as follows:

  • Small changes, more frequent deliveries. No matter their size, Scrum assumes that projects can be broken down into smaller pieces that a Scrum team can deliver in one or two sprints. This radical shift from large batches of many features reduced the risk of faulty designs accruing and being discovered belatedly. On the other hand, it required more skills from the team to perform at that level.
  • Team capacity and organisation. Scrum teams are cross-functional and, ideally, self-sufficient, enabling them to make quality decisions quickly. Scrum also recognizes that teams have limited capacity regarding what can be done in one sprint. Their capacity dictates the amount of work (measured in story points) they can take on per sprint.
  • Customer involvement in product design. Scrum emphasises early delivery of prototype-grade features that customers can operate in production-like environments. If the prototype is good, it can be enhanced. Otherwise, it can be modified or decommissioned. Early customer involvement eliminated the risk of building the wrong thing.
  • Emphasis on working software. Teams under Scrum must deliver working software at the end of each sprint. A feature can be refined, enhanced, and updated iteratively in subsequent sprints if the original implementation was not good enough. Scrum, therefore, natively supports an iterative approach that works well with partially defined or volatile requirements. Quick iterations mean faulty designs are detected as early as possible.
  • Scrum turns complex into complicated. Complex and complicated here refer to two separate domains (see Cynefin framework). By adopting an iterative approach and promptly and frequently engaging the customer, a complex problem can become complicated. In the latter case, engineering solutions typically exist already, and experts can be brought in to design solutions guided by best practices.

Scrum teams deal with waste as follows:

  • Non-Value Adding Waste (Muda). Scrum is very similar to one-piece flow in some sense, which is the pinnacle of what can be achieved in waste reduction. One-piece flow means no inventories, overproduction, large movements of material, or hidden quality problems. Similarly, because Scrum teams are cross-functional, they can design, build, test, and deploy (with DevOps) small features with minimal waiting times. Fast and frequent deliveries mean features can be tested and evaluated very quickly.
  • Overburden (Muri). Scrum teams estimate work difficulty through story points and try to match that with their sprint capacity (called velocity); a minimal sketch of this capacity check follows the list. They avoid taking on more work than could realistically be delivered and therefore avoid working over weekends and after hours. Scrum teams use burndown charts to visualize progress and deliver working software at the end of the sprint cycles. Through retrospectives, Scrum teams have time to reflect on their methods and apply steps towards continuous improvement.
  • Unevenness (Mura). Scrum teams typically pull work from a backlog and restrict the number of stories in progress to what can realistically be delivered in sprints. Backlogs are ordered lists of items ready for development. They are continuously updated by Product Owners so that high-value, high-visibility items are always on the top. Items that are in progress are typically not interrupted (except for emergencies). This means that Scrum teams can consistently deliver something at the end of their sprint.

8.3 eXtreme Programming (XP)

Kent Beck, Ron Jeffries, Ward Cunningham, and others were the key contributors to the development and evolution of eXtreme Programming (XP) in the late 1990s.

eXtreme Programming (XP) required planning on three levels: release, iteration, and daily. Development in XP was supported with Test-Driven Development and automated testing.

Extreme Programming (XP) and Scrum are both agile software development methodologies, but they have some differences in their approaches and focus. Here are a few critical differences between XP and Scrum:

  • Development Process. XP emphasizes engineering practices and specific development techniques like continuous integration, test-driven development, pair programming, and frequent releases. Scrum, on the other hand, is more focused on project management and provides a framework for organizing and managing work in iterations or sprints.
  • Roles and Responsibilities. XP does not prescribe specific roles but encourages a collaborative and cross-functional team approach. Scrum defines specific roles such as Scrum Master, Product Owner, and Development Team, each with its own responsibilities and areas of focus.
  • Planning and Estimation. XP places a strong emphasis on short, frequent planning cycles and continuous feedback. Planning in XP is done at different levels, including release planning, iteration planning, and daily planning. Scrum utilizes a backlog of user stories, and planning is typically done at the beginning of each sprint. Scrum also employs techniques like relative estimation using story points.
  • Documentation. XP values working software over comprehensive documentation. It emphasizes verbal communication and face-to-face interaction between team members. Documentation is often minimal, with a preference for lightweight and easily maintainable artifacts. Scrum, while also valuing working software, allows for more detailed documentation and encourages the use of artifacts like product backlogs, sprint backlogs, and burndown charts.
  • Change Management. XP embraces change and is designed to accommodate and respond quickly to changing requirements. It encourages regular feedback and frequent customer involvement. Scrum also handles change by allowing the Product Owner to modify the product backlog and reprioritize work during the sprint, but changes are typically deferred until the next sprint.

Kent Beck relates in his 2015 talk at the Lean IT Summit the impression that The Toyota Way – 14 Management Principles From the World’s Greatest Manufacturer made on him. eXtreme Programming (XP) was intended to be the Lean equivalent of software development, heavily inspired by The Toyota Way, and once again, traces of one-piece flow can be distinguished upon close inspection.

  • Waste (muda) elimination. Because XP consists mostly of engineering techniques and best practices (Test-Driven Development, pair programming, and short releases), I believe it focuses heavily on eliminating non-value-adding tasks of the muda type, and not so much on the other two (overburden and unevenness).
  • Multi-level Planning. eXtreme Programming (XP) envisages the planning of releases, iterations (or sprints), and daily cycles. This strategic, operational, and tactical planning approach (if we may call it that) requires continuous and extensive management and organisation effort. Compared to Scrum, XP is more complex to implement: while it reduces additional technical tasks to the bare essentials, it introduces significant management overhead.
  • What can go wrong with TDD. Test-driven development (TDD) is a powerful technique for increasing software quality and, more importantly, relocating ownership of software quality back to the developer. While TDD is powerful and effective, it comes at a cost: first, it increases the size of the code base, and second, the test code requires constant maintenance. A minimal test-first sketch follows this list.
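To make the TDD trade-off concrete, here is a minimal, hypothetical test-first sketch; the slugify function and its tests are our own illustration, not drawn from the article’s sources. In TDD, the tests are written first and fail, and only then is just enough production code written to make them pass.

```python
# Hypothetical TDD example: the tests below are written first (and fail),
# then slugify() is implemented to make them pass.
import unittest

def slugify(title: str) -> str:
    """Turn a page title into a URL-friendly slug."""
    return "-".join(title.lower().split())

class TestSlugify(unittest.TestCase):
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("The Toyota Way"), "the-toyota-way")

    def test_collapses_extra_whitespace(self):
        self.assertEqual(slugify("  Lean   Thinking "), "lean-thinking")

if __name__ == "__main__":
    unittest.main()
```

Note that the test code is roughly the same size as the production code: this is precisely the growth in code base, and the maintenance that comes with it, that the bullet above warns about.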

8.4 DevOps and Continuous Delivery

Agile and DevOps are two related but distinct approaches in the software development and IT operations domains. Let’s start by looking at some key differences between them:

  • Focus, Objectives, and Scope.
      • Agile methodologies, such as Scrum or Extreme Programming (XP), primarily focus on the iterative and incremental development of software. Agile emphasizes delivering working software in short iterations, continuous customer collaboration, and adapting to changing requirements.
      • DevOps, on the other hand, focuses on the collaboration and integration between software development (Dev) and IT operations (Ops). DevOps aims to streamline and automate the software delivery and deployment processes, fostering continuous integration, continuous delivery (CI/CD), and rapid and reliable software releases.
  • Team Structure and Collaboration.
      • Agile methodologies promote cross-functional, self-organizing teams with members from different disciplines (e.g., developers, testers, business analysts). The team collaborates closely with the customer or product owner, and there is a strong emphasis on face-to-face communication and teamwork.
      • DevOps emphasizes breaking down silos between development and operations teams. It encourages collaboration, shared goals, and a holistic view of the software lifecycle. DevOps teams often include members with diverse skills, including developers, system administrators, and quality assurance engineers.
  • Processes and Practices.
      • Agile methodologies typically follow a prescribed set of practices, such as iterative development, user stories, frequent feedback, and continuous integration. Agile teams often use frameworks like Scrum or XP to structure their work and ceremonies like sprint planning, daily stand-ups, and retrospectives to ensure collaboration and continuous improvement.
      • DevOps focuses on automating and streamlining processes across the entire software delivery pipeline. It emphasizes practices like continuous integration, continuous delivery/deployment, infrastructure as code (IaC), configuration management, and automated testing. DevOps teams use tools and technologies to enable fast and reliable software releases. A minimal pipeline sketch follows this list.
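Concrete pipelines are normally declared in a CI system’s own configuration format, but the underlying idea can be sketched in a few lines of Python. The stage names and commands below are illustrative assumptions, not a real project’s setup; the point is the fail-fast chaining of build, test, and deploy that CI/CD relies on.

```python
# Hypothetical sketch of a CI/CD pipeline as a fail-fast sequence of stages.
# Real pipelines live in a CI system's config; commands here are assumptions.
import subprocess
import sys

STAGES = [
    ("build",  ["python", "-m", "compileall", "src"]),
    ("test",   ["python", "-m", "pytest", "tests"]),
    ("deploy", ["bash", "scripts/deploy.sh"]),
]

def run_pipeline() -> int:
    for name, command in STAGES:
        print(f"--- stage: {name} ---")
        result = subprocess.run(command)
        if result.returncode != 0:
            # Fail fast: a broken build or a failing test never reaches deploy.
            print(f"stage '{name}' failed; stopping pipeline")
            return result.returncode
    print("pipeline succeeded")
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())
```

The same structure, expressed declaratively, is what tools like Jenkins, GitLab CI, or GitHub Actions execute on every commit.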

What conclusions can therefore be drawn regarding DevOps processes? Here are some suggestions:

  • Focus on technology. DevOps primarily focuses on leveraging tools and automation to enhance delivery speed and software quality. In that sense, it does not radically alter how software processes are implemented but instead augments the team’s delivery capabilities. While teams using Waterfall might still leverage automation and monitoring tools, DevOps works best with Agile.
  • Waste elimination. The tools and infrastructure required to support CI/CD pipelines come at a cost, and we must now talk about lean pipelines if DevOps is to remain efficient. As with TDD, maintaining DevOps tools and infrastructure can end up costing more than the manual effort to build, test, and deploy a feature. DevOps engineers, however, are becoming more familiar with the tools, and best practices have emerged from their experience, keeping costs within reasonable bounds.
  • Development and Operations. The paradigm shift brought about by DevOps was in providing the framework and tools that allow development and operations, historically separate teams with different skills and objectives, to fuse into one. This fusion can be justified on the basis that the people who build the software are best suited to run it, provided operational costs are kept low. While the additional responsibility of running the software can feel like an extra burden, it can also be viewed as an incentive for developers to build better software. The process of delivering software has changed only slightly, but the fundamental assumption of mass production (grouping people by similar skills and responsibilities) has been overturned. DevOps engineers also require technical skills beyond programming: system and database administration, networking, security, and hardware.

8.5 Kanban

In manufacturing, Kanban is a scheduling system that helps manage and control the production flow. The term “Kanban” is Japanese for “signboard” or “signal card”. It was initially developed by Taiichi Ohno and his team at Toyota as part of the Toyota Production System (TPS), also known as Just-In-Time (JIT) manufacturing.

The Kanban system uses visual cues, typically cards or containers, to represent the status and movement of items within the production process. These cues signal when to produce or replenish a particular item and how much to produce. The primary goal of Kanban is to minimize inventory levels, reduce waste, and optimize the production flow.

In software development, the Kanban methodology was adapted from manufacturing and applied as a framework for managing and improving the software development process. It emphasizes:

  • Visualizing work. The Kanban system gives software teams a clear visual representation of their work: the Kanban board holds columns such as “To Do”, “In Progress”, and “Done”.
  • Limiting work in progress (WIP). Restricting the number of items in progress lets teams focus on completing work before starting new tasks, which improves efficiency and reduces multitasking. A minimal sketch of a WIP limit follows this list.
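The WIP limit can be illustrated with a minimal, hypothetical sketch; the class, column names, and cards below are our own assumptions. A pull is refused when the “In Progress” column is full, which is exactly what forces a team to finish work before starting more.

```python
# Hypothetical sketch of a Kanban board enforcing a work-in-progress limit:
# a card is pulled into "In Progress" only if the column has free capacity.

class KanbanBoard:
    def __init__(self, wip_limit: int):
        self.wip_limit = wip_limit
        self.todo: list[str] = []
        self.in_progress: list[str] = []
        self.done: list[str] = []

    def pull(self) -> bool:
        """Pull the next card into In Progress, unless the WIP limit is hit."""
        if not self.todo or len(self.in_progress) >= self.wip_limit:
            return False
        self.in_progress.append(self.todo.pop(0))
        return True

    def finish(self, card: str) -> None:
        """Move a card from In Progress to Done, freeing capacity."""
        self.in_progress.remove(card)
        self.done.append(card)

board = KanbanBoard(wip_limit=2)
board.todo = ["login page", "search API", "billing fix"]
board.pull(); board.pull()
print(board.pull())          # False: the WIP limit of 2 is reached
board.finish("login page")
print(board.pull())          # True: capacity was freed, the next card is pulled
```

Note that work is pulled by the team when capacity frees up, never pushed onto it; this is the push-to-pull switch discussed below.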

It is helpful to compare Kanban and Scrum to understand how each is different. Here are the main differences between them:

  • Workflow management and timeboxing. Kanban focuses on visualizing and optimizing the flow of work items through the development process. It provides a flexible, continuous flow of work with no predefined iterations. As we have seen earlier, Scrum divides the work into fixed-length iterations called sprints, where the team plans and commits to a set of user stories or tasks for each sprint.
  • Planning and commitments. Kanban has no specific planning phase or commitment to a fixed set of work items for a given period, which can make stakeholder expectations harder to manage. Instead, the team pulls work items from a backlog based on available capacity and prioritization. During sprint planning in Scrum, the team commits to delivering a specific set of user stories or tasks within the sprint duration.
  • Roles and responsibilities. Scrum has well-defined roles, including the Product Owner, Scrum Master, and Development Team. The Product Owner prioritizes the backlog, the Scrum Master facilitates the Scrum process, and the Development Team is responsible for delivering the product increment. In Kanban, no specific roles are defined, and in most cases, a Scrumban type of approach is used rather than pure Kanban.

Kanban is probably best described as a technique for managing work-in-progress and visualizing work rather than a software development methodology. Its essential contribution is getting Agile teams to switch from a push to a pull system without necessarily implementing Scrum or other Agile methods.

9. Over-optimisation, Resilience, and Fragility

Management that focuses excessively on industrial engineering techniques of process optimisation often runs into problems. Severe optimisation of production processes leads to fragility due to several factors:

  • Lack of redundancy. Over-optimized systems are brittle and fragile, breaking catastrophically when the load exceeds a previously unknown, emergent threshold. While optimisation can increase productivity in the short term, it often results in a lack of redundancy. Redundancy means having backup systems, resources, or processes to handle unexpected failures. When a highly optimized system encounters a problem or bottleneck, it may lack the flexibility to adapt or recover, leading to a breakdown in the entire production chain. A lack of redundancy also means single points of failure.
  • Inability to respond to change. Severe optimization prioritises efficiency over adaptability, a vital characteristic for systems trying to survive in dynamic and hostile environments. A highly optimized process is designed to operate within specific parameters, and any deviation from those parameters can cause the system to break. Over-optimized systems also reduce variation in methods through excessive standardisation, narrowing the diversity of opinions and ideas. Moreover, excessive standardisation prevents the system from discovering new ways of doing things through serendipity.
  • Silo creation. Traditional mass-production thinking requires people to be grouped based on skills, roles, or responsibilities, allowing cultural and knowledge silos to emerge. While silos allow for specialisation and growth, they also create communication barriers between teams. Cross-functional teams, while less efficient in resource utilisation, are better suited to quick decision-making and faster progress.

Software is a fast-paced, rapidly evolving field, and adaptability is key. While structured approaches are desirable, excessive constraints on a complex system that is naturally uncontainable make it fragile, brittle, and unable to respond to change.

10. Process Performance: When Metrics Become Objectives

It is not uncommon for managers and teams to confuse the metrics they are using to measure performance with performance itself. Under these conditions, the metric and the objective become disassociated, and an improvement in the former might even be at the expense of the latter. Let’s see how.

  • Narrow focus. When metrics become objectives, there is a risk of becoming narrowly focused on optimizing those metrics at the expense of other essential factors. Organizations start neglecting aspects that are difficult to measure but are critical for long-term success, such as customer satisfaction, employee well-being, or ethical considerations.
  • Gaming the system. When the focus is solely on KPIs, people are incentivised to manipulate or “game” the metrics to make them appear favourable. This can lead to behaviours that optimize the metrics superficially but do not truly improve the underlying processes or outcomes. For example, if standardized test scores become the sole objective in the education sector, teachers may teach to the test rather than provide a comprehensive education.
  • Unintended consequences. Any intervention in a complex system will lead to unintended consequences, and focusing on metrics rather than nudging the system in a desired direction will create unforeseen team dynamics. The result is undesirable behaviour the metric does not capture. For example, a company solely focusing on maximizing short-term profits may sacrifice long-term sustainability or damage its reputation.

Metrics can be compared to body temperature; lowering body temperature with paracetamol during a fever is good, but it doesn’t mean the infection is gone.

11. Conclusion

We started this article with a generic definition of processes and why it’s vital to have a structured approach to doing things, especially in large groups and complex situations. We then presented some ideas from Toyota, which, although very specific to manufacturing, have heavily influenced software methods like Scrum and eXtreme Programming (XP).

The conclusions that one can derive after such an examination are as follows:

  • Human systems are not machines. Discussing processes and performance must include employee motivation, skills, knowledge, compliance, commitment, and team structure. Human systems are complex and cannot be reduced to mechanical ones without a significant loss of explanatory power.
  • The manufacturing model. Although many fundamental assumptions of Lean, Scrum, and Six Sigma are rooted in manufacturing, especially Toyota, software development processes have taken a different form. This can be seen through new roles (Product Owner, Scrum Master, DevOps engineers), new tools and methods (iterations, user stories, backlogs), and new structures (cross-functional, self-organising teams).
  • Scaling Agile processes. While examining Kanban, Scrum, DevOps, and eXtreme Programming, we noticed that none of these methods is sufficient or universal on its own; each must be combined with others to form a viable methodology. Scaling Agile happens not through Scrum of Scrums or SAFe but by decomposing Agile methodologies into their lowest coherent units and recombining them in novel ways. The Hybrid model, which combines Waterfall with Agile techniques, is an excellent example.

As Taiichi Ohno emphasized, success cannot come from analyzing reports and second-hand information but from observing a process in the Gemba through an experienced eye. We hope this (long) guide on software development processes has given both managers and developers an intimate knowledge of what’s required to set up the right processes for their development teams.

12. References

  • Nigel Thurlow – Have we forgotten our key denominator: FLOW? – Lean and Agile Summit 2022
  • Jeff Sutherland – Scrum talk at Google
