Operational Excellence at Scale: The Challenges of Efficient Delivery Chains in Large IT Enterprises

I. Case Study: Company Z

Company Z is a (hypothetical) global supplier of drilling equipment in a niche industry. Its value proposition lies in its ability to offer enterprise clients end-to-end solutions. Over many years, Company Z became the uncontested provider of top-quality, cutting-edge drilling solutions refined by decades of experience in the field.

To further improve its value proposition, Company Z offers drilling services for its customers, shielding them from operational costs and the overhead of maintaining a specialist drilling team. Its clients, specializing in extracting oil or drilling wells, can then focus on their core businesses.

For a long time, business as usual in Company Z was stable. Its customer base grew steadily, and its number of employees followed suit. Although its engineering and production processes were state-of-the-art, combining optimal research methods with the most sophisticated tools, the performance of its project and delivery department steadily declined.

In fact, projects were constantly delayed, and customers and employees were equally frustrated by the cumbersome and slow progress. No project was completed on time or on budget, and management struggled to keep its strategic customers satisfied.

Achilles, the department’s head, veteran leader, and one of its earliest engineers, had long been concerned about its deteriorating performance. Finally, he decided it was time for action and gathered his top executives in a large auditorium. A single item was on the agenda: restructuring his department and its processes to meet the challenges of the present situation.

Interlude

The Delivery Manager’s Speech

Ladies and gentlemen, thank you for joining this conference on an urgent matter.

During the past five years, I have worked closely with most of you on three strategic initiatives (and countless small ones), all of which achieved suboptimal outcomes. Two of those strategic projects ran over their schedule and budget by at least 200%. The third had to be abandoned at an immense cost to the company. But that’s not what bothered me most. What bothered me most was our inability to understand the fundamental reasons for this failure and what to do about it.

In our retrospectives (the department is a firm believer in Agile at Scale and pioneered its implementation within its teams), the PMO squarely blamed the engineering teams, the engineering teams blamed the research team, and the latter blamed the customers and the engineering teams. The customers didn’t care much about the specifics; in their eyes, our entire delivery department was responsible.

Our continuous improvement attempts got nowhere. Every process reengineering exercise we did fixed one issue while creating another. The principal aim of any change was to make it harder for other teams to blame the team that made the changes, and not really to address the issue.

This situation must end immediately, and this is how it will happen. I am going to make a random selection from the audience to form the basis of a task force. The goal of this task force is to generate a number of alternative solutions to fix our delivery system. These solutions can be anything, but they must have the following properties:

  • Rationality: Proposals should be sound, relying on theoretical principles, practical wisdom, and common sense, and not on the latest fashionable trends.
  • Completeness: They must address the problem end-to-end, leaving no open questions. They should cover organisational structure, processes, methodologies and principles.
  • Feasibility: The solutions may only require additional resources within reasonable boundaries. Also, their implementation must not (significantly) disrupt the business.

Achilles then selected seven people from the audience and nominated them as the official task force responsible for revamping the delivery system. The ultimate objective was to achieve Operational Excellence at the scale of the organisation and not just within individual teams.

II. The Task Force’s Plan

A. Finding the Right Problem-Solving Methodology

The task force met regularly during the following weeks. In its first meeting, it was agreed that Ingrid would be the best person at the helm. She had sound technical knowledge, solid expertise with the company’s products and services, excellent communication skills, and an in-depth understanding of the group’s culture. In fact, she had participated in two of the strategic projects that Achilles referred to, experiencing the frustrations first-hand.

Ingrid had a complex problem to solve, so she decided to start by setting some basic principles on which a more elaborate solution could be built. She queried the team, and the following principles were adopted.

ORGANISATIONAL RESTRUCTURES

Guiding Principles at Company Z

  • Any solution must look at redesigning the engineering and production system so that the problems disappear, instead of eliminating the root causes individually.
  • The new solution must allow the different teams within the department to have the final word on how they organize their work, provided it fits into the overall solution.

The two guiding principles are based on the following concepts.

B. Solving Problems Through Design

Ingrid understood that the traditional (analytical) approach to problem-solving involved breaking a complex problem into its elementary components, understanding each element independently, and assigning the respective element to a specialist. Applying analytical problem-solving would violate her first guideline: that the system must be redesigned so that problems disappear rather than eliminating root causes individually.

On her way home, Ingrid passed the Toyota showroom in her neighbourhood and suddenly remembered the Toyota Sienna story she had read in The Toyota Way. The engineers back then had grappled with a similar problem and solved it in a distinctive but brilliant way. She thought she could apply the same method to her situation.

Interlude

The Toyota Sienna Redesign

Before 2004, Toyota struggled to compete with other manufacturers in the North American minivan market. The Toyota Sienna minivan was designed for European and Japanese roads and performed poorly in North America. A complete redesign was required to give the Sienna its competitive edge.

Toyota’s managers and chief engineers could have relied on marketing, maintenance, or any other secondhand reports to find the faulty design components and address them. Instead, the chief engineer responsible for the design, Yuji Yokoya, requested to be allowed to drive the Toyota himself in Canada, the US, and Mexico. During his trips, he discovered a number of situations where the minivan performed poorly (high-wind areas, narrow streets, etc.) and redesigned the car to fit these conditions.

A more traditional method of dealing with such a problem would have been to assign it to the marketing or sales team, i.e. where the problem originated. These teams would then look at it as a marketing or sales problem and attempt to address it as such. An improved sales or marketing strategy would not have made the Sienna a better performer.

The traditional problem-solving method consists of finding the root cause and removing it. A better approach is to redesign the system so that the problem goes away.

C. Autonomy and Self-Organization

Ingrid strongly believed in self-organisation as opposed to micromanagement from the top. Her reasoning was as follows.

At the turn of the previous century, after the Industrial Revolution became a global phenomenon, industrialists realized their primary shortages lay in skilled workers. Before that, in the 1850s, labour consisted of repetitive tasks that people could easily do once they had been adequately trained.

As machines and processes became more complicated, the average employee had to grapple with situations that required advanced analytical thinking and technical skills. Still, during those times, business owners and senior managers could, given enough time and exposure, understand the intricate details of the employee’s job. This allowed them to instruct and supervise workers closely, and independent judgment on the latter’s behalf was not essential.

Fast-forward to the modern digital age, where specialization is so narrow that even direct line managers are often unable to assist their subordinates. In today’s organisations, employees enjoy a high degree of autonomy and are expected to exercise sound technical and business judgement in return. For a group to perform, it must first self-organize.

All that management can do in this case is encourage informal networks between silos, independent thinking among employees, and self-organisation within boundaries.

III. Preparing the Report

A. Problem Formulation

With these two principles firmly established, the task force moved to the most challenging task: formulating the problem. In fact, Achilles, the delivery manager, had already described the issue at hand in his famous speech: project delays, overruns, and unsatisfied customers. What Ingrid’s team had to figure out was the root cause(s) behind these facts.

B. The Fact-Finding Mission

As their next step, the task force decided to interview key people in the department and listen to their narratives. The goal of the interviews was to gather information on the following topics:

  • Product and services landscape:
      • What products are mission-critical, and what dependencies exist between them?
      • What engineering and production methodologies do the teams use?
      • What aspects of the environment in which these projects are run could have any bearing on the problem?
      • What considerations go into the planning, testing, and deployment of a product or service?
  • Project management:
      • How are projects planned, efforts estimated, resources allocated, and timelines created?
      • How is progress tracked and monitored? What change management process is in place?
      • How are competing priorities resolved?

With these considerations in mind, Ingrid started her interviews. Soon enough, a wealth of data sat on her laptop, data that she and her team could begin to interpret and analyse, looking for insights.

C. Report Executive Summary

Essentially, Ingrid was looking for sources of friction, high cognitive load contexts, poor governance, and priority conflicts, all of which were plentiful.

She submitted her findings to the delivery manager in a concise report on February 28, 2023, covering the following three areas:

  • Section 1: Current State of Affairs. This section included a description of the product and services landscape and project management processes.
  • Section 2: Insights. This section contained insights gleaned from the data and covered four aspects: friction, high cognitive load contexts, priority conflicts, and project governance.
  • Section 3: Recommendations. This section contained a series of short-, medium-, and long-term recommendations to higher management.

IV. Current State of Affairs

The first section of the report was dedicated to understanding the product and services landscape and the release and project management processes.

A. Product Management

1. Product and Services Landscape

The report listed around 120 products of various natures and sizes. A dozen of these formed the backbone of the drilling systems, delivering 80% of the business value. The rest were auxiliary machines supporting the main ones. These included cooling, monitoring, transport, maintenance, sensing and data collection systems.

Company Z’s services portfolio included consulting, maintenance, and project management and execution. The organisation’s motto was “No project or client is too small.” This strategy served the business well for a long time, building trust and demonstrating reliability towards its customers.

2. Engineering Teams

The various engineering teams had very little in common. Each team had its processes for planning, development, quality assurance, operations, and support, which were formed and customized over many decades of work in the field.

Some teams used variations of Total Quality Management (TQM) and the Toyota Production System (TPS), while others relied on production methods anchored on Six Sigma. Naturally, some were more advanced than others. The teams varied in size, culture, and organization, and belonged to different divisions of the organization.

B. Project Management

1. Competing Projects

Ingrid already knew how projects were managed and was not surprised by the findings. Every unit ran internal and external projects. Internal projects covered product improvement, maintenance, and compliance, while external projects covered new products from a shared roadmap. A PMO oversaw all external projects, while the teams managed internal ones.

2. Scheduling and Coordination

The diagram shows a product development schedule with start and end dates for the different components, A, B, C, and D.

The diagram shows an example of the development schedule of components A, B, C, and D required for machine M. Notice how each team starts development at different times. For instance, while B kicks off in June, D is delayed until July. Furthermore, neither the start date nor the duration is fixed or predictable with enough accuracy, making it highly challenging for project managers to commit to an ETA.
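The difficulty of committing to an ETA can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the start windows and durations for components A to D are hypothetical numbers (the report gives no such figures), each is drawn from a uniform distribution, and machine M is assumed to be ready only when its last component is finished.

```python
import random
from statistics import quantiles

# Hypothetical inputs, not taken from the report:
# (earliest start, latest start, shortest duration, longest duration),
# all expressed in days from the project kick-off.
COMPONENTS = {
    "A": (0, 10, 30, 60),
    "B": (30, 45, 20, 50),
    "C": (15, 40, 40, 70),
    "D": (60, 90, 25, 45),
}


def simulate_completion(components, trials=10_000, seed=42):
    """Return the sorted simulated completion days for machine M."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    results = []
    for _ in range(trials):
        finish_days = []
        for earliest, latest, shortest, longest in components.values():
            start = rng.uniform(earliest, latest)
            duration = rng.uniform(shortest, longest)
            finish_days.append(start + duration)
        # Assumption: M is complete only when the slowest component is done.
        results.append(max(finish_days))
    return sorted(results)


def eta_range(results):
    """Return the (median, 90th percentile) completion day."""
    deciles = quantiles(results, n=10)  # nine cut points: 10% to 90%
    return deciles[4], deciles[8]


results = simulate_completion(COMPONENTS)
median, p90 = eta_range(results)
print(f"Median completion: day {median:.0f}; 90th percentile: day {p90:.0f}")
```

Even with only four components, the gap between the median and the 90th percentile shows why a single committed date is fragile: the ETA is driven by whichever component happens to finish last, and that component can change from one run to the next.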

3. Roadmap and Prioritization

Since multiple projects were constantly running in parallel, a prioritization committee was created that included major business stakeholders, project managers, and principal engineers. Business stakeholders and project managers continually pushed for commitment, while engineering teams equally insisted on better prioritization, more precise requirements, and a (practically) blank cheque for the development effort. Both sides showed little flexibility.

V. Analysis Outcome

After prolonged discussions, the task force agreed on four principal axes on which analysis would be directed. These were:

  • Friction between hierarchy levels and between teams
  • High cognitive load settings
  • Priority conflicts
  • Poor governance

A. Friction

Half the staff on Ingrid’s task force had been with Company Z for over ten years and knew the organisation and its people very well. Having rotated between different product teams and held various roles, they were pretty familiar with the peculiarities of the organisational culture and subcultures of each group. To illustrate some of the sources of friction, the report described three distinct groups as follows:

  • The R&D group exhibited a robust, technically oriented culture, with a minority of influential figures dominating the decision-making process. The team members shared a long history and had traditionally valued loyalty, dedication, and excellence in their work. This group was the hardest to deal with but produced outstanding quality deliveries.
  • One maintenance group showed only passive interest in anything other than the day-to-day challenges. Engagement and motivation wavered constantly, and the group's output fluctuated accordingly. On the other hand, they cooperated very well with other teams, jumping in to help whenever needed. A hypothesis was put forward to explain the group’s behaviour: since most of the team were contractors hired for 12-18 months, the group never had the chance to form a strong culture or prioritise long-term goals over short-term ones, particularly since their contract renewals relied heavily on how well they delivered the tasks at hand.
  • A third engineering team looked after one of the less critical products. However, due to their product’s central role in the system, they were constantly pulled in various directions. Appreciating the high dependency that other teams had on them, they consistently used that as leverage when negotiations over priorities got tough. Overall, they prioritized actions that allowed them to avoid mistakes (and consequently blame) rather than swift, high-quality deliveries. The incentives and reward system that their management created reinforced this pattern of behaviour.

Conclusion

Report Summary on Friction

The report concluded that while internal team cohesion was good in most groups, intergroup feuds were alarmingly high, preventing the organisation from functioning as a well-oiled machine.

Instead, friction turned into heated debates, which in turn turned into conflicts. Most teams stayed on their own and avoided interdepartmental initiatives that required them to collaborate with other groups, hampering efforts for continuous improvement.

Furthermore, disagreements between the different levels of the hierarchy often manifested themselves through passive interactions. Senior management created policies and guidelines that were subsequently ignored, either fully or partially, by the lower echelons.

B. High Cognitive Load Settings

Ingrid divided the staff into two categories: the generalists and the specialists. The generalists included all staff whose jobs required an understanding of all products or services, working with all teams, and interacting with the upper and lower levels of the hierarchy. The specialists’ group included staff who focused on one product or engineering domain at a time and interfaced only with their department head.

Ingrid’s categorisation was as follows:

Generalists:

  • Senior managers
  • Project managers
  • Product managers
  • Safety and compliance officers
  • Sales teams

Specialists:

  • Engineering
  • Research and development
  • Quality control engineers
  • Production managers
  • Technicians

Conclusion

Report Summary on High Cognitive Load Situations

The report concluded that generalists were in a disadvantageous situation.

First, they had to work with a far larger number of people and products than their specialist counterparts. Second, the number of moving parts they had to keep track of was remarkably high.

As such, the generalists felt the weight of the organisation’s scale more directly.

C. Priority Conflicts

The task force found that product development teams had to contend with competing priorities daily. The ultimate decision on what should be developed, and when, was made internally within the development teams, with very little influence from business stakeholders or managers on how things progressed.

Furthermore, communication channels between development teams and outside stakeholders were poor and unreliable, effectively nullifying the latter’s ability to manage dependencies on their end. Non-existent inter-team communications exacerbated the situation further when dependencies between production jobs had to be coordinated and managed. As a result, delays were poorly managed and communicated, and stakeholders were unable to mitigate the risks in time.

Conclusion

Report Summary on Conflicting Priorities

The report concluded that priority conflicts were a major source of miscommunication and apprehension between the teams.

The report found that a project manager was only aware of a delay if they were actively notified (which rarely happened) or when the due date had passed without a successful delivery.

As a result, project management was reduced to a constant and vigorous chasing game, with project managers chasing developers and business stakeholders chasing project managers.

D. Project Governance

Project managers relied on various tools to track progress. Some used spreadsheets, others used dashboards, and the rest captured important data in slide decks. Business stakeholders were at a loss trying to digest the various forms of (sometimes conflicting) data. Either way, the data never kept up with reality. By the time the data was aggregated and shown to senior management, it was already outdated.

Project meetings were primarily ceremonial. The various teams provided updates in vague formats, which the project managers, who knew very little about the realities on the ground, could not assess or challenge. They had to take the teams’ input at face value and hope that issues would auto-resolve.

Conclusion

Report Summary on Project Governance

The report concluded that the PMO faced serious challenges in effectively governing project schedules and budgets. By the time information was received, it was already outdated, rendering active PMO participation in decision-making, control, and coordination virtually impossible.

As a result, the PMO was struggling to catch up with events, unable to exert much influence on project direction or expenses.

VI. Solution Outline and Recommendations

The task force compiled its report and presented it to Achilles and his key staff. Some people were surprised by the findings, but for many, the report only crystallized their intuitions. The four main themes around which the report revolved (friction, high cognitive load contexts, priority conflicts, and project governance) resonated with the majority. Everybody looked forward to the recommendations and solutions.

The report’s recommendations came as follows:

  • Sources of friction had to be dismantled without disrupting the business or causing significant anxiety among the staff. Restructuring efforts had to be implemented with the objective of achieving flow.
  • The cognitive load had to be reduced, and there was only one way to achieve that: reducing the context in which people had to operate and increasing specialization in teams and individuals. The cost of this approach would be increased siloing, and ways around it had to be found.
  • Priority conflicts had to be eliminated. Two approaches were suggested. The first was to allow teams to prioritize their work as they saw fit, resolving conflicts internally. The second was to mandate one and only one priority list, managed jointly by external and internal stakeholders. Both alternatives had their pros and cons, and no clear winner emerged.
  • Project management and governance were particularly challenging. Smarter (yet simpler) ways had to be found to track progress and avoid the fallacies of old-school planning methods. Lessons from mega-project management would need to be adapted and tested in the field.

VII. Final Words

The problem of scale, whether in manufacturing, software development, or services, is genuine and has made itself felt heavily in these industries. How does one coordinate the activities of hundreds and thousands of technicians or developers, each specialized in a narrow area of technology, applications, or business? How can managers “manage” in such settings when management is nothing but instant decision-making based on irremediably insufficient (but current) data? Despite what some experts say, no evident solutions exist.

While all the details of Company Z are hypothetical, its main storyline and principal details are inspired by actual events occurring daily at one tech company or another. The challenges outlined are only a small sample from a much larger pool.

If there had to be one key takeaway from this story, it would be the following.

Critical to the success of any solution is its context-specificity. This means that solutions must take into consideration A) the unique circumstances in which the problems arose, B) the system of processes that allowed them to emerge, and C) the human behavioural patterns under which they flourished.
