Notes

Cloud Native Transformation

Study notes are captured below for personal use. Please refer to https://learning.oreilly.com/library/view/cloud-native-transformation/9781492048893/ or https://www.amazon.com/Cloud-Native-Transformation-Practical-Innovation/dp/1492048909 to purchase this book.

1. What Is Cloud Native?

Though the terms are often confused, cloud computing and cloud native are two entirely separate entities.

Cloud native is an architecture for assembling all of the cloud-based components in a way that is optimized for the cloud environment. It’s not about the servers, but the services. So cloud native is also an organizational destination: the current goal for enterprises looking to modernize their infrastructure and process, and even organizational culture, carefully choosing the cloud technologies that best fit their specific case (at least, the goal for now—eventually, even quite soon, cloud native will be replaced by another paradigm that once again completely changes our way of doing things).

For enterprises ready to undertake their own cloud migration, staying on track means focusing on the architecture: understanding and prioritizing design before jumping into full-on implementation and deployment.

We, however, believe cloud native is actually about adopting five architectural principles (which is hard) plus two cultural ones (which is even harder):

Containerization
Dynamic management
Microservices
Automation
Orchestration

The two cultural principles are:

Delegation
Dynamic strategy

Ultimately, cloud native is about how we create and deliver, not where.

Cloud native works well with fast, modern software delivery methods like continuous delivery to provide faster time to value; it scales horizontally and effortlessly; and it can be very efficient to operate. Cloud native architecture also allows us to create complex systems by dividing them into smaller components built by independent teams. This differs from traditional monolithic application complexity, which is limited by the ability of the developers and architects to fully understand it even as it chains them together to deliver in unison.

Most importantly, cloud native can help reduce risk in a new way: going fast but small, limiting the blast radius in case changes ever go wrong, and rolling them back instantly if they do. So how and where do we start building?

The five (technical) principles, constructed in the proper order, are all essential supports in a cloud native architecture. One, however, may be even more important than all the others: microservices. Microservices occupy a central role among the five principles. In order to get microservices right, you must have a mature approach to all four of the other principles. At the same time, containers, dynamic management, automation, and orchestration are truly powerful only when combined with microservices architecture.

The cloud now seems to be disrupting every industry it touches, and companies are understandably eager to migrate operations and embrace cloud native as fast as they can.

What can possibly go wrong? Well, rather a lot of things, actually. But the reasons behind cloud native transformations that go wrong tend to fall into one of three main categories:

Difficulties due to the complexity of distributed systems
The relative immaturity of the cloud native ecosystem with its Wild West landscape of tools and platforms
The failure to adapt and evolve organizational culture to keep pace with changing technologies and delivery expectations

To fulfill these expectations and keep the users happy, we have evolved distributed systems to manage the inevitable fluctuations and failures of the complex, behind-the-scenes services required to run it all. Many of cloud native’s superpowers are granted by the sheer merit of its distributed-systems architecture.

All these benefits, though, come with a significant side effect: complexity. When distributed systems become complex, then design, construction, and debugging all get much, much harder.

Because you believed your vendor that their platform was a full solution, you haven’t allocated people or budget or time to build or buy the missing or incomplete pieces. And it is highly unlikely that you have people inside your company who can handle assembling all of it anyway.

In short, to get the best from the cloud, use cloud native architecture: microservices, containers, orchestration, automation. Since the first three may introduce new problems, automation should be among the very first things put in place—you want the sanitation system laid down before you build your city in the clouds.

Cloud native is a powerful, promising (and, yes, much-hyped) technology. Enterprises are understandably eager to get there as fast as they can. But reaping the full benefit of the cloud means first looking past the hype to build a solid foundation—one based upon the principles of cloud native architecture. And, meanwhile, evolving your organization’s culture, its structure and processes, to an equally new way of working.

2. The Human Challenge of Cloud Native

Cloud native is more than just a technology or tool set. It is a philosophical approach for building applications that take full advantage of cloud computing. This new paradigm requires not only embracing new technology but also a new way of working.

We call it a cloud native transformation because, in order to make a successful, effective, and above all worthwhile migration to the cloud, the entire organization—not just the tech stack—must change.

Quite simply, your organization’s culture needs to change along with the technology. If it doesn’t, you are very likely about to waste a lot of time and money on an ultimately unproductive transformation.

Cloud native organizations are built to take optimum advantage of functioning in cloud technologies; the cloud of course will continue to evolve and will look quite different in the future, but we also build to anticipate this. Applications are built and deployed in a rapid cadence by small, dedicated feature teams made up of developers who also know how to build in networking, security, and all other necessities so all parts of the distributed system become part of the application. Meanwhile a platform team sits off to one side, designing and assembling the platform the devs will deploy to. This highly automated platform uses an orchestrator like Kubernetes to manage the complexity.

However, taking maximum advantage of cloud technology means moving to approaches like minimum viable product (MVP) development, multivariate testing, abandoning specialist teams for DevOps, and embracing not just rapid iteration but rapid delivery/deployment (i.e., continuous integration and continuous delivery, even continuous deployment).

Cloud native architecture centers on microservices architecture (i.e., de-composing applications into small, loosely coupled services that operate completely independently from one another). Architecture then influences process: each of these services maps to smaller, independent development teams that release iterative software updates as soon as they are ready.

So, then, what matters is not culture itself, but the type of culture. So when a “right” solution (or, sometimes, a mishmash combination of solutions) gets applied in the “wrong” culture, the solution and the culture conflict, undermine, and ultimately gridlock each other.

Trying to do cloud native by using methods from your previous paradigm will wreck your initiative. This is in fact the root cause of most of the problems we find when we get called into a company to rescue a stalled cloud native transformation.

Technology itself doesn’t deliver velocity, though it can certainly help. Changing your culture—the way you work every day—is how an organization gains true velocity.

So you can see how in some ways the cultural differences between Waterfall and Agile are actually compatibilities, rather than true differences! Working in cloud native, however, requires a completely new and different culture.

The most common culture problem we find when we are called in to help save an attempted cloud native migration gone wrong is where a company has tried to add a single element of cloud native while not changing anything else.

There is nothing to be gained in simply re-creating a monolith on the cloud, yet companies try to do it all the time.

Knowing your culture means being able to choose the path that best fits your organization…or not choosing some hot new tech that, however promising, conflicts with your organization’s fundamental nature.

3. What’s the Pattern? Architecture, Pattern Languages, and Design

Cloud native can be deceptively easy to implement at first. Anyone with a credit card can log onto a public cloud provider and have an initial instance up and running in a matter of hours.

This unfortunately creates the hopeful illusion that full migration to cloud native will be equally easy: just install a few tools and go live! However, implementing full-scale enterprise cloud native is in fact very difficult due to the complexity of distributed systems, which increases exponentially with scale.

This next step, after the easy first experiment, is in fact where cloud native migrations usually go wrong. This is a very new technology, and there simply is not enough knowledge within most organizations to successfully navigate cloud native’s complexities.

The most enduring and valuable thing we can take from a transformation is the ability to change and adapt to new circumstances, whatever they are.

This is how we de-risk change: by building continual evolution and learning into the architecture itself. If you do it all the time, suddenly change is no longer scary.

A design that ignores context will almost certainly be a painful one to deliver and difficult to live with—if it works at all.

What kind of contexts should we consider when making software design choices? There are a lot! For example:

The existing skills of your teams
The time frame and goals of your project
The internal political situation (how much buy-in exists for the project)
Budgets
Legacy products and tools
Existing infrastructure
Emotional or commercial tie-in to vendors or products
Ongoing maintenance preference

This is why it is vitally important that, before selecting which cloud native patterns to implement, enterprises first understand the current context for their organizational needs and aspirations as well as identifying the proper target context.

It is important to note that not all cloud native contexts are concerned with technology. Migrations are not just about software; psychological and social forces within the organization heavily influence success or failure as well. The context of an organization’s management process, team structure, and internal culture must be assessed before assigning patterns.

No matter the business model, fast innovation has become essential for survival in a marketplace that is becoming increasingly global yet customized.

Simply put, a high level of customized services, delivered with little or no downtime, is what customers now expect. This means that, for a business, velocity, time to market, and ease of innovation are more important than cost saving.

Given that these enterprises, no matter what their core business, all originate from similar circumstances and are driven by similar needs, it is not surprising that they face similar difficulties:

Decisions are typically made according to existing practices. This is appropriate for stable tech ecosystems like virtual machines. In cloud native, though, currently there are few established practices.
In a traditional hierarchy, the top managers decide, but they don’t fully understand the complexity of building and maintaining distributed systems—and therefore allocate insufficient resources.
For the next three to five years, cloud native will still require a lot of investment due to the technology’s relative immaturity.
Dealing with immature tech requires more experimentation and research than straightforward project management typically allows.
Large enterprises are optimized to preserve the status quo and embrace change slowly and reluctantly, while cloud native requires quick changes and the ability to work in an ambiguous environment.
There is not enough cloud native knowledge in enterprises, or indeed in the current tech sector overall, to support effective widespread migration. Companies don’t even know how much they do not know.

4. Beyond Patterns: Behavior, Biases, and Managing Evolution

Unfortunately, it does happen. The cloud native equivalent occurs distressingly often when companies attempt to transition to the cloud. We have observed over and over that, ironically, these migrations can contain a great deal of internal resistance to the very changes being sought. As a result, companies rebuild their operations and infrastructure in the cloud but then start trying to build cloud native applications exactly the same way they’ve built software for the past 10 or even 20 years.

These human-centered changes are the areas we most often see causing the greatest problems for a company undertaking a cloud native migration.

Going cloud native requires giving up those old familiar ways no matter how successful they have been for us previously. For many organizations, this means no longer operating as a hierarchy while they construct monolithic applications.

This is a transformational change due to the distributed nature of cloud native, which is based on an architecture of small, loosely coupled modular components, and it leads directly to more decentralized organizations.

Trouble arises when the brain perceives problems to be simpler than they actually are. System 1 thinks, “I can handle this!”—even though it actually can’t—and we end up making a bad decision or reaching an erroneous conclusion.

Even more important, however, is that transforming into a cloud native organization requires getting comfortable with uncertainty and change.

For example, ambiguity provokes anxiety, which in turn leads us down a well-worn path to many different biases. Knowing this, in the cloud native context we can counteract with an abundance of information and small, manageable experiments to build certainty and familiarity while reducing anxiety.

We have identified the 24 cognitive biases that, in our experience, most commonly show up during cloud migration projects. Most of these biases fall into the category of decision-making, belief, and behavioral biases, with a couple of social biases thrown in. We have also included any related nudges that can flip a bias from being a problem to being a force for positive change.

Ambiguity effect: The tendency to avoid options for which missing information makes the probability of the outcome seem “unknown.” An example of ambiguity effect is that most people would choose a regular paycheck over the unknown payoff of a business venture.

Authority bias: The tendency to attribute greater accuracy to the opinion of an authority figure (unrelated to its content) and be more influenced by that opinion.

Availability heuristic: The tendency to overestimate the likelihood of events with greater “availability” in memory, which can be influenced by how recent the memories are or how unusual or emotionally charged they may be.

Bandwagon effect: The tendency to do (or believe) things because many other people do (or believe) the same thing. Related to groupthink and herd behavior.

Bystander effect: The tendency to think that others will act in an emergency situation.

Confirmation bias: The tendency to search for, interpret, focus on, and remember information in a way that confirms one’s preconceptions.

Congruence bias: The tendency to test hypotheses exclusively through direct single testing, instead of testing multiple hypotheses for possible alternatives.

Curse of knowledge: When better-informed people find it extremely difficult to think about problems from the perspective of less well-informed people.

Default effect: When given a choice between several options, the tendency is to favor the default one.

Dunning-Kruger effect: The tendency for unskilled individuals to overestimate their own knowledge/ability, and for experts to underestimate their own knowledge/ability.

Hostile attribution bias: The tendency to interpret others’ behaviors as having hostile intent, even when the behavior is ambiguous or benign.

IKEA effect: The tendency for people to place a disproportionately high value on objects that they partially assembled themselves, such as furniture from IKEA, regardless of the quality of the end result.

Illusion of control: The tendency to overestimate one’s degree of influence over other external events.

Information bias: The tendency to seek information even when it cannot affect action.

Irrational escalation (also known as sunk-cost fallacy): The phenomenon where people justify increased investment in a decision, based on the cumulative prior investment, despite new evidence suggesting that the decision was probably wrong.

Law of the instrument: An over-reliance on a familiar tool or methods, ignoring or under-valuing alternative approaches. “If all you have is a hammer, everything looks like a nail.”

Ostrich effect: Ignoring an obvious (negative) situation.

Parkinson’s law of triviality (“bikeshedding”): The tendency to give disproportionate weight to trivial issues. This bias explains why an organization may avoid specialized or complex subjects, such as the design of a nuclear reactor, and instead focus on something easy to grasp or rewarding to the average participant, such as the design of a bike shed next to the reactor.

Planning fallacy: The tendency to underestimate task completion times. Closely related to the well-traveled road effect, or underestimation of the duration taken to traverse oft-traveled routes and overestimation of the duration taken to traverse less familiar routes.

Pro-innovation bias: The tendency to have an excessive optimism toward an invention or innovation’s usefulness throughout society, while failing to recognize its limitations and weaknesses.

Pseudocertainty effect: The tendency to make risk-averse choices if the expected outcome is positive, but make risk-seeking choices to avoid negative outcomes.

Shared information bias: The tendency for group members to spend more time and energy discussing information that all members are already familiar with (i.e., shared information), and less time and energy discussing information that only some members are aware of.

Status quo bias: The tendency to like things to stay relatively the same.

Zero-risk bias: Preference for reducing a small risk to zero over a greater reduction in a larger risk.

A few of these biases are particularly hazardous to cloud native transformations. In particular we see the status quo bias operating in many client migration initiatives, especially in long-established companies.

5. Knowing Thyself: The Cloud Native Maturity Matrix Tool

So how do you go about launching a successful migration, patterns and all? The first crucial step, as we saw in Chapter 2, is Know Thyself. This means truly understanding your company’s existing architecture, processes, and organizational culture.

Cloud service providers are no help in this area. They don’t even acknowledge its existence. Why would they? Their business model centers upon getting you to sign up and use their systems, not analyzing your own existing one.

However, the complexity inherent within cloud native’s distributed systems architecture is relentlessly exponential. An “unexamined” organization will inevitably reach a point where its existing systems and culture will clash with—and, ultimately, short-circuit—its transition attempt.

The Cloud Native Maturity Matrix is a unique framework for evaluating and understanding where your company is right now.

In other words, the Maturity Matrix is how we create the custom map for each company’s unique migratory path to the cloud. And it’s also how we monitor the process to remain on track.

The Maturity Matrix covers nine different areas. For each one, identify your organization’s current status:

Culture: The way individuals in your organization interact with one another
Product/Service Design: How decisions are made within your organization about what work to do next
Team: How responsibilities, communication, and collaboration work across and between teams in your organization
Process: How your organization handles the execution of work and assigned projects
Architecture: Describes the overall structure of your technology system
Maintenance and Operations: How software is deployed and then run in a production environment in your organization
Delivery: How and when software from your development teams gets to run in your live (production) environment
Provisioning: The processes by which you create or update your systems in your live production environment
Infrastructure: The physical servers or instances that your production environment consists of—what they are, where they are, and how they are managed

But the Cloud Native Maturity Matrix does not end with a successful migration onto the cloud! As we’ve discussed, cloud native is not only focused on what to do now—it is just as much about building in the ability to easily adapt to whatever comes next.

The Maturity Matrix begins with Culture because it is the toughest transition axis to progress—no matter the organization. Culture is abstract, hard to transform, and evolving it is a slow process. The other axes are faster and easier to achieve because, ultimately, they are mainly code and planning. Changing culture also requires a lot of buy-in across the entire organization, while the other axes can generally function in a more independent way.

We predict the next type of organization will be a Generative one. An extension of a collaborative organization, in a generative organization IT will co-create solutions as equal partners with the business.

The final say on which features stay in a product is based on data collected from real users. Potential new features are chosen based on client requests or designs by product owners without a long selection process. They are rapidly prototyped and then developed and delivered to users with copious monitoring and instrumentation. They are assessed against the previous features (better or worse?) based on A/B or multivariate testing. If the new feature performs better, it stays; if worse, it is switched off or improved.
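The keep-or-switch-off rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `VariantStats` and `decide`, the conversion-rate metric, and the `min_users` threshold are all hypothetical, and a real experiment would normally apply a statistical significance test rather than a raw comparison.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Usage data collected for one variant of a feature (hypothetical shape)."""
    users: int
    conversions: int

    @property
    def rate(self) -> float:
        # Conversion rate; 0.0 when no users have seen the variant yet.
        return self.conversions / self.users if self.users else 0.0


def decide(baseline: VariantStats, candidate: VariantStats, min_users: int = 1000) -> str:
    """Compare a new feature (candidate) against the previous one (baseline)."""
    # Assumption: refuse to decide until both variants have enough traffic.
    if min(baseline.users, candidate.users) < min_users:
        return "keep collecting data"
    # If the new feature performs better, it stays.
    if candidate.rate > baseline.rate:
        return "keep feature"
    # Otherwise it is switched off or sent back for improvement.
    return "switch off or improve"


if __name__ == "__main__":
    print(decide(VariantStats(2000, 100), VariantStats(2000, 140)))
```

The point of the sketch is that the decision is driven by collected user data, not by opinion: the same function can be rerun as more data arrives.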

Traditionally, developers/engineers have been responsible for building software and then handing it off to the operations team for deployment. A DevOps team joins the two in a single team capable of designing and building applications as part of a distributed system, and also operating the production platform/tools. Across the organization, each team has full responsibility for delivering an individual set of microservices and supporting them. DevOps teams typically include planning, architecture, testing, dev, and operational capabilities.

Design Thinking and other research and experimentation techniques are used for de-risking large and complex projects. Many proofs of concept (PoCs) are developed to compare options. Kanban is often then used to clarify the project further, and finally Agile methods like Scrum can be applied once the project is well understood by the entire team. Highly proficient organizations might choose to follow the Lean model.

Microservices architecture is highly distributed. It comprises a large number (usually more than 10) of independent services that communicate only via well-defined, versioned APIs. Often, each microservice is developed and maintained by one team. Each microservice can be deployed independently, and each has a separate code repository. Hence, each microservice team can work and deploy in a highly parallel fashion, using their own preferred languages and operational tools and datastores.
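To make "well-defined, versioned APIs" a little more concrete, here is a minimal in-process sketch. Everything in it is an assumption for illustration: the `VersionedApi` class, the `get_balance` operation, and the payload shapes are hypothetical, and a real service would expose these contracts over a network protocol rather than function calls.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[dict], dict]


class VersionedApi:
    """Routes calls by (version, operation) so old and new contracts can coexist."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[str, str], Handler] = {}

    def register(self, version: str, operation: str, handler: Handler) -> None:
        self._handlers[(version, operation)] = handler

    def call(self, version: str, operation: str, payload: dict) -> dict:
        try:
            handler = self._handlers[(version, operation)]
        except KeyError:
            # Unknown versions fail loudly instead of silently changing behavior.
            raise LookupError(f"unsupported: {version}/{operation}") from None
        return handler(payload)


def get_balance_v1(payload: dict) -> dict:
    # Hypothetical original contract: balance is a bare number.
    return {"balance": 100}


def get_balance_v2(payload: dict) -> dict:
    # Hypothetical newer contract: balance carries a currency.
    return {"balance": {"amount": 100, "currency": "EUR"}}


api = VersionedApi()
api.register("v1", "get_balance", get_balance_v1)
api.register("v2", "get_balance", get_balance_v2)

if __name__ == "__main__":
    print(api.call("v1", "get_balance", {}))
    print(api.call("v2", "get_balance", {}))
```

Because existing callers keep using `v1` while new callers adopt `v2`, the owning team can change and deploy its service without coordinating a simultaneous release with every consumer.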

In full observability and self-healing scenarios, the system relies upon logging, tracing, alerting, and metrics to continually collect information about all the running services in a system.

Continuous delivery describes an organization that ensures new functionality is released to production at high frequency, often several times per day.

Applications in production are managed by a combination of containerization (a type of packaging that guarantees applications are delivered from development with all their local operational dependencies included) and a commercially available or open source orchestrator such as Kubernetes.

Here, individual machines don’t matter: they are called “cattle” because there is a big herd and they are interchangeable. There is usually full automation of environment creation and maintenance. If any piece of infrastructure fails, you don’t care—it can be easily and almost instantly recreated.
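The "cattle" idea can be sketched as a small reconciliation loop: failed machines are discarded and replaced with fresh, interchangeable ones until the desired count is reached. The functions `new_instance` and `reconcile` and the dictionary shape are hypothetical; in practice an orchestrator or infrastructure automation tool does this work.

```python
import itertools

# Hypothetical ID generator; real infrastructure would assign its own identifiers.
_ids = itertools.count(1)


def new_instance() -> dict:
    """Create a fresh, interchangeable instance (stands in for real provisioning)."""
    return {"id": f"node-{next(_ids)}", "healthy": True}


def reconcile(instances: list, desired: int) -> list:
    """Return a fleet of `desired` healthy instances.

    Unhealthy instances are not repaired; they are dropped and replaced.
    """
    survivors = [inst for inst in instances if inst["healthy"]]
    while len(survivors) < desired:
        survivors.append(new_instance())
    # Trim the fleet if there are more healthy instances than desired.
    return survivors[:desired]


if __name__ == "__main__":
    fleet = reconcile([], desired=3)
    fleet[1]["healthy"] = False          # simulate a machine failure
    fleet = reconcile(fleet, desired=3)  # the failed machine is simply replaced
    print([inst["id"] for inst in fleet])
```

Nothing in the loop cares which specific machine failed, which is the property the herd metaphor describes.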

The cloud native approach empowers enterprises to design their product exclusively around the user, with no concern for the needs of the underlying system. This lets them deliver better products with less risk, which is the true heart of cloud native. That they also can now deliver them faster and cheaper is a pleasant corollary outcome.

https://info.container-solutions.com/cloud-maturity-matrix

6. Tools for Understanding and Using Cloud Native Patterns

https://landscape.cncf.io/ Cloud Native Interactive Landscape

https://www.cncf.io/ Cloud Native Computing Foundation

Why is WealthGrid having such a difficult time building a new cloud native system, even when they are putting tons of money and people into the project? The short answer: cloud native is new, complex, and requires a very different way of thinking. Alas, this is apparent only once you have already moved a significant distance down the migration.

In both scenarios there is no actual adoption strategy, and the decision to move ahead is made based on lack of understanding coupled with the misbelief that the process will be fairly simple and straightforward.

Most traditional companies prize proficiency, the ability to complete well-defined and predictable tasks with maximum efficiency. This is how you deliver maximum value in a relatively stable context with few, if any, unknowns or surprises. In this context, creativity and innovation are viewed with skepticism. Innovation introduces unknowns into a highly regimented system—and with unknowns come risk.

They are really good at what they do, but in the process of becoming exactly that good at exactly that thing, they forgot how to work any other way.

Unfortunately, when businesses evolve from a scrappy new startup to become a proficient and as-algorithmic-as-possible operation, they often forget how to be a startup—that is, how to re-enter the mystery state over and over in order to research and introduce new ideas to their business.

So, yes, you need both proficiency and creativity. One is not better than the other, or more important than the other—it’s the balance that is important. You need both, but not at the same time. This is because they both need to be managed differently.

Proficient teams require high repetition to deliver the same thing, over and over, very efficiently and reliably, and at the highest quality possible. High repetition, high feedback, small set of very specific rules. The emphasis is on skills and repetition.

Creative teams, on the other hand, have no specific list of tasks. Their work requires open-ended thinking that is more like puzzle solving. This doesn’t mean that creativity equals chaos: there is still a guiding purpose behind it, and tools to use. To effectively nurture innovation there must be a goal and the strong support and safety of a space that allows open-ended experimentation. Autonomy is crucial: once the goal is established, let the team find solutions in whatever way they can discover.

Both types of teams are just teams, composed however your organization’s team structure works. It’s their jobs that are different: the proficient teams are your bottom-line delivery workers, the creative teams are focused on research and next steps.

It is difficult to achieve this, but possible. Striking the balance requires maintaining the separation of proficient and creative teams while closely coordinating between them. Different styles of management are required for each, and a designated champion (or two, or even more) is needed to act as a kind of translator to manage whenever there are handoffs between their respective efforts.

Once you understand the differences between creativity and proficiency, and the relationship between them, we use the Three Horizons model to understand how to blend and balance investment in developing new products within a company, while still delivering efficiently and reliably.

H1: Horizon 1 represents your current core business—presumably, the things that provide your cash flow and main profits. This also includes logical next-step development of/iteration upon any product or service you are making right now.

H2: Horizon 2 is investment in innovation: taking ideas that have been shown to work in the H3 incubator and productizing them for real customer use.

H3: Horizon 3 is research. Pure exploration of new ideas, research projects, pilot programs.

An adaptive business constantly evaluates and recalibrates the relationship between its three horizons, pursuing the optimal balance between proficiency and creativity. To manage this across an organization, it’s important to have people called “champions,” who understand those different horizons and move the technology across those three horizons.

The champion is the person who keeps a firm fix on the bottom line while also pushing the likely next step—and keeping an eye on whatever crazy future thing could be coming next.

Dedicating 5% of a company’s resources to research may not sound like a lot, but it’s crucial. That 5% is where you are gaining knowledge and retaining your ability to be creative when necessary. Where this goes wrong, where companies end up in trouble, is when they are not thinking—there is no strategy—and they try to move straight from research to delivery.

For example, in a cloud native transition, a company’s engineers might try to adopt a microservices architecture when they know nothing about it, have no background, and so end up approaching it all wrong. They researched enough about microservices to recognize that this is the right thing to do, but they are rushing to get things working, which means they skip the middle stage and try to squeeze it directly into delivery. Skipping H2, which is the pragmatic development phase for creating heuristics around delivering new ideas, might sound like a way to speed things up. But instead it’s a recipe for failure, since they haven’t taken time to understand how it works and fits together. An innovation champion’s job is to prevent just this sort of short-sighted corner cutting.

Cloud native is a new way of doing things. It is not predictable, at least not to people who lack a good understanding of how it works. Most of the people at WealthGrid did not have this knowledge. Probably no one, not even Jenny, truly understood the full intricacies of cloud native architecture; certainly, no one at WealthGrid had any experience actually building a cloud native system. Lacking this understanding and experience, they of course used what they knew, the tools and techniques at hand. They didn’t know what they didn’t know.

This was the third crisis point. WealthGrid was still committed to becoming a fully cloud native company. But it also needed a way to continue delivering value to customers while it worked to find the right path—the middle path between proficiency and creativity—to finally deliver the long-delayed new platform.

Digital transformation, ultimately, requires a balance between innovation and pragmatism, between creativity and proficiency. Some companies attempt to innovate but do so by trying to deliver creativity using proficient processes—that is, long-held practices and beliefs that worked well for them historically but don’t work with cloud native architecture. This leads to failure, or at best a low-functioning, improperly implemented attempt.

Others go all in on innovation, attempt to abandon the old system completely to build a new one from scratch, and still get lost. Many times these companies are trying to be like Google, one of the most creative (not to mention fully cloud native) organizations around. The common misbelief is that being like Google means being all in on creativity—let’s say something like 98% creative. Google’s real focus, however, is very much on proficiently delivering their existing products and services while investing very intentionally in small but targeted and highly impactful creative initiatives.

The point is, they do have a balance, and it is what works for them. The real problem for most established and successful companies is that they have no idea how to be creative at all, in an effective way, at any number.

Proficiency is important. Creativity is important. Neither is better, and both are necessary. Proficient teams need to be managed in a way that supports their focused, stable and efficient delivery of bottom-line core business value for the company. Creative teams are managed for open-ended exploration of next steps, so the company stays innovative and ready to take responsive and adaptive next steps whenever needed.

7. Patterns for Strategy and Risk Reduction

What is a chapter about strategy and business risk reduction doing in a book about cloud native patterns?

The biggest risk factor enterprises now face is not being able to respond fast enough to a changing environment.

Risk reduction today is the ability to respond to sudden or unexpected changes in market conditions when you don’t have much notice, in time to meet or beat the competition. And you achieve this ability through strategy.

In this chapter we will introduce patterns that specifically shape and drive overall strategy in a cloud native organization and, better yet, show how to use them to reduce risk and build for long-term success, both during a transformation and then on into the future. We will examine the following patterns:

Dynamic Strategy

In this Context: Not responding quickly enough to market changes or new information may lead the company to continue building products according to an old strategy that is no longer fully relevant. The original strategy could be realized in its totality, but in the meantime competitors could come up with better products, technology could change, and much better opportunities could be missed.

Ultimately, the company may end up with exactly what was planned in the beginning of the project—only to find that this is not what they actually need when they finally go to market.

Therefore: Continually re-evaluate circumstances as the initiative moves forward.

Consequently: The executive leaders are aware when the environment changes and adjust strategic goals to keep the company heading in the right direction.

Today’s tech-driven marketplace, no matter what business you are in, is the ultimate uncertain environment. If you have a strategy, it needs to change all the time. Dynamic Strategy is the transformation pattern that teaches us to observe how the world is changing and to continually evolve and adjust strategy to match. And this responsibility sits squarely with the company’s executive leadership.

Since decisions regarding strategy now need to be made quickly and since the power—and responsibility for making all other kinds of decisions—is being distributed to middle management and execution teams, it’s essential to share a consistent set of values and priorities that guide decision making across the organization.

Value Hierarchy

In this Context: Without a clear understanding of the company’s values and priorities, people have no easy way to connect their daily work to the company strategy. In such a situation, different teams may make conflicting decisions or waste a lot of effort on low-priority tasks.

Therefore: Create an ordered list of clearly stated values to simplify decision making and guide behavior in an uncertain environment.

Consequently: Teams and individuals in an organization are able to make decisions with confidence.

Business Case

In this Context: Cloud native transformations are a big commitment, requiring significant investment of budget, time, and team talent. Too many organizations, though, get caught up in the hype of the cloud conversation and make decisions without understanding how exactly a transformation fits their business needs and goals. The risk is especially high for organizations that have already established rapid and significant internal momentum toward making this move.

Therefore: Create a formal business case to help educate the organization’s executive team, to evaluate how the transformation will serve the company’s goals, and to create a clear vision for where the organization is headed.

Consequently: The business case for a cloud native transformation is clear. The company’s decision makers have a clear understanding of the initiative and the advantages it will confer when complete. They are ready to move forward.

Executive Commitment

In this Context: Cloud native transformations require significant changes in all areas of an organization, from infrastructure to processes to culture. These changes place large demands on the organization’s budget and time.

Therefore: Announce the cloud native transformation as a high-priority strategic initiative that is important to the company and that has explicit support from the executive management.

Consequently: The company is aligned around common goals and everyone understands priorities for the transformation.

Common Pitfalls: The focus is on technical changes only, without including the organizational changes that are also essential for a cloud native transformation to succeed. Or the initiative gets treated like just another tech/infrastructure upgrade, instead of a true paradigm shift. In such cases Executive Commitment exists, but only for the wrong scope of the initiative.

Establishing executive commitment to the initiative is essential, but you’re still going to need a designated hands-on person to lead it. That person is the transformation champion.

Transformation Champion

In this Context: Successful established enterprises focus on proficient delivery of their core products or services and often forget how to be innovative. When a disruptive competitor appears, it is difficult for them to respond quickly and effectively. There are always a few people within the organization who see the future better than others. An even smaller subset of these are willing and able to take organized action, but many organizations ignore them and waste the opportunity to encourage healthy leadership. Without such motivational leaders, the initiative often falls flat and keeps going only after management exerts some bureaucratic pressure to push it forward.

Therefore: Recognize the person (or group) who has triggered the movement and name them transformation champion. Authorize them as designated advocate for the initiative. Name a different person to this role only if there is a very compelling reason to do so.

Consequently: The transition has a focal point for organizing the transformation initiative and a champion in charge of driving it forward. The transformation champion is connected with both the proficient and innovative branches of the transformation and can act as a bridge between them.

Vision First

In this Context: The company needs to define a clear and achievable vision that can be translated into specific executable steps.

Therefore: Define and visualize the organizational structure and architecture of the whole system upfront.

Consequently: All teams have a clear guiding principle for the implementation phase.

Objective Setting

In this Context: There is commitment to and a vision for transformation, but concrete steps for getting there still need to be defined.

Therefore: Executives need to hand over the high-level strategy to middle managers to translate it into specific and tangible objectives for their teams. Keep redefining the strategy and the objectives based on known information, not guesswork.

Consequently: The initial strategy is continually improved, adjusted, and translated into clear and tangible objectives. The relevant teams in the company know what they need to achieve and are constantly providing new information to upper management.

Involve the Business

In this Context: When developers are running quick iterations without involving customer-facing people, the value could be limited to tech solutions only. Business people, however, can’t run full-tech experiments.

Therefore: Create close collaboration between dev teams and the business to define experiments for testing new customer value and quickly executing them.

Consequently: Your products (or services) can change quickly in response to actual customer needs and desires.

Related Biases: The technical people make decisions because they think they know what the business needs. Not only is this not their area of expertise, but tech teams are generally inward-facing. They are not engaged in the kinds of customer-facing roles that would grant them the perspective to also deeply understand business needs.

Periodic Checkups

In this Context: Teams focused solely on execution without pausing to assess and reappraise the direction they’re going in might achieve what they originally planned—but not what they ultimately needed, because circumstances changed along the way.

Therefore: Make sure that you and your team are still on the right path. Assess the current situation with regard to initial strategic decisions.

Consequently: The Core Team meets regularly to assess current conditions and can adjust direction as circumstances require.

Data-Driven Decision Making

In this Context: Managers make decisions based on their expectations from previous experience, which might not apply in the new and unknown environment of a cloud native system.

Therefore: Make product decisions based on data collected from actual users (observability, measure what matters).

Consequently: The team can quickly make decisions based on objective measurements.

Learning Loop

In this Context: Learning happens in a three-part cycle: goal-setting, execution, and reflection. The first stage is identifying a challenge or problem and devising a likely solution. The second stage is carrying out the plan until it succeeds or fails. The third is studying the result: thinking back over what happened and how it worked out. In a very long delivery process this cycle is of limited use, because lessons learned can be applied only months later, when the information is no longer fresh in the developers’ minds or perhaps no longer relevant.

Therefore: Build mechanisms for collecting user feedback and feeding it rapidly back into the delivery cycle, enabling responses to flow back from the customer so the business can make better-informed decisions.

Consequently: Apply Data-Driven Decision Making to cloud native’s rapid delivery cycle so that the output of the system continually goes back in to improve the system (you can go fast without breaking things).

Learning Organization

In this Context: Organizations migrating from Waterfall or Agile paradigms to cloud native don’t typically have the skill set for working in a highly uncertain and ambiguous environment: open-mindedness, a willingness to experiment and tolerate risk, and above all the ability to enter into the transformation process without a detailed map.

Therefore: Take an honest look at your current culture. Build in the willingness to accept ambiguity and risk as part of your daily organizational process.

Consequently: Teams are co-creating solutions and challenging each other with Productive Feedback as they experiment their way toward the right answers.

Measure What Matters

In this Context: People tend to optimize their work output based on what is measured. Incorrect measurements will result in flawed deliveries (delivering the wrong things) and suboptimal performance.

Therefore: Always adjust performance measurements to fit the organization’s strategic and tactical needs. Keep measuring the most important KPI and stop when specific behavior becomes routine. Only measure a few KPIs at a time, choosing ones related to the current worst bottlenecks. Prioritizing customer value as the main metric helps to focus on customer needs.

Consequently: Managers set up KPIs in conjunction with goals and adjust them as the goals are changing.

Research Through Action

In this Context: In a new or unfamiliar environment, people can analyze things too much and fail to make progress: analysis paralysis.

Therefore: Run small experiments instead of full analysis and research; choose action over extensive contemplation and exhaustive research.

Consequently: You are making minor yet tangible progress through taking small, iterative steps.

Gradually Raising the Stakes

In this Context: Making major decisions before having enough information to understand the parameters carries a great deal of risk. However, in the uncertain environment of an early cloud native transformation when there is not yet much knowledge or a clear path, grabbing right away for a “big bang” solution is very tempting.

Therefore: Avoid making big decisions early; do a series of small projects, always growing slowly in size, until you have enough information to make a big bet.

Consequently: The project has been gradually refined/decided without taking disproportionately high risks, and appropriate budget and resources have been allocated to each stage based on its level of uncertainty.

No Regret Moves

In this Context: Lacking adequate information, the team has no practical way to make an educated decision—and essentially will have to gamble on a semi-random solution and hope for the best.

Therefore: Take first-stage risk-reduction actions that are quick, low-cost, and benefit the company no matter what.

Consequently: The organization has gained self-awareness and knowledge without investing huge amounts of time or money. Risk has been incrementally lowered, and the company’s leaders are ready to take the next step in setting the transformation path.

Options and Hedges

In this Context: Your research has given you a better understanding of what is going on, but major decisions are still not obvious. Commitment to a large solution at this point still carries a serious risk of choosing the wrong solution, while running additional tiny experiments that uncover no new information is just a waste of time.

Therefore: Make small tactical decisions aimed at creating and understanding a new path forward. They can be rolled back or forward, ramped up or down, and will at least eliminate some options while you create new plans.

Consequently: You have uncovered the majority of the important information required and are reasonably certain where you are going next.

Big Bet

In this Context: Continuing research and experimentation without ever making any big decision leads to significant waste of resources as the teams are not focused on solving the problem and the direction is not chosen yet. It means that there is no clear alignment across teams regarding a solution, and no stable and focused delivery process has been established.

Therefore: Make a commitment to a large-scale solution, like a large rebuild, architectural change, migration, purchase of new products, etc., bearing in mind that it might require organizational change.

Consequently: There is full commitment to the chosen direction. It is clear to everyone that this is a commitment moment: at this time we stop experimenting and move forward. Unless there is a significant change in market or strategy conditions, teams stay committed to the chosen path.

Reduce Cost of Experimentation

In this Context: There are significant barriers to experimentation in the organization: permission is required, and the related planning, documentation, and coordination meetings take a lot of time. Then actually getting the results afterward typically requires a significant wait. As a result, engineers will often skip experimentation and move directly to execution.

Therefore: Put in place a simple, straightforward, and seamless process for doing experiments. When experimentation is central to an organization’s process and progress, it needs to be an inexpensive and easily accessible action.

Consequently: More experiments take place. Instead of extensive research and guessing when a complex problem arises, a rapid process of hypothesis/results/analysis provides the solution.

Exit Strategy Over Vendor Lock-in

In this Context: Committing to a single vendor (or simply a single large solution) creates reliance upon their ongoing stability and availability and pricing, but the cost of maintaining active alternative/backup options is prohibitive.

Therefore: Instead of blindly refusing to commit to a single vendor, explore the options for a second migration, if necessary, and what they would cost. Then make an educated decision based on the tradeoff between short-term gains from a vendor with the best tool and the long-term risk of migrating out of it if needed. Often lower costs and higher productivity outweigh the risk.

Consequently: The team can focus on getting the maximum performance out of and benefit from each tool, and they are aware of what it would cost to migrate to alternative solutions should the need arise.

Three Horizons

In this Context: In general, companies seldom keep the right balance between delivery and innovation. Enterprises tend to allocate almost all resources to Horizon 1, proficient delivery of the core business product/service, which eventually leads to stagnation. Startups tend to overcommit to innovation, Horizon 2, which leads to poor product quality and a lack of focus on delivering value to customers. At the very far end lies Horizon 3, researching ideas that are promising but will not lead to any practical solutions in the foreseeable future.

Therefore: Always allocate resources to delivery (current products or services), innovation (refining new products/services or significantly improving existing ones, relevant within 12–24 months), and research (long-term ideas and technologies). Champions are responsible for moving technology and knowledge across the teams.

Consequently: The company is always prepared for whatever the future brings while still delivering existing products frequently and at high quality.

Reflective Breaks

In this Context: When you run as fast as you can, you stop looking around to evaluate the situation and focus on only a single point—the finish line. Most modern delivery processes are designed to create stable pressure to help people focus on delivery. There is no planned and structured time set aside for periodic strategy reviews and for creative thinking.

Therefore: Build periodic planned “time-outs” into the business cycle across the entire organization.

Consequently: Teams are focused on execution but also have the regular opportunity to review and adjust on all levels of the company.

Designated Strategist

In this Context: Once your transformation strategy has been defined, the team will tend to enter full execution mode—and stop refactoring the goals. This leads to the achievement of the original goals, but with no ongoing evaluation as circumstances evolve/change. People under stress of delivery can’t look around and re-evaluate the situation since they are pressured to focus solely on the set goals. The problem is that you might arrive at the finish line only to find that the problem has changed completely while you weren’t paying attention and your original solution no longer applies.

Therefore: Free one of the experienced architects or managers to focus solely on the future and evaluate all the scheduled tasks based on long-term goals.

Consequently: Teams can focus mainly on delivery while the company still maintains a strategic perspective.

But banks are not just banks anymore; they are tech companies—or at least they need to act like one, if they want to stay in business.

The core business of a bank, after all, is to buy and sell money while taking a percentage in the middle, in the form of loan rates.

But he is also a product of the environment where he built that experience: the traditional hierarchy using Waterfall delivery methods. In that environment, the executives make all the decisions—not just strategy, but many execution details as well. They bring in both inside and outside experts and architects, create many reports and documentation, and in general take a long time to think things through. Once the plan is set, it is handed off to middle managers to oversee the plan’s execution, exactly as created with no diverging allowed, at the team level.

Now a strategic leader’s job is to create dynamic strategy. This means watching the company’s market, competitors, and other environmental factors both current and emergent in order to make comparatively quick and short-term decisions about how to respond and which direction to go next.

The patterns and material in this chapter, by contrast, won’t really change. They exist to teach you how to do these things. Together, they form a roadmap and a cognitive tool set for managing change no matter where, and no matter when—whether in the middle of your transformation, when a sudden new competitor appears, or a few years from now when the next paradigm shift arrives to replace cloud native. These patterns for strategy and risk management aren’t simply tools to make you ready for cloud native: they help make you ready for next.

8. Patterns for Organization and Culture

Attempting to deliver on a new cloud native system by following the old ways is, however, a disaster in the making: things will break down very quickly because a hierarchical organizational structure simply can’t keep up with a cloud native delivery approach.

Core Team

In this Context: Making an existing team or teams responsible for delivering the new cloud native system while still requiring them to work on their regular duties means they will have conflicting priorities—and struggle to deliver either of them at all, much less do it well.

Therefore: Create a single Core Team of five to eight engineers and architects to lead the transformation.

Consequently: The Core Team rapidly works through the most challenging parts of the transformation (identifying the best migration path, tools and technologies, and then implementing a minimum viable product version of a platform) and paves the way for the rest of the teams in the company toward successful cloud native adoption.

Build-Run Teams (“Cloud Native DevOps”)

In this Context: When development teams are responsible for building an application and supporting it in production, if they also try to build the platform to run it, the organization can end up with multiple irreconcilable platforms. This is unnecessary, expensive to operate (if even possible), and takes time away from teams that should be focusing on delivering features, not the platform they run on.

Therefore: Create teams that each have their own capability to build a distributed system with microservices managed by dynamic scheduling.

Consequently: There is strong separation of defined responsibilities: Build-Run teams handle the applications. The Platform Team is responsible for building and maintaining the operating platform.

Platform Team

In this Context: If there is no single team in charge of creating an official cloud native production platform for the transformation, each team responsible for different microservices will have to build its own platform.

Therefore: The Platform Team will handle the platform itself—everything between the cloud and the apps (or, to put this in terms of the current technological landscape, Kubernetes and below)—while developers are responsible only for building the applications themselves (again, in the current tech landscape, Kubernetes and above).

Consequently: Developers are able to focus on building individual application services, features, and functionality while the Platform Team takes care of running the platform itself. Developers are allowed to introduce custom tools but they will have to support them as part of their own application unless the tools are proven stable and/or requested by other development teams.

SRE Team

In this Context: Once a platform is built and in production, attention is often directed away from improving internal processes and runtime performance. This can cause degradation over time, reducing quality and performance.

Therefore: Create a team that is focused 50% on reliability and 50% on continuous improvement of internal processes and development practices.

Consequently: Runtime stability and quality are continuously increasing, and so is automation.

Remote Teams

In this Context: In many organizations remote team members may rarely, or even never, meet face to face. That works as long as the problems being solved by those teams are reasonably well-defined and not very complex. In the complex world of cloud native, however, problems are often messy and difficult, and require a more open-ended and collaborative approach.

Without a strong aim for collaborative co-creation, the team’s ability to generate innovative solutions is typically limited to the creative abilities of individual team members working separately.

Therefore: Put programs in place to connect remote teams and bring them together in every way possible, both physically and virtually.

Consequently: Teams see each other regularly in person and in between stay engaged via multiple channels and practices that promote fluent communication. Ideas are created, validated, and implemented in groups instead of coming from individuals.

Co-Located Teams

In this Context: When team members are located in different places, they tend to communicate less and focus on their jobs rather than personal relationships. This hobbles team problem-solving, because individuals will attempt to solve problems separately and then contribute that solution back to the team—rather than solving them collaboratively with teammates.

Therefore: All members of a given dev team will work in the same physical location and meet daily.

Consequently: High level of trust and proximity naturally increases collaboration.

Communicate Through Tribes

In this Context: In a changing cloud native world, with ownership for application services divided across teams, managers don’t know enough to provide effective advice, much less make good decisions. At the same time, managers have the illusion of knowledge and control—they don’t know what they don’t know—so the team’s abilities and effectiveness are only as great as its manager’s capability.

Therefore: Create domain-specific tribes that operate outside of normal management hierarchy.

Consequently: The company has groups that cross-cut traditional organizational units. This helps those people who are closest to and most knowledgeable in a particular domain subject identify areas for running experiments and making changes.

Manage for Creativity

In this Context: Teams charged with identifying and building promising future products are often managed using the same known methodologies that are popular in enterprises. One of the most common, Scrum, helps clarify what’s going to be built and creates pressure to build it as fast as possible. Running endless numbers of sprints without much reflection along the way drives most of the creativity out of the development project.

No inventor can tell when and what exactly she is going to invent.

Therefore: Manage the teams responsible for innovation by stating a purpose or desired outcome, which gives the team a direction toward which they will be creating new ideas. The team will require time, funding, and other resources to support its work, safety to fail, and autonomy to explore. Team dynamics will be more important than deadlines and delivery management.

Consequently: Innovation thrives in the company, and the innovative teams are separated from the proficient teams.

Manage for Proficiency

In this Context: When the teams responsible for delivering stable and scalable products are given too much freedom to innovate and explore, they introduce new risks that harm product quality and increase the cost of development. In many cases the new cloud native platform is not yet ready or stable enough to accommodate all the new teams, while most of the teams have to maintain old systems and release new incremental updates. Allowing those teams to start innovating too early may come at a significant cost to productivity and product quality.

Therefore: Run the execution teams the way they have always been run. Focus on repeatability and optimize on quality and speed of delivery.

Consequently: Teams in charge of delivering the company’s profit-generating products/services in a proficient way are being managed to optimize this. Proficient and creative teams are loved and appreciated equally.

Strangle Monolithic Organization

In this Context: Migrating an existing company to the cloud can take years, and it happens very gradually.

Therefore: Move the teams from the legacy organizational structure to the new one gradually (Gradual Onboarding pattern). Restructure each team and move it out of the hierarchy shortly before it onboards to the cloud native platform, once the platform is fully ready.

Consequently: The old system keeps working as always while the new one is built, and teams are gradually moved over. Teams get restructured and retrained only when it is time for them to actually move. While you are on the old platform you keep delivering with excellence; then you move to the new one and deliver equally well there.

Gradual Onboarding

In this Context: Onboarding too many teams at once will stress the Core Team and reduce its ability to continue improving the platform. Educating people too early, however, will create anxiety in the teams and desire to start as soon as possible (and frustration if there is a long wait before the new system is available).

Therefore: Start small organizational changes early in the cloud native transformation. Prepare materials for onboarding other teams, and carry out the onboarding slowly, as teams become ready.

Continue onboarding the rest of the organization gradually over a period of 3 to 12 months.

Consequently: The Core Team can support teams as they onboard and improve the process as they go. The first few teams onboarded to the platform can help educate and support the teams onboarded later.

Design Thinking for Radical Innovation

In this Context: When faced with a problem, people typically spend only the minimum time required to find the first satisfactory solution, even if it’s not the best one or doesn’t involve all stakeholders. This leads to reasonable solutions, but misses the opportunity to find excellent ones.

Therefore: Take the basic first idea and run it through a series of divergent and convergent thinking exercises to rethink the idea’s boundaries and explore alternatives.

Consequently: Ideas are thoroughly explored. Cost of initial exploration is still low, as it requires little to no actual development (No Regret Moves).

Design thinking, explained https://mitsloan.mit.edu/ideas-made-to-matter/design-thinking-explained

Agile for New Development

In this Context: Teams are either endlessly researching and collecting information or, conversely, starting to deliver and optimize very early. In the first case, value is delivered to customers late, ends up being of poor quality, or never gets delivered at all. In the second case, the solutions are too simple and underdeveloped and miss the opportunity to solve customer problems.

Therefore: Run alternating iterations of research and development.

Consequently: Delivery and innovation are separate and in balance.

Lean for Optimization

In this Context: Innovation and evolution are inevitable in technology. However, there is little need to innovate when a proficient system is delivering stable value and maintenance cost is low. And we often see that, in an otherwise proficient system, the team continues to introduce new tools and solutions that constantly destabilize the product while needlessly consuming time and budget.

Therefore: Reduce work in progress, focus on optimizing delivery process, measure quality and speed of delivery, and aim to improve both.

Consequently: Delivery is fast and proficient. System is stable, and quality is consistently going up.
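
The measurements named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the WorkItem record, its fields, and the delivery_metrics function are hypothetical names, and "quality" is approximated here as the share of finished changes that caused a production defect.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Optional


@dataclass
class WorkItem:
    """Hypothetical record of one change moving through delivery."""
    name: str
    started: date
    finished: Optional[date] = None   # None means still in progress
    caused_defect: bool = False       # True if the change led to a production defect


def delivery_metrics(items):
    """Return work in progress, mean lead time in days, and defect rate."""
    done = [i for i in items if i.finished is not None]
    wip = len(items) - len(done)
    lead_time = mean((i.finished - i.started).days for i in done) if done else None
    defect_rate = sum(i.caused_defect for i in done) / len(done) if done else None
    return {"wip": wip, "mean_lead_time_days": lead_time, "defect_rate": defect_rate}
```

Tracking these numbers over time gives the team a way to see whether reducing work in progress is actually improving speed and quality.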

Internal Evangelism

In this Context: When there is little information about an ongoing cloud native transformation, people don’t automatically assume it’s a good idea. People don’t resist the transformation because they think it is the wrong thing to do—they resist because it is new and scary. Change creates anxiety, and most people just don’t know much about cloud native in general. Without clarity, people tend to fill in the gaps with imaginary negative information (negative attribution bias), and they may fear their jobs will be dramatically different—or even eliminated.

Therefore: Share positive, clear, and abundant information about the transformation to create acceptance, support, and even excitement across the company.

Consequently: The transformation is understood across the organization, and people feel motivated to join and support it. There is plenty of time and opportunity to mitigate any resistance originating through fear of uncertainty.

Ongoing Education

In this Context: People are joining the organization’s cloud native initiative without fully understanding the possibilities it offers or the wide variety of solutions available. New technology is introduced all the time that renders current tools and techniques out of date. When this happens, productivity suffers, and change slows down.

Therefore: Build and continuously run an education program about cloud native for everyone in the company, from basic education for newly onboarded teams and new joiners to ongoing, more advanced training for experienced engineers.

Consequently: Team knowledge is constantly refreshed and updated.

Exploratory Experiments

In this Context: Committing too early to a solution you don’t yet fully understand is risky. Teams are likely to do one of three things: choose a known solution that is not a good fit for the problem, because they are familiar with it; undertake a lengthy analysis of the solution that leads nowhere (analysis paralysis); or else jump on the first available solution (availability bias).

Therefore: Explore the problem space. Mitigate the risk by delaying critical decisions long enough to run a series of small scientific-style experiments to uncover missing information and evaluate alternatives.

Consequently: The team is granted time and given a process for experimenting with solutions when it encounters a complex problem.

Proof of Concept (PoC)

In this Context: Once some initial experiments have uncovered a likely transformation path, it is time to test it. Committing to it right now could, if it is wrong, cause large problems and expense as the initiative continues. You simply don’t know enough to make a large commitment at this point. Any full commitment right now carries massive risk because switching to an alternative later will be very difficult. Adopting a solution you don’t fully understand too early in the process compounds the risk, because you will continue to build further functionality on top of this solution.

Therefore: Build a basic functional prototype to demonstrate the viability of a solution. Define the questions that need answers before starting the PoC and stop the work once the questions are answered.

Consequently: Risk is reduced for the overall project in the early stages of the migration. Critical knowledge and experience are gained during the experimentation process.

MVP Platform

In this Context: Trying to add too many functions to the first release will make it very protracted and delay any release to production. In the traditional Waterfall approach, a new platform will be used in production only if it’s fully finished and everything is ready. Building a fully featured cloud native platform may take up to three years, but since cloud native components work independently, the platform can be running apps in production even before it’s completely built out. Not using the platform or any of its features until it is 100% complete would be a lost opportunity.

Therefore: Define and build a system with minimal useful—and production-ready—functionality on a quick timeline. It’s important to release something quickly in order to start getting real feedback so you can then improve things based on that user response.

Consequently: The first MVP of the platform is ready and can be used by a small number of development teams to deliver a few apps to production. The Platform Team has a plan to expand the scale and functionality of the platform as it continues rolling it out to the rest of the organization.

Decide Closest to the Action

In this Context: Decision making via chain of command is not sustainable in cloud native. Using hierarchy to resolve conflicts and agree on decisions takes too long, and solutions are limited to the capabilities of the managers making that level of decision. Engineers might find a superior solution that never gets implemented because it takes too much time and effort to navigate the bureaucracy to get permission. So they will give up and just move on with whatever they have.

Therefore: Push the decision power as close as possible to any change as it is happening. It is typically best if the dev team itself makes the decisions.

Consequently: Executives delegate the power to create the vision and objectives to middle management, and middle managers delegate power over technical decisions to the execution teams.

Productive Feedback

In this Context: People are often blind to their own biases and live in their own bubble without realizing it.

During a cloud native transformation, teams that have always worked in a proficient way now are tasked with innovation. They have no experience being creative, so they will keep using past solutions to attempt to solve new problems.

Therefore: Create a safe environment and simplify ways for people to give feedback—positive, negative, even confrontational—in a constructive way.

Consequently: Productivity goes up because people can learn and improve their work and behavior, and because they feel seen and appreciated.

Psychological Safety

In this Context: In traditional enterprises that are mostly designed to support stability, people fear exposing themselves by asking a “stupid” question, suggesting a “crazy” idea, or giving difficult feedback to a teammate, let alone a manager. All such actions typically are dismissed and ridiculed or, even worse, punished.

Therefore: Create the shared value that no group member will ever be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes.

Consequently: People can propose new methods or approaches knowing that their ideas will be treated respectfully.

Personalized Relationships for Co-Creation

In this Context: In uncertain environments, what worked in the past may not work here, so you need to invent rather than attempt to reuse existing solutions.

Therefore: In complex environments where there is no clear path forward, a strong team needs personalized relationships to collaborate on creative solutions. Creativity is not the goal—co-creation is the goal. Creativity is open-ended and may not lead to anything, but co-creation generates results.

Consequently: The team has established trust and a relationship that helps people share information effectively, which leads to co-creating new solutions.

Blameless Inquiry

In this Context: When no inquiry is done after a problem occurs or an experiment fails, the team doesn’t improve and is likely to keep making similar mistakes. In many organizations, fault-seeking occurs, and blame gets assigned to anyone involved with a problem. This leads to mediocre performance, since most innovative actions carry significant risk.

Therefore: Understand what went wrong by focusing on the problem instead of the people involved.

Consequently: People have the autonomy and the confidence to try and fail, and try again.

Links

On Pioneers, Settlers, Town Planners and Theft.

https://blog.gardeviance.org/2015/03/on-pioneers-settlers-town-planners-and.html

Design thinking, explained

https://mitsloan.mit.edu/ideas-made-to-matter/design-thinking-explained

9. Patterns for Development and Process

Cloud native processes are still being fleshed out because cloud native itself is still so emergent. This is not yet a beaten path so much as one that’s being actively created by the people walking it. What we do know, however, is that it’s critical to make sure that the foundation is right and the system architecture can support future growth, extension, and constant change.

The patterns in this chapter address how to approach designing, building, and delivering your business’s products or services in this new paradigm. This is where we look at the architecture and processes that support cloud native’s fast, dynamic, and responsive delivery model: microservices, continuous integration, and other process-oriented tools and methods that empower teams to be independent, proactive, and self-sufficient while delivering rapid, iterative changes on a daily basis.

Open Source Internal Projects

In this Context: When a project is strictly internal, there is a tendency to cut corners to save time. Meanwhile, the open source community is constantly coming up with new tools to solve business use cases in the cloud native world.

Therefore: All software that does not address company core business (“secret sauce”) can be open sourced from the start.

Consequently: If there is a gap in functionality, instead of building a new solution internally, use existing open source projects and contribute back to them. Alternatively, create your own open source solution and invite others to use, contribute to, and improve it.

Distributed Systems

In this Context: Once the system has grown beyond the capacity of a single architect/engineer to understand, it becomes difficult and time-consuming to add functionality. With the growth of the old software systems, more people join the team and the system constantly collects technical debt, which leads to fragility and unpredictable side effects that come with every change. This creates fear of adding new functionality and stagnates development.

Therefore: Build the software system as a number of independent components (microservices) running on different computers and communicating through APIs. Development, delivery, and scheduling of each component is completely independent, and any component can fail without affecting the others.

Consequently: Higher complexity through many decoupled components makes a more resilient and scalable system.
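
A minimal sketch of two components talking through an API, using only the Python standard library. The "pricing" and "catalog" components, the /price endpoint, and the payload are all hypothetical and chosen for illustration. The point shown is the failure isolation described above: when pricing is down, the catalog side returns a degraded answer instead of failing with it.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PriceHandler(BaseHTTPRequestHandler):
    """Hypothetical 'pricing' component exposing one JSON endpoint."""

    def do_GET(self):
        body = json.dumps({"sku": "demo", "price": 9.99}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_pricing_service():
    """Run the pricing component on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), PriceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def get_price(base_url, timeout=1.0):
    """Hypothetical 'catalog' component calling pricing through its API."""
    # An empty ProxyHandler bypasses any system proxy for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/price", timeout=timeout) as resp:
            return json.load(resp)
    except OSError:
        # Pricing is unreachable: degrade gracefully rather than fail too.
        return {"sku": "demo", "price": None, "degraded": True}
```

In a real system the two components would be deployed and scaled separately; running both in one process here only keeps the sketch self-contained.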

Automated Testing

In this Context: Humans are too slow and inconsistent to be a blocking factor in the pipeline for deployment to production.

Therefore: Automate all the testing required to take any product change to production.

Consequently: The team can trust that the delivery process will catch most issues and that changes will flow to production quickly.

Continuous Integration

In this Context: When a team of developers works on a set of features that integrates only when all features are finished, the integration process tends to be very complex. The codebase change is large, and in the meantime other devs have integrated separate large changes that can further complicate the integration. To increase productivity, devs often delay interim integration—which leads to a single “big bang” integration just prior to release. A minor bug or conflict that could have been easily caught in an interim integration can now end up delaying the entire release.

Therefore: All developers integrate their changes at least once per day.

Consequently: Integration is a nonevent. Products are always in a releasable state.
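
One way to keep the daily rule visible is a small check that flags work that has not been integrated recently. The sketch below is an assumption-laden illustration: the branch names and the last_integrated mapping are hypothetical, and a real team would fill that mapping from its version control system.

```python
from datetime import datetime, timedelta


def overdue_integrations(last_integrated, now, max_age=timedelta(days=1)):
    """Return branch names whose last integration is older than max_age.

    last_integrated maps a (hypothetical) branch name to the time its
    changes were last merged into the shared mainline.
    """
    return sorted(
        branch for branch, merged_at in last_integrated.items()
        if now - merged_at > max_age
    )
```

For example, with a check time of noon on 10 May, a branch last merged at 09:00 that day passes, while one last merged on 7 May is reported as overdue.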

Reproducible Dev Environments

In this Context: Shared environments and databases are difficult to keep in good shape and create dependencies that lead to delays.

When developers can’t create their own test environments, they may avoid running proper tests before submitting the code or run them on shared environments that may affect the work of their teammates. Sharing an environment also makes test results harder for other developers to interpret.

Differences between development environments and the eventual production environment may lead to the introduction of bugs that happen only in production and are related to those differences.

In all of these scenarios, product quality and developer productivity suffer.

Therefore: Establish a fully automated and fast process to create development environments where devs can test-run their apps. Each developer should be able to have their own environment, or multiple environments, that resemble the eventual production environment.

Consequently: Each developer can run tests on their own without delays or disturbing the rest of the team.
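
A small sketch of the idea that a developer environment is derived from the production definition rather than assembled by hand. Everything here is hypothetical: the spec is a plain dict of pinned settings, and derive_dev_spec and environment_drift are illustrative helper names. The drift check makes differences from production explicit, which is where production-only bugs tend to come from.

```python
def derive_dev_spec(prod, developer):
    """Copy the production spec, changing only the developer-specific namespace."""
    spec = dict(prod)
    spec["namespace"] = f"dev-{developer}"
    return spec


def environment_drift(dev, prod):
    """Compare two environment specs and return the settings that differ.

    Each spec is a plain dict of setting name -> pinned value. The result maps
    every differing key to a (dev_value, prod_value) pair; None marks a
    setting that is missing on that side.
    """
    return {
        key: (dev.get(key), prod.get(key))
        for key in sorted(dev.keys() | prod.keys())
        if dev.get(key) != prod.get(key)
    }
```

An environment derived this way should differ from production only in its namespace, while a hand-built one with an older database version or a missing cache would show up in the drift report.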