Polaris Blog

Polaris is a light-weight SDK to collect real-user measurements, derive real-time signals of performance, and set goal-based objectives.

State of DevOps 2023

The 2023 State of DevOps report has been published by DORA (DevOps Research & Assessment) in collaboration with Google Cloud. At the heart of this year’s report were two topics that I want to touch on: establishing a healthy culture and reliability.

The State of DevOps report continues to explore research that drives organizations towards achieving organizational performance, team performance, and employee well-being. All three of these work in unison to drive the technical team’s ability to deliver software. Delivering software effectively requires that organizations improve delivery frequency, the lead time for changes, change failure rate, and failed deployment recovery time.

The Four Key Performance Indicators

To understand how this annual report studies the effectiveness of software delivery, we need to break down each of the key performance indicators (KPIs) that the report uses.

The first indicator of effective software delivery is delivery frequency, sometimes referred to as deployment frequency. This is measured by how often new software is released. If our goal is to deliver software, then frequency is a primary indicator of how well we are achieving that goal. Frequent releases prioritize getting the latest code into the hands of the user.

The second indicator of effective software delivery is the lead time for changes. This is a measurement of the time from when the development team completes the code until that code is running in production. Working in the software consulting space for many years, I’ve learned that lead times vary widely between organizations, from minutes to days and even weeks. This can reflect the complexity of the software, the complexity of the infrastructure, QA and UAT requirements, compliance requirements, and more.

The third indicator of effective software delivery is change failure rate. This measures the percentage of code changes that result in a failure. A failure can be any adverse impact on the system or its users, though we often think of failures as service disruptions or incidents.

The fourth indicator of effective software delivery is the mean time to recovery, which the 2023 report refers to as failed deployment recovery time. This is the average time it takes to restore a service or system to normal operations after a failure. This is where incident detection, response, and recovery are critical.
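
To make the four indicators concrete, here is a minimal sketch of how they could be calculated from a list of deployment records. The Deployment record, its field names, and the delivery_metrics function are hypothetical and exist only for illustration; they are not part of the DORA report or of any SDK. As a simplifying assumption, recovery time is measured from the moment of deployment to the moment service was restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; the field names are assumptions for illustration.
    committed_at: datetime                  # when the development team finished the change
    deployed_at: datetime                   # when the change was running in production
    failed: bool = False                    # whether the change resulted in a failure
    restored_at: Optional[datetime] = None  # when normal operations resumed, if it failed


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four indicators over a reporting period of period_days."""
    failures = [d for d in deployments if d.failed]
    recovered = [d for d in failures if d.restored_at is not None]
    lead_hours = [(d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments]
    # Simplification: recovery is measured from deployment, not from detection.
    recovery_hours = [(d.restored_at - d.deployed_at).total_seconds() / 3600 for d in recovered]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time_hours": mean(lead_hours) if lead_hours else None,
        "change_failure_rate": len(failures) / len(deployments) if deployments else None,
        "mean_recovery_hours": mean(recovery_hours) if recovery_hours else None,
    }


# Made-up sample data: four deployments in one week, one of which failed.
start = datetime(2023, 10, 2, 9, 0)
history = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start + timedelta(days=1), start + timedelta(days=1, hours=4)),
    Deployment(
        start + timedelta(days=2),
        start + timedelta(days=2, hours=6),
        failed=True,
        restored_at=start + timedelta(days=2, hours=7),
    ),
    Deployment(start + timedelta(days=3), start + timedelta(days=3, hours=4)),
]
metrics = delivery_metrics(history, period_days=7)
print(metrics)
```

With this sample data, one failed deployment out of four gives a change failure rate of 25 percent, and the average lead time is four hours. A real pipeline would pull these timestamps from version control, CI/CD, and incident tooling rather than from hand-written records.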

Software Delivery Performance

These four KPIs enable us to measure how well our organization is delivering software to the end user. Delivering software is often central to our teams and organizations, and it affects how well the organization and the team perform, as well as the well-being of employees. I think it’s important that we recognize this last point. For software delivery organizations, employee well-being is greatly impacted by the effectiveness of our software delivery. According to the 2023 State of DevOps survey, high-performing teams “adapt to change, rely on each other, work efficiently, innovate, and collaborate”.

Many of us, myself included, have worked in organizations where ineffective software delivery led to a reduction in employee well-being. The team feels this clearly in the form of burnout, reduced productivity, and poor job satisfaction.

So, how do we avoid burnout, reduced productivity, and poor job satisfaction that can be attributed to ineffective software delivery in an organization? Let’s circle back to two of the key takeaways from the report: establishing a healthy culture and reliability.

Let’s talk about establishing a healthy culture. Before we go any further, let’s also call it out: defining, identifying, and measuring culture is hard. The report refers to the article A typology of organisational cultures (Westrum, R, 2004). While that might sound a bit outdated, the article outlines what is termed generative culture: a type of culture within an organization that fosters creativity, collaboration, learning, and continuous improvement. In a generative culture, individuals and teams are encouraged to take risks, experiment with new ideas, and learn from both successes and failures. The report found that “organizations that have a generative culture, as defined by Westrum, continue to perform well.”

While “generative AI” is predictably a leading buzzword of 2023, we shouldn’t lose focus on generative culture. Because employees are encouraged to safely take risks and experiment with new ideas, a generative culture fits well with the Site Reliability Engineering model of an error budget. When setting organizational performance objectives, we intentionally allow for some errors, and part of that budget is set aside for experimentation, exploration, and pushing the boundaries and features of the products and software we are building.
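
As a rough illustration of the error budget idea, the sketch below turns a service level objective into a count of allowed failures and reports how much of that allowance has been used. The function name, its parameters, and the sample numbers are assumptions made for this example; they do not come from the report or from any particular SRE tool.

```python
def error_budget(slo_target: float, total_events: int, failed_events: int) -> dict:
    """Return the failures an SLO allows for a period and how much of that budget is used.

    slo_target is the fraction of events that must succeed, for example 0.999.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1, e.g. 0.999")
    allowed = (1 - slo_target) * total_events        # failures the objective tolerates
    consumed = failed_events / allowed if allowed else 0.0
    return {
        "allowed_failures": allowed,
        "remaining_failures": allowed - failed_events,  # negative once the budget is overspent
        "budget_consumed": consumed,
    }


# Made-up numbers: a 99.9% objective over one million requests with 400 failures.
budget = error_budget(slo_target=0.999, total_events=1_000_000, failed_events=400)
print(f"{budget['remaining_failures']:.0f} failures left, "
      f"{budget['budget_consumed']:.0%} of the budget used")
```

In this example the objective tolerates roughly 1,000 failed requests, so about 60 percent of the budget is still available. That remaining room is what a team can choose to spend on experiments and riskier releases.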

Reliability

This brings me to my final point: reliability. Again, I think reliability is a critical component of establishing a healthy culture in an organization that is delivering software.

For a moment, think to yourself: in your team and organization, what is the result of unreliable systems and software? Pause and reflect on that question in the context of the previous points I’ve touched on in response to the 2023 State of DevOps report. Ok, what do you think?

I think that reliability is a cross-functional component of a software delivery organization that impacts the effectiveness of software delivery, the well-being of employees, and the culture.

An organization where the reliability of software is treated as the most important feature achieves above-average and high organizational performance, team performance, and employee well-being.

Don’t believe me? Check out the 2023 State of DevOps report, which states:

The data shows that the effects of improving these practices follow a nonlinear path— that is, there might be times when performance improvements seem to stall as organizations build stronger capabilities. However, over time, staying committed to these practices still predicts good outcomes.

My Takeaway

The takeaway for me from the report is that reliability follows what we’ve observed as the “J-curve of SRE”: implementing the best practices of Site Reliability Engineering results in some immediate positive impacts, but improvements can then appear to stall. The focus must be on the long-term success of the organization, the team, and the individual by staying committed to building a culture of reliability.
