Test Series: Part 1 – Understanding Testing Strategies

One of the challenges with incorporating DevOps culture for teams is understanding that greater speed yields better quality. This is often foreign to teams, because conventional logic dictates that “slow and steady wins the race”. Yet in every State of DevOps report (https://puppet.com/resources/report/state-of-devops-report/) since the series began, Puppet (https://puppet.com/) has consistently found that teams which move faster see higher quality than those that move slower. The margin is not close, and the gap continues to widen. Why is this? I shall explain.

The First Way: Enable Fast and Increasing Flow

DevOps principles (and Agile) were born out of Lean Management, which is based on the Toyota Production System (https://en.wikipedia.org/wiki/Toyota_Production_System). From this experience we identify The Three Ways, and the first of these specifically aims for teams to operate on increasingly smaller workloads. With this focus, we can enable more rapid QA and faster rollback, as it is far easier to diagnose a problem in one thing than in 10 things. Randy Shoup of Google observed:

“There is a non-linear relationship between the size of the change and the potential risk of integrating that change—when you go from a ten-line code change to a one-hundred-line code change, the risk of something going wrong is more than 10x higher, and so forth”

What this means is that the larger the change, the more difficult it is to diagnose and identify problems. Because the relationship is non-linear, the difficulty grows faster than the size of the change: a change ten times larger is more than ten times riskier.

In more practical terms, it argues against concepts such as “release windows” and aims for a more continuous deployment model whereby smaller changes are constantly deployed and evaluated. The value here is, by operating on these smaller pieces we can more easily diagnose a problem and rollbacks become less of an “event”. Put more overtly, the aim is to make deployments “normal” instead of large events.

This notion is very hard for many organizations to accept, and it often runs counter to how many IT departments operate. Many of these departments have long had problems with software quality and have devised release and operations plans to, they believe, minimize the risk of these quality issues. However, according to the State of DevOps reports, this thinking is not backed up by evidence and tends to create larger problems. Successful, high-functioning teams are deploying constantly and moving fast. Speed is the key.

The secret to this speed with quality is the confidence created through a safety net. A thorough safety net can even create enough confidence to let the newest person on the team deploy to Production on Day 1 (this is the case at Etsy).

Creating the Safety Net

In the simplest terms, the safety net is the amalgamation of ALL your tests/scans running automatically with each commit. The trust and faith in these tests to catch problems before they reach production allows developers to move faster with confidence. Because it is automated, it does not rely on a sole person (or group) and can scale with the team.

Ensuring the testing suite is effective is a product of having a solid understanding of the breakdown of testing types and adopting the “Shift Left” mindset. For an illustration of the testing breakdown, we can reference the tried and true “Testing Pyramid”:

As illustrated, unit tests comprise the vast majority of tests in the system. The speed of these tests is something that should be closely monitored as they are run the most often. Tips for ensuring speed:

  • Do NOT invoke external dependencies (database, network calls, disk, etc)
  • Focus on a UNIT, use Mocking libraries to fulfill dependencies
  • Adhere to the AAA model (Arrange, Act, Assert) and carefully examine tests for high complexity
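The tips above can be sketched as a single test. This is a minimal illustration using Python's standard `unittest` and `unittest.mock`; the `OrderService` class and its `get_order` repository method are hypothetical names invented for the example, not part of any real system.

```python
import unittest
from unittest.mock import Mock


class OrderService:
    """Hypothetical unit under test; it depends on a repository for data."""

    def __init__(self, repository):
        self.repository = repository

    def total_for(self, order_id):
        # In production the repository would hit a database; tests replace it.
        order = self.repository.get_order(order_id)
        return sum(item["price"] * item["quantity"] for item in order["items"])


class OrderServiceTests(unittest.TestCase):
    def test_total_sums_price_times_quantity(self):
        # Arrange: a mock stands in for the database-backed repository.
        repository = Mock()
        repository.get_order.return_value = {
            "items": [
                {"price": 5, "quantity": 2},
                {"price": 3, "quantity": 1},
            ]
        }
        service = OrderService(repository)

        # Act: exercise exactly one behavior of the unit.
        total = service.total_for(42)

        # Assert: check the result and the interaction with the dependency.
        self.assertEqual(total, 13)
        repository.get_order.assert_called_once_with(42)
```

Because nothing here touches a database, network, or disk, a test like this runs in milliseconds, which is what makes running it on every local change, PR, and merge practical. It can be run with `python -m unittest`.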

Unit tests must be run frequently to be effective. In general, a minimum of three runs should occur with any change: a local run, a run as part of PR validation, and a run when the code is merged to master. Speed is crucial here because every slow test adds to the time developers spend waiting on each of these runs.

At the next level we start considering “Integration tests”. These are tests which require a running instance of the application and thus need to follow a deploy action. Their inclusion of external dependencies makes them take longer to run, hence we run them less frequently. There are two principal strategies I commonly see when executing these tests:

  1. Use of an Ephemeral “Integration” environment – in this strategy, we use Infrastructure as code to create a wholly new environment to run our Integration tests in – this has several advantages and disadvantages
    • Benefit – avoids “data pollution”. Data pollution occurs when data created as part of these tests interferes with future test runs. A new environment guarantees a fresh starting point each time
    • Benefit – tests your IaC scripts more frequently. Part of the benefit in modern development is the ability to fully represent environments using technologies like Terraform, ARM, and others. These scripts, like the code itself, need exercising to ensure they continue to meet our needs.
    • Negative – creating ephemeral environments can elongate the cycle time for our release process. This may give us clues when one “thing” is more complex than it should be
  2. Execute against an existing environment. Most commonly, I recommend this to be the Development environment as it allows the testing to serve as a “gate” to enable further testing (QA and beyond)
    • Benefit – ensures that integration testing completes before QA examines the application
    • Negative – requires logic to avoid problems with data pollution.

What about Load Testing?

Load Testing is a form of integration testing with some nuance. We want to run load tests frequently, but they must run in a context where the results are valid. Running them in, let us say, a QA environment is often not helpful, since a QA server likely does not have the same specs as Production. Thus, load problems seen in QA may not be an issue in higher environments.

If you opt for the “ephemeral approach” you can conduct load testing as part of these integration tests – provided your ephemeral environment is configured to have horsepower similar to production.

If the second strategy is used, I often see Load Testing done for staging, which I disagree with – it is too far to the right. Instead, this should be done in QA ahead of (or as part of) the manual testing effort.

As you can see above in the pyramid, ideally these integration tests comprise about 20% of your tests. Typically though, this section is where the percentage will vary the most depending on the type of application you are building.

Finally we do our Manual Testing with UI testing

UI tests and/or acceptance tests comprise the smallest percentage (10%), mainly because the tests are so high level that they become brittle, and an excessive number of them will generate an increased need for maintenance. Further, testing here tends to be more subjective and strategic in nature, thus exploratory testing tends to yield more results and inform the introduction of more tactical tests at other levels.
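As a rough way to keep an eye on the pyramid, a team could compare its actual test counts against the target shares. The sketch below uses the 20% and 10% figures from the text; the 70% share for unit tests is an assumption (the remainder), and the function names and the 10-point tolerance are hypothetical choices, since the right mix varies by application.

```python
# 20% and 10% come from the pyramid described above;
# 70% for unit tests is the assumed remainder.
TARGET_SHARES = {"unit": 0.70, "integration": 0.20, "ui": 0.10}


def suite_shares(counts):
    """Return each test kind's share of the total number of tests."""
    total = sum(counts.get(kind, 0) for kind in TARGET_SHARES)
    if total == 0:
        raise ValueError("no tests counted")
    return {kind: counts.get(kind, 0) / total for kind in TARGET_SHARES}


def pyramid_drift(counts, tolerance=0.10):
    """Return kinds whose share differs from the target by more than tolerance."""
    shares = suite_shares(counts)
    return {
        kind: round(shares[kind] - target, 2)
        for kind, target in TARGET_SHARES.items()
        if abs(shares[kind] - target) > tolerance
    }
```

For example, `pyramid_drift({"unit": 100, "integration": 200, "ui": 700})` flags `unit` and `ui`, which is the "inverted pyramid" shape where brittle high-level tests dominate. Treat the output as a prompt for discussion rather than a hard gate.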

QA is a strategic resource, not a tactical one

A core problem that is often seen within teams and organizations is how QA is seen and used. Very often, QA is a member of the team or some other department that code is “thrown over the wall to” as a last step in the process. This often leads to bottlenecks and can even create an adversarial relationship between Engineering and QA.

The truth is, the way QA is treated is neither fair nor sensible. I always ask teams “how often has QA been given 4 features to test at 4:45 pm the day before the Sprint Demo?” Each time, the answer is that this is not an exception; it happens consistently. And of course, QA finds issues, which results in the whole team staying late or “living with bugs”.

This creates a bottleneck at QA, and a rather unfair one at that: QA is asked to work long hours the day before the sprint ends on features that “just finished and need testing”. This is not acceptable, and it underlines the misunderstanding organizations have of QA.

QA is not responsible for testing, per se; they are responsible for guiding testing and ensuring that it happens. Testing, ultimately, falls to developers, as they are the closest to the code and have the best understanding of it. This is why automated testing (unit in particular) is so vital to the “safety net” concept. Getting developers to understand that testing and writing tests is their responsibility is vital to adopting DevOps culture.

This is not to say QA does NO testing; they do. But it is more strategic in nature, aimed at exploratory testing and/or ensuring the completeness of the testing approach. They also lead in the identification of issues and their subsequent triaging. Key to high-functioning teams is that, whenever an issue is found, the team should remediate it but also create a test which can prevent it from appearing in the future. As the old saying goes, “any problem is allowed to happen once”.
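To make the "fix it, then pin it with a test" habit concrete, here is a small sketch. The function and the bug are hypothetical: assume an earlier version of `average_response_time` divided by `len(samples)` unconditionally and crashed when given an empty list.

```python
import unittest


def average_response_time(samples):
    """Mean of the samples in milliseconds; 0.0 when there are no samples.

    Hypothetical bug being fixed: an earlier version divided by
    len(samples) unconditionally and crashed on an empty list.
    """
    if not samples:
        return 0.0
    return sum(samples) / len(samples)


class AverageResponseTimeRegressionTests(unittest.TestCase):
    def test_empty_samples_do_not_crash(self):
        # Pins the fix: this input reproduced the original failure.
        self.assertEqual(average_response_time([]), 0.0)

    def test_normal_samples_still_average(self):
        # Guards the existing behavior so the fix does not change it.
        self.assertEqual(average_response_time([100, 200, 300]), 200.0)
```

The first test exists only because the issue was found once; from then on the safety net catches it automatically if it ever returns.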

Moving away from this reliance on the QA department/individual can feel rash to teams that have become overly dependent on this arrangement. But rest assured, the best way forward is to focus on automation to create and maintain a suitable safety net for teams.

Safety Nets Take Time

Even introducing 1000 unit tests tomorrow is not going to immediately give your developers the confidence to move faster. Showing that you can deploy 6x a day is not going to immediately see teams deploying 6x a day. Confidence is earned and, as the saying goes, “you only notice something when it breaks”. DevOps is not a magic bullet or even a tool – it is a cultural shift, one that, when properly done, touches every corner of the organization and every level, from the most junior developer to the CEO.

The culture implores participants to constantly challenge themselves and their applications to ensure that the safety measures in place work correctly and are complete. High-functioning teams want to break their systems; notably, Netflix will often break things in Production intentionally to ensure failsafes are working properly.

More canonically, if a test never breaks, how do we know it works at all? This is the reason behind the Red-Green-Refactor development methodology (https://www.codecademy.com/articles/tdd-red-green-refactor). I see a lot of teams simply write tests with the assumption that they work, without ever creating a failing condition to confirm that the tests can actually fail.
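One lightweight way to see the "red" step is to run the same check against a deliberately broken implementation and confirm that it fails. This is a minimal sketch of that idea, not a full Red-Green-Refactor workflow; all names here are hypothetical.

```python
def is_even(n):
    return n % 2 == 0


def broken_is_even(n):
    # Deliberately wrong implementation, used only to prove the check can fail.
    return True


def check_is_even(implementation):
    """The test under scrutiny, parameterized by the implementation it checks."""
    assert implementation(4) is True
    assert implementation(7) is False


def check_detects_breakage(check, broken_implementation):
    """Return True if the check raises AssertionError for the broken version."""
    try:
        check(broken_implementation)
    except AssertionError:
        return True
    return False
```

Calling `check_is_even(is_even)` passes (green), while `check_detects_breakage(check_is_even, broken_is_even)` returns `True`, showing the check goes red when the behavior is wrong. A check that cannot be made to fail this way is not protecting anything.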

But the effort is worth it to move faster and see higher quality. In addition, adopting this aspect of DevOps culture means teams can have higher confidence in their releases (even if they are not deploying all the time). This makes for decreased burnout and better morale/productivity. Plus, you get a full regression suite for free.

I plan to continue this series by diving more deeply into many of the concepts I covered here with unit testing likely being the next candidate.
