Understanding Infrastructure as Code (and DevOps)

The rise of IaC as a model for teams aligns naturally with the widespread adoption of the cloud for deploying applications. The ability to provision infrastructure on-demand is essential but, moreover, it allows teams to define the services and infrastructure their applications use within the same VCS (Version Control System) where their application code resides. Effectively, this allows the team to see the application not just as the source code but as inclusive of the underlying support services which allow the application code to work.

Difference from Configuration as Code

Infrastructure as Code is a rather broad term. The modern definition is, as I stated above, more aligned with provisioning infrastructure from scripts on-demand. Configuration as Code is more aligned with on-premises deployments where infrastructure cannot be created on-demand. In these environments the fleet is static, so it is the configuration of the relevant members of the fleet which is important.

It is with Configuration as Code where you commonly see tools like Octopus Deploy (link), Ansible (link), Chef (link), Puppet (link), and others. This is not to say these tools CANNOT spin up infrastructure on-demand; they most certainly can, but it is not their main use case.

In both approaches, and with IaC in general, the central idea is that your application is not merely its code, but also the infrastructure (or configuration of certain infrastructure) which needs to be persisted.

Why is it important?

Regardless of the flavor, IaC is vitally important to the modern application development team. Why? Well let me ask you a question: Where is the best place to test an application?

The only viable answer here is: Production. But wait, testing in production is risky and can cause all sorts of problems! You are clearly crazy, Jason!

Yes, but let me amend the answer: the best place to test is an environment that is Production-like. This answer is the reason we have things like Docker, Kubernetes, and IaC. We need the environments we develop in and run tests in to be as close to production as possible.

Now, I don’t mean that your developers should have an environment that has the performance specs or disaster recovery features of Production but, from a configuration standpoint, it should be identical. And above all, the team MUST HAVE the same configurations in development as they do in Production. That means, if developers are planning to deploy their application to .NET 5.0.301 on Windows Server 2019, ideally their development environments should be using .NET 5.0.301 on Windows Server 2019 – or, at least, when the application is deployed for testing, that environment should be running Windows Server 2019.

Mitigating Drift

The principal goal of placing infrastructure or configuration into VCS as code is to ensure consistency. This helps GUARANTEE that environments are configured in the way that is expected. There is nothing worse than trying to figure out why “it doesn’t work on that machine” and having to hunt down a flag or setting that someone (who is no longer around) applied three years ago when setting up a second server.

With proper IaC control, we ensure that EVERY configuration and service is under source control so we can quickly get an accurate understanding of the services involved in supporting an application and the configuration of those services. And the more consistent we are, the lower the chance that a difference in environments allows a bug to manifest in production which cannot be duplicated in any other environment.

Production is still Production

All this being said, it is important to understand that Production is still Production. That means, there will be additional safeguards in place to ensure proper function in case of disaster and the specs are generally higher. The aim of IaC is NOT to say you should run a Premium App Service Plan at the cost of thousands of dollars per month in Development. The aim is to ensure you are aiming for the same target.
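As a minimal sketch of “same target, different specs”, the environment can be a single input that only changes sizing while everything else in the configuration stays the same. The variable name, the environment names, and the SKU choices below are illustrative assumptions, not values from the original project.

```hcl
# Hypothetical input: which environment this configuration is being applied to.
variable "environment" {
  type        = string
  description = "Target environment name."

  validation {
    condition     = contains(["dev", "test", "prod"], var.environment)
    error_message = "The environment value must be one of: dev, test, prod."
  }
}

locals {
  # Only the sizing differs per environment; the rest of the
  # configuration (services, networking, settings) stays identical.
  # The SKU values are illustrative App Service plan sizes.
  app_service_sku_by_environment = {
    dev  = "B1"
    test = "S1"
    prod = "P1v3"
  }
}

# The selected size can then be fed into whichever resource or module
# creates the App Service plan.
output "app_service_sku" {
  value = local.app_service_sku_by_environment[var.environment]
}
```

Running `terraform plan -var="environment=dev"` against the same code then targets a cheaper plan without changing what gets built.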

That said, one of the other benefits of IaC is the ability to spin up ephemeral environments to perform testing with production style specs – this can include variants of chaos testing (link). This is usually done ahead of a production release. IaC is vital here as it allows the creation of said environment easily and guarantees an EXACT replica of production. Another alternative is blue/green deployments, which conform to the sort of shift right testing (link) that IaC enables.

Understanding Operating Models

As you begin the IaC journey it is important to have an understanding of the sort of operating models which go along with it. This helps you understand how changes to various parts of your infrastructure should be handled; this is often the toughest concept for those just getting started to grasp.

Shared vs Bespoke Infrastructure

In many cases, there might be infrastructure which is shared by all applications and then infrastructure which is bespoke for each application. This understanding and differentiation is core to selecting the right operating model. It also underpins how IaC scripts are broken apart and how frequently each is run. As an example, when adopting a Hub and Spoke deployment model, the code which builds the hub and the spoke connection points is run FAR less frequently than the code which builds the services applications in the spoke rely upon.
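One way this split can look in Terraform is for the frequently-run spoke configuration to read the outputs of the rarely-run hub configuration rather than build the hub itself. The sketch below assumes the hub keeps its state in an Azure Storage backend and exports an output named `hub_vnet_id`; the state location, resource names, and address space are all hypothetical.

```hcl
provider "azurerm" {
  features {}
}

# Read the outputs of the hub configuration, which is applied
# separately and far less often. All names here are hypothetical.
data "terraform_remote_state" "hub" {
  backend = "azurerm"
  config = {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "sttfstatehub"
    container_name       = "tfstate"
    key                  = "hub.tfstate"
  }
}

# Everything below is bespoke to one application and changes often.
resource "azurerm_resource_group" "spoke" {
  name     = "rg-spoke-app1"
  location = "eastus2"
}

resource "azurerm_virtual_network" "spoke" {
  name                = "vnet-spoke-app1"
  resource_group_name = azurerm_resource_group.spoke.name
  location            = azurerm_resource_group.spoke.location
  address_space       = ["10.2.0.0/16"]
}

# Connect the spoke to the shared hub using the hub's exported VNet id
# (assumes the hub configuration declares an output named hub_vnet_id).
resource "azurerm_virtual_network_peering" "spoke_to_hub" {
  name                      = "spoke-to-hub"
  resource_group_name       = azurerm_resource_group.spoke.name
  virtual_network_name      = azurerm_virtual_network.spoke.name
  remote_virtual_network_id = data.terraform_remote_state.hub.outputs.hub_vnet_id
}
```

Keeping the two in separate configurations (and separate state files) means an application team can apply spoke changes daily without ever touching the hub.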

Once you understand this differentiation and separation you can choose the operating model. Typically there are considered to be three operating models:

  • ManualOps – this is where the IaC scripts in question are run manually, often by an operations team member. The scripts are provided either by the application team or by a central operations team. This approach is commonly used when organizations and teams are just getting started with IaC and may not have the time or knowledge to work infrastructure updates into automated pipelines.
  • GitOps – coined by WeaveWorks (link), this model centers on kicking off infrastructure updates via operations in Git, usually a merge action. While it is not necessarily driven by a Continuous Integration (CI) process, that is the most common approach. The key to operating with this model is to ensure ALL changes to infrastructure are performed via an update to source control, thereby guaranteeing that what is in source represents what is deployed.
  • NoOps – NoOps is a derivation of GitOps which emphasizes a lack of operations involvement per se. Instead of running scripts based on a Git operation or manually, they are ALWAYS run with each check-in. Through this, application teams take over the ownership of their operations responsibilities. This is the quintessential model for teams operating in a DevOps-centric culture.

Which operating model you select is impacted, again, by the nature of the infrastructure being supported, but also by your team’s maturity. DevOps and IaC are a journey, not a destination. Not all teams progress (or need to progress) to the same point.

Centralized Control for Decentralized Teams

In DevOps, and DevSecOps, the question is first, how to involve the necessary disciplines in the application development process such that no specific concern is omitted or delayed – security often gets the short end of the stick. I cannot tell you how many projects I have seen save their security audit for near the end of the project. Rarely does their audit not yield issues and, depending on the timeline and criticality, some organizations ignore the results and recommendations of these audits at their own peril.

I can recall a project for a healthcare client that I was party to years ago. The project did not go well and encountered many problems throughout its development. As a result, the security audit was pushed to the end of the project. When it happened, the auditor noted that the application did not encrypt sensitive data and was not compliant with many HIPAA regulations. The team took the feedback and concluded it would take 2-3 months to address the problems.

Given where the project was and the relationship with the client, we were told to deliver the application as is. The result was disastrous. The client ended up suing our company. The outcome was not disclosed. But it just goes to show that security must be checked early and often.

DevOps, and DevSecOps, approach this in a couple of key ways:

  1. The use of the Liaison model (popularized by Google) in which the key areas of Infrastructure, Quality, Security, and Operations delegate a representative who is part time on projects to ensure teams have access to the resources and knowledge needed to carry out tasks.
  2. Creation of infrastructure resources is done through shared libraries which are “blessed” by teams to ensure that certain common features are created.

IaC can help teams shore up #2. Imagine if each time a team wanted to create a VM they had to use a specific module that would limit what OS images they could use, ensure certain ports were closed, and install standard monitoring and security settings on the machine. This brings about consistency while still allowing teams to self service as needed. For operations, the module could require a tag for the created instances so operations can track them centrally. The possibilities are endless.

This is what is meant by “centralized control for decentralized teams”. Teams could even work with Infrastructure, Operations, and Security to make changes to these libraries in controlled ways. This lets the organization maintain control over the decentralization necessary to allow teams to operate efficiently.
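The VM module imagined above could start from something like the following input definitions. This is only a sketch: the file layout, variable names, approved image keys, and tag names are all hypothetical, and the resources that would actually create the VM, lock down ports, and install monitoring are left out.

```hcl
# modules/vm/variables.tf (hypothetical module layout)

# Teams pick an image by a friendly key; anything not on the
# approved list is rejected before a plan is even produced.
variable "os_image" {
  type        = string
  description = "Key of an approved OS image."

  validation {
    condition     = contains(["ubuntu-22.04", "windows-2019"], var.os_image)
    error_message = "The os_image value must be one of the approved images: ubuntu-22.04, windows-2019."
  }
}

# Required input with no default: callers cannot create a VM
# without saying who owns it.
variable "owner" {
  type        = string
  description = "Team that owns the VM; applied as a tag so operations can track it."
}

variable "tags" {
  type        = map(string)
  description = "Optional extra tags supplied by the calling team."
  default     = {}
}

locals {
  # Caller-supplied tags are merged first, so the mandatory tags
  # always win if a caller tries to override them.
  tags = merge(var.tags, {
    owner      = var.owner
    managed_by = "terraform"
  })
}

# Exposed so callers (and central tooling) can see the final tag set.
output "tags" {
  value = local.tags
}
```

A calling team still self-serves with a normal `module` block, but the choices that Infrastructure, Operations, and Security care about are made once, inside the module.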

Using Modules with Terraform

Most IaC tools (if not all) support this modularization concept to some degree, and Terraform (link) is no exception. The use of modules can ensure that the services teams deploy conform to certain specifications. Further, since Terraform modules are simply directories containing code files, they can easily be zipped and deployed to an “artifact” server (Azure Artifacts or GitHub Packages, to name a couple) where other teams can download the latest version or a specific version.
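As a rough illustration of consuming a published module, Terraform accepts an archive URL (or a Git reference) as a module `source`. The URLs, the version number, and the input names below are hypothetical; how an archive is actually exposed depends on the artifact server in use.

```hcl
module "vnet" {
  # Hypothetical artifact URL. Terraform downloads and unpacks the
  # archive; the version is pinned simply by being part of the path.
  source = "https://artifacts.example.com/terraform-modules/virtualnetwork/1.2.0/virtualnetwork.zip"

  # Alternative: pin to a tagged release in a (hypothetical) shared Git repository.
  # source = "git::https://example.com/platform/terraform-modules.git//virtualnetwork?ref=v1.2.0"

  # Hypothetical inputs; the real names depend on the module being called.
  network_name        = "secureapp"
  resource_group_name = "rg-secureapp"
  address_space       = ["10.1.0.0/16"]
}
```

Pinning to an explicit version keeps a module update from silently changing a team's infrastructure; upgrading becomes a reviewed change to the `source` line.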

Let’s take a look at what a script that uses modules can look like. This is an example application that leverages Private Endpoint to ensure traffic from the Azure App Service to the Azure Blob Storage Container never leaves the VNet. Further, it uses an MSI (Managed Service Identity) with RBAC (Role Based Access Control) to grant specific rights on the target container to the Identity representing the App Service. This is a typical approach to building secure applications in Azure.

# create the resource group
resource azurerm_resource_group this {
  name = "rg-secureapp2"
  location = "eastus2"
}

# create random string generator
resource random_string this {
  length = 4
  special = false
  upper = false
  number = true
}

locals {
  resource_base_name = "secureapp${random_string.this.result}"
  allowed_ips = var.my_ip == null ? [] : [ var.my_ip ]
}

# create the private vnet
module vnet {
  source = "./modules/virtualnetwork"
  depends_on = [
    azurerm_resource_group.this
  ]

  network_name = "secureapp2"
  resource_group_name = azurerm_resource_group.this.name
  resource_group_location = azurerm_resource_group.this.location
  address_space = [ "10.1.0.0/16" ]
  subnets = {
    storage_subnet = {
      name = "storage"
      address_prefix = "10.1.1.0/24"
      allow_private_endpoint_policy = true
      service_endpoints = [ "Microsoft.Storage" ]
    }
    apps_subnet = {
      name = "apps"
      address_prefix = "10.1.2.0/24"
      delegations = {
        appservice = {
          name = "appservice-delegation"
          service_delegations = {
            webfarm = {
              name = "Microsoft.Web/serverFarms"
              actions = [
                "Microsoft.Network/virtualNetworks/subnets/action"
              ]
            }
          }
        }
      }
    }
  }
}

# create storage account
module storage {
  source = "./modules/storage"
  depends_on = [
    module.vnet
  ]

  resource_group_name = azurerm_resource_group.this.name
  resource_group_location = azurerm_resource_group.this.location
  storage_account_name = local.resource_base_name
  container_name = "pictures"
  vnet_id = module.vnet.vnet_id
  allowed_ips = local.allowed_ips
  private_endpoints = {
    pe = {
      name = "pe-${local.resource_base_name}"
      subnet_id = module.vnet.subnets["storage"]
      subresource_names = [ "blob" ]
    }
  }
}

# create app service
module appservice {
  source = "./modules/appservice"
  depends_on = [
    module.storage
  ]

  resource_group_name = azurerm_resource_group.this.name
  resource_group_location = azurerm_resource_group.this.location
  appservice_name = local.resource_base_name
  storage_account_endpoint = module.storage.container_endpoint
  private_connections = {
    pc = {
      subnet_id = module.vnet.subnets["apps"]
    }
  }
}

# assign the identity to the storage account roles
resource azurerm_role_assignment this {
  scope = module.storage.storage_account_container_id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id = module.appservice.appservice_identity_id
  depends_on = [
    module.appservice,
    module.storage
  ]
}

For this particular script, the modules are all defined locally, so I am not downloading them from a central store, but doing so would be trivial. Terraform modules also give the ability to hide certain bits of logic from the callers. For example, there are a variety of rules which must be followed when setting up a Private Endpoint for an Azure Storage account (creation of DNS zone, usage of the correct Private IP, specific names which must be used), all of which can be encapsulated within the module.
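To give a feel for what such encapsulation might contain, here is a sketch of the inside of a storage module's Private Endpoint logic. It is not the module from the linked repository; the variable names are hypothetical, and it assumes the public Azure cloud, where the private DNS zone for blob storage must be named `privatelink.blob.core.windows.net`.

```hcl
# Hypothetical module inputs.
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "vnet_id" { type = string }
variable "subnet_id" { type = string }
variable "storage_account_id" { type = string }
variable "endpoint_name" { type = string }

# The zone name is fixed by Azure for blob private endpoints, so the
# module hard-codes it rather than asking every caller to know it.
resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = var.resource_group_name
}

# Link the zone to the VNet so resources inside it resolve the
# storage account to its private IP.
resource "azurerm_private_dns_zone_virtual_network_link" "blob" {
  name                  = "${var.endpoint_name}-dnslink"
  resource_group_name   = var.resource_group_name
  private_dns_zone_name = azurerm_private_dns_zone.blob.name
  virtual_network_id    = var.vnet_id
}

resource "azurerm_private_endpoint" "blob" {
  name                = var.endpoint_name
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id

  private_service_connection {
    name                           = "${var.endpoint_name}-connection"
    private_connection_resource_id = var.storage_account_id
    subresource_names              = ["blob"]
    is_manual_connection           = false
  }

  # Registers the endpoint's private IP in the zone above, so the
  # DNS record does not have to be managed by hand.
  private_dns_zone_group {
    name                 = "default"
    private_dns_zone_ids = [azurerm_private_dns_zone.blob.id]
  }
}
```

The caller only supplies a subnet and a name; the DNS zone naming and record wiring stay inside the module where a platform team can keep them correct.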

There are even validation rules which can be written for module input parameters which, again, allow Infrastructure or Security to enforce their core concerns on the teams using the modules. This is the power of IaC in large organizations. It’s not an easy level to achieve, but achieving it helps teams gain efficiencies which were previously difficult, if not impossible, to achieve.
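A validation rule is declared directly on the module's input variable. The sketch below is an assumption about how the storage module's `storage_account_name` input could be guarded (the original module may not do this); it encodes the Azure rule that storage account names are 3 to 24 characters of lowercase letters and digits.

```hcl
variable "storage_account_name" {
  type        = string
  description = "Name of the storage account to create."

  validation {
    # regex() raises an error when there is no match; can() turns
    # that error into false so the condition stays a boolean.
    condition     = can(regex("^[a-z0-9]{3,24}$", var.storage_account_name))
    error_message = "The storage_account_name value must be 3 to 24 characters long and contain only lowercase letters and digits."
  }
}
```

With this in place a bad name fails at `terraform plan` with the module author's message, instead of surfacing later as an API error during `apply`.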

Full source code is available here: https://github.com/jfarrell-examples/SecureAppTF

Lessons to Learn

DevOps is always an interesting conversation with clients. Many managers and organizations are looking for a path to get from Point A to Point B. Sadly, DevOps does not work that way. In many ways, as I often have to explain, it’s a journey, not a destination. The way DevOps is embraced will change from team to team, organization to organization, person to person.

One of the key mistakes I see clients happen upon is “too much, too soon”. Many elements of DevOps take a good amount of time and pivoting to get used to. Infrastructure as Code is one such element that can take an especially long time (this is not to imply that DevOps and IaC must go together. IaC is a component of DevOps, yes, but it can equally stand on its own).

It is important, with any transformation, to start small. I have seen organizations foist upon their teams a sort of mandate to “codify everything” or “automate everything”. While well intentioned, this advice comes from a misunderstanding of DevOps as a “thing” rather than a culture.

Obviously, advice on DevOps transformations is out of scope for this post and is unique to each client situation. But, it is important to be committed to a long term plan. Embracing IaC (and DevOps) is not something that happens overnight and there are conversations that need to take place and, as with any new thing, you will need to address the political ramifications of changing the way work is done – be prepared for resistance.
