Serverless Proxy Pattern: Part 1

Today, I am going to introduce something I have been experimenting with over the last few weeks in my spare time. I really appreciate the benefits of serverless, but I have always felt there was so much more to it than simple Lambda functions and the like. So I wanted to explore how I could create entire systems that feature little to no code at all and still support complex functionality.

This will be the first part in a multi-part series where I use CloudFormation to build a Thumbnail Creator / Image Analyzer app in AWS that leverages this approach to serverless.

The API Gateway

One of the central pieces of this pattern, particularly for web apps, is API Gateway (API Management on Azure) and its ability to integrate with backend services such as S3 and DynamoDB (Storage and Cosmos DB on Azure), allowing pass-through calls to these services to handle common operations.

In our example, we will proxy both S3 and DynamoDB with API Gateway to support both the reading of image data stored in DynamoDB and the storing of raw images in an S3 bucket. All of this is created using CloudFormation so that it can be stood up again and again as needed.

DevOps

To make deployments easier, I leverage CI/CD services via Azure DevOps, as it provides a superior experience to the CI/CD tooling offered by the AWS platform. Here I use the YAML-based pipeline syntax for Builds to build the two Lambda functions in tandem and publish my CloudFormation YAML template. A Release pipeline uploads my code artifacts to S3 (the best place to source the Lambda binaries) and creates/updates the stack represented by my CloudFormation template.
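
The release side is not shown in full here, but as a rough sketch it boils down to a couple of AWS CLI calls. This is purely illustrative: the artifact paths, bucket name, and stack name below are placeholders, and it assumes AWS credentials are already available to the agent.

# Illustrative sketch of the Release steps as script tasks; names and paths are placeholders
steps:
  - script: aws s3 cp $(System.DefaultWorkingDirectory)/lambdas s3://my-artifact-bucket/lambdas --recursive
    displayName: Upload Lambda binaries to S3

  - script: >
      aws cloudformation deploy
      --template-file $(System.DefaultWorkingDirectory)/cloudformation/template.yaml
      --stack-name thumbnail-creator-stack
      --capabilities CAPABILITY_NAMED_IAM
    displayName: Create/Update the CloudFormation stack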

Getting Started: Create our Role

In AWS, Roles play a vital role in ensuring security for applications interacting with AWS services. Amazon recommends using Roles over credentials since they eliminate the need to keep passwords floating around and are, generally, more flexible. Here is the starting point for the role we will use throughout:
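
The full role definition is rather long, so here is a trimmed sketch of the general shape it takes. The resource and policy names and the trusted services are illustrative; the wide-open actions mirror the intentionally permissive starting point discussed below.

  AppRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
                - apigateway.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: s3-access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: 's3:*'
                Resource: '*'
        - PolicyName: lambda-access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: 'lambda:*'
                Resource: '*'
        # similar policies grant xray:*, logs:*, and cloudwatch:* on all resources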

As you can see, this Role features A LOT of policies. There is a solid case to be made that we would be better off splitting this role into smaller roles so we lessen the amount of damage that can be done if an attacker were to somehow gain access to a service with this role.

On the flip side, one of the advantages to serverless is a decreased attack surface for attackers in general. As a general rule, the less of “my” stuff in the wild, the less chance there is for an attack – I trust Amazon (and Microsoft) more with security than I do myself.

The other issue with this role definition is it is very open – for example:

  • The role is given access to ALL permissions for S3, XRay, Lambda, Logs, and CloudWatch for ANY resource

This is very bad since it means anyone can use this role to look at (or access) anything. Obviously, when we build applications we want to constrain them to only the things that they care about.  As a rule, we should find ourselves rarely, if ever, using *.

Be that as it may, I am starting this way to remove permission concerns from our plate as we develop this application. Towards the end, we will come back and update these permissions to only what we need and only on the resources that are part of our application. It is called out here as a warning not to use this in a production system.

Create the buckets

For our application we will need two buckets: One to handle raw images and one to handle the generated thumbnails of those images. Here is the YAML template for our bucket creation:
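
The exact template is not reproduced here, but a minimal sketch of the relevant pieces looks roughly like this; the parameter names match the prose below, and the ImageUploadTopic reference points at the SNS topic created in the next section.

Parameters:
  RawBucketName:
    Type: String
  ThumbnailBucketName:
    Type: String

Resources:
  RawBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref RawBucketName
      NotificationConfiguration:
        TopicConfigurations:
          - Event: 's3:ObjectCreated:*'
            Topic: !Ref ImageUploadTopic
  ThumbnailBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref ThumbnailBucketName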

This being CloudFormation, we will of course allow the calling process to dictate what names we should use for our buckets. Keep in mind, if you decide to run this, bucket names MUST be globally unique, so you might have to get creative.

For the most part I assume this template is pretty self-explanatory: we are creating resources named RawBucket and ThumbnailBucket, each of type AWS::S3::Bucket, and we use the values passed in via the RawBucketName and ThumbnailBucketName parameters. I will point out that the bucket names really are for display purposes; the resource names are the main block for configuration and what you will reference throughout the template (RawBucket and ThumbnailBucket in this case).

Where this might get a bit fuzzy is the NotificationConfiguration section under RawBucket. If you are not aware, AWS allows you to configure notifications from S3 buckets when certain events happen. This was really the birthplace of serverless: responding quickly to these internal events. By taking this approach, we can build very complex systems without needing to write a lot of code ourselves; we just plug services together.

S3 buckets support a number of notification types including Lambda, Topic, and Queue, each of which has valid use cases. One limitation to keep in mind with S3 notifications is THEY ARE DELIVERED ONCE. This means, if you want to do fan-out actions, you MUST use something that can do that for you – SNS is used most often for this case (TopicConfiguration). This is what I will be using since I want to perform Image Analysis AND Thumbnail Creation via Lambda when a new object is PUT into the raw bucket.
Docs: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-bucket-notificationconfig.html

Looking at the above source, you can see we reference the SNS Topic ImageUploadTopic and we send the notification ONLY for s3:ObjectCreated events.

Create the SNS Topic

Here is the YAML for the SNS Topic:
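
Again, a sketch rather than the full template; the two Lambda function resources referenced here are created in Part 2, so their names are placeholders. In practice you will also need a topic policy (and Lambda permissions) so that S3 can publish to the topic and SNS can invoke the functions.

  ImageUploadTopic:
    Type: AWS::SNS::Topic
    Properties:
      Subscription:
        - Protocol: lambda
          Endpoint: !GetAtt ImageAnalyzerFunction.Arn
        - Protocol: lambda
          Endpoint: !GetAtt ThumbnailCreatorFunction.Arn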

Here we define the subscriptions that denote which services SNS will notify, and how, when a message is received.

We have not created these Lambda functions yet; we will do so in Part 2.

If you want to skip ahead here is the complete source I am using:
https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1

Part 2 here

 

 

Reflecting on a DevOps Transformation

At West Monroe Partners, one of my responsibilities is helping clients and teams understand how better DevOps can facilitate greater value and productivity. Recently, I was given the opportunity to assist one of our teams in transforming their DevOps process and correcting certain inefficiencies that were holding the teams back from maximizing their efforts.

I joined the team with the aim of assisting its overall development as well as addressing these inefficiencies.

Phase 0: Gain an Understanding

In any engagement, knowledge is paramount. We must be able to speak with our clients on their level and work to gain their trust so as to better understand their problems. At its heart, DevOps refers more to culture than to any specific toolset, and it will always look different within each organization. This being the case, we must respect these differences and work to create a process that emphasizes what the client needs rather than sticking to a formulaic approach.

In this case, I was joining the team 10 months into the project and thus would have a limited understanding of the architecture and client relationship. It was paramount that I relax and listen to gain this knowledge and allow for the accurate identification of areas where I could provide maximum value. By doing this, I was aiming to establish trust with the client rather than becoming a “talking head”. Trust is crucial in DevOps work since, without it, you cannot expect the organization to take your ideas seriously and act on them.

Thus, I spent a large portion of my initial time talking with the team leaders and sitting in on meetings, offering only limited input. This allowed me to gain a better understanding of the overall situation and to better assess the problem areas. Ultimately, I came to four goals that I felt our teams had to address to put them in a position to be successful:

  • Reduce Dev build times from 30+ minutes to more manageable levels, allowing developers to use Dev to validate their changes
  • Create better consistency by promoting a simpler build and release model that makes building applications simpler for developers
  • Break apart the large mono repo whose size had become a hindrance to the development teams and clouded the client’s desired deployment strategy
  • Emphasize Infrastructure as Code to enable the quick creation of environments, including those that would be ephemeral in nature

The build times were the main issue here, as the teams had made the decision to combine the Dev and QA builds into a single serial pipeline. While this did aid in promotion, it bottlenecked the process by holding the Dev artifact until the QA artifact completed, and the latter had a much longer process.

Phase 0a: Establish What I CAN do

With my goals established, the next step was to talk our client through the process and articulate the perceived value each change would bring. Based on these conversations, I ruled out the latter two goals. The proposal that was accepted basically flowed as such:

  • The existing structure of individualized Dev/QA builds would be broken apart and only unit tests would be run, enabling greater speed for developer validation. The client also agreed to drop signing, obfuscation, and WhiteSource scanning requirements from these builds
  • The segregated builds would then be merged back into a parallelized build that would create a “package” that could be delivered to the Azure DevOps release pipeline. This would also include a refactoring effort to reduce duplication and promote reusability

With that agreed to, it was time to get started.

Phase 1: Divide the builds (1wk – against active pipeline)

The first phase was accomplished fairly quickly with the main challenge being marking our tests as “unit” to ensure we still could maintain quality without running every test. The results from this were very positive:

  • Project 1
    • Average Dev build time went from 35m to 17m
    • Release build times remained unchanged since speeding them up was not the objective
  • Project 2
    • Average Dev build time went from 45m to 20m
    • Release build times remained unchanged since speeding them up was not the objective

As you can see, the build times decreased significantly and became more manageable for both teams.

Note:

There is nothing inherently wrong with combining your Dev/QA builds since it does enable easier promotion. However, it is important to be mindful of this choice since there are usually more steps involved in creating the QA/Release build than the CI/Dev build. If you opt for this approach, your builds will take as long as the longest build.

These were solid results, and overall the team and I were pleased with the progress. One drawback was that the split doubled the number of builds and added extra pressure on the team when creating builds; not ideal, particularly as Project #1 was already a tricky project to build correctly and this move seemed to shift the goal posts. However, these problems were expected since the split was designed as a preparatory move for future work.

With that out of the way, I began preparing to move to Phase 2. Before I could begin, however, I was told to hold off as the client was reconsidering their position on the mono Git repository.

Break apart the Mono Repo

I am not a huge fan of large mono repos as I believe they indirectly communicate a level of monolithic thinking that is progressively finding itself in the minority within our industry. That said, I will point out that there is nothing inherently wrong with using a single repository for multiple applications. In fact, large projects at Microsoft, Google, and other big shops are often housed in single repos and this is true for many legacy applications, particularly those built before the invention of Git.

Within our project, the large repo had consistently been a roadblock, as it made it difficult to work due to the sheer size and number of files being tracked; this was a pain that the development teams on both their side and ours experienced first hand. I think it was this realization that led them to backtrack on the single repo design and explore a more segregated organization for the codebase.

In the end, the agreement was made to split the mono repo out into smaller repos to alleviate the performance problems. Our teams became responsible for setting up and configuring the following repos:

  • Repo for Project #1 – containing code that was to be deployed on-prem to clients. It contained the most complex build requirements as it was comprised of 6 separate modules
  • Repo for Project #2 – Containing code for cloud hosted applications which lived in Azure.
  • Repo for Frontend UIs supporting all Projects – these were written using Angular and supported users interacting with each of the applications
  • Repo for Automated UI-Testing for all Projects – these are tests written using Selenium that interacted with the UI of each Project to verify functionality

The most difficult element of this process was refactoring previously shared code into a suitable form. We decided to tackle this as it was encountered, opting for either duplication or NuGet packaging as the means to break it out.

I cannot overstate how beneficial this break-up was, if for no other reason than showcasing that Git for Windows (or perhaps NTFS in general) is abysmal at handling large numbers of files. Our speed of interaction with Git went up by at least two orders of magnitude, and it contributed immensely to our team reaching a record velocity. For myself, it underscored the importance of critically approaching project organization as a core part of the planning process.

Phase 2: Merge the builds (2wks)

With the code repos split and the teams enjoying better productivity than they had at any point over the course of the project, I turned my attention to the final portion of this work: creation of a merged build to facilitate the efficient build of Project #1.

This effort was aimed at refactoring the existing segregated pipelines into a merged version designed to fulfill the following goals:

  • Remove duplicative tasks such as restores and builds
  • Remove “baggage tasks”, that is, tasks that rely on scripts stored in source control, and use the standard tasks offered by Azure DevOps instead
  • Leverage parallelism to maximize the flow of builds and remove any bottlenecks

“Baggage tasks” are tasks backed by programs or scripts stored in source control. I consider this an anti-pattern as it unnecessarily ties you to your source control and needlessly bloats the size of your code base. Most modern DevOps platforms have built-in tasks that can handle these operations for you while removing their maintenance from the realm of your responsibility. In addition, as you move to a multi-repo approach, you end up duplicating these scripts, adding another layer of maintenance to your responsibilities.

The goal here is to effectively utilize our build platform and its features to increase simplicity and maintainability while reducing ceremony and “inherent knowledge”. In doing so, teams find the pipelines can be maintained more easily by anyone, and problems are easier to discover because everyone understands the process.
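
To make that concrete, here is a purely illustrative sketch of what a parallelized build looks like in the Azure DevOps YAML syntax; the module names, solution paths, and agent image are all made up, but the key point is that independent jobs run in parallel and rely only on built-in tasks.

jobs:
  - job: BuildModuleA
    pool:
      vmImage: 'windows-latest'
    steps:
      # built-in script step instead of a "baggage" script pulled from source control
      - script: dotnet build ModuleA/ModuleA.sln -c Release -o $(Build.ArtifactStagingDirectory)/ModuleA
        displayName: Build Module A
      - task: PublishBuildArtifacts@1
        inputs:
          PathtoPublish: '$(Build.ArtifactStagingDirectory)/ModuleA'
          ArtifactName: 'module-a'

  - job: BuildModuleB
    pool:
      vmImage: 'windows-latest'
    steps:
      - script: dotnet build ModuleB/ModuleB.sln -c Release -o $(Build.ArtifactStagingDirectory)/ModuleB
        displayName: Build Module B
      - task: PublishBuildArtifacts@1
        inputs:
          PathtoPublish: '$(Build.ArtifactStagingDirectory)/ModuleB'
          ArtifactName: 'module-b'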

Completing the Merged Build: The results

The results were, in a way, phenomenal. What once took 15m now took 5m (CI build), and what once took 35m now took 15m (Release build). All of this resulted from a build script that was cleaner and more efficient than anything that had been there previously.

Further, since everything was under a single build, we went from 12 separate build pipelines to 2, further enhancing developer productivity. This was the culmination of what we had set out to do.

The lone area I was not able to address was the notion of “true promotion”. That is, developers still had to kick off release builds by specifying the commit hash, though it only had to happen once now instead of 6 times – this let us achieve our goal of a consistent build. This is a shortcoming in the way Azure DevOps approaches the concept of a build pipeline.

Closing Thoughts

Overall, I am very pleased with how the new process is working and how it enables our teams to perform better. Perhaps more important, though, are the lessons it taught us and our leadership: just how important it is to be clear and committed to DevOps from Day 0 and to ensure the patterns and practices for the project are designated up front and implemented from the get-go.

Testing the Water with EKS

Elastic Kubernetes Service, or EKS, is Amazon's managed Kubernetes offering (akin to Azure Kubernetes Service and Google Kubernetes Engine). As is often the case, Amazon takes a more infrastructure-heavy approach than Azure, meaning you will need to understand VPCs and subnets when setting things up, since you will be defining these yourself.

The good news is, Amazon offers a quick start tutorial here https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html that also provides the CloudFormation templates to set up the infrastructure for both the Control Plane and the Data Plane.

Before you get started

Amazon imposes limits on the EC2 instances an account is allowed to create, and often these limits are very low – too low to support EKS. This is done to prevent users from accidentally spinning up resources they cannot pay for. You can see your limits here: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Limits: (If the link does not work, select EC2 and you will see Limits listed near the top of the left hand navigation).

The MINIMUM instance size that EKS can be used with is t2.small – you may have to request a limit increase, which takes a day or two. If you do NOT do this, you will run into errors when you set up the data plane. The documentation does not call this out and it can be quite frustrating.

Part 1 – Initial Setup

Part 1 of the tutorial is divided among a few prerequisite actions: creation of the VPC and related subnets via CloudFormation, IAM Role creation, and verifying installation of the AWS CLI and kubectl.

If you have worked with Amazon before, you will be familiar with the need to use VPCs to contain resources. This helps things stay organized and also provides a minimum level of security. What is not called out, again, is the need to raise your EC2 limits before executing this tutorial.

At a minimum, you will need to be able to deploy EC2 instances of type t2.small to support the Data Plane, if you are a new console user, you will likely have to request a limit increase. This action does not cost anything and the limit is purely in place to prevent new users from racking up charges for resources they do not need.

I should point out that this limit increase is ONLY needed for the Data Plane, not the Control Plane so, it is possible to get through half of the tutorial without it. However, you will find yourself unable to deploy Kubernetes resources without a working data plane. You can view your limits here: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Limits: (If the link does not work look for the Limits option under EC2)

Performing the Steps

Performing the steps is rather easy since most of the nitty gritty work is handled by CloudFormation. I encourage you to look over the templates and get a sense of what is being created. For the most part it's the standard VPC with subnets and an internet gateway.

Create the Control Plane

When we talk about managed Kubernetes, what is actually being referred to is a managed Control Plane. The Control Plane monitors and governs everything going on in a cluster so, it must be maintained at all costs. Kubernetes is designed to recover, automatically, from the loss of resources within the cluster. It can achieve this because the control plane responds to and addresses these problems.

Regarding the tutorial, it is straightforward and should be relatively easy. The one caution I will offer is to ensure the user that creates the cluster in the console is the SAME user your AWS CLI is configured to connect as (this is the default). If you fail to do this, you can receive authentication errors unless additional configuration is applied.

Update your kubectl context

The primary way to deploy resources to Kubernetes is via the managed API hosted on the Control Plane. This communication is handled via the kubectl CLI. kubectl operates via a “context” which tells it where commands will be executed. This is the purpose of the update-kubeconfig command in the next section. If you want to see a list of all your contexts, execute the following command:

kubectl config get-contexts

Each line entry here indicates a context you can use to talk to a Kubernetes cluster

Execute the final command in this section to verify you can talk to the Kubernetes Control Plane. If you run the following command you can see a host of resources in the Pending state – these will be deployed once a Data Plane is added to the cluster (next section)

kubectl get pods

Create the Data Plane

This next section is where you will get impacted if you have NOT raised your EC2 limits. EKS uses EC2 instances to support the Kubernetes Data Plane. Effectively, Kubernetes is nothing more than a resource scheduler. It schedules resources to run and uses the block of compute resources that are the Worker Nodes (EC2 instances) to host those resources.

On ephemeralism: the concept of ephemerality is very common within the container ecosystem and Kubernetes. Everything within the cluster (outside the Control Plane) must be treated as ephemeral. This means you do NOT want to persist state anywhere within the cluster, as you can lose it at any time.

I won't go into solutions for this but, when you deploy items that persist state in Kubernetes, you need to be extra aware of where that state is actually being persisted.

Follow the instructions in this section. I recommend keeping the number of nodes to around 1 or 2 if this is for development and testing. Remember, in addition to paying for cluster time and the resources related to the VPC, you will also be paying for the EC2 instances – this can add up quickly. I recommend using t2.small for testing purposes as it works out to be the cheapest.

Add Your Nodes into the Cluster

As an extra step, once you create the EC2 instances that will be the worker nodes in the cluster, you need to add them to the cluster. I have yet to find an option that enables auto provisioning (this might be Fargate territory).
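
The tutorial does this by having you apply an aws-auth ConfigMap that maps the worker nodes' instance role into the cluster. Roughly, it looks like the following; the role ARN is a placeholder that you replace with the NodeInstanceRole output from the worker node stack.

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/MyWorkerNodeInstanceRole
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes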

Once you finish executing the commands run the following command:

kubectl get pods

With luck, you should now see movement in the status of your pods (mine were pretty fast and came to Running in seconds). Congrats, your cluster is now working. To prove that, let's launch the Kubernetes Cluster Dashboard by following the instructions here: https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html

Let’s Deploy Something

Our cluster is pretty useless as is, so let's deploy an API to it. For this, I wrote up a pretty basic .NET Core Web API that does math; here is the source of the main controller:

Next, I create the Docker image using this Dockerfile

I recommend building it in the following way:

docker build -t calc-api:v1 .

Next, you have a decision to make. Assuming you have set up authentication with Docker Hub (via docker login), you can tag the image with your username and push it; for me:

docker tag calc-api:v1 xximjasonxx/calc-api:v1

docker push xximjasonxx/calc-api:v1

Or, if you want to take the Amazon route, you can create an Elastic Container Registry (ECR) and push the image there. To do this, simply select ECR from the Service options and create a registry. Once that is complete, Amazon will provide you with the appropriate commands.

The main point to understand is that Kubernetes will expand and contract the number of Pods that host your app as needed. To run containers on these Pods, the source images need to be in an accessible location; that is why we use a registry.

Once your image is pushed up, you can apply a spec file to add the resource to Kubernetes. Here is my deployment spec (I am using ECR):
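
The original file is not reproduced here, but a minimal sketch of a deployment spec for this API might look like the following; the ECR registry URI is a placeholder, and the labels are chosen to line up with the service spec shown a bit later.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: calc-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: calc-api
      type: api
  template:
    metadata:
      labels:
        app: calc-api
        type: api
    spec:
      containers:
        - name: calc-api
          # placeholder ECR image URI; swap in your own account/region/repository
          image: 111122223333.dkr.ecr.us-east-1.amazonaws.com/calc-api:v1
          ports:
            - containerPort: 80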

Run the apply command as such:

kubectl apply -f deployment-spec.yaml

Once the command completes run this command and wait for the new pods to enter the Running state:

kubectl get pods --watch

Congrats, you deployed your first application to Kubernetes on EKS. The problem is, this isn't very useful yet because the cluster offers us no way to call our API. For that, we will create a Service.

Accessing our Pods

When it comes to accessing resources within a cluster there are a couple of options: Services and Ingress. We won't discuss Ingress here, as it is a rather large topic. For this simple example a Service will be fine.

Here is the documentation from Kubernetes on Services: https://kubernetes.io/docs/concepts/services-networking/service/

What you need to understand is Services are, simply, the mechanisms by which we address a group of Pods. They come in four flavors: ClusterIP (default), NodePort, LoadBalancer, and ExternalName.

Locally, I like to use NodePort because I am often using minikube. When you deploy to the Cloud, the recommendation is to use LoadBalancer. Doing so will have AWS automatically deploy a LoadBalancer with an external hostname. Here is my servicespec:
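
Again, a sketch rather than the exact file; the selector values are the ones used in the deployment sketch above, and the port numbers assume the API listens on port 80 inside the container.

apiVersion: v1
kind: Service
metadata:
  name: calc-api-service
spec:
  type: LoadBalancer
  selector:
    app: calc-api
    type: api
  ports:
    - port: 80
      targetPort: 80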

Of note here is the selector node. This tells the Service which Pods it is addressing; you can see the app and type values match those from the Deployment spec above.

Execute the following command:

kubectl apply -f service-spec.yaml

Next use the following command to know when the service is available and addressable:

kubectl get svc --watch

Here is what my final output looked like:

[Screenshot: kubectl get svc output showing the LoadBalancer service with its external hostname]

Once this is up you can use Postman or whatnot to access the endpoints on your API.

Congrats – you have deployed your first application to EKS. Do NOT forget to tear everything down, EKS is not something you want to leave running for a long duration in a personal use context.

My Thoughts

So, going into this, my experience had been more on the Azure side and with minikube than with EKS. Unsurprisingly, I found EKS to be a bit more technical and demanding than AKS, mostly due to the need for a limit increase not being documented and the heavier emphasis on infrastructure, which is typical of many of Amazon's services; in contrast, AKS hides much of this from you.

Overall, the rise of managed Kubernetes services like EKS is very good for the industry and represents a step closer to where I believe applications need to be: not caring about the underlying services or plumbing, but simply deployed as code alongside a definition of what it needs. That is still a ways off, but it is fascinating that so much effort was spent getting to the cloud and now, with Kubernetes, we are trying to make it so the question of which cloud no longer matters.

 

Deploying v2 Azure Functions with Terraform

 

You would not think this would be super difficult or, at the least, that Terraform's documentation would cover such a common use case. I found that not to be the case, so I figured I would share the necessary changes.

First, some background – I am using a Terraform script to deploy a variety of Azure resources to support an internal bootcamp I will be giving at West Monroe in August.

Here is the completed Terraform block to deploy the V2 Azure Function:

What isn't covered in the documentation is that, in addition to specifying version, we also need to include linux_fx_version in site_config.

I do not know if this is necessary if you use a Windows based App Service Plan (I am using Linux and sharing it with my App Service).

I found immense frustration in figuring this out and found myself annoyed that I could not find anything but V1 examples in the Terraform docs.

An Intro to Kubernetes

Kubernetes is a container orchestration framework from Google that has become a highly popular and widely used tool for developers seeking to orchestrate containers. The reasons for this are centered around the continued desire by developers and project teams to fully represent all aspects of their application within their codebase, even beyond the source code itself.

In its simplest form, Kubernetes is an automated resource management platform that works to maintain a declared ideal state for a system via the use of YAML-based spec files. While the topic of Kubernetes is very deep and encompasses many aspects of architecture and infrastructure, there are five crucial concepts for the beginner to understand.

The Five Concepts

Cluster – the term Cluster essentially refers to a set of resources being managed by Kubernetes. The cluster can span multiple datacenters and cloud providers. Effectively each Cluster has a single control plane.

Node(s) – represents the individual blocks of compute resource that the cluster is managing. Using a cluster like minikube you only get a single node, whereas many production systems can contain thousands of nodes.

Pod – the most basic resource within Kubernetes. Responsible for hosting one or more containers to carry out a workload. In general, you will want to aim for a single container per pod unless using something like the sidecar pattern

Deployment – at a basic level, ensures a minimum number of Pods are running per a given spec. The Pod count can expand beyond this level, but the replica count ensures a minimum; if the number drops below it, additional Pods are created to ensure the ideal state is maintained.

Service – clusters are, by default, deny all and require services to “punch a hole” into the cluster. In this regard, we can think of a service as a router enabling a load balanced connection to a number of pods that match its declared criteria. Often services are fronted by an Ingress (beyond this post) which enables a cleaner entrance into the cluster for microservice architectures.

Visually, these concepts relate to each other like this:
[Diagram: a Cluster containing Nodes, which host Pods managed by Deployments and addressed by Services]

Options for Deployment

In his well-known guide, Kelsey Hightower of Google lays out how to set up Kubernetes yourself; it is well beyond most developers, myself included. Therefore, most of us will look towards managed options. In the cloud, all of the major players have managed Kubernetes options:

  • Azure Kubernetes Service
  • Google Kubernetes Engine (among other offerings)
  • Elastic Kubernetes Service (AWS)
  • Digital Ocean Kubernetes

Each of these options runs a recent version of Kubernetes and already supports customer deployments. However, one of the advantages Kubernetes has as a resource management platform is the ability to also manage on-prem resources; due to this, we have seen the rise of managed on-prem offerings as well.

There is also minikube (https://kubernetes.io/docs/tasks/tools/install-minikube/) which serves as a prime setting for development and localized testing.

 

Deploying Our application

[Architecture diagram: the StockApp system]

This is the application we are going to deploy; the pieces are:

  • Price Generator – gets latest stock price for some stock symbols and uses a random number generator to publish price changes to RabbitMQ
  • RabbitMQ – installed via Helm chart – receives price change notifications and notifies subscribers
  • StockPrice API – .NET Core Web API – listens for Price Changes and sends price changes to listening clients via SignalR
  • StockPrice App – ReactJS SPA application receives price changes via SignalR and updates price information in its UI

With the exception of RabbitMQ, each of these pieces is deployed as a Docker container from Docker Hub. Here is a sample Dockerfile used to build the .NET pieces:

You can ignore the --environment flag – this was something I was experimenting with for specifying environment-level configuration.

Next we push the image to Docker Hub (or which ever registry we have selected) – https://cloud.docker.com/u/xximjasonxx

For reference, here is the sequence of commands I ran for building and pushing the StockPriceApi container:

docker build -t xximjasonxx/kubedemo-stockapi:v1 .

docker push xximjasonxx/kubedemo-stockapi:v1

Once the images are in a registry we can apply the appropriate spec files to Kubernetes and create the resources.  Here is what the StockAPI spec file looks like:
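
The actual file lives in the repo linked at the end of this post; as a rough sketch, it combines a three-replica Deployment with a Service whose selector matches the Pod labels, something like this (the label values and ports are assumptions):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stockapi
spec:
  replicas: 3
  selector:
    matchLabels:
      app: stockapi
  template:
    metadata:
      labels:
        app: stockapi
    spec:
      containers:
        - name: stockapi
          image: xximjasonxx/kubedemo-stockapi:v1
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: stockapi-service
spec:
  selector:
    app: stockapi
  ports:
    - port: 80
      targetPort: 80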

What is defined here is as follows:

  • Define a Deployment that indicates a minimum of three replicas be present
  • Deployment is given a template for how to create Pods
  • Service is defined which includes matching metadata for the Pods created by the Deployment. This means, no matter how many Pods there are, all can be addressed by the service

To apply this change we run the following command:

kubectl apply -f stockapi-spec.yaml

Advantages to using Kubernetes

The main reason orchestrators like Kubernetes exist is the necessity of using automation to manage the large number of containers required to support higher levels of scale. While that is a valid argument, the greater advantage is the predictability, portability, and manageability of applications running in Kubernetes.

One of the big reasons for the adoption of containers is the ability to put developers as close as possible to the code and environment running in production. Through this, we gain a much higher degree of confidence that our code and designs will execute as expected. Kubernetes takes this a step further and enables us to, in a sense, containerize the application as a whole.

The spec files that I shared can be run anywhere that supports Kubernetes and the application will run more or less the same. This means we can now see the ENTIRE application as we need it, not just pieces of it. This is a huge boon for developers, especially those working on systems that are inherently difficult to test.

When you start to consider, in addition, tools like Ansible, Terraform, and Puppet and how they can effect configuration changes to spec files, and that clusters can span multiple environments (cloud provider to on-prem, multi-cloud, etc.), some interesting scenarios come about.

Source Code is available here: https://github.com/xximjasonxx/kubedemo

I will be giving this presentation at Beer City Code on June 1. It is currently submitted for consideration to Music City Code in Nashville, TN.

Pure Containerized Deploy with Terraform on Azure DevOps

In previous posts I have talked about the importance of Infrastructure as Code in creating a more complete solution that keeps with the core tenet of Cloud Native: applications should manage their own architecture. In this post, I will walk through the process of deploying a container to an Azure App Service with Terraform.

Benefits of IaC (Infrastructure as Code)

When we start talking about cloud deployments, we must inevitably come to see the configuration and deployment of cloud services as being as much a part of the application as the source code itself. Any cloud application where the configuration for services is stored only in the platform itself is courting disaster. A simple hit of the “Delete” key can leave teams scrambling for hours to restore service.

In addition, using IaC makes it very easy to spin up new environments which can be invaluable for testing. In fact, this is a chief benefit of a tool like Kubernetes (Jenkins X leverages this ability to create new environments per pull request). The end goal of DevOps is to see all environments and code handled in a way that requires a minimal amount of human interaction for management.

Terraform

Terraform is created by HashiCorp and is billed as an IaC tool which supports all of the major players in cloud and infrastructure. It serves as an alternative to something like CloudFormation or Azure Resource Manager. Files are defined using the HCL language and use code to represent the targeted infrastructure.

It can be installed from here: https://www.terraform.io/downloads.html

Our application

For this, I am referencing a microservice that I wrote for a side project (ListApp) that returns the Feed of events relevant to a user. At this stage of development, this is nothing more than a hard coded list which gets displayed in the UI.

I have already created the Dockerfile which builds the Docker image that I will use when deploying this image. You will see this referenced in the HCL code later.

Our application will be deployed on Azure.  Reference HashiCorp’s documentation on their Azure provider here to get through the initial steps and get started.

Building the initial pipeline

The way I like to approach Terraform with a .NET Core application is to store my .tf file at the same level as my solution file (or whatever constitutes the root of your application) in a folder called terraform.

Azure DevOps makes it very easy to build pipelines which output Docker images and store them in a registry. But there is a trick to this process if you are going to use Terraform to deploy your code – publishing an artifact.

[Screenshot: the Publish Artifact task added to the build pipeline]

The reason you need to do this is that Azure DevOps operates on the notion of passing artifacts between pipelines and then operating on that artifact (usually you build an artifact and then release it). When your artifact is a Docker container, you will not have an artifact per se; rather, the release pipeline often targets the tagged Docker image in a registry somewhere. But in this case, we need the build to ALSO output our Terraform contents so they can be executed in the Release pipeline. Adding this task will accomplish that.
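
If you are using the YAML build syntax, the equivalent is a PublishBuildArtifacts task pointed at the terraform folder. A sketch, with the folder path and artifact name as assumptions:

- task: PublishBuildArtifacts@1
  inputs:
    # publish the terraform folder alongside the Docker build output
    PathtoPublish: '$(Build.SourcesDirectory)/terraform'
    ArtifactName: 'drop'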

For more information on the actual process of building DevOps pipelines, go here

Before we get into building the release let’s cover off what the .tf file needs to look like. I posted this entry previously (https://jfarrell.net/2019/02/23/infrastructure-as-code-with-terraform/) which describes the .tf file in detail and how you can use it, locally, to deploy a containerized NodeJS application to Azure.

Now let’s talk through the changes needed to use it with Azure DevOps.

Backend State

State is a very important aspect of Terraform; it has to know if it created something previously so it knows what to expect when it finds that resource. A great example is an Azure App Service. Without knowing this state, Terraform may try to create an Azure App Service with the same name as one which already exists, causing a failure.

By default, Terraform stores this state information in a .tfstate file which it references whenever plan and apply are run. This situation changes when you run in DevOps, since you will never have the .tfstate file – builds should always be run from clean environments.

This is where we introduce the concept of “backend state” where Terraform stores its state to a central location that it can reference during the build.  The docs are reasonably good here: https://www.terraform.io/docs/backends/types/azurerm.html.

In the end, what this amounts to is creating a storage account on Azure in which to store this state information.  Here is mine for Feed service:

This is relatively easy to understand: I am laying out which resource group, storage account, container, and blob key I want to use when storing the state.

What is missing here is access_key, and very intentionally so. The docs lay this out quite nicely here: https://www.terraform.io/docs/backends/config.html

Basically, as is often the case, we want to avoid committing sensitive credentials to source control, lest they be discovered by others and give access where it was not intended. We can pass the access_key value when we call init in our Release pipeline.

This is the full .tf file I am going to commit to source control which I will plan and apply in the Release pipeline.

https://gist.github.com/xximjasonxx/422b50cbca80c14f89f5a4f0adecbe6a

Building the Release Pipeline

Returning to Azure DevOps, we can now build our release pipeline. It's simply a set of four steps:

[Screenshot: the four-step release pipeline]

Step 1: We install Terraform into the container in which the release pipeline is being executed
Step 2: We call init, which installs plugins and configures our backend for state storage
Step 3: We plan the deployment, which allows Terraform to determine what changes are needed
Step 4: We apply the changes, which updates our infrastructure as we desire it

So let’s talk specific for each of these steps:

Step 2 – init

[Screenshot: the terraform init task configuration]

Notice the _FeedServiceCIBuild after the DefaultWorkingDirectory – this is the name of your build artifact as it comes out of the Build pipeline. You can find this on the designer screen for the Release pipeline.

We specify the -backend-config option to init in this case providing a key value pair for the access_key. I have hidden the actual value behind a pipeline variable. This will initialize Terraform to use my Azure Storage Account to store the state information

Step 3 – plan

[Screenshot: the terraform plan task configuration]

Again, notice the use of _FeedServiceCIBuild as the root of where the terraform command will be executed.

We are also specifying the tag for the container created by the build pipeline. Reference the completed .tf file above to see how this is used. This is essential to updating our App Service to utilize the latest and greatest version of our code

Step 4 – apply

[Screenshot: the terraform apply task configuration]

If this looks the same as the above, you are not going crazy. apply and plan often look the same.

One Important Note:
If you read online tutorials on using Terraform in CI, they will make mention of using

-input=false

with plan and apply to prevent the system from blocking. Often they will also recommend outputting a tfplan file for consumption by apply. With Azure DevOps, at least, you don't need to do this: the -auto-approve flag is automatically appended to these commands, which appears to be the new flavor for CI tools to use.
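
Putting the four steps together, here is a rough sketch of what they look like if expressed as script tasks in the YAML syntax (the classic Release designer is what the screenshots above show). The artifact alias, variable names, and the assumption that the Terraform CLI is already installed on the agent are all placeholders to adapt.

steps:
  - script: terraform init -backend-config="access_key=$(StorageAccessKey)"
    workingDirectory: '$(System.DefaultWorkingDirectory)/_FeedServiceCIBuild/drop/terraform'
    displayName: terraform init

  - script: terraform plan -var "image_tag=$(Build.BuildId)"
    workingDirectory: '$(System.DefaultWorkingDirectory)/_FeedServiceCIBuild/drop/terraform'
    displayName: terraform plan

  - script: terraform apply -auto-approve -var "image_tag=$(Build.BuildId)"
    workingDirectory: '$(System.DefaultWorkingDirectory)/_FeedServiceCIBuild/drop/terraform'
    displayName: terraform apply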

Testing things out

You should now be able to kick off builds (via CI or manually) which will build a container holding the latest compiled source code. Once this is built, a Release can be kicked off (automatically or manually) to update (or create) the Azure App Service so that it references the new container.

And just like that, you have created a managed build and release that is not only automated but also contains the information for your App Service that would otherwise be stored transiently in the portal. Pretty cool.

Infrastructure as Code with Terraform

The concepts of Infrastructure as Code (IaC) are one of the main pillars of modern DevOps and Cloud Native applications. The general idea is that the software itself should dictate its infrastructure needs and should always be able to quickly and automatically deploy to existing and new environments.

This is important because most applications today tend to use not a single cloud service but many, oftentimes configured in varying ways depending on the environment. The risk here is that, if this configuration lives only in the cloud, then user error or a cloud provider problem can cause this valuable configuration and settings information to be lost. With IaC, a simple rerun of the release script is all that is needed to reprovision your services.

Additionally, doing this mitigates the “vault of knowledge” problem whereby a small group of persons understand how things are set up. If they depart or are otherwise unavailable during an outage the organization can be at risk. The configuration and settings information for infrastructure is as much a part of your application as any DLL or line of code, we need to treat it as such.

To show this in action, we will develop a simple NodeJS application that responds to Http Requests using ExpressJS, Containerize it, and then deploy it to Azure using Terraform.

Step 1: Build the application

When laying out the application, I always find it useful to create a separate directory for my infrastructure code files, in this case I will create a directory called terraform. I store my source files under a directory src.

For this simple application I will use ExpressJS and the default Hello World code from the ExpressJS documentation:

npm install express --save

Create a file index.js – paste the following contents (taken from ExpressJS Hello World: https://expressjs.com/en/starter/hello-world.html)

const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => res.send('Hello World!'))

app.listen(port, () => console.log(`Example app listening on port ${port}!`))

We can run this locally using the following NPM command

npm start

Note, however, that this start script does not come predefined after npm init, so you might have to define it yourself. In essence, it's the same as running node index.js at the command line.

Step 2: Containerize

Containerization is not actually required for this but, let's be honest, if you are not using containers at this point you are only depriving yourself of easier, more consistent deployments; in my view it has become a question of when I do NOT use containers versus the default of using them.

Within our src directory we create a Dockerfile. Below is the contents of my Dockerfile which enable the application from above to be served via a container.

FROM node:jessie
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
ENTRYPOINT [ "npm", "start" ]

We start off by using the node:jessie base image (Jessie is the flavor of Linux inside the Container) – you can find additional base images here: https://hub.docker.com/_/node/

Next we set our directory within the container (where we will execute further commands) – in this case /app – note that you can call this whatever you like

Next we copy everything from the Dockerfile context directory (by default the directory where the Dockerfile lives). Note that for our example we are not creating a .dockerignore due to its simple nature. If this were more complicated, you would want to make sure the node_modules directory was not copied, lest it make your build time progressively longer.

We then run the npm install command which populates node_modules with our dependencies. Recall in the previous point, we do not want to copy node_modules over, this is for two reasons:

  1. Often we will have development-environment-specific NPM packages which we likely do not want in the container – the goal with the container is ALWAYS to be as small as possible
  2. In accordance with #1, copying from the file system is often slower (especially in the Cloud) than simply downloading things – we also want to make sure we only download what we need (see Point #1)

Next we run a Docker command which instructs the container to have port 3000 open and accepting traffic. If you look at the ExpressJS script, this is the port it listens on, so this is poking a hole in the container firewall so the server can receive requests.

Finally, every Dockerfile ends with the ENTRYPOINT command. With the source in place, this is the command that gets run when the Docker image is started as a container. For web servers like this, it should be a command that blocks the program from exiting because, when the program exits, the container will shut down as well.

Step 3: Publish to a Registry

When we build the Dockerfile we create a Docker image. An image, by itself, is useless as it is merely a template for subsequent containers [ we haven't run ENTRYPOINT yet ]. Images are served from a container registry; this is where they live until called on to become containers ( an instance of execution ).

Now, generally, it's a bad idea to use a laptop to run any sort of production services (these days the same is true for development as well), so keeping your images in the local registry is not a good idea. Fortunately, all of the major cloud providers (and others) provide registries to store your images in:

  • Azure Container Registry (Microsoft Azure)
  • Elastic Container Registry (Amazon)
  • Docker Hub (Docker)
  • Container Registry (Google)

You can create the cloud registries above within the correct provider and publish your Docker images, more here: https://docs.microsoft.com/en-us/azure/container-registry/container-registry-get-started-docker-cli

Docker images being published to a registry such as this opens them up to being used, at scale, by other services including Kubernetes (though you can also host the registry itself in Kubernetes, but we won't get into that here).

The command to publish is actually more of a push (from the link above)

docker tag nginx myregistry.azurecr.io/samples/nginx

docker push myregistry.azurecr.io/samples/nginx

With this, we have our image in a centralized registry and we can pull it into App Service, AKS, or whatever.

Step 4: Understand Terraform

At the core of IaC is the idea of using code to provision infrastructure into, normally, Cloud providers. Both Azure and Amazon offer tools to automatically provision infrastructure based on a definition: CloudFormation (Amazon) and Azure Resource Manager (ARM) (Azure).

Terraform, by HashiCorp, is a third party version which can work with either and has gained immense popularity thanks to its ease of implementation. It can be downloaded here.

There is plenty of resources around the web and on HashiCorp’s site to explain how Terraform works at a conceptual level and how it interacts with each supported provider. Here are the basics:

  • We define a provider block that indicates the provider plugin we will use to create resources; this will be specific to our target provider
  • The provider block then governs the additional block types we can define in our HCL (HashiCorp Configuration Language)
  • We define resource blocks to indicate we want to create something
  • We define data blocks to indicate that we wish to query for certain values from existing resources

The distinction between resource and data is important, as some elements are ONLY available as one or the other. One such example is a Container Registry. When you think about it, this makes sense: while we will certainly want to audit and deploy many infrastructure components with new releases, the container registry is not such a component. More likely, we want to be able to read from this component and use its data points in configuring other components, such as an Azure App Service (we will see this later).

To learn more about Terraform (we will start covering syntax in the next step) I would advise reading through the HashiCorp doc site for Azure, its very thorough and pretty easy to make sense of things: https://www.terraform.io/docs/providers/azurerm/index.html

Step 5: Deploy with Terraform

Terraform definition files usually end with the .tf extension. I usually advise creating this in a separate folder if only to keep it separate from your application code.

Let’s start with a basic script which creates an Azure Resource Group

provider "azurerm" {
  version="=1.22.0"
  subscription_id=""
}
resource "azurerm_resource_group" "test" {
  name="example-group"
  location="CentralUS"
}

The first block defines the provider we will use (Azure in this case) and the target version we want to use of that provider. I also supply a Subscription Id which enables me to target a personal Azure subscription.

Open a command line (yeah, no GUI that I am aware of) and cd to the directory that holds your .tf file, execute the following command:

terraform init

Assuming the terraform program is in your PATH, this should get things going. You will see it download the provider and provision the .terraform directory, which holds the binary for the provider plugin (and other plugins you choose to download). You only need to run the init command one time. Now you are ready to create things.

As with any proper IaC tool, Terraform lays out what it will do before it does it and asks for user confirmation. This is known as the plan step in Terraform and we execute the following:

terraform plan

This will analyze the .tf file and (by default) output what it intends to do to the console; you can also provide the appropriate command line argument here and write the plan to a file.

The goal of this step is to give you (and your team) a chance to review what Terraform will create, modify, and destroy. Very important information.

The final step is to apply the changes, which is done (you guessed it) using the apply command:

terraform apply

By default, this command will also output the contents of plan and you will need to confirm the changes.  After doing so, Terraform will use its information to create our Resource Group in Azure.

Go check out Azure and you should find your new Resource Group created in the CentralUS region (if you used the above code block).  That is pretty cool. In our next step we will take this further and deploy our application.

Step 6: Really Deploy with Terraform

Using Terraform does not excuse you from knowing how Azure works or what you need to provision to support certain resources; in fact, this knowledge becomes even more critical. For our purposes we created a simple API that responds with some text to any request. For that we will need an App Service backed by a container but, before that, we need an App Service Plan – we can create this with Terraform:

resource "azurerm_app_service_plan" "test" {
  name="example-plan"
  location="${azurerm_resource_group.test.location}"
  resource_group_name="${azurerm_resource_group.test.name}"
  kind="Linux"
  reserved=true
  sku {
    tier="Standard"
    size="S1"
  }
}

Here we see one of the many advantages to defining things this way; we can reference back to previous blocks (remember what we created earlier). As when writing code, we want to centralize the definition of things, where appropriate.

This basically creates a simple App Service plan that uses the bare basics, your SKU needs may vary. Since we are using containers we could also use Windows here as well, but Linux just feels better and is more readily designed for supporting containers; at least in so far as I have found.

Running the apply at this point will add App Service Plan into your Resource Group. Next we need to get some information that will enable us to reference the Docker container we published previously.

data "azurerm_container_registry" "test" {
  name="HelloWorldTest"
  resource_group_name="${azurerm_resource_group.test.name}"
}

Here we see an example of a data node, which is a read action – you are pulling in information about an EXISTING resource, an Azure Container Registry in this case. Note this does NOT have to live in the same Resource Group as everything else; it's a common approach for services like this, which transcend environments, to live in a separate group.

Ok now we come to it, we are going to define the App Service itself.  Before I lay this out, I want to give a shout out to https://pumpingco.de/blog/deploy-an-azure-web-app-for-containers-with-terraform which inspired this approach with App Services.

Here is the block: https://gist.github.com/xximjasonxx/0d0bdda8741ac43197528937f6cec9eb (too long for the blockquote)

There is a lot going on here so let’s walk through it.  You can see that, as with the App Service Plan definition, we can reference back to other resources to get values such as the App Service Plan Id. Resources allow you to not just create them but reference their properties (Terraform will ensure things are created in the proper order).

The app_settings block lets us pass values that Azure would otherwise add for us when we configure container support. Notice here, though, that we reference the Container Registry data block we created earlier. This makes it a snap to get the critical values we will need to allow App Service access into the Container Registry.

The last two blocks I got from PumpingCode – I know what linux_fx_version does, though I have never seen it used in App Services, same with Identity.

Step 7: But does it work?

Always the ultimate question. Let’s try it.

  1. Make a change and build your docker image. Tag it and push it to your Azure Container registry – remember the tag you gave it
    • One tip here: You might have to change the port being exposed to 80 since App Service (I think) blocks all other ports
  2. Modify the .tf file so the appropriate image repository name and tag is represented for linux_fx_version. If you want some assurances you have the right values, you can log into the Azure Portal and check out your registry
  3. Run terraform apply – verify and accept the changes
  4. Once complete, try to access your App Service (I am assuming you changed the original and went with port 80)
  5. It might take some time but, if it worked, you should see your updated message being returned from the backend

Conclusion

The main point with IaC is to understand that modern applications are more than just their source code, especially when going to the Cloud. Having your infrastructure predefined can aid in automatic recovery from problems, enable better auditing of services, and truly represent your application.

In fact, IaC is the centerpiece to tools like Kubernetes as it allows it to maintain a minimum ideal state via YAML definitions for the abstract infrastructure. Pretty cool stuff if I do say so myself.

Of course, this here is all manual; where this gets really powerful is when you bake it into a CD pipeline. That is a topic for another post, however 🙂