Serverless Proxy Pattern: Part 5

Part 4 is here

Source code here

Full Disclaimer
It turns out the script referenced in previous parts was not re-entrant; it contained good information but, over the course of building it, things fell into place rather than being put in place deliberately. As a result, it failed to set up when I ran it from a clean slate. Version 1.1 addresses this and is the version referenced here.

Now that we have our app up and running, it's time to be critical of things, mainly permissions. As I stated in Part 1, AWS recommends the use of roles to fulfill service-level tasks so that credentials are never stored in source control or within applications, thus lessening the blast radius in the event of compromise. That said, a setup where all of the roles we use are open to everything is not good either. So first we will focus on splitting and refining AppRole.

Get the Order Right

Cloud Formation makes every attempt to determine the order in which to create resources; when it needs help you can often use the DependsOn attribute. I have personally found this wanting, so the first thing I did was establish the order of my resources so things could be created efficiently and correctly. Here is that order:

  • ThumbnailBucket (S3 Bucket)
  • ThumbnailBucketPolicy (S3 Bucket Policy)
  • ImageTable (DynamoDB Table)
  • CreateThumbnailFunctionRole (IAM Role)
  • CreateThumbnailFunction (Lambda)
  • AnalyzeImageFunctionRole (IAM Role)
  • AnalyzeImageFunction (Lambda)
  • ImageUploadTopic (SNS Topic)
  • ImageUploadTopicPolicy (SNS Topic Policy)
  • RawBucket (S3 Bucket)
  • RawBucketPolicy (S3 Bucket Policy)
  • CreateThumbnailFunctionSNSInvokePermission (Lambda Permission)
  • AnalyzeThumbnailFunctionSNSInvokePermission (Lambda Permission)
  • ApiGatewayRest (API Gateway)
  • BucketApiResource (API Gateway Resource)
  • ImagesApiResource (API Gateway Resource)
  • BucketItemApiResource (API Gateway Resource)
  • BucketItemApiMethodRole (IAM Role)
  • BucketItemApiMethod (API Gateway Method)
  • ImagesApiMethodRole (IAM Role)
  • ImageApiMethod (API Gateway Method)
  • ApiGatewayDeployment (API Gateway Deployment)
  • DefaultApiGatewayStage (API Gateway Stage)

The items highlighted in BOLD are new moving into this part and mostly represent the result of splitting AppRole. As an example here is CreateThumbnailFunctionRole:
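The full template is in the linked source; as a rough sketch, assuming the ThumbnailBucketName and RawBucketName parameters from Part 1, the role looks something like this:

CreateThumbnailFunctionRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: lambda.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      - PolicyName: create-thumbnail-policy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            # open access kept for telemetry and logging
            - Effect: Allow
              Action:
                - logs:*
                - cloudwatch:*
                - xray:*
              Resource: '*'
            # thumbnails: Put and Get type actions only
            - Effect: Allow
              Action:
                - s3:PutObject
                - s3:GetObject
              Resource: !Sub 'arn:aws:s3:::${ThumbnailBucketName}/*'
            # raw images: ARN built with !Sub to avoid a circular dependency
            - Effect: Allow
              Action:
                - s3:GetObject
              Resource: !Sub 'arn:aws:s3:::${RawBucketName}/*'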

You can see that I kept the open resource for CloudWatch, XRay, and Logs, but I restricted the rights against the ThumbnailBucket to Put and Get type actions only. This prevents the role from doing anything else with the Thumbnail bucket, such as listing or deleting its contents.

Further supporting this is the AssumeRolePolicy, which enforces that ONLY Lambda can assume this role and no other service. This is key given the very tight use case of the function and its coupling to this specific bucket.

The other interesting thing here is the use of !Sub to specify the ARN of the S3 bucket holding raw images. Cloud Formation will always do its best to mitigate circular dependencies but, especially with this sort of role structure, it has a hard time. Thankfully, AWS has a standard format for resource ARNs. Using that format, we can build the ARN ourselves rather than having Cloud Formation resolve it from the template, which mitigates the circular dependency that would otherwise occur here.

Create Bucket Policies

One thing I did learn while working through this was, curiously, that if I specified Resource: "*" within my Roles for the S3 bucket things worked fine but, the moment I went resource-specific, I got Access Denied.

After much Googling (seriously, why do the AWS docs never call out resource-specific roles?) I guessed, correctly, that in addition to the Role I needed a Bucket Policy to allow the action as well. Here is the policy for RawBucket which allows API Gateway to proxy the PUT call to create objects:
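Roughly, and again assuming the RawBucketName parameter, the policy looks something like this:

RawBucketPolicy:
  Type: AWS::S3::BucketPolicy
  Properties:
    Bucket: !Ref RawBucket
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        # let Lambda read raw objects for thumbnail creation and analysis
        - Effect: Allow
          Principal:
            Service: lambda.amazonaws.com
          Action: s3:GetObject
          Resource: !Sub 'arn:aws:s3:::${RawBucketName}/*'
        # let API Gateway put new objects when proxying the upload call
        - Effect: Allow
          Principal:
            Service: apigateway.amazonaws.com
          Action: s3:PutObject
          Resource: !Sub 'arn:aws:s3:::${RawBucketName}/*'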

Here we specify that, for RawBucket, we allow Lambda services to read from it (necessary to support the thumbnail creation process) and we allow API Gateway to write to it (necessary to support the PUT operations our API Gateway will proxy). Of particular note is the /* suffix; it is necessary to indicate our policy applies to the bucket contents and not the bucket itself.

SNS Endpoint Verification Check

One of the other problems I came across while testing was that SNS, when it is created, sends a pulse to the Lambda functions to ensure they are there. I didn't catch this the first time because of the order in which I originally created things. Handling it is rather easy, as the check arrives as an s3:TestEvent. Both CreateThumbnail and AnalyzeImage have code to handle the check:

Conclusion

I hope this series has been informative and worthwhile. It took a lot of care and thought to make and a lot of trial and error. Cloud Formation, and similar tools, take a lot of effort to learn and understand but, ultimately, they can save you a ton of grief. One of the value props for Infrastructure as Code (IaC) is that it enables both quick creation of environments and access to more efficient DevOps processes.

This is best manifested, right now, in Jenkins X's approach of spinning up an entire environment in Kubernetes for testing each PR. That gives us a way to run a complete battery of tests to ensure the code entering the trunk (and possibly being deployed automatically to production) is valid and tested.

Again here is the completed code:

https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1.1


Serverless Proxy Pattern: Part 4

See Part 3 here

To this point, we have defined what I would call the "backing" services for our application. We could write frontend apps to send images to our bucket, which would trigger the overall process but, in keeping with the theme, I want to provide a "codeless" (or serverless) way to interact with our services. Enter API Gateway.

API Gateway: The Heart of the Serverless Proxy Pattern

API Gateway (and API Management on Azure) can act as proxies forwarding received calls to backing services to perform operations. This means we can dispense with writing any code to work with these services and instead rely on the API Gateway to handle the underlying calls for us.

To create an API Gateway in AWS we must first create a RestApi type resource. This serves as the container for related resources of type Resource and Method. Here is the YAML:
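A minimal sketch of the RestApi resource (the name and description here are placeholders of my own choosing):

ApiGatewayRest:
  Type: AWS::ApiGateway::RestApi
  Properties:
    Name: thumbnail-api
    Description: Proxies S3 and DynamoDB for the thumbnail application
    EndpointConfiguration:
      Types:
        - REGIONAL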

 

 

 

 

 

Next, we need to define our resources. If you are familiar with the nomenclature of REST you will recognize this term: it is the "thing" that defines what is being acted against. It can often map to database models but does not necessarily have to.

In our case we will define our resources as such:
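Sketched out, assuming the RawBucketName parameter and an {item} path part for the key, the three resources look roughly like this:

# /all-images - used to read image data back out of DynamoDB
ImagesApiResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ParentId: !GetAtt ApiGatewayRest.RootResourceId
    PathPart: all-images

# /<bucket_name> - the root resource for uploads, named after the raw bucket
BucketApiResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ParentId: !GetAtt ApiGatewayRest.RootResourceId
    PathPart: !Ref RawBucketName

# /<bucket_name>/{item} - the key of the object being uploaded
BucketItemApiResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ParentId: !Ref BucketApiResource
    PathPart: '{item}'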

 

 

 

 

Each level of the API is defined as a resource such that we end up supporting the two following path structures:

  • /all-images
  • /<bucket_name>/<key>

Of particular note is the usage of the bucket name as the root resource for the key; this appears to be required or, at least, I cannot find a way around it. Now that our resource structure is in place, we define the "actions" that will take place when the path is matched:

PUT an Object

First, we will configure our /<bucket_name>/<key> route to allow a POST verb to invoke it and forward that onto S3 using PUT to create (or update) the object in blob storage.
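Here is a rough sketch of that method, wired to the AppRole from Part 1; the exact template is in the source, and the parameter names and status codes here are illustrative:

BucketItemApiMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ResourceId: !Ref BucketItemApiResource
    HttpMethod: POST
    AuthorizationType: AWS_IAM
    RequestParameters:
      # require the item (key) path parameter
      method.request.path.item: true
    Integration:
      Type: AWS
      # API Gateway calls S3 with PUT even though our method is POST
      IntegrationHttpMethod: PUT
      Credentials: !GetAtt AppRole.Arn
      Uri: !Sub 'arn:aws:apigateway:${AWS::Region}:s3:path/${RawBucketName}/{key}'
      RequestParameters:
        # map the incoming item path parameter onto the S3 object key
        integration.request.path.key: method.request.path.item
      IntegrationResponses:
        - StatusCode: '201'
    MethodResponses:
      - StatusCode: '201'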

 

 

There is quite a bit here, so let's hit the crucial areas. We define HttpMethod to indicate that the method can only be invoked using POST and that authorization will be gleaned from IAM credentials associated with the underlying integration calls (recall the role we defined in Part 1).

Next we define RequestParameters, which in this case lets us enforce that item must be provided. Where the real magic happens is in the IntegrationHttpMethod and Integration sections:

  • IntegrationHttpMethod: This is the verb that API Gateway will use when invoking whatever service it will call when the resource path matches.
  • Integration: This is the meat of this operation. Here we define various mappings for values to pass to the underlying service call, what kind of integration we will use (AWS Service in this case), and the Uri we will call (in this case the S3 bucket path)

What this ends up facilitating is a PUT using the bucket name and key against the S3 object path, passing the contents of the request to the S3 API, which creates the appropriate object, all without writing a line of code.

The final bit of this is the IntegrationResponses where we define that the integration will respond with a 201 (Created) – this lines up with MethodResponses intentionally.

Get All Image Data

The second resource will return the contents of our DynamoDB table ImageDataTable that we created in Part 3. For this one, the integration portion is a bit trickier; have a look:
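Roughly sketched (the resource names and the table reference are assumptions on my part):

ImageApiMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ResourceId: !Ref ImagesApiResource
    HttpMethod: GET
    AuthorizationType: AWS_IAM
    Integration:
      Type: AWS
      # the DynamoDB action API must be called with POST even though our method is GET
      IntegrationHttpMethod: POST
      Credentials: !GetAtt AppRole.Arn
      Uri: !Sub 'arn:aws:apigateway:${AWS::Region}:dynamodb:action/Scan'
      RequestTemplates:
        # no filtering - just name the table so the whole contents come back
        application/json: !Sub '{ "TableName": "${ImageTable}" }'
      IntegrationResponses:
        - StatusCode: '200'
    MethodResponses:
      - StatusCode: '200'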

 

The key thing to understand here is that we are indicating we want to call an action against the DynamoDB API, the action in this case being Scan (I won't go into the differences between Query and Scan here).

Scan expects a JSON block to define various parameters that it will use to perform filtering and selection. In this example, we are dropping the filtering aspect and simply specifying our DynamoDB table name so that all contents of that table are returned.

Also of note here is the IntegrationHttpMethod. Despite the fact that our API Gateway resource is called using GET, we must use POST to call the DynamoDB Action API.

Once this is in place you can get a dump of the table contents from DynamoDB. The one outstanding issue I have with this is that the JSON comes back as it exists in DynamoDB, which will not look like most JSON blocks you have likely seen, but it is easily parsable by DynamoDB libraries. Still, a standing goal of mine is to get this into a more normalized JSON block for easier consumption.

Deploying the API

The final bit here is making the API Gateway accessible via the web. This is done by deploying API Gateway Deployment and Stage resources. In practice, these serve as a means to enable environmentalization of the API Gateway so changes can be promoted, similar to code changes. Here is what I defined for these resources:
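A rough sketch of the two resources (the stage name is my own placeholder):

ApiGatewayDeployment:
  Type: AWS::ApiGateway::Deployment
  # a Deployment needs at least one Method to exist before it is created
  DependsOn:
    - BucketItemApiMethod
    - ImageApiMethod
  Properties:
    RestApiId: !Ref ApiGatewayRest

DefaultApiGatewayStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    RestApiId: !Ref ApiGatewayRest
    DeploymentId: !Ref ApiGatewayDeployment
    StageName: v1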

 

 

 

Ending this Part

In this part I went through the complete setup of the API Gateway that we are using to proxy Amazon services: S3 and DynamoDB. This approach gives a lot of the basic functionality we find in applications but without needing to write any code and all definable via Cloud Formation. Now that our application is stood up, our next parts will focus on tweaking things to make them more secure and more efficient.

As always you can reference the complete source here: https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1

Serverless Proxy Pattern: Part 3

Part 2 is here

In the previous parts we worked to set up our basic AWS infrastructure, including SNS, S3, and our Role. We also added the appropriate permissions for the flow of events to work unimpeded. Next, we set up a typical CI/CD flow using Azure DevOps so that as we make changes to code and infrastructure our application is changed appropriately; this fulfills the main tenets of GitOps and Infrastructure as Code (IaC). In this part, we will actually develop the code for our Lambdas and test that our various other resources are set up correctly.

Define the DynamoDB Table

In keeping with our main theme, we will deploy a DynamoDB table using Cloud Formation. Here is the YAML we will use:
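Roughly, it looks like this (the billing mode here is an assumption; the real template may declare provisioned throughput instead):

ImageTable:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: ImageDataTable
    BillingMode: PAY_PER_REQUEST
    AttributeDefinitions:
      # the only guaranteed column - everything else is schemaless
      - AttributeName: Id
        AttributeType: S
    KeySchema:
      - AttributeName: Id
        KeyType: HASH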

Something to note here is that I chose NOT to use a parameter to define the name of the table. You definitely could but that then speaks to deployment considerations, since you generally do not want your table names disappearing. Also, with Dynamo the naming is localized to your Amazon account so you don't have to worry about extra-account conflicts.

What is DynamoDB?
DynamoDB is a NoSQL database available in both normal and global variants from Amazon. It is ideal for handling use cases where data is unstructured and/or coming in at high volumes, where enforcing the consistency found in RDBMS databases is not a primary concern.

In our case, the data that we store will come from Amazon Rekognition label detection, which will differ from image to image and thus makes sense to store in a NoSQL fashion.

The way Dynamo works is it expects SOME structure to be provided (in this case we guarantee there will ALWAYS be an Id column), which serves as the table's index. There is a great amount of flexibility in how primary and secondary keys are defined, along with sort indexes within those key structures.

Create Thumbnail Function

Our first Lambda will respond to an image being added to our "raw" bucket and create a copy of that image with its dimensions reduced (a thumbnail). Originally, when I did this I used System.Drawing but was met with a libgdiplus error; the error happens because System.Drawing is built on gdiplus, which is not installed by default in Ubuntu Docker images. Rather than attempting to get this to work I did some research and found the SixLabors imaging library that was featured at re:Invent. (link: https://sixlabors.com/projects/imagesharp)

One of the other interesting bits to this is that when Lambda receives an event from SNS the format is a bit different from when it comes from S3 directly. For that I created a set of mapping classes that can be used with Newtonsoft JSON.NET.

Here is the core logic which does the resizing; outside of this, it is all about reading the Stream from S3 and writing it to our Thumbnail S3 bucket.

Analyze Image Function

In addition to creating the thumbnail image we also want to run the image through Amazon Rekognition (computer vision) and use its Detect Labels operation to gather data about the image. This data will then be written to our DynamoDB table.

In Dynamo each row is unstructured and can have a different schema; each column for a document is represented by a key on the provided ItemDataDictionary (as shown above).

As always for reference here is the complete source: https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1

See you in Part 4

Serverless Proxy Pattern: Part 2

Part 1 available here

In the previous part, I talked through the motivation for building this application and the emphasis on using CloudFormation entirely as a means to enable efficient and consistent setup.

To start the template we specified our role (AppRole) and talked through some of the decisions and potential improvements with the approach. Next we set up our S3 buckets and configured the "raw" bucket to send a notification, via SNS, to two (not yet created) Lambda functions which will handle various operations against the image data that was uploaded.

In this part, we will set up the appropriate permissions for the Lambdas so they can be correctly invoked from SNS. We will also talk through setting up GitOps so that our code and infrastructure can be deployed quickly and automatically, enabling fast development.

Permissions

The one thing you learn very quickly working with AWS is the intense focus on security at all levels. This is mainly driven by "policies" which are attached to things like users and roles and specify what actions the attached entity may take. We talked about this a little in Part 1 and it's worth repeating here: for the sake of getting started I am using a set of fairly wide-open policies, with the goal of refining them towards the end.

We can see this via the TopicPolicy which enables S3 to publish messages to the SNS Topic. Here it is in YAML format:
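A sketch of that policy (the exact statement is in the source):

ImageUploadTopicPolicy:
  Type: AWS::SNS::TopicPolicy
  Properties:
    Topics:
      - !Ref ImageUploadTopic
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        # allow S3 (the Principal) to publish to this topic (the Resource)
        - Effect: Allow
          Principal:
            Service: s3.amazonaws.com
          Action: sns:Publish
          Resource: !Ref ImageUploadTopic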

Keep in mind that ImageUploadTopic was described at the end of Part 1 and represents the topic we are using to "fan out" bucket notifications.

Here we are declaring a policy and associating it with the ImageUploadTopic resource (via the Resource portion of the policy statement). Effectively this allows S3 (the Principal) to publish to this topic. We could be more specific here, though I have chosen not to be, about which S3 resources can publish to this Topic.

With this in place, our S3 bucket can now publish messages to our SNS topic but nothing will happen if we test this. Why? Because SNS does NOT have permission to invoke our Lambda functions. Let's set that up next.

Lambda Permissions

Recall from Part 1 this is what our SNS declaration looked like in YAML:

Since we have now added our TopicPolicy (above), SNS will attempt to invoke our Lambdas, and will summarily fail. Why? Because, as you might guess at this point, it does not have permission to do so. For that we need to create Lambda permissions for our functions (though you could also create a more general permission if you have many Lambda functions).

Here are the permission declarations in YAML:
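Sketched roughly, one permission per function:

CreateThumbnailFunctionSNSInvokePermission:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:InvokeFunction
    FunctionName: !Ref CreateThumbnailFunction
    Principal: sns.amazonaws.com
    # only our upload topic may invoke the function
    SourceArn: !Ref ImageUploadTopic

AnalyzeImageFunctionSNSInvokePermission:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:InvokeFunction
    FunctionName: !Ref AnalyzeImageFunction
    Principal: sns.amazonaws.com
    SourceArn: !Ref ImageUploadTopic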

Here we are specifying that SNS may invoke these functions, which is obviously needed. Next, we will create the actual Lambda functions we will be invoking.

Why are we using SNS?

There may be a question in your mind of why I am choosing to use SNS instead of straight Lambda invocation. The answer is, as I said in Part 1, S3 does NOT support multiple delivery of events so, if you are going to do a sort of "fan out" where the event is received by multiple services, you need to use SNS. Alternatively, you could leverage Step Functions or a single Lambda that calls other Lambdas. For me, this is the most straightforward approach and keeps with the "codeless" theme I am going for as part of my "serverless" architecture.

Infrastructure as Code

Cloud Formation stands with other tools as a way to represent infrastructure needs in code. It is a core part of the GitOps movement, which states that all changes to an application should be invoked automatically from source control; source control needs to be the single source of truth for the application.

In general, this is part of a larger movement in the industry around “what is an application?”. Yes, traditionally, this used to be simply code which would then run on infrastructure provisioned, in some cases, by a totally separate team. As we have moved into Cloud though, this has shifted and coalesced to the point where Infrastructure and Code are intermingled dependent on each other.

When I speak with teams on these points, I emphasize the notion that the "application" is everything. And since it is everything, your infrastructure definition is just as important as your compiled code. Gone are the days when it was acceptable for your Cloud infra to exist transiently in a console. It is now expected that your application's infrastructure is represented in code and versioned along with the rest of the application. That is why we are using Cloud Formation here (we could also use Terraform and similar tools).

GitOps

GitOps simply states that everything about our applications exists together and that commits to Git should be the means by which updates to our code or infrastructure are made – in other words, nothing is ever done manually.

For this application, as we enter into Lambda territory, we will want a way to compile and deploy our Lambda code AND update our infrastructure via Cloud Formation. The best way to do this is to setup a CI/CD pipeline. For this, I will leverage Azure DevOps.

Why not AWS Services?

It may seem odd to use Azure DevOps to deploy AWS infrastructure but it's not as uncommon as you might think. This is due to the AWS tooling being awful; I really believe Amazon sought simply to say "we can do that" rather than create a viable platform supporting developer productivity. Many of the teams I have seen using AWS use Azure DevOps or, in other cases, will deploy a Jenkins server; rarely do I see people actually using CodePipeline and CodeBuild.

CI/CD Simplified

I will spare you the details of setting up the pipelines in Azure and doing the releases. I will leave you with the YAML file that represents the build pipeline, which builds the Analyze Image and Create Thumbnail functions in tandem while publishing the Cloud Formation template.
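The actual pipeline file is in the repo; a trimmed sketch of it (project paths and artifact names here are my own placeholders) looks something like this:

trigger:
  - master

pool:
  vmImage: ubuntu-latest

steps:
  # build and package each Lambda function as its own zip
  - task: DotNetCoreCLI@2
    displayName: Publish CreateThumbnail
    inputs:
      command: publish
      publishWebProjects: false
      projects: CreateThumbnail/CreateThumbnail.csproj
      arguments: '--configuration Release --output $(Build.ArtifactStagingDirectory)/create-thumbnail'
      zipAfterPublish: true

  - task: DotNetCoreCLI@2
    displayName: Publish AnalyzeImage
    inputs:
      command: publish
      publishWebProjects: false
      projects: AnalyzeImage/AnalyzeImage.csproj
      arguments: '--configuration Release --output $(Build.ArtifactStagingDirectory)/analyze-image'
      zipAfterPublish: true

  # ship the Cloud Formation template alongside the build output
  - task: CopyFiles@2
    inputs:
      contents: infra.yaml
      targetFolder: $(Build.ArtifactStagingDirectory)

  - task: PublishBuildArtifacts@1
    inputs:
      pathToPublish: $(Build.ArtifactStagingDirectory)
      artifactName: drop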

The gist of this is simple: we compile each function into a separate zip file and publish that zip file so we can use it in our Release pipeline. Additionally, we publish our infra.yaml, which is our Cloud Formation template.

In the release pipeline (not shown) we use the S3 Upload task to upload the resulting zip files to an S3 bucket which houses the application artifacts. We then run a Stack Update task with the Cloud Formation template. This will replace the code for those Lambdas; here is the YAML:
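A rough sketch of how those pieces fit together in the template (the runtime, handler string, and artifact bucket parameter are illustrative; AnalyzeImageFunction follows the same shape):

Parameters:
  CreateThumbnailLambdaVersionFile:
    Type: String
  AnalyzeImageLambdaVersionFile:
    Type: String
  ArtifactBucketName:
    Type: String

Resources:
  CreateThumbnailFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: dotnetcore2.1
      Handler: CreateThumbnail::CreateThumbnail.Function::FunctionHandler
      Role: !GetAtt AppRole.Arn
      Code:
        S3Bucket: !Ref ArtifactBucketName
        # supplied by the release pipeline on every deployment
        S3Key: !Ref CreateThumbnailLambdaVersionFile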

The key thing here is the introduction of the CreateThumbnailLambdaVersionFile and AnalyzeImageLambdaVersionFile parameters which are used as the S3Key value for the Lambdas; this is fed to the template at runtime by the DevOps Release pipeline, like this:

This is what is meant by GitOps: all changes to our application happen via Git operations; we never do anything manually. This sets us up so that, with a proper automated testing layer, we can achieve true continuous delivery of our application.

That is all for this section – in the next Part we will write the actual code for our application which adds objects to our Thumbnail bucket and writes the data to DynamoDB.

As always here is the complete source if you want to skip ahead:
https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1

See you in Part 3

Serverless Proxy Pattern: Part 1

Today, I am going to introduce something I have been experimenting with over the last few weeks in my spare time. I really appreciate the benefits of serverless but, I have always felt there was so much more to it than simple Lambda functions and the like. So I wanted to explore how I could create entire systems that featured little to no code at all and still support complex functionality.

This will be the first part in a multi-part series where I use CloudFormation to build a Thumbnail Creator / Image Analyzer app in AWS that leverages this approach to using serverless.

The API Gateway

One of the central pieces of this pattern, particularly for web apps is API Gateway (API Management on Azure) and its ability to integrate with backend services such as S3 and DynamoDB (Storage and Cosmos on Azure) allowing for pass through calls to these services to handle common operations.

In our example, we will proxy both S3 and DynamoDB with API Gateway to support both the reading of image data stored in DynamoDB and the storing of raw images in an S3 bucket. All created using Cloud Formation so that it can be stood up again and again as needed.

DevOps

To make deployments easier, I leverage CI/CD services via Azure DevOps, as it provides a superior experience to the CI/CD tooling offered by the AWS platform. Here I utilize the YAML based pipeline syntax for Builds and build the two Lambda functions in tandem and publish my Cloud Formation YAML template. A Release pipeline uploads my code artifacts to S3 (best place to source the Lambda binaries) and Create/Updates the stack represented by my Cloud Formation template.

Getting Started: Create our Role

In AWS, Roles play a vital part in ensuring security for applications interacting with AWS services. Amazon recommends using Roles over credentials since it eliminates the need to keep passwords floating around and is, generally, more flexible. Here is the starting point for the role we will use throughout:
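A trimmed-down sketch of that role, covering just the services called out below (the full version is in the source):

AppRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service:
              - lambda.amazonaws.com
              - apigateway.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      # deliberately wide open to start - tightened in Part 5
      - PolicyName: s3-access
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action: s3:*
              Resource: '*'
      - PolicyName: lambda-access
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action: lambda:*
              Resource: '*'
      - PolicyName: logging-access
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - logs:*
                - cloudwatch:*
                - xray:*
              Resource: '*'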

As you can see, this Role features A LOT of policies. There is a solid case to be made that we would be better off splitting this role into smaller roles so we lessen the amount of damage that can be done if an attacker were to somehow gain access to a service with this role.

On the flip side, one of the advantages to serverless is a decreased attack surface for attackers in general. As a general rule, the less of “my” stuff in the wild, the less chance there is for an attack – I trust Amazon (and Microsoft) more with security than I do myself.

The other issue with this role definition is that it is very open. For example:

  • The role is given access to ALL permissions for S3, XRay, Lambda, Logs, and CloudWatch for ANY resource

This is very bad since it means anyone can use this role to look at (or access) anything. Obviously, when we build applications we want to constrain them to only the things that they care about.  As a rule, we should find ourselves rarely, if ever, using *.

Be that as it may, I am starting this way to remove permission concerns from our plate as we develop this application. Towards the end, we will come back and update these permissions to only what we need and only on the resources that are part of our application. I call it out here as a warning not to use this in a production system.

Create the buckets

For our application we will need two buckets: One to handle raw images and one to handle the generated thumbnails of those images. Here is the YAML template for our bucket creation:
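Roughly, the buckets look like this (the full template is in the source):

Parameters:
  RawBucketName:
    Type: String
  ThumbnailBucketName:
    Type: String

Resources:
  RawBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref RawBucketName
      NotificationConfiguration:
        TopicConfigurations:
          # notify the SNS topic whenever a new object is created
          - Topic: !Ref ImageUploadTopic
            Event: 's3:ObjectCreated:*'

  ThumbnailBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref ThumbnailBucketName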

This being Cloud Formation, we will of course allow the calling process to dictate what names we should use for our buckets. Keep in mind, if you decide to run this, bucket names MUST be globally unique so you might have to get creative.

For the most part I assume this template is pretty self-explanatory. We are creating resources named RawBucket and ThumbnailBucket, each of type AWS::S3::Bucket, and we use the values passed in via the RawBucketName and ThumbnailBucketName parameters. I will point out that those names really are for display purposes; the resource names are the main block for configuration and are what you will reference throughout the template (RawBucket and ThumbnailBucket in this case).

Where this might get a bit fuzzy is with the NotificationConfiguration section under RawBucket. If you are not aware, AWS allows you to configure notifications from S3 buckets when certain events happen. This was really the birthplace of serverless: responding quickly to these internal events. By taking this approach, we can build very complex systems without needing to write a lot of code ourselves; we just plug services together.

S3 Buckets support a number of notification types including Lambda, Topic, and Queue, each of which has valid use cases. One limitation to keep in mind with S3 notifications is THEY ARE DELIVERED ONCE. This means, if you want to do fan out actions, you MUST use something that can do that for you; SNS is used most often for this case (TopicConfiguration). This is what I will be using since I want to perform Image Analysis AND Thumbnail Creation via Lambda when a new object is PUT into the raw bucket.
Docs: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-bucket-notificationconfig.html

Looking at the above source you can see we reference the Topic (SNS) ImageUploadTopic and we send the notification ONLY for s3:ObjectCreated events.

Create the SNS Topic

Here is the YAML for the SNS Topic:
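A sketch of the topic with its two Lambda subscriptions:

ImageUploadTopic:
  Type: AWS::SNS::Topic
  Properties:
    Subscription:
      # fan the S3 notification out to both Lambda functions
      - Protocol: lambda
        Endpoint: !GetAtt CreateThumbnailFunction.Arn
      - Protocol: lambda
        Endpoint: !GetAtt AnalyzeImageFunction.Arn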

Here we define the subscriptions that denote which services SNS will notify, and how, when a message is received.

We have not created these Lambda functions yet, we will do so in Part 2.

If you want to skip ahead here is the complete source I am using:
https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1

See you in Part 2

 

 

Reflecting on a DevOps Transformation

At West Monroe Partners, one of my responsibilities is helping clients and teams understand how better DevOps can facilitate greater value and productivity. Recently, I was given the opportunity to assist one of our teams in transforming their DevOps process and correcting certain inefficiencies that were holding the teams back from maximizing their efforts.

I joined the team with the aim of assisting its overall development as well as addressing these inefficiencies such as they were.

Phase 0: Gain an Understanding

In any engagement, knowledge is paramount. We must be able to speak with our clients on their level and work to gain their trust so as to better understand their problems. At its heart, DevOps refers more to culture than to any specific toolset and will always look different within each organization. This being the case, we must respect these differences and work to create a process that emphasizes what the client needs rather than sticking to a formulaic approach.

In this case, I was joining the team 10 months into the project and thus would have a limited understanding of the architecture and client relationship. It was paramount that I relax and listen to gain this knowledge and allow for the accurate identification of areas where I could provide maximum value. By doing this, I was aiming to establish trust with the client rather than becoming a "talking head". Trust is crucial with DevOps since, without it, you cannot expect the organization to take your ideas seriously and act on them.

Thus, I spent a large portion of my initial time talking with the team leaders and sitting in on meetings, offering only limited input. This allowed me to gain a better understanding of the overall situation and to better assess the problem areas. Ultimately, I came to four goals that I felt our teams had to address to put them in a position to be successful:

  • Reduce Dev build times from 30+ minutes to more manageable levels allowing developers to use Dev to validate their changes
  • Create better consistency by promoting a simpler build and release model for developers to make building applications simpler
  • Break apart the large mono repo whose size had become a hindrance to the development teams and clouded the client's desired deployment strategy
  • Emphasize Infrastructure as Code to enable the quick creation of environments, including those that would be ephemeral in nature

The build times were the main issue here, as the teams had made the decision to combine the Dev and QA builds into a single serial pipeline. While this did aid in promotion, it bottlenecked the process by holding the Dev artifact until the QA artifact completed, the latter having a much longer process.

Phase 0a: Establish What I CAN do

With my goals established, the next step was to talk our client through the process and articulate the perceived value each change would bring. Based on these conversations, I ruled out the latter two goals. The end proposal that was accepted basically flowed as such:

  • Existing structure of individualized Dev/QA builds would be broken apart and only unit tests would be run to enable greater speed for developer validation. Client also agreed to drop signing, obfuscation, and whitesource scanning requirements from these builds
  • Segregated builds would then be merged back into a parallelized build that would create a “package” that could be delivered to the Azure DevOps release pipeline. This would also include a refactoring effort to reduce duplication and promote reusability

With that agreed to, it was time to get started.

Phase 1: Divide the builds (1wk – against active pipeline)

The first phase was accomplished fairly quickly with the main challenge being marking our tests as “unit” to ensure we still could maintain quality without running every test. The results from this were very positive:

  • Project 1
    • Average build time went from 35m for Dev to 17m
    • Release build times remained unchanged since speeding them up was not the objective
  • Project 2
    • Average build time went from 45m for Dev to 20m
    • Release build times remained unchanged since speeding them up was not the objective

As you can see, the build times decreased heavily and became more manageable for both teams.

Note:

There is nothing inherently wrong with combining your Dev/QA builds since it does enable easier promotion. However, it is important to be mindful of this choice since there are usually more steps involved in creating the QA/Release build than the CI/Dev build. If you opt for this approach, your builds will take as long as the longest build.

These were solid results, so overall the team and I were pleased with the progress. But a drawback was that the split doubled the number of builds and added extra pressure on the team when creating builds; not ideal, particularly as Project #1 was already a tricky project to build correctly and this move seemed to move the goal posts. However, these problems were expected, since the split was designed more as a preparatory step for future work.

With that out of the way, I began preparing to move to Phase 2. Before I could begin, however, I was told to hold off as the client was reconsidering their position on the mono Git repository.

Break apart the Mono Repo

I am not a huge fan of large mono repos as I believe they indirectly communicate a level of monolithic thinking that is progressively finding itself in the minority within our industry. That said, I will point out that there is nothing inherently wrong with using a single repository for multiple applications. In fact, large projects at Microsoft, Google, and other big shops are often housed in single repos and this is true for many legacy applications, particularly those built before the invention of Git.

Within our project, the large repo had consistently been a roadblock, as it made it difficult to work due to the sheer size and number of files being tracked; this was a pain both the client's development teams and ours experienced first hand. I think it was this realization that led them to backtrack on the single-repo design and explore a more segregated organization for the codebase.

In the end, the agreement was made to split the mono repo out into smaller repos to alleviate the performance problems. Our teams became responsible for setting up and configuring the following repos:

  • Repo for Project #1 – containing code that was to be deployed on-prem to clients. It contained the most complex build requirements as it was comprised of 6 separate modules
  • Repo for Project #2 – Containing code for cloud hosted applications which lived in Azure.
  • Repo for Frontend UIs supporting all Projects – these were written using Angular and supported users interacting with each of the applications
  • Repo for Automated UI-Testing for all Projects – these are tests written using Selenium that interacted with the UI of each Project to verify functionality

The most difficult element in this process was identified as refactoring previously shared code into a suitable form. We decided to tackle this as it was encountered, opting for either duplication or NuGet packaging as the means to break it out.

I cannot overstate how beneficial this break-up was, if for no other reason than showcasing that Git for Windows (or perhaps NTFS in general) is abysmal at handling large numbers of files. Our speed of interaction with Git went up by at least two orders of magnitude, and it contributed immensely to our team reaching a record velocity. For myself, it underscored the importance of critically approaching project organization as a core part of the planning process.

Phase 2: Merge the builds (2wks)

With the code repos split and the teams enjoying better productivity than at any point before in the project, I turned my attention to the final portion of this work: creation of a merged build to facilitate the efficient build of Project #1.

This effort was aimed at refactoring the existing segregated pipelines into a merged version designed to fulfill the following goals:

  • Remove duplicative tasks such as restores and builds
  • Remove "baggage tasks", that is, tasks that rely on "baggage" stored in source control, and use the standard tasks offered by Azure DevOps instead
  • Leverage parallelism to maximize the flow of builds and remove any bottlenecks

"Baggage tasks" are tasks that are supported by programs or scripts stored in source control. I consider this an anti-pattern as it unnecessarily ties you to your source control and needlessly bloats the size of your code base. Most modern DevOps platforms have built-in tasks that can handle these steps for you while removing their maintenance from the realm of your responsibility. In addition, as you move to a multi-repo approach, you end up duplicating these scripts, adding another layer of maintenance to your responsibilities.

The goal here is to effectively utilize our build platform and its features to increase simplicity and maintainability while reducing ceremony and "inherent knowledge". In doing so, teams find the pipelines can be maintained more easily by anyone, and problems are easier to discover because everyone understands the process.

Completing the Merged Build: The results

The results were, in a way, phenomenal. What once took 15m to create now took 5m (CI build), and what once took 35m now took 15m (Release build). All of this resulted from a build script that was cleaner and more efficient than anything that had been there previously.

Further, since everything was under a single build we went from 12 separate build pipelines to 2, further enhancing developer productivity. This was the culmination of the goal we had set out to do.

The lone area I was not able to address was the notion of "true promotion". That is, developers still had to kick off release builds by specifying the commit hash, though it only had to happen once now instead of 6 times; this let us achieve our goal of a consistent build. This is a shortcoming in the way Azure DevOps approaches the concept of a build pipeline.

 Closing Thoughts

Overall, I am very pleased with how the new process is working and how it enables our teams to perform better. Perhaps more important, though, are the lessons it has taught us and our leadership about how important it is to be clear about and committed to DevOps from Day 0, and to ensure the patterns and practices for the project are designated up front and implemented from the get-go.

Testing the Water with EKS

Elastic Kubernetes Service, or EKS, is Amazon's offering for managed Kubernetes (akin to Azure Kubernetes Service and Google Kubernetes Engine). As is often the case, Amazon takes a more infrastructure-heavy approach than Azure, meaning you will need to understand VPCs and Subnets when setting things up, since you will be defining them yourself.

The good news is, Amazon offers a quick start tutorial here https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html that will also provide the Cloud Formation scripts to set up the infrastructure for both the Control Plane and the Data Plane.

Before you get started

Amazon imposes limits on the EC2 instances an account is allowed to create; often these limits are very low, too low to support EKS. This is done to prevent users from accidentally spinning up resources they cannot pay for. You can see your limits here: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Limits: (If the link does not work, select EC2 and you will see Limits listed near the top of the left hand navigation).

The MINIMUM instance size that EKS can be used with is t2.small; you may have to request a limit increase, which takes a day or two. If you do NOT do this, you will run into errors when you set up the data plane. The documentation does not call this out and it can be quite frustrating.

Part 1 – Initial Setup

Part 1 of the tutorial is divided amongst a few prerequisite actions: Creation of VPC and related Subnets via Cloud Formation, IAM Role creation, and verifying installation of the AWS CLI and kubectl.

If you have worked with Amazon before you will be familiar with the need to use VPCs to contain resources. This helps things stay organized and also provides a minimum level of security. I will say that, what is not called out is the need to raise your EC2 limits before executing this tutorial.

At a minimum, you will need to be able to deploy EC2 instances of type t2.small to support the Data Plane, if you are a new console user, you will likely have to request a limit increase. This action does not cost anything and the limit is purely in place to prevent new users from racking up charges for resources they do not need.

I should point out that this limit increase is ONLY needed for the Data Plane, not the Control Plane so, it is possible to get through half of the tutorial without it. However, you will find yourself unable to deploy Kubernetes resources without a working data plane. You can view your limits here: https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Limits: (If the link does not work look for the Limits option under EC2)

Performing the Steps

Performing the steps is rather easy since most of the nitty-gritty work is handled by Cloud Formation. I encourage you to look over the scripts and get a sense of what is being created. For the most part it's the standard VPC with subnets and an internet gateway.

Create the Control Plane

When we talk about managed Kubernetes, what is actually being referred to is a managed Control Plane. The Control Plane monitors and governs everything going on in a cluster, so it must be maintained at all costs. Kubernetes is designed to recover automatically from the loss of resources within the cluster; it can achieve this because the control plane responds to and addresses these problems.

Regarding the tutorial, it is straightforward and should be relatively easy. The one caution I would offer is to ensure the user that creates the cluster in the console is the SAME user your AWS CLI is configured to connect as (this is the default). If you fail to do this, you can receive authentication errors unless additional configuration is applied.

Update your kubectl context

The primary way to deploy resources to Kubernetes is via the managed API hosted on the Control Plane. This communication is handled via the kubectl CLI. kubectl operates via a "context" which tells it where commands will be executed. This is the purpose of the update-kubeconfig command in the next section. If you want to see a list of all your contexts, execute the following command:

kubectl config get-contexts

Each line entry here indicates a context you can use to talk to a Kubernetes cluster

Execute the final command in this section to verify you can talk to the Kubernetes Control Plane. If you run the following command you will see a host of resources in the Pending state; these will be deployed once a Data Plane is added to the cluster (next section):

kubectl get pods

Create the Data Plane

This next section is where you will get stuck if you have NOT raised your EC2 limits. EKS uses EC2 instances to support the Kubernetes Data Plane. Effectively, Kubernetes is nothing more than a resource scheduler: it schedules resources to run and uses the block of compute that is the worker nodes (EC2 instances) to host those resources.

On ephemeralism: the concept of the ephemeral is very common within the container ecosystem and Kubernetes. Everything within the cluster (outside the Control Plane) must be treated as ephemeral. This means you do NOT want to persist state anywhere within the cluster, as you can lose it at any time.

I won't go into solutions for this but, when you deploy items that persist state in Kubernetes, you need to be extra aware of where that state is actually being persisted.

Follow the instructions in this section; I recommend keeping the number of nodes to 1 or 2 if this is for development and testing. Remember, in addition to paying for cluster time and the resources related to the VPC, you will also be paying for the EC2 instances; this can add up quickly. I recommend using t2.small for testing purposes as it works out to be the cheapest.

Add Your Nodes into the Cluster

As an extra step, once you create the EC2 instances that will be the worker nodes, you need to add them to the cluster; I have yet to find an option that enables auto-provisioning (this might be Fargate territory).
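The tutorial has you do this by applying an aws-auth ConfigMap that maps the worker nodes' IAM role into the cluster; it looks roughly like this (the role ARN is whatever your node instance role is):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  # mapRoles grants the worker nodes' IAM role the ability to join the cluster
  mapRoles: |
    - rolearn: <ARN of the worker node instance role>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes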

Once you finish executing the commands run the following command:

kubectl get pods

With luck, you should now see movement in the status of the pending pods (mine came to Running within seconds). Congrats, your cluster is now working. To prove that, let's launch the Kubernetes Dashboard. Follow the instructions here: https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html

Let’s Deploy Something

Our cluster is pretty useless, so let's deploy an API to it. For this, I wrote up a pretty basic .NET Core Web API that does math; here is the source of the main controller:

Next, I create the Docker image using this Dockerfile

I recommend building it in the following way:

docker build -t calc-api:v1 .

Next, you have a decision to make. Assuming you have set up authentication with Docker Hub (via docker login) you can tag the image with your username and push; for me:

docker tag calc-api:v1 xximjasonxx/calc-api:v1

docker push xximjasonxx/calc-api:v1

Or, if you want to take the Amazon route, you can create an Elastic Container Registry (ECR) and push the image there. To do this, simply select ECR from the Service options and create a registry. Once that is complete, Amazon will provide you with the appropriate commands.

The main point to understand is that Kubernetes will expand and contract the number of Pods that host your app as needed. To run containers on these Pods, the source images need to be in an accessible location; that is why we use a registry.

Once your image is pushed up you can apply a spec file to add the resource to Kubernetes. Here is my deployment spec (I am using ECR):
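A sketch of that Deployment, saved as deployment-spec.yaml (the image URI, labels, replica count, and port here are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: calc-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: calc-api
      type: api
  template:
    metadata:
      labels:
        app: calc-api
        type: api
    spec:
      containers:
        - name: calc-api
          # image pulled from the ECR registry created earlier
          image: <account-id>.dkr.ecr.us-east-1.amazonaws.com/calc-api:v1
          ports:
            - containerPort: 80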

Run the apply command as such:

kubectl apply -f deployment-spec.yaml

Once the command completes run this command and wait for the new pods to enter the Running state:

kubectl get pods --watch

Congrats, you deployed your first application to Kubernetes on EKS. Problem is, this isn't very useful because the cluster offers us no way to make our API calls. For this we will create a Service.

Accessing our Pods

When it comes to accessing resources within a cluster there are a couple of options: Services and Ingress. We won't discuss Ingress here; it's a rather large topic. For this simple example a Service will be fine.

Here is the documentation from Kubernetes on Services: https://kubernetes.io/docs/concepts/services-networking/service/

What you need to understand is that Services are, simply, the mechanism by which we address a group of Pods. They come in four flavors: ClusterIP (default), NodePort, LoadBalancer, and ExternalName.

Locally, I like to use NodePort because I am often using minikube. When you deploy to the Cloud, the recommendation is to use LoadBalancer. Doing so will have AWS automatically deploy a LoadBalancer with an external hostname. Here is my servicespec:
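A sketch of the Service (the name and port are placeholders; the selector labels match the Deployment sketch above):

apiVersion: v1
kind: Service
metadata:
  name: calc-api-service
spec:
  type: LoadBalancer
  # the selector matches the labels on the Pods created by the Deployment
  selector:
    app: calc-api
    type: api
  ports:
    - port: 80
      targetPort: 80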

Of note here is the selector node. This tells the Service which Pods it is addressing; you can see the app and type values match those from the Deployment spec above.

Execute the following command:

kubectl apply -f service-spec.yaml

Next use the following command to know when the service is available and addressable:

kubectl get svc --watch

In my final output, the service eventually showed the external hostname assigned by the AWS load balancer.


Once this is up you can use Postman or whatnot to access the endpoints on your API.

Congrats, you have deployed your first application to EKS. Do NOT forget to tear everything down; EKS is not something you want to leave running for a long duration in a personal-use context.

My Thoughts

So, going into this my experience had been more on the Azure side and with minikube than with EKS. Unsurprisingly, I found EKS to be a bit more technical and demanding than AKS, mostly due to the undocumented need for a limit increase and the heavier emphasis on infrastructure, which is typical of many of Amazon's services; in contrast, AKS hides much of this from you.

Overall, the rise of managed Kubernetes services like EKS is very good for the industry and represents a step closer to where I believe applications need to be: not caring about the underlying services or plumbing, just deployed as code with what they need defined alongside them. That is still a ways off but it is fascinating that so much effort was spent getting to the cloud and now, with Kubernetes, we are trying to make it so the "which cloud" question no longer matters.