Deploying Containerized Azure Functions with Terraform

I am a huge fan of serverless and its ability to create simpler deployments with less for me to worry about, while still giving me the reliability and scalability I need. I am also a fan of containers and IaC (Infrastructure as Code), so the ability to combine all three is extremely attractive from a technical, operational, and cost optimization standpoint.

In this post, I will go through a recent challenge I completed where I used HashiCorp Terraform to set up an Azure Function App whose backing code is hosted in a Docker container. I feel this is a much better way to handle serverless deployments than the referenced Zip file approach I have used in the past.

You need to be Premium

One of the things you first encounter when seeking out this approach is that Microsoft will only allow Function Apps to use Custom Docker Images if they use a Premium or Dedicated App Service Plan (https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-function-linux-custom-image?tabs=nodejs#create-an-app-from-the-image) on Linux.

For this tutorial I will use the basic Premium plan (SKU P1V2). A quick reminder: your Function App AND its App Service Plan MUST be in the same Azure region. I ran into a problem trying to work with Elastic Premium which, as of this writing, is only available (in the US) in the East and West regions.

Terraform to create the App Service Plan (Premium P1V2)
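A minimal sketch of that plan with the azurerm provider looks like this (the resource group, names, and region are placeholders):

```hcl
# Placeholder resource group – adjust names and region to your environment
resource "azurerm_resource_group" "rg" {
  name     = "func-container-rg"
  location = "centralus"
}

# Linux Premium (P1v2) plan – required for custom container Function Apps
resource "azurerm_app_service_plan" "plan" {
  name                = "func-container-plan"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  kind                = "Linux"
  reserved            = true # must be true for Linux plans

  sku {
    tier = "PremiumV2"
    size = "P1v2"
  }
}
```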

Pretty straightforward. As far as I am aware, container-based hosting is ONLY available on Linux plans – Windows container support is no doubt coming but I have no idea when it will be available, if ever.

Create the Dockerfile

No surprise that the Docker image has to have a certain internal structure for the Function App to be able to use it. Here is the generic Dockerfile you can get using the func helper from the Azure Functions Core Tools (npm).

func init MyFunctionProj --docker

This will start a new Azure Function App project that targets Docker. You can use this as a starting point or just to get the Dockerfile. Below is the contents of that Dockerfile:
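For a .NET worker it comes out roughly like the sketch below; the exact contents vary by language runtime and Core Tools version:

```dockerfile
# Build the function project with the .NET SDK
FROM microsoft/dotnet:2.2-sdk AS installer-env
COPY . /src/dotnet-function-app
RUN cd /src/dotnet-function-app && \
    mkdir -p /home/site/wwwroot && \
    dotnet publish *.csproj --output /home/site/wwwroot

# Run on the Azure Functions base image, which expects the app at /home/site/wwwroot
FROM mcr.microsoft.com/azure-functions/dotnet:2.0
ENV AzureWebJobsScriptRoot=/home/site/wwwroot \
    AzureFunctionsJobHost__Logging__Console__IsEnabled=true
COPY --from=installer-env ["/home/site/wwwroot", "/home/site/wwwroot"]
```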

At the very least it's a good starting point. I assume that when Azure runs the image it looks to mount known directories, hence the need to conform to the structure this Dockerfile lays out.

Push the Image

As with any strategy that involves Docker containers, we need to push the source image to a spot where it can be accessed by other services. I won't go into how to do that, but I will assume you are hosting the image in Azure Container Registry.

Deploy the Function App

Back to our Terraform script, we need to deploy our Function App – here is the script to do this:
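A sketch of that resource follows – the storage account and ACR references are placeholders, and depending on your azurerm provider version the storage account may be referenced by name/key instead of a connection string:

```hcl
resource "azurerm_function_app" "func" {
  name                      = "my-container-func" # placeholder
  location                  = azurerm_resource_group.rg.location
  resource_group_name       = azurerm_resource_group.rg.name
  app_service_plan_id       = azurerm_app_service_plan.plan.id
  storage_connection_string = azurerm_storage_account.storage.primary_connection_string
  version                   = "~2"

  app_settings = {
    FUNCTIONS_WORKER_RUNTIME            = "dotnet"
    WEBSITES_ENABLE_APP_SERVICE_STORAGE = "false" # critical for custom containers
    DOCKER_REGISTRY_SERVER_URL          = "https://${azurerm_container_registry.acr.login_server}"
    DOCKER_REGISTRY_SERVER_USERNAME     = azurerm_container_registry.acr.admin_username
    DOCKER_REGISTRY_SERVER_PASSWORD     = azurerm_container_registry.acr.admin_password
  }

  site_config {
    # points the Function App at the image hosted in ACR
    linux_fx_version = "DOCKER|${azurerm_container_registry.acr.login_server}/myfunctionapp:latest"
  }
}
```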

The MOST critical AppSetting here is WEBSITES_ENABLE_APP_SERVICE_STORAGE and its value MUST be false. This tells Azure NOT to look in storage for metadata (as it normally would). The other all-caps AppSettings provide access to the Azure Container Registry – I assume these will change if you use something like Docker Hub to host the container image.

Note also the linux_fx_version setting. If you have visited my blog before you will have seen this when deploying Azure App Service instances (not surprising since a Function App is an App Service under the hood).

Troubleshooting Tips

By far the best way I found to troubleshoot this process was to access the Kudu options from Platform Features for the Azure Function App. Once in, you can access the Docker Container logs (have to click a couple links) and it gives you the Docker output. You can use this to figure out why an image may not be starting.

This was what ultimately led me to discover the WEBSITES_ENABLE_APP_SERVICE_STORAGE setting (above) as the reason why, despite the container starting, I never saw my functions in the navigation.

Hope this helps people out. I think this is a very solid way to deploy Azure Functions moving forward, though I do wish a Premium plan were not required.

Using Anchore with Azure DevOps

Anchore is a Container Image scanning tool that is used to validate the security of containers deployed for applications. I recently undertook an effort to build a custom Azure DevOps task to enable integration with this tool; nothing previously existed.

To get a feel for how this process works, this is a high level diagram of the underlying steps:

AnchoreFlow

Under the covers we contact the given Anchore Engine server after adding our built image to an accessible registry. This takes the form of a polling operation that waits for the analysis status to change.

The Setup

For my approach, I elected to stand up an Ubuntu VM in the Azure cloud and run the server using docker-compose. The steps are here: https://docs.anchore.com/current/docs/engine/engine_installation/docker_compose. Despite being listed in the Enterprise documentation (the paid version of Anchore), it does work for the OSS version.

Setup can take a few minutes once started, since the engine needs to download the information necessary to carry out scanning-related functions. This actually makes later scans a bit faster since the information will already be cached ahead of your scan requests.

Once that is complete you need to add a user. This is done as part of an account. When you perform commands against the API and give this username and password, the information is stored relative to that account. Meaning two accounts do not share things like images or registered registries.

Link: https://docs.anchore.com/current/docs/engine/usage/cli_usage/accounts

Next, we need to register our registry, though I am not certain you need to do this if you are using a public repository on Docker Hub – for my purposes I am using Azure Container Registry (ACR). When the ACR is created you will need to enable admin mode to have ACR generate a username and password that can be used for access.

Acr

Once you have established these values, you simply need to register the registry using the registry add command:

anchore-cli --u someUser --p somePass --url someUrl registry add SomeAcrLoginServer SomeAcrUsername SomeAcrPassword

Link: https://docs.anchore.com/current/docs/engine/usage/cli_usage/registries

With this, our Anchore Engine is set to go.

I have noticed that if the engine is kept idle for a period of time, it will stop analyzing incoming images. You need only stop and start it using docker-compose. I am not entirely sure why this happens.

Using the Anchore Task

The task is available, publicly, here: https://marketplace.visualstudio.com/items?itemName=Farrellsoft.anchore-task

The task contains a full listing of the various properties that are currently available; it pales in comparison to the full functionality offered by Anchore Engine, but more functionality will be added over time.

In terms of the flow you should be thinking about, here is a diagram that lays out how I see it working:

anchore-task-flow

One of the key things to note here is the “double push” of the image. We do this because we want to segregate the images created with each build from those that are suitable to move on to higher environments. Also, in the event of a rollback, we would not want Ops to have to figure out which images passed the check and which ones did not.

We also do it because Anchore needs to be able to grab the image for scanning and will not be able to access it on our build agent.

Previous to the steps above you might choose to run unit tests on the artifact before it goes into the image. You could also choose to run your unit tests within the container to guarantee your assumptions within the actual execution environment.

Some Notes

I set up the Anchore Task to only make 100 attempts to check for status change, with a 5s wait between each attempt. In the event the Anchore Engine server is experiencing problems, I would rather the build fail than it enter an endless looping state. In the future, I intend to make this a configurable flag.

As I am writing this, we are still in the early days of the extension and there will no doubt be more features added. My main focus with this post was to cover setup and general usage.

 

Anchore Integration with Azure DevOps

One of my goals is to constantly find ways to improve the quality of the product that teams put into their environments, from Dev all the way to Production. At West Monroe, we started exploring various reference patterns for DevOps flows, including those that had container image scanning as part of the flow – I decided to look into Anchore and their open source container scanning product as part of my end of this effort.

Azure DevOps, however, did not support any sort of existing integration with Anchore, nor does the Anchore team have anything in Azure that enables fast setup. In a future post, I intend to walk through the actual setup of the Engine server and API.

For the Azure DevOps side, I was challenged by a colleague to create a custom task to support this integration. I have made this open source on my GitHub (https://github.com/xximjasonxx/anchore-task-extension).

It is still very much in its infancy and I do intend to prepare a Readme file to better explain its usage. The GitHub repo also contains a Project tab which lists some of the goals I have; it is very limited at the moment.

Here is the link to the extension in the DevOps marketplace: https://marketplace.visualstudio.com/items?itemName=Farrellsoft.anchore-task

If you would like to contribute and help out, please do so – I want this to be a collaborative effort. Thanks in advance.

 

Serverless Proxy Pattern: Part 5

Part 4 is here

Source code here

Full Disclaimer
It would appear the script referenced in previous parts was not re-entrant; that is, it contained good information but, over the course of building it, things fell into place rather than being put in place. Thus, I noted it failed to set up when I ran it clean. Version 1.1 addresses this and will be referenced here.

Now that we have our app up and running it's time to be critical of things, mainly permissions. As I stated in Part 1, AWS recommends the use of roles to fulfill service-level tasks so that credentials are never stored in source control or within applications, thus lessening the blast radius in the event of a compromise. That said, a setup where all of the roles we use are open to everything is not good either. Thus, first we will focus on splitting and refining AppRole.

Get the Order Right

Cloud Formation makes every attempt to determine the order in which to create resources; when it needs help you can often use the DependsOn attribute. I have personally found this wanting, so the first thing I did was establish the order of my resources so things could be created efficiently and correctly. Here is that order:

  • ThumbnailBucket (S3 Bucket)
  • ThumbnailBucketPolicy (S3 Bucket Policy)
  • ImageTable (DynamoDB Table)
  • CreateThumbnailFunctionRole (IAM Role)
  • CreateThumbnailFunction (Lambda)
  • AnalyzeImageFunctionRole (IAM Role)
  • AnalyzeImageFunction (Lambda)
  • ImageUploadTopic (SNS Topic)
  • ImageUploadTopicPolicy (SNS Topic Policy)
  • RawBucket (S3 Bucket)
  • RawBucketPolicy (S3 Bucket Policy)
  • CreateThumbnailFunctionSNSInvokePermission (Lambda Permission)
  • AnalyzeThumbnailFunctionSNSInvokePermission (Lambda Permission)
  • ApiGatewayRest (API Gateway)
  • BucketApiResource (API Gateway Resource)
  • ImagesApiResource (API Gateway Resource)
  • BucketItemApiResource (API Gateway Resource)
  • BucketItemApiMethodRole (IAM Role)
  • BucketItemApiMethod (API Gateway Method)
  • ImagesApiMethodRole (IAM Role)
  • ImageApiMethod (API Gateway Method)
  • ApiGatewayDeployment (API Gateway Deployment)
  • DefaultApiGatewayStage (API Gateway Stage)

The items highlighted in BOLD are new moving into this part and mostly represent the result of splitting AppRole. As an example here is CreateThumbnailFunctionRole:
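The shape of it is roughly the sketch below; the bucket references inside !Sub are illustrative (RawBucketName here stands in for however the raw bucket's name is known to the template):

```yaml
CreateThumbnailFunctionRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: lambda.amazonaws.com   # only Lambda may assume this role
          Action: sts:AssumeRole
    Policies:
      - PolicyName: create-thumbnail-policy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow                 # telemetry left open
              Action:
                - logs:*
                - cloudwatch:*
                - xray:*
              Resource: '*'
            - Effect: Allow                 # write/read thumbnails only
              Action:
                - s3:PutObject
                - s3:GetObject
              Resource: !Sub 'arn:aws:s3:::${ThumbnailBucket}/*'
            - Effect: Allow                 # read raw images; ARN built from the known name to avoid a circular dependency
              Action:
                - s3:GetObject
              Resource: !Sub 'arn:aws:s3:::${RawBucketName}/*'
```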

You can see that I kept the open resource for CloudWatch, XRay, and Logs, but I restricted the rights against the ThumbnailBucket to Put and Get type actions only. This keeps the role from doing anything with the Thumbnail bucket beyond writing and reading objects.

Further supporting this is the AssumeRolePolicy, which enforces that ONLY Lambda can assume this role and no other service. This is key given the very tight use case of the function and its coupling to this specific bucket.

The other interesting thing here is the use of !Sub to specify the ARN of the S3 bucket holding raw images. Cloud Formation will always do its best to mitigate circular dependencies but, especially with this sort of role structure, has a hard time. Thankfully AWS has a standard format for resource names. Using that, we can specify the ARN here and prevent Cloud Formation from looking to its template for it – this mitigates the circular dependency that would otherwise occur.

Create Bucket Policies

One thing I did learn while working through this was, curiously, that if I specified Resource: “*” within my Roles for the S3 bucket, things worked fine, but the moment I went resource-specific I got Access Denied.

After much Googling (seriously, why do the AWS docs never call out resource-specific roles?) I guessed, correctly, that in addition to the Role I needed a Bucket Policy to allow the action as well. Here is the policy for RawBucket which allows API Gateway to proxy the PUT call to create objects:
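A sketch of that bucket policy, matching the description below:

```yaml
RawBucketPolicy:
  Type: AWS::S3::BucketPolicy
  Properties:
    Bucket: !Ref RawBucket
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow                  # Lambda may read raw images
          Principal:
            Service: lambda.amazonaws.com
          Action: s3:GetObject
          Resource: !Sub 'arn:aws:s3:::${RawBucket}/*'
        - Effect: Allow                  # API Gateway may write the proxied PUTs
          Principal:
            Service: apigateway.amazonaws.com
          Action: s3:PutObject
          Resource: !Sub 'arn:aws:s3:::${RawBucket}/*'
```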

Here we specify that, for RawBucket, we allow Lambda services to read from it (necessary to support the thumbnail creation process) and we allow API Gateway to write to it (necessary to support the PUT operations our API Gateway will proxy). Of particular note here is the /* you see – this is necessary to indicate our policy applies to the bucket contents and not the bucket itself.

SNS Endpoint Verification Check

One of the other problems I came across while testing was that SNS, when it is created, will send a pulse to the Lambda functions to ensure they are there. I didn't catch this the first time because of the order in which I originally created things. Handling this is rather easy, as it is simply an s3:TestEvent sent to the function. Both CreateThumbnail and AnalyzeImage have this code to handle the check:
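A sketch of that guard, assuming the handler receives an SNSEvent (Amazon.Lambda.SNSEvents) and the S3 payload rides in the SNS message body:

```csharp
foreach (var record in snsEvent.Records)
{
    // SNS wraps the S3 notification JSON in the message body. When the
    // subscription is first wired up, an s3:TestEvent arrives instead of
    // a real notification, so skip those records.
    if (record.Sns.Message.Contains("s3:TestEvent"))
        continue;

    // ... deserialize the S3 event from record.Sns.Message and process the object ...
}
```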

Conclusion

I hope this series has been informative and worthwhile. It took a lot of care and thought to make and a lot of trial and error. Cloud Formation, and similar tools, take a lot of effort to learn and understand but, ultimately, they can save you a ton of grief. One of the value props for Infrastructure as Code (IaC) is that it enables both quick creation of environments and access to more efficient DevOps processes.

This is best manifested, right now, in Jenkins X's approach of spinning up entire environments in Kubernetes for testing per PR. This way, we have a way to run a complete battery of tests to ensure the code entering the trunk (and possibly being deployed automatically to production) is valid and tested.

Again here is the completed code:

https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1.1

Serverless Proxy Pattern: Part 4

See Part 3 here

To this point, we have defined what I would say are the “backing” services for our application. We could write frontend apps to send images to our bucket which would trigger the overall process but, in keeping with the theme, I want to provide a “codeless” way (or serverless) to interact with our services.  Enter API Gateway.

API Gateway: The Heart of the Serverless Proxy Pattern

API Gateway (and API Management on Azure) can act as proxies forwarding received calls to backing services to perform operations. This means we can dispense with writing any code to work with these services and instead rely on the API Gateway to handle the underlying calls for us.

To create an API Gateway in AWS we must first create a RestApi type resource. This serves as the container for related resources of type Resource and Method. Here is the YAML:
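A minimal sketch of that container resource (the name is a placeholder):

```yaml
ApiGatewayRest:
  Type: AWS::ApiGateway::RestApi
  Properties:
    Name: thumbnail-api            # placeholder name
    Description: Proxy API for the thumbnail application
    EndpointConfiguration:
      Types:
        - REGIONAL
```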


Next, we need to define our resources. If you are familiar with the nomenclature from REST standards you will recognize this term. It is the “thing” that defines what is being acted against. It can often map to database models but, does not necessarily have to.

In our case we will define our resources as such:
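A sketch of those three resources, assuming the raw bucket is available to reference for its name:

```yaml
# /all-images
ImagesApiResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ParentId: !GetAtt ApiGatewayRest.RootResourceId
    PathPart: all-images

# /<bucket_name>
BucketApiResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ParentId: !GetAtt ApiGatewayRest.RootResourceId
    PathPart: !Ref RawBucket       # the bucket name forms the root of this path

# /<bucket_name>/<key>
BucketItemApiResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ParentId: !Ref BucketApiResource
    PathPart: '{item}'
```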


Each level of the API is defined as a resource such that we end up supporting the two following path structures:

  • /all-images
  • /<bucket_name>/<key>

Of particular note is the usage of the bucket name as the root resource for the key – this appears to be required or, at least, I cannot find a way around it. Now that our resource structure is in place, we define the “actions” that will take place when the path is matched:

PUT an Object

First, we will configure our /<bucket_name>/<key> route to allow a POST verb to invoke it and forward that on to S3 using PUT to create (or update) the object in blob storage.
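The overall shape (POST in, PUT to the S3 object path, 201 back out) is sketched below; the role and parameter names are illustrative:

```yaml
BucketItemApiMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ResourceId: !Ref BucketItemApiResource
    HttpMethod: POST                       # callers invoke with POST
    AuthorizationType: AWS_IAM
    RequestParameters:
      method.request.path.item: true       # the object key is required
    Integration:
      Type: AWS                            # plain AWS service integration
      IntegrationHttpMethod: PUT           # API Gateway calls S3 with PUT
      Credentials: !GetAtt BucketItemApiMethodRole.Arn
      Uri: !Sub 'arn:aws:apigateway:${AWS::Region}:s3:path/${RawBucket}/{item}'
      RequestParameters:
        integration.request.path.item: method.request.path.item
      IntegrationResponses:
        - StatusCode: 201
    MethodResponses:
      - StatusCode: 201
```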


There is quite a bit here so lets hit the crucial areas. We define HttpMethod to indicate that the method can only be invoked using POST and that the Authorization will be gleaned from IAM credentials associated with the underlying integration calls (recall our role we defined in Part 1).

Next we define RequestParameters which in this case lets us enforce that item must be provided. Where the real magic happens is the IntegrationHttpMethod and Integration sections:

  • IntegrationHttpMethod: This is the verb that API Gateway will use when invoking whatever service it will call when the resource path matches.
  • Integration: This is the meat of this operation; here we define the various mappings for values to pass to the underlying service call, what kind of integration we will use (AWS Service in this case) and the Uri we will call (in this case it is the S3 bucket path)

What this ends up facilitating is a PUT using the bucket-name and key against the S3 bucket object path to pass the contents of the request to S3 API which will create the appropriate object – all without writing a line of code.

The final bit of this is the IntegrationResponses where we define that the integration will respond with a 201 (Created) – this lines up with MethodResponses intentionally.

Get All Image Data

The second resource will return the contents of our DynamoDB table ImageDataTable that we created in Part 3. For this one, the integration portion is a bit trickier, have a look:
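A sketch of that method; the table and role references are illustrative:

```yaml
ImageApiMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    RestApiId: !Ref ApiGatewayRest
    ResourceId: !Ref ImagesApiResource
    HttpMethod: GET                          # callers use GET
    AuthorizationType: NONE                  # placeholder – match your auth requirements
    Integration:
      Type: AWS
      IntegrationHttpMethod: POST            # the DynamoDB action API is invoked with POST
      Credentials: !GetAtt ImagesApiMethodRole.Arn
      Uri: !Sub 'arn:aws:apigateway:${AWS::Region}:dynamodb:action/Scan'
      RequestTemplates:
        application/json: !Sub '{ "TableName": "${ImageTable}" }'
      IntegrationResponses:
        - StatusCode: 200
    MethodResponses:
      - StatusCode: 200
```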


The key thing to understand here is that we are indicating we want to call an action against the DynamoDB API – the action in this case being Scan (I won't go into the differences between Query and Scan here).

Scan expects a JSON block to define various parameters that it will use to perform filtering and selection. In this example, we are dropping the filtering aspect and simply specifying our DynamoDB table name so that all contents of that table are returned.

Also of note here is the IntegrationHttpMethod. Despite the fact that our API Gateway resource is called using GET, we must use POST to call the DynamoDB Action API.

Once this is in place you can get a dump of the table contents from DynamoDB. The one outstanding issue I have with this is that the JSON comes back as it exists in DynamoDB, which will not look like most JSON blocks you have likely seen – but it is easily parsable by DynamoDB libraries. Still, a standing goal of mine is to get this into a more normalized JSON block for easier consumption.

Deploying the API

The final bit here is making the API Gateway accessible via the web. This is done by deploying API Gateway Deployment and Stage resources. In practice, these are designed to serve as a means of enabling environmentalization of the API Gateway so changes can be promoted, similar to code changes. Here is what I defined for these resources:
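Roughly, the two resources look like this (the stage name is a placeholder):

```yaml
ApiGatewayDeployment:
  Type: AWS::ApiGateway::Deployment
  DependsOn:
    - BucketItemApiMethod
    - ImageApiMethod
  Properties:
    RestApiId: !Ref ApiGatewayRest

DefaultApiGatewayStage:
  Type: AWS::ApiGateway::Stage
  Properties:
    RestApiId: !Ref ApiGatewayRest
    DeploymentId: !Ref ApiGatewayDeployment
    StageName: v1                  # placeholder stage name
```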


Ending this Part

In this part I went through the complete setup of the API Gateway that we are using to proxy Amazon services: S3 and DynamoDB. This approach gives a lot of the basic functionality we find in applications but without needing to write any code and all definable via Cloud Formation. Now that our application is stood up, our next parts will focus on tweaking things to make them more secure and more efficient.

As always you can reference the complete source here: https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1.1

Part 5 is here

Serverless Proxy Pattern: Part 3

Part 2 is here

In the previous parts we worked to set up our basic AWS infrastructure, including SNS, S3, and our Role. We also added the appropriate permissions for the flow of events to work unimpeded. Next, we set up a typical CI/CD flow using Azure DevOps so that as we make changes to code and infrastructure our application is changed appropriately; this fulfills the main tenets of GitOps and Infrastructure as Code (IaC). In this part, we will actually develop the code for our Lambdas and test that our various other resources are set up correctly.

Define the DynamoDB Table

In keeping with our main theme, we will deploy a DynamoDB table using Cloud Formation. Here is the YAML we will use:
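A sketch of that table definition – only the Id hash key is enforced, as discussed below, and the table name shown here is a placeholder:

```yaml
ImageTable:
  Type: AWS::DynamoDB::Table
  Properties:
    TableName: image-data          # deliberately not parameterized
    AttributeDefinitions:
      - AttributeName: Id
        AttributeType: S
    KeySchema:
      - AttributeName: Id
        KeyType: HASH
    BillingMode: PAY_PER_REQUEST
```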

Something to note here is that I chose NOT to use a parameter to define the name of the table. You definitely could, but that then speaks to deployment considerations since you generally do not want your table names disappearing. Also, with Dynamo the naming is localized to your Amazon account, so you don't have to worry about extra-account conflicts.

What is DynamoDB?
DynamoDB is a NoSQL database available in both normal and global variants from Amazon. It is ideal for handling use cases where data is unstructured and/or coming in at high volumes, where the consistency enforcement found in RDBMS databases is not a primary concern.

In our case, the data that we store will come from Amazon Rekognition Label Detection, which will vary from image to image, and thus it makes sense to store it in a NoSQL fashion.

The way Dynamo works is it expects SOME structure to be provided (in this case we guarantee there will ALWAYS be an Id column), which serves as the table's index. There is a great amount of flexibility in how primary and secondary keys are defined, along with sort indexes within those key structures.

Create Thumbnail Function

Our first Lambda will respond to an image being added to our “raw” bucket and create a copy of that image with its dimensions reduced (a thumbnail). Originally I used System.Drawing but was met with a libgdiplus error; the error happens because System.Drawing is built on gdiplus, which is not installed, by default, into Ubuntu Docker images. Rather than attempting to get this to work I did some research and found the SixLabors imaging library that was featured at re:Invent (link: https://sixlabors.com/projects/imagesharp).

One of the other interesting bits to this is that when Lambda receives an event from SNS, the format is a bit different from when it comes from S3 directly. For that I created a set of mapping classes that can be used with Newtonsoft JSON.NET.

Here is the core logic which does the resizing – outside of this it is all about reading the Stream from S3 and writing it to our Thumbnail S3 bucket.
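With ImageSharp the resize itself is only a few lines; a sketch, assuming the source image arrives as a Stream and the thumbnail dimensions are already computed:

```csharp
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Processing;

// sourceStream comes from the S3 GetObject response; outputStream is what
// gets written to the thumbnail bucket afterwards
using (var image = Image.Load(sourceStream))
{
    // resize in place to the computed thumbnail dimensions
    image.Mutate(ctx => ctx.Resize(thumbnailWidth, thumbnailHeight));
    image.SaveAsJpeg(outputStream);
}
```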

Analyze Image Function

In addition to creating the thumbnail image we also want to run the image through Amazon Rekognition (Computer Vision) and use the Detect Labels to gather data about the image. This data will then be written to our DynamoDB table.
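A sketch of that flow, assuming the Rekognition and DynamoDB clients are already constructed and the bucket/key have been pulled from the SNS-wrapped S3 event:

```csharp
using System;
using System.Collections.Generic;
using Amazon.DynamoDBv2.Model;
using Amazon.Rekognition.Model;

// detect labels for the uploaded image directly from S3
var detectResponse = await rekognitionClient.DetectLabelsAsync(new DetectLabelsRequest
{
    Image = new Image
    {
        S3Object = new S3Object { Bucket = bucketName, Name = objectKey }
    }
});

// build the item: an Id plus one attribute per detected label
var item = new Dictionary<string, AttributeValue>
{
    ["Id"] = new AttributeValue { S = Guid.NewGuid().ToString() }
};

foreach (var label in detectResponse.Labels)
{
    item[label.Name] = new AttributeValue { N = label.Confidence.ToString() };
}

await dynamoDbClient.PutItemAsync(new PutItemRequest
{
    TableName = tableName,
    Item = item
});
```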

In Dynamo each row is unstructured and can have a different schema – each column for that document is represented by a key on the provided ItemDataDictionary (as shown above).

As always for reference here is the complete source: https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1

Part 4 is here

Serverless Proxy Pattern: Part 2

Part 1 available here

In the previous part, I talked through the motivation for building this application and the emphasis on using CloudFormation entirely as a means to enable efficient and consistent setup.

To start the template we specified our role (AppRole) and talked through some of the decisions and potential improvements with the approach. Next we set up our S3 buckets and configured the “raw” bucket to send a notification, via SNS, to two (not yet created) Lambda functions which will handle various operations against the image data that was uploaded.

In this part, we will set up the appropriate permissions for the Lambdas so they can be correctly invoked from SNS. We will also talk through setting up GitOps so that our code and infrastructure can be deployed quickly and automatically, enabling fast development.

Permissions

The one thing you learn very quickly working with AWS is the intense focus on security at all levels. This is mainly driven by “policies” which are attached to things like users and roles and specify what actions may be taken by the attached entity. We talked about this a little in Part 1 and it's worth repeating here – for the sake of getting started I am using a set of fairly wide-open policies, the goal being to refine these towards the end.

We can see this via the TopicPolicy which enables S3 to publish messages to the SNS Topic. Here it is in YAML format:
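The shape of it is roughly this:

```yaml
ImageUploadTopicPolicy:
  Type: AWS::SNS::TopicPolicy
  Properties:
    Topics:
      - !Ref ImageUploadTopic
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: s3.amazonaws.com     # S3 is the allowed publisher
          Action: sns:Publish
          Resource: !Ref ImageUploadTopic
```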


Keep in mind that ImageUploadTopic was described at the end of Part 1 and represents the topic we are using to “fan out” bucket notifications.

Here we are declaring a policy and associating it with the ImageUploadTopic resource (via the Resource portion of the policy statement). Effectively this allows S3 (the Principal) to publish to this topic. We could be more specific here, though I have chosen not to be, about which S3 resources can publish to this Topic.

With this in place, our S3 bucket can now publish messages to our SNS topic but, nothing will happen if we test this. Why? Because SNS does NOT have permissions to invoke our Lambda functions. Let’s set that up next.

Lambda Permissions

Recall from Part 1 this is what our SNS declaration looked like in YAML:
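Roughly, the topic with its two Lambda subscriptions looks like this (the topic name is a placeholder):

```yaml
ImageUploadTopic:
  Type: AWS::SNS::Topic
  Properties:
    TopicName: image-upload-topic          # placeholder name
    Subscription:
      - Protocol: lambda
        Endpoint: !GetAtt CreateThumbnailFunction.Arn
      - Protocol: lambda
        Endpoint: !GetAtt AnalyzeImageFunction.Arn
```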


Since we have now added our TopicPolicy (above), SNS will attempt to invoke our Lambdas, and will summarily fail. Why? Because, as you might guess at this point, it does not have permission to do so. For that we need to create Lambda permissions for our functions (though you could also create a more general permission if you have many Lambda functions).

Here are the permission declarations in YAML:
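Sketched out, the two permissions look like this:

```yaml
CreateThumbnailFunctionSNSInvokePermission:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:InvokeFunction
    Principal: sns.amazonaws.com
    FunctionName: !GetAtt CreateThumbnailFunction.Arn
    SourceArn: !Ref ImageUploadTopic

AnalyzeThumbnailFunctionSNSInvokePermission:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:InvokeFunction
    Principal: sns.amazonaws.com
    FunctionName: !GetAtt AnalyzeImageFunction.Arn
    SourceArn: !Ref ImageUploadTopic
```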


Here we are specifying that SNS may invoke these functions, which is obviously needed. Next, we will create the actual Lambda functions we will be invoking.

Why are we using SNS?

There may be a question in your mind of why I am choosing to use SNS instead of straight Lambda invocation. The answer is, as I said in Part 1, S3 does NOT support multiple delivery of events so, if you are going to do a sort of “fan out” where the event is received by multiple services you need to use SNS. Alternatively, you could leverage Step functions or a single Lambda that calls other Lambdas. For me, I feel this is the most straightforward and keeps with the “codeless” theme I am going for as part of my “serverless” architecture.

Infrastructure as Code

Cloud Formation stands with other tools as a way to represent infrastructure needs in code. It is a core part of the GitOps movement, which states that all changes to an application should be invoked automatically from source control; source control needs to be the single source of truth for the application.

In general, this is part of a larger movement in the industry around “what is an application?”. Yes, traditionally, this used to be simply code which would then run on infrastructure provisioned, in some cases, by a totally separate team. As we have moved into the Cloud, though, this has shifted and coalesced to the point where infrastructure and code are intermingled and dependent on each other.

When I speak with teams on these points, I emphasize the notion that the “application” is everything. And it being everything, your infrastructure definition is just as important as your compiled code. Gone are the days where it is acceptable for your Cloud infra to exist transiently in a console. It is now expected that your application's infrastructure is represented in code and versioned along with the rest of the application. That is why we are using Cloud Formation here (we could also use Terraform and similar tools).

GitOps

GitOps simply states that everything about our applications exists together and that commits to Git should be the means by which updates to our code or infrastructure are made – in other words, nothing is ever done manually.

For this application, as we enter into Lambda territory, we will want a way to compile and deploy our Lambda code AND update our infrastructure via Cloud Formation. The best way to do this is to setup a CI/CD pipeline. For this, I will leverage Azure DevOps.

Why not AWS Services?

It may seem odd to use Azure DevOps to deploy AWS infrastructure but it's not as uncommon as you might think. This is due to the AWS tooling being awful – I really believe Amazon sought to simply say “we can do that” rather than create a viable platform supporting developer productivity. Many of the teams I have seen using AWS use Azure DevOps or, in other cases, will deploy a Jenkins server – rarely do I see people actually using CodePipeline and CodeBuild.

CI/CD Simplified

I will spare you the details of setting up the pipelines in Azure and doing the releases. I will leave you with the YAML file that represents the build pipeline, which builds the Analyze Image and Create Thumbnail functions in tandem while publishing out the Cloud Formation template.
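A sketch of that pipeline; the project paths and artifact name are placeholders:

```yaml
trigger:
  - master

pool:
  vmImage: ubuntu-latest

steps:
  # Publish each function into its own zip
  - task: DotNetCoreCLI@2
    displayName: Publish CreateThumbnail
    inputs:
      command: publish
      publishWebProjects: false
      projects: src/CreateThumbnail/CreateThumbnail.csproj     # placeholder path
      arguments: --configuration Release --output $(Build.ArtifactStagingDirectory)/createthumbnail
      zipAfterPublish: true

  - task: DotNetCoreCLI@2
    displayName: Publish AnalyzeImage
    inputs:
      command: publish
      publishWebProjects: false
      projects: src/AnalyzeImage/AnalyzeImage.csproj           # placeholder path
      arguments: --configuration Release --output $(Build.ArtifactStagingDirectory)/analyzeimage
      zipAfterPublish: true

  # Ship the Cloud Formation template alongside the zips
  - task: CopyFiles@2
    inputs:
      Contents: infra.yaml
      TargetFolder: $(Build.ArtifactStagingDirectory)

  - task: PublishBuildArtifacts@1
    inputs:
      PathtoPublish: $(Build.ArtifactStagingDirectory)
      ArtifactName: drop
```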


The gist of this is simple – we compile each function into a separate Zip file and publish that zip file so we can use it in our Release pipeline. Additionally we publish our infra.yaml which is our Cloud Formation template.

In the release pipeline (not shown) we use the S3 Upload task to upload the resulting zip files to an S3 bucket which houses the application artifacts. We then run a Stack Update task with the Cloud Formation template. This will replace the code for those Lambdas; here is the YAML:
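The relevant slice of the template looks roughly like this; the handlers and artifact bucket are placeholders:

```yaml
Parameters:
  CreateThumbnailLambdaVersionFile:
    Type: String
  AnalyzeImageLambdaVersionFile:
    Type: String

Resources:
  CreateThumbnailFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: dotnetcore2.1
      Handler: CreateThumbnail::CreateThumbnail.Function::Handler   # placeholder handler
      Role: !GetAtt CreateThumbnailFunctionRole.Arn
      Code:
        S3Bucket: my-artifact-bucket                                # placeholder artifact bucket
        S3Key: !Ref CreateThumbnailLambdaVersionFile

  AnalyzeImageFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: dotnetcore2.1
      Handler: AnalyzeImage::AnalyzeImage.Function::Handler         # placeholder handler
      Role: !GetAtt AnalyzeImageFunctionRole.Arn
      Code:
        S3Bucket: my-artifact-bucket
        S3Key: !Ref AnalyzeImageLambdaVersionFile
```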


The key thing here is the introduction of the CreateThumbnailLambdaVersionFile and AnalyzeImageLambdaVersionFile parameters which are used as the S3Key value for the Lambdas; this is fed to the template at runtime by the DevOps Release pipeline, like this:
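Conceptually, the release supplies the two values as parameter overrides to the stack update task (the exact input format depends on the task being used), for example:

```yaml
# illustrative parameter overrides passed by the release pipeline
CreateThumbnailLambdaVersionFile: createthumbnail-$(Build.BuildId).zip
AnalyzeImageLambdaVersionFile: analyzeimage-$(Build.BuildId).zip
```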


This is what is meant by using GitOps – all changes to our application happen via Git operations; we never do anything manually. This sets things up so that, with a proper automated testing layer, we can achieve true Continuous Delivery of our application.

That is all for this section – in the next Part we will write the actual code for our application which adds objects to our Thumbnail bucket and writes the data to DynamoDB.

As always here is the complete source if you want to skip ahead:
https://github.com/xximjasonxx/ThumbnailCreator/tree/release/version1

Part 3 is here