Durable Functions: Part 2 – The Upload

I covered the opening to this series in Part 1 (here). Our overall goal is to create a File Approval flow using Azure Durable Functions and showcase how complex workflows can be managed within this Azure offering. This topic sits within the much larger topic of Event Driven design. Under EDD we aim to build “specialized” components which respond to events. By taking this approach we write only the code we need and alleviate ourselves of boilerplate and unrelated code. This creates a greater degree of decoupling which can help us when change inevitably comes. It also allows us to solve specific problems without generating wasteful logic which can hide bugs and create other problems.

In this segment of the series, we will talk through building the upload functionality to allow file ingestion. Our key focus will be the Azure Function binding model, which allows boilerplate code to be neatly extracted away from our function. Bindings also underpin the event driven ways we can work with functions, specifically allowing them to be triggered by an Azure event.

Let’s get started. As I move forward I will be assuming that you are using Visual Studio Code with the Azure Function tools to create your functions. This is highly recommended and is covered in detail in Part 1.

Provision Azure Resources

The first thing we will want to do is set up our Azure resources for use. This includes:

  • Resource Group
  • Azure Storage Account (create a container)
  • Azure Table Storage (this is the Table option within Cosmos)
  • Cognitive Services (we will use this in Part 3)

As a DevOps professional my recommended approach to deploying infrastructure to any cloud environment is to use a scripted approach, ideally Terraform or Pulumi. For this example, we will not go into that since it is not strictly my aim to extol good DevOps practices as part of this series (we won't be deploying CI/CD either).

For this simple demo, I will leave these resources publicly available; thus, we can update local.settings.json with the relevant connection information as we develop locally. local.settings.json is a special config file that, by default, the Azure Function project template created by the VS Code Azure Functions tooling excludes from source control. Always be diligent and refrain from checking credentials into source control, especially for environments above Development.

Getting started, you will want to have the following values listed in local.settings.json:

  • AzureWebJobsStorage – used by the Azure Functions runtime; the value here should be the connection string to the storage account you created
  • FUNCTIONS_WORKER_RUNTIME – set to dotnet by the template; just leave this alone
  • StorageAccountConnectionString – this is the account where our uploads are saved to; it can share the same Storage Account that you previously created
  • TableConnectionString – this is the connection string to the Azure Table Storage instance
  • CognitiveServicesKey – the value of the key given when you create an instance of the Azure Cognitive Services resource
  • CognitiveServicesEndpoint – the value of the endpoint to access your instance of Azure Cognitive Services
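Put together, a minimal local.settings.json might look like the following sketch (the connection strings, key, and endpoint are placeholders; substitute the values from your own resources):

```json
{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "<storage-account-connection-string>",
    "FUNCTIONS_WORKER_RUNTIME": "dotnet",
    "StorageAccountConnectionString": "<storage-account-connection-string>",
    "TableConnectionString": "<table-storage-connection-string>",
    "CognitiveServicesKey": "<cognitive-services-key>",
    "CognitiveServicesEndpoint": "<cognitive-services-endpoint>"
  }
}
```

Since this demo uses a single Storage Account, the same connection string can appear in more than one of these slots.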

Here is the complete code for the Azure Function which handles this upload:

public static class UploadFile
{
    [FunctionName("UploadFile")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = "file/add")] HttpRequest uploadFile,
        [Blob("files", FileAccess.Write, Connection = "StorageAccountConnectionString")] CloudBlobContainer blobContainer,
        [Table("metadata", Connection = "TableConnectionString")] CloudTable metadataTable,
        ILogger log)
    {
        var fileName = Guid.NewGuid().ToString();
        await blobContainer.CreateIfNotExistsAsync();
        var cloudBlockBlob = blobContainer.GetBlockBlobReference(fileName);
        await cloudBlockBlob.UploadFromStreamAsync(uploadFile.Body);
        await metadataTable.CreateIfNotExistsAsync();
        var addOperation = TableOperation.Insert(new FileMetadata
        {
            RowKey = fileName,
            PartitionKey = fileName,
        });
        await metadataTable.ExecuteAsync(addOperation);
        return new CreatedResult(string.Empty, fileName);
    }
}
Figure 1 – File Upload for Azure Function

The code looks complex but it is actually relatively simple thanks to the heavy use of Azure Function bindings. There are three in use:

  • HttpTrigger – most developers will be familiar with this trigger. Through it, Azure will listen for Http requests to a specific endpoint and route, and execute this function when such a request is detected
  • Blob – You will need this NuGet package. This creates a CloudBlobContainer initialized with the given values. It makes it incredibly easy to write data to the container.
  • Table – Shipped in the same NuGet package as the Blob binding. This, like the blob, opens up a table connection to make it easy to add data, even under high volume scenarios

Bindings are incredibly useful when developing Azure Functions. Most developers are only familiar with HttpTrigger, which is used to respond to Http requests, but there is a huge assortment, with support for events from many popular Azure resources. Using these removes the need to write boilerplate code which can clutter our functions and obscure their purpose.

Blob and Table can be made to represent an item in their respective collections or a collection of items. The documentation (here) indicates what types method arguments using these attributes can take. Depending on how you use the attribute, it can be a reference to the table itself, a segment of data from that table (using the partition key), or an item itself. The Blob attribute has similar options (here).

One thing to keep in mind is that a “bound parameter” must be declared as part of a trigger binding attribute to be used by other non-trigger bindings. Essentially, it is important to understand that bindings are bound BEFORE the function is run, not after. Understanding this is essential to creating concise workflows using bindings.

Understanding Bindings Through an Example

Taking our code sample from above (repasted here):

public static class UploadFile
{
    [FunctionName("UploadFile")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = "file/add")] HttpRequest uploadFile,
        [Blob("files", FileAccess.Write, Connection = "StorageAccountConnectionString")] CloudBlobContainer blobContainer,
        [Table("metadata", Connection = "TableConnectionString")] CloudTable metadataTable,
        ILogger log)
    {
        var fileName = Guid.NewGuid().ToString();
        await blobContainer.CreateIfNotExistsAsync();
        var cloudBlockBlob = blobContainer.GetBlockBlobReference(fileName);
        await cloudBlockBlob.UploadFromStreamAsync(uploadFile.Body);
        await metadataTable.CreateIfNotExistsAsync();
        var addOperation = TableOperation.Insert(new FileMetadata
        {
            RowKey = fileName,
            PartitionKey = fileName,
        });
        await metadataTable.ExecuteAsync(addOperation);
        return new CreatedResult(string.Empty, fileName);
    }
}

So here I am creating the unique Id (called fileName) in code. If I wanted to, I could specify {id} in the HttpTrigger as part of the route. This would give me access to the value of {id} in other bindings, or as a function parameter named id. In this case, that would amount to relying on the user to give me a unique value, which would not work.
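For illustration only, here is a sketch of that hypothetical variant (the function name and route are made up for this example; our actual upload keeps the server-generated Guid):

```csharp
// Hypothetical variant: {id} from the route is bound BEFORE the function
// runs, so non-trigger bindings like Blob can reference it directly.
[FunctionName("UploadFileWithId")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = "file/add/{id}")] HttpRequest uploadFile,
    string id, // bound from the {id} segment of the route
    [Blob("files/{id}", FileAccess.Write, Connection = "StorageAccountConnectionString")] CloudBlockBlob blob,
    ILogger log)
{
    // the blob name came from the caller, not from Guid.NewGuid()
    await blob.UploadFromStreamAsync(uploadFile.Body);
    return new CreatedResult(string.Empty, id);
}
```

Note how the Blob binding consumes {id} declaratively; no code inside the function is needed to resolve the blob path.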

I hope that explains this concept; I find understanding it makes decisions about how to write your code easier and more straightforward. If not, there will be other examples of this in later sections and I am happy to explain more in the comments.

The Upload Process

Now that we have covered the bindings, the code should make a lot more sense if it did not before.

Simply put, we:

  1. Call Guid.NewGuid().ToString() and get a string representation of a new Guid. This is our unique Id for this file upload
  2. The binary stream accepted through the Http Post request is saved to a block reference into our Azure Storage Account
  3. Next, the initial record for our entry is created in the Azure Table Storage (Approval flags are both set to false)
  4. We return a 201 Created response as is the standard for Post operations which add new state to systems
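The steps above can be exercised locally with something like the following (this assumes the Functions host is running on its default port and that a file named sample.pdf exists; both are illustrative):

```shell
# POST a local file's raw bytes to the upload endpoint;
# the 201 response body contains the generated Guid
curl -X POST "http://localhost:7071/api/file/add" \
     --data-binary "@sample.pdf"
```

Any binary payload works here since the function reads the request body as a stream.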

Straightforward and easy to understand: thanks to the bindings, all of the heavy lifting was done outside the scope of our function, allowing it to clearly express its intent.

Why Azure Table Storage?

Azure Table Storage is an offering that has existed for a long time in Microsoft Azure; it only recently came to be under the Cosmos umbrella along with other NoSQL providers. The use of Table Storage here is intentional due to its cost effectiveness and speed. But it does come with some trade-offs:

  • The Cosmos Core (DocumentDb) offering is designed as a massively scalable NoSQL system. For larger systems with more volume, I would opt for this over Table Storage – though you get what you pay for, it's not cheap
  • DocumentDb is a document database, meaning the schema is never set in stone and can always be changed as new records are added. This is not the case with Table Storage, which will set its schema based on the first record written.

When making this decision it is important to consider not just current requirements but near-term requirements as well. I tend to like Table Storage when the schema is not going to have a lot of variance and/or I want a NoSQL solution that is cheap and still effective. Cosmos Core is the other extreme, where I am willing to pay more for high redundancy and greater performance, as well as for a document database where my schema can differ insert to insert.

Triggering the Workflow

Reading the upload code you may have wondered where, or how, the workflow is triggered. By now the answer should not surprise you: a binding. Specifically, a BlobTrigger, which can listen for new blobs being added (or removed) and trigger a function when that case is detected. Here is the declaration of the Azure Durable Function which represents the bootstrapping of our workflow.

[FunctionName("ApproveFile_Start")]
public static async Task HttpStart(
    [BlobTrigger("files/{id}", Connection = "StorageAccountConnectionString")] Stream fileBlob,
    string id,
    [Table("metadata", "{id}", "{id}", Connection = "TableConnectionString")] FileMetadata metadata,
    [Table("metadata", Connection = "TableConnectionString")] CloudTable metadataTable,
    [DurableClient] IDurableOrchestrationClient starter,
    ILogger log)
{
    //
}
Figure 2 – Workflow start declaration

As you can see here, we are starting to get a bit nuts with our bindings. Here is a brief summary:

  • We use a BlobTrigger to initiate this function, with {id} grabbing the name of the newly created blob (this will be the Guid which is generated during upload)
  • The Table attribute is used twice: once to reference the actual Table Storage record represented by the newly created blob, and again as a reference to the metadata table where the referenced row exists (we need this to write the row back once it's updated)
  • Finally, DurableClient (from this NuGet package), which provides the client that allows us to start the orchestrator that will manage the workflow

I will go into much more depth on this in Part 3, but the one point I do want to call out is that the Table attribute is NOT two way. This means, even if you reference a single item (as we did in our example), changes to that item are NOT saved back to the table – you must do this manually. This is important as it drives the reason we see some rather creative uses of this attribute.
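As a sketch of what that manual write-back looks like inside the function body (the Approved property is hypothetical; this assumes FileMetadata is a TableEntity so the instance bound by the Table attribute carries the ETag needed for a replace):

```csharp
// Changes to the bound entity are NOT persisted automatically;
// write the row back through the CloudTable reference.
metadata.Approved = true; // hypothetical property on FileMetadata
var replaceOperation = TableOperation.Replace(metadata);
await metadataTable.ExecuteAsync(replaceOperation);
```

This is exactly why the function binds both the single row and the table itself: one to read, one to write back.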

Closing

We explored some code in this portion of the series; though it was not immediately tied to Durable Functions, it was tied to event driven programming. Using these bindings we can create code that alleviates itself of mundane and boilerplate operations and allows other systems to manage this on our behalf.

Doing this gets us closer to the event driven model discussed in Part 1 and allows each function to specialize in what it must do. By cutting out excess and unnecessary code we can remove bugs and complexities that can make it more difficult to manage our code base.

In Part 3, we are going to dive deeper and really start to explore Durable Functions, showing how they can be initiated and referenced in subsequent calls, including those that can advance the workflow, such as a human operation.

The complete code for this entire series is here: https://github.com/jfarrell-examples/DurableFunctionExample
