One of the things I have been focusing on lately is Kubernetes. It has always interested me, but I recently decided to pursue the Certified Kubernetes Application Developer (CKAD) certification, so diving into topics I was not totally familiar with has been a great deal of fun.
One topic of particular interest is storage. In Kubernetes, and really in containerized applications generally, state storage is an important topic since these systems are designed to be transient in nature. With this in mind, it is paramount that storage happen in a centralized and highly available way.
A common approach is to simply leverage the raw cloud APIs for things like Azure Storage, S3, etc., as the providers will do a better job of ensuring the data is stored securely and in a way that makes data loss unlikely. However, Kubernetes also enables mounting these cloud systems directly into Pods through Persistent Volumes and Storage Classes. In this post, I want to show how to use a Storage Class with Azure, so I won't be going into detail about the ins and outs of Storage Classes or their use cases versus Persistent Volumes; frankly, I don't understand that super well myself, yet.
Creating the Storage Class
The advantage of a Storage Class (SC) over something like a Persistent Volume (PV) is that the former can automatically create the latter. That is, a Storage Class can receive claims for volumes and will, under the hood, create PVs. This is why SCs have become very popular with developers: less maintenance.
Here is a sample Storage Class I created for this demo:
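A representative sketch is below; the class name, SKU, and mount options are illustrative placeholders rather than the exact values from my cluster, and an azure-file provisioner is used since we will want ReadWriteMany support later on:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-sc
provisioner: kubernetes.io/azure-file
reclaimPolicy: Delete
mountOptions:
  - dir_mode=0777
  - file_mode=0777
parameters:
  skuName: Standard_LRS
```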
This step is actually optional – I only did it for practice. AKS automatically creates four default storage classes (they do nothing by themselves without a Persistent Volume Claim (PVC)). You can see them by running the following command:
kubectl get storageclass
Use kubectl create -f to create the storage class based on the above, or use one of the built-in ones. Remember, by itself, the storage class won't do anything. We need to create a volume claim for the magic to actually start.
Create the Persistent Volume Claim
A persistent volume claim (PVC) is used to “claim” a storage mechanism. Depending on its access mode, the PVC can be attached to multiple nodes where its pods reside. Here is a sample PVC that I made to go with the SC above:
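A sketch of the claim; the names and the requested size are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fileupload-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-sc
  resources:
    requests:
      storage: 5Gi
```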
The way PVCs work (simplistically) is that they seek out a Persistent Volume (PV) that can support the claim request (see the access mode and resource requests). If nothing is found, the claim is not fulfilled. However, when used with a Storage Class, fulfillment is based on the Storage Class's provisioner field.
One of the barriers I ran into, for example, was that my original provisioner (azure-disk) does NOT support multi-node access (that is, it does not support the ReadWriteMany mode used above). This means the storage medium is only ever attached to a single node, which limits where pods using the PVC can be scheduled.
Run a kubectl create -f to create this PVC in your cluster. Then run kubectl get pvc – if all things are working your new PVC should have a state of Bound.
Let’s dig a bit deeper into this – run a kubectl describe pvc <pvc name>. If you look at the details there is a value with the name Volume. This is actually the name of the PV that the Storage Class carved out based on the PVC request.
Run kubectl describe pv <pv name>. This gives you some juicy details and you can find the share in Azure now under a common Storage Account that Kubernetes has created for you (look under Source).
This is important to understand: the claim creates the actual storage, and Pods just use the claim. Speaking of Pods, let's now deploy an application that uses this volume to store data.
Using a Volume with a Deployment
Right now, AKS has created a storage account for us based on the request from the given PVC that we created. To use this, we have to tell each Pod about this volume.
I have created the following application as the Docker image xximjasonxx/fileupload:2.1. It's a basic C# Web API with a single endpoint to support a file upload. Here is the deployment associated with it:
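A sketch of such a deployment; the mount path, labels, and replica count are illustrative, and only the image name comes from the description above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fileupload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: fileupload
  template:
    metadata:
      labels:
        app: fileupload
    spec:
      containers:
        - name: fileupload
          image: xximjasonxx/fileupload:2.1
          env:
            - name: SAVE_PATH
              value: /app/uploads      # overrides the hard coded save path
          volumeMounts:
            - name: upload-storage
              mountPath: /app/uploads  # the PVC-backed volume is mounted here
      volumes:
        - name: upload-storage
          persistentVolumeClaim:
            claimName: fileupload-pvc
```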
The key pieces of this are the env and volume mounting specifications. The web app saves to a hard-coded path unless overridden by the environment variable SAVE_PATH. In this spec, we specify a custom path within the container via this environment variable and then mount that directory externally using the volume created by our PVC.
Run a kubectl create -f on this deployment spec and you will have the web app running in your cluster. To enable external access, create a LoadBalancer Service (or an Ingress); here is an example:
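Something along these lines works; the port and names are placeholders:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: fileupload-svc
spec:
  type: LoadBalancer
  selector:
    app: fileupload
  ports:
    - port: 80
      targetPort: 80
```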
Run kubectl create -f on this spec file and then run kubectl get svc until you see an External IP for this service indicating it can be addressed from outside the cluster.
I tested the endpoint by POSTing an image to it via Postman.
If all goes well, the response is a Guid, which is the name under which the image is stored in our volume.
To see it, simply navigate to the Storage Account from before and select the newly created share under the Files service. If you see the file, congrats – you just used a PVC through a Storage Class to create a place to store data.
What about Blob Storage?
Unfortunately, as near as I can tell so far, there is no support for saving these items to object storage, only file storage. To use the former, at least with Azure, you would still need to use the REST APIs.
This also means you won't get notifications when new files are created in the file share, as you would with blob storage. Still, it's useful and a good way to ensure that the data provided is stored securely and preserved as needed.
I was recently asked by a client how I would go about injecting user information into a service that could be accessed anywhere in the call chain. They did not want to have to capture the value at the web layer and pass it down what could be a rather lengthy call stack.
The solution is to leverage scoped dependencies in ASP.NET Core, which hold an object for the duration of the request (by default). In doing this, we can gather information related to the request and expose it. I also wanted to add an additional twist: two interfaces for the same object, one that enables writing and the other that enables reading.
The reason for doing this is determinism. I do not want common code to be able to accidentally “change” values, for whatever reason. When the injection is made, I want the value to be read-only. But to get the value in there I need to be able to write it, so I segregate the operations into different interfaces.
This may be overkill for your solution, but I want the code to be as obvious as possible in its intent and capabilities – this helps instruct users of this code how it should be used.
Configuring Injection
Our ContextService, as described above, contains only a single property: Username. For this exercise, we will pull its value out of the incoming query string (overly simplistic, I grant you, but it works well enough to show how I am using this).
I am going to define two interfaces which this class implements: IContextReaderService and IContextWriterService, code below:
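A minimal sketch of the idea:

```csharp
public interface IContextReaderService
{
    string Username { get; }
}

public interface IContextWriterService
{
    string Username { set; }
}

// a single backing class implements both interfaces
public class ContextService : IContextReaderService, IContextWriterService
{
    public string Username { get; set; }
}
```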
The tricky part now is that we want the instance of ContextService created with, and scoped to, the incoming request to be shared between IContextReaderService and IContextWriterService; that is, I want the same instance to come back when I inject a dependency marked with either of these interfaces.
In Startup.cs I need to do the following to achieve this:
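One way to wire this up looks roughly like the following; the GetCurrentContext method name is a placeholder of my own:

```csharp
public void ConfigureServices(IServiceCollection services)
{
    services.AddControllers();

    // one factory per request; it caches the ContextService it creates
    services.AddScoped<ContextServiceFactory>();

    // both interfaces resolve through the factory, so they share the same instance
    services.AddScoped<IContextReaderService>(sp =>
        sp.GetRequiredService<ContextServiceFactory>().GetCurrentContext());
    services.AddScoped<IContextWriterService>(sp =>
        sp.GetRequiredService<ContextServiceFactory>().GetCurrentContext());
}
```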
The secret here is the request-scoped ContextServiceFactory, given as the parameter to AddScoped, which allows us to tell .NET Core how to resolve the dependency. This factory is defined very simply, as such:
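A sketch of such a factory:

```csharp
public class ContextServiceFactory
{
    private ContextService _current;

    // returns the same ContextService instance for the lifetime of the request,
    // creating it on first use
    public ContextService GetCurrentContext()
    {
        if (_current == null)
        {
            _current = new ContextService();
        }

        return _current;
    }
}
```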
Remember, by default, something added as a scoped dependency is shared throughout the lifetime of the request. So here, we maintain state within the factory to know whether it has created an instance of ContextService; if it has, we return that one. This factory object is destroyed when the request completes and recreated when a new request is processed.
Hydrating the Context
Now that we have our context split out, we need to hydrate its values, so we need to inject our IContextWriterService dependency into a section of code that will get hit on each request. You might be tempted to use a global filter, which will work, but the better approach here is custom middleware. Here is what I used:
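A sketch of middleware along those lines; the class name is a placeholder:

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public class ContextHydrationMiddleware
{
    private readonly RequestDelegate _next;

    public ContextHydrationMiddleware(RequestDelegate next)
    {
        _next = next;
    }

    // scoped dependencies are injected into Invoke, not the constructor
    public async Task Invoke(HttpContext httpContext, IContextWriterService contextService)
    {
        // pull the username from the "name" query string parameter
        contextService.Username = httpContext.Request.Query["name"];

        // continue the pipeline
        await _next(httpContext);
    }
}
```

The middleware would then be registered in Configure with app.UseMiddleware&lt;ContextHydrationMiddleware&gt;().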
Because middleware is constructed once for the application, you can only use constructor injection for singleton dependencies; if you attempt to inject a scoped or transient dependency through the middleware constructor, it will fail at runtime.
Fear not, we can use method injection here to inject our dependency as a parameter to the Invoke method, which is what ASP.NET Core looks for and executes with each request. Here you can see we have defined a parameter of type IContextWriterService.
Within Invoke, perform the steps you wish to take (here we extract the username from the name parameter in the query string). Once you complete your steps, be sure to call the next piece of middleware in the sequence (or return a completed Task to stop the chain).
Using the Reader dependency
Now that we have configured the dependency and hydrated it using middleware, we can reference the IContextReaderService to read the value out. This works in the standard way, as you would expect:
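For example, a hypothetical controller might read it like this:

```csharp
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class ProfileController : ControllerBase
{
    private readonly IContextReaderService _context;

    public ProfileController(IContextReaderService context)
    {
        _context = context;
    }

    [HttpGet]
    public IActionResult Get()
    {
        // read-only access – Username cannot be changed from here
        return Ok(_context.Username);
    }
}
```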
We can inject this dependency wherever we need (though more specifically, wherever we can access the IContextReaderService).
Mutability vs Immutability
The main goal I was trying to illustrate here is leveraging immutability to prevent side effects in code. Because of the interface segregation, a user is unable to change the given value of the context. This is desirable since it lends itself to better code.
In general, we want to achieve immutability with objects in our code; this is a core lesson from functional programming. By doing this, operations become deterministic and less prone to sporadic and unexplainable failures. While the example presented above is simplistic in nature, in more complex systems, having assurances that users can only read or write depending on which interface is used allows for better segregation and can yield cleaner and more discernible code.
Hope you enjoyed. More Testing posts to come, I promise.
Part 1 is here – where I introduce Testing Strategies.
Unit testing is the single most important test suite within ANY application. It is the first line of defense guarding against defects and is paramount to instilling confidence in developers that applying changes does not break any existing logic. This being the case, they are (or should be) the most numerous type of test authored for a system. High performing teams will run them often as a verification step and ensure their runs are as fast as possible to save time. By doing so and building confidence, they are able to achieve ever higher levels of efficiency and quality.
What do we Unit Test?
This is perhaps the single most important and common question you will get from teams or you will discuss within your own teams. Making the right decision here is critical for the long term success of the project and preventing quality and performance issues from negatively impacting your teams.
As a fundamental rule, we do not unit test external dependencies, that is database calls, network calls, or any logic that might involve any sort of external dependencies. Our unit test runs need to be idempotent such that we can run them as much as we like without having to worry about disk space, data pollution, or other external factors.
Second, the focus must be on a unit of code. In this regard, our tests do not test multi-step processes. They test a single path through a unit of code; the need for a unit test to be complex is often an indicator of a code smell: either the logic is overly complicated and needs refactoring or, the test itself is wrong and should either be broken down or tested with a different form of testing such as integration tests.
Finally, we should test known conditions for external dependencies through the use of mocking. By using a mocking library we can ensure that code remains resilient and that our known error cases are handled. Further, using a mocking library often forces us to use design by contract which can improve the readability of our code.
Making the wrong choice – a story from the past
I worked with a team in a past life that made the wrong choice when it came to their testing. As part of an effort to improve quality, the client (astutely) asked the team to ensure testing was being done against database and networking calls. Leaders on the team, due to poor knowledge around testing or poor decision making, opted to work these tests into the unit test library. Over the course of the project, this caused the test run time to increase to more than 40 minutes.
One of the critical elements of high functioning teams is the notion of fast feedback. We want to ensure developers are given immediate feedback when something breaks. Unit tests are a core part of achieving this, and their speed is paramount to the team's effectiveness. What happens when you allow test times to balloon as mentioned? Disaster.
When the turnaround time is that long, developers will seek ways to avoid incurring that time cost (there is still pressure to get work done). Generally this involves not writing tests (we don't want to increase the time cost), running them minimally (get the work done and test at the end), or turning them off. None of these options improve efficiency and, in fact, they make an already bad problem that much worse.
In this case, the team adopted a branching model that called for entire features to be developed in a “feature” branch before merging. With any development environment we always want to minimize “drift”, that is differences between master and any branches. The less drift the fewer merge conflicts and the quicker problems are discovered.
By not understanding this principle, the team unknowingly compounded their problem. In some cases these “features” would be in flight for 10+ days, creating enormous amounts of drift. And, as the team was looking to avoid running the tests too often, the changes were not being checked regularly by the tests. As you can imagine, issues were consistently found near the end of sprints, as code was merged. And due to the size of the incoming changes, debugging became a massive task.
This created more problems for the beleaguered teams as they were forced to spend time after hours routinely debugging and trying to finish features before the end of the sprint. Burnout was rampant and the team members became jaded with one another and the company itself – they endured this for 10+ months. While the project ultimately did complete, the client relationship was ruined and several good developers left the company.
To be clear, the bad choices around testing alone were not the single cause of this failure; there were numerous other problems. However, I have found that even a difficult client can be assuaged if code quality is maintained and the team delivers. I can recall a team that I led where we had unit testing and continuous delivery processes in place such that, even though we had delays and bugs, these processes enabled us to respond quickly – the client remained delighted and worked with us.
The lesson here is, no matter what, we MUST ensure the development team has the tools needed to support automation processes. These processes form the core of the ability to deliver and lend themselves to building healthy and sustainable client relationships.
How do I write a Unit Test?
So, now that you have an understanding of what can be unit tested, let's talk about how to write them. First, I wish to introduce you to the AAA pattern: Arrange, Act, Assert. This pattern is crucial as you write your tests to check yourself against the warning signs of bad unit tests.
Arrange: In this step we “arrange” the unit; that is, we do all of the things needed to prepare for executing our unit. Be wary at this level if the steps to arrange feel too cumbersome – it likely indicates that your design needs refactoring.
Act: In this step we “invoke” the unit. This executes the code we are specifically testing. Be wary at this level if more than two executions are necessary; that means you are NOT testing a unit and your design needs to be re-evaluated. Remember, we do not test multi-part flows with unit tests.
Assert: In this step we check the outcome of our unit. It is important here to only assert on the minimum amount of information needed to verify the unit. I have seen teams assert on 20+ properties of an object – this is excessive. Think carefully about what indicates a failure. My rule of thumb is never more than three asserts; if you need more, create another test.
Here is an example of a simple math problem under unit test:
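A sketch of what that looks like with xUnit; the MathOperations class is a stand-in for whatever unit is under test:

```csharp
using Xunit;

public static class MathOperations
{
    public static int Add(int first, int second) => first + second;
}

public class MathOperationsTests
{
    [Fact]
    public void AddReturnsTheSumOfBothNumbers()
    {
        // Arrange
        var numberOne = 2;
        var numberTwo = 3;

        // Act
        var result = MathOperations.Add(numberOne, numberTwo);

        // Assert
        Assert.Equal(5, result);
    }
}
```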
As you can see, in this example we define our two variables (numberOne and numberTwo) in the arrange section, we then invoke our add operation in the act and finally we assert that the value meets with our expectations.
The [Fact] attribute is part of the xUnit testing library. xUnit is a popular open source testing framework commonly used with .NET Core. There are other libraries available. The use of a library for unit testing makes great sense and will greatly aid your productivity. Below are a few of the common ones in the .NET ecosystem:
nUnit (https://nunit.org/) – the granddaddy of them all. Based on JUnit from Java and one of the first unit testing frameworks devised for .NET
MSTest – Microsoft's testing framework. It offers much the same functionality as nUnit and is built into the .NET Framework
xUnit – as mentioned above, similar to nUnit in functionality and aimed at supporting testing in an OS-agnostic programming world. This is my default
The next common problem is organization. When you start talking about an application that has thousands, if not tens of thousands (or more) of tests, it becomes very apparent that a clear and consistent strategy must be adopted. Over the course of my career I have seen many different approaches, but the one that I favor is the given and assert naming convention, mainly because it plays very well with most test reporters. Here is an example.
Imagine we have defined the following Web API Controller:
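A deliberately simplified controller will do for the sake of the example; the names here are placeholders:

```csharp
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class UserController : ControllerBase
{
    // intentionally does its own work – this is the separation of concerns
    // violation called out below
    [HttpGet("{id}")]
    public IActionResult GetUser(int id)
    {
        if (id <= 0)
        {
            return BadRequest();
        }

        return Ok(new { Id = id, Name = "Test User" });
    }
}
```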
In this case we might define our test fixture (that is the class that contains our test) as such:
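For example (the exact wording of the name is up to you):

```csharp
public class GivenAUserController
{
}
```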
Notice the name of the class here: while it does violate traditional C# naming conventions, when you use a test runner, it will precede your method name. Therefore, if we expand this to include a test like so:
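Something like this, with the method name completing the sentence started by the fixture name:

```csharp
using Microsoft.AspNetCore.Mvc;
using Xunit;

public class GivenAUserController
{
    [Fact]
    public void WhenGetUserIsCalledWithAnInvalidId_ThenBadRequestIsReturned()
    {
        // Arrange
        var controller = new UserController();

        // Act
        var result = controller.GetUser(0);

        // Assert – check the result type, not the payload
        Assert.IsType<BadRequestResult>(result);
    }
}
```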
The above example is a product of oversimplification and is ONLY for demonstration purposes. When unit testing controllers, the emphasis needs to be on the result types returned, NOT values. Testing the outcome of operations should be done with unit tests against services. The above represents code that violates the separation of concerns principle.
With this in place, if we run the tests and view the results in a test reporter, the fixture name is shown followed by the method name.
As you can see, the advantage of this strategy is that it lines up nicely and produces a readable English sentence detailing what the test is doing. There are other strategies but, as I said, this is my go-to in most cases due to the readability and scalable nature of this naming method.
Further, it bakes in a necessary check to ensure unit tests are not checking too much. As a rule, the assert portion should never contain the word “and”, as that implies more than one thing is being checked, which violates the unit principle.
How do I test external dependencies?
The short answer is: you don't. You generally write integration tests (the next part in this series) to cover those interactions. However, given the speed and criticality of the logic checked by unit tests, we want to maximize their reach as best we can.
A classic example of this case is Entity Framework. If you have worked with Entity Framework you will be familiar with the DbContext base class that denotes the context which handles querying our underlying database. As you might expect, our unit tests should NEVER invoke this context directly, not even the InMemory version, but we do need to ensure our logic built on the context works properly. How can we achieve this?
The short answer is: we can define an interface which exposes the necessary methods and properties of our context and have our classes take a dependency on this interface rather than the concrete context class itself. In doing so, we can use mocking libraries to mock the context, allowing testing against these lower level classes.
The long answer is, honestly, an entire blog post (Learning Tree has a good write up that uses NSubstitute here) that I will try to add on later.
This strategy of using interfaces also allows us to take dependencies on static components. In older versions of ASP.NET it was common for applications to use the HttpContext.Current property to reference the incoming (ISAPI) request. But, because this property was static, it could not be unit tested directly (it would always be null unless running in the web context).
Using the interface approach, we commonly saw things like this:
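A sketch of that shape; the interface, method, and class names here are illustrative rather than a standard API:

```csharp
using System.Web;
using System.Web.Mvc;

// abstraction over the static HttpContext.Current property
public interface IContextAccessor
{
    string GetQueryValue(string key);
}

public class WebContextAccessor : IContextAccessor
{
    public string GetQueryValue(string key)
    {
        return HttpContext.Current.Request.QueryString[key];
    }
}

// the controller depends on the interface, never on HttpContext directly
public class AccountController : Controller
{
    private readonly IContextAccessor _contextAccessor;

    public AccountController(IContextAccessor contextAccessor)
    {
        _contextAccessor = contextAccessor;
    }

    public ActionResult Index()
    {
        var username = _contextAccessor.GetQueryValue("username");
        return Content(username);
    }
}
```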
Using this approach, the controller, which will have unit tests, depends on the injected IContextAccessor interface instead of HttpContext. This fact is crucial as it allows us to write code like the following:
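For example, using a mocking library (Moq here, though NSubstitute or others work just as well):

```csharp
using Moq;
using System.Web.Mvc;
using Xunit;

public class GivenAnAccountController
{
    [Fact]
    public void WhenIndexIsCalled_ThenTheUsernameComesFromTheAccessor()
    {
        // Arrange – mock the accessor so no real HttpContext is needed
        var accessor = new Mock<IContextAccessor>();
        accessor.Setup(a => a.GetQueryValue("username")).Returns("jason");

        var controller = new AccountController(accessor.Object);

        // Act
        var result = controller.Index();

        // Assert
        var content = Assert.IsType<ContentResult>(result);
        Assert.Equal("jason", content.Content);
    }
}
```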
This code validates that our logic is correct, but it does NOT validate that HttpContext gets built properly at runtime – that is not our responsibility; it is the responsibility of the framework's author (Microsoft in this case).
This brings up a very clear and important point when writing tests: some tests are NOT yours to write. It is not on your team to validate that, for example, Entity Framework works properly, or that a request through HttpClient works – these components are already (hopefully) being tested by their authors. Attempting to go down this road will not lead you anywhere where the tests drive value.
A final point
The final point about testing I would like to make, and this is especially true with .NET, is that tests should ALWAYS be synchronous and deterministic. Parallel code needs to be broken down into its discrete pieces and those pieces need to be tested. Trying to unit test parallel code risks introducing “flakiness” into tests – tests that pass sometimes and fail other times.
.NET developers commonly use the async/await syntax in their code. It is very useful and helpful; however, when running unit tests it needs to be forced down a synchronous path.
We do not test external dependencies, so the use of async/await should not be needed for ANY test. Our dependencies should be mocked and thus will return instantaneously.
Doing this is quite easy: we can call the GetAwaiter and GetResult methods, which force the resolution of the returned Task. Here is an example:
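A sketch of the pattern; OrderService is a stand-in for whatever async code you are testing:

```csharp
using System.Threading.Tasks;
using Xunit;

public class OrderService
{
    public Task<int> GetOrderCountAsync()
    {
        // a real implementation would call a mocked repository here
        return Task.FromResult(0);
    }
}

public class GivenAnOrderService
{
    [Fact]
    public void WhenGetOrderCountIsCalled_ThenTheCountIsReturned()
    {
        // Arrange
        var service = new OrderService();

        // Act – force the async call to resolve synchronously
        var count = service.GetOrderCountAsync().GetAwaiter().GetResult();

        // Assert
        Assert.Equal(0, count);
    }
}
```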
By calling GetAwaiter() and GetResult() we force the call to be synchronous. This is important since, in some cases, the Asserts may run BEFORE the actual call completes, resulting in increased test flakiness.
The most important thing is not just to test but also to be fast
Hopefully this post has shown you some of the ways you can cover logic around databases, async calls, and other complex scenarios with unit tests. This is important: due to their speed, it makes sense to use them to validate wherever possible.
One of the uses that I did not show here is “call spying”, where the mocking framework can “track” how many times a method is called, which can serve as another way to assert.
But the most important thing I hope to impress is the need not only to ensure unit tests are built with the application, but also to continually watch that they remain fast enough for your developers to perform validation on a consistent, ongoing basis.
The next topic which I intend to cover will focus on Integration Tests, primarily via API testing through Postman.
One of the challenges with incorporating DevOps culture for teams is understanding that greater speed yields better quality. This is often foreign to teams, because conventional logic dictates that “slow and steady wins the race”. Yet, in every State of DevOps report (https://puppet.com/resources/report/state-of-devops-report/) since it began, Puppet (https://puppet.com/) has consistently found that teams which move faster see higher quality than those that move slower – and the margin is not close, and the gap continues to widen. Why is this? I shall explain.
The First Way: Enable Fast and Increasing Flow
DevOps principles (and Agile) were born out of Lean Management, which is based on the Toyota Production System (https://en.wikipedia.org/wiki/Toyota_Production_System). Through these experiences we identify The Three Ways, and the first of these specifically aims for teams to operate on increasingly smaller workloads. With this focus, we can enable more rapid QA and faster rollback, as it is far easier to diagnose a problem in one thing than in 10 things. Randy Shoup of Google observed:
“There is a non-linear relationship between the size of the change and the potential risk of integrating that change—when you go from a ten-line code change to a one-hundred-line code change, the risk of something going wrong is more than 10x higher, and so forth”
What this means is, the more changes we make the more difficult it is to diagnose and identify problems. And this relationship is non-linear meaning, this difficulty goes up exponentially as the size of our changes increase.
In more practical terms, it argues against concepts such as “release windows” and aims for a more continuous deployment model whereby smaller changes are constantly deployed and evaluated. The value here is, by operating on these smaller pieces we can more easily diagnose a problem and rollbacks become less of an “event”. Put more overtly, the aim is to make deployments “normal” instead of large events.
This notion is very hard for many organizations to accept and it often runs counter to how many IT departments operate. Many of these departments have long had problems with software quality and have devised release and operations plans to, they believe, minimize the risk of these quality issues. However, from the State of DevOps reports, this thinking is not backed up by evidence and tends to create larger problems. Successful high functioning teams are deploying constantly and moving fast. Speed is the key.
The secret to this speed with quality is the confidence created through a safety net. Creating a thorough safety net can even create enough confidence to let the newest person on the team deploy to Production on Day 1 (this is the case at Etsy).
Creating the Safety Net
In the simplest terms, the safety net is the amalgamation of ALL your tests/scans running automatically with each commit. The trust and faith in these tests to catch problems before they reach production allows developers to move faster with confidence. Because it is automated, it does not rely on a sole person (or group) and can scale with the team.
Ensuring the testing suite is effective is a product of having a solid understanding of the breakdown of testing types and adopting the “Shift Left” mindset. For an illustration of the testing breakdown, we can reference the tried and true “Testing Pyramid”:
As illustrated, unit tests comprise the vast majority of tests in the system. The speed of these tests is something that should be closely monitored as they are run the most often. Tips for ensuring speed:
Do NOT invoke external dependencies (database, network calls, disk, etc)
Focus on a UNIT, use Mocking libraries to fulfill dependencies
Adhere to the AAA model (Arrange, Act, Assert) and carefully examine tests for high complexity
Unit tests must be run frequently to be effective. In general, a minimum of three runs should occur with any change: Local run, run as part of PR validation, and a run when the code is merged to master. The speed is crucial to reduce, as much as possible, the amount of time developers have to wait for these tests.
At the next level we start considering “integration tests”. These are tests which require a running instance of the application and thus need to follow a deploy action. Their inclusion of external dependencies makes them take longer to run, hence we decrease the frequency. There are two principal strategies I commonly see for executing these tests:
Use of an Ephemeral “Integration” environment – in this strategy, we use Infrastructure as code to create a wholly new environment to run our Integration tests in – this has several advantages and disadvantages
Benefit – avoids “data pollution”. Data pollution occurs when data created as part of these tests can interfere with future test runs. A new environment guarantees a fresh starting point each time
Benefit – tests your IaC scripts more frequently. Part of the benefit in modern development is the ability to fully represent environments using technologies like Terraform, ARM, and others. These scripts, like the code itself, need exercising to ensure they continue to meet our needs.
Negative – creating ephemeral environments can elongate the cycle time for our release process. This may give us clues when one “thing” is more complex than it should be
Execute against an existing environment. Most commonly, I recommend this to be the Development environment as it allows the testing to serve as a “gate” to enable further testing (QA and beyond)
Benefit – ensures that integration testing completes before QA examines the application
Negative – requires logic to avoid problems with data pollution.
What about Load Testing?
Load Testing is a form of integration testing with some nuance. We want to run them frequently but, their running must be in a context where our results are valid. Running them in, let us say, a QA environment is often not helpful since a QA server likely does not have the same specs as Production. Thus problems in QA with load may not be an issue in higher environments.
If you opt for the “ephemeral approach” you can conduct load testing as part of these integration tests – provided your ephemeral environment is configured to have horsepower similar to production.
If the second strategy is used, I often see Load Testing done for staging, which I disagree with – it is too far to the right. Instead, this should be done in QA ahead of (or as part of) the manual testing effort.
As you can see above in the pyramid, ideally these integration tests comprise about 20% of your tests. Typically though, this section is where the percentage will vary the most depending on the type of application you are building.
Finally we do our Manual Testing with UI testing
UI tests and/or acceptance testing comprises the smallest percentage (10%), mainly because the tests are so high level that they become brittle and excessive amounts will generate an increased need for maintenance. Further, testing here tends to be more subjective and strategic in nature, thus exploratory testing tends to yield more results and inform the introduction of more tactical tests at other levels.
QA is a strategic resource, not a tactical one
A core problem often seen within teams and organizations is how QA is perceived and used. Very often, QA is a member of the team, or some other department, that code is “thrown over the wall to” as a last step in the process. This often leads to bottlenecks and can even create an adversarial relationship between Engineering and QA.
The truth is, this treatment of QA is neither fair nor sensible, and it creates a rather unfair bottleneck. I always ask teams, “how often has QA been given four features to test at 4:45pm the day before the Sprint Demo?” Each time, the answer is that this is not an exception – it is consistent. And of course, QA finds issues, and the whole team ends up staying late or “living with bugs”. This underlines the misunderstanding organizations have about QA.
QA is not responsible for testing, per se; they are responsible for guiding testing and ensuring it is happening. Testing ultimately falls to developers, as they are the closest to the code and have the best understanding of it. This is why automated testing (unit in particular) is so vital to the “safety net” concept. Getting developers to understand that testing and writing tests is their responsibility is vital to adopting DevOps culture.
This is not to say QA does NO testing – they do. But it is more strategic in nature, aimed at exploratory testing and/or ensuring the completeness of the testing approach. They also lead in the identification of issues and their subsequent triaging. Key to high functioning teams is that, whenever an issue is found, the team should remediate it but also create a test which can prevent it from appearing in the future. As the old saying goes, “any problem is allowed to happen once”.
Moving away from this reliance on the QA department/individual can feel rash to teams that have become overly dependent on this model. But rest assured, the best way forward is to focus on automation to create and maintain a suitable safety net for teams.
Safety Nets Take Time
Even introducing 1,000 unit tests tomorrow is not going to immediately give your developers the confidence to move faster. Showing that you can deploy 6x a day is not going to immediately see teams deploying 6x a day. Confidence is earned and, as the saying goes, “you only notice something when it breaks”. DevOps is not a magic bullet or even a tool – it is a cultural shift, one that, when properly done, touches every corner of the organization and every level, from the most junior developer to the CEO.
The culture implores participants to constantly challenge themselves and their applications to ensure that the safety measures in place work correctly and completely. High functioning teams want to break their systems; notably, Netflix will often break things in Production intentionally to ensure failsafes are working properly.
More canonically, if a test never breaks, how do we know it works at all? This is the reason behind the Red-Green-Refactor development methodology (https://www.codecademy.com/articles/tdd-red-green-refactor). I see a lot of teams simply write tests with the assumption that they work, without actually creating a failing condition to prove they can break.
But the effort is worth it to move faster and see higher quality. In addition, adopting this aspect of DevOps culture means teams can have higher confidence in their releases (even if they are not deploying all the time). This makes for less burnout and better morale and productivity. Plus, you get a full regression suite for free.
I plan to continue this series by diving more deeply into many of the concepts I covered here with unit testing likely being the next candidate.
We are here now at the final part of our example (Part 1, Part 2, Part 3), which focuses on what happens after we approve our upload as shown in Part 3. That is, we will leverage Cognitive Services to gather data about the image and store it in the Azure Table Storage we have been using. As to why this part is coming so much later – I moved into a house, so I was rather busy 🙂
In the previous blog posts we built up a Durable Function orchestrator which is initiated by a blob trigger from the file upload. To this point, we have uploaded the file and allowed a separate HTTP trigger function to “approve” this upload, thereby demonstrating how Durable Functions support workflows that can advance in a variety of different ways. Our next step will use an ActivityTrigger, which marks a function that is only ever executed by an orchestrator as part of an orchestration.
Building the Activity Trigger
ActivityTriggers are identified by their trigger parameter as shown in the code sample below (only the function declaration):
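A sketch of that declaration; the function name, table name, container name, and connection setting names are assumptions on my part:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.Table;   // CloudTable (namespace varies by storage SDK version)
using Microsoft.Azure.Storage.Blob;   // CloudBlobContainer
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class PerformOcrActivity
{
    [FunctionName("PerformOcr")]
    public static Task Run(
        [ActivityTrigger] string targetId,
        [Blob("files", Connection = "StorageAccountConnectionString")] CloudBlobContainer filesContainer,
        [Table("OcrData", Connection = "TableConnectionString")] CloudTable ocrData)
    {
        // body omitted – fetch the blob for targetId, call Cognitive Services,
        // then write the resulting name/value pairs to the ocrData table
        return Task.CompletedTask;
    }
}
```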
In this declaration we are indicating this function is called as an activity within an orchestration flow. Further, as we have with other functions, we are referencing the related blob and, new here, the ocrData cloud table, which will hold the data output from the OCR process (Optical Character Recognition – Computer Vision, essentially).
To “call” this method we expand our workflow and add the CallActivityAsync call:
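The expanded orchestrator looks roughly like this; the activity name, event names, and input handling are assumptions for the purposes of the sketch:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class ProcessFileFlow
{
    [FunctionName("ProcessFileFlow")]
    public static async Task Run([OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var fileId = context.GetInput<string>();

        // step 1 – wait for the upload to be approved (covered in Part 3)
        await context.WaitForExternalEvent("UploadApproved");

        // step 2 – run OCR as an activity and wait for it to finish
        var ocrTask = context.CallActivityAsync("PerformOcr", fileId);
        await Task.WhenAny(ocrTask);

        // step 3 – wait for an external approval before the file can be downloaded
        await context.WaitForExternalEvent("DownloadApproved");
    }
}
```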
This approach enables us to fire “parallel” tasks and further leverage our pool of Azure Functions handlers (200 at a time). This can be more effective than trying to leverage parallel processing within the Azure Function instance itself, but always consider how best to approach a problem needing parallelism.
I am not certain the Function attribute is strictly necessary on the function since, as you can see, we refer to it by its name. We also pass in the target Id for the Azure Table record so a foreign-key style relationship can exist for this data. This is purely stylistic – in many cases it may make more sense for all the data to live together, which is one of the strengths of document databases like DocumentDb and Mongo.
Finally, we have our function “wait” for the activity to complete. This activity, as I indicated, can spawn other activities and use its separate function space as needed.
Once you have Cognitive Services set up, update your settings so that your key and URL match your service, and install the necessary NuGet package.
As a first step, we need to make sure the OcrData table is created and indicate which bits of the computer vision data we want. To do this efficiently, I created a small extension method.
All it does is let me pick out specific parent object points in the OCR result structure and turn them into name/value pairs that I can more easily insert into the Table Storage schema I am aiming for. Once I have all of these OcrPairs, I use a batch insert operation to update the OcrData table.
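The batch insert itself, inside the activity function, is straightforward; OcrPair and OcrDataEntity are hypothetical types used only for this sketch:

```csharp
public class OcrDataEntity : TableEntity
{
    public string Value { get; set; }
}

// each OcrPair becomes one row in the OcrData table
var batch = new TableBatchOperation();

foreach (var pair in ocrPairs)
{
    batch.Insert(new OcrDataEntity
    {
        PartitionKey = targetId,   // ties the row back to the uploaded file
        RowKey = pair.Name,
        Value = pair.Value
    });
}

await ocrData.ExecuteBatchAsync(batch);
```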
Now that the Ocr data has been generated our Task.WhenAny will allow the orchestrator to proceed. The next step is to wait for an external user to indicate their approval for the data to be downloaded – this is nearly a carbon copy of the step which approved the uploaded file for processing.
Once the approval is given, our user can call the DownloadFile function to download the data and get a tokenized URL to use for raw download (our blob storage is private and we want to control access to blobs). Here is our code for the download action:
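A compressed sketch of that function; the route, the FileMetadata shape, and the property names are assumptions, and the full version would also return the associated OCR rows:

```csharp
using System;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Storage.Blob;   // CloudBlockBlob (namespace varies by SDK version)
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class DownloadFileFunction
{
    [FunctionName("DownloadFile")]
    public static IActionResult Run(
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "download/{id}")] HttpRequest request,
        [Table("FileMetadata", "files", "{id}", Connection = "TableConnectionString")] FileMetadata metadata,
        [Blob("files/{id}", Connection = "StorageAccountConnectionString")] CloudBlockBlob fileBlob)
    {
        // only approved files can be downloaded
        if (metadata == null || !metadata.DownloadApproved)
        {
            return new NotFoundResult();
        }

        // generate a read-only SAS token good for one hour
        var sasToken = fileBlob.GetSharedAccessSignature(new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(1)
        });

        return new OkObjectResult(new
        {
            Metadata = metadata,
            DownloadUrl = $"{fileBlob.Uri}{sasToken}"
        });
    }
}
```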
That is quite a bit of code but, in essence, we are simply gathering all data associated with the entry being requested and generating a special URL for download out of our blob storage that will be good for only one hour – many more restrictions can be placed on this, so it's an ideal way to give external users temporary and tightly controlled access to blobs.
And that is it, you can call this function through Postman and it will give you all data collected for this file and a link to download the raw file. There is also a check to ensure the file has been approved for download.
Closing
When I started to explore Durable Functions, this was exactly the sort of thing I was after: event based workflow execution with a minimal amount of code needing to be written and managed.
As I said in Part 1 – for me, event driven programming is the way to go in 95% of cloud based backends; the entire platform is quite literally begging us to leverage the internal events and APIs to reduce the amount of code we need to write while still allowing us to deliver on value propositions. True, going to an event driven approach does create new challenges, but I feel that trade-off is well worth it in most cases.
In one of my training classes I explore how we can write “codeless” applications with API Management by effectively using APIM to “proxy” Azure APIs (Key Vault and Storage, notably). Sure, there are cases where we need to support additional business logic, but there are also many cases where we write a service to store data to blob storage when we don't need to – when we can just store it there and use events to filter and process things.
In the end, the cloud gives you a tremendous amount of options for what you can do and how to solve problems. And that really is the most important thing: having options and realizing the different ways you can solve problems and deliver value.
In Part 1 of this series, we explained what we were doing and why, including the aspects of Event Driven Design we are hoping to leverage using Durable Functions (and indeed Azure Functions) for this task.
In Part 2, we built our file uploader, which sent our file to blob storage and recorded a dummy entry in Azure Table Storage that will later hold our metadata. We also explained why we chose Azure Table Storage over Document DB (the default Cosmos offering).
Here in Part 3, we will start to work with Durable Functions directly by triggering one based on the aforementioned upload operation (event) and allowing its progression to be driven by a human rather than pure backend code. To that end, we will create an endpoint that enables a human to approve a file by its identifier, which advances the file through the workflow represented by the Durable Function.
Defining Our Workflow
Durable Function workflows are divided into two parts: The Orchestrator Client and the Orchestrator itself.
The Orchestrator Client is exactly what it sounds like: the client which launches the orchestrator. Its main responsibility is initializing the long running orchestrator function and generating an instanceId, which can be thought of as a workflow Id.
The Orchestrator, as you might expect, represents our workflow in code, with the stopping points and/or fan-outs that will happen as a result of operations. Within this context you can start sub-workflows if desired or (as we will show) wait for a custom event to allow advancement.
To that end, I have below the code for the OrchestratorClient that I am using as part of this example.
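A sketch of the client function; FileMetadata is a hypothetical TableEntity used throughout these samples, and the container, table, property, and setting names are assumptions:

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.Table;   // CloudTable / TableOperation (namespace varies by SDK version)
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class UploadFlowStarter
{
    [FunctionName("UploadFlowStarter")]
    public static async Task Run(
        [BlobTrigger("files/{id}", Connection = "StorageAccountConnectionString")] Stream fileBlob,
        string id,
        [Table("FileMetadata", "files", "{id}", Connection = "TableConnectionString")] FileMetadata metadata,
        [Table("FileMetadata", Connection = "TableConnectionString")] CloudTable metadataTable,
        [DurableClient] IDurableOrchestrationClient starter)
    {
        // start the long running workflow for this file
        // (null instance id lets Durable Functions generate one)
        var instanceId = await starter.StartNewAsync("ProcessFileFlow", null, id);

        // save the workflow instance id back onto the metadata row
        metadata.WorkflowInstanceId = instanceId;
        await metadataTable.ExecuteAsync(TableOperation.Replace(metadata));
    }
}
```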
First, I want to call attention to the definition block for this function. You can see a number of parameters, most of which have an attribute decoration. The one to focus on is the BlobTrigger, as it does two things:
It ensures this function is called whenever a new object is written to our files container in the storage account defined by our Connection. The use of these types of triggers is essential for achieving the event driven style we are after and yields substantial benefit when used with Azure Functions
It defines the parameter id via the binding notation {id}; through this, we can use the value in other parameters which support binding (such as if we wanted to output a value to a Queue or something similar)
The Table attributes each perform a separate action:
The first parameter (FileMetadata) extracts from Azure Table Storage the row with the provided RowKey/PartitionKey combination (refer to Part 2 for how we stored this data). Notice the use of {id} here – this value is defined via the same notation used in the BlobTrigger parameter
The second parameter (CloudTable) brings forth a CloudTable reference to our Azure Storage Table. Table does not support an output operation, or at least not a straightforward one. So, I am using this approach to save the entity from the first parameter back to the table, once I update some values
What is most important for this sort of function is the DurableClient reference (this requires the Durable Functions NuGet package). This is what we will use to start the workflow.
Note the call to StartNewAsync in the sample. This literally starts an orchestrator to represent the workflow. It returns an InstanceId, which we save back to our Azure Table Storage entity. Why? We could technically have the user pass the InstanceId received from IDurableOrchestrationClient, but for this application that would run contrary to the id they were given after file upload. Instead we choose to have them send us the file id and perform a lookup so we can access the appropriate workflow instance; your mileage may vary.
Finally, since this method is pure backend there is no reason to return anything though you certainly could. In the documentation here Microsoft lays out a number of architectural patterns that make heavy use of the parallelism offered through Durable Functions.
Managing the Workflow
In the StartNewAsync call we name the function that we want to start. That function is expected to have one argument of type IDurableOrchestrationContext (note Client vs Context) decorated with the OrchestrationTrigger attribute. This denotes that the method is triggered by a DurableClient starting a workflow with the given name (the name here is ProcessFileFlow).
The code for this workflow (at least the initial code) is shown below:
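In sketch form, the initial version amounts to little more than this:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class ProcessFileFlow
{
    [FunctionName("ProcessFileFlow")]
    public static async Task Run([OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // the workflow cannot advance until the UploadApproved event is raised
        await context.WaitForExternalEvent("UploadApproved");

        // later parts of the series add OCR and download approval steps here
    }
}
```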
I feel it is necessary to keep this function very simple and only contain code that represents steps in the flow or any necessary logic for branching. Any updates to the related info elements are kept in the functions themselves.
For this portion of our code base, I am indicating to the orchestration context that advancement to the next step can only occur when an external event called UploadApproved is received. This is, of course, an area where we could introduce branching or even a timeout (so we don't have any number of workflows sitting and waiting for an event that may never come).
To raise this event, we need to build a separate function (I will use an HttpTrigger). Here is the code I chose to use:
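A sketch of such a function; the route and the FileMetadata properties (UploadApproved, WorkflowInstanceId) are assumptions carried over from the earlier samples:

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Cosmos.Table;   // CloudTable / TableOperation (namespace varies by SDK version)
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class ApproveUpload
{
    [FunctionName("ApproveUpload")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post", Route = "files/{fileId}/approve")] HttpRequest request,
        [Table("FileMetadata", "files", "{fileId}", Connection = "TableConnectionString")] FileMetadata metadata,
        [Table("FileMetadata", Connection = "TableConnectionString")] CloudTable metadataTable,
        [DurableClient] IDurableOrchestrationClient client)
    {
        // mark the upload as approved and save the change
        metadata.UploadApproved = true;
        await metadataTable.ExecuteAsync(TableOperation.Replace(metadata));

        // raise the event the orchestrator is waiting on, advancing the workflow
        await client.RaiseEventAsync(metadata.WorkflowInstanceId, "UploadApproved");

        return new AcceptedResult();
    }
}
```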
Do observe that, as this is an example, we are omitting a lot of functionality that would pertain to authentication and authorization of the UploadApprove action – as such this code should not be taken literally and is only meant to illustrate the concept we are driving towards.
Once again, we leverage bindings to simplify our code: based on the fileId provided by the caller we can bring in the FileMetadata reference represented in our Azure Table Storage (we also bring in a CloudTable so the aforementioned entry can be updated to denote that the file upload has been approved).
Using the IDurableOrchestrationClient injected into this function, we call the RaiseEventAsync method with the InstanceId extracted from the Azure Table Storage record to raise the UploadApproved event. Once this event is raised, our workflow advances.
Next Steps
Already we see the potential use cases for this approach, as the ability to combine workflow advancement with code based approaches makes our workflows even more dynamic and flexible.
In Part 4, we will close out the entire sample as we add two more approval steps to the workflow (one code driven and the other user driven) and then add a method to download the file.
I covered the opening to this series in Part 1 (here). Our overall goal is to create a file approval flow using Azure Durable Functions and showcase how complex workflows can be managed within this Azure offering. This topic sits inside the much larger topic of Event Driven Design. Under EDD we aim to build “specialized” components which respond to events. By taking this approach we write only the code we need and alleviate ourselves of boilerplate and unrelated code. This creates a greater degree of decoupling, which helps us when change inevitably comes. It also allows us to solve specific problems without generating wasteful logic which can hide bugs and create other problems.
In this segment of the series, we will talk through building the upload functionality to allow file ingestion. Our key focus will be the Azure Function binding model that allows boilerplate code to be neatly extracted away from our function. Bindings also underpin the event driven ways we can work with functions, specifically allowing them to be triggered by an Azure event.
Let’s get started. As I move forward I will be assuming that you are using Visual Studio Code with the Azure Function tools to create your functions. This is highly recommended and is covered in detail in Part 1.
Provision Azure Resources
The first thing we will want to do is setup our Azure resources for usage, this includes:
Resource Group
Azure Storage Account (create a container)
Azure Table Storage (this is the Table option within Cosmos)
Cognitive Services (we will use this in Part 3)
As a DevOps professional, my recommended approach to deploying infrastructure to any cloud environment is to use a scripted approach, ideally Terraform or Pulumi. For this example, we will not go into that since it is not strictly my aim to extol good DevOps practices as part of this series (we won't be setting up CI/CD either).
For this simple demo, I will leave these resources publicly available; thus, we can update local.settings.json with the relevant connection information as we develop locally. local.settings.json is a special config file that, by default, the template for an Azure Function project created by the VS Code Azure Functions tooling excludes from source control. Always be diligent and refrain from checking credentials into source control, especially for environments above Development.
To get started, you will want to have the following values listed in local.settings.json:
AzureWebJobsStorage – used by Azure Functions runtime, the value here should be the connection string to the storage account you created
FUNCTIONS_WORKER_RUNTIME – dotnet – just leave this alone
StorageAccountConnectionString – this is the account where our uploads are saved to; again, it can be the same Storage Account that you previously created
TableConnectionString– this is the connection string to the Azure Table Storage instance
CognitiveServicesKey – the value of the key given when you create an instance of the Azure Cognitive Services resource
CognitiveServicesEndpoint – the value of the endpoint to access your instance of the Azure Cognitive services
Here is the complete code for the Azure Function which handles this upload:
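A sketch of that function follows. FileMetadata is a hypothetical TableEntity holding the two approval flags, and the container, table, route, and setting names are assumptions:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Cosmos.Table;   // CloudTable / TableOperation / TableEntity (namespace varies by SDK version)
using Microsoft.Azure.Storage.Blob;   // CloudBlobContainer
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;

public class FileMetadata : TableEntity
{
    public bool UploadApproved { get; set; }
    public bool DownloadApproved { get; set; }
    public string WorkflowInstanceId { get; set; }
}

public static class UploadFile
{
    [FunctionName("UploadFile")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = "files")] HttpRequest request,
        [Blob("files", Connection = "StorageAccountConnectionString")] CloudBlobContainer filesContainer,
        [Table("FileMetadata", Connection = "TableConnectionString")] CloudTable metadataTable)
    {
        // our unique id for this upload
        var fileName = Guid.NewGuid().ToString();

        // save the posted binary stream as a block blob
        var blob = filesContainer.GetBlockBlobReference(fileName);
        await blob.UploadFromStreamAsync(request.Body);

        // create the initial metadata row – both approval flags start out false
        var metadata = new FileMetadata
        {
            PartitionKey = "files",
            RowKey = fileName,
            UploadApproved = false,
            DownloadApproved = false
        };
        await metadataTable.ExecuteAsync(TableOperation.Insert(metadata));

        return new CreatedResult($"/api/files/{fileName}", fileName);
    }
}
```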
The code looks complex, but it is actually relatively simple owing to the heavy use of Azure Function bindings. There are three in use:
HttpTrigger – most developers will be familiar with this trigger. Through it, Azure will listen for Http requests to a specific endpoint and route and execute this function when such a request is detected
Blob – You will need this Nuget package. This creates a CloudBlobContainer initialized with the given values. It makes it incredibly easy to write data to the container.
Table – Stored in the same Nuget package as the Blob binding. This, like the Blob, opens up a table connection to make it easy to add data, even under high-volume scenarios
Bindings are incredibly useful when developing Azure Functions. Most developers are only familiar with HttpTrigger, which is used to respond to Http requests, but there is a huge assortment, with support for events from many popular Azure resources. Using these removes the need to write boilerplate code which can clutter our functions and obscure their purpose.
Blob and Table can be made to represent an item in their respective collections or a collection of items. The documentation (here) indicates which types a method argument using these attributes can take. Depending on how you use the attribute, it can be a reference to the table itself, a segment of data from that table (using the partition key), or an item itself. The Blob attribute has similar options (here).
One thing to keep in mind is that a “bound parameter” must be declared as part of a trigger binding attribute before it can be used by other non-trigger bindings. Essentially, it is important to understand that bindings are bound BEFORE the function is run, not after. Understanding this is essential to creating concise workflows using bindings.
Understanding Binding through an example
Taking our code sample from above (the upload function) as an example:
Here I am creating the unique Id (called fileName) in code. If I wanted to, I could instead specify {id} in the HttpTrigger as part of the route path. This would give me access to the value of {id} in other bindings, or as a parameter to the function called id. In this case, though, it would amount to relying on the user to give me a unique value, which would not work.
I hope that explains this concept; I find understanding it makes it easier and more straightforward to decide how to write your code. If not, there will be other examples of this in later sections and I am happy to explain more in the comments.
The Upload Process
Now that we have covered the bindings, the code should make a lot more sense if it did not before.
Simply put, we:
Call Guid.NewGuid().ToString() to get a string representation of a new Guid. This is our unique Id for this file upload
The binary stream accepted through the Http Post request is saved to a block blob reference in our Azure Storage Account
Next, the initial record for our entry is created in the Azure Table Storage (Approval flags are both set to false)
We return a 201 Created response as is the standard for Post operations which add new state to systems
Straightforward and easy to understand; thanks to the bindings, all the heavy lifting was done outside the scope of our function, allowing it to clearly express its intent.
Why Azure Table Storage?
Azure Table Storage is an offering that has existed for a long time in Microsoft Azure; it only recently came to be under the Cosmos umbrella along with other NoSQL providers. The use of Table Storage here is intentional due to its cost effectiveness and speed. But it does come with some trade-offs:
The Cosmos Core (DocumentDb) offering is designed as a massively scalable NoSQL system. For larger systems with more volume, I would opt for this over Table Storage – though you get what you pay for, it’s not cheap
DocumentDb is a document database, meaning the schema is never set in stone and can always change as new records are added. Table Storage is more rigid in practice: entities are flat property bags, and tooling and query patterns generally assume a consistent set of properties across rows.
When making this decision it is important to consider not just current requirements but near-term requirements as well. I tend to like Table Storage when the schema is not going to have a lot of variance and/or I want a NoSQL solution that is cheap and still effective. Cosmos Core is the other extreme, where I am willing to pay more for high redundancy and greater performance, as well as a document database where my schema can differ from insert to insert.
Triggering the Workflow
Reading the upload code you may have wondered where or how the workflow is triggered. By now the answer should not surprise you: a binding. Specifically, a BlobTrigger, which can listen for new blobs being added (or removed) and trigger a function when that case is detected. Here is the declaration of the Azure Durable Function which represents the bootstrapping of our workflow.
(Code listing embedded as a gist in the original post.)
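A rough sketch of that starter function (the container, table, and orchestrator names are the same illustrative assumptions as before; the exact listing is in the linked repository):

using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos.Table;   // CloudTable, pulled in by the Storage/Tables extension
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class StartWorkflowFunction
{
    [FunctionName("StartWorkflow")]
    public static async Task Run(
        [BlobTrigger("files/{id}", Connection = "StorageAccountConnectionString")] Stream uploadedBlob,
        string id,
        // the single metadata row for the newly created blob (partition "file", row key = blob name)
        [Table("metadata", "file", "{id}", Connection = "TableConnectionString")] FileMetadata metadata,
        // a reference to the whole table, needed to manually write the updated row back later
        [Table("metadata", Connection = "TableConnectionString")] CloudTable metadataTable,
        [DurableClient] IDurableOrchestrationClient orchestrationClient,
        ILogger logger)
    {
        // start the orchestrator that manages the approval workflow, passing the file id as input
        var instanceId = await orchestrationClient.StartNewAsync("FileApprovalOrchestration", null, id);
        logger.LogInformation($"Started orchestration {instanceId} for file {id}");
    }
}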
As you can see here, we are starting to get a bit nuts with our triggers. Here is a brief summary:
We use a BlobTrigger to initiate this function and {id} to grab the name of the newly created blob (this will be the Guid generated during upload)
The Table attribute is used twice: once to reference the actual Table Storage record represented by the newly created blob, and again as a reference to the metadata table where the referenced row exists (we need this to write the row back once it’s updated)
Finally, DurableClient (from this Nuget package) provides the client that allows us to start the orchestrator that will manage the workflow
I will go into much more depth on this in Part 3, but the one point I do want to call out is that the Table attribute is NOT two-way. This means that even if you reference a single item (as we did in our example), changes to that item are NOT saved back to the table – you must do this manually. This is important as it drives the reason we see some rather creative uses of this attribute.
Closing
We explored some code in this portion of the series; though it was not immediately tied to Durable Functions, it was tied to event driven programming. Using these bindings we can create code that alleviates itself from mundane and boilerplate operations and allows other systems to manage them on our behalf.
Doing this gets us close to the event driven model discussed in Part 1 and allows each function to specialize in what it must do. By cutting out excess and unnecessary code we can remove bugs and complexities that make it more difficult to manage our code base.
In Part 3, we are going to dive deeper and really start to explore Durable Functions, showing how they can be initiated and referenced in subsequent calls, including those that advance the workflow, such as a human operation.
No code in this post. Here we establish the starting point.
Event Driven Programming is a popular way to approach complex systems, with a heavy emphasis on breaking applications apart into smaller, more fundamental pieces. Done correctly, taking an event driven approach can make coding more fun and concise and allow for “specialization” over “generalization”. In doing so, we get closer to the purity of code that does only what it needs to do and nothing more, which should always be our aim as software developers.
In realizing this for cloud applications, I have become convinced that, with few exceptions, serverless technologies must be employed as the glue for complex systems. The more they mature, the greater the flexibility they offer the architect. In truth, not using serverless can, and in most cases should, be viewed as an anti-pattern. I will note that I am referring explicitly to tooling such as AWS Lambda, Google Cloud Functions, and Azure Functions; I am not speaking to “codeless” solutions such as Azure Logic Apps or similar tools in other platforms – the purpose of such tools is mainly to allow less technical persons to build out solutions. Serverless technologies, such as those mentioned, remain in the domain of the Engineer/Developer.
Very often I find that engineers view serverless functions as more of a “one off” technology, good for that basic endpoint that can run on a Consumption plan. As I have shown before, Azure Functions in particular are very mature and, through the use of “bindings”, can enable highly sophisticated scenarios without the need to write excessive amounts of boilerplate code. Further, offerings such as Durable Functions in Azure (Step Functions in AWS) take serverless a step further and actually maintain a semblance of state between calls, enabling sophisticated multi-part workflows that accept a wide variety of inputs for workflow progression. I wanted to demonstrate this in this series.
Planning Phase
As with any application, planning is crucial and our File Approver application shall be no different. In fact, with event driven applications planning is especially crucial because, while Event Driven systems offer a host of advantages, they also require certain questions to be answered. Some common questions:
How can I ensure events get delivered to the components of my system?
How do I handle a failure in one component but success in another?
How can I be alerted if events start failing?
How can I ensure events that are sent during downtime are processed? And in the correct order?
Understandably, I hope, these questions are too big to answer as part of this post, but they are questions I hope you, as an architect, are asking your team when you embark on this style of architecture.
For our application, we will adopt a focus on the “golden path”. That is, the path which assumes everything goes correctly. The following diagram shows our workflow:
Our flow is quite simple and straightforward:
Our user uploads a file to an Azure Function that operates off an HttpTrigger
After receiving this file, the binary data is written to Azure Blob Storage and a related entry is made in Azure Table Storage
The creation of the blob triggers a Durable Function Orchestration which will manage a workflow that aims to gather data about the file contents and ultimately allow users to download it
Our Durable workflow contains three steps, two of which will pause our workflow waiting for human actions (done via Http API calls). The other is a “pure function” that is only called as part of this workflow
Once all steps are complete, the file is marked available for download. When requested, the Download File function will return the gathered metadata for the file AND a generated SAS Token allowing the caller to download the file for a period of 1 hour
Of course, we could accomplish this same goal with a traditional approach, but that would leave us writing a far more sophisticated solution than I ended up with. For reference, here is the complete source code: https://github.com/jfarrell-examples/DurableFunctionExample
Azure Function Bindings
Bindings are a crucial component of efficient Azure Function design; at present I am not aware of a similar concept in AWS, but I do not discount its existence. Using bindings we can write FAR LESS code and make our functions easier to understand, with more focus on the actual task instead of logic for connecting to and reading from various data sources. In addition, the triggers tie very nicely into the whole Event Driven paradigm. You can find a complete list of ALL triggers here:
Throughout my code sample you will see references to bindings for Durable Functions, Blobs, Azure Table Storage, and Http. Understanding these bindings is, as I said, crucial to your sanity when developing Azure Functions.
Visual Studio Code with Azure Function Tools
I recommend Visual Studio Code when developing any modern application since it’s lighter and the extensions give you a tremendous amount of flexibility. This is not to say you cannot use Visual Studio; the same tools and support exist. I just find Visual Studio Code (with the right extensions) to be the superior product, YMMV.
Once you have Visual Studio Code you will want to install two separate things: the Azure Functions Core Tools and the Azure Functions extension for Visual Studio Code.
I really cannot say enough good things about the Azure Functions Core Tools. They have come a long way from version 1.x and the recent versions are superb. In fact, I was able to complete my ENTIRE example without ever deploying to Azure, using breakpoints all along the way.
The extension for Visual Studio Code is also very helpful for both creating and deploying Azure Functions. Unlike traditional .NET Core applications, I do not recommend using the command line to create the project. Instead, open Visual Studio Code and access your Azure Tools. If you have the Functions extension installed, you will see a dedicated blade – expand it.
The first icon (it looks like a folder) enables you to create a Function project through Code. I recommend this approach since it gets you started very easily. I have not ruled out the existence of templates that could be downloaded and used through dotnet new, but this works well enough.
Keep in mind that a Function project is 1:1 with a Function app, so you will want to target an existing directory if you plan to have more than one in your solution. Note that this is likely completely different in Visual Studio; I do not have any advice for that approach.
When you go through the creation process you will be asked to create a function. For now, you can create whatever you like; I will be diving into our first function in Part 2. As you create subsequent functions, use the lightning icon next to the folder. Doing this is not required, it is perfectly acceptable to build your functions up by hand, but using this gets the VSCode settings correct to enable debugging with the Core Tools, so I highly recommend it.
The arrow (third icon) is for deploying. Of course, we should never use this outside of testing since we would like a CI/CD process to test and deploy code efficiently – we won’t be covering CI/CD for Azure Functions in this series, but we certainly will in a future series.
Conclusion
Ok so, now we understand a little about what Durable Functions are and how they play a role in Event Driven Programming. I also walked through the tools that are used when developing Azure Functions and how to use them.
Moving forward into Part 2, we will construct our File Upload portion of the pipeline and show how it starts our Durable Function workflow.
Quick: what type of password cannot be cracked? The answer is one that is not known to anyone. You cannot reveal what you do not know. This is why so many people use Password Managers; we create insanely long passwords that we cannot remember, nor do we need to, and use them – their length and complexity makes them very difficult to crack. Plus, by making it easy to create these kinds of passwords we avoid the other problem, where the same password is used everywhere.
If you were to guess which password you would LEAST like to see compromised, I am willing to bet many of you would name the password your web app uses to communicate with its database. And yet, I have seen so many cases where passwords to databases are stored in web.config and other settings files in plain text, for any would-be attacker to read and use at their leisure. So I figured tonight I would tackle one of the easiest and most common ways to secure such a password.
Remember RBAC
If you have been following my blog you know that, particularly of late, I have been harping on security through RBAC (Role Based Access Control). In the cloud especially, it is vital that applications have access only to what they need to carry out their role; such is the emphasis of least-privileged security.
In Microsoft Azure, as well as other cloud platforms, we can associate the ability to read and update a database with a particular role and grant our web application an identity that is a member of that role. In doing so, we alleviate ourselves from having to manage a password while still ensuring that the application can only access data relevant to its task and purpose.
As I often do, I like to start from the default project template for a .NET Core Web API project. This means I have a basic API setup with the WeatherForecast related assets. The first goal will be to set this up as an EF Core driven application that auto creates its database and seeds with some data – effectively we are going to replace the GET call with a database driven select type operation.
To aid with this, and to save myself from writing out how to build an API, I am providing the source code here: https://github.com/jfarrell-examples/DatabaseMSI. From this point I will only call out certain pieces of this code and shall assume, moving forward, that you have an API with an endpoint you can call that returns data from the database.
Create a Database Admin
For the majority of these steps you will want to have the Azure CLI installed and configured for your Azure instance. You can download it here
Access Azure Active Directory from the Portal and create a new user. You can find this option off the main landing page in the left navigation sidebar. Your user does not need to be anything special, though I recommend setting the password yourself.
Once the user is created, open a private window or tab and log in to https://portal.azure.com as that user. You do this to validate the account and reset the password; it shows as Expired otherwise. We are going to use this user as your Azure SQL Admin (yes, I assume you have already created the Azure SQL instance).
The tutorial linked above provides a command line query to search for your newly created Azure AD User and get its corresponding objectId (userId for the uninitiated). I personally prefer just using the following command:
az ad user list --query "[].{Name: displayName, Id: objectId}" -o table
This will format things nicely and require you only to look for the display name you gave the user. You will want to save the objectId to a shell variable or paste it somewhere you can easily copy it.
az sql server ad-admin create --resource-group <your-rg> --server-name <db-server-name> --display-name ADMIN --object-id <ad-user-objectId>
This command will install our user as an admin on the target SQL Server. Replace the values above as appropriate. You can use whatever you like for display-name.
Congrats, you have now linked the AD User to SQL Server and given them admin rights. We won’t connect as this user, but we need this user to carry out certain tasks.
Configure the WebAPI to Support MSI Login
As a note, the link above also details the steps for doing this with ASP .NET; I won’t be showing that, I will be focusing only on ASP .NET Core.
We need to inform whatever is managing our database connection (EF Core, in my case) that we are going to use MSI authentication. As with most MSI related things, this entails getting an access token from the identity authority within Azure.
Open the DbContext and add the following code as part of your constructor:
(Code listing embedded as a gist in the original post.)
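A minimal sketch of that constructor logic, assuming a hypothetical WeatherContext and the Env setting described below (values Local and Cloud); the exact listing is in the linked repository:

using Microsoft.Azure.Services.AppAuthentication;
using Microsoft.Data.SqlClient;        // or System.Data.SqlClient depending on your EF Core version
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;

public class WeatherContext : DbContext
{
    public WeatherContext(DbContextOptions<WeatherContext> options, IConfiguration configuration)
        : base(options)
    {
        // only acquire a managed identity token when running in the cloud;
        // locally we keep using a normal username/password connection string
        if (configuration["Env"] != "Local")
        {
            var connection = (SqlConnection)Database.GetDbConnection();
            var tokenProvider = new AzureServiceTokenProvider();
            connection.AccessToken = tokenProvider
                .GetAccessTokenAsync("https://database.windows.net/")
                .GetAwaiter()
                .GetResult();
        }
    }

    public DbSet<WeatherForecast> Forecasts { get; set; }
}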
For this to work you will need to add the Microsoft.Azure.Services.AppAuthentication NuGet package but the rest of this can be pasted in as a good starting point.
I have also added an AppSettings value, Env, which denotes the current environment. In this case, since I am showing an example, I will only have Local and Cloud. In professional projects the set of allowable values will be larger. From a purpose standpoint, this allows the code to use a typical connection method (username and password) locally.
Remember, it is essential that, when developing systems that will access cloud resources, we ensure a solid way for developers to interact with those same resources (or a viable alternative) without having to change code or jump through hoops.
The final bit is to prepare our connection string for use in the Cloud with MSI. In .NET Core this change is absurdly simple; the tutorial shows it, but you will want a connection string along the following lines in your cloud environments.
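As a sketch (server and database names are placeholders), note it contains no credentials at all since the access token set in the DbContext constructor handles authentication:

Server=tcp:<your-server>.database.windows.net,1433;Database=<your-database>;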
With this in place, we can now return to the Cloud and finish our setup.
Complete the managed identity setup
The use of MSI is built upon the concept of identity in Azure. There are, fundamentally, two types: user assigned and system assigned. The latter is the most common as it allows Azure to manage the underlying authentication mechanics.
Enabling this identity for your Azure resources is easy enough from the portal, but it can also be done via the Azure CLI using the following command (available in the tutorial linked above):
az webapp identity assign --resource-group <rg-name> --name <app-name>
This will return you a JSON object showing the pertinent values for your Managed Identity – copy and paste them somewhere.
When you use a Managed Identity, by default, Azure will name the identity after the resource for which it applies, in my case this was app-weatherforecast. We need to configure the rights within Azure SQL for this identity – to do that, we need to enter the database.
There are a multitude of ways to do this, but I like how the tutorial approaches it, using Cloud Shell. sqlcmd is a program you can download locally, but I always prefer NOT to add additional firewall rules to support external access. Cloud Shell allows me to handle these kinds of operations within the safety of the Azure Firewall.
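The tutorial’s Cloud Shell command looks roughly like the following (server, database, user, and password values are placeholders; -G tells sqlcmd to authenticate against Azure Active Directory):

sqlcmd -S <server-name>.database.windows.net -d <database-name> -U <ad-admin-user>@<tenant>.onmicrosoft.com -P "<password>" -G -l 30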
This command will get you to a SQL prompt within your SQL Server. I want to point out that the user name is the domain username of the AD user you created earlier in this post; you will need to include the “@mytenant.domain” suffix as part of the username. You are logging in as the ADMIN user you created earlier.
When your application logs into the SQL Server it will do so as a user with the name of the identity given (as mentioned above). To support this we need to do a couple of things:
We must create a user within our SQL Server database that represents this user
For the created user we must assign the appropriate SQL roles, keeping in mind the principle of least-privileged access
From the referenced tutorial, you will want to execute the following SQL block:
CREATE USER [<identity-name>] FROM EXTERNAL PROVIDER;
GO
ALTER ROLE db_datareader ADD MEMBER [<identity-name>];
GO
ALTER ROLE db_datawriter ADD MEMBER [<identity-name>];
GO
ALTER ROLE db_ddladmin ADD MEMBER [<identity-name>];
GO
Remember, identity-name here is the name of the identity we created earlier, or the name of your Azure resource if using System Assigned identity.
Your case may vary, but deeply consider what roles your application needs. If your application will only access the database to read, you can forgo adding the datawriter and ddladmin roles.
If the database is already set in stone and you won’t need new tables created by an ORM, then you likely will not need the ddladmin role either. Always consider, carefully, the rights given to a user. Remember, our seminal aim as developers is to ensure that, in the event of a breach, we limit what the attacker can do; thus, if they somehow spoof our MSI in this case, we would want them to be confined to ONLY this database. If we used a global admin, they would have access to everything.
Congrats. That is it, you now have MSI authentication working for your application.
Closing Thoughts
Frankly, there are MANY ways to secure the credentials for critical systems like databases in applications, from encryption, to process restrictions, to MSI – all have their place and all address the radically important goal of limiting access.
The reason I like MSI over many of these options comes down to two principal reasons:
It integrates perfectly into Azure and takes advantage of existing features. I always prefer to let someone else do something for me if they are better at it, and Microsoft is better at identity management than I am. Further, since we can associate roles inside Azure, it’s easier to limit access to the database and other systems the corresponding application accesses
It totally removes the need to store and manage a password. As you saw above, we never referenced a password at any point. This is important since an attacker cannot steal what is never made available.
Attackers are going to find our information; they are going to hack our systems. We can do what we can to prevent this, but it will happen. So the least we can do is make their efforts useless or limit what they can steal. Keep passwords out of source code, use Key Vault, and leverage good automation pipelines to ensure sensitive values are never exposed or “kept in an Excel somewhere”. Ideally, the fewer people who know these passwords the better.
The database is, for many applications, the critical resource, and using MSI can go a long way toward protecting our data, ensuring proper access, and limiting the blast radius of attacks.
Using Auth0 for applications is a great way to offload user management and role management to a third party provider, which can aid in limiting the blast radius of a breach. While I will not yet be getting into truly customizing the login experience (UI related), I do want to cover how we can take control of the process to better hide our values and control the overall experience.
You don’t control Authentication
In previous posts I discussed the difference between Authentication and Authorization, and I bring it up here as well. For applications, Authentication information is the information we least want in the hands of an attacker; being able to log into a system legitimately can make it very hard to track down what has been lost and what has been compromised. This is why authentication mechanisms leveraging OAuth or OpenID rely on performing the authentication OUTSIDE of your system.
By performing the authentication outside of your system and returning a token, your site never even sees the user credentials, and you cannot expose what you do not have. Google Pay and other contactless payment providers operate on a similar principle: they grab the credit card number and information for the merchant and pass back a token with the status.
Understanding this principle is very important when designing systems of authentication. Authorization information is less important in terms of data loss but, highly important in terms of proper provisioning and management.
For the remainder of this post we will be looking mainly at how to initiate authentication from the backend.
Preparation
For this example, I constructed a simple ASP .NET Core MVC Application and created an Index view with a simple Login button – this could have been a link.
(Code listing embedded as a gist in the original post.)
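The view itself needs little more than a form posting to that action; a minimal sketch (controller and action names match the ones used below):

<form method="post" asp-controller="Home" asp-action="Login">
    <button type="submit">Login</button>
</form>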
The goal here is to have a way for the user to initiate the login process. This post will not cover how to customize this Login Screen (hoping to cover that in the future).
Let’s get started
Looking at the code sample above you can see that submitting the form goes to a controller called Home and an action called Login. This does not actually submit any credentials because, remember, Auth0 operates as a third party and we want our users to log in and be authenticated there rather than on our site. Our site only cares about the tokens that indicate Auth0 verified the user and their access to the specific application.
Here is the skeleton code for the action that will receive this Post request:
(Code listing embedded as a gist in the original post.)
OAuth flows are nothing more than a back and forth of specific URLs which first authorize our application request to log in, then authorize the user based on credentials, after which a token is generated and a callback URL is invoked. We want to own this callback URL.
Install the following NuGet package: Auth0.AuthenticationApi – I am using version 7.0.9. Our first step is to construct the authentication URL which will send us to the Auth0 Lock screen, which handles authentication.
(Code listing embedded as a gist in the original post.)
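A sketch of that Login action using the Auth0.AuthenticationApi client (the tenant domain, client id, and redirect URL are placeholders, and the controller/action names are assumptions):

using System;
using Auth0.AuthenticationApi;
using Auth0.AuthenticationApi.Models;
using Microsoft.AspNetCore.Mvc;

public class HomeController : Controller
{
    [HttpPost]
    public IActionResult Login()
    {
        var client = new AuthenticationApiClient("<your-tenant>.auth0.com");

        var authUrl = client.BuildAuthorizationUrl()
            .WithClient("<client-id>")                                 // which Auth0 application we authenticate against
            .WithResponseType(AuthorizationResponseType.Code)          // ask for an authorization code
            .WithConnection("Username-Password-Authentication")        // restrict to username/password, no social logins
            .WithRedirectUrl("https://localhost:5001/Home/Callback")   // where Auth0 sends the code back
            .WithScope("openid profile")                               // basic user information
            .WithState(Guid.NewGuid().ToString("N"))                   // random value to make the request unique
            .Build();

        return Redirect(authUrl.ToString());
    }
}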
The use of the Client Id indicates to Auth0 which application we want to authenticate against
Response Type is code, meaning we are asking for an authorization code that we can exchange for other tokens
Connection indicates what connection scope our login can use. Connection scopes indicate where user credential information is stored (you can have multiple). In this case I specify Username-Password-Authentication which will disallow social logins
The redirect URI indicates what URL the auth code is passed to. In our case, we want it passed back to us so this URL is for another controller/action combination on our backend. Be sure your Application Settings also specify this URL
Scope is the access rights the given User will have. By default we want them to be able to access their information
State is a random variable, you will often see it as a value called nonce. This is just a random value designed to make the request unique
After we specify these values, we call Build and ToString to get a URL that we can redirect to. This will bring up the Auth0 Lock login screen to allow our user to present their credentials.
Receive the callback
Our next step is to define the endpoint that will receive the callback from Auth0 when the login is successful. Auth0 will send us a code in the query string that indicates the login was successful.
(Code listing embedded as a gist in the original post.)
This is not atypical for applications which use this flow – if you have ever looked at the Angular sample, it too provides a route that handles the callback to receive the code. Once we get the code we can ask for a token. Here is the complete code:
(Code listing embedded as a gist in the original post.)
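A sketch of that exchange, as an action added to the same HomeController as above (the client secret, redirect URL, and the TokenViewModel type are assumptions for illustration):

using System.Threading.Tasks;
using Auth0.AuthenticationApi;
using Auth0.AuthenticationApi.Models;
using Microsoft.AspNetCore.Mvc;

[HttpGet]
public async Task<IActionResult> Callback(string code)
{
    var client = new AuthenticationApiClient("<your-tenant>.auth0.com");

    // exchange the authorization code for an access token and an id token
    var tokenResponse = await client.GetTokenAsync(new AuthorizationCodeTokenRequest
    {
        ClientId = "<client-id>",
        ClientSecret = "<client-secret>",
        Code = code,
        RedirectUri = "https://localhost:5001/Home/Callback"
    });

    // hand the tokens to the view for display
    return View(new TokenViewModel
    {
        AccessToken = tokenResponse.AccessToken,
        IdToken = tokenResponse.IdToken
    });
}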
Here we are asking for authorization to the application, and it comes with two pieces of information that we want – the Access Token and the Id Token. The former is what you pass to other APIs that you want to access (your permissions are embedded in this token), and the Id Token represents your user with all of their information.
To aid in understanding what these tokens look like (we won’t cover Refresh tokens here) I have created a simple custom C# class and Razor view:
(Code listings embedded as gists in the original post.)
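The view model is little more than a property bag; a sketch matching the callback code above (class and property names are assumptions):

public class TokenViewModel
{
    public string AuthCode { get; set; }
    public string AccessToken { get; set; }
    public string IdToken { get; set; }
}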
Successfully logging in will eventually land you on this Razor page, where you should see values for everything but AuthCode (it is never set in the code snippet). But something might strike you as weird: why is the access_token so short? In fact, if you run it through jwt.io you may find it lacks any real information.
Let’s explain.
By default Tokens can only talk to Auth0
In previous posts I have discussed accessing APIs using Auth0 Access Tokens. Core to that is the definition of an audience. I deliberately left this code off when we built our authentication URL as part of login. Without it, Auth0 will only grant access to the userinfo API hosted in Auth0. If we also want that token to be good for our other APIs, we need to register them with Auth0 and indicate that our UI app can access them.
Before discussing this further, let’s update our auth URL building code as such:
(Code listing embedded as a gist in the original post.)
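The only change from the earlier Login sketch is the added audience (the API identifier is a placeholder):

var authUrl = client.BuildAuthorizationUrl()
    .WithClient("<client-id>")
    .WithResponseType(AuthorizationResponseType.Code)
    .WithConnection("Username-Password-Authentication")
    .WithRedirectUrl("https://localhost:5001/Home/Callback")
    .WithScope("openid profile")
    .WithAudience("<your-api-identifier>")   // new: which registered API this access token should be valid for
    .WithState(Guid.NewGuid().ToString("N"))
    .Build();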
The difference here is that we now specify our audience. Note that we are NOT allowed to specify multiple audiences, so when designing a microservice API this can be tricky. I want to cover this more in depth in a later post.
With this in place you will see two things if you run through the flow again:
Your access token is MUCH longer and will contain relevant information for accessing your API
The refresh token will be gone (I cannot speak to this yet)
This will now return to your caller the Access Token you can use and store to access your APIs. Congrats.
Why use this?
So when would an approach like this be practical? I like it for applications that need more security. You see, with traditional SPA approaches you wind up exposing your client Id and potentially other information that, while not what I would call sensitive, is more than you may want to expose.
Using this approach, all of that information remains in the backend, and the flow is facilitated outside of the user’s control or ability to snoop.
Conclusion
In this post, I showed how you can implement the OAuth flow yourself using the Auth0 API. This is not an uncommon use case and can be very beneficial should your application require tighter control over the process.