Kubernetes makes managing services at scale much easier by providing abstractions around the common pain points of application management. One of its core principles is to treat workloads as stateless, which is a large part of what makes the platform reliable and resistant to failure.
This stateless nature can befuddle developers who want to create services that persist state; databases are the most common example of this need. Putting aside the ongoing debate over whether databases belong in a cluster when managed providers exist, I want to focus here on disk storage.
It is a well-established fact that storing user data inside a container or a Pod in Kubernetes is not acceptable: the data disappears when the Pod is replaced, and it presents other challenges even beyond the potential for data loss. Thankfully, each of the managed providers offers a way to map volumes from its storage services into Kubernetes, allowing the cluster to save data to a persistent, scalable, and resilient storage medium. In this post I will walk through how I accomplished this with Azure Kubernetes Service.
Part 1: The Application
I am going to deploy a simple .NET Core Web API application with a single endpoint that accepts an arbitrary file upload. Here is what this endpoint looks like:
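|public class FileController : ControllerBase|
|{|
|    private readonly IConfiguration _configuration;|
|    private readonly ILogger<FileController> _logger;|
|    public FileController(ILogger<FileController> logger, IConfiguration configuration)|
|    {|
|        _logger = logger;|
|        _configuration = configuration;|
|    }|
|    [HttpPost("file")] // serves POST /file (attribute form assumed); expects a form field named "file"|
|    public async Task<IActionResult> Post(IFormFile file)|
|    {|
|        var uploadStream = file.OpenReadStream();|
|        using (var fileStream = System.IO.File.Create(Path.Join(_configuration.GetValue<string>("OutputDirectory"), Guid.NewGuid().ToString())))|
|            await uploadStream.CopyToAsync(fileStream); // write the upload under a fresh GUID file name|
|        return Ok(); // response type assumed|
|    }|
|}|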
All we are doing here is reading the stream from the request and saving it to a directory defined in our configuration. Locally, this is driven by our appsettings.json file. .NET Core also loads environment variables as the program starts, and these override values with the same name coming from the JSON files (this will be very important to us).
We can now create our mostly standard Dockerfile:
|FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS sdk|
|# Build in /code so the publish output lands in /code/output|
|WORKDIR /code|
|COPY . .|
|RUN dotnet publish -c Release -o output|
|FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS runtime|
|# Working directory name is an assumption|
|WORKDIR /app|
|COPY --from=sdk /code/output .|
|RUN mkdir /image_write|
|ENV OutputDirectory=/image_write|
|ENTRYPOINT [ "dotnet", "FileUpload.dll" ]|
Do you see the slight difference? In the Dockerfile I set an environment variable that overrides the OutputDirectory value from appsettings.json (/image_write in this case). This gives me a known location in the container at which to mount external storage, which becomes very important once we get into Kubernetes.
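As an aside, because the setting is read from the environment, the same override could also be supplied at deploy time rather than baked into the image. This is not what I did here; the fragment below is only a sketch of that alternative, and the image reference is a hypothetical placeholder.

```yaml
# Fragment of a Pod template's container list, not a complete manifest.
containers:
  - name: file-upload-server
    image: myregistry.azurecr.io/file-upload:latest   # hypothetical image reference
    env:
      # Overrides OutputDirectory from appsettings.json at startup
      - name: OutputDirectory
        value: /image_write
```

Either way, the application only needs to know the directory path; what is mounted there is decided outside the code.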
Build this image and push it to a registry your cluster has access to.
Part 2: Set Up and Mount the Azure File Share
Our next step is to create an Azure file share and enable our cluster to communicate with it, so that we can mount it when Pods are deployed or brought online.
Microsoft actually does a very good job explaining this here: https://docs.microsoft.com/en-us/azure/aks/azure-files-volume
By following these instructions you end up with a new Azure Storage Account that contains a file share. We store the credentials for this file share (the storage account name and key) in a Kubernetes secret, in the same namespace our application is going to be deployed to (I call mine file-upload).
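For illustration, a secret of that shape could also be written as a manifest. This is a minimal sketch: the secret name azure-secret is an assumption, and the two values are placeholders you would fill in from your own storage account. The key names azurestorageaccountname and azurestorageaccountkey are the ones the azureFile volume type looks for.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: azure-secret          # assumed name; reference it from the volume definition
  namespace: file-upload      # must be the namespace the Pods run in
type: Opaque
stringData:
  azurestorageaccountname: "<storage-account-name>"   # placeholder
  azurestorageaccountkey: "<storage-account-key>"     # placeholder
```

Using stringData lets you supply the values as plain text; Kubernetes stores them base64-encoded. Avoid committing a file like this with real values to source control.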
Here is the deployment spec I used to deploy these Pods with the appropriate mounting:
|- name: file-upload-server|
|- containerPort: 80|
|- name: savepath|
|- name: savepath|
So you can see in the container spec section, we mount our savepath volume to our defined path. We then define this volume as coming from Azure in our volumes section. The rest of the definition is as we would expect.
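For reference, a fuller manifest along these lines might look like the sketch below. The container name, port, volume name, mount path, and namespace come from this post; everything marked as assumed (Deployment name, labels, replica count, image reference, secret name, share name) is a placeholder rather than the exact value I used.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-upload                 # assumed
  namespace: file-upload
spec:
  replicas: 1                       # assumed
  selector:
    matchLabels:
      app: file-upload              # assumed label
  template:
    metadata:
      labels:
        app: file-upload            # must match the selector above
    spec:
      containers:
        - name: file-upload-server
          image: myregistry.azurecr.io/file-upload:latest   # assumed image reference
          ports:
            - containerPort: 80
          volumeMounts:
            - name: savepath
              mountPath: /image_write   # the OutputDirectory set in the Dockerfile
      volumes:
        - name: savepath
          azureFile:
            secretName: azure-secret    # assumed; the secret holding the account name and key
            shareName: aksshare         # assumed; the name of your Azure file share
            readOnly: false
```

The volumeMounts entry and the volumes entry are tied together by the shared name savepath, so files the application writes to /image_write end up on the Azure file share rather than in the container's own filesystem.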
From here you need to enable external access to the Pods. You have three options:
- A Service of type NodePort: call the appropriate node IP and port with /file using POST (refer to the endpoint definition for the parameter name).
- A Service of type LoadBalancer: same as above, using the load balancer's external IP.
- An Ingress controller to route the incoming connection at the Kubernetes level.
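As a sketch of the second option, a LoadBalancer Service could look like the following. The Service name and the app label are assumptions; the selector has to match whatever labels your Pods actually carry.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: file-upload           # assumed
  namespace: file-upload
spec:
  type: LoadBalancer
  selector:
    app: file-upload          # assumed; must match the Pod labels
  ports:
    - port: 80                # port exposed by the load balancer
      targetPort: 80          # containerPort of file-upload-server
```

Once the Service has been assigned an external IP, a multipart POST to /file on that address, with the upload in a form field named file, should result in a new file appearing in the share.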
This was a pretty neat exercise, and I was impressed at just how easy it was to set up. Storing our data with a managed provider means we can apply Kubernetes to more scenarios and get more value from it, since the managed cloud providers simply have more resources at their disposal.