Token Enrichment with Azure B2C

Recently while working with a client, we were asked to implement an Authorization model that could crosscut our various services with the identity tying back to B2C and the OAuth pattern in use. A lot of ideas were presented, and it was decided to explore (and ultimately use) the preview feature Token Enrichment: https://learn.microsoft.com/en-us/azure/active-directory-b2c/add-api-connector-token-enrichment?pivots=b2c-user-flow

How does OAuth handle Authorization?

Prior to the rise of OAuth, or delegated authentication in general, authorization systems usually involved a database call keyed on a user Id passed with each request. Teams might layer caching on top to speed things up, but in the world of monoliths this pattern was pervasive; realistically, for most teams there was no alternative.

Fast forward to the rise of distributed programming and infrastructure patterns like microservices, or more general Service Oriented Architecture (SOA) approaches, and this pattern falls on its face, hard. Today, the very idea of making a network call like this for every single request is outlandish and avoided wherever possible.

Instead, teams leverage an identity server (B2C here, or Okta, Auth0 [owned by Okta], or Ping) whereby a central authority issues the token and embeds the role information into the token’s contents; the names of roles should never constitute sensitive information.

The critical element to understand here is that tokens are signed (the signature covers a hash of the contents). Any mutation of the token will render it invalid and unable to pass verification. Thus, we just need to ensure no sensitive data is exposed in the token, since its contents can be easily decoded by sites like https://jwt.ms and https://jwt.io

Once the Access Token is received by the service and verified, the service can strip claims off the token and use them for its own processing. I won’t cover it in depth in this article, but .NET (and many other web frameworks) natively supports constructs that make this parsing easy and enable claims-driven authorization, as sketched below.
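For example, once JWT bearer authentication is configured against your B2C tenant, reading claims in an ASP.NET Core controller is trivial. A minimal sketch (the controller and claim usage are illustrative, not from the client project):

using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")]
public class ProfileController : ControllerBase
{
    [Authorize] // the framework rejects requests whose token fails signature or lifetime checks
    [HttpGet]
    public IActionResult Get()
    {
        // Claims from the validated access token are surfaced on HttpContext.User
        var roles = User.FindFirst("extension_Roles")?.Value; // e.g. "Admin,User"
        return Ok(new { roles });
    }
}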

How do I do this with B2C?

B2C supports API Connectors, as described in the article linked above. These connectors allow B2C to reach out at various stages and contact an API to perform additional work, including enrichment.

The first step in this process is creating a custom attribute, sent with the Access Token, to hold the custom information; I called mine Roles.

Create the Custom Attribute for ALL Users

  1. From your Azure B2C Tenant select User Attributes
  2. Create a new Attribute called extension_Roles of type string
  3. Click Save

The naming of the attribute here is crucial. It MUST be prefixed with extension_ for B2C to return the value.

This attribute is created ONLY to hold the value coming from token enrichment via the API; it is not stored in B2C, only returned as part of the token.

Configure your sign-in flow to send back our custom attribute

  1. Select User Flows from the main menu in Azure B2C
  2. Select your sign-in flow
  3. Select Application claims
  4. Find the custom claim extension_Roles in the list
  5. Click Save

This is a verification step. We want to ensure our new attribute is in the Application claims for the flow and NOT in the user attributes. If it is in the user attributes, it will appear on the sign-up pages.

Deploy your API to support the API Connector

The link at the top shows what the payload to the API connector looks like as well as the response. I created a very simple response in an Azure Function, shown below:

public class HttpReturnUserRolesFunction
{
    [FunctionName("HttpReturnUserRoles")]
    public IActionResult HttpReturnUserRoles(
        [HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = null)] HttpRequest req,
        ILogger log)
    {
        return new OkObjectResult(new {
            version = "1.0.0",
            action = "Continue",
            postalCode = "12349",
            extension_Roles = "Admin,User"
        });
    }
}

We can deploy this the same way we deploy anything in Azure; in my testing I used right-click publishing to make this work.

Setting the API Connector

We need to configure B2C to call this new endpoint to enrich the provided token.

  1. Select API Connectors from the B2C menu
  2. Click New API Connector
  3. Choose any Display name (I used Enrich Token)
  4. For the Endpoint URL, enter the URL of your deployed API endpoint
  5. Enter whatever you want for the Username and Password
  6. Click Save

The username and password can provide an additional layer of security: B2C sends them as a Base64-encoded Basic authentication header, which the endpoint can decode to validate that the caller is legitimate. In the code above I chose not to do this, though I would recommend it for a real scenario.
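Below is a minimal sketch (not from the original project) of validating that Basic authentication header inside the function before returning claims; the expected values are placeholders that would normally come from configuration, and the helper assumes the System, System.Text, and Microsoft.AspNetCore.Http namespaces:

private static bool IsAuthorizedCaller(HttpRequest req, string expectedUsername, string expectedPassword)
{
    // B2C sends: Authorization: Basic base64("username:password")
    string header = req.Headers["Authorization"];
    if (string.IsNullOrEmpty(header) || !header.StartsWith("Basic ", StringComparison.OrdinalIgnoreCase))
        return false;

    var decoded = Encoding.UTF8.GetString(Convert.FromBase64String(header.Substring("Basic ".Length).Trim()));
    var parts = decoded.Split(':', 2);
    return parts.Length == 2 && parts[0] == expectedUsername && parts[1] == expectedPassword;
}

In the function above, you would call this helper at the top of HttpReturnUserRoles and return new UnauthorizedResult() when it fails.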

Configure the Signup Flow to call the Enrich Token API endpoint

The last step of the setup is to tell the User Flow for Signup/Sign-in to call our Enrich Token endpoint.

  1. Select User Flows
  2. Select the User Flow that represents the Signup/Signin operation
  3. Select API Connectors
  4. For the Before including application claims in token (preview) select the Enrich Token API Connector (or whatever name you used)
  5. Click Save

This completes the configuration for calling the API Connector as part of the user flow.

Testing the Flow

Now let’s test our flow. We can do this using the built-in flow tester in B2C. Before that, though, we need to create an Application Registration and set a reply URL so the flow has somewhere to send the user when validation succeeds.

For this testing, I recommend using either jwt.ms or jwt.io which will receive the token from B2C and show its contents. For more information on creating an Application Registration see this URL: https://learn.microsoft.com/en-us/azure/active-directory-b2c/tutorial-register-applications?tabs=app-reg-ga

Once you have created the registration, return to the B2C page and select User Flows. Select your flow, and then click Run user flow. B2C will ask which application and reply URL to run the flow against. Make sure you select the registration you created and validate that the Reply URL is what you expect.

Click Run user flow and login (or create a user) and you should get dumped to the reply URL and see your custom claims. Here is a sample of what I get (using the code above).

{
  "alg": "RS256",
  "kid": "X5eXk4xyojNFum1kl2Ytv8dlNP4-c57dO6QGTVBwaNk",
  "typ": "JWT"
}.{
  "ver": "1.0",
  "iss": "b2clogin_url/v2.0/",
  "sub": "d0d196a4-96b3-4c46-b550-842ab59cd4d8",
  "aud": "3a61cc01-104a-44c8-a3ff-d895a860d70e",
  "exp": 1695000577,
  "nonce": "defaultNonce",
  "iat": 1694996977,
  "auth_time": 1694996977,
  "extension_Roles": "Admin,User",
  "tfp": "B2C_1_Signup_Signin",
  "nbf": 1694996977
}.[Signature]

Above you can see extension_Roles with the value Admin,User. This token can then be parsed in your services to check which roles the user represented by the token holds; one way to wire this up is sketched below.
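A minimal sketch, assuming an ASP.NET Core service using Microsoft.AspNetCore.Authentication.JwtBearer (plus System.Security.Claims and System.Threading.Tasks); the authority and audience values are placeholders. It splits the comma-separated extension_Roles claim into individual role claims when the token is validated, so the standard [Authorize(Roles = "Admin")] attribute works as usual:

builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.Authority = "https://<tenant>.b2clogin.com/<tenant>.onmicrosoft.com/B2C_1_Signup_Signin/v2.0/";
        options.Audience = "<client id of the calling application>";
        options.Events = new JwtBearerEvents
        {
            OnTokenValidated = context =>
            {
                // "Admin,User" -> separate role claims the framework understands
                var identity = (ClaimsIdentity)context.Principal.Identity;
                var packed = identity.FindFirst("extension_Roles")?.Value ?? string.Empty;
                foreach (var role in packed.Split(',', StringSplitOptions.RemoveEmptyEntries))
                {
                    identity.AddClaim(new Claim(ClaimTypes.Role, role.Trim()));
                }
                return Task.CompletedTask;
            }
        };
    });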

Configuring Function Apps to Use Azure App Configuration

Recently, we had a client that wanted to use an Azure Function app to listen to a Service Bus queue. Easy enough with ServiceBusTrigger, but I wanted to ensure that the queue name to listen on came from the Azure App Configuration service. This proved to be more challenging than expected.

What are we trying to do?

Here is what our function looks like:

public class ServiceBusQueueTrigger
{
    [FunctionName("ServiceBusQueueTrigger")]
    public void Run(
        [ServiceBusTrigger(queueName: "%QueueName%", Connection = "ServiceBusConnection")] string myQueueItem,
        ILogger log)
    {
        log.LogInformation($"C# ServiceBus queue trigger function processed message: {myQueueItem}");
    }
}

As you can see, we are using the %% syntax to indicate to the Function that it should pull the queue name from configuration. Our next step would be to connect to Azure App Configuration and get our configuration, including the queue name.

If you were to follow the Microsoft Learn tutorials, you would end up with something like this for the Startup.cs file:

[assembly: FunctionsStartup(typeof(FunctionApp.Startup))]
namespace FunctionApp
{
    class Startup : FunctionsStartup
    {
        public override void ConfigureAppConfiguration(IFunctionsConfigurationBuilder builder)
        {
            string cs = Environment.GetEnvironmentVariable("ConnectionString");
            builder.ConfigurationBuilder.AddAzureAppConfiguration(cs);
        }

        public override void Configure(IFunctionsHostBuilder builder)
        {
        }
    }
}
This came from: https://learn.microsoft.com/en-us/azure/azure-app-configuration/quickstart-azure-functions-csharp?tabs=in-process

If you use this code, the Function App will not start. The reason lies in how and when configuration is loaded: the values added in ConfigureAppConfiguration are available to the code your functions execute, but the Function Host does not see them when it resolves trigger binding expressions like %QueueName%. So if you are trying to bind trigger parameters to configuration values, you have to do something different.

What is the solution?

After much Googling I came across this: https://github.com/Azure/AppConfiguration/issues/203

This appears to be a known issue without an official solution, but the workaround above does work. If we use this implementation, we remove the error that prevents the Function Host from starting.

[assembly: FunctionsStartup(typeof(ConfigTest.Startup))]
namespace ConfigTest
{
    public class Startup : IWebJobsStartup
    {
        public IConfiguration Configuration { get; set; }

        public void Configure(IWebJobsBuilder builder)
        {
            var configurationBuilder = new ConfigurationBuilder();
            configurationBuilder.AddEnvironmentVariables();
            var config = configurationBuilder.Build();

            configurationBuilder.AddAzureAppConfiguration(options =>
            {
                options.Connect(config["AppConfigConnectionString"])
                    .ConfigureRefresh(refresh =>
                    {
                        refresh.Register("QueueName", refreshAll: true)
                            .SetCacheExpiration(TimeSpan.FromSeconds(5));
                    });
            });

            Configuration = configurationBuilder.Build();
            builder.Services.Replace(ServiceDescriptor.Singleton(typeof(IConfiguration), Configuration));
        }
    }
}

Now this is interesting. If you are not aware, Function Apps are, for better or worse, built on much of the same compute infrastructure as App Service. App Service has a feature called WebJobs which allows apps to perform work in the background, and much of that underlying code is reused by Azure Functions. FunctionsStartup, the recommended startup mechanism for Function Apps, abstracts much of this into a format more suitable for Function Apps.

Here we are actually dropping down to the older WebJobs startup hooks and replacing the configuration loaded by the Function Host as part of startup. This lets us build the Configuration as we want, ensuring the Function Host is aware of the values coming from App Configuration and supporting the binding to the trigger parameter.

As a side note, you will notice that I am building configuration twice. The first build is so I can bring in environment variables (values from the Function App’s Configuration blade in Azure), which contain the connection string for the App Configuration service.

The second build produces my IConfiguration instance, which I then use to replace the registered IConfiguration so the values from App Configuration are available.

Something to keep in mind

The %% syntax is a one-time bind. Even though the App Configuration SDK supports polling, if you change a value in the App Configuration service and the poller picks it up, the trigger bindings will not be affected – only the executing code will see the new value.

Now, I don’t think this is a huge issue, because most use cases do not call for real-time changes to that binding, and you would need the Function Host to rebind anyway. Typically a change like this is accompanied by a code change and a deployment, which forces a restart. If not, you can always trigger a restart of the Function App itself, which accomplishes the same goal.

Assigning Roles to Principals with Azure.ResourceManager.Authorization

I felt compelled to write this post for a few reasons, most centrally that, while I applaud the team for putting out a nice modern library, it has more than a few idiosyncrasies and the documentation is lacking. To that end, I want to talk through a recent experience on a client project.

In Azure there are many different kinds of users, each relating back to a principal: User, Group, Service Principal, Application, and perhaps others. Of these, all but Application can be assigned RBAC (Role Based Access Control) roles in Azure, the foundational way security is handled.

The Azure.ResourceManager (link) and its related subprojects are the newest release aimed at helping developers code against the various Azure APIs to enable code based execution of common operations – this all replaces the previous package Microsoft.Azure.Management (link) which has been deprecated.

A full tutorial on this would be helpful and while the team has put together some documentation, more is needed. For this post I would like to focus on one particular aspect.

Assigning a Role

I recently authored the following code aimed at assigning a Service Principal to an Azure RBAC role. Attempting this code frequently led to an error stating I was trying to change Tenant Id, Application Id, Principal Id, or Scope. Yet as you can see, none of those should be changing.

public Task AssignRoleToServicePrincipal(Guid objectId, string roleName, string scopePath)
{
    var tcs = new TaskCompletionSource();

    Task.Run(() =>
    {
        try
        {
            var roleAssignmentResourceId = RoleAssignmentResource.CreateResourceIdentifier(scopePath, roleName);
            var roleAssignmentResource = _armClient.GetRoleAssignmentResource(roleAssignmentResourceId);

            var operationContent = new RoleAssignmentCreateOrUpdateContent(roleAssignmentResource.Id, objectId)
            {
                PrincipalType = RoleManagementPrincipalType.ServicePrincipal
            };

            var operationOutcome = roleAssignmentResource.Update(Azure.WaitUntil.Completed, operationContent);

            tcs.TrySetResult();
        }
        catch (Exception ex)
        {
            tcs.TrySetException(ex);
        }
    });

    return tcs.Task;
}

I have some poor variable naming in here, but here is a description of the parameters to this method (a usage sketch follows the list):

  • objectId – the unique identifier for the object within Azure AD (Entra). It is with this Id that we assign roles and take actions involving the Service Principal
  • roleName – in Azure parlance, this is the name of the role, which is a Guid; it can also be thought of as the role definition Id. There is a separate property called RoleName which returns the human-readable name, e.g. Reader or Contributor.
  • scopePath – the path of assignment, that is, where in the Azure resource hierarchy we want to make the assignment. This could reference a Subscription, a Resource Group, or an individual Resource
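
A hypothetical usage sketch (the object id, role definition GUID, and scope are placeholders, and "roleAssigner" is whatever class hosts the method above):

await roleAssigner.AssignRoleToServicePrincipal(
    objectId: Guid.Parse("<object id of the service principal in Entra>"),
    roleName: "<role definition guid, e.g. the id of the built-in Reader role>",
    scopePath: "/subscriptions/<subscription id>/resourceGroups/<resource group name>");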

As you can see, none of the listed values are being mutated. While RoleAssignmentCreateOrUpdateContent does have a Scope property, it is read-only. The error was sporadic and annoying. Eventually I realized the issue; it is simple but requires a deeper understanding of how role assignments work in Azure.

The Key is the Id

Frankly, knowing what I know now, I am not sure how the above code ever worked. When you create a role assignment, that assignment, in and of itself, has to have a unique identifier – an entry that represents this role definition with this scoping. In the code above I am missing that; I am trying to use the Role Definition Id instead. After much analysis I finally realized this and modified the code as follows:

public Task AssignRoleToServicePrincipal(Guid objectId, string roleName, string scopePath)
{
    var tcs = new TaskCompletionSource();

    Task.Run(() =>
    {
        try
        {
            // Build the full resource id of the role definition being assigned
            var scopePathResource = new ResourceIdentifier(scopePath);
            var roleDefId = $"/subscriptions/{scopePathResource.SubscriptionId}/providers/Microsoft.Authorization/roleDefinitions/{roleName}";

            var operationContent = new RoleAssignmentCreateOrUpdateContent(new ResourceIdentifier(roleDefId), objectId)
            {
                PrincipalType = RoleManagementPrincipalType.ServicePrincipal
            };

            // The role assignment itself needs its own unique name (a new Guid) at the target scope
            var roleAssignmentResourceId = RoleAssignmentResource.CreateResourceIdentifier(scopePath, Guid.NewGuid().ToString());
            var roleAssignmentResource = _armClient.GetRoleAssignmentResource(roleAssignmentResourceId);

            var operationOutcome = roleAssignmentResource.Update(Azure.WaitUntil.Completed, operationContent);

            tcs.TrySetResult();
        }
        catch (Exception ex)
        {
            tcs.TrySetException(ex);
        }
    });

    return tcs.Task;
}

As you can see, this code expands things. Most notable is the first section, where I build the full Role Definition resource Id – the unique Id for a Role Definition, which can then be assigned.

Using this library, the content object indicates what I want to do – assign objectId the role definition provided. However, what I was missing was the second part: I had to tell it WHERE to make this assignment. It seems obvious now but, it was not at the time.

Since the role assignment’s name only has to be unique, the obvious solution is to use Guid.NewGuid().ToString(). When I call Update, the point of assignment comes from roleAssignmentResource.

And that was it, just like that, the error went away (a horribly misleading error mind you). Now the system works and I learned something about how this library works.

Hope it helps.

Mounting Key Vault Secrets into AKS with CSI Driver

Secret values in Kubernetes have always been a challenge. Simply put, the notion of putting sensitive values into a Secret protected by nothing more than Base64 encoding and, hopefully, RBAC has never seemed like a good idea. Thus the goal has always been to find a better way to bring secrets into AKS (and Kubernetes) from HSM-backed services like Azure Key Vault.

When we build applications in Azure which access services like Key Vault we do so using Managed Service Identities. These can either be generated for the service proper or assigned as a User Assigned Managed Identity. In either case, the identity represents a managed principal, one that Azure controls and is only usable from within Azure itself, creating an effective means of securing access to services.

With a typical service, this type of access is straightforward and sensible:

The service determines which managed identity it will use, contacts the Azure identity provider (an internal Azure service), and receives a token. It then uses this token to contact the target service. Upon receiving the request with the token, the API determines the identity (principal) and looks up the permissions assigned to that principal, using them to decide whether the action should be allowed. A minimal sketch of this pattern from application code follows.
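A rough illustration of that flow, assuming the Azure.Identity and Azure.Security.KeyVault.Secrets packages; the vault URL is a placeholder, and the managed identity is picked up and exchanged for a token behind the scenes:

using System;
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;

// DefaultAzureCredential resolves the managed identity when running in Azure
var client = new SecretClient(new Uri("https://<vault name>.vault.azure.net/"), new DefaultAzureCredential());
KeyVaultSecret secret = client.GetSecret("MySecretPassword");
Console.WriteLine(secret.Value);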

In this scenario, we can be certain that a request originating from Service A did in fact come from Service A. However, when we get into Kubernetes this is not as clear.

Kubernetes is composed of a variety of components that are used to run workloads.

Within AKS, an identity can exist at four different levels:

  • Cluster – the cluster itself can be given a Managed Identity in Azure
  • Node – the underlying VMs which comprise the data plane can be assigned a Managed Identity
  • Pod – the Pod can be granted an identity
  • Workload/Container – The container itself can be granted an identity

This distinction is very important because depending on your scenario you will need to decide what level of access makes the most sense. For most workloads, you will want the identity at the workload level to ensure minimal blast radius in the event of compromise.

What is the Container Storage Interface (CSI)?

Container Storage Interface (CSI) is a standard for exposing storage mounts from different providers into Container Orchestration platforms like Kubernetes. Using it we can take a service like Key Vault and mount it into a Pod and use the values securely.

More information on this is available here: https://kubernetes-csi.github.io/docs/

AKS has the ability to leverage CSI to mount Key Vault, given the right permissions, and access these values through the CSI mount.

Information on enabling CSI with AKS (new and existing) is here: https://learn.microsoft.com/en-us/azure/aks/csi-storage-drivers

For the demo portion, I will assume CSI is enabled. Let’s begin.

Create a Key Vault and add Secret

Create an accessible Key Vault and create a single secret called MySecretPassword. For assistance with doing this, see these instructions: https://learn.microsoft.com/en-us/azure/key-vault/general/quick-create-portal and https://learn.microsoft.com/en-us/azure/key-vault/secrets/quick-create-portal#add-a-secret-to-key-vault

Create a User Managed Identity and assign rights to Key Vault

Next we need to create a principal that will serve as the identity for our workload. This can be created in a variety of ways; for this demo, we will use a User Assigned Managed Identity. Follow these instructions to create one: https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azp#create-a-user-assigned-managed-identity

Once you have the identity, head back to the Key Vault and assign the Get and List permissions for Secrets to the identity. Shown here: https://learn.microsoft.com/en-us/azure/key-vault/general/assign-access-policy?tabs=azure-portal

That is it, now we shift our focus back to the cluster.

Enable OIDC for the AKS Cluster

OIDC (OpenID Connect) is a standard for creating federation between services. It lets an identity register with a service while the token exchange that occurs as part of the communication remains entirely transparent. By default AKS does NOT enable this feature; you must enable it via the Azure CLI (or PowerShell), as sketched below.

More information here: https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer
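
As a sketch (assuming the Azure CLI and an existing cluster; the resource group and cluster names are placeholders), enabling the issuer and then reading back its URL looks roughly like this:

az aks update --resource-group <resource group> --name <cluster name> --enable-oidc-issuer
az aks show --resource-group <resource group> --name <cluster name> --query "oidcIssuerProfile.issuerUrl" --output tsv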

Make sure to record the issuer URL that comes back, as you will need it later.

Create a Service Account

Returning to your cluster, we need to create a Service Account resource. For this demo, I will be creating the account relative to a specific namespace. Here is the YAML:

apiVersion: v1
kind: Namespace
metadata:
  name: blog-post
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kv-access-account
  namespace: blog-post

Make sure to record these values, you will need them later.

Federate the User Assigned Identity with the Cluster

Our next step is to create a federation between the User Assigned Identity we created and the OIDC issuer we enabled on our cluster. The following command works with User Assigned Identities – documentation for using an Azure AD application instead is linked below the command:

az identity federated-credential create \
  --name "kubernetes-federated-credential" \
  --identity-name $USER_ASSIGNED_IDENTITY_NAME \
  --resource-group $RESOURCE_GROUP \
  --issuer $AKS_OIDC_URL \
  --subject "system:serviceaccount:${SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_NAME}"

As a quick note, the $RESOURCE_GROUP value here refers to the resource group where the User Assigned Identity you created above is located. This creates a trusted relationship between AKS and the identity, allowing workloads (among others) to assume the identity and carry out operations against external services.

How to do the same using an Azure AD Application: https://azure.github.io/secrets-store-csi-driver-provider-azure/docs/configurations/identity-access-modes/workload-identity-mode/#using-azure-ad-application

Create the Secret Provider Class

One of the resource kinds added to Kubernetes when you enable the CSI driver is SecretProviderClass. We need this resource to map our secrets into the volume we are going to mount into the Pod. Here is an example; an explanation follows:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kv-password-provider
  namespace: blog-post
spec:
  provider: azure
  parameters:
    keyvaultName: kv-blogpost-jx01
    clientID: "client id of user assigned identity"
    tenantId: "tenant id"
    objects: |
      array:
        - |
          objectName: MySecretPassword
          objectType: secret
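
A quick explanation of the fields: keyvaultName is the name of the Key Vault created earlier, clientID is the client id of the User Assigned Identity we federated with the cluster, tenantId is the Azure AD (Entra) tenant id, and the objects block lists the Key Vault items to project into the mount – here just our MySecretPassword secret.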

Mount the Volume in the Pod to access the Secret Value

The next step is to mount this CSI volume into a Pod so we can access the secret. Here is a sample of what the YAML for such a Pod could look like; it is adapted from the example in the provider documentation: https://azure.github.io/secrets-store-csi-driver-provider-azure/docs/getting-started/usage/#deploy-your-kubernetes-resources

kind: Pod
apiVersion: v1
metadata:
  name: busybox-secrets-store-inline
  namespace: blog-post
spec:
  serviceAccountName: kv-access-account
  containers:
    - name: busybox
      image: registry.k8s.io/e2e-test-images/busybox:1.29-4
      command:
        - "/bin/sleep"
        - "10000"
      volumeMounts:
        - name: secrets-store-inline
          mountPath: "/mnt/secrets-store"
          readOnly: true
  volumes:
    - name: secrets-store-inline
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "azure-kv-password-provider"

This example uses a derivative of the busybox image that is provided via the example. The one change that I made was adding serviceAccountName. Recall that we created a Service Account above and defined it as part of the Federated Identity creation payload.

You do not actually have to do this. You can instead use default which is the default Service Account all pods run under within a namespace. However, I like to define the user more specifically to be 100% sure of what is running and what has access to what.

To verify things are working, create this Pod and run the following command:

kubectl exec --namespace blog-post busybox-secrets-store-inline -- cat /mnt/secrets-store/MySecretPassword

If everything is working, you will see your secret value printed out in plaintext. Congrats, the mounting is working.

Using Secrets

At this point, we could run our application in a Pod and read the secret value as if it were a file. While this works, Kubernetes offers a way that is, in my view, much better. We can create Environment variables for the Pod from secrets (among other things). To do this, we need to add an additional section to our SecretProviderClass that will automatically create a Secret resource whenever the CSI volume is mounted. Below is the updated SecretProviderClass:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kv-password-provider
  namespace: blog-post
spec:
  provider: azure
  secretObjects:
    - secretName: secret-blog-post
      type: Opaque
      data:
        - objectName: MySecretPassword
          key: Password
  parameters:
    keyvaultName: kv-blogpost-jx01
    clientID: be059d0e-ebc1-4b84-a71c-1f51fa21ac7b
    tenantId: <tenantId>
    objects: |
      array:
        - |
          objectName: MySecretPassword
          objectType: secret

Notice the new secretObjects section. When the CSI volume is mounted, it creates a Secret in the blog-post namespace called secret-blog-post with a key named Password in its data.

Now, if you apply this definition and then attempt to get secrets from the namespace, you will NOT see the Secret yet. Again, it is only created when the volume is mounted. Here is the updated Pod definition with the environment variable sourced from the Secret.

kind: Pod
apiVersion: v1
metadata:
  name: busybox-secrets-store-inline
  namespace: blog-post
spec:
  serviceAccountName: kv-access-account
  containers:
    - name: busybox
      image: registry.k8s.io/e2e-test-images/busybox:1.29-4
      command:
        - "/bin/sleep"
        - "10000"
      env:
        - name: PASSWORD
          valueFrom:
            secretKeyRef:
              name: secret-blog-post
              key: Password
      volumeMounts:
        - name: secrets-store-inline
          mountPath: "/mnt/secrets-store"
          readOnly: true
  volumes:
    - name: secrets-store-inline
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "azure-kv-password-provider"

After you apply this Pod spec, you can run a describe on the Pod. Assuming it is running successfully, you can then run a get secret command and you should see secret-blog-post. To fully verify the change, run the following command against this container:

kubectl exec --namespace blog-post busybox-secrets-store-inline -- env

This command will print the environment variables present in the container; among them should be PASSWORD, with a value matching the one in Key Vault. Congrats, you can now access this value from application code the same way you would access any other environment variable.

This concludes the demo.

Closing Remarks

Over the course of this post, we focused on how to bring sensitive values into Kubernetes (AKS specifically) using the CSI driver. We covered why workload identity really makes the most sense in terms of securing actions from within Kubernetes, since Pods can have many containers/workloads, nodes can have many disparate pods, and clusters can have applications running over many nodes.

One thing should be clear: security with Kubernetes is not easy. It matters little for a demonstration like this, but we can see a distinct problem with the exec strategy if we don’t have proper RBAC in place to prevent certain operations.

Nonetheless, I hope this post has given you some insight into a way to bring secure content into Kubernetes, and I hope you will try CSI in your future projects.

FluxCD for AKS Continuous Deployment (Private Repo)

Writing this as a matter of record, this process was much harder than it should have been so remembering the steps is crucial.

Register the Extensions

Note, the quickest way to do most of this step is to activate the GitOps blade after AKS has been created. This does not activate everything, however, as you still need to run:

az provider register --namespace Microsoft.KubernetesConfiguration

This command honestly took around an hour to complete, I think – I actually went to bed.
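
If you would rather poll than wait blind, you can check the registration state with something like this (same provider namespace as above):

az provider show --namespace Microsoft.KubernetesConfiguration --query registrationState --output tsv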

Install the Flux CLI

While AKS does offer an interface through which you can configure these operations, I have found it out of date and not a good option for getting the Private Repo case to work, at least not for me. Installation instructions are here: https://fluxcd.io/flux/installation/

On Mac I just ran: brew install fluxcd/tap/flux

You will need this command to create the necessary resources that support the flux process, keep in mind we will do everything from command line.

Install the Flux CRDs

Now, you would think that activating the Flux extension through AKS would install the CRDs, and it does. However, as of this writing (6/13/2023) the CRDs installed belong to the v1beta1 variant, while the Flux CLI outputs the v1 variant, so there is a mismatch. Run this command to install the current CRDs:

flux install --components-extra="image-reflector-controller,image-automation-controller"

Create a secret for the GitRepo

There are many ways to manage the secure connection into the private repository. For this example, I will be using a GitHub Personal Access Token.

Go to GitHub and create a Personal Access Token – reference: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens

For this example, I used classic, though there should not be a problem if you want use fine-grained. Once you have the token we need to create a secret.

Before you do anything, create a target namespace – I called mine fastapi-flux. You can use this command:

kubectl create ns fastapi-flux

Next, you need to run the following command to create the Secret:

flux create secret git <Name of the Secret> \
  --password=<Raw Personal Access Token> \
  --username=<GitHub Username> \
  --url=<GitHub Repo Url> \
  --namespace=fastapi-flux

Be sure to use your own namespace and fill in the rest of the values

Create the Repository

Flux operates by monitoring a repository for changes and then running YAML in a specific directory when a change occurs. We need to create a resource in Kubernetes to represent the repository it should listen to. Use this command:

flux create source git <Name of the Repo Resource> \
  --branch main \
  --secret-ref <Name of the Secret created previously> \
  --url <URL to the GitHub Repository> \
  --namespace fastapi-flux \
  --export > repository.yaml

This command will create the GitRepository resource in Kubernetes to represent our source. Notice we use --export to indicate we only want the YAML from this command, and we direct the output to the file repository.yaml. It can be run without --export, in which case it creates the resource directly without producing the YAML.

I tend to prefer the YAML so I can run it over and over and make modifications. Many tutorials online make reference to this as your flux infrastructure and will have a Flux process to apply changes to them automatically as well.

Here, I am doing it manually. Once you have the YAML file, you can use kubectl apply to create the resource.
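
For example, assuming the file name used above:

kubectl apply -f repository.yaml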

Create the Kustomization

Flux refers to its configuration for what to build when a change happens as a Kustomization. All this is, is a path in the repo to look in for YAML files to apply. As above, we can create this directly with the Flux CLI or use the same CLI to generate the YAML; I prefer the latter.

flux create kustomization <Name of Kustomization> \
  --source=GitRepository/<Repo name from last step> \
  --path="./<Path to monitor - omit for root>" \
  --prune=true \
  --interval=10m \
  --namespace fastapi-flux \
  --export > kustomization.yaml

Here is a complete reference to the command above: https://fluxcd.io/flux/components/kustomize/kustomization/

This will create a Kustomization resource that will immediately try to pull and create our resource.

Debugging

The simplest and most direct way to debug both resources (GitRepository and Kustomization) is to perform a get operation on them using kubectl. Each resource will list any relevant errors preventing it from working; the most common for me were authentication failures against GitHub.

If you see no errors, you can run a get all against the fastapi-flux namespace (or whatever namespace you used) to see if your items are present. Remember, in this example we placed everything in the fastapi-flux namespace – this may not be possible given your use case.

Use the reconcile command if you want to force a sync operation on a specific kustomization.
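
For example, using the namespace from this walkthrough (the Kustomization name is whatever you chose above; --with-source also refreshes the GitRepository first):

flux reconcile kustomization <Name of Kustomization> --namespace fastapi-flux --with-source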

Final Thoughts

Having used this now, I can see why ArgoCD (https://argoproj.github.io/cd/) has become so popular as a means for implementing GitOps. I found Flux hard to understand due to its less standard nomenclature and quirky design. Trying to do it through the interface provided by AKS did not help either, as I did not find the flexibility that I needed. Not saying it isn’t there, just that it is hard to access.

I would have to say if I was given the option, I would use ArgoCD over Flux every time.

Reviewing “Team Topologies”

Recently I finished “Team Topologies: Organizing Business and Technology Teams for Fast Flow” by Matthew Skelton and Manuel Pais – Amazon: https://a.co/d/1U8Gz56

I really enjoyed this book because it takes a different tack in talking about DevOps, one very often overlooked by organizations: team structure and team communication. A lot of organizations I have worked with misunderstand DevOps as simply being automation or the use of a product like GitHub Actions, CircleCI, Azure DevOps, etc. But DevOps is about so much more than this, and the book goes deep into that, exploring team topologies and emphasizing the need to organize communication.

In particular the book calls out four core team types:

  • Stream aligned – in the simplest sense these are feature teams, but really they are much more. If you read The Phoenix Project by Gene Kim you start to understand that IT and engineering are not really their own “thing” but rather a feature of a specific department, entity, or collaboration. Thus, a stream-aligned team is a set of individuals working together to handle changes for that part of the organization
  • Enabling – this one I was aware of, though I had never given it a formal name. This team is designed to assist stream-aligned teams in enabling something. A lot of orgs make the mistake of creating a DevOps team, which is a known anti-pattern. DevOps (or automation, as it usually is) is something you enable teams with, through things like self-service and self-management. The goal of the enabling team is to improve the autonomy of the teams it serves.
  • Platform – platform teams can be stream-aligned teams, but their purpose is less about directly handling changes for a part of the org than supporting other stream-aligned teams. Where enabling teams may introduce new capabilities, platform teams provide constructs that enable more streamlined operation. Examples range from a wiki documenting certain techniques to a custom solution for deploying infrastructure to the cloud.
  • Complicated subsystem – the authors view this as a specialized team aligned to a single, highly complex part of a system or the organization (it can even be a team managing regulatory compliance). They use the example of a trading platform, where individuals on the team manage a highly complex system performing market trades, where speed and accuracy must be perfect.

The essence of this grouping is to align teams to purpose and enable fast flow, what Gene Kim (in The DevOps Handbook) calls The First Way. Speed is crucial for an organization practicing DevOps, as speed to deploy also means speed to recover. To enable that speed, teams need focus (and smaller change sizes). Too often organizations get into sticky situations and respond with still more process. While the general thought is that it makes things better, it is really security theater (link) – in fact, I have observed this often leads to what I term TPS (Traumatic Process Syndrome), where processes become so bad that teams do everything they can to avoid the trauma of going through them.

Team Topologies goes even deeper than just talking about these four team types, going even into office layouts and referencing the Spotify idea of squads. But, in the end, as the author indicates, this is all really a snapshot in time and it is important to constantly be evaluating your topology and make the appropriate shifts as priorities or realities shift – nothing should remain static.

To further make this point, the book introduces the three core communication types:

  • Collaboration – a short-lived effort in which two teams perform discovery of new features, capabilities, and techniques in an effort to improve. The authors stress this MUST be short-lived, since collaborating inherently brings inefficiencies, blurs the boundaries of responsibility, and increases cognitive load for both teams.
  • X-as-a-Service – the natural evolution from collaboration, where one team provides functionality “as a service” to one or more teams. This is not necessarily a platform model but, instead, enforces the idea of separation of responsibilities. In contrast with collaboration, cognitive load is minimal here, as each team knows its responsibilities
  • Facilitating – where one team guides another. Similar, in my view, to collaboration, it is likewise short-lived and designed to enable new capabilities. This is the typical type of communication between a stream-aligned team and an enabling team.

One core point of this is to avoid anti-patterns like Architectural Review Boards or any other ivory-tower planning committee. Trying to do this sort of planning up front is, at best, asking for a continuous stream of proposals as architectures morph, and at worst a blocking process that delays projects and diminishes trust and autonomy.

It made me recall an interaction I had with a client many years ago. I had asked “how do you ensure quality in your software?” to which they replied “we require a senior developer approve all PRs”. I then asked “about how many meetings per day is that person involved in?” They conferred for a moment and came back with “8”. I then asked, “how much attention would you say he is actually giving the code?” It began to dawn on them. It came to light much later that that senior developer had not been active in the code in months and was just approving what he was asked to approve. It was the junior developers approving and validating their work with each other – further showing that developers will do what it takes to get things done, even in the face of a bad process.

And this brings me to the final point I want to discuss from this book: cognitive load. Having been in the industry for 20 years now, I have come to understand that we must constantly monitor how much cognitive load an action takes; people have limits. For example, even if it is straightforward, opening a class file with 1000 lines will immediately overload most people. Taking a more complex approach, or trying to be fancy when it is not needed, also adds cognitive load. And this makes it progressively harder for the team to operate efficiently.

In fact, Team Topologies talks about monitoring cognitive load as a way to determine when a system might need to be broken apart. And yes, that means giving time for the reduction of technical debt, even in the face of delaying features. If LinkedIn can do it (https://www.bloomberg.com/news/articles/2013-04-10/inside-operation-inversion-the-code-freeze-that-saved-linkedin#xj4y7vzkg) your organization can do it, and in doing so shift the culture to “team-first” and improve its overall health.

I highly recommend this book for all levels and roles, technologists will benefit as much as managers. Organizing teams is the key to actually getting value from DevOps. Anyone can write pipelines and automate things but, if such a shift is done without actually addressing organizational inefficiencies in operations and culture, you may do more harm than good.

Team Topologies on Amazon – https://a.co/d/1U8Gz56

Create a Private Function App

Deploying to the Cloud makes a lot of sense as the large number of services in Azure (and other providers) can help accelerate teams and decrease time to market. However, while many services are, with their defaults, a great option for hosting applications on the public Internet, it can be a bit of a mystery for scenarios where applications should be private. Here I wanted to walk through the steps of privatizing a Function App and opening it to the Internet via an Application Gateway.

Before we start, a word on Private Endpoint

This post heavily features Private Endpoint as a means of making private connections. Private Endpoints and the associated service, Private Link, enable the very highest levels of control over the flow of network traffic by restricting it to the attached Virtual Network.

This, however, comes at a cost as it will typically require the usage of Premium plans for services to support the feature. What is important to understand is that service-service (even cross region) communication in Azure is ALL handled on the Microsoft backbone, it never touches the public internet. Therefore, your traffic is, by default, traveling in a controlled and secure environment.

I say this because I have a lot of clients whose security teams set Private Endpoint as the default. For the vast majority of use cases this is overkill, as the default Microsoft networking is adequate for most data scenarios. The exceptions are the obvious ones: HIPAA, CJIS, IRS, and financial (most specifically PCI), and perhaps others. But, in my view, using it for general data transfer is overkill and leads to bloated cost.

Now, on with the show.

Create a Virtual Network

Assuming you already have a Resource Group (or set of Resource Groups) you will first want to deploy a Virtual Network with an address space, for this example I am taking the default of 10.0.0.0/16. Include the following subnets:

  • functionApp – CIDR: 10.0.0.0/24 – will host the private endpoint that is the Function App on the Virtual Network
  • privateEndpoints – CIDR: 10.0.1.0/24 – will host our private endpoints for related services, the Storage Account in this case
  • appGw – CIDR: 10.0.2.0/24 – will host the Application Gateway which enables access to the Function App for external users
  • functionAppOutbound – CIDR: 10.0.3.0/24 – This will be the integration point where the function app will send outbound requests

The region selected here is critical, as many network resources either cannot cross a regional boundary OR can only cross into their paired region. I am using East US 2 for my example.

Create the Storage Account

Function Apps rely on a storage account to support the runtime. So we will want to create one to support our function app. One thing to keep in mind, Private Endpoints are NOT supported on v1 of Storage Account, only v2. If you attempt to create the Storage Account through the Portal via the Function App process, it will create a v1 account and NOT support Private Endpoint.

When you create this Storage Account, be sure the network settings are wide-open; we will adjust it after the Function App is successfully setup.

Create the Function App

Now with the Function App we want to keep a few things in mind.

  • Use the same region that the Virtual Network is deployed into
  • You MUST use either a Premium or App Service plan type, Consumption does not support privatization
  • For hosting, select the storage account you created in the previous section.
  • For the time being do NOT disable public access – we will disable it later

For added benefit I recommend picking Windows for the Operating system as it will enable in-portal editing. This will let you quickly setup the Ping endpoint I am going to describe later. Note this post does NOT go into deploying – without public access additional configuration may be required to support automated deployments.

Allow the process to complete to create the Function App.

Enable VNet Integration

VNet integration is only available on Premium SKUs and above for both Function Apps and App Services. It enables a service to sit effectively on the boundary of the VNet and communicate with private IPs in the attached VNet as well as peered VNets.

For this step, access the Networking blade in your Function App and look for the VNet integration link on the right side of the screen.

Next, click “Add VNet” and select the Virtual Network and Subnet (functionAppOutbound) which receive the outbound traffic from the Function App.

Once complete, leave ROUTE ALL enabled. Note that for many production scenarios leaving this on can create issues, as I explain next. But for this simple example, having it enabled will be fine.

What is ROUTE ALL?

I like to view an App Service, or Function App, as having two sides: inbound and outbound. VNet integration allows the traffic coming out of the service to enter a Virtual Network. Two modes are supported: ROUTE ALL and default. With ROUTE ALL enabled, ALL outbound traffic enters the VNet, including traffic bound for external hosts (https://www.google.com for example), so YOU must add the appropriate controls to support that egress. With ROUTE ALL disabled, routing simply follows RFC 1918 (link): 10.x and the other private ranges go into the Virtual Network, while the rest follows Azure routing rules.

Microsoft documentation explains it more clearly: Integrate your app with a Virtual Network

Setup Private Connection for Storage Account

Function Apps utilize two sub-services within Storage Account for operation: blob and file. We need to create private endpoints for these two sub-services so that, using the VNet Integration we just enabled, the connection to the runtime is handled via private connection.

Access the Storage Account and select the Networking blade. Immediately select Disabled for Public network access. This will force the use of Private Endpoint as the sole means to access the storage account. Hit Save before continuing to the next step.

Select the Private endpoint connections tab from the top. Here is a screen shot of the two Private Endpoints I created to support my case:

We will create a Private Endpoint each for the file share and blob services, as these are used by the Function App to support the runtime. By doing this through the portal, other networking elements, such as the Private DNS Zone setup, are handled for us. Note, in an effort to stay on point, I won’t be discussing how Private Endpoint/Link routing actually works.

Click the + Private endpoint button and follow the steps for both file and blob subresource types. Pay special attention to the values the defaults select, if you have other networking in the subscription, it can select these components and cause communication issues.

Each private endpoint should link into the privateEndpoints subnet that was created with the Virtual Network.

Remember, it is imperative that the Private Endpoint MUST be deployed in the same region and same subscription as the Virtual Network to which it is being attached to.

More information on Private Endpoint and the reason for the Private DNS Zone here

Update the Function App

Your Function App needs to be updated to ensure it understands that it must get its content over a VNet. Specifically this involves updating Configuration values.

Details on what values should be updated: Configure your function app settings

The one to key on is the WEBSITE_CONTENTOVERVNET setting; ensure it is set to 1. Note the documentation also deploys a Service Bus; we are not doing that here, so you can skip the related fields.
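
If you prefer the CLI to the portal, the setting can be applied with something like this (the app and resource group names are placeholders):

az functionapp config appsettings set --name <function app name> --resource-group <resource group> --settings WEBSITE_CONTENTOVERVNET=1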

Be sure to check that each value matches expectations. I skipped over this the first time and ran into problems because of it.

Click Save to apply and before moving on.

Go into General Settings and disable HTTPS Only. We are doing this to avoid dealing with certificates in the soon-to-be-created Application Gateway. In a production setting you would not want this turned off.

Click Save again to apply the changes.

Next, create a new HttpTrigger Function called HttpPing. Use the source code below:

#r "Newtonsoft.Json"
using System.Net;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Primitives;
using Newtonsoft.Json;
public static IActionResult Run(HttpRequest req, ILogger log)
{
return new OkObjectResult("ping");
}

Again, I am assuming you used Windows for your Plan OS otherwise, you will need to figure out how to get custom code to this function app so you can validate functionality, beyond seeing the loading page.

Once you complete this, break out Postman or whatever and hit the endpoint to make sure it’s working.

Coincidentally, this will also validate that the Storage Account connection is working. Check for common errors like DLL not found or Runtime unreachable or the darn thing just not loading.

Create Private Endpoint for Function App

With the networking features in place to secure the outbound communication from the Function App, we need to lock down the incoming traffic. To do this we disable public access and use a Private Endpoint to get a routable private IP for the Function App.

Return to the Networking blade and this time, select Private Endpoints from the screen (shown below):

Using the Express option, create a private endpoint attached to the functionApp subnet in our Virtual Network – choose Yes for Integrate with private DNS zone (this will create the Private DNS zone and allow routing to work). Once complete, attempt to hit your Function App again, it should still work.

Now, we need to disable Public Access to the function app. Do this by returning to the Networking blade of the Function App, this time we will select Access restriction.

Uncheck the Allow public access checkbox at the top of the page, and click Save.

If you attempt to query the Function App now, you will be met with an error page indicating a 403 Forbidden.

Remember, most PaaS services, unless an App Service Environment is used, can never be made fully private. Users who attempt to access this Function App now will receive a 403, as the only route left to the service is through our Virtual Network. Let’s add an Application Gateway and finish the job.

Create an Application Gateway

Application Gateway is a popular routing control that operates at Layer 7, the HTTP layer. This means it can route based on path, protocol, hostname, verb – really any feature of the HTTP payload. In this case, we are going to assign the Application Gateway a public IP, call that public IP, and see our Function App respond.

Start by selecting Application Gateway from the list of available services:

On the first page set the following values:

  • Region should be the SAME as the Virtual Network
  • Disable auto-scaling (not recommended for Production scenarios)
  • Virtual Network should be the Virtual Network created previously
  • Select the appGw subnet (App Gateway MUST have a dedicated subnet)

On the second page:

  • Create a Public IP Address so as to make the Application Gateway addressable on the public internet, that is it will allow external clients to call the Function App.

On the third page:

  • Add a backend pool
  • Select App Service and pick your Function App from the list
  • Click Add

On the fourth page:

  • Add a Routing Rule
  • For Priority make it 100 (can really be whatever number you like)
  • Take the default for all fields, but make sure the Listener Type is Basic Site and the Frontend IP Protocol is HTTP (remember we disabled HTTPS Only on the Function App)
  • Select Backend Targets tab
  • For Backend Target select the pool you defined previously
  • Click Add new for Backend Settings field
  • Backend protocol should be HTTP, with port 80
  • Indicate you wish to Override with new host name. Then choose to Pick host name from backend target – since we will let the Function App decide the hostname
  • Click Add a couple times

Finish up and create the Application Gateway.

Let’s test it out

When we deployed the Application Gateway we attached it to a public IP. Get the address of that public IP and replace the hostname in your query – REMEMBER, we must use HTTP!
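
Assuming the default api route prefix and the function created above, the call looks something like this (the IP is a placeholder):

curl http://<application gateway public ip>/api/HttpPing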

If everything is setup properly you should get back a response. Congratulations, you have created a private Azure Function App routable only through your Virtual Network.

Options for SSL

To be clear, I would not advocate the use of plain HTTP for any scenario, even in development; I abstained from TLS only to make this walkthrough easier. Apart from creating an HTTPS listener in the Application Gateway, Azure API Management operating in External mode with the Developer or Premium SKU (only those support VNet integration) would be the easiest way of supporting TLS throughout this flow.

Perhaps another blog post in the future – just APIM takes an hour to deploy so, it is a wait 🙂

Closing Remarks

Private endpoint is designed as a way to secure the flow of network data between services in Azure, specifically it is for high security scenarios where data needs to meet certain regulatory requirement for isolation. Using Private Endpoint for this case, as I have shown, is a good way to approach security without taking on the expense and overhead of an App Service Environment which creates an isolated block within the data center for your networking.

That said, using them for all data in your environment is not recommended. Data, by default, goes over the Azure backbone and stays securely on Microsoft networks so long as the communication is between Azure resources. This is advised for most data scenarios and can free your organization from the cost and overhead of maintaining Private Endpoints and Premium SKUs for apps that make no sense to have such capability.

Understanding Terraform Data Sources in Modules

So over the last week I have been battling an issue in Terraform that truly drove me nuts and I think understanding can help someone else who is struggling with the same issue.

What is a data source?

In Terraform there are two principal elements used when building scripts: resources and data sources. A resource is something that will be created and controlled by the script. A data source is something Terraform expects to already exist. Keep this in mind, as it will be important in a moment.
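
To make the distinction concrete, here is a minimal illustration using the azurerm provider (names are placeholders): the resource block creates and manages a resource group, while the data block only reads one that must already exist when Terraform evaluates it.

resource "azurerm_resource_group" "managed" {
  name     = "rg-created-by-terraform"
  location = "East US 2"
}

data "azurerm_resource_group" "existing" {
  name = "rg-created-elsewhere"
}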

What is a module?

Modern Infrastructure as Code approaches focus on modules as a way to encapsulate logic and standards which can be reused. It is this approach which underpins the problem I found: a module that needs to look up existing resources to hydrate fields on the resources it encapsulates.

What is the problem?

The problem arises from a sequence of events like this. Consider the following root script:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.30.0"
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "this" {
  name     = "rg-terraform"
  location = "East US 2"
}

module "storage-account" {
  source              = "./storage-account"
  resource_group_name = azurerm_resource_group.this.name
}

module "image-container" {
  source                              = "./storage-account-container"
  storage_account_resource_group_name = azurerm_resource_group.this.name
  storage_account_name                = module.storage-account.storage_account_name
  container_name                      = "images"
}

Where the problem rears its head is in the source for the image-container

terraform {
}

variable "storage_account_resource_group_name" {
  type = string
}

variable "storage_account_name" {
  type = string
}

variable "container_name" {
  type = string
}

data "azurerm_storage_account" "sa" {
  name                = var.storage_account_name
  resource_group_name = var.storage_account_resource_group_name
}

resource "azurerm_storage_container" "container" {
  name                 = var.container_name
  storage_account_name = data.azurerm_storage_account.sa.name
}

The key here is the data source reference to the azurerm_storage_account. Here, DESPITE the reference to the storage-account module used in the root, Terraform will attempt to resolve the data source within image-container before ANYTHING else, which results in an error because the storage account does not yet exist.

Terraform will NOT wait for the storage-account module to complete before trying to resolve the data source within the container module.

What is the solution?

Frankly, I am not sure I would classify this as a bug so much as “by design”, but it is still annoying. The way to get around it is to not have ANY data source in your modules that references components created as part of your root scripts. So, for example, our container module would look like this:

terraform {
}

variable "storage_account_resource_group_name" {
  type = string
}

variable "storage_account_name" {
  type = string
}

variable "container_name" {
  type = string
}

resource "azurerm_storage_container" "container" {
  name                 = var.container_name
  storage_account_name = var.storage_account_name
}

Notice how I have removed the data source and simply used the variable for the storage account name directly? This is all you have to do. This is, I admit, a simple script, but it is something you will want to keep in mind. Where I ran into this problem was while defining custom domains and mTLS certificates for API Management.

So that is it, that is what I found. Maybe it is not new and it was something obvious, though I venture otherwise given the lack of any mention of this in the Terraform documentation. It might be something HashiCorp considers “works as designed”, but I still found it annoying. So if this helps you, let me know in the comments.

Cheers.

Introducing a Redis binding for Azure Functions

One of the things I have been looking at in recent weeks is Event Streaming, or basically democratizing data within a system so that it can be freely accessed by any number of microservices. It is a wonderful pattern, and Azure Functions are an ideal platform for implementation. However, over the course of this process I came to realize that while Azure Functions has a bevy of bindings available to it, one that is very clearly missing is Redis. So I set about building one.

I am pleased to say I have released version 1.1.0 on nuget.org – https://www.nuget.org/packages/Farrellsoft.Azure.Functions.Extensions.Redis/
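If you want to try it, adding the package to an existing Functions project is a one-liner with the dotnet CLI (adjust the version to whatever is latest on NuGet):

dotnet add package Farrellsoft.Azure.Functions.Extensions.Redis --version 1.1.0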

Under the hood there are two supporting libraries:

  • Newtonsoft Json.NET
  • StackExchange.Redis

Reading Values

public static async Task<IActionResult> Run(
    [Redis(key: "value", Connection = "RedisConnectionString")] string value)
{
}

Here, the binding is reading a string value from Redis (connected using the RedisConnectionString app setting value) with the key of value. The binding is, right now, limited to only reading data from either a Redis string or a Redis list. However, it can support reading C# classes – which are stored as JSON strings and deserialized using Newtonsoft’s Json.NET. For example:

public static async Task<IActionResult> Run(
    [Redis(key: "object", Connection = "RedisConnectionString")] Person person)
{
}

When we get into reading lists, the internal type can either be string or a normal C# class. For example:

public static async Task<IActionResult> Run(
    [Redis(key: "objects", Connection = "RedisConnectionString")] List<Person> people)
{
}

As when reading a single object out of the cache, the underlying value is stored as a JSON string and will be deserialized using Newtonsoft Json.NET.

Writing Values

Writing is the newest use case I have added to the binding. Like reading, it only supports strings and basic C# types, saved using either ICollector<T> or IAsyncCollector<T>; an out parameter is NOT currently supported, though I plan to add it in the future.

public async Task<IActionResult> Run(
    [Redis(key: "value", valueType: RedisValueType.Single, Connection = "RedisConnectionString")] ICollector<string> values)
{
    values.Add("testvalue");
}

When doing an output (write) case, you must specify the underlying type the value is stored as, right now either Single or Collection. In the above example, the use of Single will invoke JSON serialization logic and store the value given using StringSet. If the given key already has a value, the new value sent through the collector will overwrite the existing value.

When using Collection, the underlying code will use List functions against Redis, with two potential execution paths. For example, the following code will append JSON strings for the objects given:

public async Task<IActionResult> Run(
    [Redis(key: "values", valueType: RedisValueType.Collection, Connection = "RedisConnectionString")] ICollector<Person> values)
{
    values.Add(new Person { Name = "Name 1" });
    values.Add(new Person { Name = "Name 2" });
}

It is important to understand that the above code will keep adding values to the end of the Redis list. If you want to update values in a list, you need to have your C# object implement the IRedisListItem interface, which will force an Id property. Note this approach is NOT available for strings.

public class Person : IRedisListItem
{
    public string Name { get; set; }
    public string Id { get; set; }
}

The binding will key off this Id value when it receives a value for the specific Redis key. The one drawback in the current implementation is that the entire list has to be pulled, so if you are adding a lot of values for a C# class you will notice performance degradation. Here is an example of this approach being used:

public async Task<IActionResult> Run(
    [Redis(key: "values", valueType: RedisValueType.Collection, Connection = "RedisConnectionString")] ICollector<Person> values)
{
    values.Add(new Person { Id = "1", Name = "Name 1" });
    values.Add(new Person { Id = "1", Name = "Name 2" });
}

In the example above, because the Id value is found in the list, the final value for Name will be Name 2.

The source code for the binding is located at https://github.com/farrellsoft/azure-functions-redis-binding. I am open to feedback, and I look forward to people having a positive experience using it.

Creating Custom OPA Policies with Azure Policy

Governance is a very important component of a strong security and operational posture. Using Azure Policy, organizations can define rules around how the various resources in Azure may be used. These cover a wide range of granularity, from the general “resources may only be deployed to these regions” to “VMs must be of certain SKUs and certain resource tiers are prohibited” to, finally, “network security groups must contain a rule allowing access over port 22”. Policies are essential to giving operators and security professionals peace of mind that rules and standards are being enforced.

As organizations have moved to adopt Cloud Native technologies like Kubernetes, the governance question continues to come up as a point of concern. Kubernetes resources are, after all, outside the true scope of the Cloud Provider, Azure included. Thus, in many cases, teams rely on each other to avoid any pitfalls, in lieu of proper governance.

Open Policy Agent (https://openpolicyagent.org) is a Graduated CNCF (https://cncf.io) project aimed at providing an agnostic platform for enforcing policy rules across a variety of platforms, one of which is Kubernetes (a full intro on this topic is not covered here). Open Policy Agent hooks into the Kubernetes Admission Controller and works to prevent the creation of resources that violate defined rules. To aid in applying this at scale, Microsoft created the Azure Policy for Kubernetes (https://docs.microsoft.com/en-us/azure/governance/policy/concepts/policy-for-kubernetes) feature, which allows OPA policies (written in Rego) to be provisioned into an assigned AKS or Arc-connected cluster.

The feature is enabled via an add-on to the target AKS cluster. The Azure Policy authors have already written a large number of these policies that you can use for free with an Azure subscription. Support for custom OPA policies is, at this time, still in Preview, but it is stable enough that walking through it should prove beneficial.

Enable the Add-on

Support for OPA-type policies in AKS is done through an add-on, which must be enabled so the policies can be propagated to the cluster. The add-on can be enabled using the following command:

az aks enable-addons --addons azure-policy --resource-group $RG_NAME --name $CLUSTER_NAME

As with ANY add-on, you should run a list first to see if it is already installed. Be prepared for the enablement process to take a non-trivial amount of time.
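As a quick sketch of that check, using the same variables as the enable command (az aks addon list requires a reasonably recent Azure CLI; az aks show works either way):

# List add-ons and whether they are enabled on the cluster
az aks addon list --resource-group $RG_NAME --name $CLUSTER_NAME -o table

# Alternatively, inspect the addonProfiles block on the cluster resource
az aks show --resource-group $RG_NAME --name $CLUSTER_NAME --query addonProfiles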

Create your Rego Policy

Whether you use OPA in the traditional sense or with Azure Policy, you start by creating a ConstraintTemplate. This template is what enables the creation of the custom kind (the name of a specific resource type in Kubernetes) that will enable low level assignment of your policy. Below is a simple ConstraintTemplate which restricts the creation of a certain resource kind within a target namespace:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srestricttype
spec:
  crd:
    spec:
      names:
        kind: K8sRestrictType
      validation:
        openAPIV3Schema:
          properties:
            namespace:
              type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srestrictype
        violation[{ "msg": msg }] {
          object := input.review.object
          object.metadata.namespace == input.parameters.namespace
          msg := sprintf("%v is not allowed for creation in namespace %v", [input.review.kind.kind, input.parameters.namespace])
        }

This code is not something I would use in a production setting, but it gets the point across without being overly complicated. The policy registers a violation when creation of a certain type of resource is attempted within the namespace indicated by the input parameter namespace.

VERY IMPORTANT!! Note the use of the openAPIV3Schema block to indicate the supported parameters and their type. Including this is vital, otherwise the addon will not generate the relevant constraint with the parameter value provided.

Once you have created the ConstraintTemplate you should store it in a place which is accessible to Azure Policy. For ease of use, I recommend Azure Blob Storage with a public container.
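As a rough sketch, uploading the template with the Azure CLI might look like this; the storage account and container names are placeholders:

# Create a container that allows anonymous read access to its blobs
az storage container create --account-name <storage-account-name> --name templates --public-access blob

# Upload the ConstraintTemplate so Azure Policy can reference it by URL
az storage blob upload --account-name <storage-account-name> --container-name templates --name template.yaml --file template.yaml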

Create the Azure Policy

The Azure Policy team has a wonderful extension for VSCode that can aid in generating a strong starting point for a new Azure Policy: https://docs.microsoft.com/en-us/azure/governance/policy/how-to/extension-for-vscode. The below is the Policy JSON I used for the above policy:

{
  "properties": {
    "policyType": "Custom",
    "mode": "Microsoft.Kubernetes.Data",
    "policyRule": {
      "if": {
        "field": "type",
        "in": [
          "Microsoft.ContainerService/managedClusters"
        ]
      },
      "then": {
        "effect": "[parameters('effect')]",
        "details": {
          "templateInfo": {
            "sourceType": "PublicURL",
            "url": "https://stopapolicyjx01.blob.core.windows.net/templates/template.yaml"
          },
          "apiGroups": [
            ""
          ],
          "kinds": [
            "[parameters('kind')]"
          ],
          "namespaces": "[parameters('namespaces')]",
          "excludedNamespaces": "[parameters('excludedNamespaces')]",
          "labelSelector": "[parameters('labelSelector')]",
          "values": {
            "namespace": "[parameters('namespace')]"
          }
        }
      }
    },
    "parameters": {
      "kind": {
        "type": "String",
        "metadata": {
          "displayName": "The Kind of Resource the policy is restricting",
          "description": "This is a Restriction"
        },
        "allowedValues": [
          "Pod",
          "Deployment",
          "Service"
        ]
      },
      "namespace": {
        "type": "String",
        "metadata": {
          "displayName": "The namespace the restriction will be applied to",
          "description": ""
        }
      },
      "effect": {
        "type": "String",
        "metadata": {
          "displayName": "Effect",
          "description": "'audit' allows a non-compliant resource to be created or updated, but flags it as non-compliant. 'deny' blocks the non-compliant resource creation or update. 'disabled' turns off the policy."
        },
        "allowedValues": [
          "audit",
          "deny",
          "disabled"
        ],
        "defaultValue": "audit"
      },
      "excludedNamespaces": {
        "type": "Array",
        "metadata": {
          "displayName": "Namespace exclusions",
          "description": "List of Kubernetes namespaces to exclude from policy evaluation."
        },
        "defaultValue": [
          "kube-system",
          "gatekeeper-system",
          "azure-arc"
        ]
      },
      "namespaces": {
        "type": "Array",
        "metadata": {
          "displayName": "Namespace inclusions",
          "description": "List of Kubernetes namespaces to only include in policy evaluation. An empty list means the policy is applied to all resources in all namespaces."
        },
        "defaultValue": []
      },
      "labelSelector": {
        "type": "Object",
        "metadata": {
          "displayName": "Kubernetes label selector",
          "description": "Label query to select Kubernetes resources for policy evaluation. An empty label selector matches all Kubernetes resources."
        },
        "defaultValue": {},
        "schema": {
          "description": "A label selector is a label query over a set of resources. The result of matchLabels and matchExpressions are ANDed. An empty label selector matches all resources.",
          "type": "object",
          "properties": {
            "matchLabels": {
              "description": "matchLabels is a map of {key,value} pairs.",
              "type": "object",
              "additionalProperties": {
                "type": "string"
              },
              "minProperties": 1
            },
            "matchExpressions": {
              "description": "matchExpressions is a list of values, a key, and an operator.",
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "key": {
                    "description": "key is the label key that the selector applies to.",
                    "type": "string"
                  },
                  "operator": {
                    "description": "operator represents a key's relationship to a set of values.",
                    "type": "string",
                    "enum": [
                      "In",
                      "NotIn",
                      "Exists",
                      "DoesNotExist"
                    ]
                  },
                  "values": {
                    "description": "values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty.",
                    "type": "array",
                    "items": {
                      "type": "string"
                    }
                  }
                },
                "required": [
                  "key",
                  "operator"
                ],
                "additionalProperties": false
              },
              "minItems": 1
            }
          },
          "additionalProperties": false
        }
      }
    }
  }
}

Now, that is a rather long definition, so let’s hit on the key points, focusing on the then block:

  • OPA supports many standard parameters, and these are listed in the policy JSON under parameters. It is important to understand that OPA ALWAYS expects these, so we do not need to do anything extra. You may also omit them, and the system does not seem to care.
  • Custom parameters should be placed under the values key. Parameters specified here MUST have a corresponding openAPIV3Schema definition.
  • kind and apiGroups relate directly to concepts in Kubernetes and help declare for what resources and actions against those resources the rule applies.
  • Note the templateInfo key value under then. This indicates where Azure Policy can find the OPA constraint template – the one we created earlier. There are a few different sourceType values available. For this example, I am referencing a URL in the storage account where the template was uploaded.
  • Note the mode value (Microsoft.Kubernetes.Data). This value must be used so that the Azure Policy runners know this policy contains an OPA constraint template.

In the case of our example, we are piggybacking off whatever kind the policy applies to and then indicating a specific namespace to make this restriction. We could achieve the same with the namespaces standard parameter but, in this case, I am using a singular namespace property to demonstrate passing properties.

Once this is created, create the Policy definition in Azure Policy using either the Portal or the Azure CLI, instructions here: https://docs.microsoft.com/en-us/azure/governance/policy/tutorials/create-custom-policy-definition#create-a-resource-in-the-portal
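If you go the CLI route, a sketch of the command is below; it assumes the policyRule and parameters sections of the JSON above have been split into rules.json and params.json, and the definition name is just an example:

az policy definition create \
  --name "custom-k8s-restrict-type" \
  --display-name "[Custom] Restrict resource kinds in a namespace" \
  --mode Microsoft.Kubernetes.Data \
  --rules rules.json \
  --params params.json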

I also recommend giving the new definition a very clear name indicating it is custom. I tend to recommend this regardless of whether it is for Kubernetes or uses OPA; it just helps to find custom definitions more easily in the portal or when searching with the CLI.

Make the Assignment

As with any policy in Azure Policy, you must assign it to a specific scope for it to take effect. Policy assignments in Azure are hierarchical in nature; thus, making the assignment at a higher level affects the lower levels. In MCS, we typically recommend assigning policies to Management Groups for easier maintenance – but you may assign all the way down to Resource Groups.

More information on making assignments: https://learn.microsoft.com/en-us/azure/governance/policy/assign-policy-portal
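As a sketch, assigning at a resource group scope with the CLI might look like the following; the assignment name, scope, and parameter values are just examples matching the parameters defined in the policy JSON above:

az policy assignment create \
  --name "restrict-pods-in-images" \
  --policy "custom-k8s-restrict-type" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<rg-name>" \
  --params '{ "kind": { "value": "Pod" }, "namespace": { "value": "images" }, "effect": { "value": "deny" } }'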

Once the assignment is made you will have to wait for the AKS add-on to pick it up and create the new type and constraint definition on your behalf – in my experience this takes around 15 minutes. At the time of this writing, there is no way to speed it up – even using a trigger-scan command from the AZ CLI does not work.

One critical detail while doing the assignment: ensure that the value provided to the effect parameter is lower case and either audit (which will map to dryrun in OPA) or deny. In some of the built-in templates, Audit and Deny are offered as options; in my experience, the add-on gets confused by this.

Validate the assignment

After a period of time run a kubectl get <your CRD type name> and it will eventually return a generated resource of that type representing the policy assignment. Once you see the result there, you can attempt to create an invalid resource in your cluster to validate enforcement.
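As a rough example, using the kind created by the ConstraintTemplate earlier (your type name will differ if you changed it, and the namespace is whatever you assigned):

# The add-on eventually creates a constraint of our custom kind
kubectl get k8srestricttype

# Then try to create something the policy should reject
kubectl run test-pod --image=nginx --namespace <restricted-namespace>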

Something to keep in mind is that the constraints will NOT report on deny actions; only dryrun enforcements get reported on (at this time warn is not supported, nor do I see much sense in the team supporting it). This makes sense since, with deny in place, invalid resources cannot enter the system.

I also recommend starting with dryrun if your governance process is new; this will give teams time to make changes per the policies. Starting with deny can cause work disruptions and lessen the chances of success.

Debugging the Assignment

One thing I found helpful is to use -o yaml to see what the generated CRD and Constraint look like in the cluster. I used this when I was working with the Engineering team to determine why parameters were not being mapped.
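For example, something along these lines; the resource names are whatever the add-on generated in your cluster:

# Inspect the generated constraint, including the parameter values that were mapped in
kubectl get k8srestricttype -o yaml

# The ConstraintTemplate itself can be inspected the same way
kubectl get constrainttemplate k8srestricttype -o yaml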

I still recommend building your Rego using the Rego Playground (https://play.openpolicyagent.org/).

Closing

I am deeply intrigued by the use of OPA in Kubernetes to enforce governance, as I believe strongly in the promise of governance and its criticality in the DevSecOps movement. I also like what support in Azure Policy means for larger organizations looking to adopt OPA at a wider scale. Combine it with what Azure Arc brings to the table and suddenly any organization in any cloud has the ability to create a single pane of glass to monitor the governance status of their clusters, regardless of platform or provider.

Please let me know if you have any questions in the comments.