Warning: This is a fairly lengthy one – if you are just here for the code: jfarrell-examples/private-endpoint-terraform (github.com) – Cheers.
In my previous post I talked about the need for security to be a top consideration when building apps for Azure, or any cloud for that matter. In that post, I offered an explanation of the private endpoint feature Azure services can use to communicate within Virtual Network resources (vnet).
While this is important, I decided to take my goals one step further by leveraging Terraform to create this outside of the portal. Infrastructure as Code (IaC) is a crucial concept for teams that wish to ensure consistency and predictability across environments. While more than one operating model exists for IaC, the core concepts are the same:
- Infrastructure configuration and definition should be a part of the codebase
- Changes to infrastructure should ALWAYS be represented in scripts which are run continuously to mitigate drift
- There should be a defined non-manual action which causes these scripts to be evaluated against reality
Terraform is a very popular tool to accomplish this, for a number of reasons:
- Its HashiCorp Configuration Language (HCL) tends to be more readable than the formats used by ARM (Azure Resource Manager) or AWS CloudFormation
- It supports multiple clouds, both in configuration and in practice. This means a single script could manage infrastructure across AWS, Azure, Google Cloud, and others.
- It is free
However, it is also important to note Terraform’s weakness in being a third party product. Neither Microsoft nor the other cloud vendors officially support this tool and, as such, its development tends to lag behind native tooling for specific platforms. This means certain bleeding edge features may not be available in Terraform. Granted, one can mitigate this in Terraform by importing a native script into the Terraform file.
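As a rough illustration of that last point, the azurerm provider has a resource for deploying a native ARM template from within Terraform. This is only a sketch: the deployment name and the template path are hypothetical, and it assumes the azurerm provider is already configured.

```hcl
# Hypothetical names throughout; assumes the azurerm provider is configured elsewhere.
variable "resource_group_name" {
  type = string
}

# Deploys a native ARM template for a feature the provider does not yet cover.
resource "azurerm_resource_group_template_deployment" "native_feature" {
  name                = "native-feature-deployment"
  resource_group_name = var.resource_group_name
  deployment_mode     = "Incremental"

  # Placeholder path to an ARM template kept alongside the Terraform code.
  template_content = file("${path.module}/arm/native-feature.json")
}
```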
All in all, Terraform is a worthwhile tool to have at one’s disposal given the use cases it can support. I have yet to observe a situation in which there was something a client was relying on that Terraform did not support.
How do Private Endpoints Work?
Understanding how Private Endpoints work in Azure is a crucial step to building them into our Infrastructure as Code solution. Here is a diagram:
In this example, I am using an Azure App Service (Standard S1 SKU), which allows me to integrate with a subnet within the vnet. Once my request leaves the App Service it arrives at a Private DNS Zone which is linked to the vnet. (The diagram shows the zone as part of the vnet; that is not strictly true, as it is a global resource, but for the purposes of this article we can think of it as part of the vnet.)
Within this DNS Zone we deploy an A Record with a specific name matching the resource we are targeting. This gets resolved to the private IP of a network interface (NIC) that effectively represents our service. For its part, the service is not actually in the vnet; rather, it is configured to only allow connections from the private endpoint. In effect, a tunnel is created to the service.
The result, as I said in the previous post, is that your traffic NEVER leaves the vnet. This is an improvement over the Service Endpoint offering, which only guarantees that traffic will never leave the Azure backbone. That is fine for most things, but Private Endpoints offer an added level of security for your workloads.
Having said all that, let’s walk through building this provisioning process in Terraform. For those who want to see code, this repo contains the Terraform script in its entirety. As this material is for educational purposes only, this code should not be considered production ready.
Below is the definition for App Service and the Swift connection which supports the integration.
Create a Virtual Network with 3 subnets
Our first step, as it usually is with any secure application, is to create a Virtual Network (vnet). In this case we will give it three subnets. I will take advantage of Terraform’s module concept to enable reuse of the definition logic. For the storage and support subnets we can use the first module shown below; for the apps subnet we can use the second, as its configuration is more complicated and I have not taken the time to unify the definitions.
Pay very close attention to the enforce properties. These are set in a very specific way to enable our use case. Do not worry though, IF you make a mistake the error messages reported back from ARM are pretty helpful to make corrections.
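To make the two subnet shapes concrete, here is a minimal sketch rather than the repo's actual modules. It assumes the azurerm 2.x provider (where the enforce properties carry these names; later provider versions renamed them), and the subnet names and address ranges are made up for illustration.

```hcl
# Sketch only: assumes azurerm 2.x; names and address ranges are illustrative.
variable "resource_group_name" { type = string }
variable "vnet_name" { type = string }

# Subnet that will host private endpoints (storage in this walkthrough).
resource "azurerm_subnet" "storage" {
  name                 = "storage-subnet"
  resource_group_name  = var.resource_group_name
  virtual_network_name = var.vnet_name
  address_prefixes     = ["10.0.1.0/24"]

  # Counterintuitive naming: true DISABLES network policies on the subnet,
  # which is what a private endpoint needs.
  enforce_private_link_endpoint_network_policies = true
}

# Subnet delegated to App Service so the app can integrate with the vnet.
resource "azurerm_subnet" "apps" {
  name                 = "apps-subnet"
  resource_group_name  = var.resource_group_name
  virtual_network_name = var.vnet_name
  address_prefixes     = ["10.0.2.0/24"]

  delegation {
    name = "appservice-delegation"

    service_delegation {
      name    = "Microsoft.Web/serverFarms"
      actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
    }
  }
}
```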
Here is an example of calling these modules:
One tip I will give you for building up infrastructure: while the Azure documentation is very helpful, I will often create the resource in the portal and choose the Export Template option. Generally, it’s pretty easy to map the ARM syntax to Terraform and glean the appropriate values. I know the above can seem a bit mysterious if you’ve never gone this deep.
Create the Storage Account
Up next we will want to create our storage account. This is due to the fact that our App Service will have a dependency on the storage account as it will hold the Storage Account Primary Connection string in its App Settings (this is not the most secure option, we will cover that another time).
I generally always advise the teams I work with to ensure a Storage Account is set to completely Deny public traffic – there are just too many reports of security breaches which start with a malicious user finding sensitive data on a publicly accessible storage container. Lock it down from the start.
One piece of advice, however, make sure you add an IP Rule so that your local machine can still communicate with the storage account as you update it – it does support CIDR notation. Additionally, the Terraform documentation notes a property virtual_network_subnet_ids in the network_rules block – you do NOT need this for what we are doing.
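A minimal sketch of a locked-down storage account along these lines follows. The account name and the allowed_ip variable are hypothetical; the network_rules block with default_action and ip_rules is the part that matters.

```hcl
# Sketch only: the account name and variable names are illustrative.
variable "resource_group_name" { type = string }
variable "location" { type = string }

variable "allowed_ip" {
  type        = string
  description = "Public IP or CIDR range of the machine running Terraform"
}

resource "azurerm_storage_account" "main" {
  name                     = "mystorageaccount" # must be globally unique
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  network_rules {
    # Deny everything by default; only the listed IPs get through publicly.
    default_action = "Deny"
    ip_rules       = [var.allowed_ip]
  }
}
```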
Now that this is created we can create the App Service.
Create the App Service
Our App Service needs to be integrated with our vnet (reference the diagram above) so as to allow communication with the Private DNS Zone we will create next. This is accomplished via a swift connection. Below is the definition used to create an Azure App Service which is integrated with a specific Virtual Network.
Critical here is the inclusion of two app settings shown in the Terraform:
- WEBSITE_DNS_SERVER set to 168.63.129.16
- WEBSITE_VNET_ROUTE_ALL set to 1
This information is rather buried in the documentation and it took me some effort to find it. Each setting has a distinct purpose. WEBSITE_DNS_SERVER indicates where outgoing requests should look for name resolution. You MUST have this value to target the Private DNS Zone linked to the vnet. The WEBSITE_VNET_ROUTE_ALL setting tells the App Service to send ALL outbound calls to the vnet (this may not be practical depending on your use case).
For those eagle eyed readers, you can see settings for an Event Grid here. In fact, the code shows how to integrate Private Endpoints with Azure Event Grid; the technique is similar. We won’t cover it as part of this post, but it’s worth understanding.
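Putting the pieces of this section together, a minimal sketch of the App Service, its plan, and the swift connection might look like the following. It assumes the azurerm 2.x resource names (azurerm_app_service and azurerm_app_service_plan were superseded in later provider versions), and the resource names, variables, and the StorageConnectionString setting name are hypothetical. The DNS value 168.63.129.16 is Azure's internal DNS address.

```hcl
# Sketch only: assumes azurerm 2.x; names and variables are illustrative.
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "apps_subnet_id" { type = string }
variable "storage_connection_string" { type = string }

resource "azurerm_app_service_plan" "main" {
  name                = "private-endpoint-demo-plan"
  location            = var.location
  resource_group_name = var.resource_group_name

  sku {
    tier = "Standard"
    size = "S1"
  }
}

resource "azurerm_app_service" "main" {
  name                = "private-endpoint-demo-app"
  location            = var.location
  resource_group_name = var.resource_group_name
  app_service_plan_id = azurerm_app_service_plan.main.id

  app_settings = {
    # Resolve names through Azure DNS so the linked Private DNS Zone is consulted.
    WEBSITE_DNS_SERVER = "168.63.129.16"
    # Route all outbound traffic from the app into the vnet.
    WEBSITE_VNET_ROUTE_ALL = "1"
    # Hypothetical setting name for the storage connection string.
    StorageConnectionString = var.storage_connection_string
  }
}

# The swift connection is what integrates the app with the delegated subnet.
resource "azurerm_app_service_virtual_network_swift_connection" "main" {
  app_service_id = azurerm_app_service.main.id
  subnet_id      = var.apps_subnet_id
}
```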
Create the Private DNS Rule
Ok, this is where things start to get tricky, mainly due to certain rules you MUST follow to ensure the connection is made successfully. What is effectively going to happen is that our DNS Zone name is PART of the target hostname we need to match. The match will then resolve to the private IP of our NIC (part of the private endpoint connection).
Here is the definition for the storage DNS Zone. The name of the zone is crucial, as such I have included how the module is called as well.
Ok, there is quite a bit to unpack here, so let’s start with the name. The name here is mandatory. If your Private Endpoint will target the Blob service of a Storage Account, the name of the DNS Zone MUST be privatelink.blob.core.windows.net. Eagle eyed readers will recognize this as the standard Blob service endpoint suffix with privatelink prepended.
This rule holds true with ANY other service that integrates with Private Endpoint. The full list can be found here: Azure Private Endpoint DNS configuration | Microsoft Docs
A second thing to note in the call is the structure of the value passed to the vnet_id parameter. For reasons unknown, Terraform did NOT resolve this based on context, so I ended up having to build it myself. You can see the usage of the data "azurerm_subscription" block in the source code. All it does is give me a reference to the current subscription so I can get its ID for the resource ID string.
Finally, notice that, following the creation of the Private DNS Zone, we are linking our vnet to it via the azurerm_private_dns_zone_virtual_network_link resource. Effectively, this informs the vnet that it can use this DNS Zone when routing calls coming into the network. This goes back to the flags we set on the Azure App Service.
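The zone, the hand-built vnet ID, and the link described above can be sketched roughly as follows. This is not the repo's module; the variable names and the link name are hypothetical. The sketch relies on the id attribute of the azurerm_subscription data source, which already has the /subscriptions/... form.

```hcl
# Sketch only: variable and resource names are illustrative.
variable "resource_group_name" { type = string }
variable "vnet_name" { type = string }

# Reference to the subscription Terraform is currently authenticated against.
data "azurerm_subscription" "current" {}

locals {
  # Build the vnet resource ID by hand; data.azurerm_subscription.current.id
  # is of the form /subscriptions/<guid>.
  vnet_id = "${data.azurerm_subscription.current.id}/resourceGroups/${var.resource_group_name}/providers/Microsoft.Network/virtualNetworks/${var.vnet_name}"
}

# The zone name is dictated by the target service (Blob storage here).
resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = var.resource_group_name
}

# Link the zone to the vnet so lookups from the vnet consult it.
resource "azurerm_private_dns_zone_virtual_network_link" "blob" {
  name                  = "blob-zone-link"
  resource_group_name   = var.resource_group_name
  private_dns_zone_name = azurerm_private_dns_zone.blob.name
  virtual_network_id    = local.vnet_id
}
```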
Now we can create the Private Endpoint resource proper.
Create the Private Endpoint
I initially thought that you had to create one Private Endpoint per need; however, later reading suggests that might not be the case. I have not had time to test this so, for this section, I will assume it is one per need.
When you create a private endpoint the resource will get added to your resource group. However, it will also prompt the creation of a Network Interface resource. As I have stated, this interface is effectively your tunnel to the resource connected through the Private Endpoint. This interface will get assigned an IP consistent with the CIDR range of the subnet specified to the private endpoint. We will need this to finish configuring routing within the DNS Zone.
Here is the creation block for the Private Endpoint:
I am theorizing you can specify multiple private_service_connection blocks, thus allowing the private endpoint resource to be shared. However, I feel this might make resolution of the private IP harder. More research is needed.
The private_service_connection block is critical here as it specifies which resource we are targeting (private_connection_resource_id) and what service(s) (group(s)) within that resource we specifically want access to. In this example we are targeting our Storage Account and want access to the blob service. Here is the call from the main file:
The key here is the output variable private_ip which we will use to configure the A record next. Without this value, requests from our App Service being routed through the DNS Zone will not be able to determine a destination.
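A minimal sketch of the endpoint and the private_ip output might look like the following. The resource and variable names are hypothetical; the private IP is read from the private_service_connection block's exported private_ip_address attribute.

```hcl
# Sketch only: names and variables are illustrative.
variable "resource_group_name" { type = string }
variable "location" { type = string }
variable "subnet_id" { type = string }
variable "storage_account_id" { type = string }

resource "azurerm_private_endpoint" "storage" {
  name                = "storage-private-endpoint"
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id

  private_service_connection {
    name                           = "storage-blob-connection"
    private_connection_resource_id = var.storage_account_id
    # Which service within the storage account we want to reach.
    subresource_names    = ["blob"]
    is_manual_connection = false
  }
}

# Private IP assigned to the endpoint's network interface; feeds the A record.
output "private_ip" {
  value = azurerm_private_endpoint.storage.private_service_connection[0].private_ip_address
}
```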
Create the A Record
The final bit here is the creation of an A Record in the DNS Zone to give a destination IP for incoming requests. Here is the creation block (first part) and how it is called from the main Terraform file (second part).
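As a rough sketch (the variable names and the TTL are my own choices, not taken from the repo), the record could be declared like this, with the private IP coming from the endpoint's output:

```hcl
# Sketch only: variable names and TTL are illustrative.
variable "resource_group_name" { type = string }
variable "storage_account_name" { type = string }
variable "private_ip" { type = string }

resource "azurerm_private_dns_a_record" "storage" {
  # Must be the unique portion of the service hostname.
  name                = var.storage_account_name
  zone_name           = "privatelink.blob.core.windows.net"
  resource_group_name = var.resource_group_name
  ttl                 = 300
  records             = [var.private_ip]
}
```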
It is that simple. The A Record is added to the DNS Zone and it’s done. But LOOK OUT: we are back to the naming aspect again. The name here MUST be the name of your service, or at least the unique portion of the URL when referencing the service. I will explain below.
This is less obvious with the storage account than it is with Event Grid or other services. Consider what your typical storage account endpoint looks like: mystorageaccount.blob.core.windows.net
Now here is the name of the attached Private DNS Zone: privatelink.blob.core.windows.net
Pretty similar, right? Now look at the name of the A Record: it will be the name of your storage account. Effectively what happens here is that the calling URL is mystorageaccount.privatelink.blob.core.windows.net. And yet the code we deploy can still call mystorageaccount.blob.core.windows.net and work fine. Why? The answer is here: Use private endpoints – Azure Storage | Microsoft Docs
Effectively, the typical endpoint gets translated to the private one above, which then gets matched by the Private DNS Zone. The way I further understand it, if you were calling this from a peered Virtual Network (on-premises or in Azure) you would NEED to use the privatelink endpoint.
Where this got hairy for me was with Event Grid because of the values returned relative to the values I needed. Consider the following Event Grid definition:
The value of the output variable eventgrid_topic_name is simply the name of the Event Grid instance, as expected. However, if you inspect the value of the endpoint you will see that it incorporates the region into the URL. For example, a topic named someeventgrid in East US has an endpoint host along the lines of someeventgrid.eastus-1.eventgrid.azure.net.
Given that the REQUIRED name of a DNS Zone for the Event Grid Private Endpoint is privatelink.eventgrid.azure.net, using the topic name alone for the A Record would match someeventgrid.privatelink.eventgrid.azure.net, which won’t work. I need the name of the A Record to be someeventgrid.eastus-1, but this value was not readily available. Here is how I got it:
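One way to derive that name, sketched here and not necessarily what the repo does, is to strip the scheme and the fixed suffix from the topic endpoint with Terraform's built-in string functions. It assumes the endpoint has the form https://<name>.<region>-1.eventgrid.azure.net/api/events, and the variable name is hypothetical.

```hcl
# Sketch only: assumes the endpoint looks like
# https://someeventgrid.eastus-1.eventgrid.azure.net/api/events
variable "eventgrid_endpoint" {
  type        = string
  description = "Endpoint of the Event Grid topic, e.g. the endpoint attribute of azurerm_eventgrid_topic"
}

locals {
  # Drop the scheme: someeventgrid.eastus-1.eventgrid.azure.net/api/events
  eventgrid_host_and_path = trimprefix(var.eventgrid_endpoint, "https://")

  # Drop the fixed suffix, leaving the unique portion: someeventgrid.eastus-1
  eventgrid_a_record_name = trimsuffix(local.eventgrid_host_and_path, ".eventgrid.azure.net/api/events")
}

output "eventgrid_a_record_name" {
  value = local.eventgrid_a_record_name
}
```

If the endpoint format ever differs from the assumed shape, the two trim calls simply leave the unmatched parts in place, so it is worth checking the output before feeding it to the A record.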
It is a bit messy but, the implementation here is not important. I hope this shows how the construction of the Private Link address through the DNS zone is what allows this to work and emphasizes how important the naming of the DNS Zone and A Record are.
I hope this article has shown the power of Private Endpoint and what it can do for the security of your application. Security is often overlooked, especially with the cloud. This is unfortunate. As more and more organizations move their workloads to the cloud, they will have an expectation for security. Developers must embrace these understandings and work to ensure what we create in the Cloud (or anywhere) is secure by default.