Understanding Terraform Data Sources in Modules

So, over the last week I have been battling an issue in Terraform that truly drove me nuts, and I think understanding it can help someone else who is struggling with the same thing.

What is a data source?

In Terraform there are two principal elements when building scripts: resources and data sources. A resource is something that is created and controlled by the script. A data source is something Terraform expects to already exist. Keep this in mind, as it will be important in a moment.
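To make the difference concrete, here is a minimal sketch using the same azurerm provider as the rest of this post. The names rg-example and rg-already-there are made up for illustration, and the second resource group is assumed to exist in Azure already; the provider configuration from the main.tf further down is assumed.

```hcl
# A resource: Terraform creates this and tracks it in its state.
resource "azurerm_resource_group" "example" {
  name     = "rg-example"
  location = "East US 2"
}

# A data source: Terraform only reads this. The (hypothetical) resource group
# "rg-already-there" must already exist in Azure, or the lookup fails.
data "azurerm_resource_group" "existing" {
  name = "rg-already-there"
}

# Attributes of the looked-up resource group can then be used elsewhere.
output "existing_location" {
  value = data.azurerm_resource_group.existing.location
}
```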

What is a module?

Modern Infrastructure as Code approaches focus on modules as a way to encapsulate logic and standards which can be reused. It is this approach that underpins the problem I found: a module that needs to look up existing resources to hydrate fields on the resources it encapsulates.

What is the problem?

The problem comes from a sequence of events like this one, starting with the root main.tf:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.30.0"
    }
  }
}

provider "azurerm" {
  features {}
}

# Resource group created by this root configuration
resource "azurerm_resource_group" "this" {
  name     = "rg-terraform"
  location = "East US 2"
}

# Creates the storage account inside the new resource group
module "storage-account" {
  source              = "./storage-account"
  resource_group_name = azurerm_resource_group.this.name
}

# Creates a container in that storage account
module "image-container" {
  source                              = "./storage-account-container"
  storage_account_resource_group_name = azurerm_resource_group.this.name
  storage_account_name                = module.storage-account.storage_account_name
  container_name                      = "images"
}
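The storage-account module itself is not shown here, so below is a hypothetical sketch of what ./storage-account/main.tf might contain for the root above to work. The default name, the location default, and the tier and replication settings are my assumptions, not the original module. The detail that matters is that the output points at the resource attribute.

```hcl
# ./storage-account/main.tf (hypothetical sketch, not the original module)
variable "resource_group_name" {
  type = string
}

# Assumed input: storage account names must be globally unique and
# 3-24 lowercase letters and numbers, so change this default.
variable "name" {
  type    = string
  default = "stterraformdemo001"
}

# Assumed input: the root module above does not pass a location.
variable "location" {
  type    = string
  default = "East US 2"
}

resource "azurerm_storage_account" "this" {
  name                     = var.name
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

output "storage_account_name" {
  # Referencing the resource (rather than var.name) means anything that uses
  # this output depends on the storage account having been created.
  value = azurerm_storage_account.this.name
}
```

If the output returned var.name instead, Terraform would see no link between the storage account and whatever consumes the output, and the two could be created in any order.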

Where the problem rears its head is in the source for the image-container module (./storage-account-container/main.tf):

terraform {
}

variable "storage_account_resource_group_name" {
  type = string
}

variable "storage_account_name" {
  type = string
}

variable "container_name" {
  type = string
}

# The problem: this lookup expects the storage account to already exist
data "azurerm_storage_account" "sa" {
  name                = var.storage_account_name
  resource_group_name = var.storage_account_resource_group_name
}

resource "azurerm_storage_container" "container" {
  name                 = var.container_name
  storage_account_name = data.azurerm_storage_account.sa.name
}

The key here is the data source for the azurerm_storage_account. DESPITE the root passing in a value from the storage-account module, Terraform attempts to resolve the data source inside image-container while building the plan, before ANYTHING has been created, and the lookup fails because the storage account does not exist yet.

In other words, Terraform will NOT wait for the storage-account module to complete before trying to resolve the data source within the container module.
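If you cannot drop the data source, Terraform does document a depends_on meta-argument that defers reading a data source until the changes to its dependencies have been applied, and since Terraform 0.13 depends_on can be set on a module block, where it applies to everything inside that module. Here is a sketch of that for the root main.tf above; treat it as something to test, as I have not verified it against this exact setup.

```hcl
module "image-container" {
  source                              = "./storage-account-container"
  storage_account_resource_group_name = azurerm_resource_group.this.name
  storage_account_name                = module.storage-account.storage_account_name
  container_name                      = "images"

  # Everything in this module, including the azurerm_storage_account data
  # source, now waits for the storage-account module's changes to be applied.
  depends_on = [module.storage-account]
}
```

The trade-off is that whenever the storage-account module has pending changes, the data source is read during apply instead of plan, so its values show up as "known after apply" in the plan.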

What is the solution?

Frankly, I am not sure I would classify this as a bug so much as “by design”, but it is still annoying. The way to get around it is to not have ANY data source in your modules that references components created as part of your root scripts. For example, our container module would look like this:

terraform {
}

variable "storage_account_resource_group_name" {
  type = string
}

variable "storage_account_name" {
  type = string
}

variable "container_name" {
  type = string
}

# No data source: use the storage account name passed in by the caller
resource "azurerm_storage_container" "container" {
  name                 = var.container_name
  storage_account_name = var.storage_account_name
}

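Notice how I have removed the data source and simply used the variable for the storage account name directly? This is all you have to do. Because that value comes from the storage-account module's output, Terraform still knows the container has to wait for the storage account, as long as that output references the storage account resource itself. This is, I admit, a simple script, but it is something you will want to be thinking about. Where I ran into this problem was defining custom domains and mTLS certificates for API Management.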
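The same idea carries downstream. Rather than having a later module look the container up with another data source, the container module can publish what it creates as outputs, and the root passes them along. A minimal sketch of outputs you could append to ./storage-account-container/main.tf; the output names are my own choice.

```hcl
# Hypothetical additions to ./storage-account-container/main.tf
output "container_name" {
  value = azurerm_storage_container.container.name
}

output "container_id" {
  value = azurerm_storage_container.container.id
}
```

In the root, module.image-container.container_name can then be fed into the next module's variables, which keeps the dependency chain visible to Terraform.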

So that is it, that is what I found. Maybe it is not new and it was something obvious, though I venture otherwise given the lack of any mention of it in the Terraform documentation. It might be something HashiCorp considers “works as designed”, but I still found it annoying. So if this helps you, let me know in the comments.

Cheers.
