I always believe in reflecting on one’s design and trying to get a better understanding of the problem. In this sample, there are a few areas that are rather obvious candidates for improvement.
Do you really need that queue?
In Part 1, I discussed a Timer job I had created which fed data into a Service Bus queue. This data is subsequently dequeued in the next job and written to Event Hub (one name at a time) and Blob storage (where the raw JSON is stored). So I wondered whether I actually needed that queue.
The answer is yes, and here is why: bindings are not resolved when the method returns or when data is written to the out parameter; they are resolved at the very start of the invocation. Since our trigger here is a Timer, there is no incoming data to tell the runtime what the id value of the outgoing blob should be.
So the queue really only serves to create an input with the id value embedded in it so the binding can resolve, which is why the blob return binding works for the Dequeue method but cannot work for the Timer trigger.
What I can do is change the code so it looks like this:
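The original listing is not reproduced here, so the following is a minimal sketch of the reshaped pair of functions. The function, queue, hub, and connection names, the `QueueItem` type, and the `GetNamesAsync`/`ParseNames` helpers are all assumptions for illustration, not taken from the series:

```csharp
// Sketch only: moves the Event Hub writes into the Timer function, leaving
// the queue-triggered function with nothing to do but write the blob.
public static class NameFunctions
{
    [FunctionName("TimerStarter")]
    public static async Task TimerStarter(
        [TimerTrigger("0 */5 * * * *")] TimerInfo timer,
        [ServiceBus("names-queue", Connection = "ServiceBusConnection")] IAsyncCollector<string> queue,
        [EventHub("names-hub", Connection = "EventHubConnection")] IAsyncCollector<string> eventHub)
    {
        var payload = await GetNamesAsync();        // hypothetical helper fetching the raw JSON
        var id = Guid.NewGuid().ToString();

        // Event Hub writes now happen here, in the Timer function...
        foreach (var name in ParseNames(payload))   // hypothetical helper
            await eventHub.AddAsync(name);

        // ...and the queue message carries the id that the blob binding needs.
        await queue.AddAsync(JsonConvert.SerializeObject(new { id, payload }));
    }

    [FunctionName("Dequeue")]
    [return: Blob("raw-json/{id}.json", Connection = "StorageConnection")]
    public static string Dequeue(
        [ServiceBusTrigger("names-queue", Connection = "ServiceBusConnection")] QueueItem item)
    {
        // The only job left: persist the raw JSON to Blob storage. The {id}
        // in the blob path binds to the id property of the queue message.
        return item.Payload;
    }
}
```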
As you can see, we moved the Event Hub code to the Timer function, so the purpose of the Dequeue method is literally just to write the blob. I honestly don't like this even though it makes more sense. I just dislike having an Azure Function so simplistic that I feel its functionality should live somewhere else.
What about storing Time Series data?
Initially I thought Redis might make a lot of sense since I can easily expire old aggregate data – keep in mind, the historical data for aggregates is often not important, as it can be derived from raw historical data sources. Further, storing excess data for high-volume systems adds to cost. Work with your teams and leaders to determine how much historical aggregate data makes the most sense.
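The expiry idea can be sketched with StackExchange.Redis, where a TTL on each aggregate key lets old aggregates age out automatically. The key name and the 24-hour retention window here are assumptions, not figures from the series:

```csharp
// Sketch: store a rolling aggregate in Redis and let it expire on its own.
using StackExchange.Redis;

var redis = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
var db = redis.GetDatabase();

// Keep the aggregate count for a name for 24 hours; Redis deletes it afterward.
await db.StringSetAsync("agg:name:linda", 42, expiry: TimeSpan.FromHours(24));
```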
Azure does offer a platform known as Time Series Insights, which is designed for MASSIVE scale, usually the output of an IoT-style platform gathering telemetry data from all over the world. Such a thing would dwarf what we are after here.
Storing data for these systems is always a challenge. In this situation the best solution is to write a Timer Azure function that deletes data from Cosmos to ensure ONLY the data for the time period that makes the most sense is kept. Again, this data can be derived again from historical data sources.
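A cleanup job along those lines might look like the sketch below. The database, container, schedule, and seven-day retention window are assumptions for illustration; note also that Cosmos DB containers support a built-in TTL setting, which can be a simpler way to achieve the same result:

```csharp
// Sketch: a Timer-triggered function that trims documents older than the
// retention window from Cosmos, using the server-maintained _ts timestamp.
[FunctionName("TrimAggregates")]
public static async Task Run(
    [TimerTrigger("0 0 * * * *")] TimerInfo timer,
    [CosmosDB(Connection = "CosmosConnection")] CosmosClient client)
{
    var container = client.GetContainer("telemetry", "aggregates");
    var cutoff = DateTimeOffset.UtcNow.AddDays(-7).ToUnixTimeSeconds();

    var query = new QueryDefinition(
            "SELECT c.id, c.partitionKey FROM c WHERE c._ts < @cutoff")
        .WithParameter("@cutoff", cutoff);

    using var iterator = container.GetItemQueryIterator<dynamic>(query);
    while (iterator.HasMoreResults)
    {
        // Delete each expired document individually (assumes the partition
        // key value is stored on the document as "partitionKey").
        foreach (var doc in await iterator.ReadNextAsync())
            await container.DeleteItemAsync<dynamic>(
                (string)doc.id, new PartitionKey((string)doc.partitionKey));
    }
}
```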
I was amazed at how easy this process was. If you focus on Azure Functions being the glue in a distributed process, it is amazing how much functionality you can achieve. I think I wrote around 40 lines of code in the backend, and I got so much.
When you start to work on real-time or streaming data platforms, it really is important to have conversations, discuss your data needs, and try new technologies. The simple truth is, you could pay for the top-tier Azure SQL database and likely handle the volume, but your costs would be enormous. Understanding other options and patterns can help you select a strategy that not only works but is cost effective.
I hope you have enjoyed this series. Good luck with Azure and everything else.