This is it, the final step: saving our query results from the Analytics application to DynamoDB, where they can be easily queried. Once again we will use Lambda as a highly scalable, easy-to-deploy mechanism to facilitate the transfer.
Creating your Lambda function
In Part 1, I talked about installing the AWS Lambda templates for Visual Studio Code. If you list your installed templates using:
dotnet new -l
You will get a listing of all installed templates. A quick scan shows two provided Lambda templates that target Kinesis events; however, neither of these will work here. I initially thought I had things working with these templates, but the structure of the event sent from Analytics is very different: the Records property is the only one that maps, and it comes through as empty objects.
I would recommend starting with the serverless.EmptyServerless template instead. What you actually need is the KinesisAnalyticsOutputDeliveryEvent type, which is available in the Amazon.Lambda.KinesisAnalyticsEvents NuGet package (link). Once you have this installed, ensure the first parameter of your handler is of type KinesisAnalyticsOutputDeliveryEvent and Amazon will handle the rest. For reference, here is my source code which writes to my DynamoDB table:
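My embedded source is not reproduced inline, so here is a minimal sketch of what such a handler can look like. The table name (StockPrices), the item attribute names, and the JSON field names coming off the output stream are assumptions for illustration; the KinesisAnalyticsOutputDeliveryEvent/Response member names are as I recall them from the package, so verify them against your installed version:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;
using Amazon.Lambda.Core;
using Amazon.Lambda.KinesisAnalyticsEvents;
using Newtonsoft.Json.Linq;

[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]

namespace StockTransfer
{
    public class Function
    {
        private static readonly IAmazonDynamoDB Dynamo = new AmazonDynamoDBClient();

        // The first parameter must be KinesisAnalyticsOutputDeliveryEvent so the
        // incoming Analytics payload deserializes correctly
        public async Task<KinesisAnalyticsOutputDeliveryResponse> FunctionHandler(
            KinesisAnalyticsOutputDeliveryEvent evnt, ILambdaContext context)
        {
            var response = new KinesisAnalyticsOutputDeliveryResponse
            {
                Records = new List<KinesisAnalyticsOutputDeliveryResponse.Record>()
            };

            foreach (var record in evnt.Records)
            {
                // Each record's data arrives Base64 encoded; decode it back into
                // the JSON row produced by the Analytics OUTPUT_STREAM
                var json = Encoding.UTF8.GetString(Convert.FromBase64String(record.Base64EncodedData));
                var row = JObject.Parse(json);

                // Symbol + minute timestamp form the composite (partition + sort) key;
                // these names are hypothetical
                var item = new Dictionary<string, AttributeValue>
                {
                    ["Symbol"] = new AttributeValue { S = (string)row["symbol"] },
                    ["MinuteTimestamp"] = new AttributeValue { S = (string)row["minute_timestamp"] },
                    ["AveragePrice"] = new AttributeValue { N = (string)row["avg_price"] }
                };
                await Dynamo.PutItemAsync("StockPrices", item);

                // Report the record as delivered so Analytics does not retry it
                response.Records.Add(new KinesisAnalyticsOutputDeliveryResponse.Record
                {
                    RecordId = record.RecordId,
                    Result = KinesisAnalyticsOutputDeliveryResponse.OK
                });
            }

            return response;
        }
    }
}
```

Note the response object: an output-delivery Lambda reports a per-record result back to Analytics, otherwise records may be treated as failed and retried.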
Here we are using the AWSSDK.DynamoDBv2 NuGet package to connect to DynamoDB (the Lambda's role must have access to DynamoDB, and specifically to our table's ARN). We use AttributeValue to write what is, essentially, a JSON object to the table.
In my case, I have defined both partition and sort keys, since each item we expect is a unique combination of a minute-based timestamp and a symbol; together these form a composite key.
As with the Kinesis Firehose write Lambda, use the following command to deploy (or update) the Lambda function:
dotnet lambda deploy-function <LambdaName>
Again, the Lambda name does NOT have to match the local project name; it is the name that will be used when connecting with other AWS services, like our Analytics application.
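If you prefer not to pass the name on the command line each time, the dotnet lambda tooling reads defaults from an aws-lambda-tools-defaults.json file in the project. A sketch of that file, with purely illustrative values (runtime, role ARN, and function name here are made up):

```json
{
  "profile": "default",
  "region": "us-east-1",
  "configuration": "Release",
  "function-runtime": "dotnetcore2.1",
  "function-memory-size": 256,
  "function-timeout": 30,
  "function-handler": "StockTransfer::StockTransfer.Function::FunctionHandler",
  "function-name": "AnalyticsToDynamoWriter",
  "function-role": "arn:aws:iam::123456789012:role/lambda-dynamo-role"
}
```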
Returning to our Analytics Application
We need to make sure our new Lambda function is set up as the destination for our Analytics application. Recall that we created a stream as part of our SQL query. Using the Destination feature, we can ensure these bounded results wind up in the table. Part 2 explains this in more detail.
Running End to End
So, if you followed all three segments, you should now be ready to see whether things are working. I recommend keeping the DynamoDB table's Items view open so you can see new rows as they appear.
Start your ingestion application and refresh DynamoDB. It may take some time for rows to appear. If you are impatient, like me, you can use the SQL editor to watch rows as they come across. I also like to have console output messages in the Lambda functions that I can check in CloudWatch.
With any luck you should start seeing rows in your table. If you don't, it could be one of a few things:
- You really want to hit the Firehose with a lot of data; I recommend a MINIMUM of 200 records. If you send fewer, processing can be noticeably delayed. Remember, what you are building is a system DESIGNED to handle lots of data; if there is not a lot of data, Kinesis may not be the best choice.
- Check that your Kinesis query is returning results and that you have defined an OUTPUT_STREAM. I feel the AWS docs do not drive this point home well enough and seem to imply the presence of a "default" stream; this is not the case, at least not as far as I could tell.
- Use CloudWatch obsessively. If things are not working, use Console.WriteLine to figure out where the break is. I found this a valuable tool.
- Ensure you are using the right types for your arguments. I spent hours debugging my write to DynamoDB only to discover I was using the wrong type for the incoming Kinesis event.
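On the first point, a quick way to generate volume is a small console app that batches records into the Firehose. Here is a sketch using the AWSSDK.KinesisFirehose package; the delivery stream name and payload shape are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon.KinesisFirehose;
using Amazon.KinesisFirehose.Model;

class Program
{
    static async Task Main()
    {
        var client = new AmazonKinesisFirehoseClient();
        var random = new Random();
        var records = new List<Record>();

        // Generate a few hundred fake ticks -- Kinesis is built for volume,
        // and small trickles can take noticeably longer to flow through
        for (var i = 0; i < 500; i++)
        {
            var price = (100 + random.NextDouble() * 10).ToString("F2", CultureInfo.InvariantCulture);
            var json = $"{{\"symbol\":\"AAPL\",\"price\":{price}}}\n";
            records.Add(new Record { Data = new MemoryStream(Encoding.UTF8.GetBytes(json)) });
        }

        // PutRecordBatch accepts up to 500 records per call
        await client.PutRecordBatchAsync(new PutRecordBatchRequest
        {
            DeliveryStreamName = "stock-ingest", // hypothetical stream name
            Records = records
        });
    }
}
```

The trailing newline on each record keeps the JSON objects separable downstream once Firehose concatenates them.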
Taking a step back and looking at this end-to-end product, the reality is that we did not write much code. Between my Lambdas I wrote very few lines of code in total. Not bad, given what this application can do and its level of extensibility (Kinesis streams can send to multiple destinations).
This is what I love about cloud platforms. In the past, designing and building a system like this could take as long as the application using the data. Now, with the cloud, it's mostly point-and-click and the platform handles the rest. Streaming data applications like this will only become more common as more systems move to the cloud and more businesses want to gather data in new and unique ways.
Kinesis in AWS, Event Grid in Azure, and Pub/Sub in Google Cloud represent tooling that lets us take these very valuable, highly complicated applications and build them in hours instead of weeks. In my view, AWS has the best support for applications like these, though I expect the others to catch up.
I hope you enjoyed this long example, and I hope it gives you ideas for your next application and shows how quickly you can get a valuable, production-ready real-time data application up and running.