Getting Started with Azure Event Grid

One of the most interesting scenarios the Cloud gives rise to is better support for building event driven backends. Such backends become necessary as the complexity and scale of applications grow and developers want to decouple services from one another. Additionally, cloud providers raise events from their own services and allow developers to hook into them; this is most noticeable on AWS.

While this trend is not new on Azure (Event Hubs have existed for a while), support for it has lagged behind AWS. But by introducing Event Grid, Microsoft seems to be positioning itself to support the same sort of event driven programming that Amazon allows, with the added benefit of additional flexibility provided, mainly, by Event Grid itself.

What is Event Grid?

In the simplest sense, EventGrid is a router. It allows events to be posted to a topic (similar to Pub/Sub) and can support multiple subscriptions, each with differing configurations, including which eventType values to accept and which matching subjects will be sent to subscribers.

In effect, EventGrid allows you to define how you want the various services you are using to communicate. In that way, it can potentially provide a greater level of flexibility, I feel, than AWS, whose eventing generally lacks a filtering mechanism.

Creating an Event Topic

So, the first thing to understand is that the concept of a topic will appear twice. The first is the name of the Event Grid resource itself, which is a topic in its own right.

Choose Add a Resource and search for event grid. Select Event Grid Topic (published by Microsoft) from the returned results.

By default, the name of the topic will match the name you give the Event Grid resource, so pick wisely if this is for something where the name will need to have meaning.

Once this completes we will need to create a subscription.

Before you create a subscription

When you create a subscription you will be asked for an endpoint type. This is where events matched to the topic will be sent. If you use a standard Azure service (e.g. Event Hubs) you can simply select it and go; however, if you use an Azure Function you will need to do something special to ensure the subscription can be created.

If you fail to follow this, you will receive (at the time of this writing) an undefined error in the Portal. The only way to get more information is to use the Azure CLI to create the subscription.

The error is in reference to a validation event that is sent to the prescribed endpoint to make sure it is valid. There is a good guide on doing this here. Essentially, we have to look for a certain event type and return the provided validation code as proof that we own the event endpoint.

I do not believe you have to do this with other Azure service specific endpoints, but you certainly have to do it if your endpoint is going to be of type WebHook.
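
To make the handshake concrete, here is a minimal sketch of what it can look like in an HTTP triggered Azure Function; the function name and bindings are illustrative, not taken from a specific sample.

using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Newtonsoft.Json.Linq;

public static class EventGridEndpoint
{
    [FunctionName("EventGridEndpoint")]
    public static async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
    {
        // Event Grid posts an array of events to the endpoint
        var body = await new StreamReader(req.Body).ReadToEndAsync();
        var events = JArray.Parse(body);

        foreach (var evt in events)
        {
            // the handshake: echo the validation code back to prove we own the endpoint
            if ((string)evt["eventType"] == "Microsoft.EventGrid.SubscriptionValidationEvent")
            {
                var code = (string)evt["data"]["validationCode"];
                return new OkObjectResult(new { validationResponse = code });
            }

            // otherwise this is a real event; handle evt["data"] as needed
        }

        return new OkResult();
    }
}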

The destination for this Event Grid subscription is the Azure Function Url.

Regardless of your endpoint type…

You will need to set up the various filtering mechanisms that indicate which events, when received, are passed through your subscription to the endpoint.

Most of these are fairly self explanatory except for the Event Schema field, which refers to the structure of the data coming through the topic. The subscription needs to understand this structure so incoming events can be analyzed.

The default is the pre-defined EventGridEvent which looks like this:

[
    {
        "id": "string",
        "eventType": "string",
        "subject": "string",
        "eventTime": "string",
        "data": "object",
        "dataVersion": "string"
    }
]

The key here is the data object which will contain your payload.

I am not sure of the other schemas as I have not explored them, mainly because the .NET EventGrid libraries have good native support for the EventGridEvent schema.

Let’s Send an Event

There are quite a few ways to send events to an Event Grid topic. For any of them you are going to need your Topic Endpoint, which you can find on the Overview section for your Event Grid. This is the Url you will submit new events to for propagation within Azure; it is also the value you will need when configuring any Azure service that should submit events to the Event Grid.

Disclaimer: At the time of this writing, there was some weirdness in the portal where you needed to use the CLI to route events from Azure services (like Storage) to an EventGrid.

Receiving Events from an Azure Resource
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-event-quickstart?toc=%2fazure%2fevent-grid%2ftoc.json

Within this tutorial you are shown how to use the Azure CLI (via the command shell or through the portal) to create the appropriate subscriptions. Do note that, once created, you will not see this subscription listed in the portal.

Send a Custom Event
This is probably the case you came here to read about. Principally, sending custom events involves posting to the Event Grid Topic Url with a request body whose schema matches what your subscriptions expect.

Doing this is easiest if you use the Microsoft.Azure.EventGrid NuGet package, which exposes the PublishEventsAsync method on its client; here is an example:

[Screenshot: sample code publishing an EventGridEvent (Selection_001)]
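
Since the screenshot cannot be reproduced here, the following is a minimal sketch of that call using EventGridClient from the Microsoft.Azure.EventGrid package; the topic hostname, key, event type, and payload are placeholders you would replace with your own values.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.EventGrid;
using Microsoft.Azure.EventGrid.Models;

public static class CustomEventPublisher
{
    public static async Task PublishAsync()
    {
        // hostname only: take the Topic Endpoint from the Overview section and strip the scheme and path
        const string topicHostname = "<your-topic>.<region>.eventgrid.azure.net";
        const string topicKey = "<your aeg-sas-key value>";

        var client = new EventGridClient(new TopicCredentials(topicKey));

        var events = new List<EventGridEvent>
        {
            new EventGridEvent
            {
                Id = Guid.NewGuid().ToString(),
                EventType = "MyApp.Transactions.Created",   // must line up with your subscription filters
                Subject = "transactions/new",
                EventTime = DateTime.UtcNow,
                Data = new { Symbol = "MSFT", Shares = 100 },
                DataVersion = "1.0"
            }
        };

        await client.PublishEventsAsync(topicHostname, events);
    }
}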

You can also send to this endpoint manually using the HttpClient library, but I recommend using the NuGet package. To get a better idea of what is in there, here is the source page: https://github.com/Azure/azure-sdk-for-net/tree/psSdkJson6/src/SDKs/EventGrid/DataPlane

Keep in mind that this will send an event using the EventGridEvent schema, which is what I recommend.
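
If you do want to go the manual route, here is a rough HttpClient sketch of the same post; the endpoint placeholder is the Topic Endpoint from the Overview section, and the key is the same aeg-sas-key value discussed later with API Management.

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

public static class ManualPublisher
{
    public static async Task PublishAsync()
    {
        // same shape as the EventGridEvent schema shown earlier
        var payload = new[]
        {
            new
            {
                id = Guid.NewGuid().ToString(),
                eventType = "MyApp.Transactions.Created",
                subject = "transactions/new",
                eventTime = DateTime.UtcNow.ToString("o"),
                data = new { Symbol = "MSFT", Shares = 100 },
                dataVersion = "1.0"
            }
        };

        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("aeg-sas-key", "<your topic access key>");

            var content = new StringContent(
                JsonConvert.SerializeObject(payload), Encoding.UTF8, "application/json");

            var response = await client.PostAsync("<Topic Endpoint from the Overview section>", content);
            response.EnsureSuccessStatusCode();
        }
    }
}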

How do I test that things are working…

Once you have things in place, I recommend selecting the Azure Function and clicking the Run button. This puts you into a mode where you can see a streaming log at the bottom. This is the log stream for the function as a whole and is not limited to just what is passed via the test harness.

Once you have this up, send an event to your Event Grid Topic and wait for a log message to appear which will tell you things are connected correctly.

What are my next steps?

This gets into a bigger overall point but, in general, I do NOT recommend using the Event Grid Topic Url for Production because it couples you to a Url that cannot be versioned and breaks the identity of your API.

Instead, as should also be the case when you use Azure Functions, you should set things up behind an Azure API Management service. This lets you customize what the Url looks like, version it, and maintain better control over its usage.

Additionally, in an event driven system you should view events as a way to maximize throughput: users post to this endpoint and the event fans out from there.

To put our Event Grid Topic endpoint behind an Azure API Management Route, do this:

  1. Create an API Management service instance (this will take time) and wait for it to activate
  2. Create a Blank API
    • Tip: When you create the API you will be asked to select a Product. Without getting too much into what this is, select Unlimited to make things easy on yourself
  3. Access your API definition via the APIs link navigation bar
  4. Add a POST route –  you can provide whatever values you want to the required fields
  5. Copy the aeg-sas-key from the Access Keys section under your Event Grid Topic and make this a required header for your POST route
  6. Once the route is created, select the Edit icon for the Backend
  7. Select Http(s) endpoint and paste the Event Grid Topic Url, make sure to NOT include the /events at the end of this URL
    [Screenshot: Http(s) endpoint configuration for the Backend (Selection_003)]
  8. Hit Save
  9. Click on your API Operation (the POST route) in the operation list to the left
    [Screenshot: selecting the API operation (Selection_004)]
  10. Select Inbound Processing
  11. Change the Backend line so that it uses the POST method and the resource /events
  12. Hit Save and you can click Test to check this. Building the payload is a bit tedious. Use the source here to understand how the JSON should look for the EventGridEvent that I assume you are sending.

Ok so let’s recap.

The first bit of this is to create an Azure API Management service instance and then create a POST path. Azure API Management is a huge topic that I have covered part of in the past. It is an essential tool when building a MicroService architecture in Azure as it allows for unification of different service types (serverless, containerized, traditional, etc) into what appears to be a single unified API.

All requests to Azure Event Grid require the aeg-sas-key header, so we make sure that Azure API Management does not allow any request through the proxy unless that header is at least present. Event Grid itself will determine its validity.

Once the general route is created we need to tell it where to forward the request it receives. This is simply the Event Grid Topic endpoint that we can get from the Event Grid Topic Overview page in the Azure Portal. However, we do NOT want to take the entire Url as we will need to use Url Rewriting instead of Url Forwarding. So, I recommend taking everything but the trailing /events which we will use in the next section. Be sure to also check Override to allow for text entry into the Url field.

Ok, now let’s complete the circle. You need to click Inbound Processing and change the default processing from URL Forwarding to URL Rewrite. There is a bit of a quirk here where you cannot leave the text field for your backend blank. This is where you will want to drop in the /events that we omitted from our backend URL.

You will want to use the Test feature to verify this. I provided the source code for the Azure Event Grid EventGridEvent class so you can see how the JSON needs to look in the test payload.

Conclusion

Event Driven Programming is a vital concept for enabling high throughput endpoints. The design is inherently serverless and enables developers to tap into more of Azure’s potential and easily declare any number of things that happen when an endpoint is called.

API Management is a vital tool for creating a unified and consistent interface for external developers to use. It comes with many features, including header enforcement and even body validation and transformation, and it can be integrated with an authentication mechanism. If you are going to create a microservice style API, or if you intend to break up a monolith, API Management is a great way to get versioning and abstraction for free.

The biggest thing to remember is that Event Grid is geared towards enabling Event Driven scenarios; it is not designed for processing a high volume of events, which is the realm of Event Hubs. If you were going to do real time processing, you would feed your data pipeline into Event Hubs and, potentially, route events on to Event Grid from there.

More information here


MVP for another year

It is with great humility that I announce Microsoft’s decision to renew my MVP status for an additional year. While it is my second renewal, it comes as the first true renewal that I have had since being selected in 2016.

What I mean by that is, after I was selected in 2016 the program’s renewal cycle changed and as part of the change I was grandfathered into the MVP program for 2017 into 2018. This meant it would be my accomplishments in 2017 that would dictate whether my MVP status would continue.

I spoke and blogged quite a bit in early 2017 but shut things down around August to focus on my wedding and honeymoon. What’s more, throughout 2017 I was tasked with a large $4mil web project using NodeJS, AWS, and ReactJS for West Monroe. I was worried, as this certainly drew my focus away from what got me my MVP. In addition, I decided to refocus on the web and away from Xamarin (this as part of an overall decision to focus more on the Cloud side of things).

2018 has also not been easy as the AWS project finishes up and I celebrate the birth of my first child, my son Ethan. But I am committed to finding the balance and have already spoken at two conferences (CodeMash and Chicago Code Camp), have been selected to speak at TechBash, and have abstracts out to VSLive.

In the end, my willingness to share my ideas here, and the awesome people who read what I write and even share some of my articles on forums and StackOverflow, helped earn that MVP renewal, and so I send out a Thank You to all.

Scratching the Surface of Real Time Apps – Part 3

Part 1
Part 2

This is it, our final step. This is basically saving our query result from the Analytics application to a DynamoDB table where it can easily be queried. For this we will again use Lambda as a highly scalable and easy to deploy mechanism to facilitate the transfer.

Creating your Lambda function

In Part 1, I talked about installing the AWS Lambda templates for Visual Studio Code. If you list your installed templates using:

dotnet new -l

You will get a listing of all installed templates. A casual look through this shows two provided Lambda templates that target Kinesis events. However, neither of these will work here. I actually thought I had things working with these templates, but the structure of the event sent from Analytics is very different; the Records property is the only one that maps, and it comes through as all empty objects.

I would recommend starting with the serverless.EmptyServerless template. What you actually need is to use the KinesisAnalyticsOutputDeliveryEvent object which is available in the Amazon.Lambda.KinesisAnalyticsEvents NuGet package (link). Once you have this installed, ensure the first parameter is of type KinesisAnalyticsOutputDeliveryEvent and Amazon will handle the rest. For reference, here is my source code which writes to my DynamoDB table:

https://gitlab.com/RealTimeApp/SaveTransactionCount/blob/master/src/SaveTransactionCount/Function.cs

Here we are using the AWSSDK.DynamoDbv2 NuGet package to connect to Dynamo (the Lambda role must have access to Dynamo and our specific ARN). We use AttributeValue here to write, essentially, a JSON object to the table.

In my case, I have defined the partition and sort keys since the data we can expect is a unique combination of a minute based timestamp and symbol; together these form a composite key.
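
For illustration, here is a condensed sketch of that write rather than the exact linked source; the table and attribute names are hypothetical, and the values would come from the records carried by the incoming KinesisAnalyticsOutputDeliveryEvent.

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

public class ResultWriter
{
    private readonly IAmazonDynamoDB _dynamo = new AmazonDynamoDBClient();

    public Task SaveAsync(string minuteTimestamp, string symbol, int shareChange)
    {
        var request = new PutItemRequest
        {
            TableName = "TransactionCounts",   // hypothetical table name
            Item = new Dictionary<string, AttributeValue>
            {
                // composite key: minute based timestamp (partition) + symbol (sort)
                ["timestamp"] = new AttributeValue { S = minuteTimestamp },
                ["symbol"] = new AttributeValue { S = symbol },
                ["shareChange"] = new AttributeValue { N = shareChange.ToString() }
            }
        };

        return _dynamo.PutItemAsync(request);
    }
}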

As with the Kinesis Firehose write Lambda, use the following command to deploy (or update) the Lambda function:

dotnet lambda deploy-function <LambdaName>

Again, the Lambda name does NOT have to match the local name; it is the name that will be used when connecting with other AWS services, like our Analytics application.

Returning to our Analytics Application

We need to make sure our new Lambda function is set up as the destination for our Analytics Application. Recall that we created a stream when we wrote our SQL query. Using the Destination feature, we can ensure these bounded results wind up in the table. Part 2 explains this a bit better.

Running End to End

So, if you followed all three segments you should now be ready to see if things are working. I would recommend having the DynamoDB table’s Items view open so you can see new rows as they appear.

Start your ingestion application and refresh Dynamo. It may take some time for rows to appear. If you are impatient, like me, you can use the SQL Editor to see rows as they come across. I also like to have Console output messages in the Lambda functions that I can see in CloudWatch.

With any luck you should start seeing rows in your application. If you don’t, it could be a few things:

  • You really want to hit the Firehose with a lot of data; I recommend a MINIMUM of 200 records. If you send fewer, the processing can be a bit delayed. Remember, what you are building is a system that is DESIGNED to handle lots of data. If there is not a lot of data, Kinesis may not be the best choice
  • Check that your Kinesis query is returning results and that you have defined an OUTPUT_STREAM. I feel like the AWS Docs do not drive this point home well enough and seem to imply the presence of a “default” stream; this is not the case, at least not as far as I found.
  • Use CloudWatch obsessively. If things are not working, use Console.WriteLine to figure out where the break is. I found this a valuable tool
  • Ensure you are using the right types for your arguments. I spent hours debugging my write to Dynamo only to discover I was using the wrong type for the incoming Kinesis event

Final Thoughts

Taking a step back and looking at this end to end product, the reality is we did not write that much code. Between my Lambdas I wrote very few lines of code in total. Not bad given what this application can do and its level of extensibility (Kinesis streams can send to multiple destinations).

This is what I love about Cloud platforms. In the past, designing and building a system like this would take as long as the application that might be using the data. But now, with Cloud, it’s mostly point and click and the platform handles the rest. Streaming data applications like this will only become more and more common as we see more systems move to the Cloud and more businesses want to gather data in new and unique ways.

Kinesis in AWS, Event Grid in Azure, and Pub/Sub in Google Cloud represent tooling that allows us to take these very valuable, highly complicated applications and build them in hours instead of weeks. In my view, AWS currently has the best support for applications like these, though I expect the others to make progress and catch up.

I hope you enjoyed this long example; I hope it gives you ideas for your next application and shows how easy it is to get a viable production Real Time Data application running quickly.

Scratching the Surface of Real Time Apps – Part 2

See Part 1 here

If you followed Part 1, we have a Lambda that sits behind an API Gateway and writes the contents of the payload it receives to an Amazon Kinesis Firehose Delivery Stream. This stream is currently configured to dump its contents to S3 when set conditions are met.

S3 does not really lend itself to real time processing; though we could query it with something like Athena, our aim is to see our data change in real time. For that we want to hook our Firehose stream up to a Data Analytics application whose principal job is to process “bounded” data.

Planning

The most important thing with this step is having an idea of what you want to look for before you start. Generally, these are going to be time based metrics, but they don’t have to be. Once you have established these ideas we can move on to the creation phase.

Create the analytics application

Configure the source of your Analytics Application

Go to your Amazon AWS Console and select Services -> Kinesis. This will bring up your Kinesis Dashboard, where you should see the Firehose stream you created previously. Click Create analytics application.

Similar to the Firehose process we give our new stream an application name and click the Create action button.

An analytics application is comprised of three parts: Source, Processor, and Destination. You need to configure these in order. Amazon makes it pretty easy, except for the part on debugging which we will talk about in a bit.

For the Source you will want to connect it to the Firehose delivery stream that was created previously, if you are following this tutorial. You can also use a Kinesis Data Stream. This part is pretty straightforward.

The important thing to take note of is the name of your incoming In-Application stream; this will serve as the “table” in your SQL.

In order to save the configuration the application will want to discover the schema from the incoming data. This is done using the Discover Schema button at the bottom; you need to have data flowing through for it to work.

As a side note, when I wrote the application which sends data I added logic to limit the size of the result set being sent. This helps speed up development and lets you explore other data scenarios you may want to consider (pre-processing). I find this approach is better than truly opening the firehose before you are ready, pun intended.

Configure the Processing of your streaming data

When you think about writing a SQL query against a database like MySQL you probably don’t often think about that data set changing underneath you, but that is what you need to consider for many operations within Analytics, because you are dealing with unbounded data.

The term “unbounded data” specifically applies to doing things like averages, mins, maxes, sums, etc: values that are calculated from the complete set of data. This can’t happen with Analytics because you are dealing with a continuous stream of data, so you could never compute the average; it would keep changing. To mitigate this, query results need to operate on bounded data, especially for aggregates.

This bounding can be done a few ways but the most common is the use of a Window. Amazon has some good documentation on this here, as well as the different types. Our example will use a Tumbling Window because we will consider the set of data relevant for a given minute, specifically the number of stock shares purchased or sold within a given minute.

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/tumbling-window-concepts.html

For stage 2 (Real Time Analytics) of the Analytics Application you will want to click the Go to SQL Results button, which will take you into the SQL Query editor. This is where you define the SQL to extract the metric you want. Remember, you MUST provide criteria to make the query operate on a bounded data set; otherwise the editor will throw an error.

For my application this is the SQL I used:
[Screenshot: the Analytics application SQL query (Selection_006)]
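
Since the screenshot is not reproduced here, the following is an approximation of that query; the input stream name (SOURCE_SQL_STREAM_001 is the default Kinesis Analytics assigns) and the source columns (symbol, shares) are assumptions rather than the exact original.

-- a stream to hold the windowed results (selectable as the destination in Step 3)
CREATE OR REPLACE STREAM "OUTPUT_STREAM" ("time" TIMESTAMP, "symbol" VARCHAR(8), "shareChange" INTEGER);

-- a pump moves each query result into the stream
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
    INSERT INTO "OUTPUT_STREAM"
        SELECT STREAM
            STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND) AS "time",
            "symbol",
            SUM("shares") AS "shareChange"
        FROM "SOURCE_SQL_STREAM_001"
        -- STEP is what creates the one minute tumbling window
        GROUP BY "symbol",
            STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);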

There is a lot going on here so let’s unpack it.

The first step you need to take is to create a stream that will contain the results of your query. You can KIND OF think of this as a “temp table” from SQL land, but it’s more sophisticated.

In our example, we are defining a STREAM called OUTPUT_STREAM, which we will be able to select in Step 3, when we determine what happens after the query. This stream will feature 3 columns (time, symbol, and shareChange).

Our second step focuses on getting data from the query into the stream, which is done using a Pump. This will run the INSERT statement with each query result and insert the data into the stream.

The final bit is the query that actually does the lifting. It’s pretty standard with the exception of the STEP operator in the GROUP BY clause, which is what creates the tumbling window we mentioned before.

Once you have set your query you can click Save and Run, which will save the query and execute it. You will be much more pleased with the outcome if you are sending data to your Firehose so the Analytics application has data it can show.

Connect to a Destination

The final configuration task for the Analytics Application is to specify where the query results are sent. For our example we will use a Lambda. However, doing this with C# can be somewhat confusing, so we will cover it in the next part.

You can specify this configuration by accessing the Destination section from the Application homepage (below):
[Screenshot: the Destination section on the application homepage (Selection_007)]

Alternatively, you can select the Destination tab from within the SQL Editor. I find this way to be better since you can directly select your output stream as opposed to free typing it.

For both of these options, Amazon has made it very easy and as simple as point and click.

Closing

Congrats. You now have a web endpoint that will automatically scale and write data to a pipe that will store things in S3. In addition, we created an Analytics Application which queries the streamed data to derive metrics based on a window (per minute in this case). We then take these results and pass them to a Lambda, which we will cover in Part 3; it is going to write the results to a DynamoDB table that we can easily query from a web app to see our results in real time.

Part 3

Scratching the Surface of Real Time Apps – Part 1

Recently, I began experimenting with Amazon Kinesis as a way to delve into the world of real time data applications. For the uninitiated, this is referred to as “streaming” and it basically deals with the idea of continuous data sets. By using streaming, we can make determinations in real time, which allows a system to generate more value for those wanting insights into their data.

Amazon has completely embraced the notion of streaming within many of their services, supported by Lambda. Being event driven is not a new pattern but, by listening for events raised from within the Cloud, such systems become much easier to develop and enable more scenarios than ever before.

Understanding Our Layout

Perhaps the most important thing when designing this, and really any Cloud based application, is a coherent plan of what you’re after and what services you intend to use. Here is a diagram of what we are going to build (well, parts of it).

[Diagram: RealTimeApp architecture]

 

Our main focus will be on the Amazon components. For the Console App, I wrote a simple .NET Core app which randomly generates stock prices and shares bought. Really you can build anything that will just generate a lot of data for practice.

This data gets written to an API Gateway which allows us to create a more coherent mapping to our Lambda rather than exposing it directly. API Gateways enable a host of useful scenarios such as unifying disparate services behind a single front or starting the breakup of a monolith into microservices.

The Lambda takes the request and writes it to a Kinesis Firehose which is, as it sounds, a wide pipe that is designed to take A LOT of data. We do this to maximize throughput. It’s like a queue, but much more than that; it’s all managed by Amazon and so will scale automatically.

The Firehose then writes the data it receives to another Kinesis component, an Analytics application, designed to be queried to see trends and other aspects of the incoming data. This outputs to a stream which is then sent to another Lambda. The thing to remember here is that we are working against a continuous data set, so bounding our queries is important.

The final Lambda takes the query result from Analytics and saves it to Dynamo. This gives us time based data that we can query against from a web application.

 

Creating our Kinesis Ingestion Stream

Start off from the AWS Console by selecting Kinesis under Analytics. You will want to click Create Firehose Delivery Stream.

For the first screen, you will want to give the stream a name and ensure the Source is set to Direct PUT. Click Next.

On the second screen, you will be asked if you want to do any kind of preprocessing on the items. This can be very important depending on the data set you are ingesting. In our case, we are not going to do any preprocessing; click Next.

On the third screen you will be asked about Destination. This is where the records you write to Firehose go after they pass through the buffer and into the ether. For our purposes, select S3 and create a new bucket (or use an existing one). I encourage you to explore the others as they can serve other use cases. Click Next.

On the fourth screen, you will be asked to configure various settings, including the all important S3 Buffer Conditions. These determine when Firehose will flush the data it receives to S3. Be sure to set your IAM Role; you can auto create one if you need to. Click Next.

On the final screen click Create Delivery Stream. Once the process is complete we can return and write some code for the Lambda which will write to this stream.

Creating the Ingestion Lambda

There are many ways to get data into Kinesis each of which is geared towards a specific scenario and dependent on the desired ingestion rate. For our example, we are going to use a Http Trigger Lambda written in C# (.NET Core 2.0).

Setup: I am quite fond of Visual Studio Code (download) and it has become my editor of choice. To make this process easier, you will want to run the following command to install the Amazon AWS Lambda templates (note: this assumes you have installed .NET Core SDK to your machine)

dotnet new -i Amazon.Lambda.Templates::*

More information here

Once this is installed we can create our project using the EmptyServerless template, here is the command:

dotnet new lambda.EmptyServerless --name <YourProjectName>

This will create a new directory named after your project with src/ and test/ directories inside. Our efforts will be in the /src/<ProjectName> directory; you will want to cd there, as you must be in the project root to run the installed dotnet lambda commands.

Our basic process with this Api call is to receive a payload, via POST, and write those contents to our Firehose stream. Here is the code that I used: https://gitlab.com/RealTimeApp/StockTransactionApi/blob/master/src/StockTransactionApi/PostTransaction.cs

A few things to take note of here:

  • The first parameter is of type object and it needs to match the incoming JSON schema. Amazon will handle the deserialization from string to object for us, though we end up needing the raw contents anyway.
  • We need to install the AWSSDK.KinesisFirehose (I used version 3.3.5.3) which gives us access to AmazonKinesisFirehoseClient
  • We call PutRecordAsync to write to our given delivery stream (in the case of this code that stream is named input_transactions); a condensed sketch of the handler follows this list
  • I have changed the name of the function handler from FunctionHandler (which comes with the template) to PostHandler. This is fine, but make sure to update the aws-lambda-tools-defaults.json file to ensure function-handler is correct
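
Here is a condensed sketch of that handler rather than the exact linked source; the class, handler, and stream names mirror the ones mentioned above, and the serializer attribute is the one the template normally generates for you.

using System.IO;
using System.Text;
using System.Threading.Tasks;
using Amazon.KinesisFirehose;
using Amazon.KinesisFirehose.Model;
using Amazon.Lambda.Core;
using Newtonsoft.Json;

[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.Json.JsonSerializer))]

namespace StockTransactionApi
{
    public class PostTransaction
    {
        private readonly IAmazonKinesisFirehose _firehose = new AmazonKinesisFirehoseClient();

        // the first parameter is the deserialized request body; we re-serialize it
        // because Firehose wants the raw bytes
        public async Task PostHandler(object input, ILambdaContext context)
        {
            var json = JsonConvert.SerializeObject(input);

            await _firehose.PutRecordAsync(new PutRecordRequest
            {
                DeliveryStreamName = "input_transactions",
                Record = new Record
                {
                    Data = new MemoryStream(Encoding.UTF8.GetBytes(json))
                }
            });
        }
    }
}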

Once you have your code written we need to deploy the function. Note that I would rather do it from the command line since it avoids the auto creation of the API Gateway path. I have done both and I find I like the clarity offered with doing it in stages. You need to run this command to deploy:

dotnet lambda deploy-function <LambdaFunctionName>

LambdaFunctionName is the name of the function as it appears in the AWS Console. It does NOT have to match what you call it in development, which is very useful to know.

As a prerequisite to this, you need to define a role for the Lambda to operate within. As with anytime you define an IAM Role, it should only have access to the bare minimum of services that callers will need.

When you run the deploy, you will receive a prompt to pick an applicable role for the newly created function, future deployments of the same Lambda name will not ask this.

Our next step is to expose this on the web; for that we need to create an API Gateway route. As a side note, there is a concept of lambda-proxy which I could not get to work. If you can, great, but I made do with the more standard approach.

Add Web Access

I think API Gateways are one of the coolest and most useful services offered by Cloud providers (API Management on Azure and Cloud Endpoints on Google). They enable us to tackle the task of API generation in a variety of ways, offer standard features prebuilt, and provide an abstraction that lets us make changes behind the scenes.

I will be honest: I can never find API Gateway in the Services list; I always have to search for it. Click on it once you find it. On the next page press Create API. Here are the settings you will need to get started:

  • Name: Anything you want; this doesn’t get exposed to the public. Be descriptive
  • Endpoint type: Regional (at this point I have not explored the differences between the types)

Click Create and you will be dropped on the API configuration main page.

The default endpoint given is / or the root of the API. Often we will add resources (i.e. User, Product, etc) and then have our operations against those resources. For simplicity here we will operate against the root; this is not typical, but API organization and development is not the focus of this post.

With / selected click Actions and select Create Method. Select POST and click the checkmark. API Gateway will create a path to allow POST against / with no authentication requirements, which is what we want.

Here is a sample of the configuration we will specify for our API Gateway to Lambda pipe, note that my function is called PostTransaction, yours will likely differ.

[Screenshot: API Gateway to Lambda integration settings (Selection_005)]

Click Save and you will get a nice diagram showing the stages of the API Gateway route. We can test that things are working by clicking Test and sending a payload to our endpoint.

I won’t lie, this part can be maddening. I tried to keep this simple to avoid problems. There is good response logging in the Test output; you just have to play with it.

We are not quite ready to call our application via Postman or similar just yet; we need to create a stage. Stages allow us to deploy the API to different environments (Development, QA, Staging, Production). We can actually combine this with Deploy API.

IMPORTANT: Your API changes are NEVER live when you make them; you must deploy each time. This is important because it can be annoying figuring out why your API doesn’t work, only to find you didn’t deploy your change, as I did.

When you do Actions -> Deploy API you will be prompted to select a Stage. In the case of our API we have no stages yet, so we can create one; I chose Production, but you can choose whatever you like.

Once the deploy finishes we need to get the actual Url that we will use. We can go to Stages in the left hand navigation and select our stage; the Url will be displayed along the top (remember to append your resource name if you didn’t use /). This is the Url we can use to call our Lambda from Postman.

The ideal case here is that you create an app that sends this endpoint a lot of data so we can write a large amount to Firehose for processing.

Debugging

In this part, we did not write that much code and the code we did write could be tested rather easily within the testing harnesses for both API Gateway and Lambda. One thing that is useful is Console.WriteLine, as CloudWatch will capture anything written to STDOUT in its Logs.

Next Steps

With this we have an endpoint applications can send data to and it will get written to Firehose. By using Firehose we can support ingesting A LOT of data and, with Lambda, we get automatic scaling. We abstract the Lambda path away using API Gateway, which lets us also include other API endpoints that could be pointing at other services.

In our next part we will create a Kinesis Analytics Data Stream which allows us to analyze the data as it comes into the application – Part 2

Exploring Cassandra

Continuing on with my theme this year of learning the integral parts of High Performing systems, I decided to explore Cassandra, a multi node NoSQL database that acts like an RDBMS. Cassandra is a project that was born out of Facebook and is currently maintained by the Apache Foundation. In this post, we are going to scratch the surface of this tool. This is not an in-depth look at the project, but rather a “get it to work” approach.

At its core, Cassandra is designed to feel like an RDBMS but provide the high availability features that NoSQL is known for, attempting to bridge the best of both worlds. The query language used internally is called CQL (Cassandra Query Language). At the most basic level, we have keyspaces (a cluster can hold many) and within those keyspaces we have tables. The big selling point of Cassandra is its ability to quickly replicate data between its nodes, allowing it to automatically fulfill the tenet of Eventual Consistency.

For my purposes I wanted to see how fast I could read and write with a single node using .NET Core. This is mostly to understand the general interactions with the database from .NET. I decided to use Docker to stand up the single node cluster and as such generated a Docker compose file so I could run a CQL script at startup to create my keyspace and table.

The Docker File

I started off trying to use the public cassandra image from Docker Hub but I found that it does not support the entry point concept and required me to create the keyspace and table myself. Luckily I found the dschroe/cassandra-docker image, which extends the public Cassandra image to support this case.

I wanted to just write a bunch of random names to Cassandra so I created a simple keyspace and table. Here is the code: https://gitlab.com/xximjasonxx/cassandra-sandbox/blob/master/schema/init.cql

I worked this into a simple Docker Compose file so that I could do all of the mounting and mapping I needed. You can find the compose file in the GitLab repo I reference at the end. Once you have it, simply run docker-compose up and you should have a Cassandra database with a cluster called names up and running. It’s important to inspect the output; you should see the creation of the keyspace and table there.

CassandraCSharpDriver

This is the NuGet package I used to handle the connection with Cassandra. At a high level, you need to understand that a cluster can have multiple keyspaces, so you need to specify which one you want to connect to. I found it useful to view the keyspaces as databases, since you will see the USE command with them. They are not databases per se, just logical groupings that can have different replication rules.

This connection creates a session which will allow you to manipulate the tables within the keyspace.
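
A minimal connection sketch with the driver looks like the following, assuming the single Docker node is reachable on localhost and the keyspace is named names.

using Cassandra;

public static class CassandraConnection
{
    public static ISession Connect()
    {
        // build a cluster reference pointing at the single Docker node
        var cluster = Cluster.Builder()
            .AddContactPoint("127.0.0.1")
            .Build();

        // the session is scoped to a keyspace, similar to USE in SQL land
        return cluster.Connect("names");
    }
}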

Insert the Data

I have always been amused by the funny names Docker will give containers when you don’t specify a name. It turns out someone created an endpoint which can return you these names: https://frightanic.com/goodies_content/docker-names.php. This delighted me to no end and so I used this for my data.

You can find the logic which queries this in this file: https://gitlab.com/xximjasonxx/cassandra-sandbox/blob/master/Application.cs

First we want to get the data into Cassandra. I found the best way to do this, especially since I am generating a lot of names, is to use the BEGIN BATCH and APPLY BATCH wrappers around the INSERT commands.

By doing this you can insert however many rows you like with little chance of draining the cluster’s connections (which I managed to do when issuing one insert per name).
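
As a sketch, assuming a hypothetical people table keyed on firstName and lastName (the real table lives in init.cql), the batch looks like this:

BEGIN BATCH
  INSERT INTO people (firstName, lastName) VALUES ('boring', 'tesla');
  INSERT INTO people (firstName, lastName) VALUES ('jolly', 'curie');
APPLY BATCH;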

Read the Data

When you perform Execute against a Session the result is a RowSet, which is enumerable and can be used with LINQ. The interesting thing I found here is that while I specify my column names as firstName and lastName, when I get the rows back from the RowSet the columns are named in all lower case: firstname and lastname. (CQL folds unquoted identifiers to lower case; you would have to quote the column names when creating the table to preserve their casing.) By far this was the most annoying part when I was building this sample.
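
A small sketch of that read, again using the hypothetical people table:

using Cassandra;

public static class NameReader
{
    public static void PrintAll(ISession session)
    {
        // the driver reports unquoted CQL identifiers in lower case
        var rows = session.Execute("SELECT firstName, lastName FROM people");
        foreach (Row row in rows)
        {
            var first = row.GetValue<string>("firstname");
            var last = row.GetValue<string>("lastname");
            System.Console.WriteLine($"{first} {last}");
        }
    }
}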

Delete the Data

I do not know why, but Cassandra is very weird about DELETE statements. If I had to venture a guess, it’s likely restricted due to the replication that needs to happen to finalize the delete operation. It also appears that, if you want to delete, you have to provide a condition; a bare DELETE FROM <table> to delete everything does not appear to be supported, and again I think this is due to the replication consideration.

Instead you have to use TRUNCATE to clear the table.
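
For example, with the hypothetical people table from the earlier sketches:

TRUNCATE people;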

Thoughts

Cassandra is a big topic and I need more than a partial week to fully understand it but, overall, my initial impression is good and I can see its use case. I look forward to using it more and eventually applying it in a scenario where it is needed.

Here is the code for my current Cassandra test; I recommend having Docker installed, as it allows this code to be completely self contained. Cheers.

https://gitlab.com/xximjasonxx/cassandra-sandbox

Speaking at TechBash

Proud to announce that I have accepted the offer to speak at TechBash in Pennsylvania. I will be giving my talk on Blazor, which is Microsoft’s WebAssembly SPA framework (recently blogged here), and I will also be debuting my newest talk, Building Serverless Microservices with Azure, a topic I have touched on here as part of a larger series.

This will be my first time at TechBash and I have heard great things from past speakers, so I very much look forward to this event. I hope to see you there.