RAG (Retrieval Augmented Generation) refers to the process of using custom data with GPT queries. The goal is to “augment” existing results with custom data so that GPT responses are more appropriate for specific scenarios, e.g. “How much was our EBITA in the last quarter?”
SDKs like Semantic Kernel aim to make this easier by enabling a GPT-style chat experience against data sources that do not present data in the way GPT typically expects.
Checking my Spending
For this example, I wanted to take a data dump from Monarch Money of all 6,000 transactions I have logged to the platform and “chat” with it to ask about my spending habits. The structure of this data is relatively simple:
Merchant: The merchant which processed the transaction
Date: The date the transaction occurred
Amount: The amount of the transaction
Category: The category of the transaction
As you can see, this is highly structured data. Originally I thought about putting it into Azure AI Search, but it quickly became clear that, unless we can do keyword extraction or derive semantic meaning, AI Search is not a good fit for this sort of data. So what to do?
Storing it in Cosmos
I decided to create a simple Azure Data Factory project to move the data from the CSV file into Cosmos. I created a collection called byCategory under a database called Transactions. This is part of another experiment I am doing with heavily read data, where the data is duplicated so I can specify different partition keys for it; more on that, hopefully, in the future.
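Once in Cosmos, each transaction is just a small JSON document, something along these lines (the field names are my assumption based on the columns above, and the values are made up):

```json
{
  "id": "3f9d2c1a-8b45-4e0a-a1c2-d5e6f7a8b9c0",
  "merchant": "Kroger",
  "date": "2024-06-14",
  "amount": 54.23,
  "category": "Groceries"
}
```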
Now the issue here is, there is no way for OpenAI to query this data directly. And while REST calls do allow a file to be passed and referenced in the SYSTEM message, I would quickly overrun my token allowance. So I needed a way to support a natural chat format that would then translate into a Cosmos query. Semantic Kernel Planner to the rescue.
Planners are just amazing
As I detailed here, Semantic Kernel contains a construct known as the Planner. The Planner can reference a given kernel and, using the associated ChatCompletion model from OpenAI, deduce what to call and in what order to carry out a request, understanding the code through its Description attributes. It really is wild watching the AI assemble the modules it needs to carry out an operation.
So in this case, we want to allow the user to say something like this:
How much did I spend on Groceries in June 2024?
And have that translate to a Cosmos query to bring back the data.
To begin, I created the CosmosPlugin as a code-based plugin in the project. I gave it one initial method which, as shown below, performs a query to gather the sum of transactions for a category over a time range.
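The actual plugin code is in the repo linked at the end of the post; as a rough sketch, assuming the Azure Cosmos DB .NET SDK and the field names from earlier (the method and parameter names here are illustrative, not necessarily what the repo uses), it looks something like this:

```csharp
using System.ComponentModel;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel;

public class CosmosPlugin
{
    private readonly Container _container;

    public CosmosPlugin(CosmosClient client)
    {
        // the byCategory collection under the Transactions database
        _container = client.GetContainer("Transactions", "byCategory");
    }

    [KernelFunction]
    [Description("Gets the total amount spent on a category of transactions between a start date and an end date")]
    public async Task<double> GetCategorySpendForDateRange(
        [Description("The category of the transactions, for example Groceries")] string category,
        [Description("The start date of the range in yyyy-MM-dd format")] string startDate,
        [Description("The end date of the range in yyyy-MM-dd format")] string endDate)
    {
        // sum the amount across all matching transactions in the date range
        var query = new QueryDefinition(
                "SELECT VALUE SUM(c.amount) FROM c WHERE c.category = @category AND c.date >= @startDate AND c.date <= @endDate")
            .WithParameter("@category", category)
            .WithParameter("@startDate", startDate)
            .WithParameter("@endDate", endDate);

        double total = 0;
        var iterator = _container.GetItemQueryIterator<double>(query);
        while (iterator.HasMoreResults)
        {
            foreach (var value in await iterator.ReadNextAsync())
                total += value;
        }
        return total;
    }
}
```

The important part for the Planner is not the query itself but the Description attributes; they are what the model reads when deciding whether and how to call the function.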
Now the insane thing here is, going back to our sample request:
How much did I spend on Groceries in June 2024?
The Planner is going to use the LLM to determine that Groceries is a category and that a start date of 2024-06-01 and an end date of 2024-06-30 are needed, which blows my mind. It knows this because it reads all of the Description attributes on the method and its parameters.
Once this is done, the rest is simple: we execute our query, and the result is returned. Now the issue I have is, by itself I would just get back a number, say 142.31. Which, while correct, is not user friendly. I wanted to format the output.
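For completeness, the wiring from chat input to plugin looks roughly like this. I am assuming the Handlebars planner here, and the endpoint, deployment name, and environment variables are all illustrative:

```csharp
using System;
using Microsoft.Azure.Cosmos;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Planning.Handlebars;

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o",                        // illustrative deployment name
    endpoint: "https://my-openai.openai.azure.com",  // illustrative endpoint
    apiKey: Environment.GetEnvironmentVariable("AOAI_KEY")!);
var kernel = builder.Build();

// register the code-based plugin so the planner can discover its functions
var cosmosClient = new CosmosClient(Environment.GetEnvironmentVariable("COSMOS_CONN")!);
kernel.ImportPluginFromObject(new CosmosPlugin(cosmosClient), "CosmosPlugin");

// let the planner decide which functions to call, and with what arguments
var planner = new HandlebarsPlanner();
var plan = await planner.CreatePlanAsync(kernel, "How much did I spend on Groceries in June 2024?");
var result = await plan.InvokeAsync(kernel);

Console.WriteLine(result); // e.g. 142.31
```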
Chaining Plugins
I created a Prompt Plugin called FormatterPlugin and gave it a method called FormatCategoryRequestOutput. Prompt plugins do NOT contain any C# code; instead, they specify various data values to send to the LLM, including the prompt itself.
You can see the use of Handlebars syntax below to pass values from previous steps into the plugin. These need to match the values specified in config.json. Notice again the use of a description field to allow SK to figure out “what” something does or represents.
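For reference, a prompt plugin is just a folder on disk, in this case something like FormatterPlugin/FormatCategoryRequestOutput/ containing an skprompt.txt and a config.json. The prompt wording, variable names, and config fields below are my sketch of the shape rather than a copy of the repo:

skprompt.txt:

```
Tell the user, in one friendly sentence, how much they spent.
Category: {{category}}
Start date: {{startDate}}
End date: {{endDate}}
Total amount: {{amount}}
```

config.json:

```json
{
  "schema": 1,
  "description": "Formats the total spent on a category over a date range into a user friendly sentence",
  "template_format": "handlebars",
  "input_variables": [
    { "name": "category", "description": "The category that was queried" },
    { "name": "startDate", "description": "The start date of the range" },
    { "name": "endDate", "description": "The end date of the range" },
    { "name": "amount", "description": "The total amount spent on the category" }
  ]
}
```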
Using this, our previous query would return something like this:
You spent $143.01 on Groceries in June 2024
That is pretty cool, you have to admit. With relatively little effort I can now support a chat experience against custom data. This type of functionality is huge for clients, as it allows them to ask for specific pieces of data in natural language.
Code: https://github.com/xximjasonxx/SemanticKernel/tree/main/TransactionChecker
Next Steps
To finish this sample off, I want to introduce a prompt plugin that runs against the incoming request to convert natural language idioms into concrete values. For example, saying something like:
How much did I spend last month?
Would result in an error because the LLM cannot decipher what is meant by “last month”. You would need something that returns the start and end dates for “last month” or “last year”.
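Whether that ends up as a prompt plugin or a code plugin, one cheap building block is a function that gives the model today’s date, or resolves common ranges outright. A sketch of that idea (this is not in the repo, and the names are mine):

```csharp
using System;
using System.ComponentModel;
using Microsoft.SemanticKernel;

public class DatePlugin
{
    [KernelFunction]
    [Description("Gets today's date in yyyy-MM-dd format, useful for resolving relative phrases like 'last month'")]
    public string GetCurrentDate() => DateTime.UtcNow.ToString("yyyy-MM-dd");

    [KernelFunction]
    [Description("Gets the start and end dates, in yyyy-MM-dd format, of the previous calendar month")]
    public string GetLastMonthRange()
    {
        var firstOfThisMonth = new DateTime(DateTime.UtcNow.Year, DateTime.UtcNow.Month, 1);
        var start = firstOfThisMonth.AddMonths(-1);
        var end = firstOfThisMonth.AddDays(-1);
        return $"{start:yyyy-MM-dd} to {end:yyyy-MM-dd}";
    }
}
```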
I am also concerned about the number of these functions you would have to write to support a complex case. I always understood the promise of GPT to be that you would not need that code because the model can “figure it out”. More research and testing is needed here.