LINQ: A Language within a Language

For those of you who, like me, like to keep up with what’s new in programming, in particular from Microsoft, you have at least heard of Language Integrated Query, or LINQ. For those of you who haven’t, it is essentially something we developers have wanted to do naturally with our datasets since the term “Disconnected Dataset” was coined, perhaps even earlier. LINQ arrives with C# 3.0 (and VB 9.0), which ship as part of .NET Framework 3.5. It builds on new language features, in particular lambda expressions, to let programmers naturally query all kinds of data sources: XML, databases, in-memory collections, and more. There is a wealth of beta documentation available at this point. In this article I will walk you through the steps I went through to create two basic LINQ applications. The first is a simple query that lists the running processes, filters them, and writes them out using Console.WriteLine. The second is more complex and joins two data tables holding information from my Anime database.

A small side note: how this all works at a deeper level is a fascinating and complex topic. Rather than attempting to explain it myself, I instead point you to a demo by Anders Hejlsberg, the creator of C# and lead designer of LINQ, in which he explains how all of this is broken down by the compiler.

Let’s start with the first one.
This is the query we are going to build:
var query = from p in Process.GetProcesses()
            where p.WorkingSet >= 4 * 1024
            select new { p.ProcessName, p.WorkingSet };

So if you’re like me, you looked at that for the first time and thought, “Wow, that looks really strange and very cool.” This is a LINQ query, and as you probably guessed, it is very reminiscent of SQL. So let’s look at it. The first step is to declare a variable to hold whatever the right-hand side of the expression evaluates to. We don’t know the type (or can’t conveniently write it), so ‘var’ essentially says “Give me a local variable of the same type as what is on the right.” The variable is still strongly typed; the compiler simply infers the type at compile time.
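
To make the point about ‘var’ concrete, here is a small sketch (the class and method names are illustrative, not from the original). Each ‘var’ local gets a fixed static type chosen by the compiler, and for an anonymous type, which is what ‘select new { … }’ produces, there is no type name you could write, so ‘var’ is the only option. The type names are read back at run time, which for these value and string types match what the compiler inferred.

```csharp
using System;

public static class VarDemo
{
    // Returns the run-time type names of three 'var' locals; for these
    // examples they match the types the compiler inferred.
    public static string[] InferredTypeNames()
    {
        var count = 42;            // inferred as int
        var name = "explorer";     // inferred as string
        // An anonymous type has no name we could write down, so 'var' is required.
        var pair = new { ProcessName = name, WorkingSet = 8192L };

        // count = "text";         // would not compile: count is statically an int

        return new[]
        {
            count.GetType().Name,           // "Int32"
            name.GetType().Name,            // "String"
            pair.WorkingSet.GetType().Name  // "Int64"
        };
    }

    public static void Main()
    {
        foreach (var typeName in InferredTypeNames())
            Console.WriteLine(typeName);
    }
}
```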

Next we start the actual query on the right side. ‘from p in Process.GetProcesses()’ is similar to saying ‘foreach (var p in Process.GetProcesses())’. The order is a bit backwards, but you can think of ‘p’ as the variable you use to reference the columns being extracted throughout the query; it is not available outside the LINQ expression. Next we have the option of filtering the data in this query, using the trusty where operator. Notice, however, a difference from normal SQL: when we reference the column to filter by, we must precede the “column name” with the variable that we declared using ‘from’.

I want to stop for a moment here: note how I put “column name” in double quotation marks; I did this intentionally. If you think back to the foreach comparison, the variable declared by ‘from’ is an element of the IEnumerable that you are querying. Hence ‘p’ is an instance of the Process class, so we have access to all the normal members we would have on a Process object in code. In the case of the filter, we simply ask for the WorkingSet (how much physical memory, in bytes, the process is currently using) and compare it against a number (in this case 4 * 1024 bytes, or 4 KB).
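
A minimal sketch of the foreach comparison above, assuming a hypothetical FakeProcess class in place of the real Process so the output is predictable. It writes the same filter twice, once as a query and once by hand, to show that ‘from’ plays the role of the foreach variable and ‘where’ plays the role of an if test on each element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for System.Diagnostics.Process so the output is predictable.
public class FakeProcess
{
    public string ProcessName { get; set; } = "";
    public long WorkingSet { get; set; }   // bytes
}

public static class FromWhereDemo
{
    static readonly FakeProcess[] Processes =
    {
        new FakeProcess { ProcessName = "idle",     WorkingSet = 2 * 1024 },
        new FakeProcess { ProcessName = "explorer", WorkingSet = 40 * 1024 },
        new FakeProcess { ProcessName = "devenv",   WorkingSet = 900 * 1024 },
    };

    // Query form: 'p' is a FakeProcess, exactly like the foreach variable below.
    public static List<string> WithQuery()
    {
        var query = from p in Processes
                    where p.WorkingSet >= 4 * 1024
                    select p.ProcessName;
        return query.ToList();
    }

    // Hand-written equivalent: 'from' becomes the foreach, 'where' becomes the if.
    public static List<string> WithLoop()
    {
        var result = new List<string>();
        foreach (var p in Processes)
        {
            if (p.WorkingSet >= 4 * 1024)
                result.Add(p.ProcessName);
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithQuery())); // explorer, devenv
        Console.WriteLine(string.Join(", ", WithLoop()));  // explorer, devenv
    }
}
```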

The next small segment is the select, where we define what the objects that come out of the query will “look like.” The members we retrieve take the same names as the properties of the type of ‘p’, so in this case we ask for the ProcessName and WorkingSet properties. The compiler generates an anonymous type with exactly these two properties, and they become the properties of our result objects, as you will see below.

foreach (var item in query)
    Console.WriteLine("{0,-25}{1}", item.ProcessName, item.WorkingSet);

Looking at this, you should quickly be able to see why we can access these particular properties. If you can’t, simply refer to the select clause the query used. Now you’re probably thinking, “That’s good, but what if I want to alias a column?” To that I say, “No problem.” In fact, that is shown in the next section, along with how to join two collections in a query.
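
Putting the query and the foreach together, a complete console program for this first example might look like the sketch below. The class and method names are my own. One change from the snippet above: it reads WorkingSet64, because the int-valued Process.WorkingSet property is marked obsolete and produces a compiler warning. The Report method collects the formatted lines so they can also be checked programmatically.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class ProcessQueryDemo
{
    // Formats one line per running process whose working set is at least minBytes.
    public static List<string> Report(long minBytes)
    {
        // WorkingSet64 replaces the obsolete int-valued Process.WorkingSet.
        var query = from p in Process.GetProcesses()
                    where p.WorkingSet64 >= minBytes
                    select new { p.ProcessName, p.WorkingSet64 };

        var lines = new List<string>();
        foreach (var item in query)
            lines.Add(string.Format("{0,-25}{1}", item.ProcessName, item.WorkingSet64));
        return lines;
    }

    public static void Main()
    {
        foreach (var line in Report(4 * 1024))
            Console.WriteLine(line);
    }
}
```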

Part 2: An Advanced Example
So in the first part, I demonstrated a very basic LINQ example that queries and filters the list of processes currently running on a computer. Now we are going to do a more advanced example. I am going to extract from a database on my system a list of Anime series and a list of stored genres, join these tables, and produce output showing each series name and its related genre. For the sake of simplicity, I am going to skip how to get data from the database into DataSets and DataTables in .NET; I assume the reader is familiar with this. So let’s start with how to make our DataTables LINQable.

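LINQ works with objects that implement IEnumerable<T> or IQueryable<T>, and DataTable implements neither of these interfaces, so we call an extension method that exposes the table’s rows as a sequence LINQ can query. The preview builds I started with called this method ToQueryable(); in the released framework it is AsEnumerable() (from the System.Data.DataSetExtensions assembly), which returns an IEnumerable<DataRow>:
var seriesQuery = ds.Tables["series"].AsEnumerable();
var genreQuery = ds.Tables["genres"].AsEnumerable();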
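
Here is a small self-contained sketch of that step, assuming a hypothetical in-memory "genres" table with integer id and string name columns in place of the real database. It shows AsEnumerable() turning the DataTable into an IEnumerable<DataRow>, and Field<T>() reading a column as a typed value rather than as object.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableDemo
{
    // Hypothetical stand-in for the article's "genres" table.
    public static DataTable BuildGenres()
    {
        var genres = new DataTable("genres");
        genres.Columns.Add("id", typeof(int));
        genres.Columns.Add("name", typeof(string));
        genres.Rows.Add(1, "Action");
        genres.Rows.Add(2, "Comedy");
        return genres;
    }

    // DataTable is not IEnumerable<DataRow>; AsEnumerable() adapts it so LINQ can query it.
    public static List<string> GenreNamesAbove(int minId)
    {
        IEnumerable<DataRow> rows = BuildGenres().AsEnumerable();
        var query = from r in rows
                    where r.Field<int>("id") > minId      // typed read of an int column
                    select r.Field<string>("name");       // typed read of a string column
        return query.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", GenreNamesAbove(0))); // Action, Comedy
    }
}
```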

With that, we can now use these tables in our LINQ query, so here it is. Don’t worry if you’re confused by it; I’ll explain it line by line.
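// assumes id and genre_id are integer columns and name is a string column
var query = from o in seriesQuery
            join g in genreQuery on o.Field<int>("genre_id") equals g.Field<int>("id")
            orderby o.Field<string>("name")
            select new { name = o.Field<string>("name"), id = o.Field<int>("id"), genre = g.Field<string>("name") };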

Interesting, isn’t it? It looks almost identical to a query you would write in SQL. What we are doing here is taking the two tables loaded into the .NET DataSet, joining them, and then retrieving series name, id, and genre combinations. We’ll go through this a bit at a time, but some of it should look familiar from the first example. We again define which table to query and the reference variable used to access the data coming from the main table (in this case seriesQuery).
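
For readers who want to run the whole query without my database, the sketch below builds a hypothetical DataSet in memory with the same table and column names ("series" with id, name, genre_id; "genres" with id, name). The rows and their values are made up for illustration, and the integer/string column types are assumptions. The query itself is the one above, using AsEnumerable() and Field<T>().

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class AnimeJoinDemo
{
    // Hypothetical DataSet with the table and column names used in the article.
    public static DataSet BuildDataSet()
    {
        var ds = new DataSet();

        DataTable genres = ds.Tables.Add("genres");
        genres.Columns.Add("id", typeof(int));
        genres.Columns.Add("name", typeof(string));
        genres.Rows.Add(1, "Action");
        genres.Rows.Add(2, "Comedy");

        DataTable series = ds.Tables.Add("series");
        series.Columns.Add("id", typeof(int));
        series.Columns.Add("name", typeof(string));
        series.Columns.Add("genre_id", typeof(int));
        series.Rows.Add(10, "Series B", 1);
        series.Rows.Add(11, "Series A", 2);
        series.Rows.Add(12, "Series C", 1);

        return ds;
    }

    public static List<string> SeriesWithGenres(DataSet ds)
    {
        var seriesQuery = ds.Tables["series"].AsEnumerable();
        var genreQuery = ds.Tables["genres"].AsEnumerable();

        var query = from o in seriesQuery
                    join g in genreQuery on o.Field<int>("genre_id") equals g.Field<int>("id")
                    orderby o.Field<string>("name")
                    select new
                    {
                        name = o.Field<string>("name"),
                        id = o.Field<int>("id"),
                        genre = g.Field<string>("name")
                    };

        var lines = new List<string>();
        foreach (var row in query)
            lines.Add(row.id + " " + row.name + ": " + row.genre);
        return lines;
    }

    public static void Main()
    {
        // Prints "11 Series A: Comedy", "10 Series B: Action", "12 Series C: Action".
        foreach (var line in SeriesWithGenres(BuildDataSet()))
            Console.WriteLine(line);
    }
}
```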

Next, if you followed what was done with from, this line:
join g in genreQuery on o.Field<int>("genre_id") equals g.Field<int>("id")
should be easy to understand. In essence it does the same thing as from, introducing the reference variable g for the data in the joined table, except that a series row is only paired with a genre row when the series’ genre_id equals the genre’s id. Note that C# requires the equals keyword here rather than ==.
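
To see what join buys us, the sketch below computes the same pairing two ways over hypothetical in-memory rows (value tuples stand in for the DataRows, purely for brevity). The first uses join; the second uses a second from plus a where filter, which is the “same as from, plus filtering” reading described above. Series C points at a genre id with no match, so both forms drop it, just like an inner join in SQL. The join form is usually the better choice, since Enumerable.Join builds a lookup over the inner sequence instead of testing every pair.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinEquivalenceDemo
{
    // Hypothetical sample rows: series (id, name, genre_id) and genres (id, name).
    static readonly (int Id, string Name, int GenreId)[] Series =
    {
        (10, "Series B", 1), (11, "Series A", 2), (12, "Series C", 3)
    };
    static readonly (int Id, string Name)[] Genres =
    {
        (1, "Action"), (2, "Comedy")
    };

    // 'join ... on ... equals ...' pairs each series with the genre whose id matches.
    public static List<string> WithJoin()
    {
        var query = from o in Series
                    join g in Genres on o.GenreId equals g.Id
                    select o.Name + ": " + g.Name;
        return query.ToList();
    }

    // Same result as a second 'from' plus a 'where' filter (a nested loop over all pairs).
    public static List<string> WithNestedFrom()
    {
        var query = from o in Series
                    from g in Genres
                    where o.GenreId == g.Id
                    select o.Name + ": " + g.Name;
        return query.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", WithJoin()));       // Series B: Action | Series A: Comedy
        Console.WriteLine(string.Join(" | ", WithNestedFrom())); // Series B: Action | Series A: Comedy
    }
}
```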

The next line (orderby o.Field<string>("name")) is one of my favorite features of LINQ. I like being able to sort this data along with doing selections, joins, and filtering. As you can tell from reading it, it takes the rows referenced by ‘o’ and sorts them by the field ‘name’. The ascending and descending keywords can follow a sort key, and several keys can be listed separated by commas, which makes this very flexible.
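
A short sketch of the ascending/descending keywords and a secondary sort key, over hypothetical (name, genre) rows held in value tuples. The first key sorts genres A to Z; within each genre, the second key sorts series names Z to A.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByDemo
{
    // Hypothetical (series name, genre) rows.
    static readonly (string Name, string Genre)[] Rows =
    {
        ("Series B", "Action"),
        ("Series A", "Comedy"),
        ("Series C", "Action"),
    };

    // Primary key: genre ascending; secondary key: name descending within a genre.
    public static List<string> Sorted()
    {
        var query = from r in Rows
                    orderby r.Genre ascending, r.Name descending
                    select r.Genre + "/" + r.Name;
        return query.ToList();
    }

    public static void Main()
    {
        // Action/Series C | Action/Series B | Comedy/Series A
        Console.WriteLine(string.Join(" | ", Sorted()));
    }
}
```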

If you remember, in the first example I said I would show you how to create column aliases in the data returned by LINQ. It’s simply an expansion of the select line. In this example, the line:
select new { name = o.Field<string>("name"), id = o.Field<int>("id"), genre = g.Field<string>("name") };
selects the fields named ‘name’ and ‘id’ from the table referenced by ‘o’ and the field ‘name’ from the table referenced by ‘g’. Notice the assignments: these are the aliases, so each result object will have three properties named name, id, and genre. In the first example, members such as p.ProcessName took their names from the property automatically, much as an unaliased column keeps its name in SQL. Here, however, every value comes from a Field<T>() method call, which gives the compiler no name to infer, so each one must be aliased; and even if that were not the case, the two ‘name’ fields would collide without an alias.
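
The naming rules for anonymous types are easy to check in a few lines. In this sketch the two “rows” are hypothetical anonymous objects standing in for a series and its genre. Member access lets the compiler infer a property name. Two members that would both be inferred as Name are rejected (compiler error CS0833), and a method call has no name to infer (CS0746). The lines that would not compile are left as comments.

```csharp
using System;

public static class AnonymousNamingDemo
{
    public static string Describe()
    {
        // Hypothetical rows standing in for a series and its genre.
        var seriesRow = new { Id = 11, Name = "Series A" };
        var genreRow = new { Id = 2, Name = "Comedy" };

        // Member access: the names Id and Name are inferred from the properties.
        var inferred = new { seriesRow.Id, seriesRow.Name };

        // new { seriesRow.Name, genreRow.Name } would not compile (CS0833: two
        // members named Name), so at least one of them needs an explicit alias.
        var aliased = new { seriesRow.Name, genre = genreRow.Name };

        // A method call gives the compiler no name to infer, so
        // new { seriesRow.Name.ToUpperInvariant() } fails with CS0746.
        var computed = new { upper = seriesRow.Name.ToUpperInvariant() };

        return inferred.Id + " " + inferred.Name + " | "
             + aliased.Name + "/" + aliased.genre + " | "
             + computed.upper;
    }

    public static void Main()
    {
        Console.WriteLine(Describe()); // 11 Series A | Series A/Comedy | SERIES A
    }
}
```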

So the result of this query, held in the ‘var’ variable, is simply an IEnumerable<T> whose element type T is the anonymous type created by the select clause. That is what lets us use it in a foreach statement, among other uses. The objects contained in the collection are simple objects with the properties name, id, and genre.

Part 3: Conclusions
So the question is, where would this be useful? While I considered it for disconnected data applications, I always wonder about consistency: if you take a large chunk of data and hold it in memory, your copy and the database can drift apart, so you or others may end up working with stale data. Primarily, this would be good for retrieving data once and then manipulating it in projections using LINQ, as opposed to constantly querying the database. It is definitely something to watch as C# 3.0 and VB 9.0 near their final release.

What was really fascinating when learning about this is how it gets broken down by the compiler, as explained in the aforementioned video. The query syntax is really an abstraction for readability and ease of programming; at its basic level it is very functional, translating into ordinary method calls that make full use of the lambda expressions and type inference introduced in C# 3.0.
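
As a rough illustration of that translation, here is the first example’s query in both forms over hypothetical in-memory data (value tuples stand in for Process objects). The compiler rewrites ‘from p in X where c select e’ into calls to X.Where(p => c).Select(p => e), so both methods below return the same sequence. This is a simplified sketch, not the full set of translation rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryTranslationDemo
{
    // Hypothetical (process name, working set in bytes) rows.
    static readonly (string ProcessName, long WorkingSet)[] Processes =
    {
        ("idle", 2 * 1024), ("explorer", 40 * 1024), ("devenv", 900 * 1024)
    };

    // Query syntax, as written in the article.
    public static List<string> QuerySyntax()
    {
        var query = from p in Processes
                    where p.WorkingSet >= 4 * 1024
                    select p.ProcessName + ":" + p.WorkingSet;
        return query.ToList();
    }

    // Roughly what the compiler turns it into: extension-method calls taking lambdas.
    public static List<string> MethodSyntax()
    {
        return Processes
            .Where(p => p.WorkingSet >= 4 * 1024)
            .Select(p => p.ProcessName + ":" + p.WorkingSet)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", QuerySyntax()));  // explorer:40960, devenv:921600
        Console.WriteLine(string.Join(", ", MethodSyntax())); // explorer:40960, devenv:921600
    }
}
```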
