Perhaps I share this hobby with many other programmers, in particular those at Google. I love data, I love having lots of data on various things. I like to use this data to play with new technologies and such. Often the first thing I will do when I am learning a language is attempt to understand database access and the limitations there in. Such was the case with PostgreSQL, an RDBMS engine that I have always had recommended to me. So recently I decided to download it and see how it worked.
Understanding it from a manual perspective was fairly easy. MySQL, in my opinion, was easier to understand, but as I am told Postgres is a more professional RDMBS engine then is MySQL. However, understanding anything by reading is hardly worth anything in the computer industry. The question is: Can you Use what you Know? I have also gotten to the point where small scripts that access the DB and return some results are so trivial that it hardly proves anything to me. No, I need a project of some sort to really work with the database. So an idea came to mind: why not write a word counting bot for an IRC channel. We could, at some point, use this for stat tracking to see who talks a lot. Now, I elected to write this in .NET, I will explain why in the next paragraph.
Given that reading from the connected socket was likely to be faster then the database read/write actions. Given that I was going to be breaking these sentences down by the word and checking if I should update or insert, I felt that by making this application multithreaded would benefit the system. One of .NET strengths is Asynchronous programming and how easy this type of programming is made. Given this set of circumstances I felt this was the natural choice. The next part was to design my databases. As I mentioned, I wanted to test my skills with PostgreSQL. I elected to use pgAdmin III to create the tables, relations, and sequences needed to build the tables. Five tables where created: users, words, channels, saycount, talkcount. The first three are self explanatory, saycount is often a word was said in a given channel. talkcount is simple the number of times a stored user has spoken.
To start this program I needed to connect to IRC. I have done this by hand before in .NET, but it was quite convoluted and required a lot of time. So I scoured the Internet for a pre-existing library and I found Meebey’s SmartIRC Lib for developers. I have to give Meebey credit, using this lib it was very easy to make the connection to IRC and join my test channel. However, my next issue came with getting a connection to my PG Database from .NET.
I had originally planned to use the built-in ODBC Data lib that comes with .NET but despite my efforts I could not get a properly initalized connection. I had been using ConnectionStrings.com for research into PGSQL Connect Strings when I came across a lib called Npgsql, and decided to look it up. This lib reminds me very fondly of the .NET Connector for MySQL as they work with the same abstract base class I imagine.
Using these two tools I easily constructed my databases and set up a connection, now I had to get it working. The first idea I needed to understand was how my initial thread was going to populate and my secondary thread was going to consume that data. I decided to create my own class that held all the information I wanted to take from an IRC Channel message and store it in a type-safe collection using generics. I made this member static so that I could ensure their was only one copy of the variable and to make access to it from the secondary thread easier.
So my idea is to have the secondary thread ALWAYS attempt to consume the first element in the List. If it is not there, an exception is thrown and caught and the secondary thread will sleep as it waits for data to appear in the list that it can use. Keep in mind the main thread is populating this List each time it detects a channel message. So what do I do with this data. Well I created several objects that have a single constructor which is the string that their IDs refer to in the database. I did this to prevent duplicate words (in a channel) from appearing. These constructors look for an ID, if one is present they use it, otherwise they insert and use the newly generated ID. This is how I stored each user/channel/word that I parsed out of the incoming MemberInfo object (the objected added by the Main thread). To attempt to minimize some work for the database – only words greater then four characters are counted.
Much to what I thought when I undertook this process, the implementation was actually quite straightforward and easy. In fact, aside from a slight misunderstanding I had with PG I encountered no problems that were not solved by simple reading. Total project development time was about 2 hours. Once I had the application go live on one of the IRC channels I frequent I was also able to gauge PG speed as a database which looks much faster then it would have been had this been MySQL; thus my friends recommendations appear to be true.