Common Misconception #1 – Entity Framework Needs a Data Layer

This is the first post in what I hope to be a long running series on common misconceptions I come across in my day to day as a developer and architect in the .NET space though, some of the entries will be language agnostic. The goal is to clear up some of the more common problems I find teams get themselves into when building applications.

A little about me, I am a former Microsoft MVP and have been working as a consultant in the .NET space for close to 15yrs at this point. One of the common tasks I find myself doing is helping teams develop maintainable and robust systems. This can be from the standpoint of embracing more modern architecture such as Event Driven systems or using containers or it can be a modernization of the process to support more efficient workflows that enable teams to deliver more consistent and reliable outcomes while balancing the effort with sustainability.

The first misconception is one which I run across A LOT. And that is the way in which I find teams leveraging Entity Framework.

Entity Framework is a Repository

One of the most common data access patterns right now is the Repository pattern – Link. The main benefit is that it enables developers to embrace the Unit of Work technique which results in simpler more straightforward code. However, too often I see teams build their repository and simple create data access methods on the classes – effectively creating a variant of the Active Record or Provider pattern with the name Repository.

This is incorrect and diminishes much of the value the Repository pattern is design to bring mainly, that operations can work with data in memory as if they were talking to the database and save their changes at the end. Something like this:

Repository patterns works VERY well with web applications and frameworks like ASP .NET because we can SCOPE the database connection (called Context in Entity Framework) to the request allowing our application to maximize the connection pool.

In the above flow, we only talk to the database TWO times despite the operations, everything is done in memory and the underlying framework will handle the details for us. Too often I see code like such:

public async Task<bool> DoWork(IList<SomeItem> items)
{
foreach (var item in items.Where(x => x.Id % 2 == 0))
{
await _someRepo.DeleteItem(item.Id);
}
}
view raw bad-repo.cs hosted with ❤ by GitHub

This looks fairly benign but it is actually quite bad as it machine guns the database with each Id. In a small, low traffic application this wont be a problem but, in a larger site with high volume this is likely to cause bottlenecks, record locking, and other problems. How could this be written better?

// variant 1
public async Task<bool> DoWork(IList<SomeItem> items)
{
// assume _context is our EF Context
foreach (var item in items.Where(x => x.Id % 2 == 0))
{
var it = await _context.FirstOrDefaultAsync(x => x.Id == item.Id);
_context.Remove(it);
}
await _context.SaveChangesAsync();
}
// variant 2
public async Task<bool> DoWork(IList<SomeItem> items)
{
// assume _context is our EF Context
var targetItems = await _context.Items.Where(
x => items.Where(x => x.Id % 2 == 0).Contains(x.Id)).ToListAsync());
foreach (var item in targetItems)
{
_context.Remove(item);
}
await _context.SaveChangesAsync();
}
view raw good-repo.cs hosted with ❤ by GitHub

In general, reads are less a problem for locking and throughout that write operations (create, update, delete) so, reading the database as in Variant 1 is not going to be a huge problem right away. Variant 2 leans on EF for SQL Generation to create a query which gets our items in one shot.

But the key thing to notice in this example is the direct use of the context. Indeed, what I have been finding is I dont create a data layer at all and instead allow Entity Framework to be the data layer itself. This opens a tremendous amount of possibilities as we can then take a building block approach to our service layer.

Services facilitate the Operation

The term “service” is horrendously overused in software engineering as it applies to so many things. In my case, I am using it to describe the classes which do the thing. Taking a typical example application here is how I prefer to organize things:

  • Controller – the controller is the traffic cop determining if the provided data meets acceptable criteria such that we can accept that request. There is absolutely no business logic here HOWEVER, for simple reads we may choose to inject our Context to perform those reads
  • Service – the guts of the application, this contains a variety of services varying in size and types. I try to stick with the Single Responsibility Principle in defining these classes. At a minimum we have a set of facilitators which facilitate a business process (we will cover this next) and other smaller services which are reusable blocks.
  • Data Layer – this is the EF context. Any custom mapping or definitions are written here

The key feature of a facilitator is the call to SaveChanges as this will mark the end of the Unit of Work. By taking this approach we get a transaction for free since the code can validate the data as it places it into the context, instead of waiting for a SQL Exception to indicate a problem.

By taking this approach, code is broken into reusable modules which can be reinjected and reused, plus it is VERY testable. This is an example flow I wrote up for a client:

Here the Process Payment Service is the facilitator and calls on the sub-services (shaded in blue). Each of these gets a context injection but, since the context is scoped each gets the same one. This means everyone gets to work with what is essentially their own copy of the database during their execution run.

The other benefit this approach has is avoid what I refer to as service wastelands. These are generic service files in our code (PersonService, TransactionService, PaymentService, etc) which becoming dumping grounds for methods – I have seen some of these files have upwards of 100 methods. Teams need to avoid doing this because the file becomes so long that ensuring uniqueness and efficiency among the methods becomes an untenable task.

Instead, teams should focus on creating purpose driven services which either facilitate a process or contain core business logic that may be reused in the code base. Combined with using Entity Framework as the data layer, code becomes cleaner and more straightforward.

What are Exceptions?

So, am I saying you should have NO Data Layer ever? No. As with anything this not black and white and there are cases for a data layer of sorts. For example, some queries to the database are too complex to put into a LINQ statement and developers will need to resort to SQL. For these cases, you will want to have a wrapper around the call for both reuse and maintenance.

But, do not take that to mean you need a method to ensure you do not rewrite FirstOrDefault in two or more spots. Of course, if you have a particularly complex LINQ query you might chose to hide it. However, keep in mind the MAIN REASON to hide code is to avoid requiring another person to have certain intimate knowledge of a process to carry out the operation. It is NOT, despite popular opinion, to avoid duplication (that is an entirely separate issue I will discuss later).

Indeed, the reason you should be hiding something is because it is complex in nature and error prone in its implementation such that problems could arise later. A simple Id look up does not fall into this category.

Conclusion

The main point I made here is Entity Framework IS an implementation of the Repository pattern and so, placing a repository pattern around it is superfluous. ASP .NET Core contains methods to ensure the context is scoped appropriately and disposed of with the end of a request. Leverage this and use the context directly in your services and lean on the Unite of Work pattern while treating the Context as your in-memory database. Let Entity Framework take responsibility for updating the database when you are complete.

4 thoughts on “Common Misconception #1 – Entity Framework Needs a Data Layer

  1. Thanks for the very good post, Jason. Using the EF context as the data layer is indeed a good idea. The software architect (and perhaps some developers working with him) of a team I was once on came to this same conclusion. We were using a repository on top of our EF implementation and he realized that this was totally unnecessary and counterproductive. He ripped out all of the repository code and we started using the context directly, through a service layer, in the manner you described.

    Like

Leave a comment