Common Misconception #4 – Duplication is bad

Perspectives on duplication are as varied and wide-ranging as on any topic in software development. Newer programmers are often fed acronyms such as DRY (Don't Repeat Yourself) and constantly bombarded with warnings from senior programmers about the dangers of duplication. I can even recall getting these lessons when I was still in academia.

So is duplication actually bad? The simple answer is ‘Yes’, but the real answer is much more nuanced and boils down to one of the most common phrases in software engineering: ‘it depends’. But what does it depend on? How can teams keep themselves from going off the deep end and introducing leaky abstractions or needless complexity, all in the name of adhering to DRY?

It is about the Lifecycle

The moment a block of code is authored, it becomes something which must be maintained. I once heard a senior engineer remark “there is no such thing as new code. There is only legacy code and code which is not yet written”. So, once code is written, we must immediately begin discussing its lifecycle.

Code evolves over time, and this evolution should be encouraged – it is exactly what we pursue when we decouple code and promote cohesion. Too often, the reason something is “shared” is that we created a Person class in one system and felt that creating a second Person class in another system would be duplicative. In making this assumption, developers unknowingly increase system coupling, creating a far greater problem – one they could have avoided by considering the “lifecycle” of each Person class.
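To make that concrete, here is a minimal sketch of the shared-class trap. The system names (Scheduling, Billing) and the properties are invented for illustration:

```csharp
using System;
using System.Collections.Generic;

// One Person class, published as "shared" and referenced by both the
// Scheduling system and the Billing system.
public class Person
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";

    // Added because Billing needed it; Scheduling now carries it too.
    public string TaxId { get; set; } = "";

    // Added because Scheduling needed it; Billing now carries it too.
    public List<DateTime> Availability { get; set; } = new();
}
```

Every change either system needs now ships to both – the two systems are coupled through a class that only looked like duplication.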

More prominently, this notion gets applied to business logic and, in that case, it is correct: certain business logic absolutely has a single, standard lifecycle. In fact, every line of code you write has a lifecycle, and the lifecycle is what you should use to decide whether duplicating something makes sense.

An example

When I was working as a Principal at West Monroe Partners some years ago, I was assigned to a project in which a combination of missteps had hampered team progress and efficiency. One of these was a rather insane plan to share database entities through a NuGet package.

Database entities, in theory, do not change that often once established – but that is theory. In practice, and especially while a system is being actively developed, they change constantly. That was certainly true on this project, which had three active development teams all using the same database. The result was near-constant updates across every project whenever a change was made – and missing one would often manifest as an error in a deployed environment, since Entity Framework would complain that the schema it expected did not match the database.
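Here is a rough sketch of how that failure plays out; the package and property names are hypothetical:

```csharp
// Lives in the shared NuGet package (say, Contoso.Data.Entities).
public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }

    // Team A adds this property, migrates the database, and publishes a new
    // version of the package. Any service still referencing the old version
    // now has an entity model that no longer matches the actual schema, and
    // Entity Framework surfaces the mismatch at runtime – in a deployed
    // environment – rather than at compile time.
    public string PromoCode { get; set; } = "";
}
```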

While the team's instinct to reduce duplication by sharing entities was understandable, it is a high-risk move in complex systems. In the best case, you end up with bloated class definitions and API calls where returned objects may or may not have all fields populated. This becomes even more true if you approach system design with a microservice mindset, as each service should own its entities (unless you are sharing the database, which is a different problem altogether).
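That “best case” looks something like this – a sketch with invented names, where no caller can be sure which fields a given API actually filled in:

```csharp
using System;
using System.Collections.Generic;

// The shared, bloated definition every API returns.
public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public string? ShippingAddress { get; set; }   // populated by the Orders API only
    public decimal? CreditLimit { get; set; }      // populated by the Billing API only
    public List<string>? Preferences { get; set; } // populated by the Profile API only
}

// The service-owned alternative: Billing models only what Billing needs.
public class BillingCustomer
{
    public Guid Id { get; set; }
    public string Name { get; set; } = "";
    public decimal CreditLimit { get; set; }
}
```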

Should all code be segregated then?

The short answer is “No”. Again, we return to the point about lifecycle. In fact, this relates to a core principle of microservice design: services and their lifecycles are independent of each other. Spelled out, “no service should be reliant on another service in deployment” – if this rule is broken, the advantages of microservices are effectively lost. The lifecycle of each service must be respected.

It is the same with code. Code lifecycles must be understood and respected. Just because you define two Person classes does not mean you have created duplication, even if the definitions are identical. You are giving yourself the ability to change each over time according to the needs of its system.
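As a sketch (the namespaces are invented), the two definitions might start out identical and then diverge on their own schedules – and that divergence is the point:

```csharp
using System;
using System.Collections.Generic;

namespace Scheduling
{
    public class Person
    {
        public Guid Id { get; set; }
        public string Name { get; set; } = "";

        // Months later, Scheduling adds availability. Billing is untouched.
        public List<DateTime> Availability { get; set; } = new();
    }
}

namespace Billing
{
    public class Person
    {
        public Guid Id { get; set; }
        public string Name { get; set; } = "";

        // Billing later adds tax details on its own schedule.
        public string TaxId { get; set; } = "";
    }
}
```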

Some code – logging code, or perhaps a common set of POCO classes – may genuinely need to be shared, and this is where I tend to lean on custom NuGet feeds. But generally this is a last resort, as it is easy to go overboard with things like NuGet and fall into the “pad left” problem – where you decompose everything so much that you wind up with an extensive chain of dependencies which all need to be revved for a release.
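When I do share, it is for small, stable, cross-cutting pieces whose lifecycle really is common to every consumer. A minimal sketch of the sort of thing I would publish to a private feed (the names are hypothetical):

```csharp
using System;

namespace Contoso.Common.Logging
{
    // Small, stable, and cross-cutting: a shape every service agrees on
    // and that changes rarely - a reasonable candidate for a shared package.
    public record LogEntry(DateTime Timestamp, string Level, string Message, string Source);

    public static class Log
    {
        public static LogEntry Info(string message, string source) =>
            new(DateTime.UtcNow, "INFO", message, source);
    }
}
```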

As with most things, there is a balance to strike here, and you should not expect to get it right immediately – frankly, the same lesson applies to microservices: you never start with microservices; you carve out new services as needed.

Why is it a misconception?

I find that the concept of DRY is overused and, more often, taken far too literally. I think we can all agree that what DRY really means is ensuring we don't need to update the same logic in multiple places. DRY is not telling us that having two Person classes is bad. It can be bad, but whether it is depends on circumstances; it is not a hard and fast rule.

The misconception is dangerous because strict adherence to DRY can actually make our code LESS maintainable, sacrificing clarity for reduced keystrokes. As developers, we need to constantly evaluate whether centralization and abstraction make sense, or whether we are overthinking the problem and taking DRY too literally.
