Abstraction vs Duplication. Why DRY is bad advice

In this post, I want to challenge the DRY mantra.


Don’t Repeat Yourself!

This is something that we are fed on a seemingly continual basis.

No matter how you came into software engineering, whether through a computer science degree, or through less formal training, you have probably heard many people exclaim that we should not repeat ourselves.

The DRY principle is stated as

“Every piece of knowledge must have a single, unambiguous, authoritative representation within a system”.

If for some reason, this has bypassed you, or if you haven’t read the book, then I would encourage you to checkout The Programatic Programmer, by Andrew Hunt and David Thomas.


There are some problems though with the way that DRY is often interpreted that I want to highlight.

It’s something that I see come up frequently to the detrement of good software design. It’s something that can cause more issues than it solves despite coming from a place of good intention.

If we go back to the wikipedia definition of DRY, we can see it talks about DRY vs WET solutions

Violations of DRY are typically referred to as WET solutions, which is commonly taken to stand for either “write everything twice”, “we enjoy typing” or “waste everyone’s time”.

This sounds fine in theory, but the problem is when we seek to avoid to “write everything twice”, sometimes this can become never write anything twice. And this is where I think the problem lies.

Instead of us seeking to avoid encoding the same knowledge in more than one place, it often manifests itself as never write the same or similar looking code twice.

And this causes problems when we have similar looking code, similar business processes in our code, similiar sequences of method or funciton calls that are only similar, not the same.

In my view, attempting to coerce them into some common place, so that we don’t have similar looking code, or the same line of code in two different places is the wrong thing to do. Far too often I see developers prematurely create abstractions out of fear of having duplicated code.

The problem is that these abstractions are often very weak abstractions, or only appear to be a meaningful abstraction at surface level.

I want to look at a couple of examples of this in action, firstly in application code, and then in test code.

Here is an example of some application code that certainly suffers from over abstraction, of being too DRY.

package com.seriouscompany.business.java.fizzbuzz.packagenamingpackage.impl.strategies;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import com.seriouscompany.business.java.fizzbuzz.packagenamingpackage.impl.strategies.adapters.LoopContextStateRetrievalToSingleStepOutputGenerationAdapter;
import com.seriouscompany.business.java.fizzbuzz.packagenamingpackage.interfaces.loop.LoopContextStateRetrieval;
import com.seriouscompany.business.java.fizzbuzz.packagenamingpackage.interfaces.loop.LoopPayloadExecution;
import com.seriouscompany.business.java.fizzbuzz.packagenamingpackage.interfaces.strategies.OutputGenerationStrategy;

@Service
public class SingleStepPayload implements LoopPayloadExecution {

	private final OutputGenerationStrategy _outputGenerationStrategy;

	@Autowired
	public SingleStepPayload(final OutputGenerationStrategy _outputGenerationStrategy) {
		super();
		this._outputGenerationStrategy = _outputGenerationStrategy;
	}

	@Override
	public void runLoopPayload(final LoopContextStateRetrieval stateRetrieval) {
		final LoopContextStateRetrievalToSingleStepOutputGenerationAdapter adapter =
				new LoopContextStateRetrievalToSingleStepOutputGenerationAdapter(stateRetrieval);
		this._outputGenerationStrategy.performGenerationForCurrentStep(adapter);
	}

}

 

This is from a version of the classic FizzBuzz problem.

It’s intended to poke fun at the common ‘enterprise’ Java code that exists, but it’s not too far removed from reality. It has all of the hallmarks of developers being too eager to generalise, to introduce abstractions.

Here’s an example of the same kind of problem from a different project, but this time in test code.

public class IntegrationTestUtil {

    public static final int AVAILABLE_SEATS = 20;


    public static Pair<Event, String> initEvent(List<TicketCategoryModification> categories,
                                                OrganizationRepository organizationRepository,
                                                UserManager userManager,
                                                EventManager eventManager) {

        String organizationName = UUID.randomUUID().toString();
        String username = UUID.randomUUID().toString();
        String eventName = UUID.randomUUID().toString();

        userManager.createOrganization(organizationName, "org", "email@example.com");
        Organization organization = organizationRepository.findByName(organizationName).get(0);
        userManager.insertUser(organization.getId(), username, "test", "test", "test@example.com", Role.OPERATOR, User.Type.INTERNAL);
        userManager.insertUser(organization.getId(), username+"_owner", "test", "test", "test@example.com", Role.OWNER, User.Type.INTERNAL);

        LocalDateTime expiration = LocalDateTime.now().plusDays(5).plusHours(1);

        Map<String, String> desc = new HashMap<>();
        desc.put("en", "muh description");
        desc.put("it", "muh description");
        desc.put("de", "muh description");

        EventModification em = new EventModification(null, Event.EventType.INTERNAL, "url", "url", "url", "url", null,
                eventName, "event display name", organization.getId(),
                "muh location", desc,
                new DateTimeModification(LocalDate.now().plusDays(5), LocalTime.now()),
                new DateTimeModification(expiration.toLocalDate(), expiration.toLocalTime()),
                BigDecimal.TEN, "CHF", AVAILABLE_SEATS, BigDecimal.ONE, true, Collections.singletonList(PaymentProxy.OFFLINE), categories, false, new LocationDescriptor("","","",""), 7, null, null);
        eventManager.createEvent(em);
        return Pair.of(eventManager.getSingleEvent(eventName, username), username);
    }

}

 

This is some test code taken from an open source project that I won’t name, and it’s a little different to the application code example. This time, we are looking at some so called ‘helper’ code that is used by many different tests.

There are less technical abstractions present, but there is still a high degree of coupling. We have common test setup methods that at first glance seem like they might provide a benefit to us. After all, it saves us from having to do all of that same setup in each test method.

The problem with this is though, that far too much setup has made it’s way into this common method. Every line of that shared test setup code is relevant to one or more of the test methods, but not all of that setup is revelant to every test.

This creates brittle and fragile tests, where for any given test method, we don’t know what input is relevant to the scenario being tested.

And, if we need to have more or slightly different input data for a particular test, then if we add to or modify that shared test setup, we don’t know what effect that might have on the other tests using this setup method.

I have seen test code with a large number of shared test setup helper methods get to the point over time that these shared methods have been modified so much, some of the tests are no longer testing what they original were, because of the lack of clarity of the relevant input data for each test.


To summarise the examples, they both suffer from the problem of developers being too keen on not duplicating code.

This has been taken to the point of not wanting to duplicate the same or similar set of method calls in a codebase, and has resulted in abstractions exisitng not to encapsulate something from our problem domain, but to simply avoid code duplication.

These are vague abstractions that increase complexity.

This kind of coupling can result in:

  • Developers being fearful of changing the common code in case it breaks some unknown part of the system.
  • Excessive parameters being added to methods to control different branches or use cases of the shared code.
  • Logic relating to a particular use or case or area of business logic becomes fragmented, and distributed between classes.

Signs that we should be duplicating code

Thankfully though, there are some things to look for in our codebase that could indicate we are creating an artificial abstraction that might increase rather than reduce our complexity.

These traits in your codebase don’t necessarily mean you have this problem, but should be indicators you need to pause for a minute and work out whether you are heading in this direction

Some of these are:

  • Classes with only one methods
  • Classes with methods that are named pretty much the same as the class itself
  • Big if statements in methods with a parameter used to control which branch executes
  • Classes named after what they do rather than what they represent i.e. named after the code they call, and typically contain suffixes such as ‘Orchestrator’, ‘Manager’, ‘Utility’, ‘Helper’ etc
  • Abstractions that exist for ‘future’ benefit e.g. to allow a part of the system to be swapped out in future.
  • Large amounts of configuration required just to wire up dependencies

So, what should we do?

So to conclude, if you see yourself spotting traits like this in your codebase, then I would recommend you start off by duplicating code instead of creating abstractions.

Then, once you have a good understanding of how the different parts of your codebase use the duplicated code, you can decide whether or not there is a genuine common abstraction that should exist.

Sometimes there may well be a valid abstractions that should exist to encapsulate something about your problem domain and it’s behaviour.

Often though, we don’t actually need to introduce new abstractions. Remember that they are not cost free. They introduce coupling and complexity.

In my experience it’s almost always cheaper and easier to fix the codebase, starting from a point of having a small amount of duplicated code rather than try to deal after the fact with high complexity and coupling with the wrong abstractions in place where you have to try and unpick the shared code.

Duplication is far far cheaper than the wrong abstraction.

Leave a Reply