Refactoring vs. Rewriting: How to Decide

A framework for deciding whether a system needs a localized refactor or a complete rewrite - and how to avoid the framework rewrite trap.

You read a book on software architecture over the weekend, and on Monday you open a core billing module with the intention of cleaning it up. You start renaming variables. Then you extract a 500-line method into five smaller ones. Because the state is tightly coupled, you have to update the function signatures in three other files.

You do this on the main branch. You don't write tests first because you assume you know the workflows. On Tuesday, production goes down because you missed a null check hidden in the original spaghetti code. You freeze deployments for three days while the team reverts the commits.

The 'Delete' key is a dangerous drug

Clean code is not a style preference. When you change the structure of a system without validating its behavior, you break the system.

To avoid bringing down production, you need a framework for deciding whether a system needs a localized refactor or a complete rewrite. You evaluate this by looking at two things: the quality of the internal structure, and the accuracy of the business logic.

If the application calculates shipping rates correctly but the codebase relies on six levels of nested callbacks and global variables, the business logic is sound. You refactor. You write characterization tests around the inputs and outputs of the shipping calculator to lock in the behavior, then you swap out the internals.
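As a rough illustration of that workflow, here is a minimal sketch of a characterization test. The `legacy_shipping_rate` function, its rates, and the input lists are all hypothetical stand-ins for whatever your real calculator does; the point is only the shape: record what the old code returns, then require the new internals to return the same thing.

```python
import itertools

# Hypothetical stand-in for the legacy calculator. In a real codebase you
# would import the existing function rather than define it here.
_RATES = {"domestic": 5.0, "international": 15.0}  # global state, as in the legacy code


def legacy_shipping_rate(weight_kg, zone):
    if weight_kg is None or weight_kg <= 0:
        return 0.0
    base = _RATES.get(zone, 25.0)  # unknown zones silently fall back to 25.0
    if weight_kg > 20:
        return round(base + weight_kg * 1.5, 2)
    return round(base + weight_kg * 0.5, 2)


def refactored_shipping_rate(weight_kg, zone, rates=None):
    # Hypothetical refactor: same behavior, but the rate table can be injected.
    rates = _RATES if rates is None else rates
    if weight_kg is None or weight_kg <= 0:
        return 0.0
    per_kg = 1.5 if weight_kg > 20 else 0.5
    return round(rates.get(zone, 25.0) + weight_kg * per_kg, 2)


def record_behavior(fn, weights, zones):
    """Run fn over every input combination and record what it returns."""
    return {(w, z): fn(w, z) for w, z in itertools.product(weights, zones)}


# Inputs chosen to hit the boundaries, including the ones nobody documented.
WEIGHTS = [None, 0, 0.5, 20, 20.01, 100]
ZONES = ["domestic", "international", "unknown"]

# Step 1: capture current behavior before touching the internals.
golden = record_behavior(legacy_shipping_rate, WEIGHTS, ZONES)

# Step 2: after the refactor, the recorded behavior must be unchanged.
assert record_behavior(refactored_shipping_rate, WEIGHTS, ZONES) == golden
```

In practice you would probably save the recorded outputs to a file and check them from your test runner, so the baseline survives after the old implementation is deleted.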

If the system is a legacy inventory tracker where the code is perfectly tested and modular, but the underlying assumptions are wrong - for example, it assumes single-warehouse fulfillment but the company just opened three new distribution centers - the logic is solving the wrong problem. You don't refactor code that solves the wrong problem. You rewrite.

The most common trap teams fall into is the framework rewrite. Your team blames the old routing library for the application's instability, so you spin up a new repository in a modern framework. Six months later, you discover the actual problem was a race condition in the database layer. You have just ported the race condition to a new technology stack.

Legacy code is a repository of discovered edge cases. That cryptic regex parsing user inputs exists because a specific enterprise client sends malformed data every Friday. If you throw away the code without understanding it, you throw away the knowledge, and you will have to relearn every edge case in production.

If you decide to rewrite, look at your test coverage first. If you cannot write a test suite that captures the current system's behavior, you cannot safely rewrite it. With no executable definition of what the old system actually does, you will spend months chasing feature parity.
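One way to make "captures the current system's behavior" concrete is a parity report: replay recorded inputs through both the old and the new implementation and list every disagreement. The sketch below assumes a hypothetical quantity parser with a lenient regex, and a hypothetical rewrite that looks cleaner but drops the leniency; none of these names come from a real system.

```python
import re


def legacy_parse_quantity(raw):
    # Hypothetical legacy parser: the regex tolerates inputs like "12 pcs" or " 7,".
    match = re.match(r"\s*(\d+)", raw)
    return int(match.group(1)) if match else 0


def rewritten_parse_quantity(raw):
    # Hypothetical rewrite: simpler to read, but stricter about its input.
    try:
        return int(raw)
    except ValueError:
        return 0


def parity_report(old, new, cases):
    """Return (input, old_result, new_result) for every case where the two disagree."""
    report = []
    for case in cases:
        old_result, new_result = old(case), new(case)
        if old_result != new_result:
            report.append((case, old_result, new_result))
    return report


# Assumed to be sampled from real traffic; invented here for illustration.
RECORDED_INPUTS = ["3", " 7,", "12 pcs", "", "abc"]

mismatches = parity_report(legacy_parse_quantity, rewritten_parse_quantity, RECORDED_INPUTS)
for raw, old_result, new_result in mismatches:
    print(f"{raw!r}: old={old_result} new={new_result}")
```

Each mismatch is either an intended behavior change or an edge case the rewrite forgot. Either way, you find out before production does.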

If you decide to refactor, don't isolate the work on a long-lived branch. You will spend six weeks restructuring the data layer, only to discover that the rest of the team merged forty conflicting pull requests into main in the meantime. You will spend more time resolving merge conflicts than you did writing the new logic. Ship the new structure alongside the old one, hide it behind a feature flag, and migrate the traffic incrementally.
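A minimal sketch of that last step, assuming a percentage-based flag and a stable hash of the user id to pick who gets the new path. The function names, the `ROLLOUT_PERCENT` value, and the two data-layer stubs are hypothetical; a real setup would typically read the percentage from configuration or a flag service.

```python
import hashlib

ROLLOUT_PERCENT = 10  # hypothetical flag value; normally read from config


def bucket(user_id):
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_new_path(user_id, rollout_percent):
    # The same user always lands in the same bucket, so raising the
    # percentage only ever adds users to the new path.
    return bucket(user_id) < rollout_percent


def old_data_layer(order_id):
    # Stand-in for the existing implementation.
    return {"order": order_id, "source": "old"}


def new_data_layer(order_id):
    # Stand-in for the refactored implementation, merged to main but dark by default.
    return {"order": order_id, "source": "new"}


def load_order(user_id, order_id, rollout_percent=ROLLOUT_PERCENT):
    if use_new_path(user_id, rollout_percent):
        return new_data_layer(order_id)
    return old_data_layer(order_id)
```

Setting the percentage to 0 sends all traffic back to the old path without a deploy or a revert, which is the property the long-lived branch never gives you.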

You know the code is bad, and you know how to fix it. But rewriting a core service means halting product feature delivery for a quarter. If you walk into a planning meeting and tell your product manager you need three months to improve code quality, they will say no. In part three of this series, we look at how to get a product manager to actually say yes to three months of zero new features.