Legacy Code: How to Change It Safely and Effectively
Working Effectively with Legacy Code
Legacy code is a term that often evokes negative feelings among software developers. It refers to code that is difficult to change, understand, or maintain, usually because it lacks proper tests, documentation, or structure. Legacy code can be a source of frustration, technical debt, and bugs.
However, legacy code is also a valuable asset for many organizations. It represents years of investment, business logic, and customer feedback. It may be running critical systems or generating revenue. Legacy code is not something that can be easily discarded or replaced without risking functionality, quality, or customer satisfaction.
Therefore, working effectively with legacy code is an essential skill for software developers. It means being able to make changes to legacy code without breaking it, while improving its design, performance, and testability. It also means being able to add new features to legacy code without introducing new problems or increasing complexity.
In this article, we will explore some strategies and techniques for working effectively with legacy code, based on the book Working Effectively with Legacy Code by Michael Feathers. We will cover how to get legacy code into a test harness, how to add features to legacy code, and how to fix bugs in legacy code.
What is Legacy Code?
Before we dive into the strategies and techniques for working effectively with legacy code, let's first define what legacy code is. According to Michael Feathers, legacy code is simply code without tests. He argues that the main problem with legacy code is not its age, size, language, or style, but its lack of tests.
Tests are crucial for working effectively with legacy code because they provide feedback and confidence. Feedback means that you can quickly verify if your changes have the desired effect or if they break something else. Confidence means that you can make changes without fear or hesitation, knowing that you have a safety net of tests to catch any errors.
Without tests, working with legacy code becomes a risky and tedious activity. You have to rely on manual testing, debugging, or code analysis tools to check your changes. You have to spend a lot of time understanding the code and its dependencies before making any changes. You have to be very careful not to introduce new bugs or regressions.
The Challenges of Legacy Code
Working with legacy code poses many challenges for software developers. Some of the common challenges are:
Complexity: Legacy code tends to be complex and convoluted, with many interrelated classes, methods, variables, and parameters. It may also have duplicated code, dead code, unused code, or commented-out code. Complexity makes it hard to understand what the code does and how it works.
Coupling: Legacy code tends to be tightly coupled, with many dependencies between different parts of the system. It may also have external dependencies on libraries, frameworks, databases, or third-party services. Coupling makes it hard to isolate and test individual components of the system.
Documentation: Legacy code tends to have little or no documentation, or documentation that is outdated or inaccurate. It may also have misleading or inconsistent comments, variable names, or method names. Missing or misleading documentation makes it hard to find the intent and rationale behind the code.
The Benefits of Legacy Code
Despite the challenges, working with legacy code also has some benefits for software developers. Some of the benefits are:
Learning: Legacy code provides an opportunity to learn new skills, techniques, and tools for improving code quality, testing, and refactoring. It also provides an opportunity to learn from the mistakes and successes of other developers who wrote the code.
Value: Legacy code represents a valuable business asset that has been proven to work and meet customer needs. It also represents a competitive advantage that can be leveraged or enhanced by adding new features or improving performance.
Satisfaction: Legacy code provides a sense of satisfaction and accomplishment when you manage to make it better, faster, or more reliable. It also provides a sense of pride and ownership when you contribute to the evolution and maintenance of the system.
How to Work Effectively with Legacy Code
Now that we have defined what legacy code is and why it matters, let's look at some strategies and techniques for working effectively with legacy code. The main goal of these strategies and techniques is to enable us to make changes to legacy code in a safe and controlled manner, while improving its quality and testability.
The first step in working effectively with legacy code is to get it into a test harness. A test harness is a set of automated tests that can verify the behavior and functionality of the code. A test harness allows us to make changes to the code with confidence, knowing that we can detect any errors or regressions quickly.
Getting Legacy Code into a Test Harness
The challenge of getting legacy code into a test harness is that legacy code is often not designed or written with testing in mind. It may have dependencies, side effects, or hidden states that make it hard to test in isolation. It may also have long methods, large classes, or deep inheritance hierarchies that make it hard to test in detail.
To overcome this challenge, we need to apply some techniques that can help us break dependencies, identify seams, and write characterization tests. These techniques are described below.
Identifying Seams and Breaking Dependencies
A seam is a place where you can alter the behavior of the code without editing it in that place. For example, a seam can be a method call, an object instantiation, a parameter, or an interface implementation. By identifying seams in the code, we can insert test doubles (such as mocks, stubs, or fakes) that simulate or replace the dependencies of the code under test.
Breaking dependencies means modifying the code in a way that allows us to replace its dependencies with test doubles. For example, breaking dependencies can involve extracting interfaces, adding setters or getters, parameterizing constructors or methods, or using dependency injection frameworks. Breaking dependencies allows us to isolate and test the code under test without invoking its dependencies.
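As an illustration, here is a minimal Python sketch of one dependency-breaking move, parameterizing a constructor. The names OrderProcessor, PaymentGateway, and StubGateway are hypothetical and exist only for this example; a real code base will look different.

```python
class PaymentGateway:
    """Hypothetical stand-in for a dependency that talks to the network."""

    def charge(self, amount_cents: int) -> bool:
        raise RuntimeError("would contact a real payment service")


class OrderProcessor:
    # Assumed starting point: the constructor used to create its own
    # PaymentGateway, so no test could run without the real service.
    # Accepting the gateway as an optional parameter creates a seam.
    # Callers that pass nothing still get the original behavior.
    def __init__(self, gateway=None):
        self._gateway = gateway if gateway is not None else PaymentGateway()

    def checkout(self, amount_cents: int) -> str:
        if amount_cents <= 0:
            return "rejected"
        return "paid" if self._gateway.charge(amount_cents) else "declined"


class StubGateway:
    """Test double that always reports the configured result."""

    def __init__(self, result: bool):
        self._result = result

    def charge(self, amount_cents: int) -> bool:
        return self._result


# In a test, the stub replaces the real gateway through the new seam.
processor = OrderProcessor(StubGateway(result=True))
status = processor.checkout(500)
```

The default argument keeps existing production callers unchanged, which is usually the point of this move: the seam is added without forcing edits elsewhere.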
Writing Characterization Tests
A characterization test is a test that describes the current behavior of the code. It does not verify whether the behavior is correct or desired, only what the behavior actually is. By writing characterization tests, we can capture the existing functionality of the code and use the tests as a baseline for future changes.
Writing characterization tests involves calling the code under test with chosen inputs and recording the outputs it actually produces; a debugger or a code coverage tool can help us choose inputs that exercise the paths we care about. A common approach, described by Feathers, is to start with an assertion that we expect to fail, let the failure message reveal the actual output, and then change the assertion to expect that output. A failing characterization test therefore does not by itself indicate a bug; it shows that our assumption about the behavior was wrong. If the recorded behavior looks incorrect, we note it and investigate it separately instead of changing the code on the spot.
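A small sketch of what such tests can look like, using Python's built-in unittest module. The function format_price is a hypothetical piece of legacy code invented for this example, and the expected strings were taken from what the function actually returns, not from any specification.

```python
import unittest


def format_price(amount):
    """Hypothetical legacy function whose exact rules nobody remembers."""
    text = "$" + str(round(amount, 2))
    if amount > 1000:
        text += "!"
    return text


class FormatPriceCharacterization(unittest.TestCase):
    def test_fractional_amount_is_not_zero_padded(self):
        # Observed, not necessarily desired: 2.5 is rendered without a
        # trailing zero. The test pins this down so that a later change
        # cannot alter it unnoticed.
        self.assertEqual(format_price(2.5), "$2.5")

    def test_amount_above_1000_gets_exclamation_mark(self):
        # Another observed quirk, recorded exactly as the code behaves today.
        self.assertEqual(format_price(1500), "$1500!")
```

Whether the missing trailing zero is a bug is a separate question; the tests only record that this is how the code behaves right now.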
Refactoring Legacy Code
Refactoring is the process of changing the structure of the code without changing its behavior. Refactoring improves the design, readability, and maintainability of the code. It also makes it easier to write more tests and add more features.
Refactoring legacy code involves applying small and safe transformations (such as renaming variables, extracting methods, or introducing polymorphism) that preserve the functionality of the code. We use the characterization tests as a safety net to ensure that we don't break anything while refactoring. We also use tools (such as IDEs or refactoring tools) that automate or assist us with refactoring.
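The following Python sketch shows one such transformation, extracting methods, with the original version kept alongside for comparison. All function names are hypothetical, and the prices are assumed to be integer cents so that the arithmetic stays exact.

```python
def invoice_total_before(items):
    """Original shape: summing and discounting are tangled in one function."""
    total = 0
    for price_cents, quantity in items:
        total += price_cents * quantity
    if total > 10000:
        total = total - total // 10
    return total


def _subtotal(items):
    # Extracted: the summing loop, now readable and testable on its own.
    return sum(price_cents * quantity for price_cents, quantity in items)


def _apply_bulk_discount(total):
    # Extracted: the discount rule, unchanged in behavior.
    return total - total // 10 if total > 10000 else total


def invoice_total_after(items):
    """Refactored shape: same behavior, expressed as two named steps."""
    return _apply_bulk_discount(_subtotal(items))


# Characterization-style check: both versions must agree on every sample.
samples = [[], [(500, 2)], [(6000, 2)], [(10000, 1)], [(2500, 3), (4000, 1)]]
for sample in samples:
    assert invoice_total_after(sample) == invoice_total_before(sample)
```

In practice the old version would be deleted once the tests pass; it is kept here only to make the behavior-preserving claim checkable.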
Adding Features to Legacy Code
The next step in working effectively with legacy code is to add features to legacy code. Adding features means extending or modifying the functionality of the code to meet new or changing requirements. Adding features can be challenging because legacy code may not have a clear or modular structure that supports new functionality. It may also have hidden assumptions or dependencies that can conflict with the new functionality.
To overcome this challenge, we need to apply some techniques that can help us find the change points, apply the sprout and wrap techniques, and use mock objects and fakes. These techniques are described below.
Finding the Change Points
A change point is a place in the code where we need to make a change to implement a new feature. For example, a change point can be a method, a class, a module, or a subsystem. By finding the change points, we can focus our attention and effort on the relevant parts of the code and avoid unnecessary changes.
Finding change points involves using a tool (such as a debugger or a code analysis tool) to trace the execution path of the code related to the new feature. Then, we identify the places where we need to insert, modify, or delete code to implement the new feature. We also identify the places where we need to add tests to verify the new feature.
Applying the Sprout and Wrap Techniques
The sprout and wrap techniques are two ways of adding features to legacy code without changing its existing structure or behavior. They allow us to isolate the new functionality from the old functionality and minimize the risk of breaking something.
The sprout technique involves adding a new method or class that contains the new functionality and calling it from the existing code. For example, if we want to add logging to a legacy method, we can create a new method that does the logging and call it from the legacy method.
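A minimal Python sketch of a sprouted method follows. It uses a different feature from the logging example, removing duplicate entries, because that is easier to assert on. TransactionBatch and its methods are hypothetical names chosen for illustration.

```python
class TransactionBatch:
    def __init__(self):
        self.posted = []

    def post_entries(self, entries):
        # Legacy method: assume it is long, untested, and risky to edit.
        # The only change made to it is the single call on the next line.
        entries = self._without_duplicates(entries)
        for entry in entries:
            self.posted.append(entry)

    @staticmethod
    def _without_duplicates(entries):
        """Sprouted method: new, small, and testable without the legacy code."""
        seen = set()
        unique = []
        for entry in entries:
            if entry not in seen:
                seen.add(entry)
                unique.append(entry)
        return unique


batch = TransactionBatch()
batch.post_entries(["rent", "rent", "salary"])
```

Because the new logic lives in its own method, it can be covered by tests even while the surrounding legacy method remains untested.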
The wrap technique involves creating a new class that wraps the existing class and adds the new functionality. For example, if we want to add caching to a legacy class, we can create a new class that delegates to the legacy class and adds caching logic.
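The caching example could be sketched in Python roughly as follows. LegacyRateService and CachingRateService are hypothetical names, and the call counter exists only so that the effect of the cache is observable.

```python
class LegacyRateService:
    """Hypothetical legacy class; assume each lookup is slow or expensive."""

    def __init__(self):
        self.calls = 0

    def rate_for(self, currency):
        self.calls += 1
        return {"EUR": 1.1, "GBP": 1.3}.get(currency, 1.0)


class CachingRateService:
    """Wrapper with the same method as the legacy class, plus caching."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._cache = {}

    def rate_for(self, currency):
        # Delegate to the legacy object only on a cache miss.
        if currency not in self._cache:
            self._cache[currency] = self._wrapped.rate_for(currency)
        return self._cache[currency]


legacy = LegacyRateService()
service = CachingRateService(legacy)
first = service.rate_for("EUR")
second = service.rate_for("EUR")  # served from the cache
```

The legacy class is not edited at all; callers that want caching are handed the wrapper instead.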
Using Mock Objects and Fakes
Mock objects and fakes are two types of test doubles that can help us test legacy code that has dependencies. They allow us to simulate or replace the dependencies of the code under test and control their behavior and output.
A mock object is an object that mimics the interface of a dependency and verifies that it is called correctly by the code under test. For example, if we want to test a legacy class that sends emails, we can create a mock object that implements the email service interface and checks that it is invoked with the right parameters by the legacy class.
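A short sketch of the email example, using Mock from Python's standard unittest.mock module. WelcomeNotifier and the send method with its keyword arguments are assumptions made for this illustration, not an API from the book.

```python
from unittest.mock import Mock


class WelcomeNotifier:
    """Hypothetical legacy class that depends on an email service."""

    def __init__(self, email_service):
        self._email_service = email_service

    def welcome(self, user_email):
        self._email_service.send(
            to=user_email,
            subject="Welcome",
            body="Thanks for signing up.",
        )


# The mock stands in for the real email service and records every call.
email_service = Mock()
WelcomeNotifier(email_service).welcome("ada@example.com")

# Verification: raises AssertionError unless send was called exactly once
# with exactly these arguments.
email_service.send.assert_called_once_with(
    to="ada@example.com",
    subject="Welcome",
    body="Thanks for signing up.",
)
```

No email is sent; the test checks only the interaction between the class under test and its dependency.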
A fake is an object that provides a simplified or partial implementation of a dependency and returns predefined values or results. For example, if we want to test a legacy class that reads data from a database, we can create a fake object that implements the database access interface and returns hard-coded data instead of querying a real database.
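The database example might look like this in Python. FakeCustomerRepository, CustomerReport, and the find_by_country method are hypothetical names; the fake assumes the real repository exposes the same method.

```python
class FakeCustomerRepository:
    """Fake: keeps rows in memory instead of querying a real database."""

    def __init__(self, rows):
        self._rows = rows

    def find_by_country(self, country):
        return [row for row in self._rows if row["country"] == country]


class CustomerReport:
    """Hypothetical legacy class that reads customers through a repository."""

    def __init__(self, repository):
        self._repository = repository

    def names_in(self, country):
        rows = self._repository.find_by_country(country)
        return sorted(row["name"] for row in rows)


repository = FakeCustomerRepository(
    [
        {"name": "Bernd", "country": "DE"},
        {"name": "Chloe", "country": "FR"},
        {"name": "Anna", "country": "DE"},
    ]
)
german_names = CustomerReport(repository).names_in("DE")
```

Unlike the mock, the fake verifies nothing by itself; the test asserts on what the class under test returns.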
Fixing Bugs in Legacy Code
The final step in working effectively with legacy code is to fix bugs in legacy code. Fixing bugs means correcting errors or defects in the code that cause incorrect or unexpected behavior. Fixing bugs can be difficult because legacy code may not have tests that can reproduce or isolate the bug. It may also have side effects or interactions that can cause new bugs when fixing an existing one.
To overcome this challenge, we need to apply some techniques that can help us pinpoint the bug location, write a failing test case, and apply the scratch refactoring technique. These techniques are described below.
Pinpointing the Bug Location
Pinpointing the bug location means finding the exact place in the code where the bug occurs. For example, the bug location can be a line of code, a variable assignment, a method call, or a conditional statement. By pinpointing the bug location, we can focus our attention and effort on the root cause of the bug and avoid unnecessary changes.
Pinpointing the bug location involves using a tool (such as a debugger or a code analysis tool) to trace the execution path of the code related to the bug. Then, we identify the place where the code deviates from the expected behavior or produces an incorrect output. We also identify the inputs and outputs of the code at that point.
Writing a Failing Test Case
Writing a failing test case means writing a test that reproduces the bug and fails because of it. For example, the test can feed in the inputs we identified when pinpointing the bug location and assert the outputs that correct code would produce. By writing a failing test case, we can verify that we have correctly understood and isolated the bug, and we can use the test as a guide for fixing it.
Writing a failing test case involves using a testing framework (such as JUnit, NUnit, or TestNG) to write a test that invokes the code under test with the inputs that trigger the bug. Then, we write assertions that check for the expected outputs or behavior. We run the test and check if it fails because of the bug.
Applying the Scratch Refactoring Technique
Scratch refactoring, as Feathers describes it, is a way of understanding legacy code rather than a way of changing it. We refactor a throwaway copy of the code freely, without worrying about tests, to learn how it works and where the bug originates. The scratch changes are then discarded, not merged back.
Applying the scratch refactoring technique involves checking out the code under test into a separate branch or workspace. There, we rename variables, extract methods, and move code around until the logic around the bug is clear. Because no tests protect these changes, we throw the scratch copy away once we understand the problem. We then return to the original code and make the smallest change that makes the failing test case pass; unlike a refactoring, this fix deliberately changes behavior. Finally, we run all tests to verify that nothing else broke, and we add further tests and refactorings as needed to improve the quality and testability of the code.
Conclusion
In this article, we have learned some strategies and techniques for working effectively with legacy code, based on the book Working Effectively with Legacy Code by Michael Feathers. We have covered how to get legacy code into a test harness, how to add features to legacy code, and how to fix bugs in legacy code.
Working effectively with legacy code is an essential skill for software developers because legacy code is a common and valuable reality in many software projects. By applying these strategies and techniques, we can change legacy code with confidence while improving its design, performance, and testability. We can add new features without introducing new problems or increasing complexity, and we can fix bugs without breaking existing functionality.
Working effectively with legacy code benefits not only our software systems but also ourselves. It allows us to learn new skills, techniques, and tools for improving code quality, testing, and refactoring. It lets us deliver value to our customers and stakeholders by enhancing or maintaining their software systems, and it brings the satisfaction of making legacy code better, faster, or more reliable.
FAQs
What is legacy code? Legacy code is simply code without tests. It refers to code that is difficult to change, understand, or maintain, usually because it lacks proper tests, documentation, or structure.
Why is working effectively with legacy code important? Working effectively with legacy code is important because legacy code is a common and valuable reality in many software projects. It represents years of investment, business logic, and customer feedback. It may be running critical systems or generating revenue.
How can I get legacy code into a test harness? You can get legacy code into a test harness by applying some techniques that can help you break dependencies, identify seams, and write characterization tests. These techniques include extracting interfaces, adding setters or getters, parameterizing constructors or methods, using dependency injection frameworks, using debuggers or code coverage tools, and writing tests that use the inputs and outputs of the code under test as assertions.
How can I add features to legacy code? You can add features to legacy code by applying some techniques that can help you find the change points, apply the sprout and wrap techniques, and use mock objects and fakes. These techniques include using debuggers or code analysis tools, creating new methods or classes that contain the new functionality and calling them from the existing code, creating new classes that wrap the existing classes and add the new functionality, and using test doubles that simulate or replace the dependencies of the code under test.
How can I fix bugs in legacy code? You can fix bugs in legacy code by applying some techniques that can help you pinpoint the bug location, write a failing test case, and apply the scratch refactoring technique. These techniques include using debuggers or code analysis tools, writing tests that invoke the code under test with the inputs that trigger the bug and check for the expected outputs or behavior, refactoring a throwaway copy of the code to understand it, and then making a small fix in the original code that makes the failing test pass.