I like to think of a legacy system as code that’s running in production and still works fine, but that relies on older designs or techniques no longer in common use. Legacy isn’t a label that means something is broken or retired. It means the software still works but might not be modifiable for some reason. For example, the software may only run on an older version of Linux or Windows Server because the language or libraries it depends on have been deprecated.
When I say the code isn’t modifiable or updateable, this can be for a variety of reasons: the language it was built in may no longer be generally available, or the code may not build on a more modern machine. Possibly your company is already in the middle of updating or rewriting the legacy system, or has lost the source code over the years. I’ve spent time trying to decompile code; sometimes this works really well, and other times the decompile completes but the resulting code is impossible to read.
In his book Working Effectively with Legacy Code, Michael Feathers writes, “To me, legacy code is simply code without tests.” I don’t completely agree with Michael’s definition, but I do agree that a lot of legacy systems don’t have tests, and they usually have very little documentation. I believe a large part of the reason systems don’t change is that changing them becomes too difficult. A lack of tests is part of the problem, but I don’t think it’s the whole cause.
I began my first programming contracts in 2003. Back then, Microsoft was first announcing that it was going to end support for Visual Basic in favour of Visual Basic .NET or C#. I spent a lot of the next five years helping companies plan their migration and then implement the plan. Rewriting an application entirely usually fails; it needs to be done in pieces, which is really difficult when changing languages and environments. I believe that most of my career and most of your career will involve working with legacy systems, whether you want to or not.
So, how does code become legacy?
Code becomes legacy for a variety of reasons, but usually the reasons are related to a lack of regular maintenance or software rot. Anything that stops development or slows it down will cause legacy systems to become obsolete.
Lack of Regular Maintenance
Usually code doesn’t get maintained because it’s difficult to get running or because it’s become so complicated that it’s hard to work with. Both of these types of “code rot” happen over time, usually over years, and things continue to get worse.
Lack of Tests
Nobody wants to change a system that has no documentation, no automated tests, and takes hours to test manually. The risk of breaking things grows as the system grows and there’s more code to change.
Source Code No Longer Available
People leave companies and hardware dies. Over time, companies sometimes lose access to the source code of a vital program. The end result is a system in production that can’t be changed because there’s no code.
Make sure you always use source control and that the source control system is backed up so this can be avoided in the future.
How do we deal with legacy code?
Having the right attitude when dealing with legacy code is essential. Reading through the code and thinking that it’s terrible, or that you could have written something better, isn’t the right attitude. Legacy code isn’t your enemy. Most likely you wouldn’t be able to work for the company if this system weren’t in production, and the company is probably built around it. Instead of a negative attitude, we need to look at it from a positive perspective.
The software developer who wrote the original system may not have had access to the development community, blogs, YouTube videos, and documentation that are available now. Software developers 5, 10, or 20 years ago usually weren’t able to just throw more hardware at a problem and see it quickly resolved. Working with legacy code is usually a tremendous learning opportunity. Developers who worked on legacy systems often wrote really tight code that had to perform well in low-memory, low-disk environments. A lot of the SQL optimization and performance hacks I’ve learned came from working with older, legacy systems or reading older books. I still routinely refer to my copy of The Guru’s Guide to SQL Server, as it’s still full of golden nuggets even though the author died years ago and the books were last updated in 2004.
Understand the Architecture
Understanding the architecture of the system is essential, so take the time to read through any documentation. If there’s no documentation, take the time to draw some rough sketches of the different tiers and layers of the app, application module diagrams, and entity-relationship diagrams. Keep in mind that reverse engineering source code to understand the architecture is incredibly inefficient and won’t be 100% accurate, so when making changes, try to keep the docs updated, or create them if they don’t exist.
Write Tests
If the language and environment support automated testing, try to get any existing tests running, and write some if none exist. Read through the tests, as they are usually the easiest way to understand the system and how it actually works.
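One way to start when there are no tests is to pin down what the code does today, before deciding what it should do. The sketch below assumes a hypothetical function, legacy_shipping_cost, standing in for an undocumented piece of a legacy system; the name and its rules are invented for illustration. The expected values in the test were obtained by running the code, not by reading a spec.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    """Hypothetical legacy function with undocumented pricing rules."""
    cost = 5.0
    if weight_kg > 10:
        cost += (weight_kg - 10) * 0.5
    if express:
        cost = cost * 2 + 1
    return cost


class ShippingCostCharacterization(unittest.TestCase):
    """Records current behavior so later changes can't alter it silently."""

    def test_light_parcel(self):
        self.assertEqual(legacy_shipping_cost(2, False), 5.0)
        self.assertEqual(legacy_shipping_cost(2, True), 11.0)

    def test_heavy_parcel(self):
        self.assertEqual(legacy_shipping_cost(20, False), 10.0)
        self.assertEqual(legacy_shipping_cost(20, True), 21.0)
```

Saved in a file, this can be run with python -m unittest. If one of these tests fails after a change, that isn’t automatically a bug, but it is a behavior change somebody should decide on deliberately.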
One of Michael Feathers’ key points is about finding or introducing seams in the code, places where you can substitute a dependency, so that you can use dependency injection to slowly add tests. Separate and decouple code where you can, and the system becomes easier to refactor and change in the future.
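As a rough illustration of a seam, the sketch below assumes a hypothetical InvoiceReminder class that used to call the system clock and a mail sender directly. The class, its methods, and the today_fn and send_fn parameters are all invented for this example. Passing the collaborators in, with defaults that preserve production behavior, is one simple form of dependency injection.

```python
from datetime import date


class InvoiceReminder:
    """Hypothetical legacy class with two seams added to its constructor."""

    def __init__(self, today_fn=date.today, send_fn=None):
        # Defaults keep the production behavior; tests pass in substitutes.
        self._today = today_fn
        self._send = send_fn or self._send_email

    def _send_email(self, customer, message):
        # Stand-in for the real mail call, which a test should never reach.
        raise NotImplementedError("real mail delivery is not part of this sketch")

    def remind(self, invoices):
        """Send a reminder for each (customer, due_date) that is overdue."""
        sent = 0
        for customer, due in invoices:
            if due < self._today():
                self._send(customer, f"Invoice due {due.isoformat()} is overdue")
                sent += 1
        return sent


# In a test, a fixed date and a list replace the clock and the mail server.
outbox = []
reminder = InvoiceReminder(
    today_fn=lambda: date(2024, 1, 15),
    send_fn=lambda customer, message: outbox.append((customer, message)),
)
count = reminder.remind([("acme", date(2024, 1, 1)), ("globex", date(2024, 2, 1))])
```

Here count is 1 and outbox holds a single message for "acme". Existing callers that construct InvoiceReminder() with no arguments are unaffected, which is what makes this kind of change relatively safe to introduce in small steps.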
Start Small
Making changes in legacy code can feel like walking into a minefield. Small incremental changes are less likely to break things. As you make them, it becomes easier to see the big picture of how the system works, which lets you document things better and, in turn, allows for greater refactoring.
Follow the Code’s Conventions
Your changes should look like they were part of the original code, meaning the code you change should conform to the way everything else looks. Variables and methods should have similar names. There’s one caveat to this: I believe you should always consider introducing smaller functions, even if the convention is larger functions.
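A small sketch of that caveat, assuming a hypothetical legacy module that prefixes names with a type hint (n for numbers, lst for lists, d for dicts) and a calc_ verb. Every name here is invented for illustration. The extracted helper is new, but it follows the existing naming style so it doesn’t stand out.

```python
def calc_nLineTotal(dLine):
    """New, smaller helper extracted from calc_nOrderTotal (amounts in cents)."""
    nTotal = dLine["qty"] * dLine["price_cents"]
    if dLine["qty"] >= 10:
        nTotal = nTotal * 90 // 100  # existing bulk-discount rule, unchanged
    return nTotal


def calc_nOrderTotal(lstLines, nTaxPercent):
    """Original function, now delegating the per-line work to the helper."""
    nSubtotal = 0
    for dLine in lstLines:
        nSubtotal += calc_nLineTotal(dLine)
    return nSubtotal * (100 + nTaxPercent) // 100


lstOrder = [
    {"qty": 2, "price_cents": 1000},
    {"qty": 10, "price_cents": 100},
]
nTotal = calc_nOrderTotal(lstOrder, 10)
```

With these inputs nTotal is 3190 cents. The behavior is the same as before the extraction; only the size of the functions changed.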