The Black Hole in the Source Code
At the centre of most galaxies lies a supermassive black hole. Similarly, at the heart of many existing codebases lies a supermassive tangle of legacy code. One of these phenomena is hard to understand: it seems to defy the laws of physics, and sucks the life and energy out of anyone who approaches. The other is a black hole. In this post, I will compare these two mysterious concepts and suggest some strategies for dealing with legacy code that are inspired by some of the postulated solutions to the black hole information paradox.
Paradox
According to quantum mechanics, information cannot be copied or destroyed. However, when we consider information that has entered a black hole, a problem arises. Since not even light moves fast enough to escape a black hole and information can’t be transmitted more quickly than the speed of light, it is clear that information can’t escape either. When a particle falls into the black hole, the information about the particle’s state is also trapped inside. But black holes have another interesting feature: Hawking radiation. This radiation, which is predicted to be given off by black holes due to quantum effects, will eventually cause a black hole to evaporate and disappear. Where does the information go when this occurs? According to general relativity, it can’t have escaped the black hole, so it must have been destroyed – which would violate quantum mechanics. This is the black hole information paradox.
Now, consider a codebase that has existed for some length of time. Like a black hole, over time it has grown larger and larger as more code is added by numerous developers. Working with legacy code can cause several problems. If you don’t completely understand how the code is working, it is much easier to break things and introduce bugs. Also, old code can be dependent on older frameworks and languages that may be deprecated and no longer supported. These factors combine to make it difficult to add new features or make changes to a software platform. This is a major problem when the existing codebase is too large to rewrite and contains critical functionality.
Just as it seems, paradoxically, that it is impossible to recover information about the internal state of a black hole, it seems impossible to recover information about what exactly this existing code is supposed to do, despite the fact that the information supposedly existed when the original developer wrote it.
So, where does the information go? How does code turn into legacy code?
Crossing the Event Horizon
When writing the original code, every developer is (hopefully) trying to do a good job of keeping the code maintainable. However, as time passes, project requirements change, features get added, programmers move on, and frameworks become outdated. One day, you join the team as a new developer and you take a look at the existing codebase. This is a confusing experience.
“Why is there an unfinished todo from 9 years ago? I hope that wasn’t important.”
“Why is this method 800 lines long?!?”
“I didn’t even know it was possible to run out of CSS selectors!”
When does code turn into legacy code? The boundary of a black hole is called the event horizon. This is not so much a physical place as it is a point of no return. Just as an observer travelling towards a black hole does not notice a physical change when crossing the event horizon, a development team doesn’t notice that code has become poorly maintained until it’s too late. There isn’t one single event that turns that well-planned, shiny new feature you’re working on into the next developer’s nightmare. Instead, it usually builds up from a combination of the following issues:
- Frameworks going out of date
- Changing project requirements causing haphazard changes and additions
- Different developers following different conventions while working on a project
- Most importantly, a lack of unit tests for the code
To be Useful, it Must be Testable
The ‘no hair’ theorem states that, according to general relativity, the only information needed to describe a black hole is its mass, angular momentum, and electric charge. These three quantities appear to be the only information that can be measured about a black hole from the outside; all the rest of the information about its internal state is the aforementioned ‘hair’. A piece of legacy code with low cohesion can be bald in a similar way: it lacks ways to access its state from the outside, which makes it hard to test. An example would be a piece of browser-based code that is tightly coupled to the DOM. Without mocking the entire web layout, it can be very hard to test the individual parts.
Still, even the most obfuscated, tangled piece of legacy code can be tested as a black box. Before refactoring any function, write unit tests for every bit of functionality you are about to touch. The number one cause of issues and mysterious silent failures when working with legacy code is a lack of tests. If you are doing any refactoring or adding functionality in the existing codebase, make sure that area is fully covered by unit tests. Without tests, you have no way of knowing if you accidentally break an existing feature.
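One way to treat tangled code as a black box is a characterization test: record what the code does today and assert that it keeps doing it. The sketch below is only an illustration. `legacy_shipping_cost`, its pricing rules, and the pinned cases are all hypothetical stand-ins for whatever function you are about to refactor.

```python
# Hypothetical legacy function: pretend we do not fully understand its rules.
def legacy_shipping_cost(weight_kg, express):
    cost = 5.0
    if weight_kg > 10:
        cost += (weight_kg - 10) * 0.5
    if express:
        cost = cost * 2 + 1
    return round(cost, 2)


# Characterization cases: the expected values were captured by running the
# current code, not derived from a specification.
PINNED_CASES = [
    ((1, False), 5.0),
    ((10, False), 5.0),
    ((12, False), 6.0),
    ((12, True), 13.0),
    ((0, True), 11.0),
]


def check_pinned_behaviour(func):
    """Return every case where func no longer matches its pinned output."""
    failures = []
    for args, expected in PINNED_CASES:
        actual = func(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


if __name__ == "__main__":
    # An empty list means the observable behaviour is unchanged.
    print(check_pinned_behaviour(legacy_shipping_cost))
```

Running the check before and after each refactoring step gives an early warning if observable behaviour has drifted, even when nobody remembers why the code behaves the way it does.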
Postulated Solutions
One proposed solution to the black hole information paradox is complementarity. The gist of this idea is that an observer on the inside of the event horizon sees that the information is contained inside the black hole, and an observer on the outside sees the information being emitted with the Hawking radiation. No observer sees two copies of the information, so this could satisfy the condition that the information is not copied, and both observers can see the information, so it has not been destroyed.
When dealing with legacy code, it is important to consider other observers as well. If a senior developer who worked on the code is still around, they can be a great resource for understanding the codebase. Barring death, illness, relocation, or a terrible space adventure mishap, our electronic messages travelling at light speed can still reach them. It could be worth sending them a message on Slack. The ideal situation would be to have them review your code.
Another postulated solution to the paradox involves multiverse theory. Some researchers propose that a black hole actually splits off into a new child universe. The information paradox could then be resolved if the information is now contained in this new universe. In this scenario, the event horizon of a black hole is actually a boundary between this universe and another one.
When refactoring code, it is helpful to identify similar boundaries where you can split pieces of functionality apart. This is easy to do in code with low coupling between modules and high cohesion within them, but in some old codebases this is not the case and these boundaries are not as apparent. Unit tests are a good resource because when writing them, you naturally have to break the code apart. A good first pass of refactoring could be splitting up the existing code into a few smaller functions. Look out for the duplicate code and long method code smells, and refactor accordingly. This allows you to write more tests for the functions you have just created, and break fewer things.
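As a rough illustration of splitting along boundaries, the sketch below takes one function that parses, computes, and formats all at once, and breaks it into three smaller functions that can each be tested alone. Every name here (`report_before`, `parse_values`, `summarise`, `format_report`, `report_after`) is made up for the example and is not taken from any real codebase.

```python
# Before: one hypothetical function that parses, computes, and formats at once.
def report_before(raw):
    total = 0
    count = 0
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        total += int(line)
        count += 1
    avg = total / count if count else 0
    return "count=%d total=%d avg=%.1f" % (count, total, avg)


# After: the same behaviour, split along its natural boundaries.
def parse_values(raw):
    """Turn raw text into a list of ints, skipping blanks and '#' comments."""
    values = []
    for line in raw.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            values.append(int(line))
    return values


def summarise(values):
    """Compute (count, total, average); the average of no values is 0."""
    count = len(values)
    total = sum(values)
    avg = total / count if count else 0
    return count, total, avg


def format_report(count, total, avg):
    """Render the summary in the same format the old function produced."""
    return "count=%d total=%d avg=%.1f" % (count, total, avg)


def report_after(raw):
    return format_report(*summarise(parse_values(raw)))


if __name__ == "__main__":
    sample = "3\n# comment\n\n 5 \n"
    # The refactored version should agree with the original on the same input.
    print(report_before(sample) == report_after(sample))
```

Keeping the old function around until the new one agrees with it on a set of sample inputs is one cautious way to make the split; the small functions can then pick up their own focused tests.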
One final suggestion by some physicists is that information loss in black holes is not a paradox at all, but a natural consequence of the Heisenberg Uncertainty Principle. This principle puts a fundamental limit on the amount of information we can know about a particle. It is also important to consider the limited information we possess about an existing piece of code. Any piece of code could have existing bugs we are unaware of. Refactoring and adding new pieces of functionality offer a good opportunity to take a second look at the code to check for bugs. It’s important to never blindly copy and paste existing code, even if it seems to work already. Take the time to understand every change you are making properly.
A Jump to the Left
According to Einstein’s theory of general relativity, large masses affect the passage of time. The deeper you are in a gravitational well, the slower time passes relative to a distant observer. As an observer carrying a clock travels closer to a black hole, the clock will tick more slowly as seen from far away. Engineers working on the Global Positioning System had to factor in general relativity and time dilation in order to make accurate measurements.
As a developer, you should take into account the time-warping properties of legacy code when coming up with project estimates. Luckily, no tensor calculus is needed. Simply remember that it will take some additional time to become familiar with the codebase in the beginning. If no one is familiar with an existing piece of code, consider holding a research spike before choosing a number for your estimate. The actual time taken for a task can always vary wildly from the estimate, and this is especially true when the existing code is not well understood. It’s better to bring up unknowns right from the beginning to help product managers or clients understand why it’s taking you so long to figure out a simple feature.
Conclusion
Just as the paradoxical nature of the singularity at the core of a black hole confounds theoretical physicists, legacy code can confuse the programmers who work with it. Both have grown large and unwieldy after spending many years accumulating information, whether in the form of matter or added functionality. However, there are many attributes that the two phenomena do not have in common. Legacy code doesn’t need to be terrifying and mysterious. Taking the time to create a plan of attack and ensuring the existing codebase is carefully tested can allow you to make changes with confidence.
About the Author
Anya McGee is a bipedal ape on co-op with the Publisher team at Hootsuite. She enjoys roasting vegetables and colour-coordinating her Apple products.