A while ago I was asked, "How do I handle tech debt?", and I am not sure I ever put my answer down in a form that makes sense, so this is an attempt to convey several tools I use.
Pay back early
The initial thinking on handling tech debt is not my idea, but stolen from Steve McConnell's wonderful Code Complete. Steve shows in his book that if you can tackle tech debt earlier in the lifecycle of a project, the cost of that tech debt is a lot less. Basically, the quicker you get back to repaying that debt, the cheaper it will be.
One aspect to keep in mind is that while the early part of a project may refer to greenfields/new projects in your case, it is not limited to that idea. The early part could also mean epic-sized pieces of work that are started on an existing project, so the "catch it early and it is cheaper" principle applies to existing teams just as much as it does to teams starting a new project.
When I think about what to do specifically to handle tech debt, the tool that comes to mind first is also the only one that won't fit into an existing team easily, and that is team organisation.
I've seen this at Equal Experts, and previously when I worked at both AWS and Microsoft. The simple answer is that teams above 10 people fail. Why is 10 the magic number though?
- First is a bit of how we are wired: 15 is the number of close relationships we can have, and a closer team performs better. The eagle-eyed reader that you are will note that 10 and 15 aren't the same, and this is because you need to allow team members to develop close relationships across an organisation, not just in their team.
- The second reason why 10 is the magic number is that as we develop increasingly complex systems, it gets increasingly difficult for people to hold all the information in their heads. If we keep teams at about 10, it forces limits on the volume of work they can build. This natural limit on size limits the complexity too, meaning you end up with many small teams.
When I say team, I am not referring just to engineering resources but to the entire team: POs, QAs, BAs, and any other two-letter acronym role you may come up with. The entire team is 10 or fewer, so you may find you only have 4 engineers in a team.
There is also something just right about having teams of 8 to 10 people who have the majority of the skills they need to deliver the team's end-to-end deliverables. The idea of a dedicated front-end team or dedicated back-end team, where they own only part of a feature, should be the exception in an organisation. The bulk of teams in a healthy organisation should own features end-to-end. This will force teams to work together, and having teams hold each other responsible for deadlines and deliverables helps trim fat in many aspects of delivery.
Having a team responsible to other teams equally empowers the team to push back on new work, because it has to make sure its debt doesn't overwhelm it and prevent it from actually meeting the demands of the teams to which it is responsible.
The focus on an area also helps people become more of a master of the tech they use and less of a generalist. That mastery means the trade-offs that lead to tech debt are better understood, compensated for and implemented correctly, and that will, over time, lower the tech debt.
I always encourage clients to adopt the DevOps mindset. The cross-skill, end-to-end ownership above hints at that, but one of the best pillars in DevOps to get in early is "You build it, you run it". This term is simply the idea that a team can write code, deploy it, monitor it and support it.
This mindset might feel like it goes against the above idea of a team having the skills they need and being empowered for a whole feature, because how can a team own everything from first principles? We will get to that solution later on.
Where I want to focus is on how DevOps helps to lower tech debt. While a lot of serious outages are not caused by tech debt, how long it takes to recover from an outage is often directly related to the tech debt of the team. Recovery from an outage is not just "the website is back", but includes all the post-incident reviews, working out how many clients were impacted, and so on. Nothing I have found motivates a product owner and an empowered team to cut down on tech debt more than the risk of being woken up for incidents at 2 am and then spending a lot of time afterwards trying to find the root cause.
I am happy that I can even share more specific information from my latest project, as I have a YouTube video the client made with us on this very aspect.
Rein in tech
A powerful tool in large organisations is to limit technology choices across an organisation, as tech which is kept up to date and used is a lot less of an issue for tech debt, compared to an old system written in a language or tooling that few understand.
Bleeding edge is called that because it hurts
A small solution to tech debt is to pick stable tech for your organisation. Nothing builds up tech debt faster and is more painful to deal with than bleeding-edge tech.
Horizontal scaling teams
That cost of getting started, and the frustration that comes with it, just leads teams to not invest, which is yet another major worry. This is also the solution to how a small team deals with the first principles that I mentioned earlier.
To solve this, we often build horizontally focused teams that have a single feature or set of features that other teams build on top of. IDPs are a great example of this. Another example is having a team that handles all the web routing, bot detection, caching, etc., while other teams plug into their offerings. In this case, the team building the web tech might say "We only support caching with React ESIs", so teams in the business can choose to use React and get the benefit of speed and support. They are still empowered to choose something else, but they now need to justify the trade-offs of lost speed compared to using the "blessed tech" from the horizontal teams.
A great example of this is covered in the Equal Experts playbook on Digital Platforms.
Trickle down updates
An interesting side effect of the horizontally scaled teams is that they also become their own mini places that force other teams to keep their tech up to date. This happens naturally as the horizontal team updates its offerings and pushes those updates onto the consumers of its solutions.
I saw this recently when the team responsible for the deployment pipeline runners we use issued new runners, which meant we needed to take on operational work to migrate from the old runner system to the new one. This forced work meant cleaning up and fixing how we worked with the runners; it wasn't a great time for the team, but the system that came out at the end is in better shape.
An aspect that was unique to AWS which I loved, and which is also easier to adopt than organisational changes, is the concept of bar raisers.
The idea is that the bar raisers are a group of people who give guidance to others to help them improve, but they are not responsible for the adoption of that guidance.
For example, at AWS, if you wanted to do a deployment which was higher than normal risk, you would be required to complete a form explaining how it will happen, how you will test it and how you will recover if it goes wrong. You would then take that to the bar raisers, who would review the document and give you feedback. This is great because they are not gatekeepers; they are not there to prevent you from doing a deployment (again, teams need to own what they build), but they bring guidance and wisdom to the teams.
We had set times for the bar raisers and set days when each of us would do it, which helped the senior people not be overwhelmed with requests. The concept of bar raisers was used in all aspects, including security and design. This sharing helped teams find out about each other's capabilities, spread knowledge and kept teams from falling into holes others had already found, all without bringing in the dreaded micromanagement.
Tech debt is normal work
The last two concepts are two of the easiest to adopt in any organisation. The first is just to capture all tech debt as normal work in your backlog. This helps teams prioritize and understand the lifecycle of their projects better.
We have done some experiments recently to measure average ticket time, and when operational tickets (as we call them) are included, they drag your average down if they are not getting attended to. This helps the product owner to prioritize correctly and understand the impact.
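To make that effect concrete, here is a minimal sketch of how such a measurement could work. It is not the tooling we used; the `Ticket` shape, the "feature" and "operational" labels and the dates are all made up for illustration, and I am assuming that an open ticket is counted by how long it has been waiting so far.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Ticket:
    kind: str                      # "feature" or "operational" (hypothetical labels)
    opened: date
    closed: Optional[date] = None  # None means the ticket is still open


def ticket_days(ticket: Ticket, today: date) -> int:
    """Days a ticket took, or has been waiting so far if it is still open."""
    end = ticket.closed if ticket.closed is not None else today
    return (end - ticket.opened).days


def average_ticket_days(tickets: list[Ticket], today: date) -> float:
    """Mean ticket time in days across the given tickets (0.0 if there are none)."""
    if not tickets:
        return 0.0
    return sum(ticket_days(t, today) for t in tickets) / len(tickets)


# Illustrative data only: two finished features and one neglected operational ticket.
today = date(2024, 3, 1)
tickets = [
    Ticket("feature", date(2024, 2, 1), date(2024, 2, 6)),
    Ticket("feature", date(2024, 2, 10), date(2024, 2, 17)),
    Ticket("operational", date(2024, 1, 1)),  # never picked up
]
features_only = [t for t in tickets if t.kind == "feature"]

print(average_ticket_days(features_only, today))  # 6.0
print(average_ticket_days(tickets, today))        # 24.0
```

With these made-up numbers, one unattended operational ticket moves the average from 6 days to 24 days, which is the kind of visible hit that gets a product owner's attention.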
Even if a team doesn't pick up the work immediately, that is OK, because an important aspect of teams that do adopt "you build it, you run it" is that they will have natural ebbs and flows in their work. For example, the festive season might be very quiet, since you'll have someone on call in case something goes wrong, but a lot of the team is not there. This quiet time becomes a great opportunity to get tech debt resolved.
Lastly, on capturing it: you can't fix what you can't see, so shining a light on it and just going "Well, that is worse than we expected" is a great first step.
Tech debt sprints
The last one is an idea from my days at Microsoft: tech debt sprints. I spoke about this at Agile Africa, in case you want to watch a video. The idea is to add an extra sprint at the end of every feature and just allow the team to tackle tech debt. At Microsoft, this let us go fast, ship MVPs to customers, get feedback and make trade-offs, all while knowing we were piling up tech debt, and it also gave the team confidence that the debt would be fixed sooner, rather than later or never.