An Elegant Puzzle / Will Larson
Will Larson is a software engineering manager, latterly at Stripe, which also published this book.
"..others become managers in a cynical pact, exchanging excitement in their current role for the prospect of continued salary bumps and promotions"
Chapter 1, Organisations
"It'll be an unusual month that you won't consider some aspect of team design."
A team should be six to eight people when it's in its stable configuration. At times this can flex, but the flexed size should not become the status quo.
When more teams are required, teams should grow above this to eight-to-ten, and then split into two smaller budding teams of four-to-five, and then build back from there.
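The grow-then-split rule can be sketched in a few lines of Python. This is a minimal illustration only: the function name `split_team` is hypothetical, and treating eight-to-ten as a hard precondition is an assumption, since the notes give ranges rather than a strict rule.

```python
def split_team(team):
    """Split a grown team of 8-10 people into two budding teams of 4-5."""
    if not 8 <= len(team) <= 10:
        # Assumption: only a team already grown to eight-to-ten is split.
        raise ValueError("grow the team to eight-to-ten before splitting")
    mid = (len(team) + 1) // 2  # the first team takes the extra person if odd
    return team[:mid], team[mid:]


# Hypothetical team of nine engineers.
team = [f"eng{i}" for i in range(9)]
first, second = split_team(team)
print(len(first), len(second))  # 5 4
```

Each budding team then hires back up towards six to eight.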
Keep innovation and maintenance together. This avoids a two-tier system of engineers, where innovation is seen as the more desirable role.
Generally a manager shouldn't be expected to support more than eight individuals. More than this, and the manager can only move into a coach role, which can be a safety net to support specific problems, but can't keep up with the cadence and detail of the engineers.
Teams have four states:
|Falling behind: working very hard, but still losing ground. Morale low.
|Treading water: getting critical work done, but neither paying down technical debt nor able to take on new projects. Morale low.
|Repaying debt: reaping the benefits of repaying debt. Each piece repaid makes the next easier. Snowball effect. Morale good.
|Innovating: debt low. Working on new projects. Satisfying clients/users. Morale high.
Entropy drags teams backwards through the states. Good management and decisions help teams move forwards through the states. It can be easy to stray into 'tactical' approaches to solving a team's problem, while the correct approach is holistic and the results may be slow to appear. Keep the faith!
Strategy: Add people. The net number of organisation members should rise. Don't reassign people from other teams, as those teams will then slide back.
Tactics: Set client expectations. Beat the drum about easy wins which can be found in the workload. Inject optimism and build in relaxation and recovery time.
|Budget issues need to be navigated. More communication overhead. More people to coordinate. Hiring usually creates political effects.
Strategy: Reduce WIP. Focus on finishing open tasks, and reducing concurrency. If the team is working on 5 things, make it 2.
Tactics: Focus on helping the team see the value of their contributions to the team. Don't put the team first, but focus on respect and trust and how that pays dividends for the whole team.
|Less work gets done from the client perspective; the 'outsider's view' is that the team has slowed down its delivery rate.
Strategy: Add time. Everything IS working, but you need to wait for the compounding value of debt repayment to grow.
Tactics: Manage client expectations. It's easy to fall into a hole of debt-repayment activity which to an outsider may seem like work has stopped. Retain some work on visible small features. Be aware of communications, as client impatience may cause a slide back into Treading Water.
|Morale will rise as the team can see the manager valuing debt and considering future productivity.
Strategy: Add slack. You want the team to stay in this state for as long as possible (it won't be forever!). Encourage the team to build quality into their work, and innovate. Encourage open communication in the team to identify causes of backsliding.
Tactics: Promote the work being done to outsiders. Don't let this team be seen as working on 'science projects'. Bang the drum about value delivered and future potential!
|Less work gets done. Morale can suffer.
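The state-by-state guidance above can be held as a small lookup table. The sketch below is one way to do that in Python, not something from the book: the class name `Playbook`, its field names, and the helper functions are all hypothetical, and the state names are short labels for the four states listed earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Playbook:
    strategy: str     # the system-level fix
    tactics: str      # short-term support while the strategy takes effect
    side_effect: str  # the consequence noted alongside each state


# Ordered from the worst state to the best (dicts keep insertion order).
PLAYBOOKS = {
    "falling behind": Playbook(
        "add people",
        "set client expectations, find easy wins, build in recovery time",
        "budget issues, communication overhead, hiring politics",
    ),
    "treading water": Playbook(
        "reduce WIP",
        "help the team see the value of contributing to the team",
        "outsiders see a slower delivery rate",
    ),
    "repaying debt": Playbook(
        "add time",
        "manage client expectations, keep some visible small features",
        "morale rises as debt is seen to be valued",
    ),
    "innovating": Playbook(
        "add slack",
        "promote the value delivered to outsiders",
        "less work gets done, morale can suffer",
    ),
}

STATES = list(PLAYBOOKS)


def playbook_for(state):
    """Look up the strategy and tactics for a team's current state."""
    return PLAYBOOKS[state]


def next_state(state):
    """The state a team should reach next; innovating is the final state."""
    i = STATES.index(state)
    return STATES[min(i + 1, len(STATES) - 1)]


print(playbook_for("treading water").strategy)  # reduce WIP
print(next_state("repaying debt"))              # innovating
```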
Remember, changes are slow, and the strategy and tactics need to be kept in a repetitive cycle. Communicate both the strategy and tactics to the team and the stakeholders.
You usually have multiple teams which require help, as they'll all be at a different place on the team scale.
That doesn't mean that you move all teams equally, or prevaricate in the middle. Deal with the issues in one team at a time, while keeping the others steady.
Don't move 'troubleshooter' team members around, despite the temptation to do so. They are valuable where they are. If you have one team which is at the Innovation stage, and you move a SWE to one which is treading water, then you may raise one team, but you lower another. You end up with neither team doing well.
Teams take a long time to jell. Once you move people around, the jelling cycle starts again.
If you have created a high-performing team, then afterwards it's better to move that team's scope, rather than dismantle the team in favour of other priorities.
You only get value from projects once they finish. To make any progress, you must ensure that some of your projects finish.
Finishing projects is really hard when your productive time is competing with other priorities.
Ad-hoc interruptions are a huge time thief for your teams. Chat messages, emails, service status alerts and so on are death by a thousand cuts.
Teams and individuals need to learn how to funnel these into smaller and smaller buckets of time, and then automate/document as much as possible. It should be simple to turn a service alert into a ticket. Telling a colleague to read a wiki page is more efficient than walking them through troubleshooting.
Create rotations for people to do support and question answering duty. This is INCREDIBLY uncomfortable for most engineers: they don't want to do support, and everyone wants to answer questions to appear helpful -- but you need to persist through the discomfort and create systems; otherwise the number of interruptions becomes higher and higher to the point it's unsustainable.
ICs should be encouraged to block out calendar time to work uninterrupted. They should work hard to protect this time when needed, to the point of obstinance.
Avoid letting any ICs become gatekeepers. If your systems are so fragile that they need a gatekeeper to run the playbooks, then fix the system as a priority. From time to time, gatekeepers may be necessary for legal or compliance reasons, but these can be augmented with documentation, rotas, and having multiple ICs in each gatekeeper role.
None of the above are quick wins. But as a pattern of behaviours (avoid interruptions, document well, respect boundaries, build rotations), they add up.
Chapter 2, Tools
"The best changes go unnoticed, moving from one moment of stability to another."
Tools for leading these transitions are systems thinking, metrics, and vision.
When teams destabilise, gaps open up; it's the manager's job to step in and fill those gaps, in product, engineering, sales, or whatever is needed. Be the glue.
Stocks and Flows
The links between events are often more subtle than they seem at first.
Sometimes large changes appear to happen instantly, but in truth they are the accumulation of many smaller changes which have built up over time.
We outline these systems in terms of stocks and flows and links.
- A stock is the number of trained software engineers in the company.
- A change to this stock is a flow:
  - Engineers being hired and trained are an inflow.
  - Engineers who depart the company are an outflow.
- The overall engineering productivity of the organisation holds a link to the stock of software engineers.
It's important to remember that every flow has a rate, and every stock has a quantity.
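To make the stock and flow vocabulary concrete, here is a minimal Python sketch of the engineer example. The function name `simulate_stock`, the constant per-month rates, and all the numbers are assumptions for illustration; real hiring and attrition rates vary over time.

```python
def simulate_stock(initial, inflow_rate, outflow_rate, periods):
    """Track a stock's quantity over time under constant flow rates."""
    stock = initial
    history = [stock]
    for _ in range(periods):
        # Each period the inflow adds to the stock and the outflow drains it.
        stock = max(0, stock + inflow_rate - outflow_rate)
        history.append(stock)
    return history


# Hypothetical: 50 trained engineers, 4 hired and 1 departing per month.
history = simulate_stock(initial=50, inflow_rate=4, outflow_rate=1, periods=5)
print(history)  # [50, 53, 56, 59, 62, 65]
```

The stock only changes through its flows, which is why a seemingly sudden change in the quantity is usually the result of rates that have been running for a while.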
Applied to Engineering Velocity
Developer velocity is measured in four ways (Source: "Accelerate: The Science of Lean Software and DevOps", by Forsgren, Humble and Kim):
- Delivery Lead Time: time from the creation of code to when it's ready to be deployed in production.
- Deployment Frequency: how often code is deployed to production.
- Change Failure Rate: how often a deployment causes a failure in production.
- Time To Recover: how long it takes to recover from a failure in production.
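As an illustration, the four measures could be computed from a simple log of deploys. This is a sketch under stated assumptions, not the method used in "Accelerate": the record fields (`lead_hours`, `failed`, `recovery_hours`), the function name `velocity_metrics`, and the sample data are all hypothetical.

```python
from statistics import mean

# Hypothetical records, one per deploy: lead time in hours, whether the
# deploy caused a production failure, and hours taken to recover if it did.
deploys = [
    {"lead_hours": 6, "failed": False, "recovery_hours": None},
    {"lead_hours": 30, "failed": True, "recovery_hours": 2},
    {"lead_hours": 12, "failed": False, "recovery_hours": None},
    {"lead_hours": 24, "failed": True, "recovery_hours": 4},
]


def velocity_metrics(deploys, period_days):
    """Summarise the four velocity measures over an observation window."""
    failures = [d for d in deploys if d["failed"]]
    return {
        "delivery_lead_time_hours": mean(d["lead_hours"] for d in deploys),
        "deploys_per_day": len(deploys) / period_days,
        "change_failure_rate": len(failures) / len(deploys),
        # No failures means there was nothing to recover from.
        "time_to_recover_hours": (
            mean(d["recovery_hours"] for d in failures) if failures else 0.0
        ),
    }


metrics = velocity_metrics(deploys, period_days=8)
for name, value in metrics.items():
    print(name, value)
```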
So, to convert this into a system of stocks and flows:
- Code goes through the engineering flow, ending in a pull request, and becomes ready code at the code review rate.
- Ready code goes through the deployment flow, at the deploy rate.
- Deployed code converts into failures at the defect rate.
- Failures are remediated at the recovery rate.
- Code is debugged at a given debug rate, and returns to the first flow, to become ready code again.
This is an emergent feedback loop: with a sufficiently high defect rate or a sufficiently slow recovery rate, the system's problems compound and each deploy can leave us further behind.
The model is a useful one: thinking in stocks and flows helps to identify where to invest. If the team doesn't have a large stock of code ready to deploy, then improving the deploy rate may not be valuable. Equally, if the team spends much of its time recovering from failures, then lowering the defect rate will reduce that load.
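The feedback loop can be explored with a small discrete-time simulation. The sketch below is an assumption-heavy toy, not the book's model: the stock names, the `step` and `run` functions, the per-period rates, and the steady supply of new code are all invented for illustration. Each flow is computed from the stocks at the start of the period.

```python
def step(stocks, rates, new_code):
    """Advance one period, moving units of code between stocks."""
    reviewed = min(stocks["in_review"], rates["review"])
    deployed = min(stocks["ready"], rates["deploy"])
    failed = deployed * rates["defect"]  # fraction of deploys that fail
    recovered = min(stocks["failures"], rates["recovery"])
    debugged = min(stocks["debugging"], rates["debug"])
    return {
        "in_review": stocks["in_review"] + new_code + debugged - reviewed,
        "ready": stocks["ready"] + reviewed - deployed,
        "healthy": stocks["healthy"] + deployed - failed,
        "failures": stocks["failures"] + failed - recovered,
        "debugging": stocks["debugging"] + recovered - debugged,
    }


def run(rates, periods, new_code=10):
    """Simulate from a fixed hypothetical starting point."""
    stocks = {"in_review": 10, "ready": 0, "healthy": 0,
              "failures": 0, "debugging": 0}
    for _ in range(periods):
        stocks = step(stocks, rates, new_code)
    return stocks


base = {"review": 10, "deploy": 10, "recovery": 2, "debug": 5}
low = run({**base, "defect": 0.1}, periods=10)
high = run({**base, "defect": 0.5}, periods=10)
print("open failures, low defect rate: ", low["failures"])
print("open failures, high defect rate:", high["failures"])
```

With these made-up rates, the high defect rate creates failures faster than the recovery rate can clear them, so the stock of open failures keeps growing, while the low defect rate stays within what recovery can absorb.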