An Optimistic Approach to Building Software
We spend a lot of time figuring out what makes good engineering teams. For our clients and for ourselves, it's the difference between a successful project and wasted time and money. Three metrics matter most in determining the success of an engineering team.
Before we cover these three metrics, it's worth talking about the differences between a commercial-grade engineering team and a suboptimal one: a team that consists of beginner and junior-level engineers, or even senior-level engineers, but is ultimately mismanaged and therefore achieves low-quality results. The difference is often the development methodologies and technologies that the team is using. While having smart people helps, smart people who use the wrong technologies and suboptimal development methodologies will be far less successful than they could be.
The job of the leader of an engineering team is not to make the engineers feel comfortable. This is a mistake. The thought process behind making engineers comfortable is that once they're comfortable they will do what is necessary to make their engineering top-notch. In reality, comfortable engineers typically stop learning. There is a distinction between an uncomfortable engineer who is highly motivated and feels secure in their job and an engineer who is being pressured and unappreciated. It doesn't work to pressure and underappreciate your engineers to get more out of them. You see this a lot. However, what you see even more of is engineering managers who do very little to encourage or motivate their engineers. They do not encourage their creativity and they do not encourage them to grow their careers. This is dangerous because it leads to engineers who stop learning and become complacent.
The same issue occurs when tech teams focus on seniority, hiring only “senior” engineers. Understandably, they want to make sure that the engineer is going to be able to get the job done. However, the problem with optimizing for seniority in a specific library is that a majority of engineers will quickly become proficient in that library and then, if not encouraged to use their ingenuity, will become complacent. This might not seem like a problem because “we got what we needed”, but it misses the point of building technology in the first place.
Engineers are the reason why building technology makes you money. Engineering is one of the best ways to encode human ingenuity. In other words, it's a great way to allow humans to figure out how to do hard things in an easier way and then allow something else (a machine) to do that thing for us (automatically). When your engineers stop doing that, when they stop using their human ingenuity and stop codifying it into the project, the project suffers.
The engineering team as a whole will become complacent and will quickly stop creating valuable features. Finally, the product itself will become less valuable, ultimately hurting the end users.
There are three places a human can be mentally when they're working. They can be completely comfortable, called the green zone. They can be learning and therefore slightly uncomfortable, called the yellow zone. Or they can be very uncomfortable and in a state of anxiety, called the red zone. What has been found is that the most learning occurs when a human is in the yellow zone, as close as possible to the red zone. This is where the most human ingenuity takes place. Therefore, if you are in charge of an engineering team, the most important thing that you can do is to figure out when your team is in the green zone and when your team is in the red zone. If they're in the green zone, push them up into the yellow zone, and if they're in the red zone, pull them down into the yellow zone.
I have found that three metrics best help you do this. The metrics that you want to focus on are:
How fast are new features being completed? Called the feature rate.
At what rate are these new features causing regressions? Called the regression rate.
And is the mental and physical fatigue and stress on the engineers increasing or decreasing as feature complexity and quantity increase? Called the automatic rate.
Many people think that as features increase the stress on the engineers increases. But that's actually not true. An effective engineering team will be able to increase the complexity of the features that they are working on while decreasing their mental and physical stress. Human ingenuity is the mechanism that allows for these improvements.
A great way to figure out whether you are actually tapping into the most valuable resource on the planet, human ingenuity, or just running your engineers into the ground is to ask: as more complex features get developed, does the strain and stress on your engineering team decrease or increase?
This gets complicated to figure out because many people confuse the discomfort of learning with the distress of a failing engineering team. A good way to tell the difference between the two is if the engineers are excited about the future of the product and are eager to work on it in the future, or if they're very unhappy to be working on the project because they see no end in sight. They see everything getting harder and harder as they continue.
The most important metric in differentiating a successful engineering team from a failing one, or from a team that is just struggling to learn, is the regression rate. If an engineering team has a high regression rate, meaning that with every feature they add they experience exponentially more regressions, they are not learning. The engineers on that team are not using human ingenuity, they have become complacent, and they are not equipped to solve the problems at hand. On the other hand, the most successful engineering teams are ones that have a very low regression rate on the projects that they develop. That regression rate decreases as they add features, which shows they are using human ingenuity.
This is why test-driven development has been shown to be the most effective development methodology when developing quality products. Most people resist test-driven development because its initial incremental feature development time, the time that it takes to build the next feature, is greater than it would be without tests. Once the feature set is non-zero, this quickly becomes irrelevant.
Compared to all other development methodologies, test-driven development has the greatest initial incremental feature development time. That is because with test-driven development, you not only have to write the feature, but you also have to write the test for that feature. For many inexperienced engineering leaders, this seems like a complete waste of time and therefore a complete waste of money. It seems like a much better idea to do away with the testing on each feature and save some money, right? Wrong.
Most development processes have testing and bug fixes at the end of their process as a final step. The problem with doing so is that the development time to add a feature goes up exponentially as the total number of features increases, regardless of development methodology. The reason that the development time goes up exponentially with every incremental feature is that the risk for regressions goes up exponentially.
For example, if you had an app with zero features and you added one feature, there would only be a chance for that one feature to regress. In other words, adding that first feature only risks breaking that first feature. If you had an application with 10,000 features and you went to add the 10,001st feature, you would run the risk of breaking the initial 10,000 features. To mitigate this risk, most developers, when they're developing the 10,001st feature, try to look through all of the other features to make sure that they are not broken. The time that it takes to do this increases exponentially as the number of features increases. The reason why the time increases exponentially and not linearly is that the number of possible places where bugs could be increases exponentially as the number of features increases linearly. Using test-driven development, we're able to decrease the amount of time that it takes to detect regressions and therefore decrease overall development time, as the developer (and the project manager and the quality assurance engineer) doesn't have to spend as much time figuring out whether the new changes broke anything. That is why we focus heavily on the regression rate as a metric to determine the success of an engineering team.
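One way to make the growth claim above concrete is to count the places a regression could hide. The sketch below assumes, purely for illustration, that a bug can arise from the interaction of any combination of existing features, so the number of combinations to inspect by hand is 2^n - 1 for n features, while an automated suite with one check per feature grows linearly. Both function names are hypothetical, and a real project will fall somewhere between these two extremes.

```python
def manual_inspection_points(feature_count: int) -> int:
    """Non-empty combinations of features that could interact and break.

    Assumption for illustration only: any subset of features may hide a bug.
    """
    return 2 ** feature_count - 1


def automated_checks(feature_count: int) -> int:
    """One automated regression test per feature (a simplifying assumption)."""
    return feature_count


if __name__ == "__main__":
    # Compare how the two counts grow as features are added.
    for n in (1, 5, 10, 20):
        print(n, manual_inspection_points(n), automated_checks(n))
```

Under these assumptions, 10 features already give 1,023 combinations to look through by hand, against 10 automated checks.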
What is a regression?
A regression is not a purposeful change in the way a feature works, nor in the way a collection of features works. A regression is not a design change that is made by the designers. A regression is not a business logic change that is made by the project managers. A regression is an unintended mistake that breaks a previously implemented feature, rendering it useless or partially useless to the user.
How to establish a regression rate
The best way to establish a regression rate is to count, on every merge of a pull request that contains a single feature, the number of previously built, functioning features that are now broken because of the introduction of the new feature.
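The counting described above can be sketched in a few lines. This is a minimal illustration, not a prescribed tool: the `MergeRecord` shape, the idea of tracking the set of working features before and after each merge, and the choice to report the rate as average regressions per merge are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class MergeRecord:
    """One merged pull request that introduced a single feature (hypothetical shape)."""
    feature: str
    passing_before: set[str]  # features that worked before the merge
    passing_after: set[str]   # features that work after the merge


def regressions_introduced(merge: MergeRecord) -> set[str]:
    """Previously working features that are broken after the merge."""
    return merge.passing_before - merge.passing_after


def regression_rate(merges: list[MergeRecord]) -> float:
    """Average number of regressions per merged feature (assumed definition)."""
    if not merges:
        return 0.0
    total = sum(len(regressions_introduced(m)) for m in merges)
    return total / len(merges)


if __name__ == "__main__":
    merges = [
        MergeRecord("login", set(), {"login"}),
        MergeRecord("search", {"login"}, {"login", "search"}),
        # This merge breaks the previously working "search" feature.
        MergeRecord("export", {"login", "search"}, {"login", "export"}),
    ]
    print(regression_rate(merges))
```

In practice the before and after sets would come from running the automated test suite on each side of the merge; a purposeful change to a feature would be excluded, since it is not a regression.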
High-functioning engineering teams should have a regression rate of 0. Pull requests should not be made unless the engineer is confident that the number of regressions introduced will be 0. This requires each engineer to have access to a complete testing environment, where they can run all tests for the entire system as they develop.
To make this possible, realistic, and economical, regressions must be detected with automation. Manual regression testing can be effective, but on large projects automated regression testing must be used. Automated regression testing should take roughly 10% or less of the time that manual regression testing takes. If your automated regression tests take more than 10% of the manual time required to detect regressions, you're doing your automated tests wrong.
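As a rough sketch of what an automated regression check can look like, the example below uses Python's standard `unittest` module to pin down the behavior of one existing feature so that later changes cannot silently break it. The `apply_discount` feature and the `within_automation_budget` helper are hypothetical and exist only to illustrate the idea and the 10% guideline above.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical existing feature: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class DiscountRegressionTests(unittest.TestCase):
    """Pins down behavior that later features must not break."""

    def test_typical_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(80.0, 0), 80.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(80.0, 150)


def within_automation_budget(automated_minutes: float, manual_minutes: float) -> bool:
    """Check the guideline that automated time is at most 10% of manual time."""
    return automated_minutes <= 0.10 * manual_minutes


if __name__ == "__main__":
    unittest.main()
```

Running the whole suite before opening a pull request is what lets an engineer be confident that the number of regressions introduced is 0.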
Establish your feature rate
Once you have established your regression rate the next thing to do is to simply count the number of features that you are introducing per month. This is your feature rate. The higher the feature rate and the lower the regression rate the more performant your engineering team is.
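Counting features per month is simple enough to automate. The sketch below assumes that each feature corresponds to one merged pull request and that you can export the merge dates; the function name and the 'YYYY-MM' key format are illustrative choices, not part of any particular tool.

```python
from collections import Counter
from datetime import date


def feature_rate_by_month(merge_dates: list[date]) -> dict[str, int]:
    """Count merged features per calendar month, keyed as 'YYYY-MM'."""
    counts = Counter(d.strftime("%Y-%m") for d in merge_dates)
    # Sort by month so the trend can be read from top to bottom.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    merged = [date(2024, 1, 5), date(2024, 1, 20), date(2024, 2, 3)]
    print(feature_rate_by_month(merged))
```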
Each month you should be able to increase the complexity and number of features that your team is developing while maintaining a regression rate of 0. If this is not possible, your team is not using human ingenuity and is becoming complacent.
The third, final, and most important metric, the automatic rate, is tracked through the number of hours that your team is working in a month. This number should decrease as you increase the complexity and number of features you add while maintaining a regression rate of 0. If this number is increasing, then you are leading your team inefficiently. As your team works, the work they are doing should build traction, gain momentum, and start to organize itself automatically.
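Taken together, the three metrics can be checked month over month. The sketch below is one possible reading of the rules above, under stated assumptions: the `MonthlyMetrics` record is hypothetical, feature complexity is not measured at all, and "increase" and "decrease" are relaxed to "does not drop" and "does not rise" so that a flat month is not flagged.

```python
from dataclasses import dataclass


@dataclass
class MonthlyMetrics:
    """One month of team metrics (hypothetical record shape)."""
    month: str
    features_shipped: int
    regressions: int
    hours_worked: float


def team_is_on_track(history: list[MonthlyMetrics]) -> bool:
    """True when regressions stay at 0, features do not drop, and hours do not rise."""
    if any(m.regressions != 0 for m in history):
        return False
    # Compare each month with the one before it.
    for previous, current in zip(history, history[1:]):
        if current.features_shipped < previous.features_shipped:
            return False
        if current.hours_worked > previous.hours_worked:
            return False
    return True


if __name__ == "__main__":
    history = [
        MonthlyMetrics("2024-01", features_shipped=4, regressions=0, hours_worked=640.0),
        MonthlyMetrics("2024-02", features_shipped=5, regressions=0, hours_worked=600.0),
    ]
    print(team_is_on_track(history))
```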
This methodology, taken to its limit, results in a team that no longer works and a product that has everything a customer wants and never breaks. This should be your North Star when managing an engineering team, creating a new project, or starting a new company. This is possible to achieve. Many engineering teams have achieved it; you just don't think of them as engineering teams. In fact, many hedge funds are effectively engineering teams that have achieved a product that is everything they and their customers could want, where the engineers don't have to work and the product never breaks. Larry and Sergey also achieved this with Google: they no longer work, the machine of Google fixes and develops itself, it has all the features that they ever wanted, and it doesn't break. Larry now sits on a beach doing god-knows-what.
You'll get what you aim for
When building an engineering team, when building a product, when building a new company, the metrics that you optimize for will manifest in your reality. If you optimize for the amount of time that you and your engineers work, then 10 years down the line all you'll do is work. If you optimize for the number of features developed in 2 weeks (regardless of the number of regressions), you will end up with a bird's nest of an application, loaded with features, that is always broken.
Monitoring the regression rate keeps you from going backward, monitoring the feature rate keeps you going forward, and monitoring the working hours of your team keeps you from working more. Outside of intentional work (like going to the gym), work is not the goal. Writing code is not the goal of your engineering team. Fixing bugs is not the goal of your engineering team. Integrations are not the goal of your engineering team. Meetings are not the goal of your engineering team. Documentation is not the goal of your engineering team. Metrics are not the goal of your engineering team. The goal of your engineering team is to create things that people want and that are ultimately helpful in some way.