Software projects are notorious for blowing past their projected timelines. Despite numerous techniques meant to mitigate this, delays and cost overruns are the norm, and always have been. This is illustrated by such phenomena as the Ninety-Ninety Rule:

The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.

Tom Cargill


Most software projects follow a long-tailed lifecycle. In the beginning, progress is fast as the basic functionality is built out. Whether you're building a peer-to-peer networking library, an AI contract review product or a self-driving car, you can probably quickly knock out something demo-ready. You might even convince people to buy it! But there will be bugs - holes in your logic, edge cases you haven't considered - and your typical user will complain that your software is "crap."

The process of going from demo software to mature software is the process of hammering down the long tail of bugs. How long will you have to hammer for? It depends on the domain.

A wise man once said:

grug understand all programmer platonists at some level wish music of spheres perfection in code. but danger is here, world is ugly and gronky many times and so also must code be

The insight we can draw from Grug's words is that the closer your problem is to the "ugly and gronky" real world, the longer and fatter the tail will be.

Self-driving cars are very close to the real world, so their tail is long. The DARPA Grand Challenge was in 2004, and after twenty years of hammering down the tail, Waymo still only operates in five cities. Peer-to-peer networking libraries only have to deal with other computers, so their tail is relatively minuscule - Bitcoin is still running on the same chain debuted by Satoshi Nakamoto in 2009.

Ivo's domain sits in the middle. We're not driving cars, but after decades of evolution, Microsoft Word barely qualifies as "software" anymore - rather, it's some semi-organic digital lifeform that no human fully understands. Contracts, while they bear a deceptive resemblance to trees and other familiar data structures, are unapologetically and often deliberately flawed human constructions. And LLMs...don't get me started on LLMs.

The result is that it's easy to build demo-ready contracting software, but the road to non-crap contracting software involves hammering down a very long tail.

Hammering down the tail

In 2014, I interned on a self-driving car team. The engineers had a team-wide mandate to increase our "Miles-Per-Intervention" (MPI) to 1 million. My memory is fuzzy, but when I joined, I think our MPI was around 0.7. Every week, we would have an org-wide meeting where we went through every single incident where a human driver had had to intervene.

All sorts of stuff came up in the meeting. There was one T-junction in Mountain View where the double-yellow lines extended out pretty far, and rather than risk clipping the painted lines as it cut the corner, the car would just stall forever. There was another intersection where the setting sun would reflect off the red traffic light, making it look lit even when the signal was green. Also, one time, they took the car to Arizona and it thought all the cactuses were people.

All the interventions were filed as bugs and assigned to the appropriate teams, who would diligently fix them. Fixing some of these bugs was hard. Entire modules had to be written, others thrown away, others endlessly tweaked. By the time my internship ended, the combined efforts of all these dozens of brilliant hard-working people had pushed our MPI to slightly above 1.

The main reason there aren't that many self-driving car companies is that most software teams don't have the patience or diligence to take on a multi-decade bugfixing marathon. It's much more satisfying to work far away from the real world and chase music of spheres perfection. At Ivo, we're grateful that our domain's tail isn't as long as a self-driving car's. Still, in a domain where it's easy to build a demo-ready solution, our willingness to spend months and years hammering down the tail is our biggest competitive advantage.

Our threshold for what qualifies as a "bug" is very low. Basically, any time our software does something worse than a human would, it's a bug - even if it hasn't done anything technically "incorrect". For example, see these three redlines.

After the redlines have been applied and the changes have been accepted, all of these diffs result in the exact same text in the document. They all do the same thing, so in a sense, they're all "correct." But the leftmost diff is unnatural. It's noisy and hard to understand. Moving from the leftmost screenshot to the middle screenshot took us around a year and a half of iteration on a "diff coalescing algorithm," which we hammered into shape one bug report at a time. Our diff.js file is now over a thousand lines, and diff.test.js is even longer.
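To give a flavor of what "coalescing" means here, the sketch below shows one of the simplest passes such an algorithm might make: merging adjacent operations of the same kind so that a fragmented run of deletions and insertions collapses into one deletion and one insertion. This is illustrative TypeScript, not our actual diff.js - the Op shape and the coalesceAdjacent name are made up for this post.

```typescript
// One minimal coalescing pass: adjacent operations of the same kind are
// merged, so back-to-back deletions become a single deletion and
// back-to-back insertions become a single insertion.

type Op = { kind: "equal" | "delete" | "insert"; text: string };

function coalesceAdjacent(ops: Op[]): Op[] {
  const out: Op[] = [];
  for (const op of ops) {
    if (op.text.length === 0) continue; // drop empty fragments
    const last = out[out.length - 1];
    if (last && last.kind === op.kind) {
      last.text += op.text; // extend the previous op of the same kind
    } else {
      out.push({ ...op }); // copy so we can mutate safely
    }
  }
  return out;
}
```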

The middle screenshot still has a problem, though - the words "if Metadata becomes aware of a" are redundantly removed and reinserted. This is a particularly tricky case because the string "if Metadata becomes aware of a" constitutes both the suffix of the deletion and the prefix of the insertion, so even though the redundancy is obvious to a human, it's hard for a computer to detect efficiently. This bug was assigned to our newest full-time engineer, and he figured out a clever way to adapt the KMP algorithm to detect the redundancy in linear time and coalesce it into the final result on the right.
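For intuition, here is a minimal sketch of the KMP-style idea - again illustrative, with hypothetical names like longestOverlap, not the exact code that shipped. Running KMP's failure (prefix) function over the string inserted-text + sentinel + deleted-text gives, in linear time, the length of the longest string that is simultaneously a prefix of the insertion and a suffix of the deletion; that overlap can then be hoisted out as unchanged text.

```typescript
// Find the longest string that is both a suffix of the deleted text and a
// prefix of the inserted text, using the KMP failure (prefix) function.
// Reuses the illustrative Op type from the previous sketch.
function longestOverlap(deleted: string, inserted: string): number {
  // Work on inserted + sentinel + deleted so no match can span the boundary.
  const s = inserted + "\u0000" + deleted;
  const fail = new Array<number>(s.length).fill(0);
  for (let i = 1; i < s.length; i++) {
    let k = fail[i - 1];
    while (k > 0 && s[i] !== s[k]) k = fail[k - 1];
    if (s[i] === s[k]) k++;
    fail[i] = k;
  }
  // Failure value at the last position = length of the longest prefix of
  // `inserted` that is also a suffix of `deleted`.
  return fail[s.length - 1];
}

// Split the overlap out of a delete/insert pair so it becomes "equal" text.
function coalesceOverlap(del: Op, ins: Op): Op[] {
  const n = longestOverlap(del.text, ins.text);
  if (n === 0) return [del, ins];
  const parts: Op[] = [
    { kind: "delete", text: del.text.slice(0, del.text.length - n) },
    { kind: "equal", text: del.text.slice(del.text.length - n) },
    { kind: "insert", text: ins.text.slice(n) },
  ];
  return parts.filter((op) => op.text.length > 0);
}
```

Applied to the middle screenshot's delete/insert pair, a pass like this would keep "if Metadata becomes aware of a" as unchanged text and leave only the genuinely deleted and inserted words marked up, which is the shape of the rightmost result.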

The reason that Ivo has the best redlining on the market has nothing to do with AI, and everything to do with an extraordinarily talented engineering team that is willing to spend years diligently hammering down the long tail.

Do you want to build something great too? Join us!