The Importance of Fast Computation
In 2015, I was hanging out at our coworking space when the concentrated face of one of the folks there caught my attention. Hesitant to break his focus, but extremely curious and thinking I could help, I asked, "What are you doing?"
"I'm baffled by how slow starting nodejs is," he replied. "I can't make it load in less than 100ms, and I'm pretty sure I'm off by an order of magnitude from what it could be."
I’ve always valued strong engineering, and these kinds of delays bother me too. But watching someone obsess over shaving 90 milliseconds off the startup time of a small program I used every day made me reconsider the importance of the speed of every program, every function, and of how much time interoperability between systems costs. That conversation stuck with me.
This kind of obsession with performance is how the most famous backdoor of this decade was found. Andres Freund, a Microsoft engineer, noticed that SSH logins were taking 500ms longer than expected. Most people would have ignored it. But he dug in, and he uncovered a sophisticated supply chain attack embedded in the xz compression library that had been years in the making. The attacker had spent years building trust, contributing patches, and slowly maneuvering into a position to inject malicious code. All of it was undone because one person thought “this is too slow.”
Now working on a personal AI assistant, I’m obsessed with having a powerful, fast LLM. The difference between 10 tokens per second, 100 tokens per second, and 1000 tokens per second is brutal. At 10 tokens/sec, you’re waiting 10 seconds for a single paragraph. At 100, you get it in a second. At 1000, you can have the model iterate on its own output multiple times before you even notice the delay. It stops being a quantitative difference and becomes a qualitative one.
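The back-of-the-envelope math here is worth making explicit. A minimal sketch, assuming a paragraph is roughly 100 tokens (the figure the arithmetic above implies):

```python
# Time to generate one ~100-token paragraph at different speeds.
# The 100-token paragraph length is an assumption, not a measurement.
PARAGRAPH_TOKENS = 100

for rate in (10, 100, 1000):  # tokens per second
    seconds = PARAGRAPH_TOKENS / rate
    print(f"{rate:>4} tok/s -> {seconds:5.2f} s per paragraph")
# 10 tok/s -> 10 s, 100 tok/s -> 1 s, 1000 tok/s -> 0.1 s
```

At 1000 tokens/sec, ten full rewrites of that paragraph still finish in about a second, which is why the experience flips from "waiting" to "interactive."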
See for yourself how different the experience is.
There’s a famous story about how Ken Thompson would go get coffee while waiting for his code to compile. Fast-forward to today: developers with sub-second hot reload times think differently than developers waiting 30 seconds for a build. The former experiment freely, try wild ideas, iterate rapidly. The latter batch their changes, think harder before pressing enter, and lose their train of thought. The tooling shapes the mind.
And it saves lives. In Walter Isaacson’s biography of Steve Jobs, there’s a scene where Jobs confronts engineer Larry Kenyon about the Macintosh boot time. Jobs did the math on a whiteboard: if five million people were using the Mac, and it took 10 seconds longer to boot than necessary, that added up to 300 million hours wasted per year. “That’s the equivalent of at least 100 lifetimes a year,” Jobs told him. Kenyon came back a few weeks later having cut the boot time by 28 seconds.
In our human lives, running faster means spending more energy. With computers, counterintuitively, the opposite often holds: faster computation usually means less energy consumed overall. A processor that finishes a task quickly can return to a low-power state sooner. A server that handles requests in 10ms instead of 100ms can serve 10x the load on the same hardware, or serve the same load on 1/10th the hardware.
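That 10x claim falls out of simple arithmetic. A rough sketch, assuming each worker handles one request at a time (a simplification that ignores queuing and concurrency):

```python
# Rough throughput arithmetic: if each worker serves one request at a
# time, requests/second per worker is 1000 / latency_ms. The numbers
# are illustrative, matching the 10ms-vs-100ms example above.
def max_throughput(latency_ms: float, workers: int = 1) -> float:
    """Requests per second if each worker serves requests serially."""
    return workers * 1000.0 / latency_ms

slow = max_throughput(latency_ms=100)  # 10 req/s per worker
fast = max_throughput(latency_ms=10)   # 100 req/s per worker
print(f"speedup: {fast / slow:.0f}x")  # -> speedup: 10x
```

The same ratio works in reverse: to serve the slow system's load, the fast one needs a tenth of the workers, and the idle cores can drop into low-power states.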
Make your code as fast as possible.