Cheating as a programming discipline

16 Jun, 2026 · 15 min read


Great programmers cheat. A hard problem gets quietly swapped for an easier one; a transaction-grade database is replaced by a flat file nobody misses; machinery everyone else considers mandatory simply never gets built. They know a lot — and that’s exactly why they get away with it.


The title is deliberate hyperbole: nothing here is fraud. The cheating I mean is a cheaper substitute nobody notices — exploiting what we know to skip work the situation doesn’t actually require. That knowledge isn’t arcane, either. Every shortcut of this kind comes from consulting an input that sits in plain sight, the one most of us walk straight past. Read those inputs, and we can skip work nobody else dares to skip, and be right.

Our work usually starts from business requirements, and requirements come with leeway. One part of it every developer knows and exercises: we choose the stack, pick or recommend providers and dependencies, sometimes the language itself. The rest of the leeway is routinely forgotten — and it is the point of this post. A handful of inputs unlocks it, and none of them is exotic:

the shape of the problem — what it actually requires, and what it never will;
the precision the answer truly needs (almost never “exact”);
the machine and its real limits;
the human on the other side — their physiology and their psychology;
and the plain numbers, the ones a back-of-the-envelope sketch produces in a minute.

Consulting them feels like cheating because the payoff is so lopsided: a moment’s thought, and a week of work disappears. It isn’t cheating — Larry Wall lists laziness among a programmer’s three great virtues, and this is the kind he meant: laziness earned by knowledge, not by indifference. It is the difference between engineering a thing and merely assembling one.

Most developers never reach for these inputs — not because they can’t, but because there’s an easier reflex: reach for more machinery. A bigger database, another cluster, a service that promises to handle it. “It works, doesn’t it?” usually does, at a price nobody measured. That reflex earns its own post; here I only want to lay out the inputs it skips.

Let’s start with the simplest source: the problem itself.

Cut corners

Cutting a corner is easy. Knowing which corner is safe to cut — that’s the whole skill.

I once needed a B+ tree as an external index for an application. Searching it was easy, and building it was easy too, but deleting nodes was a nightmare: garbage-collecting empty nodes, random access wrecking performance, all of it.

Then I noticed something about the problem: the index was static. I built it once, at some expense, and after that I only ever read from it. Deletion solved a situation I would never be in. So I cheated — I didn’t implement it at all. Half the hard part of a B+ tree gone, not because I was clever but because I knew my problem.

That move has a name: YAGNI, You Aren’t Gonna Need It. The cheat is exploiting an invariant only someone who knows the problem can see: here, that this index is read-only. The same logic retires whole tools: a relational database can delete and update and join all day, and if the problem needs none of that, all that machinery is dead weight.
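To make the read-only invariant concrete, here is a minimal sketch. It is not a B+ tree: a sorted array with binary search stands in for it, and `StaticIndex`, `get`, and `range` are hypothetical names chosen for illustration. The point is only what is missing.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


class StaticIndex:
    """Read-only sorted index: built once, searched many times, never mutated."""

    def __init__(self, items: Sequence[Tuple[str, int]]) -> None:
        # Pay the build cost once: sort by key, then freeze into tuples.
        ordered = sorted(items)
        self._keys = tuple(key for key, _ in ordered)
        self._values = tuple(value for _, value in ordered)

    def get(self, key: str) -> Optional[int]:
        """Exact lookup by binary search; None when the key is absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, lo: str, hi: str) -> List[Tuple[str, int]]:
        """All (key, value) pairs with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return list(zip(self._keys[start:end], self._values[start:end]))

    # Deliberately no insert() or delete(): the problem never asks for them.
```

Everything that would have handled rebalancing, empty nodes, and concurrent writers simply has no place to live in this class.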

The danger is cutting a corner we don’t actually own. I could skip deletion because I controlled the application. Two situations change the math:

We don’t know how it will be used. Written as a library for strangers, that B+ tree can’t just drop deletion — we have no idea who will need it. That calls for a different way to decide what stays in and what comes out — a whole post of its own (a forthcoming series on API design).
It has to live a long time. Built for years of maintenance, the specs will shift. We still don’t implement what we don’t need today — but we design so that adding it later is cheap instead of a rewrite.

One more degree of freedom hides a level up: the requirements themselves. They are not gospel — they were written by people, and people can be asked. We can go back and clarify. We can warn when a requirement veers needlessly into the technical side (“I want a Visual Basic program” — really? Why not C++?). We can re-negotiate a feature that is hard or expensive to build and maintain: done differently, it may cover 97% of the actual need much faster and much cheaper — a trade most clients take gladly once it’s offered. We can propose a different implementation order, or a different overall structure. The problem we consult is softer than the document it arrived in.

So: cut what the problem allows — once we are sure the problem is really ours to cut.

Approximate

The second input is the precision the answer actually needs — and the honest requirement is almost never “exact”. In many cases an approximate answer now beats an exact answer at great expense.

Examples:

A calculator doesn’t compute transcendental functions to infinite precision. Why would it, when there are only 10 digits to show?
Google doesn’t search the entire internet in a split second, and the trick isn’t caching. Searching for the same phrase twice, I sometimes get different results. Did the internet just change? Nope: the set of machines that answered did. A query fans out to thousands of index shards, and to hold latency the root drops the stragglers and assembles the page from whoever made the deadline. Dean and Barroso lay out the machinery in “The Tail at Scale”. Approximate by design, useful all the same.
Even the “About 4,310,000 results” line is an estimate, not a count. Exact distinct-counting at that scale costs more than the number is worth; the standard tool for the job is HyperLogLog, which delivers the answer within a couple of percent for a few kilobytes of memory.
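The straggler-dropping fan-out from the search example can be sketched in a few lines. This is a toy under stated assumptions, not how any real search engine is built: `fan_out` and `make_shard` are hypothetical names, and the "shards" are just coroutines that sleep.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

Shard = Callable[[str], Awaitable[List[str]]]


async def fan_out(shards: Sequence[Shard], query: str, deadline_s: float) -> List[str]:
    """Ask every shard, keep whatever arrives before the deadline, drop the rest.

    Assumes at least one shard; asyncio.wait rejects an empty collection.
    """
    tasks = [asyncio.ensure_future(shard(query)) for shard in shards]
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)
    for task in pending:
        task.cancel()  # stragglers are dropped, not awaited
    results: List[str] = []
    for task in tasks:  # iterate in shard order so output is stable
        if task in done and task.exception() is None:
            results.extend(task.result())
    return results


def make_shard(hits: List[str], delay_s: float) -> Shard:
    """Hypothetical stand-in for a remote index shard with a fixed latency."""
    async def shard(query: str) -> List[str]:
        await asyncio.sleep(delay_s)
        return [f"{hit}:{query}" for hit in hits]
    return shard
```

With shards answering in 10 ms, 1 s, and 20 ms and a 300 ms deadline, the slow shard's hits never reach the page, and the caller returns on time anyway.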

While building a dashboard, my team realized we could easily estimate a certain KPI from secondary data, but tallying the primary data was hard. So we showed the estimate, clearly marked as one — “~75%” instead of “76.123456789%” — and ran the exact calculation only on demand.

One common form of approximation is time-based. In complex cases we can take a timed snapshot plus a velocity and do a linear approximation — or something fancier, depending on the task. In the simplest case we just say “76.12% as of 9:00 AM today”. Which brings us to:

Caching

A cache is approximation along the time axis: we serve an answer that was exact a moment ago and declare it good enough now. The contract is a precision budget — “no more than 15 minutes stale” — agreed with the business like any other requirement, and inside that window we answer at memory speed. (If someone pays for a fresh recount anyway, we fold the result back into the cache and the clock restarts.) Scale the same cheat out to a fleet of replicas and it has an industry name: eventual consistency — everyone converges on the truth, just not yet.

This is also the first cheat that bites back. Phil Karlton’s line — “there are only two hard things in Computer Science: cache invalidation and naming things” — points exactly here: staleness already served can’t be recalled. The page cached at 9:00 keeps telling its truth until it expires, no matter what changed at 9:01. We chose the window; we own everything that happens inside it.
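The staleness budget can be written down directly. The sketch below is one possible shape, with hypothetical names (`StaleOkCache`, `max_age_s`); the clock is injectable only so the behavior can be checked without waiting 15 minutes.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

V = TypeVar("V")


class StaleOkCache(Generic[V]):
    """Serve answers up to `max_age_s` old; recompute only past that budget."""

    def __init__(
        self,
        compute: Callable[[Hashable], V],
        max_age_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._max_age_s = max_age_s
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def get(self, key: Hashable) -> V:
        """Return the cached value while it is inside the staleness budget."""
        entry = self._entries.get(key)
        if entry is not None and self._clock() - entry[0] <= self._max_age_s:
            return entry[1]  # possibly stale, but within the agreed window
        return self.refresh(key)

    def refresh(self, key: Hashable) -> V:
        """Forced recount: fold the fresh value back in and restart the clock."""
        value = self._compute(key)
        self._entries[key] = (self._clock(), value)
        return value
```

Note what `get` does when the source changes one minute after caching: it keeps returning the old value until the window closes. That is the contract working as agreed, and also the part that bites.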

Know the machine

The third input is the machine — not the abstract one from the textbook, the real one, with published limits.

Several cheats above were quietly co-authored by it. Caching works because the machine’s memory is layered — registers, caches, RAM, disk, network — each layer markedly slower than the one above it, so moving an answer one layer up is pure profit. The B+ tree from “Cut corners” was shaped by it: disks reward sequential reads and punish random ones, hence the fat nodes and the shallow tree. And a float64 carries 15–17 significant digits — a result claiming more precision than that is fiction, no matter how long we computed it.
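The float64 limit is easy to see for ourselves. A few lines, assuming the usual IEEE 754 doubles that Python's `float` uses on mainstream platforms:

```python
import sys

# Decimal digits guaranteed to survive a round trip through a float64.
guaranteed_digits = sys.float_info.dig

# Past those digits, the representation shows through.
total = 0.1 + 0.2
total_is_exact = total == 0.3
total_17_digits = f"{total:.17g}"

# Above 2**53 a float64 can no longer tell neighboring integers apart.
big = float(2**53)
plus_one_is_lost = big + 1.0 == big

print(guaranteed_digits)   # 15
print(total_is_exact)      # False
print(total_17_digits)     # 0.30000000000000004
print(plus_one_is_lost)    # True
```

Any digit a report prints beyond that point was never computed; it was made up by the formatting.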

None of this is secret knowledge. The latency table every programmer is told to know fits on an index card. The cheat is consulting it at design time — instead of meeting it later, in a profiler.

Use physiology

The next input lives on the other side of the screen: the human body. People are slow at producing input and slow at consuming output — and their speeds are published numbers we can design to.

Debounce

A good example is debouncing. The term comes from electronics, where it filters the chatter of a mechanical switch; John Hann (AKA @unscriptable on X) brought it to JavaScript back in 2009. The idea is simple: when a user types their request, there’s no need to bother the server until they finish. In olden times, “finish” meant an actual “Submit” button or something to that effect. These days we wait until the user stops typing before showing autosuggest or results. How do we know they’ve finished? When a pause exceeds their typing rhythm by some margin; 200–500 ms is a typical delay.
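A trailing-edge debounce fits in a dozen lines. The sketch below uses Python's `threading.Timer` rather than a browser timer, and `Debouncer`, `trigger`, and `demo` are hypothetical names; the delay in the demo is shortened to 50 ms so it runs quickly.

```python
import threading
import time
from typing import Any, Callable, List, Optional


class Debouncer:
    """Trailing-edge debounce: run `callback` only after `delay_s` of quiet."""

    def __init__(self, delay_s: float, callback: Callable[..., Any]) -> None:
        self._delay_s = delay_s
        self._callback = callback
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def trigger(self, *args: Any) -> None:
        """Call on every keystroke; each call restarts the quiet-period timer."""
        with self._lock:
            if self._timer is not None:
                # Cancels a timer that is still waiting; a callback that has
                # already started running is not interrupted.
                self._timer.cancel()
            self._timer = threading.Timer(self._delay_s, self._callback, args=args)
            self._timer.daemon = True
            self._timer.start()


def demo() -> List[str]:
    """Five rapid keystrokes should reach the 'server' as one request."""
    sent: List[str] = []
    search = Debouncer(0.05, sent.append)
    for text in ("c", "ch", "che", "chea", "cheat"):
        search.trigger(text)
    time.sleep(0.4)  # the user stopped typing; let the timer fire
    return sent
```

Calling `demo()` returns a list with the single entry `"cheat"`: four of the five requests were never sent.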

Can we trigger faster? We could — and we’d fire mid-word, burning a request on a half-typed token that the very next keystroke invalidates. The delay isn’t timidity; it’s tuned. Here’s the arithmetic it’s tuned to.

How fast do people type? The internet tells us an average adult types 40–52 words per minute (WPM), a professional manages 60–90 WPM, and competitive typists exceed 120 WPM. Barbara Blackburn reportedly reached 170 WPM and sustained an average of 145 WPM for 55 minutes straight.

But people don’t type English word by word — they type letters. An English word averages 4.7 characters, which puts an average adult at 188–244 characters per minute, or 3–4 characters per second — roughly 250–333 ms between keystrokes. The typical debounce isn’t an arbitrary number; it’s our physiology, converted to milliseconds.
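The conversion is short enough to check in code. The function name is hypothetical; the 4.7 characters per word and the WPM figures are the ones quoted above.

```python
AVG_WORD_CHARS = 4.7  # average English word length used in the text


def ms_between_keystrokes(wpm: float, chars_per_word: float = AVG_WORD_CHARS) -> float:
    """Convert a typing speed in words per minute to milliseconds per keystroke."""
    chars_per_minute = wpm * chars_per_word
    return 60_000 / chars_per_minute


for label, wpm in [("average adult, slow end", 40),
                   ("average adult, fast end", 52),
                   ("professional, fast end", 90)]:
    print(f"{label}: about {ms_between_keystrokes(wpm):.0f} ms between keystrokes")
```

Without rounding to whole characters per second, the average adult comes out at roughly 245 to 319 ms, in line with the 250–333 ms figure; a 90 WPM professional lands near 142 ms.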

Which humans?

One catch: those are population numbers. The average adult sits at 250–333 ms between keystrokes; a professional data-entry operator runs closer to 150 ms. Feed the same arithmetic a different population and it returns a different debounce — tune to the people actually typing, not to the average of everyone with hands. If we onboarded Martians as customers tomorrow, we’d measure how fast they type and set a different delay for them.
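One way to tune to the people actually typing is to measure their keystroke gaps and derive the delay from that. The sketch below is an assumption-heavy illustration: `debounce_from_gaps` is a hypothetical helper, and the 1.5 margin over the median gap is an arbitrary starting point, not a published constant.

```python
import statistics
from typing import Sequence


def debounce_from_gaps(gaps_ms: Sequence[float], margin: float = 1.5) -> float:
    """Suggest a debounce delay: the population's typical gap plus a margin.

    `gaps_ms` are measured intervals between consecutive keystrokes.
    The median resists the occasional long pause while the user thinks.
    """
    typical_gap = statistics.median(gaps_ms)
    return typical_gap * margin


average_adults = [300, 280, 320, 310, 290, 1200]   # one long thinking pause
data_entry_pros = [150, 140, 160, 155, 145]

print(debounce_from_gaps(average_adults))    # 457.5
print(debounce_from_gaps(data_entry_pros))   # 225.0
```

Same arithmetic, different population, different answer: the professionals get their results in about half the time without a single mid-word request.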

I learned how real that is on a banking app, where complaints arrived that the forms’ tab order wasn’t preserved. I could see the complainers from my desk: data-entry specialists. They never touched the mouse — they leaned on focus-on-load and tab order to enter data faster than I could follow. And they were not who the old CLI-versus-GUI folklore would predict: not bearded terminal dwellers, just quiet professionals paid for speed. Asked why keyboard-only, they showed me: a hand traveling from keys to mouse costs time, aiming the pointer costs more (Fitts’s law prices every reach-and-aim), and one hand on each doesn’t work at speed. They knew exactly what they wanted.

And the deepest cost of the mouse trip wasn’t travel time at all: every instrument switch broke their rhythm and dropped them out of flow. That cost isn’t physiological. It belongs to the next input.

Use psychology

Its companion input is the human mind: what people perceive, attend to, and can hold at once. A system’s perceived speed is negotiable separately from its actual speed — and perception is what the user walks away with.

When a screen loads, the user orients first: finds the table, reads the headline, decides what matters — and placement, color, and type steer where the eye goes first. Orientation takes time, and that time can be spent: surface the most important information first (likely an approximation) and fetch the rest while they’re busy with it. The system didn’t get faster, but it’s perceived that way. That’s how many dashboards work: give the user something to chew on, deliver the rest a beat later. To be clear, this argues for emphasis, not obstacles — the system should stay as easy to use as possible; the cheat is in the ordering, never in artificial hurdles.

Where things sit is part of the same input. Data needed at the same time should sit together (the Gestalt law of proximity, to give it its formal name), and our habits here are inherited from paper. A total under a long table is a print-era norm: on paper there was nowhere else to put it. On screen the total most probably belongs up top, seen immediately, with the table kept below for checking details. And since that headline number is likely an approximation anyway, it’s ready before the table is.

The ceiling we’re leaning on has a name: Miller’s Law, from “The Magical Number Seven, Plus or Minus Two” (George Miller, 1956). Working memory holds only a handful of things at once: Miller said seven; later work (Cowan) puts it nearer four. The ceiling plays two roles for us. It caps presentation: past the handful, extra data on screen produces mistakes instead of insight, so trimming a screen to what matters costs the user nothing. And it licenses delay: whatever didn’t fit the handful can arrive late, because people rarely miss what they didn’t immediately need. The delay is itself the cheat, and it buys the system time to work.

One more property of the same mind: it trains. People automate the screens they use — after enough repetitions the hand finds the button before the eyes do, and that automaticity is free performance the system did nothing to earn. It’s also fragile. Redo a screen and the training is void: even a version better in every respect disorients everyone accustomed to the old one, and their performance drops until the new layout becomes mechanical again. (The data-entry pros above complained the moment a familiar tab order stopped being preserved.) So the general placement decisions — layout, color, type — are worth getting right early and moving last: of everything we’re free to redo, they should probably be the most conservative.

Long operations

Some operations are just slow. What breaks users isn’t the wait — it’s the silence around it. Faced with a quiet screen, many feel a subconscious fear: the system is down, it hung, the network dropped my request. They start refreshing and clicking everything — and after a reasonable 2-second wait, some honestly report “your system is slow — I waited forever”.

The thresholds here are measured, not folklore. Nielsen’s response-time limits: about 0.1 s reads as instantaneous; about 1 s is a noticeable hiccup that still preserves the flow of thought; by about 10 s attention is gone. The Doherty threshold puts the point where productivity takes off near 400 ms. Anything slower owes the user a signal:

Acknowledge the request — even a simple animated spinner works. (Users know it isn’t wired to the server. It reassures them anyway.)
Ideally, estimate the progress: seconds, a percentage, units processed so far.
- Update it every ~500 ms — see physiology above; 1–2 s still reads as alive.
- A changing status also occupies the user: unlike a spinner, it takes a moment to read — and that moment is bought time.
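The update cadence in the list above is a throttle: the work may report progress thousands of times a second, but the user only needs a new line about twice a second. A minimal sketch, with hypothetical names and an injectable clock so it can be checked without sleeping:

```python
import time
from typing import Callable, Optional


class ThrottledProgress:
    """Forward progress updates to the UI at most once per `min_interval_s`."""

    def __init__(
        self,
        emit: Callable[[str], None],
        min_interval_s: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._emit = emit
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_emit: Optional[float] = None

    def update(self, done: int, total: int) -> None:
        """Report `done` of `total` units; assumes total > 0."""
        now = self._clock()
        finished = done >= total
        due = self._last_emit is None or now - self._last_emit >= self._min_interval_s
        # The first and the final update always go through; the rest are throttled.
        if due or finished:
            self._last_emit = now
            self._emit(f"{done}/{total} ({100 * done // total}%)")
```

The worker calls `update` as often as it likes; the screen changes at a pace a person can actually read.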

A mistake I see over and over: a web application performs I/O with no indication anything is happening. The usual excuse is that nobody can say in advance which operation will be slow. That is true, and time makes it worse: an operation that is fast today slows down as the data behind it grows.

A company started with ~1k clients, and everything was flying. At ~10M clients the picture inverted: the most frequent users — the most valuable customers — had accumulated the most transactions, and precisely because of that volume they were hit hardest by the slowdowns. Resources went where they had to go, into the backend — but the immediate relief came from the frontend: fade in a spinner whenever an I/O operation runs past some threshold, say 750 ms. Showing it on fast operations irritates people (“all that blinking!”), and flipping it on and off abruptly irritates them too — so fade in, and fade out faster than in: we’re back in business. The system was no faster than the day before — but now it talked while it worked.
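The threshold logic behind that spinner is small. Here is a sketch in asyncio terms; `with_delayed_spinner`, `show`, `hide`, and `slow_io` are hypothetical names, and the fade-in and fade-out themselves are left to whatever the UI layer does inside `show` and `hide`.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_delayed_spinner(
    operation: Awaitable[T],
    show: Callable[[], None],
    hide: Callable[[], None],
    threshold_s: float = 0.75,
) -> T:
    """Run `operation`; show a spinner only if it outlasts `threshold_s`."""
    task = asyncio.ensure_future(operation)
    done, _ = await asyncio.wait({task}, timeout=threshold_s)
    if done:
        return task.result()  # fast path: the spinner never appears
    show()
    try:
        return await task
    finally:
        hide()  # runs on success and on failure alike


async def slow_io(delay_s: float, value: str) -> str:
    """Stand-in for a real I/O call with a known latency."""
    await asyncio.sleep(delay_s)
    return value
```

Nobody has to predict which operation will be slow: every I/O call goes through the wrapper, and only the ones that actually run long ever show anything.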

Run the numbers

The last input has been hiding inside every section above. The debounce came from multiplying words-per-minute into milliseconds. The cache window is a staleness budget someone had to put a number on. Google’s stragglers are dropped against a latency budget. None of these cheats started from courage — they started from arithmetic short enough to fit on an envelope, a minute of multiplication that says whether the corner is safe to cut.

That arithmetic is the discriminator running through the whole post: cheap because measured, never cheap because nobody checked. The unmeasured version of every move above has a different name — negligence. Back-of-the-envelope estimation is a discipline of its own, and it will get a post of its own.

Summary

Cheating well isn’t doing less work — it’s knowing enough to skip the work that doesn’t matter. The knowledge comes from six inputs sitting in plain sight: the problem’s real shape, the precision actually required, the machine’s real limits, the body’s speeds, the mind’s ceilings, and the numbers that license all of the above. We cut the corners the problem allows, approximate where exactness is wasted, and spend what we save where users actually feel it — while checking who those users really are, because they’re rarely who we assume. Done right, that’s not a shortcut around competence — it is competence.
