The Apollo 11 Guidance Computer Had a Four-Byte Bug. It Hid for 57 Years.
The most reviewed code ever written had a four-byte bug. No bug detector found it. No static analyzer warned about it. No end-to-end test case triggered it. 57 years. Four bytes.
The Apollo Guidance Computer source code has been public since 2003. Thousands of developers have read it. Academics published papers on its reliability. Emulators run it instruction by instruction.
The transcription was verified byte-for-byte against the original core rope dumps.
A team at JUXT just found a resource lock leak in the gyro control code that could have silently killed the guidance platform's ability to realign.
Four bytes. Two missing instructions.
The Lock That Nobody Released
Here's what happened. The AGC manages the spacecraft's Inertial Measurement Unit through a shared lock called LGYRO. When the computer needs to torque the gyroscopes to correct drift or perform a star alignment, it grabs the lock, does the work across three axes, and releases it when done.
Normal path: lock acquired, torque completes, lock released. Clean.
But there's another path. "Caging" is an emergency measure where a physical clamp locks the gyroscope gimbals in place to protect them. The crew could trigger it with a guarded switch in the cockpit.
When caging interrupts a torque in progress, the code exits through a routine called BADEND. It cleans up every shared resource correctly. Except LGYRO.
Once that lock is stuck, every future gyro operation finds it held, sleeps waiting for a wake signal that never comes, and hangs. Fine alignment, drift compensation, manual torque. All dead.
No alarm. No error light. The DSKY display accepts inputs and does nothing. Everything else on the computer works fine.
Only gyro operations are silently bricked.
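The shape of the leak is easier to see in a modern language. The sketch below is a hypothetical Python model, not the AGC's assembly: `lgyro` stands in for LGYRO, `torque_gyros` for the torque routine, and `CageInterrupt` for the cage event. All three names are assumptions made for illustration. The acquire is non-blocking so the demo reports a stuck lock instead of hanging the way the real code would.

```python
import threading

lgyro = threading.Lock()  # hypothetical stand-in for the AGC's LGYRO lock


class CageInterrupt(Exception):
    """Hypothetical signal that the IMU was caged mid-torque."""


def torque_gyros(caged_midway: bool) -> str:
    # Non-blocking acquire so this demo reports a stuck lock instead of hanging.
    if not lgyro.acquire(blocking=False):
        return "hung: lock already held"
    try:
        if caged_midway:
            raise CageInterrupt()
        # ... torque the three axes here ...
    except CageInterrupt:
        # Error exit modeled on BADEND: other cleanup runs, the release is missing.
        return "bad end"
    lgyro.release()  # only the normal path reaches this line
    return "ok"


print(torque_gyros(caged_midway=False))  # ok
print(torque_gyros(caged_midway=True))   # bad end
print(torque_gyros(caged_midway=False))  # hung: lock already held
```

The third call is the one that matters: nothing is wrong with that call itself, yet it can never proceed because an earlier error path left the lock held.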
Behind the Moon, Alone
Now picture this. Michael Collins is orbiting alone in the Command Module while Armstrong and Aldrin walk on the Moon. Every two hours he disappears behind the Moon, completely cut off from Earth.
He runs a star-sighting alignment to keep the guidance platform pointing the right direction. If the platform drifts, his engine burn to get home fires the wrong way.
If Collins had accidentally bumped the cage switch during a torque, the first alignment would fail with a clear cause. He'd uncage the IMU and try again.
The second alignment would hang with no explanation.
His training said to restart after unexplained failures. But commands were being accepted. Everything else worked. It would look like broken hardware, not a stuck software lock.
Behind the Moon, alone, no radio contact, with two astronauts on the surface waiting for a rendezvous burn that depends on a platform he can no longer align.
He never bumped that switch. The bug never fired. But it was there the whole time.
Why Nobody Found It
Why it stayed hidden is the interesting part. The AGC's restart logic clears the lock as a side effect of full memory initialization, so any test that triggered a restart after the bug fired would see the system recover seamlessly.
The defensive coding that Margaret Hamilton's team built in actually hid the problem instead of eliminating it.
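A toy model shows how a restart can mask the leak. Everything here is hypothetical: `GyroModel`, its methods, and the two check functions are illustrative names, and the lock is reduced to a boolean flag. The point is only that a check which restarts before retrying passes, while the same check without the restart exposes the stuck lock.

```python
class GyroModel:
    """Hypothetical toy model; names are illustrative, not AGC symbols."""

    def __init__(self) -> None:
        self.lock_held = False

    def torque(self, caged_midway: bool = False) -> str:
        if self.lock_held:
            return "hung"
        self.lock_held = True
        if caged_midway:
            return "bad end"  # leak: the error exit never clears lock_held
        self.lock_held = False
        return "ok"

    def restart(self) -> None:
        # Full reinitialization clears the lock as a side effect.
        self.lock_held = False


def check_with_restart() -> str:
    m = GyroModel()
    m.torque(caged_midway=True)
    m.restart()
    return m.torque()


def check_without_restart() -> str:
    m = GyroModel()
    m.torque(caged_midway=True)
    return m.torque()


print(check_with_restart())     # ok
print(check_without_restart())  # hung
```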
And the scrutiny was a particular kind of scrutiny. People read the code. People emulated the code. People verified the transcription.
Nobody wrote a formal specification that tracked every resource lifecycle across every code path.
What Actually Found It
A formal specification is what found it. The team used a behavioral specification tool called Allium to distill 130,000 lines of AGC assembly into 12,500 lines of specs. The spec models each shared resource as an entity with a lifecycle: acquired, held, released.
Then it checks whether every acquisition has a matching release on every path.
The normal completion path releases LGYRO. The cage-interrupted path through BADEND does not. Two missing instructions, four bytes, 57 years.
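The idea behind that check can be sketched in a few lines. This is not Allium syntax and not how the JUXT team's tooling works internally; it is a hypothetical Python illustration in which each exit path is written out by hand as a list of events, and `PATHS`, `leaked_resources`, and `find_leaks` are made-up names.

```python
from typing import Dict, List

# Hypothetical event traces for each exit path of one routine.
# "acquire:X" and "release:X" mark lock operations; other events are ignored.
PATHS: Dict[str, List[str]] = {
    "normal": ["acquire:LGYRO", "torque", "release:LGYRO"],
    "cage_interrupted": ["acquire:LGYRO", "torque", "cage", "cleanup"],
}


def leaked_resources(trace: List[str]) -> List[str]:
    """Return resources still held at the end of a trace."""
    held: List[str] = []
    for event in trace:
        kind, _, name = event.partition(":")
        if kind == "acquire":
            held.append(name)
        elif kind == "release" and name in held:
            held.remove(name)
    return held


def find_leaks(paths: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each leaking path name to the resources it leaves held."""
    leaks = {}
    for name, trace in paths.items():
        held = leaked_resources(trace)
        if held:
            leaks[name] = held
    return leaks


print(find_leaks(PATHS))  # {'cage_interrupted': ['LGYRO']}
```

The check itself is trivial. The hard part, and the part the specification work supplied, is enumerating every path in the first place.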
Your Code Has This Bug Too
Modern languages have tried to make this structurally impossible. Go has defer. Java has try-with-resources. Rust's ownership system turns lock leaks into compile-time errors.
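Python's `with` statement belongs to the same family. A minimal sketch, again with hypothetical names (`lgyro`, `torque_gyros`, `CageInterrupt`): the lock is released when the block exits, whether it exits normally or because an exception propagates out of it.

```python
import threading

lgyro = threading.Lock()  # hypothetical stand-in for a shared resource lock


class CageInterrupt(Exception):
    """Hypothetical error raised when work is interrupted midway."""


def torque_gyros(caged_midway: bool) -> str:
    try:
        with lgyro:  # released on normal exit and when an exception propagates
            if caged_midway:
                raise CageInterrupt()
            return "ok"
    except CageInterrupt:
        return "bad end"


print(torque_gyros(caged_midway=True))   # bad end
print(torque_gyros(caged_midway=False))  # ok
print(lgyro.locked())                    # False
```

There is no release call on the error path to forget, because there is no release call at all.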
But not all resources live inside a language runtime. Database connections, distributed locks, file handles in shell scripts, infrastructure teardown ordering.
Anywhere the programmer manually writes the cleanup, this exact bug is waiting.
The most reviewed code ever written, by one of the best engineering teams in history, had a resource leak hiding in an error path.
What's hiding in yours?