When I first started learning to program, it was in BASIC on my old 386. No frameworks, no libraries — just line numbers, GOTO, and me, trying to make something happen on the screen.
I taught myself to make extremely rudimentary computer graphics. I wasn’t drawing circles or sprites — I was manipulating individual pixels. Literally placing and moving little coloured dots around on a black screen. It felt like magic. Primitive, yes — but still magic.
Back then, I had a theory.
A computer screen (or a printed page) is just pixels. Pixels in a fixed grid, each with a limited set of colour values. But the combinations — the permutations — are practically infinite. Every image you’ve ever seen, and every image that could ever be, is just one permutation of coloured squares.
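To get a feel for the scale, here's a back-of-envelope calculation — assuming, purely for illustration, a 320×200 screen with 256 colours per pixel (classic VGA mode 13h, not necessarily what that old 386 was running):

```python
import math

# Scale of the pixel-permutation space, assuming a 320x200 screen
# with 256 colours per pixel (an illustrative assumption).
width, height, colours = 320, 200, 256
pixels = width * height  # 64,000 pixels

# The total number of distinct images is colours ** pixels = 256^64000.
# That number is far too large to print, so count its decimal digits.
digits = math.floor(pixels * math.log10(colours)) + 1
print(f"256^{pixels} has {digits:,} decimal digits")  # 154,128 digits
```

A number with over 154,000 digits — "practically infinite" undersells it.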
So I imagined building a program that simply randomised pixels. No input, no prompt. Just an engine churning out pixel combinations.
If I let it run long enough, I reasoned, it would eventually output:
- A perfect photo of me shaking hands with the Queen (despite that never happening)
- A frame-by-frame recording of the Big Bang
- Or a never-before-seen Renaissance masterpiece, signed and dated
Anything you could imagine — and far more that you couldn’t — would emerge. Given enough time, every image must appear.
That was the theory, anyway.
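The whole engine fits in a few lines. Here's a hypothetical reconstruction in Python rather than the original BASIC — each "frame" is just a grid of independent random colour indices, which a real version would blit to the screen:

```python
import random

# A minimal sketch of the pixel randomiser described above — a
# reconstruction in Python, not the original BASIC program.
WIDTH, HEIGHT, COLOURS = 320, 200, 256

def random_frame():
    """One permutation: every pixel gets an independent random colour."""
    return [[random.randrange(COLOURS) for _ in range(WIDTH)]
            for _ in range(HEIGHT)]

frame = random_frame()
print(len(frame), len(frame[0]))  # 200 320
```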
Fast forward to today, and I can’t help but see the echo of this idea in diffusion models — the AI systems that now generate images from noise.
These models don’t throw pixels around randomly, of course. They’re trained on billions of images, gradually learning how to de-noise static into recognisable patterns. But at their core, they still rely on navigating a vast space of possible pixel combinations, nudged by probability and learned associations.
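A toy caricature of that denoising step — emphatically not a real diffusion model, which learns a neural denoiser from billions of images — cheats the "learned" knowledge in as a known target, then nudges pure noise toward it a little at a time:

```python
import random

# A toy caricature of iterative denoising. The "learned" denoiser is
# faked here as a known target image; real diffusion models learn this
# direction from data rather than being handed it.
random.seed(0)
target = [0.2, 0.9, 0.5, 0.1]                 # stand-in "image"
x = [random.uniform(-1, 1) for _ in target]   # start from pure noise

for step in range(50):
    # Each step moves every pixel a fraction of the way
    # toward the denoised estimate.
    x = [xi + 0.1 * (ti - xi) for xi, ti in zip(x, target)]

# After enough steps, the noise has collapsed onto the target pattern.
print(all(abs(xi - ti) < 0.02 for xi, ti in zip(x, target)))  # True
```

The structure is the same — noise in, image out — but the direction of each nudge is where all the training went.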
Where my naive image randomiser would take trillions of years to stumble across anything coherent, today’s models can conjure photorealistic scenes, fake memories, or imaginary places in seconds.
What changed? We taught the machine to guess which pixel patterns matter.
It’s funny — I used to dream about building a pixel engine that could surface the impossible. In a way, we’ve built that now. But instead of waiting for randomness to do the work, we’ve given it direction, context, and taste.
And now, it’s not just possible to generate a photo of me shaking hands with the Queen — I could do it in 30 seconds, with perfect lighting, at Buckingham Palace, smiling just the right way.
It still feels like magic.
Only now, it’s programmable.