the universe organizes itself
there's a quiet assumption running through almost every conversation about intelligence: that self-organization—the spontaneous emergence of order from chaos—is fundamentally a biological phenomenon. cells divide and differentiate. bacteria form colonies. brains wire themselves. and so, when we build artificial intelligence, we look to biology for inspiration.
but this assumption is wrong. or at least, dramatically incomplete.
self-organization isn't a biological invention. it's a cosmic principle. long before the first cell emerged, the universe was already organizing itself—stars condensing from gas clouds, galaxies spiraling into form, planets settling into stable orbits. the same patterns appear at every scale, from the distribution of matter across the cosmos to the branching of rivers and the folding of proteins.
in beyond the brain, we argued that AI shouldn't limit itself to mimicking neurons. here, we take that argument further. if self-organization is universal—if it operates in galaxies and cells and algorithms alike—then we can tap into it directly. we don't need to copy biology. we need to understand the mathematics that biology itself discovered.
this post traces self-organization across three domains: cosmic, biological, and computational. the goal isn't just to show parallels—it's to reveal the shared mathematics that underlies them all. by the end, you'll see how a neural classifier organizing decision boundaries is doing exactly what stars do when they organize into galaxies.
the first theory: vortices all the way down
let's start with a forgotten moment in the history of science. before newton, before gravity as we know it, rené descartes proposed a radical theory of cosmic organization: the universe is filled with invisible vortices.
in descartes' vision, space itself wasn't empty—it was a plenum, filled with swirling matter. the sun sat at the center of a giant vortex, and the planets were carried along in its flow like leaves in a whirlpool. each planet, in turn, generated its own smaller vortex, explaining the orbits of moons. it was vortices all the way down.
newton eventually replaced this with his inverse-square law of gravity—a much cleaner mathematical description that could actually predict planetary positions. but here's what's remarkable: descartes' intuition wasn't entirely wrong. he was reaching for a deep truth that his mathematics couldn't yet express.
that truth? sources create fields, and fields organize motion.
the sun doesn't reach out and grab the earth. instead, it creates a field around itself—a region where the rules of motion are different. objects entering that field are organized by it, pulled into orbits, arranged into stable configurations. descartes' vortices were a mechanical intuition for something that would later be expressed precisely as gravitational fields.
watch the visualization above. when you add multiple vortex centers, something interesting happens: the space gets partitioned. each center claims a territory—a region where its influence dominates. particles in that region orbit that center, not the others.
this is our first glimpse of self-organization. no one designed these territories. no central planner decided which particles go where. the partition emerges from the interactions of the fields. stronger sources claim larger territories. weaker sources get pushed to the margins.
hold this image in your mind. we'll return to it when we discuss classifiers.
from vortices to geometry
descartes' vortices gave us the right intuition, but newton gave us the right math. his law of universal gravitation:
$$F = G \frac{m_1 m_2}{r^2}$$
the inverse-square relationship. this simple law explains planetary orbits with remarkable precision—and it contains the seed of self-organization. the $1/r^2$ means nearby objects interact strongly, distant objects weakly. locality.
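to make the locality concrete, here's a quick numerical check of the inverse-square law. `G` is the standard SI value; the earth-sun figures are rounded, illustrative numbers:

```python
# a quick numerical check of newton's inverse-square law. G is the
# standard SI value; the earth-sun figures are rounded illustrative values.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2

def gravitational_force(m1, m2, r):
    """force magnitude between two point masses separated by r meters."""
    return G * m1 * m2 / r**2

au = 1.496e11        # earth-sun distance, meters
m_sun = 1.989e30     # kg
m_earth = 5.972e24   # kg

f_near = gravitational_force(m_sun, m_earth, au)
f_far = gravitational_force(m_sun, m_earth, 2 * au)

# doubling the distance quarters the force: locality, quantified
print(round(f_near / f_far, 6))  # → 4.0
```

nearby objects dominate each other's motion; distant ones barely register. that falloff is what lets local structure form without global coordination.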
but einstein went deeper. in 1915, he showed that gravity isn't a force at all—it's geometry. mass doesn't pull on other masses through some mysterious action-at-a-distance. instead, mass curves spacetime, and objects simply follow the straightest possible paths through that curved geometry.
this is a profound shift. in newton's universe, space is a stage and gravity is an actor. in einstein's universe, space itself is shaped by the actors. the stage responds to its performers.
look at the visualization above. it shows gravitational lensing—one of einstein's most striking predictions. the deflection angle $\theta$ of light passing near a mass follows:
$$\theta = \frac{4GM}{rc^2}$$
again, the inverse relationship with $r$. closer to the mass, stronger bending. this formula—confirmed by eddington's 1919 eclipse expedition—was one of general relativity's first triumphs. today, astronomers use lensing to map dark matter and study galaxies billions of light-years away.
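plugging standard constants into the formula recovers the famous grazing-limb value that eddington's expedition confirmed:

```python
import math

# einstein's deflection formula theta = 4GM / (r c^2) evaluated for
# light grazing the sun's limb. all constants are standard values.
G = 6.674e-11        # m^3 kg^-1 s^-2
c = 2.998e8          # speed of light, m/s
M_sun = 1.989e30     # kg
R_sun = 6.957e8      # solar radius, m

theta_rad = 4 * G * M_sun / (R_sun * c**2)
theta_arcsec = math.degrees(theta_rad) * 3600

print(round(theta_arcsec, 2))  # → 1.75
```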
the connection to descartes' vortices is now complete. both describe the same phenomenon: sources curve the space around them, and that curvature organizes motion. descartes imagined mechanical whirlpools. einstein revealed the true mechanism: geometry itself bending in response to mass.
and here's the key insight for our purposes: this organization is local. each mass only affects its immediate neighborhood. there's no cosmic blueprint, no master plan. yet from these local interactions, the entire large-scale structure of the universe emerges—galaxies, clusters, the cosmic web spanning billions of light-years.
biology: a latecomer to the game
now we turn to biology—but with fresh eyes. the cosmos was self-organizing for 10 billion years before the first cell appeared. biology didn't invent self-organization. it discovered it, through evolution, and exploited it with extraordinary creativity.
the clearest example comes from alan turing—yes, the same turing who invented the computer. in 1952, he published a paper called "The Chemical Basis of Morphogenesis" that explained how complex patterns could emerge from simple chemical reactions.
turing's recipe is elegant. he proposed a system of two coupled partial differential equations:
$$\frac{\partial u}{\partial t} = D_u \nabla^2 u + f(u, v)$$

$$\frac{\partial v}{\partial t} = D_v \nabla^2 v + g(u, v)$$
here $u$ is the activator concentration, $v$ is the inhibitor. $D_u$ and $D_v$ are diffusion rates. the key: $D_v > D_u$—the inhibitor diffuses faster. the functions $f$ and $g$ describe the local reactions. when these dynamics play out spatially, patterns spontaneously emerge—spots, stripes, spirals.
watch the simulation above. these patterns aren't pre-programmed. no gene specifies where each spot goes. the pattern emerges from simple local interactions—just like planetary orbits emerge from local gravitational effects.
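the instability behind those patterns can be checked with pencil-and-paper linear analysis. the sketch below uses an illustrative jacobian and diffusion rates (chosen to satisfy turing's conditions, not taken from any specific chemical system) and shows the signature result: the well-mixed system is stable, but diffusion with $D_v > D_u$ destabilizes intermediate wavelengths:

```python
import math

# linear stability analysis of the reaction-diffusion system above
# around a homogeneous steady state. the jacobian entries (f_u, f_v,
# g_u, g_v) and diffusion rates are illustrative assumptions.
fu, fv = 3.0, -4.0   # activator self-amplifies; inhibitor suppresses it
gu, gv = 4.0, -5.0   # activator produces inhibitor; inhibitor decays
Du, Dv = 1.0, 10.0   # the key condition: the inhibitor diffuses faster

def growth_rate(k):
    """largest real part of the eigenvalues of the mode-k jacobian."""
    a, d = fu - Du * k**2, gv - Dv * k**2
    tr, det = a + d, a * d - fv * gu
    disc = tr**2 - 4 * det
    return (tr + math.sqrt(disc)) / 2 if disc >= 0 else tr / 2

# k = 0: no spatial variation, the well-mixed system is stable
print(growth_rate(0.0) < 0)                                 # → True
# intermediate k: diffusion destabilizes, patterns at that wavelength grow
print(max(growth_rate(0.1 * i) for i in range(1, 50)) > 0)  # → True
```

this is turing's counterintuitive punchline: diffusion, which normally smooths things out, is exactly what creates the pattern.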
the parallels run deep:
biological self-organization
- cells communicate through local chemical signals
- patterns emerge from reaction-diffusion dynamics
- no central controller—distributed and robust
- inverse-square-like decay of signal strength
cosmic self-organization
- masses communicate through local field curvature
- structures emerge from gravitational dynamics
- no central controller—distributed and robust
- inverse-square decay of gravitational force
the same mathematical structure underlies both. local interactions with inverse-square (or similar) falloff. competition between forces. positive and negative feedback loops. from these ingredients, complexity self-organizes at every scale.
there's another beautiful example from biology: bacterial chemotaxis. bacteria like E. coli can't see—they're too small for eyes. yet they navigate toward food with remarkable precision. how? through a simple algorithm called run-and-tumble.
the bacterium alternates between two behaviors: running (swimming straight) and tumbling (randomly reorienting). the clever part is how it modulates these. the tumble rate $\lambda$ follows:
$$\lambda = \lambda_0 \cdot e^{-\alpha \cdot \frac{dc}{dt}}$$
here $dc/dt$ is the rate of change of chemical concentration. when the bacterium swims up a gradient (toward food), $dc/dt > 0$, so the exponential term decreases $\lambda$—fewer tumbles, longer runs toward food. when swimming away from food, $dc/dt < 0$, tumble rate increases, reorienting the bacterium.
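here's a minimal 1-D sketch of run-and-tumble. the linear gradient, the sensitivity $\alpha$, and the base rate $\lambda_0$ are illustrative assumptions, not measured E. coli parameters:

```python
import math, random

# a minimal 1-D run-and-tumble sketch. the gradient field, sensitivity
# alpha, and base tumble rate lambda_0 are illustrative assumptions.
random.seed(1)
lam0, alpha, dt, speed = 1.0, 100.0, 0.1, 1.0

def c(x):
    """chemical concentration: a simple linear gradient rising toward +x."""
    return 0.01 * x

x, direction = 0.0, 1.0
for _ in range(2000):
    dc_dt = (c(x + direction * speed * dt) - c(x)) / dt  # locally sensed change
    lam = lam0 * math.exp(-alpha * dc_dt)                # modulated tumble rate
    if random.random() < min(lam * dt, 1.0):             # tumble: new random heading
        direction = random.choice([-1.0, 1.0])
    x += direction * speed * dt                          # run

# runs lengthen up-gradient and shorten down-gradient, so the random
# walk acquires a net drift toward higher concentration
print(round(x))
```

the bacterium never computes a plan; the drift is a statistical consequence of modulating one local rate.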
no centralized controller tells the bacteria where to go. each organism makes local decisions based on gradient sensing. yet collectively, they partition the space, clustering around food sources—self-organization emerging from simple local rules.
the parallel is striking. biology and cosmology use different substrates—chemicals vs. spacetime—but the underlying logic is the same: local interactions, mediated by diffusing fields, produce global organization.
can algorithms self-organize?
so here we are. we've watched galaxies spiral into form through curved spacetime. we've watched zebra stripes emerge from two diffusing chemicals. we've watched bacteria find food using nothing but a random walk modulated by gradients.
the same story, told three times: local interactions produce global order. no central planner. no blueprint. just elements responding to their neighbors, and complexity crystallizing from the interactions.
now comes the question that matters most for AI: can we make this happen in silicon?
the mainstream answer, implicit in most of machine learning, is "no—at least not directly." if you want self-organization, you need to simulate biology. neurons, synapses, spikes, dendrites. backpropagation is tolerated because it works, but it's treated as a necessary evil—"unnatural," "biologically implausible." the real goal, we're told, is to make our algorithms more brain-like. then self-organization will follow.
but this gets the causality backwards.
the brain doesn't self-organize because it's biological. it self-organizes because it implements the right mathematical dynamics—the same dynamics we saw in galaxies and chemical gradients. biology is one substrate that supports these dynamics. silicon can be another. we don't need to copy the wetware. we need to copy the math.
what are the essential ingredients? strip away the biological details and you find three universal requirements:
- local interactions—each element affects only its neighbors, not distant strangers
- competition—elements vie for limited resources, territory, or activation
- feedback—the current state shapes future dynamics, creating memory
that's the recipe. you don't need proteins or ion channels. you need locality, competition, and feedback. give an algorithm these three properties, and self-organization becomes not just possible but inevitable.
the most famous proof is also the simplest: conway's game of life. a cellular automaton with rules so minimal they fit on a napkin. let $s_{i,j}^t \in \{0, 1\}$ be the state of cell $(i,j)$ at time $t$, and let $N_{i,j}^t$ count its living neighbors:
$$s_{i,j}^{t+1} = \begin{cases} 1 & \text{if } N_{i,j}^t = 3 \\ s_{i,j}^t & \text{if } N_{i,j}^t = 2 \\ 0 & \text{otherwise} \end{cases}$$
birth when surrounded by exactly three neighbors. survival with two or three. death otherwise. that's the entire specification.
from these trivial rules, watch what emerges: oscillators that pulse forever, gliders that travel across the grid, even structures that compute. the game of life is turing-complete—it can simulate any computer. all from two rules about counting neighbors.
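the rule fits in a few lines of code. as a sketch, here's the update applied to a glider, confirming its diagonal march across the grid:

```python
from collections import Counter

# the game-of-life update rule above, applied to a sparse grid. the
# glider, one of the simplest emergent structures, shifts one cell
# diagonally every four generations.
def step(live):
    """advance one generation; `live` is a set of (row, col) cells."""
    counts = Counter((r + dr, c + dc)
                     for r, c in live
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    # birth with exactly 3 neighbors; survival with 2 only if already alive
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
state = glider
for _ in range(4):
    state = step(state)

# four generations later: the same shape, shifted by (1, 1)
print(state == {(r + 1, c + 1) for r, c in glider})  # → True
```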
if two rules can generate universal computation, imagine what carefully designed learning algorithms can achieve. let's see it in action.
proof of concept: kohonen's self-organizing maps
in 1982, teuvo kohonen introduced the self-organizing map (SOM)—a neural network that organizes itself through competition and cooperation. it's a perfect demonstration that algorithms can self-organize just like galaxies and zebra stripes.
the setup is simple: you have a grid of neurons, each with a weight vector $\mathbf{w}_i \in \mathbb{R}^n$. when you present a data point $\mathbf{x}$, the neurons compete. the winner is:
$$c = \arg\min_i \|\mathbf{x} - \mathbf{w}_i\|$$
the neuron closest to the input wins. then all neurons update:
$$\mathbf{w}_i \leftarrow \mathbf{w}_i + \eta \cdot h(i, c) \cdot (\mathbf{x} - \mathbf{w}_i)$$
where $h(i, c)$ is the neighborhood function—large for neurons near the winner, small for distant neurons. typically a Gaussian $h(i,c) = \exp(-d^2/2\sigma^2)$. the learning rate $\eta$ and neighborhood width $\sigma$ shrink over time.
the key ingredient is the neighborhood function. it starts wide—when one neuron wins, many neighbors update too. over time, the neighborhood shrinks. early in training, the network organizes globally. later, it fine-tunes locally.
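here's a minimal sketch of the full loop: a 1-D chain of neurons learning to cover data on a line. the grid size and decay schedules are illustrative choices:

```python
import math, random

# a minimal SOM sketch: a 1-D chain of neurons learning to cover data
# drawn uniformly from [0, 1]. grid size, learning-rate schedule, and
# neighborhood schedule are illustrative choices.
random.seed(0)
n, steps = 10, 2000
w = [random.random() for _ in range(n)]              # one weight per neuron

for t in range(steps):
    x = random.random()                              # present a data point
    c = min(range(n), key=lambda i: abs(x - w[i]))   # competition: the winner
    eta = 0.5 * (1 - t / steps)                      # shrinking learning rate
    sigma = max(3.0 * (1 - t / steps), 0.5)          # shrinking neighborhood
    for i in range(n):
        h = math.exp(-((i - c) ** 2) / (2 * sigma ** 2))  # cooperation
        w[i] += eta * h * (x - w[i])                 # pull toward the input

# the chain spreads out to cover the data; with a wide initial
# neighborhood it typically also unfolds in topological order
print(round(min(w), 2), round(max(w), 2))
```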
sound familiar? it should. this is exactly the pattern we saw in cosmic and biological self-organization:
- local interactions—neurons update based on neighborhood proximity
- competition—winner-take-all dynamics
- feedback—the network's state determines future updates
watch the grid unfold in the visualization above. it starts as a tangled cluster at the center. as training progresses, it stretches and bends to cover the data distribution. the network learns the topology of the data—nearby neurons represent nearby regions of data space.
this is self-organization in pure computation. no biological simulation. no cellular machinery. just the right mathematical rules applied iteratively.
full circle: the classifier as vortex field
the SOM shows that algorithms can self-organize. but it's still an unsupervised method—it discovers structure, but doesn't classify. can the same principles apply to classification?
now we return to where we started: descartes' vortex fields. remember how each star created a region of influence, and together they partitioned space? a classifier with the Yat metric does exactly this. each neuron (class prototype) creates a "vortex" in representation space. decision boundaries emerge from the interaction of these vortices—exactly like the territories between planetary centers.
the Yat metric, which we explored in depth in the duality of information, combines alignment and distance into a single measure:
$$\text{Yat}(\mathbf{x}, \mathbf{w}) = \frac{(\mathbf{x} \cdot \mathbf{w})^2}{\|\mathbf{x} - \mathbf{w}\|^2}$$
look at the structure. the denominator is $\|\mathbf{x} - \mathbf{w}\|^2$—distance squared. this is the same inverse-square relationship we saw in newton's gravity and einstein's spacetime curvature. neurons curve representation space in exactly the way masses curve physical space.
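a small sketch makes the territorial behavior concrete. with two toy prototypes, each input is claimed by the prototype whose field dominates at that point (the prototype values and the numerical guard are illustrative):

```python
# the yat score from the formula above: squared alignment over squared
# distance. the two prototypes below are toy values for illustration.
def yat(x, w):
    """yat(x, w) = (x . w)^2 / ||x - w||^2"""
    dot = sum(a * b for a, b in zip(x, w))
    dist2 = sum((a - b) ** 2 for a, b in zip(x, w))
    return dot * dot / max(dist2, 1e-12)  # guard the singularity at x == w

# two class prototypes ("vortex centers") in a 2-D representation space
w1, w2 = (1.0, 0.0), (0.0, 1.0)

# each point is claimed by the prototype whose field dominates there
print(yat((0.9, 0.1), w1) > yat((0.9, 0.1), w2))  # → True
print(yat((0.1, 0.9), w2) > yat((0.1, 0.9), w1))  # → True
```

the decision boundary sits where the two fields balance, exactly like the watershed between two vortex territories.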
the analogy is now complete. toggle the "DOMINANCE" view on the Descartes visualization and compare it to the YAT classifier. the patterns are strikingly similar: curved regions of influence, boundaries that bend based on source strength, space partitioned by competing fields.
this isn't metaphor—it's mathematics. the same equations that govern planetary vortices govern the flow of classification gradients. biology didn't invent this geometry. the universe did. we're just learning to compute with it.
training dynamics: trajectories in weight space
if neurons are like masses curving space, what happens during training? the weights move through high-dimensional space, following gradient descent toward optimal configurations.
in dynamical systems theory, we study these trajectories using concepts like phase portraits, attractors, and basins of attraction. each training run traces a path through weight space. good configurations are fixed points—attractors that capture nearby trajectories.
the visualization shows neurons starting from a random cluster, then gradually separating as they find their optimal positions. several forces are at play:
- attraction to data—neurons move toward the patterns they classify
- repulsion from competitors—neurons push away from each other (like the orthogonality pressure we discussed in The Meaning of Non-Linearity)
- momentum—past gradients influence current motion
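these three forces can be sketched in a 1-D toy system. all coefficients here are illustrative, not a real training configuration:

```python
# a toy sketch of the three forces above, in 1-D: two neurons compete
# for two data clusters, repel each other with inverse-square strength,
# and carry momentum. every coefficient is an illustrative assumption.
data = [-1.0, 1.0]                  # two cluster centers
w = [0.10, 0.12]                    # neurons start nearly on top of each other
v = [0.0, 0.0]                      # velocities (momentum state)
lr, mu, rep = 0.1, 0.5, 0.01

for _ in range(300):
    force = [0.0, 0.0]
    for d in data:                  # competition: each point pulls its winner
        i = min((0, 1), key=lambda j: abs(d - w[j]))
        force[i] += d - w[i]        # attraction to data
    sep = w[0] - w[1]
    push = rep * sep / max(abs(sep) ** 3, 1e-3)   # inverse-square repulsion
    force[0] += push
    force[1] -= push
    for i in (0, 1):
        v[i] = mu * v[i] + lr * force[i]          # momentum
        w[i] += v[i]

# each neuron settles into a stable position near its own cluster
print(round(w[0]), round(w[1]))  # → -1 1
```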
this is orbital mechanics in miniature. neurons finding stable orbits in weight space. the mathematics of gravitational systems—Hamiltonian dynamics, Lyapunov stability, chaos theory—applies directly to understanding neural training.
the unity of organization
let's tie it all together. we've traced self-organization across three domains:
- cosmic: vortices, curved spacetime, gravitational organization
- biological: reaction-diffusion, morphogenesis, neural wiring
- computational: SOMs, Yat classifiers, training dynamics
the patterns are strikingly similar. local interactions. competition for territory. emergent global structure. inverse-square dynamics. stable attractors.
this isn't coincidence. these are the universal signatures of self-organization. they emerge whenever you have interacting elements in a space with feedback dynamics. biology discovered them through evolution. physics describes them through equations. computation can exploit them through the right algorithms.
the implication for AI is profound. we don't need to simulate neurons to achieve intelligent organization. we need to understand the mathematics of organization itself. the brain is one implementation. the cosmos is another. and our algorithms can be a third.
this is the foundation of physics-inspired AI. not brain-inspired. universe-inspired. because the patterns that make intelligence possible aren't biological accidents—they're cosmic necessities. and any system that wants to understand the world must learn to organize itself the way the world does.