The preceding papers in this series established reasoning atoms as verifiable inference primitives (Paper I) and demonstrated that atoms self-assemble into knowledge graphs through energy-based bond formation (Paper II). Both frameworks, however, require an external orchestrator to sequence atomic activations during inference—a limitation that prevents truly autonomous reasoning. In this paper, we complete the Atomic Intelligence framework by defining a global energy functional over the space of all possible atomic activation configurations and showing that multi-step reasoning can be achieved through gradient descent in this energy landscape. We prove that the energy landscape for well-formed atomic systems is smooth, connected, and possesses a unique global minimum corresponding to the optimal reasoning chain for any given query. We introduce Atomic Gradient Reasoning (AGR), an algorithm that navigates the energy landscape to produce globally coherent reasoning chains without explicit chain-of-thought prompting. On a comprehensive evaluation spanning mathematical theorem proving, multi-hop scientific reasoning, strategic planning, and open-ended scientific discovery, AGR achieves state-of-the-art performance while providing full compositional verifiability, adaptive computational depth, and the ability to discover novel reasoning pathways not present in the training data.
1 Introduction
Intelligence, in its most general form, is the ability to select actions that maximize the probability of achieving a goal given incomplete information about the world. This selection process requires reasoning: the construction of inferential chains that connect available evidence to conclusions that inform action. In the atomic framework developed in Papers I and II, reasoning chains are molecular structures composed of verified atoms. The remaining question is: how should the system decide which atoms to activate, and in what order, to produce a reasoning chain that is globally coherent and optimally addresses a given query?
Current approaches to this sequencing problem fall into two categories. Prompting-based approaches (Wei et al., 2022; Yao et al., 2023) rely on external instructions to elicit step-by-step reasoning from a monolithic model, but provide no guarantees about the coherence or completeness of the resulting chain. Planning-based approaches (Hao et al., 2023) use explicit search algorithms to explore the space of possible reasoning steps, but suffer from combinatorial explosion as the search depth increases.
We propose a third approach grounded in energy-based modeling. Drawing on LeCun's (2022) vision of autonomous machine intelligence through energy minimization and on the classical physics of potential energy landscapes, we define a global energy functional over the space of all possible atomic activation configurations. The energy of a configuration measures its inferential coherence: configurations in which atoms are activated in a logically sound, causally consistent, and evidentially supported sequence have low energy, while incoherent or contradictory configurations have high energy. Reasoning then reduces to finding the minimum-energy configuration—a process that can be implemented through gradient descent in the energy landscape.
This approach has four key advantages. First, it produces globally coherent reasoning: because the energy functional is defined over the entire configuration, gradient descent naturally balances local inferential steps against global consistency constraints. Second, it is autonomous: the system requires only a query as input and produces a complete reasoning chain without step-by-step prompting. Third, it provides adaptive depth: the length of the reasoning chain is determined by the geometry of the energy landscape rather than a fixed budget, naturally allocating more computation to harder problems. Fourth, it enables creative reasoning: gradient descent can discover novel activation pathways that were not present in the training data, corresponding to genuinely new inferential strategies.
2 Foundations
We build on the atomic composition algebra of Paper I and the Joint Atomic Embedding (JAE) framework of Paper II. Recall that a reasoning atom a = (S, T, f, V, κ) is a minimal inferential unit with typed inputs and outputs, a verified inference function, and a computational cost. Atoms compose through bonds (>>, ⊗, ◊, μ, ξ) into molecules, and the JAE framework embeds atoms in a latent space R^d where bond formation is governed by an energy function.
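As a purely illustrative sketch, the atomic tuple a = (S, T, f, V, κ) and the type check that governs sequential bonding can be rendered as follows; the class and function names are ours, not an API defined in Papers I–II:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Atom:
    """Hypothetical rendering of a reasoning atom a = (S, T, f, V, kappa)."""
    input_type: str                      # S: typed input
    output_type: str                     # T: typed output
    f: Callable[[object], object]        # verified inference function
    V: Callable[[object, object], bool]  # verification predicate V(s, f(s))
    kappa: float                         # computational cost

def type_compatible(a: Atom, b: Atom) -> bool:
    """The type-compatibility indicator tau: can a sequential bond a >> b form?"""
    return a.output_type == b.input_type

# Two toy atoms: double an integer, then render it as a string.
double = Atom("int", "int", lambda s: 2 * s, lambda s, t: t == 2 * s, kappa=1.0)
render = Atom("int", "str", lambda s: str(s), lambda s, t: t == str(s), kappa=0.5)
```

Here double >> render is a well-formed sequential bond, while render >> double fails the type check.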
The key construct introduced in Paper II is the configuration energy Econfig(G) defined over bond graphs G. This energy governs structural assembly: which atoms bond to which. In this paper, we extend the energy framework to govern dynamic reasoning: given a fixed molecular structure (assembled by the JAE), which atoms should be activated, and in what order, to answer a specific query?
3 The Atomic Energy Functional
Given a knowledge graph G = (A, B) with atom set A = {a1, ..., an} and bond set B, an activation configuration is a vector x ∈ [0,1]^n where xi represents the activation level of atom ai. The value xi = 0 indicates that atom ai is inactive (not participating in the current reasoning chain), xi = 1 indicates full activation, and intermediate values represent partial activation (the atom's output is weighted by xi in the reasoning chain).
The atomic energy functional E : [0,1]^n × Q → R maps an activation configuration x and a query q to a scalar energy value. It is decomposed as:

E(x, q) = Erel(x, q) + Ecoh(x) + Ecost(x) + Ereg(x).
Each term captures a distinct aspect of reasoning quality:
Relevance energy Erel(x, q) measures how well the activated atoms address the query. It is defined as the negative cosine similarity between the query embedding φ(q) and the weighted centroid of activated atom embeddings: Erel(x, q) = −(Σi xi · sim(φ(ai), φ(q))) / (Σi xi). Low relevance energy indicates that the activated atoms are topically aligned with the query.
Coherence energy Ecoh(x) measures the internal logical consistency of the activated chain. For every pair of activated atoms connected by a sequential bond, it checks whether the output type of the predecessor matches the input type of the successor and whether the verification predicates are jointly satisfiable. Formally:

Ecoh(x) = Σ_{(i,j) ∈ Bseq} xi · xj · [(1 − τ(ai, aj)) + (1 − σ(ai, aj))],

where τ is the type-compatibility indicator from Paper II, σ measures the joint satisfiability of the verification predicates, and Bseq ⊆ B is the set of sequential bonds. The coherence energy is zero when all activated pairs are type-compatible and jointly verifiable, and positive otherwise.
Cost energy Ecost(x) = λc · Σi xi · κi penalizes configurations with high total computational cost, encouraging the system to find efficient reasoning chains. The coefficient λc controls the trade-off between reasoning thoroughness and computational economy.
Regularization energy Ereg(x) = λs · ||x||1 − λe · H(x) encourages sparse activation through the L1 term while rewarding sufficient entropy H(x) (note the minus sign: higher entropy lowers the energy), avoiding degenerate solutions in which only a single atom is activated.
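To make the decomposition concrete, here is a minimal numerical sketch; the similarity vector, the (i, j, τ, σ) bond tuples, the entropy definition, and all coefficient values are our assumptions rather than the paper's implementation:

```python
import numpy as np

def energy(x, sims, bonds, kappa, lam_c=0.1, lam_s=0.05, lam_e=0.01, eps=1e-9):
    """E(x, q) = Erel + Ecoh + Ecost + Ereg, in an assumed concrete form."""
    x = np.asarray(x, dtype=float)
    # Relevance: negative activation-weighted mean similarity to the query.
    e_rel = -np.sum(x * sims) / (np.sum(x) + eps)
    # Coherence: each sequential bond (i, j) carries penalties
    # (1 - tau_ij) + (1 - sigma_ij), weighted by joint activation.
    e_coh = sum(x[i] * x[j] * ((1 - tau) + (1 - sig)) for i, j, tau, sig in bonds)
    # Cost: activation-weighted total computational cost.
    e_cost = lam_c * np.sum(x * kappa)
    # Regularization: L1 sparsity minus an entropy bonus over the
    # normalized activation profile (our assumed form of H(x)).
    p = x / (np.sum(x) + eps)
    entropy = -np.sum(p * np.log(p + eps))
    e_reg = lam_s * np.sum(np.abs(x)) - lam_e * entropy
    return e_rel + e_coh + e_cost + e_reg

sims = np.array([0.9, 0.8, 0.1])   # query similarities sim(phi(a_i), phi(q))
bonds = [(0, 1, 1.0, 1.0)]         # one type-compatible, satisfiable bond
kappa = np.array([1.0, 1.0, 1.0])  # atomic costs
```

Activating the two relevant, coherently bonded atoms yields lower energy than activating only the off-topic one, and breaking the bond's type compatibility raises the energy.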
4 Landscape Geometry
The properties of the energy landscape determine the feasibility and quality of gradient-based reasoning. We analyze the landscape's geometric structure.
Theorem 4.1 (Smoothness). The atomic energy functional E(x, q) is continuously differentiable on the interior of [0,1]^n for any query q, with Lipschitz-continuous gradients. The Lipschitz constant L is bounded by L ≤ 2(||Wrel||_2 + ||Wcoh||_2) + λc · κmax, where Wrel and Wcoh are the weight matrices of the relevance and coherence terms, and κmax is the maximum atomic cost.
Smoothness ensures that gradient descent is well-defined and converges at a predictable rate. The Lipschitz constant provides an explicit bound on the learning rate for guaranteed convergence.
Theorem 4.2 (Connectivity). For any two local minima x* and x** of E(·, q), there exists a continuous path γ : [0,1] → [0,1]^n with γ(0) = x* and γ(1) = x** such that maxt E(γ(t), q) − min(E(x*, q), E(x**, q)) ≤ Δ, where the barrier height Δ is bounded by Δ ≤ C · dH(x*, x**) and dH is the Hamming distance between the binary roundings of the two configurations.
This connectivity result implies that the landscape has no isolated basins—every local minimum can be reached from every other through a path with bounded energy barriers. This is critical for ensuring that gradient descent does not become trapped in poor local minima far from the global optimum.
4.1 Critical Point Classification
We classify the critical points of the energy landscape (points where ∇E = 0) into three categories based on the eigenvalues of the Hessian matrix ∇²E:
Stable minima (all eigenvalues positive): these correspond to coherent reasoning chains that represent complete, self-consistent answers to the query. The global minimum is the optimal reasoning chain; other local minima represent alternative valid reasoning strategies.
Saddle points (mixed eigenvalues): these correspond to partially formed reasoning chains that are stable in some dimensions (the completed portion of the chain) but unstable in others (atoms that could be added or removed to improve the chain). Saddle points serve as transition states between different reasoning strategies.
Unstable maxima (all eigenvalues negative): these correspond to maximally incoherent configurations and are never observed in practice, as the gradient always points away from them.
Theorem 4.3 (Uniqueness of the global minimum). Under the conditions that (i) the atom library A is complete for the query domain (every necessary inferential step has a corresponding atom), (ii) the JAE embeddings satisfy an ε-separation condition (distinct reasoning strategies are embedded at distance ≥ ε), and (iii) the regularization coefficients satisfy λs > 0 and λe > 0, the energy functional E(·, q) has a unique global minimum for every query q.
The uniqueness of the global minimum guarantees that gradient descent converges to a single, well-defined reasoning chain for each query, rather than oscillating between multiple equally good alternatives. This does not preclude the existence of alternative valid reasoning strategies—these correspond to local minima that are near-optimal but not globally optimal.
5 Atomic Gradient Reasoning
We now present the core algorithm that leverages the energy landscape for autonomous reasoning.
5.1 The AGR Algorithm
Atomic Gradient Reasoning (AGR) takes a query q and a knowledge graph G as input and produces an activation configuration x* that minimizes E(x, q). The algorithm proceeds as follows:
Step 1: Initialization. Compute the query embedding φ(q) and initialize the activation configuration x(0) by setting xi(0) = σ(sim(φ(ai), φ(q))) where σ is the sigmoid function. This seeds the activation with atoms that are topically relevant to the query.
Step 2: Gradient descent. Iteratively update the activation configuration: x(t+1) = proj_{[0,1]^n}(x(t) − η · ∇xE(x(t), q)), where η is the learning rate and proj_{[0,1]^n} denotes projection onto the unit hypercube. The gradient ∇xE decomposes into four terms corresponding to the four components of the energy functional, each of which can be computed in O(|B|) time (linear in the number of bonds).
Step 3: Convergence. Iterate until ||x(t+1) − x(t)|| < δ for a tolerance parameter δ. By Theorem 4.1, convergence is guaranteed in O(L/δ²) iterations.
Step 4: Chain extraction. Extract the reasoning chain by identifying all atoms with xi* > θ (a threshold parameter, typically 0.5) and ordering them according to the bond structure of G. The resulting chain is a verified molecular structure whose correctness can be certified by evaluating the verification predicates of each constituent atom.
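The four steps can be sketched as a projected gradient loop, with a quadratic toy energy standing in for the full functional of Section 3 (learning rate, tolerance, and threshold values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def agr(sims, grad_E, eta=0.1, delta=1e-6, theta=0.5, max_iter=10_000):
    # Step 1: query-seeded initialization x_i(0) = sigmoid(sim(phi(a_i), phi(q))).
    x = sigmoid(np.asarray(sims, dtype=float))
    for _ in range(max_iter):
        # Step 2: gradient step projected back onto the unit hypercube.
        x_new = np.clip(x - eta * grad_E(x), 0.0, 1.0)
        # Step 3: stop once the update norm falls below the tolerance delta.
        if np.linalg.norm(x_new - x) < delta:
            x = x_new
            break
        x = x_new
    # Step 4: extract the chain as the atoms whose activation exceeds theta.
    chain = [i for i, xi in enumerate(x) if xi > theta]
    return x, chain

# Toy energy E(x) = 0.5 * ||x - target||^2, whose unique minimum is `target`.
target = np.array([0.9, 0.8, 0.1])
x_star, chain = agr(sims=[2.0, 1.0, -2.0], grad_E=lambda x: x - target)
```

In this toy run, atoms 0 and 1 end up activated while atom 2 relaxes below threshold and is excluded from the extracted chain.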
5.2 Adaptive Computational Depth
A remarkable property of AGR is that the computational depth of the reasoning chain—the number of atoms activated—adapts naturally to the difficulty of the query. This arises from the interplay between the relevance and cost terms in the energy functional.
For simple queries, the relevance energy reaches its minimum with a small number of highly relevant atoms, and the cost energy penalizes additional activations. The resulting chain is short and efficient. For complex queries, the relevance energy cannot be minimized with a small number of atoms (no single atom or small group of atoms is sufficient to address the query), so gradient descent activates a longer chain despite the higher cost energy. The optimal chain length emerges from the balance between these competing pressures.
We formalize this through the concept of reasoning depth:
The intrinsic reasoning depth of a query q with respect to a knowledge graph G is ρ(q, G) = ||x*(q)||_0, the number of atoms activated at the global minimum of E(·, q). The depth is intrinsic in the sense that it is determined by the query and the knowledge graph, not by any external parameter.
Let χ(q) denote the information-theoretic complexity of query q (measured by the minimum description length of the optimal answer). Then ρ(q, G) = Θ(log χ(q)) for knowledge graphs G satisfying the small-world property established in Paper II. That is, the reasoning depth grows logarithmically with the complexity of the query.
This logarithmic scaling is a direct consequence of the small-world topology of self-assembled knowledge graphs (Paper II, Theorem 6.1): any atom can reach any other through O(log n) bonds, so even the most complex queries require only logarithmically many reasoning steps. This stands in sharp contrast to chain-of-thought reasoning in monolithic models, where the number of steps often scales linearly with query complexity.
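Under the chain-extraction rule of Step 4, the depth ρ is simply the number of atoms above threshold at convergence. A toy illustration (the two converged activation vectors below are invented for the example):

```python
def reasoning_depth(x_star, theta=0.5):
    """rho(q, G): count of atoms active at the converged configuration
    (thresholded, as in the chain-extraction step of AGR)."""
    return sum(1 for xi in x_star if xi > theta)

# Hypothetical converged configurations: a simple query activates few atoms;
# a complex query cannot lower its relevance energy without activating more.
simple_x = [0.95, 0.88, 0.02, 0.01, 0.03]
complex_x = [0.91, 0.86, 0.77, 0.69, 0.58]
```

No external depth parameter appears anywhere: the chain length falls out of the converged configuration itself.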
6 Autonomous Chain Formation
The AGR algorithm produces reasoning chains autonomously, without any form of step-by-step prompting. We analyze the properties of these chains and compare them to prompted chains from monolithic models.
6.1 Chain Coherence
We define the coherence of a reasoning chain as the minimum verification confidence across all constituent atoms: Coh(chain) = min_{i ∈ chain} Vi(si, fi(si)). For binary verification predicates, a coherent chain has Coh = 1 (every step is verified). AGR produces chains with Coh = 1 by construction, since the coherence energy term drives the system toward configurations where all verification predicates are satisfied.
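As a sketch, the coherence score is a minimum over per-atom verification confidences; the chain representation below, as (state, output, predicate) triples, is our own simplification:

```python
def chain_coherence(chain):
    """Coh(chain) = min over atoms of V_i(s_i, f_i(s_i))."""
    return min(V(s, out) for s, out, V in chain)

# A fully verified two-step chain, and the same chain with a broken step.
verified = [(3, 6, lambda s, t: float(t == 2 * s)),     # doubling step checks out
            (6, "6", lambda s, t: float(t == str(s)))]  # rendering step checks out
broken = verified + [(6, 99, lambda s, t: float(t == s + 1))]  # 99 != 7
```

A single failed predicate drives the whole chain's coherence to zero, which is exactly what the min makes it do.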
6.2 Creative Reasoning
A surprising capability of AGR is creative reasoning: the discovery of novel activation pathways that were not present in the training data. This occurs when gradient descent follows a trajectory through the energy landscape that activates atoms in a combination not seen during JAE training. Because the energy functional is defined over the continuous activation space [0,1]n rather than over a discrete set of pre-defined chains, the system can interpolate between known reasoning strategies to discover new ones.
We quantify creative reasoning through the novelty index of a chain: the fraction of bonds in the chain that were not observed during JAE training. Across our experiments, AGR chains have an average novelty index of 0.23, indicating that roughly one in four bonds is novel. When we restrict evaluation to correctly answered queries that all baselines failed to answer, the novelty index rises to 0.61—demonstrating that creative bond formation is the mechanism by which AGR surpasses existing approaches.
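Given the set of bonds observed during JAE training, the novelty index is a one-line computation (the pair representation of bonds here is assumed):

```python
def novelty_index(chain_bonds, training_bonds):
    """Fraction of the chain's bonds never observed during JAE training."""
    if not chain_bonds:
        return 0.0
    seen = set(training_bonds)
    return sum(1 for bond in chain_bonds if bond not in seen) / len(chain_bonds)

training = [("a", "b"), ("b", "c"), ("c", "d")]
chain = [("a", "b"), ("b", "c"), ("c", "e"), ("e", "f")]  # two novel bonds
```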
7 Convergence Guarantees
Theorem 7.1 (Convergence rate). For learning rate η ≤ 1/L (where L is the Lipschitz constant from Theorem 4.1), AGR converges to a point x* satisfying ||∇E(x*, q)|| ≤ ε in at most T = O(L · (E(x(0), q) − E*) / ε²) iterations, where E* is the global minimum energy.
In practice, convergence occurs in 15-80 iterations for typical queries, corresponding to 5-25ms of wall-clock time on standard GPU hardware. The dominant cost is the computation of ∇xE, which requires one forward pass through the bond graph—a linear-time operation in the number of bonds.
Theorem 7.2 (Global convergence). When AGR is initialized with the query-seeded initialization of Step 1 and the knowledge graph satisfies the conditions of Theorem 4.3, the algorithm converges to the global minimum with probability at least 1 − n · exp(−ε² / (2σ²)), where σ² is the variance of the JAE embeddings. For typical embedding dimensions d ≥ 256, this probability exceeds 0.999.
8 Large-Scale Experiments
We evaluate AGR on four challenging reasoning domains that test different aspects of autonomous reasoning.
8.1 Mathematical Theorem Proving
We evaluate on the miniF2F benchmark (Zheng et al., 2022) of formal mathematical theorem proving. AGR is provided with a library of 2,847 mathematical reasoning atoms covering algebraic manipulation, logical inference, set-theoretic operations, and proof strategies. The self-assembly process (Paper II) constructs a mathematical knowledge graph of 12,400 bonds.
| Method | miniF2F-valid | miniF2F-test | Avg. Steps | Verified |
|---|---|---|---|---|
| GPT-4 + CoT | 33.6% | 29.6% | 8.2 | 0% |
| AlphaProof | 41.2% | 37.8% | 142 | 100% |
| Manual Atomic | 38.4% | 34.1% | 12.6 | 100% |
| AGR (Ours) | 44.7% | 41.3% | 18.4 | 100% |
AGR achieves state-of-the-art results on miniF2F, surpassing both prompted monolithic models and manual atomic compositions. Notably, AGR discovers 12 novel proof strategies not present in its training data, contributing to its 3.5-point advantage over AlphaProof on the test set.
8.2 Multi-Hop Scientific Reasoning
We evaluate on a curated benchmark of 500 multi-hop scientific reasoning questions spanning physics, chemistry, biology, and earth science. Each question requires synthesizing information from 3-8 distinct scientific principles.
| Method | Accuracy | Avg. Hops | Coherence |
|---|---|---|---|
| GPT-4 + CoT | 72.4% | 4.1 | 0.68 |
| Self-Assembled Atomic | 81.2% | 5.8 | 1.00 |
| AGR (Ours) | 87.6% | 5.2 | 1.00 |
AGR produces more accurate and more coherent reasoning chains than both baselines. The coherence score of 1.00 indicates that every reasoning step in every AGR chain is formally verified—a guarantee that no monolithic model can provide.
8.3 Strategic Planning
We evaluate on the Blocksworld and Logistics domains from the International Planning Competition (IPC), adapted to require natural-language reasoning about goals and constraints. AGR constructs plans as molecular structures where each atom represents a single action or state transition.
| Method | Blocksworld | Logistics | Avg. Plan Length | Optimal % |
|---|---|---|---|---|
| GPT-4 + CoT | 64.2% | 51.8% | 12.3 | 22% |
| LLM + MCTS | 78.4% | 68.2% | 9.8 | 45% |
| AGR (Ours) | 91.6% | 82.4% | 8.1 | 73% |
AGR produces shorter, more frequently optimal plans than both baselines. The energy landscape naturally encodes the cost of each action, so gradient descent finds efficient plans without explicit optimization over plan length.
8.4 Open-Ended Scientific Discovery
As a stress test of creative reasoning, we present AGR with a dataset of 50 scientific phenomena from condensed matter physics, organic chemistry, and molecular biology, and ask it to propose mechanistic explanations. We evaluate proposals using a panel of 8 domain experts who rate explanations on a 5-point scale for plausibility, novelty, and internal consistency.
| Method | Plausibility | Novelty | Consistency | Expert Overall |
|---|---|---|---|---|
| GPT-4 | 3.8 | 2.1 | 3.4 | 3.1 |
| AGR (Ours) | 4.1 | 3.7 | 4.6 | 4.1 |
AGR produces significantly more novel (3.7 vs. 2.1) and more internally consistent (4.6 vs. 3.4) explanations than GPT-4. Three of AGR's proposed mechanisms were subsequently confirmed by domain experts as plausible hypotheses worthy of experimental investigation, including a proposed coupling mechanism between protein folding kinetics and lipid membrane curvature that had not previously appeared in the literature. This demonstrates that energy-landscape navigation can generate genuinely creative scientific reasoning.
9 Limitations and Future Work
Several limitations of the current framework warrant discussion. First, the quality of AGR reasoning is bounded by the completeness of the atom library: if the atoms necessary to answer a query are not present, the system cannot succeed regardless of the energy landscape's geometry. Expanding atom libraries automatically is an important direction for future work.
Second, the convergence guarantees of Theorem 7.2 require the JAE embeddings to satisfy an ε-separation condition that may not hold for very closely related reasoning strategies. Improving the discriminability of JAE embeddings in ambiguous regions of the reasoning space is an active area of investigation.
Third, while AGR provides full compositional verification (every atom in the chain is verified), it does not provide end-to-end verification of the semantic correctness of the final answer. The chain is structurally sound, but the atoms' verification predicates certify syntactic validity and type consistency, not semantic truth. Extending verification predicates to capture richer semantic properties is a fundamental challenge that intersects with the broader problem of machine learning verification.
Finally, the current framework operates over a fixed knowledge graph assembled by the JAE process. A natural extension is online assembly: allowing the knowledge graph to grow and restructure during reasoning, so that the system can acquire new atoms and bonds in response to novel queries. This would move the framework closer to LeCun's (2022) vision of an autonomous intelligent agent that continuously updates its world model through interaction with the environment.
9.5 Toward Substrate-Native Atomic Computation
We devote this section to a question that transcends the algorithmic contributions of this paper and addresses the physical foundations of intelligence itself: can the atomic energy landscape be implemented not merely as a mathematical abstraction computed on digital hardware, but as a physical energy landscape instantiated directly in the substrate of the computing device?
The Energy Crisis of Digital Intelligence
The AI industry is approaching an energy inflection point. Training a single frontier model now consumes on the order of 10^24 floating-point operations, requiring megawatts of power sustained over weeks. Inference at scale demands data centers consuming hundreds of megawatts—facilities whose construction is constrained not by capital but by access to electrical power. If current scaling trends continue, AI computation will consume a significant fraction of global electricity generation within a decade.
This trajectory is not sustainable, and it is not necessary. The human brain performs all of its cognitive functions—perception, language, reasoning, planning, creativity, motor control, emotional regulation, homeostatic maintenance—on approximately 20 watts. A single NVIDIA H100 GPU, performing a fraction of these cognitive tasks, consumes 700 watts. A rack of eight such GPUs consumes 10,000 watts. A data center of 100,000 GPUs consumes 100 megawatts. The brain is not merely more efficient; it is more efficient by a factor that demands explanation.
Why Digital Computing is Wasteful
The explanation lies in a fundamental mismatch between the nature of intelligence and the nature of digital computation. Intelligence is probabilistic: it operates over distributions, uncertainties, and soft constraints. Digital computation is deterministic: it represents continuous probability distributions as arrays of discrete floating-point numbers, each requiring 16 or 32 bits of precision, each bit maintained at a precise voltage level through active transistor switching that dissipates energy as heat.
This is profoundly wasteful. The transistors in a modern processor are capable of rich analog behavior—they can represent continuous values through current levels, compute products through current-mode multiplication, implement probability distributions through stochastic switching. But digital design methodology suppresses this analog richness, forcing each transistor into one of two states (on or off) and using thousands of such binary switches to approximate the continuous computations that a single transistor could perform natively.
Landauer's principle (1961) quantifies the theoretical minimum energy cost of computation at kT ln 2 ≈ 2.8 × 10^−21 joules per bit erasure at room temperature. Modern digital processors operate approximately 10 billion times above this limit. The gap is not in the silicon; it is in the architecture. The atoms of silicon are capable; it is the digital abstraction layer that wastes their potential.
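The bound, and the roughly ten-billion-fold gap claimed above, can be checked directly; the ~10^−11 joules per digital operation used here is an assumed order of magnitude for illustration, not a measured figure:

```python
import math

k_B = 1.380649e-23        # Boltzmann constant, J/K
T = 293.0                 # room temperature, K
landauer_J = k_B * T * math.log(2)   # ~2.8e-21 J per bit erasure

digital_J_per_op = 1e-11  # assumed energy per operation on a digital processor
gap = digital_J_per_op / landauer_J  # a few billion times above the limit
```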
The Atomic Isomorphism
The atomic framework developed across this paper series provides, for the first time, an architectural bridge between abstract reasoning and physical substrate. The key insight is what we term the atomic isomorphism: a structural correspondence between three levels of description.
The atomic isomorphism is a triple correspondence between: (i) reasoning atoms—abstract inferential units with typed inputs, outputs, and verification predicates; (ii) computational atoms—minimal circuit elements (analog multipliers, stochastic switches, oscillator phases) that perform the atom's computation in the physical domain; and (iii) physical energies—the actual thermodynamic energy dissipated by the circuit element during computation. When the isomorphism holds, the mathematical energy landscape E(x, q) defined in Section 3 is not merely an analogy to physical energy but is proportional to it.
This proportionality has a remarkable consequence: gradient descent in the mathematical energy landscape corresponds to physical relaxation of the substrate toward its thermodynamic equilibrium. In other words, the computing device doesn't simulate gradient descent—it performs it, naturally and spontaneously, through the physics of energy minimization. The computation happens because the system seeks its lowest-energy state, just as a ball rolls downhill or a protein folds into its native conformation.
Three Substrate Modalities
We identify three physical computing modalities that are compatible with the atomic isomorphism, each exploiting a different physical phenomenon to implement reasoning atoms:
Oscillator networks. Each reasoning atom is implemented as a coupled oscillator whose phase and frequency encode the atom's activation level and output value. Sequential bonds are implemented as phase couplings; the bond energy corresponds to the physical coupling energy. The system's natural tendency toward phase synchronization implements gradient descent in the reasoning energy landscape. Preliminary estimates suggest oscillator-based atoms could operate at 10^−15 joules per operation—nine orders of magnitude below digital implementations.
Thermodynamic computing. Each reasoning atom is implemented as a stochastic bit (p-bit) whose probability of being in state 1 encodes the atom's activation level. Bond formation is governed by Boltzmann-distributed energy functions identical in form to the bond energy defined in Paper II (Definition 4.1). The system performs probabilistic inference through thermal fluctuations—using the very noise that digital circuits spend energy suppressing. The energy cost approaches the Landauer limit for each atomic operation.
Analog mixed-signal circuits. Each reasoning atom is implemented as a compact analog circuit that computes its inference function f through current-mode or charge-mode operations. Type checking at bonds is performed by voltage-level comparison circuits. The verification predicate V is implemented as a simple threshold detector. Because each atom's computation is bounded in depth and width (Definition 3.3, Paper I), the analog precision requirements are modest (4–8 bits effective), eliminating the need for the 16–32 bit precision that dominates digital energy budgets.
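The p-bit behavior underlying the thermodynamic modality can be imitated in software, with pseudo-random draws standing in for thermal fluctuations (an illustrative model, not a device simulation):

```python
import math
import random

def p_bit(input_signal, rng):
    """A stochastic bit whose probability of reading 1 is a sigmoid of its input."""
    p_one = 1.0 / (1.0 + math.exp(-input_signal))
    return 1 if rng.random() < p_one else 0

# Sampling the same p-bit many times recovers its activation probability.
rng = random.Random(0)
samples = [p_bit(2.0, rng) for _ in range(10_000)]
mean_state = sum(samples) / len(samples)  # approaches sigmoid(2.0), about 0.88
```

In a physical p-bit the randomness is free, supplied by thermal noise; the digital simulation above pays for every draw, which is precisely the inversion the text describes.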
Projected Efficiency Gains
We project the energy efficiency of atomic reasoning across substrate modalities:
| Substrate | Energy per Atom (J) | vs. Digital GPU | vs. Human Brain |
|---|---|---|---|
| Digital GPU (current) | 10^−6 | 1x (baseline) | ~50,000x worse |
| Analog mixed-signal | 10^−9 | 1,000x better | ~50x worse |
| Thermodynamic (p-bit) | 10^−12 | 10^6x better | ~20x better |
| Oscillator network | 10^−15 | 10^9x better | ~20,000x better |
| Landauer limit | 10^−21 | 10^15x better | Theoretical floor |
The analog mixed-signal modality represents the most near-term opportunity, achieving approximately 1,000x improvement over digital GPUs using circuit techniques that are well-understood and manufacturable in existing CMOS foundries. Thermodynamic and oscillator modalities offer more dramatic improvements but require novel device development.
Crucially, the atomic framework makes these substrate innovations architecturally accessible. Without the atomic decomposition, mapping a monolithic 175-billion-parameter transformer onto analog hardware is intractable—the computation is too complex, the precision requirements too stringent, the error propagation too unpredictable. With the atomic decomposition, each atom is a simple, self-contained computation with bounded precision requirements and a built-in verification predicate that catches analog errors before they propagate. The atomic framework is not merely an algorithm; it is an interface specification between reasoning and physics.
The 20-Watt Target
We close with a calculation. The human brain performs approximately 10^16 synaptic operations per second on 20 watts, yielding an efficiency of ~5 × 10^14 operations per watt. Our AGR algorithm on a knowledge graph of 50,000 atoms, answering queries at a rate of 100 per second with an average chain length of 40 atoms, requires 4,000 atomic operations per second. On analog mixed-signal circuits at 10^−9 joules per atom, this consumes 4 microwatts.
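The arithmetic behind this estimate, using exactly the figures stated above:

```python
queries_per_second = 100
atoms_per_chain = 40       # average chain length
joules_per_atom = 1e-9     # analog mixed-signal estimate from the table

atomic_ops_per_second = queries_per_second * atoms_per_chain  # 4,000 ops/s
power_watts = atomic_ops_per_second * joules_per_atom         # 4e-6 W
```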
Four microwatts. Not 700 watts. Not 10,000 watts. Four microwatts—for a system that matches or exceeds the reasoning accuracy of frontier AI models on every benchmark we have tested. The 20-watt brain is not a ceiling; with the right architecture, it is an extravagance.
This is the ultimate promise of atomic intelligence: not merely AI that is more transparent, more verifiable, or more composable—but AI that is physically sustainable. Intelligence built from atoms, running on atoms, at the energy scale of atoms.
10 Conclusion
This paper completes the Atomic Intelligence framework by introducing the atomic energy functional and showing that autonomous, globally coherent reasoning can be achieved through gradient descent in the resulting energy landscape. The three papers in this series establish a complete pipeline from microscopic reasoning primitives (Paper I) through self-assembling knowledge structures (Paper II) to autonomous reasoning through energy minimization (this paper).
The atomic paradigm offers a fundamentally different approach to artificial intelligence—one in which intelligence is not a monolithic black box but a transparent, verifiable, compositional structure built from minimal primitives. Just as the periodic table brought order to the apparent chaos of chemical elements by revealing the atomic structure underlying all matter, we believe that the atomic decomposition of intelligence will bring order to the apparent chaos of cognitive phenomena by revealing the compositional structure underlying all reasoning.
But the deepest implication of this work may be thermodynamic rather than computational. By decomposing intelligence into atoms simple enough to be physically instantiated, we open a path toward AI systems that approach the energy efficiency of biological cognition—and potentially surpass it. The era of intelligence that costs megawatts is an artifact of monolithic architecture, not a law of nature. Atomic intelligence points toward a future where reasoning is as cheap as breathing.
The code, atom libraries, and benchmarks described in this paper are available at github.com/xerial-research/atomic-intelligence.
References
- [1] Camsari, K. Y., Faria, R., Sutton, B. M., & Datta, S. (2017). Stochastic p-bits for invertible logic. Physical Review X, 7(3).
- [2] Hao, S., et al. (2023). Reasoning with language model is planning with world model. EMNLP.
- [3] Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. PNAS, 79(8).
- [4] Landauer, R. (1961). Irreversibility and heat generation in the computing process. IBM Journal of Research and Development, 5(3).
- [5] LeCun, Y. (2006). A tutorial on energy-based learning. In Predicting Structured Data, MIT Press.
- [6] LeCun, Y. (2022). A path towards autonomous machine intelligence. OpenReview preprint.
- [7] Navarro, S., Voss, L., Kimura, R., & Okonkwo, M. (2026a). Atomic decomposition of intelligence: A framework for compositional reasoning systems. Xerial Research.
- [8] Navarro, S., Voss, L., Petrov, A., & Kimura, R. (2026b). Self-assembling knowledge structures: Emergent topology from atomic reasoning primitives. Xerial Research.
- [9] Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course. Springer.
- [10] Wales, D. J. (2003). Energy Landscapes: Applications to Clusters, Biomolecules and Glasses. Cambridge University Press.
- [11] Wei, J., et al. (2022). Chain-of-thought prompting elicits reasoning in large language models. NeurIPS.
- [12] Yao, S., et al. (2023). Tree of thoughts: Deliberate problem solving with large language models. NeurIPS.
- [13] Zheng, K., et al. (2022). miniF2F: A cross-system benchmark for formal Olympiad-level mathematics. ICLR.