Constellations of Borrowed Light

"The inheritance from the master becomes not only his additions to the world's record but for his disciples the entire scaffolding by which they were erected."

— Vannevar Bush, As We May Think, 1945
2026-01-08

The Inheritance

The knowledge to save your life can exist and still fail to reach you.

Look up at the night sky and you are seeing ghosts. Some of those stars collapsed millions of years ago. Their light is still traveling. Knowledge moves the same way: across time, across distance, across hands that never meet. Every doctor inherits from patients she never saw. Every scientist inherits from experiments she never ran. We live by borrowed light, or fail to.

When I was six, I saw the same doctors twelve times with the same symptoms. The diagnostic triad for my brain tumor had been in the literature for decades. The knowledge existed. The system couldn’t carry it. No single paper was missing — the symptoms, the pattern, the diagnosis all existed across textbooks, case reports, clinical guidelines. What was missing was the causal map that would have connected them. Today, AI solves this: put the symptoms in and the diagnosis comes back in seconds. That class of problem is closed. The next class is not. This is the normal condition of science: drugs that work sitting in papers that never reach clinicians, materials breakthroughs trapped in notebooks that engineers will never read. The people are brilliant and tireless, doing the best they can inside a system built for a different era.

We can build the systems that let the light travel.

The Pattern

The gap between knowing and arriving runs through every field where knowledge matters.

For more than twenty years, Alzheimer’s research converged around a single hypothesis: amyloid plaques drive the disease, so clearing the plaques should stop the decline. Trial after trial failed to produce the clinical outcome that would justify the confidence. Each Phase III failure meant years of enrollment unwound, patients and families who had organized their lives around a hypothesis that quietly died in an interim analysis. Alternative targets remained underexplored. The field did not lack papers. It lacked a record that could force failed trials, failed replications, and changing confidence back into the live state of the question.

The knowledge existed, and so did the corrections. Nothing carried them.

  1. 2003
    First anti-amyloid antibody enters clinical trials
  2. 2006
    AN1792 halted after brain inflammation in patients
  3. 2008
    Bapineuzumab enters Phase III
  4. 2012
    Bapineuzumab and solanezumab both fail Phase III. Plaques cleared, cognition unchanged.
  5. 2016
    Aducanumab halted after futility analysis; later restarted with reanalyzed data.
  6. 2021
    FDA approves aducanumab via accelerated pathway; the advisory committee voted against approval.
  7. 2023
    Lecanemab shows a 27% slowing of decline. Modest but real.

400+ trials · 20 years · $40B+

It is always the same story. The evidence accumulates. The corrections accumulate beside it. A researcher entering the amyloid field in 2015 had no compiled view of which targets had been tested, which had failed, under what conditions, or where the frontier was genuinely thin. She reconstructed the state of the question from scattered papers, review articles, and conversation — the same way a researcher in 1995 would have. The literature grew by orders of magnitude. The medium did not change at all.

The same pattern repeats outside medicine. In climate science, the question of how sensitive the atmosphere is to doubled CO₂ took forty-five years to narrow meaningfully. A modeler in Hamburg and an observationalist in Boulder can spend years working with different assumptions about the same cloud feedback, each published and defensible, with no shared record that surfaces the disagreement until a review paper connects them years later.

This is the normal condition of science. On average, it takes seventeen years for research evidence to reach clinical practice. Only 14% of original discoveries are ever integrated into the work of the people who could use them. The field pays for the same lesson twice, and most of the lessons are never shared at all.

Discovery → 17 years → reaches practice

Only 14% of original discoveries are ever integrated into practice

Most scientific knowledge never becomes a paper at all. The published literature is only the visible fraction: what survived selection and format. Negative results vanish. Tacit judgment vanishes. Retractions often fail to reach the work built on top of them. We treat this as friction around the edges of science. Much of the time it is closer to the center.

Science has no infrastructure for knowledge to be structured, versioned, or connected — so it defaults to the only medium it has. A paper is a rendering of science: compressed into narrative, stripped of its operational context, and frozen at the moment of publication. The paper was never the problem. The absence of anything beneath it is the problem.

A system that forgets its failures forgets how it learned. Knowledge is hard won, and too much of it is won and then allowed to scatter. Where a field has built shared infrastructure — a common repository, a shared protocol — the gap closes and the results transform what the field can do.

The light needs a better medium.

The Substrate

For most of history, the transmission problem was terrible but stable. A doctor in 1950 could reasonably expect that much of what she learned in training would remain current for years. A scientist might spend a career in one field without the literature outrunning the pace at which a person could still read and absorb it.

That stability is gone. The literature now doubles faster than any person can read it, and intelligence is getting cheaper by the month. The system was built for a world where knowledge moved slowly. Knowledge no longer moves slowly.

For eighty years, the dream has been clear. Vannevar Bush imagined associative trails through knowledge in 1945. The Semantic Web tried to make all knowledge machine-readable. FAIR principles established that data should be findable, accessible, interoperable, and reusable. The Open Science Framework built shared repositories. Nanopublications tried to make individual claims citable and composable. These were serious efforts by serious people, and they advanced the frontier. Most gained traction only within committed communities, because they required scientists to re-author their work into new schemas before delivering enough value in return.

What has changed is the compiler.

The problem that almost killed me is now trivial for a language model. Put the symptoms in, get the diagnosis back. AI solved pattern matching. The problems this essay describes are harder: knowledge that is contested across institutions, evidence fragmented across formats, confidence that has been laundered through five generations of citation. The substrate is what those problems require.

Every previous attempt put the structuring burden on the scientist producing the knowledge. The unlock is that the burden can now sit on the machine consuming it. Language models can extract candidate findings from prose, draft links between ideas, surface contradictions, and turn a pile of documents into the beginnings of a structured map — compilation at read-time rather than author-time. For the first time, the compiler that Bush imagined actually exists, and it is getting better fast.

We have already seen what happens when AI meets a well-structured scientific problem: AlphaFold predicted the three-dimensional structure of nearly every known protein, solving in months a challenge that structural biologists had worked on for fifty years. Two hundred million structures, released freely, making accessible overnight what would have taken the entire field another century of crystallography.

AlphaFold worked because the Protein Data Bank existed. For fifty years, structural biologists deposited crystal structures into a shared, open, machine-readable repository — because the field decided, early, that deposition was a precondition for publication. No deposit, no paper. When the compiler finally arrived, it had something to compile. The PDB is the proof that the substrate thesis is correct: a field built its state layer first, and when the compiler arrived, the compiler worked. Every other field is now asking what its PDB should look like, and most do not know.

Now imagine that across every field. AI agents designing experiments, analyzing results, generating hypotheses, managing entire research workflows — not for one well-curated dataset, but across all of science. The major discoveries ahead of us will be made with AI. But right now, these systems are arriving into a medium that was never built to hold scientific state. They can search documents. They cannot inherit a field’s structured knowledge, reason over its contested claims, or write their findings back into a shared record that other agents and researchers can build on. Most of science is not structured like protein sequences, and closing that gap is what the substrate is for.

Software offers the clearest contrast. Code compounds because it lives in a medium designed for compounding. Git gave software memory, a way for code to be inherited across distance, across hands that never meet. Compilers made source executable. GitHub and package ecosystems made that state networked and reusable. Agents arrived on top of that substrate. They did not create it.

AI writes code because this infrastructure exists. It is operating in a world with versioned state, executable runtimes, and shared networks. It can inherit from prior work. It can propose a change against a system that knows how to remember, test, merge, and distribute it.

Science has no Git. It has no GitHub. It jumped straight to agents, and the ecosystem beneath them is still sand. The light has no medium to carry it.

A retracted paper and a Nobel Prize paper look identical to a language model. Both are well-written scientific prose. The structure that would distinguish them — the evidence, the replications, the subsequent challenges — does not exist in the medium AI reads.

Consider what happens: a finding that was 60% certain in its original study gets cited. The citation carries the claim but not the uncertainty. Another paper cites the citation. A third builds on both. By the time an AI encounters the downstream assertion, five generations of inference have intervened. The original evidence is buried. The uncertainty was never tracked. A tentative finding has become, through citation without structure, indistinguishable from established fact.

This is confidence laundering. It happens in human science already. AI accelerates it to the point of uselessness.
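The decay is mechanical enough to sketch. Here is a toy model in Python (the claims, confidence numbers, and discounting rule are illustrative assumptions, not a real calibration scheme): a tentative finding gets restated through two confident citations, and the surface text reads as settled while the evidence underneath still says roughly 60%.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    confidence: float            # confidence in the claim given only its own evidence
    cites: list["Claim"] = field(default_factory=list)

def effective_confidence(claim: Claim) -> float:
    """Confidence a reader should assign, discounting the whole citation chain."""
    c = claim.confidence
    for parent in claim.cites:
        c *= effective_confidence(parent)
    return c

# A tentative finding (60% certain) cited through two more generations.
original = Claim("Biomarker X predicts outcome Y", confidence=0.60)
second   = Claim("X is a known predictor of Y [1]", confidence=0.95, cites=[original])
third    = Claim("Since X predicts Y [2], we assume...", confidence=0.95, cites=[second])

# The downstream text sounds certain; the chain says otherwise.
print(effective_confidence(third))   # ≈ 0.54, still anchored to the original 60%
# A record that stores only the text carries 0.95 forward instead.
```

The point of the sketch is the asymmetry: the text of `third` travels on its own, but its warrant does not, unless the medium carries the chain.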

Whoever shapes the knowledge substrate shapes what intelligence believes is true. If the substrate defaults to closed, proprietary document silos with AI wrappers on top, that is what science inherits — and unwinding it later is much harder than building it right the first time. These defaults are being set now, across every sector, and what calcifies first becomes hardest to replace.

Software

Version Control 1972
Platforms 2008
Package Managers 2010
AI Tools 2021

Science

State ?
Runtime ?
Network ?
AI Assistants 2023

Software built the layers in order. Science jumped to the top.

Science does not lack intelligence. It lacks a medium that can hold intelligence in public.

Science needs what software already has: a shared state layer, an execution runtime, and a network that lets both compound.

A researcher has forty-seven tabs open, each one a PDF, plus a private spreadsheet where she tracks which results she trusts and which contradict each other. That map disappears when she closes the laptop. The state layer is what makes it persist: the record of what a field currently knows in machine-operable form — findings, evidence, provenance, revision, dissent, and the calibration and lineage beneath them.

A postdoc records a synthesis result in a lab notebook, photographs the page, and closes the book. When she leaves the lab next year, the notebook goes into a drawer. The runtime is the layer that captures what actually happened — protocols, experiments, instruments, results — and returns it to the shared record before it disappears.

Two researchers in different countries publish findings that directly contradict each other. Neither knows the other exists until a review paper connects them three years later. The network is what makes the first two layers cumulative rather than local: shared identifiers, open protocols, interoperable formats, and a grammar of connection broad enough for one institution’s work to become another’s starting point.

Without state, science cannot remember properly.
Without runtime, it cannot learn properly.
Without network, it cannot compound properly.

Knowledge without provenance is just assertion. The goal is a medium where what is known stays tied to how it was measured, where it came from, and how it changed.

The best scientists already work this way in practice: sharing protocols, correcting each other’s assumptions, building on failures, carrying knowledge by hand across institutional boundaries. The substrate makes what they already do persistent and inheritable. But infrastructure wins adoption only when it solves the credit problem as a precondition. Git won because every commit carries an author. arXiv won because deposit matched the preprint incentive. The PDB won because structural biology made deposition a condition of publication. The substrate has to carry attribution in a way that maps onto how scientists are actually rewarded, or the knowledge never gets emitted into it in the first place.

The problem runs deeper than attribution. A paper that confirms a finding and a paper that refutes it both add +1 to the same citation count. The metric that determines a scientist’s career cannot distinguish agreement from disagreement. The current incentive structure rewards novel positive findings and undervalues the knowledge the constellation most needs: replication, null results, correction. The substrate can make contributing that knowledge nearly costless, but it cannot, alone, make it valued. Funders and institutions will have to change what they measure. The infrastructure can lead, but the incentives have to follow.

That is what the substrate is for: a medium that lets the light arrive instead of scattering.

Everything else people want from AI in science sits on top of that. The next question is what the first layer looks like.

The Constellation

The first visible surface of that substrate is the constellation.

As a first layer, the constellation makes a field legible to itself. Each finding is a point of light — some bright and well-replicated, some dim and contested, some connected by lines of evidence, some isolated in the dark. The structure of a field is not a list but a sky.

Today, if a clinician, researcher, or agent wants to understand the state of a question, they reconstruct it from scattered containers. They read papers. They follow citations. They search for reviews. They chase down retractions. They email authors. They guess which caveats still matter and which have already been superseded. They build a map in their head, and too often that map disappears when they leave. I have done this — followed citations until the chain forked, contradicted itself, and stopped converging, until I could not tell which of two papers to trust without reading five more. The map I built existed nowhere but my own exhaustion.

The constellation is what happens when that map stops living only in private memory.

established · contested · emerging · corrected
A constellation of knowledge: each finding is a point of light, with corrections and confidence visible.

It turns scattered knowledge into terrain. Findings can be addressed directly, linked to the evidence beneath them, connected to what they support or contradict, revised when new evidence arrives, and traversed by both people and machines.

What changes is the unit of record itself. The finding takes the place of the container, situated inside a deeper record that preserves what no container alone can hold.

A finding, in this world, is a claim with its support attached: what was measured, under what conditions, with what evidence, and with what uncertainty. Some findings support others. Some contradict them. Some quietly depend on assumptions that may later fail. Those relationships are part of the knowledge, not commentary around it.
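One way to make that unit concrete is as a data structure. This is a minimal sketch in Python; the field names and relation types are assumptions for illustration, not a proposed standard. The point is that evidence, conditions, uncertainty, and relationships live inside the record rather than as commentary around it.

```python
from dataclasses import dataclass, field
from enum import Enum

class Relation(Enum):
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    DEPENDS_ON = "depends_on"

@dataclass
class Evidence:
    method: str           # what was measured, and how
    conditions: dict      # under what conditions: sample, population, instrument
    uncertainty: float    # e.g. interval width or a calibrated probability

@dataclass
class Finding:
    claim: str
    evidence: list[Evidence]
    links: list[tuple[Relation, "Finding"]] = field(default_factory=list)
    revisions: list[str] = field(default_factory=list)   # append-only history

    def link(self, relation: Relation, other: "Finding") -> None:
        """Record a relationship as part of the knowledge itself."""
        self.links.append((relation, other))
```

A contradiction, in this shape, is not a sentence in a discussion section; it is an edge that any reader, human or machine, can traverse.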

Corrections become structural rather than rhetorical.

Today, when a paper is retracted or a result fails to replicate, the update often moves by rumor, review article, or the luck of who happened to notice. In the constellation, a correction is not a note on the side of the record, but an event inside it. If a finding weakens, the things that depended on it should know.
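That propagation is, structurally, an ordinary graph problem. A minimal sketch in Python (the identifiers are hypothetical, and a real system would carry typed edges and degrees of weakening rather than a plain walk): when a finding weakens, a breadth-first traversal of its dependents finds everything downstream that should be flagged.

```python
from collections import defaultdict, deque

# Edges point from a finding to the findings that depend on it.
dependents: dict[str, set[str]] = defaultdict(set)

def depends_on(finding: str, basis: str) -> None:
    """Record that `finding` builds on `basis`."""
    dependents[basis].add(finding)

def propagate_correction(weakened: str) -> list[str]:
    """Breadth-first walk of everything downstream of a weakened finding."""
    affected: list[str] = []
    seen = {weakened}
    queue = deque([weakened])
    while queue:
        node = queue.popleft()
        for child in dependents[node]:
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected

depends_on("therapy-trial", "biomarker-claim")
depends_on("dosing-model", "therapy-trial")
print(propagate_correction("biomarker-claim"))   # ['therapy-trial', 'dosing-model']
```

In the current system this walk happens, if at all, in someone's head. In the constellation it is a query.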

In 2014, two papers in Nature claimed a revolutionary method for creating stem cells. Labs worldwide scrambled to replicate. A researcher in Hong Kong documented his attempt publicly and identified the flaw within weeks — what the original authors had seen was an artifact, not a transformation. He submitted his findings to Nature. The journal that published the flawed papers rejected the correction. For six more months, labs worldwide continued wasting resources on an approach that had already been debunked in public.

The correction existed. The system refused to carry it.

This is harder than it sounds, and the difficulty is worth being honest about. Much of what enters the published record is unreliable — underpowered, selectively reported, or quietly irreproducible. A constellation that faithfully compiles unreliable findings is a constellation full of stars that should not be there.

And scientific disagreements resist clean formalization. Two groups report opposite associations for the same gene variant in different populations. The disagreement could reflect biology, sample size, analytic choices, or something no one has identified yet. Structured identically in the constellation, these findings look like a contradiction — or the structure itself may be obscuring a real difference that only someone inside the problem can see. The constellation cannot resolve this. Its job is to hold both findings and the uncertainty between them, legible and navigable, without forcing a resolution the science has not reached. That is the central design constraint, and everything else depends on getting it right.

Some of these stars have already collapsed — their published light still traveling through citations, review articles, and textbooks as if nothing changed — but in the current system we cannot tell which ones. The constellation makes that visible. Disagreement, contested replication, shifting confidence: these are part of the frontier, and the constellation preserves them rather than papering over them. This is also the state layer that AI agents need to reason over a field — without it, they are reading the same scattered papers everyone else reads.

Here is what that looks like concretely. A genomicist studying the genetic architecture of type 2 diabetes opens the constellation for her region of interest. She sees every variant that has been associated with the disease, in which populations, at what sample sizes, with what effect sizes. She sees which associations replicated and which did not, which were confounded by population structure, and where the statistical power was too thin to trust. She clicks a cluster of contested variants near a regulatory region and sees the three studies that disagree, the methodological differences between them, and the experiment that would resolve it. She does not spend six months reading conflicting papers and reconciling them in her head. She starts from the real edge.

Finding · Evidence · Conditions · Uncertainty · Lineage · Revision History
Anatomy of a finding: not just a claim, but the full record of evidence, conditions, uncertainty, lineage, and revision that makes it inheritable.

Shared scientific state makes judgment, expertise, and interpretation inheritable. A pharmaceutical company, an academic lab, and a regulator may still look at the same frontier and weight it differently, and they should. The goal is to let them inherit from the same underlying record rather than rebuild incompatible private maps from the same pile of prose.

The Gigafactory

The constellation is the map. The gigafactory is what forces the map back into contact with reality.

A field does not advance only because it can see itself more clearly. It advances because some part of that clarity gets tested: an experiment is run, a protocol is executed, a patient is enrolled, a material is synthesized, a measurement is taken, a result arrives, and the record changes. The runtime is where ideas meet the world and return as evidence.

We already have a proof of concept. When COVID hit, many countries ran trials. Most were fragmented: hospital by hospital, protocol by protocol, each site carrying its own administrative weight. The UK did something simpler and therefore much more powerful. RECOVERY created shared execution infrastructure: one protocol, one ethics path, lightweight enrollment, integration with existing records, a system any hospital could join. The first patient was enrolled within days. One in six hospitalized COVID patients in the UK entered the trial. Within 100 days, RECOVERY had produced a result that changed care worldwide. Dexamethasone, a cheap generic steroid, reduced mortality in ventilated patients and has since saved an extraordinary number of lives.

What mattered there was execution structure as much as intelligence. The system made participation easy, learning fast, and discovery something that could arrive. What one hospital learned became light that reached every other hospital within days.

196 hospitals · 100 days · 1M+ lives saved
RECOVERY worked because many sites could converge on one shared result instead of fragmenting into parallel local trials.

Under crisis conditions, with a simple protocol, shared infrastructure changed what science could do. The question is how to make that the default condition of science, not the exception.

Ideas are about to become the cheapest thing in science. Models can generate hypotheses faster than cells grow, faster than patients enroll, faster than assays complete, faster than atoms settle into structure. What remains scarce is contact with reality itself.

A better map without a better execution layer gives you cleaner thought and the same bottleneck. A better execution layer without shared state gives you local productivity and the same fragmentation. The point is the loop between them.

A scientific runtime should make it easier for ideas to become measured encounters with reality, and for those encounters to flow back into shared state as a byproduct of the work itself.

The Haverford lab proved this at a small scale: failed experiments contain real knowledge, and when that knowledge enters a shared system, it changes what the next researcher can do.

Now imagine that at the scale of an entire field. A materials scientist opens the compiled frontier and sees every compound that has been tested, every condition that shaped each result, every failure that narrowed the search space. She designs her experiment starting from the real edge instead of reconstructing the map from scratch. The runtime parses her protocol, schedules the synthesis, tracks the sample through every step, and captures the result as it happens.

The synthesis fails. In the current system, that failure dies locally — in a notebook, a memory, a folder no one else will ever read. In the better system, it enters shared scientific state directly: this compound, these conditions, this protocol, this measured outcome, this uncertainty. A researcher in Osaka, querying the same space that evening, sees the frontier move and does not spend the next six months rediscovering the same dead end. That is the network — the layer that turns a local result into inherited knowledge, across distance, across hands that never meet.

The map improves execution because it shows where the frontier is thin, contested, or overconfident. The runtime improves the map because every experiment, trial, and failed attempt can return to the shared record in structured form. Better maps choose better experiments. Better experiments build better maps. The cycle compounds.

Results have to enter the record before they are polished into narrative. Protocols have to be machine-operable. Sample identity has to survive handoffs. Measurements have to remain attached to calibration, uncertainty, and lineage. Otherwise the runtime is only speeding up one local workflow inside the old medium.

Shared State → Execution → Measurement → Record
The runtime loop: shared state guides execution, measurement captures results, records update the frontier, and the cycle compounds.

The gigafactory is what makes that allocation possible: a system that turns latent possibility into repeated, measurable contact with the world. The constellation makes the frontier visible. The gigafactory moves it.

If this loop is going to matter broadly, the grammar of connection has to stay shared. Private data will always exist, and should. What must remain open is the protocol layer — otherwise each powerful actor builds its own private constellation, visible only from inside its own walls, and the light cannot travel between them. Healthcare digitized records without insisting on interoperability, and paper silos became digital silos. A state layer controlled by one company becomes a chokepoint. Compilation without curation can scale confident error faster than careful truth. These risks are real, and honesty about them is part of building well.

The way to begin is small and concrete: prove the state layer in one workflow, one field, one repeated pain point where the cost of fragmentation is already obvious — a disease area where failed trials are already public, a materials corridor where synthesis conditions are already tracked, a genomic region where contested associations could be compiled tomorrow. From the beginning, emit public primitives: shared IDs, portable formats, verifiable records, open interfaces. Governance follows the infrastructure: the people who contribute the most knowledge govern how it is curated, the way committers govern an open-source project. At scale, this produces politics, and the politics will need to be navigated rather than designed away. The protocol layer stays open. Value is created above it.

Every infrastructure that lasted began with a coalition that believed the medium itself could change. The internet’s builders did not improve the telephone network — they replaced it with something that worked on different assumptions. The Human Genome Project did not speed up gene-by-gene discovery — it sequenced everything and released the data before publication. The members of these coalitions disagreed about business models, philosophy, credit. What held them was a shared conviction that the old constraints were not laws of nature, and that this generation had the tools to build something fundamentally new. The substrate requires the same coalition: scientists, engineers, funders, and institutions willing to build the infrastructure for knowledge to compound — unshackled by the assumption that science must be transmitted through containers that cannot carry its state.

The Sky We Leave Behind

Now imagine every field had what RECOVERY had.

Imagine the frontier of any scientific question was as navigable as a codebase — forkable, searchable, correctable, alive. Imagine a world where a breakthrough in materials science in Toronto compounds with a measurement in Osaka and a correction in São Paulo overnight, because the infrastructure carried it, because the constellation held it, because someone built the layers that let knowledge compound instead of scatter.

A failed experiment enters the shared record before the notebook closes, and the next team starts from where the last one stopped.

From here, science can harden into fragmented noise or become structured enough for correction and inheritance to compound.

That world is closer than it looks. The compiler exists, the architecture is clear, and the first generation of builders is here. Eighty-three corridors have been compiled across dozens of fields, carrying 33,233 findings from Alzheimer’s research to fusion energy to quantum computing.

Somewhere today, a six-year-old is in a scanner. The diagnostic triad is in the literature. The question is whether the system can carry it to the doctor who needs it, in time. That is what the substrate is for. The knowledge already exists. The point has always been to make it arrive.

Human knowledge is never contained in one person. It grows from the relationships we create between each other and the world, and still it is never complete.

— Paul Kalanithi, When Breath Becomes Air

Science has always been humanity’s longest bet — the wager that what one generation learns, the next can inherit and extend. For most of history, we have made that bet with paper and memory. We can build something better. A shared sky where every finding has a place, every correction propagates, and every failed path saves someone the trip.

the light arrives