The Case for Structured Knowledge

Bringing sources back into AI is not a concession. It is a strictly more powerful architecture.

The False Binary

The AI debate has two poles. Scaling maximalists say bigger models and more compute will solve everything. Ethics critics say slow down and protect creators. Both miss something fundamental.

The scaling narrative assumes the bottleneck is compute. It's not. The ethics narrative frames attribution as a cost to be borne. It's not. The bottleneck is structured feedback from the physical world, and attribution is the mechanism that makes that feedback usable.

No model, however large, will solve cancer by reasoning over PubMed abstracts, because abstracts are the output of science, not the process. The process produces measurements, constraints, negative results, and dependencies that current AI dissolves into training data and cannot trace back.

What Structure Enables

Without structured dependencies, there are computations AI simply cannot perform, no matter how large the model or how long the context window:

Propagate updates. When a calibration standard is revised, which downstream results are affected? A system that dissolved the dependency into prose can't answer this. Structured links trace the full blast radius.
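The blast-radius computation is a plain graph traversal once dependencies are stored as links. A minimal sketch (the node names and edge data here are illustrative, not a real dependency store):

```python
from collections import deque

# Hypothetical dependency edges: each result lists what it depends on.
# Inverting the map gives "dependents": who is affected if a node changes.
depends_on = {
    "result_A": ["calibration_std_v1"],
    "result_B": ["result_A", "method_X"],
    "result_C": ["method_X"],
    "result_D": ["result_B"],
}

dependents = {}
for node, deps in depends_on.items():
    for dep in deps:
        dependents.setdefault(dep, []).append(node)

def blast_radius(changed):
    """Every downstream node transitively affected by `changed`."""
    affected, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in affected:
                affected.add(dep)
                queue.append(dep)
    return affected

print(sorted(blast_radius("calibration_std_v1")))
# ['result_A', 'result_B', 'result_D']
```

The same traversal answers the revised-calibration question at any scale; prose that merely mentions the calibration cannot be traversed at all.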

Measure independence. Are two measurements truly independent, or do they share hidden assumptions? We showed that 105 published measurements of the Hubble constant collapse to 59 independent clusters when you trace their structural dependencies. Citations alone can't detect this.
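Collapsing measurements into independent clusters is a connected-components computation over shared dependencies. A toy sketch with invented inputs (not the actual 105-measurement dataset):

```python
# Toy example: measurements that share a calibration or sample are not
# independent. Merging anything with overlapping dependencies yields
# the independent clusters.
shared_deps = {
    "H0_meas_1": {"cepheid_calibration"},
    "H0_meas_2": {"cepheid_calibration", "SN_sample_A"},
    "H0_meas_3": {"SN_sample_A"},
    "H0_meas_4": {"CMB_pipeline"},
}

def independent_clusters(measurements):
    clusters = []
    for name, deps in measurements.items():
        merged = {name} | deps
        overlapping = [c for c in clusters if c & deps]
        for c in overlapping:
            merged |= c
            clusters.remove(c)
        clusters.append(merged)
    return clusters

# Measurements 1-3 chain through shared calibrations and samples;
# measurement 4 stands alone, so four measurements collapse to two clusters.
print(len(independent_clusters(shared_deps)))  # 2
```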

Detect contradictions. When two facts share a subject and predicate but disagree on the value, that's a signal. Structured storage makes contradictions visible. Prose buries them.
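Over a triple store, that check is one pass. A minimal sketch with invented facts:

```python
# Facts stored as (subject, predicate, object) triples. Two triples with
# the same subject and predicate but different objects are a contradiction.
facts = [
    ("sample_42", "melting_point_C", "1064"),
    ("sample_42", "melting_point_C", "1092"),
    ("sample_42", "purity_pct", "99.9"),
]

def find_contradictions(triples):
    by_key = {}
    for s, p, o in triples:
        by_key.setdefault((s, p), set()).add(o)
    # Keep only (subject, predicate) pairs with more than one value.
    return {k: sorted(v) for k, v in by_key.items() if len(v) > 1}

print(find_contradictions(facts))
# {('sample_42', 'melting_point_C'): ['1064', '1092']}
```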

Find complementarity, not similarity. Embeddings find more of what you already know. Structural matching finds what would make your work more powerful. A technique in materials science that solves a problem in drug delivery, connected by structural properties, not keywords.
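One way to read "structural matching": pair work that provides a capability with work that requires it. This sketch uses invented field and capability names, not the platform's actual schema:

```python
# Hedged sketch: complementarity as a needs/provides match on typed
# structural properties, rather than embedding similarity.
kos = [
    {"id": "ko_materials", "provides": {"nanoparticle_coating"}, "requires": set()},
    {"id": "ko_drug_delivery", "provides": set(), "requires": {"nanoparticle_coating"}},
    {"id": "ko_astro", "provides": {"bayesian_pipeline"}, "requires": set()},
]

def complementary_pairs(objects):
    pairs = []
    for a in objects:
        for b in objects:
            # a complements b when a provides something b needs.
            if a is not b and a["provides"] & b["requires"]:
                pairs.append((a["id"], b["id"]))
    return pairs

print(complementary_pairs(kos))
# [('ko_materials', 'ko_drug_delivery')]
```

Note that an embedding model would likely rank the two matched items as dissimilar; the match exists only at the structural level.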

These aren't hypothetical capabilities. We've measured them.

Attribution Is Power, Not Charity

An unattributed claim is just a prediction. It might be right, it might be hallucinated, and you can't tell the difference without checking the source. Attribution converts predictions into verifiable assertions. Verification is what separates knowledge from belief.

When you know that Result B depends on Method A, you can compute structural importance, measure what breaks if a finding is wrong, assess whether alternatives exist, and trace how a technique transfers across fields. These computations are impossible over flat text.

The default in AI today is opt-out: your contributions are used to build models that compete with you. CoreTx reverses the flow. You opt in. Your data stays yours. When we match, it's only to connect you with complementary work for the benefit of your research.

Knowledge Objects

A Knowledge Object is a discrete, hash-addressed (subject, predicate, object) triple, attributed to its source, typed for structural matching, and linked to other facts by typed edges. Not a document chunk. Not an embedding. A specific, verifiable, computationally tractable unit of knowledge.
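A minimal sketch of such a unit, assuming SHA-256 for the content address; the field names are illustrative, not CoreTx's actual schema:

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class KnowledgeObject:
    subject: str
    predicate: str
    obj: str
    source: str   # attribution, e.g. an ORCID or DOI
    ko_type: str  # type used for structural matching

    @property
    def address(self) -> str:
        # Hash a canonical serialization: identical content, identical address.
        canonical = json.dumps(
            [self.subject, self.predicate, self.obj, self.source, self.ko_type]
        )
        return hashlib.sha256(canonical.encode()).hexdigest()

ko = KnowledgeObject("H0", "measured_value_km_s_Mpc", "73.0",
                     "0000-0002-1825-0097", "measurement")
print(ko.address[:12])  # stable content address for this exact triple
```

Hash addressing is what makes typed edges between facts durable: a link points at immutable content, not at a mutable document location.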

A paper takes months and captures only the final result. A KO takes 30 seconds and captures the process: the negative result, the calibration trick, the half-formed hypothesis that would otherwise sit in a notebook for years. Researchers speak or type a research idea, AI extracts the structure, and each KO is cryptographically signed with your ORCID and timestamped. Provable priority from the moment you capture it.
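The priority claim reduces to signing a hash of the claim together with an identity and a timestamp. The sketch below uses an HMAC with a private key as a stand-in so it stays standard-library only; a real system would use an asymmetric signature scheme (e.g. Ed25519), and every name here is illustrative:

```python
import hashlib
import hmac
import json

PRIVATE_KEY = b"researcher-private-key"  # illustrative placeholder

def capture(claim: str, orcid: str, timestamp: float) -> dict:
    # Bind the claim's hash to an identity and a capture time, then sign.
    record = {
        "claim_hash": hashlib.sha256(claim.encode()).hexdigest(),
        "orcid": orcid,
        "timestamp": timestamp,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(PRIVATE_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify(record: dict) -> bool:
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(PRIVATE_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])

rec = capture("Annealing at 450 C removes the defect",
              "0000-0002-1825-0097", 1700000000.0)
print(verify(rec))  # True
```

Anyone holding the record can later check that the claim, author, and timestamp have not been altered, which is what makes the priority provable rather than asserted.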

Because KOs are granular and typed, the platform can match your work structurally across disciplines. Not "papers like yours" but complementary research: a method that solves your problem, a dataset that tests your hypothesis, a negative result that saves you six months. These connections surface through structural properties, not keywords or citation graphs.

The Evidence

100% vs 0%: write-time salience gating vs ungated memory and Self-RAG (real-LLM validated)
252×: cost advantage at N = 7,000 facts (O(1) vs O(N) per query)
105 → 59: H₀ measurements collapse to independent clusters via dependency tracing
5.3×: more related pairs found than citation overlap
50%: project constraints silently lost after 3 rounds of context compaction

12 experiments, 1,320 API calls, 3 random seeds.

Science Doesn't Stop

Even a vastly more capable AI needs physical experiments it cannot run in silico: clinical trials, telescope observations, materials characterization. No amount of reasoning over existing data will produce data that doesn't exist yet.

What AI can do is make the feedback loop faster. Today, most experimental knowledge waits months or years for a paper. KOs collapse that delay to seconds: capture a finding, and it is immediately structured, attributed, and available for matching with complementary work worldwide.

The throughput of science is bounded by experiment, not by compute. The interface between AI and that experimental feedback loop is structured, attributed, traceable knowledge. That's what we're building.

Built by Scientists, for Scientists

CoreTx is built by researchers who have lived the problem. Our team spans Cambridge, Berkeley, CERN, Harvard, SpaceX, and Google. We have published papers, run experiments, waited months for peer review, and watched our own work dissolve into training data without attribution.

That firsthand experience shapes every design decision. We believe the AI revolution in science should be owned by scientists, not locked inside frontier labs. CoreTx exists because the people building it needed it to exist.

We're Building This for You

Tell us what's working and what we should build next.