OpenAI Says an Internal Model Disproved Erdős' 1946 Conjecture. The Procurement Row It Exposes Is Third-Party-Reproducible Attestation, Not AI Mathematics.
The Announcement
On 20 May 2026 OpenAI announced that an unnamed internal model had disproved Paul Erdős' 1946 unit distance conjecture, framing the result as "the first time AI has autonomously solved a prominent open problem central to a field of mathematics." The conjecture, a foundational problem in discrete geometry, had stood for eighty years with the square-grid construction widely believed to be the asymptotically tight example at n^(1+o(1)). The companion paper on arXiv, co-authored by Noga Alon, Thomas Bloom, Timothy Gowers, Daniel Litt, Will Sawin, Arul Shankar, Jacob Tsimerman, Victor Wang, and Melanie Matchett Wood, states that the construction was "first mathematically generated in one shot by an internal model at OpenAI, and then expositionally refined through human interactions." The paper never names the model.
The announcement is being read as a mathematics story. For anyone procuring an agentic AI research tool, it is an attestation-chain story, and the chain has three breaks worth naming.
What the Chain Actually Looks Like
The first break is the unnamed model. There is no version pin, no model card, no checkpoint hash, no temperature setting, and no disclosed system prompt. A buyer cannot specify "the model that produced the Erdős result" in a contract, because the vendor has not made it identifiable. The Interesting Engineering coverage notes that the engineers "did not specifically train it on the unit distance problem or build dedicated search tools," which is the claim that makes the result remarkable; it is also the claim that is impossible to audit without a reproducible artifact. This is the same unnamed-model gap I drew out of GTIG's zero-day-from-AI disclosure earlier this year, where the group named the models it had ruled out and left the actual model unnamed; here the structure is identical and the omission belongs to the vendor itself.
The second break is that the original generation cannot be replayed. The arXiv paper's phrasing, "first mathematically generated in one shot" and then "expositionally refined through human interactions with Codex," describes an output that exists only as the refined post-hoc version, with the refinement itself done through a second OpenAI product. Will Sawin's refinement produced an explicit improvement exponent of δ = 0.014, but that exponent is the human-polished output, not the model's raw generation. Melanie Matchett Wood specifically warned that the original AI output did not appropriately cite prior work, which is a provenance and intellectual-property defect that the published paper has cleaned up. The thing the vendor is taking credit for is not the thing the reader is reading.
The third break is that the validation is vendor-curated. The nine mathematicians on the companion paper were assembled by OpenAI. Timothy Gowers called the proof "a milestone in AI mathematics" and said he "would have accepted [it] for the Annals of Mathematics without hesitation," but his endorsement appears inside the paper that is itself the announcement's validation, not in an independent peer review at the Annals. The companion paper is co-authorship, not third-party verification. This is the same structural pattern as a SOC 2 attestation where the audited vendor picks the auditor; I covered the auditor-selection problem in the Delve and LiteLLM shared-auditor analysis, and the shape recurs here in a more prestigious wrapper. It is worth contrasting with what OpenAI itself did weeks earlier in publishing a frontier-access rubric for cyber research: the rubric is the kind of auditable, externally evaluable disclosure that procurement can score against. The Erdős announcement is not.
Winbuzzer was the only outlet that named the problem directly: "Outside review is central to that claim because many readers cannot test the proof themselves," and if this claim also unravels, "enterprise and scientific institutions may face heightened skepticism about AI-generated mathematical proofs."
Why the Skepticism Has History
Seven months before this announcement, the same vendor made a structurally identical claim. In October 2025 OpenAI's Kevin Weil claimed GPT-5 had solved ten unsolved Erdős problems; Thomas Bloom called it "a dramatic misrepresentation," Yann LeCun and Demis Hassabis criticized the claim publicly, and Weil deleted the post. The May 2026 announcement is the do-over, presented with a co-authored paper as the corrective.
The corrective addresses the prior credibility hit but does not address the structural problem that produced it. The October 2025 claim and the May 2026 claim share the same architecture: the vendor controls the artifact, the vendor controls the framing, and the vendor controls the validation channel. The presence of well-credentialed co-authors changes the perceived weight of the claim; it does not change the locus of attestation. This is the same disclosure shape that Anthropic ran on the 500-zero-day announcement I wrote about: a high-prestige number, a confident assertion, and no externally referenceable artifact a buyer could verify against. The vendors differ; the disclosure pattern does not. A buyer who treats the May 2026 announcement as evidence that the vendor's agentic research tool produces verifiable mathematics is buying the second iteration of a pattern that has already failed once.
The Procurement Row
The row this story exposes is third-party-reproducible-output attestation. For an enterprise procuring an agentic AI tool that will be used in research, regulatory submission, due-diligence work product, or any other context where output provenance matters, the contractual question is not "does the vendor publish a paper?" The contractual question is whether the vendor will commit to:
- A pinned model identifier, with a version string and a checkpoint hash, that the buyer can reference in audit logs.
- A reproducibility commitment: the same inputs, model version, and configuration produce the same output, or the vendor documents the specific stochastic parameters that prevent it.
- An independent-validation pathway, where the validator has no commercial or co-authorship relationship to the vendor.
- A provenance trail that distinguishes raw model output from human-refined output, with both versions retained.
None of these are exotic asks. They are the same controls a regulated buyer already demands for code provenance and SBOM attestation; I argued that signed provenance can lie in the TanStack Actions cache-poisoning post, and the SOC 2 scope-gap version of the argument sits in the kernel CVE SaaS procurement SLA piece. The Erdős announcement is the same failure mode applied to research output rather than code or compliance posture: the attestation is signed, the signature is from a credentialed party, and the underlying artifact is not independently reproducible.
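To make the four commitments above concrete, here is a minimal Python sketch of what a buyer-side attestation record might look like. Everything in it is an assumption for illustration: the `OutputAttestation` class, its field names, and the `gaps` and `reproduces` helpers are hypothetical, and no vendor is known to expose this schema. It only shows that each commitment reduces to a field a buyer can check in an audit log.

```python
from dataclasses import dataclass
import hashlib


def sha256_hex(text: str) -> str:
    """Content hash a buyer could record in an audit log."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class OutputAttestation:
    # Hypothetical schema: every field name here is illustrative.
    model_version: str                  # pinned version string
    checkpoint_sha256: str              # pinned checkpoint hash
    raw_output: str                     # unmodified model output, retained
    refined_output: str                 # human-refined version, retained separately
    validator: str                      # who validated the output
    validator_is_coauthor: bool         # co-authorship relationship to the vendor
    validator_has_commercial_tie: bool  # commercial relationship to the vendor

    def gaps(self) -> list[str]:
        """List the commitments this record fails to satisfy."""
        missing = []
        if not (self.model_version and self.checkpoint_sha256):
            missing.append("pinned model identifier")
        if not self.raw_output:
            missing.append("raw output retained")
        if self.validator_is_coauthor or self.validator_has_commercial_tie:
            missing.append("independent validator")
        return missing


def reproduces(record: OutputAttestation, rerun_output: str) -> bool:
    """Reproducibility check: a rerun under the pinned model and configuration
    must hash to the same value as the retained raw output."""
    return sha256_hex(rerun_output) == sha256_hex(record.raw_output)


# Illustrative record with made-up values: no pinned model, no raw output,
# and a validator who is also a co-author.
record = OutputAttestation(
    model_version="",
    checkpoint_sha256="",
    raw_output="",
    refined_output="published, human-refined text",
    validator="co-author panel",
    validator_is_coauthor=True,
    validator_has_commercial_tie=False,
)
print(record.gaps())
```

The design choice worth noting is that raw and refined output are separate fields, so the reproducibility check runs against what the model produced and not against what a human later polished. Exact-hash comparison is the strictest possible test; a vendor that documents stochastic parameters preventing bit-identical reruns would need a looser, separately negotiated equivalence criterion.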
The ask for the procurement team is concrete. Add a row to the agentic-AI vendor DDQ that reads: "For any output produced by your platform that will be relied upon as a verifiable artifact, will you commit contractually to a pinned model version, an unmodified raw-output retention requirement, and an independent-reproducibility test path with a validator that has no co-authorship or commercial relationship to your organization?" The vendor's answer, in writing, is the artifact your auditor will want eighteen months from now when a published result needs to be defended.