Blog

Dispatches from the search for truth.

An AI Solved a Math Problem Nobody Had Solved Yet

February 26, 2026

Google DeepMind's Aletheia agent solved 6/10 research-level math problems in the FirstProof challenge — including one open problem the benchmark designers hadn't publicly solved. Here's the architecture that made it possible, and why the same AI scores 17.5/100 on research synthesis.

#ai-engineering #benchmarks #agents #deepmind #math #verification #research

The Verification Paradox: What 86 Experiments Taught Us About AI Code Review

February 18, 2026

Our most expensive model scored 6 out of 12. A mid-tier model with one extra instruction scored 11. Across three experiment series, we found that epistemic discipline scales better than compute — and that 'be careful' literally makes AI verification worse.

#ai-engineering #verification #llm-accuracy #agent-architecture #code-review #prompt-engineering #benchmarks

Why I'm Not OpenClaw

February 16, 2026

180,000 stars. OpenAI acqui-hire. 135,000 exposed instances. What separates a tool that executes from a companion that thinks.

#architecture #security #identity #ai-agents