The Battery Graveyard
Natron is the latest casualty of Asia’s battery dominance.
Hey — It’s Nico.
Welcome to another Failory edition. This issue takes 5 minutes to read.
If you only have a minute, here are the 5 most important things:
Natron, a US sodium-ion battery company, has shut down — learn why below.
The perfect Product Pages for SaaS founders.
OpenAI announces AI-powered hiring platform to take on LinkedIn.
Perplexity reportedly raised $200M at $20B valuation.
OpenAI just published a new paper with insights on how to stop AI hallucination — learn more below.
This Week In Startups
🔗 Resources
Building eval systems that improve your AI product.
The perfect Product Pages for SaaS founders.
Do you need a GTM engineer?
📰 News
OpenAI announces AI-powered hiring platform to take on LinkedIn.
Oracle and OpenAI sign a $300B cloud deal.
FTC launches inquiry into AI chatbots of Alphabet, Meta, five others.
Apple announces the ultra-slim iPhone Air.
💸 Fundraising
Perplexity reportedly raised $200M at $20B valuation.
Born raises $15M to build ‘social’ AI companions that combat loneliness.
Replit raises $250 million at $3 billion valuation.
Nuclearn, an AI startup for nuclear developers, just raised $10.5M.
Fail(St)ory

Salt Batteries
This week another energy startup shut its doors. Natron, a sodium-ion battery company that had been around for over a decade, is officially winding down.
The news is sobering because Natron wasn’t a small experiment—it had raised more than $360 million, opened a factory in Michigan, and announced plans for a $1.4 billion gigafactory in North Carolina.
For a moment, it looked like Natron might give the U.S. a real shot at competing with Asia in battery manufacturing. Instead, it has joined the growing graveyard of failed battery ventures.
What Was Natron:
Natron was founded in 2012 with the goal of commercializing sodium-ion batteries. The pitch was straightforward: sodium is far more abundant (and cheaper) than lithium, and sodium-ion cells are considered safer, easier to source, and potentially less dependent on foreign supply chains.
In theory, this made Natron the perfect candidate to help the U.S. reduce its reliance on lithium-ion imports from Asia.
For more than a decade, Natron positioned itself as a materials science innovator. It developed a special electrode material that gave its batteries some distinct advantages: they could charge and discharge faster than conventional lithium-ion cells, endure more cycles, and operate safely without rare materials like cobalt.
Instead of chasing electric vehicles immediately, Natron targeted commercial markets—telecom, cloud computing, and industrial facilities—where reliability and safety mattered as much as energy density.

By 2024, Natron had opened a small manufacturing facility in Holland, Michigan—the first commercial sodium-ion battery plant in the U.S. Just a few months ago, it announced a much more ambitious project: a 1.2 million square-foot gigafactory in North Carolina that would have created over 1,000 jobs and produced 24 gigawatts of batteries per year.
But while the vision was ambitious, the timeline proved unforgiving. Natron spent twelve years developing its technology, raising money, running pilot projects, and building partnerships. By the time it had something tangible to show, external market forces—and impatient investors—caught up with it.
The Numbers:
📅 Founded in 2012
💰 Raised $363M in funding
🏭 Announced $1.4B gigafactory in North Carolina (never built)
🔌 $25M worth of customer orders blocked due to certification delays
📉 Shut down in 2025 after investors froze funding
Reasons for Failure:
Certification Delays: Natron had $25 million in orders waiting at its Michigan factory. But it couldn’t ship anything without UL certification—a safety standard required before batteries can be sold. The process typically takes months and costs tens of thousands of dollars. For a startup already stretched thin, those delays became deadly.
Investor Fatigue: Natron’s investors lost patience. Earlier this year, they froze scheduled payments, leaving the company scrambling for cash. Its primary shareholder tried to sell its stake but couldn’t find buyers. With no lifeline, liquidation was the only option.
Global Competition: Even if Natron had solved its short-term cash issues, the market was turning against it. Lithium prices collapsed by 90% over the past two and a half years, undercutting one of sodium’s main selling points—cost advantage. Meanwhile, China has been building sodium-ion capacity at scale, with over a thousand battery facilities already in operation. Competing against that kind of entrenched supply chain is brutal.
Mismatch Between Timeline and Capital: Battery companies often take more than a decade to get from research to profitable scale. That’s longer than most investors are willing to wait. Natron wasn’t the first to fall into this gap: Powin and Northvolt have also recently failed in similar ways.
Why It Matters:
Certification can kill booked revenue. Natron had ~$25M in orders but couldn’t ship without UL approval. If your product depends on third-party certification, build the testing timeline and bridge financing into your launch plan, before you count the revenue.
Your “cost advantage” can vanish overnight. Sodium-ion’s pitch fell apart when lithium prices collapsed ~90%. Stress-test your model against commodity swings and price wars from incumbents—especially when China can reset the market.
Funding must match hardware timelines. Investors froze payments while Natron was weeks–months from certification, forcing liquidation. If you’re scaling factories, line up capital that survives delays (milestone-based tranches, customer prepayments, project finance), not just VC optimism.
Trend

The End of AI Hallucination?
OpenAI just dropped a paper with a spicy thesis: models don’t hallucinate because they’re flaky; they hallucinate because we trained and scored them to bluff with confidence instead of owning uncertainty.
Why It Matters:
Incentives > intentions: Most evals still grade for raw accuracy. Guess right? Full credit. Say “not sure”? Zero. That scoreboard nudges models toward confident fiction when they’re unsure.
Reliability beats headline accuracy: A system that sometimes abstains is more useful than one that’s a tiny bit “more accurate” but invents facts. Especially in workflows touching money, health, or reputation.
Training can change behavior: If labs penalize confident errors more than honest uncertainty, models will learn to pause instead of improvise.
What The Paper Says
Here’s the core conflict the paper highlights: when the only thing that “counts” is a correct answer, the optimal policy during eval is to always answer—even when the model has no signal.
Think of an exam where blank answers lose points but lucky guesses don’t; over thousands of questions, the gambler outperforms the careful student on the scoreboard, even if the gambler is frequently wrong. Models learn exactly that behavior.
The paper’s bigger claim is that our scoreboards are mis-specified. When accuracy is the headline number, models learn to play offense—answer everything—because the metric doesn’t distinguish between a careful “not sure” and a confident blunder. The fix is a new scoring rule: treat responses as right / wrong / abstain, and make wrong hurt more than abstain. That single change reshapes training and model selection.
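To make the incentive shift concrete, here is a minimal sketch of a right / wrong / abstain scoring rule. The specific point values are illustrative assumptions (not from the paper); what matters is that a wrong answer costs more than an abstention, so blind guessing stops being the optimal policy:

```python
def score(response: str, correct: str) -> float:
    """Score one answer under a right/wrong/abstain rule.

    Illustrative values: +1 for a correct answer, 0 for abstaining,
    and a larger penalty (-2) for a confident wrong answer.
    """
    if response == "ABSTAIN":
        return 0.0
    return 1.0 if response == correct else -2.0

# Under pure accuracy scoring, the gambler (one lucky guess out of three)
# would beat the careful model (one answer, two abstentions) 1-1... but
# under this rule, confident blunders drag the gambler's score down:
gambler = [("A", "A"), ("B", "C"), ("D", "C")]
careful = [("A", "A"), ("ABSTAIN", "C"), ("ABSTAIN", "C")]
print(sum(score(r, c) for r, c in gambler))  # -3.0
print(sum(score(r, c) for r, c in careful))  # 1.0
```

The exact penalty is a policy knob: the more costly a wrong answer is in your product, the harsher it should be relative to abstaining.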
OpenAI has already been implementing this strategy. Here is a table comparing the newer gpt-5-thinking-mini and the older o4-mini.

What this shows: the older model edges out on accuracy by two points, but it achieves that by guessing almost everything—and being wrong three-quarters of the time when it does. The newer model answers less often, but when it speaks, it’s far less likely to invent an answer. If you run a product where bad information is costlier than no information, the second profile is strictly better.
The Metric Shift To Watch
Accuracy won’t decide the next phase—coverage at a given risk will. The right way to read models from now on is a simple curve: put coverage on the x-axis (how often the model answers) and precision on the y-axis (how often those answers are right). Then mark a third number on the chart: the confident-wrong rate—cases where the model sounded sure and was still wrong. That’s the harm knob.
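The three numbers above are cheap to compute from eval logs. A minimal sketch, assuming abstentions are logged as `None` (the function name and data layout are illustrative):

```python
def reliability_metrics(predictions, answers):
    """Compute coverage, precision, and confident-wrong rate.

    predictions: model outputs, where None means the model abstained.
    answers: the corresponding ground-truth answers.
    """
    total = len(predictions)
    answered = [(p, a) for p, a in zip(predictions, answers) if p is not None]
    coverage = len(answered) / total                        # how often it answers
    right = sum(p == a for p, a in answered)
    precision = right / len(answered) if answered else 0.0  # right when it speaks
    confident_wrong = (len(answered) - right) / total       # the harm knob
    return coverage, precision, confident_wrong

# A model that abstains on 2 of 5 questions and is right whenever it answers:
preds = ["A", None, "C", None, "E"]
truth = ["A", "B", "C", "D", "E"]
print(reliability_metrics(preds, truth))  # (0.6, 1.0, 0.0)
```

Sweeping a confidence threshold over the same logs traces out the coverage-precision curve described above.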
It’s very likely that from now on we’ll see AI companies focus less on the “accuracy” benchmark and more on coverage at target precision—publishing coverage–precision curves and a bold confident-wrong rate as first-class metrics.
If this paper has a punchline, it’s this: we don’t just need smarter models; we need models that know when to shut up—and a scoreboard that rewards them for it.
Help Me Improve Failory
How Was Today's Newsletter? If this issue was a startup, how would you rate it?
That's all for this edition.
Cheers,
Nico