ISSUE #002

Warp ADE, Production Hallucinations, and Microsoft's Hedge

Michael Antczak

2025-09-18

⚡ Quick Commits

Warp 2.0's "Agentic Development Environment"

Warp is a tool I tested a while back when I was looking for an iTerm2 replacement. It caught my eye, but in the end it just didn't click between the two of us. Now, Warp + AI? This could be interesting. We're testing it all week for the next issue. Is it any good? You'll find out soon.

Defeating Nondeterminism in LLM Inference

Large language models produce different outputs even when temperature is set to 0, making reproducible results extremely difficult to achieve. This affects both API services and self-hosted inference engines.

While many assume GPU concurrency and floating-point arithmetic cause the nondeterminism, the actual reason is more subtle. The primary issue is a lack of batch invariance: when batch sizes change based on server load, the numerical computations produce slightly different results for each request, even though each individual operation is run-to-run deterministic.
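To make that concrete, here is a minimal PyTorch sketch in the spirit of the post's own demonstration (assuming a CUDA GPU; the size of the discrepancy depends on which kernels PyTorch selects for each shape, and on some setups it may be exactly zero):

```python
import torch

torch.manual_seed(0)
x = torch.randn(2048, 4096, device="cuda")
w = torch.randn(4096, 4096, device="cuda")

row_alone = x[:1] @ w        # first row, computed in a batch of 1
row_batched = (x @ w)[:1]    # same row, computed in a batch of 2048

# Each matmul is individually deterministic (rerun it and you get the
# same bits), but a different batch size can trigger a different kernel
# or reduction split, reordering the floating-point additions.
print((row_alone - row_batched).abs().max())
```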

The Thinking Machines team developed batch-invariant kernels and released the batch-invariant-ops library, with vLLM integration examples, for deterministic inference.
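If you want to probe your own deployment, a crude check is to replay one prompt many times at temperature 0 and count the distinct completions. A sketch against an OpenAI-compatible endpoint; the base_url, model name, prompt, and sample count below are placeholders for whatever you run:

```python
from collections import Counter

from openai import OpenAI  # any OpenAI-compatible endpoint works

# Hypothetical local vLLM server; point base_url/model at your own.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

completions = [
    client.completions.create(
        model="my-model",          # placeholder
        prompt="Tell me about Richard Feynman",
        temperature=0,
        max_tokens=200,
    ).choices[0].text
    for _ in range(20)
]

# Under load-dependent batching you will often see several distinct
# completions even at temperature 0; with batch-invariant kernels the
# count should collapse to exactly 1.
print(Counter(completions).most_common())
```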

Claude can now create and edit files

This is pretty cool. "File creation is now available as a preview for Max, Team, and Enterprise plan users. Pro users will get access in the coming weeks."

Make sure to enable "Upgraded file creation and analysis" under "Settings > Features > Experimental".


🚦 Market Signals

Microsoft-OpenAI relationship is officially "It's Complicated"

"Microsoft will pay to use Anthropic’s AI in Office 365 apps, [...]. The move means that Anthropic’s tech will help power new features in Word, Excel, Outlook, and PowerPoint alongside OpenAI’s, marking the end of Microsoft’s previous reliance solely on the ChatGPT maker for its productivity suite."

Source: TechCrunch, The Information

OpenAI announces AI-powered hiring platform to take on LinkedIn

Now this sounds really interesting. I've lost count of how many times I've wished someone would take on LinkedIn. Just not OpenAI...

Source: TechCrunch, OpenAI


📄 Papers Worth Reading

Why Language Models Hallucinate - Kalai, Nachum, Vempala, Zhang (OpenAI/Georgia Tech)

THE GIST: LLMs hallucinate because current training and evaluation setups reward guessing over deferring. They’re not explicitly told to say “I don’t know,” so when uncertain, the optimal strategy is often to produce a confident guess.

WHY NOW: Recent models like DeepSeek-V3 highlight the issue—giving inconsistent, wrong answers even when prompted to only answer if certain. The paper shows this isn’t a fluke: under today’s incentive structures, hallucination is statistically inevitable.

THE KICKER: Most benchmarks use simple 0-1 scoring: correct = 1 point, everything else (including “I don’t know”) = 0. That means models get no credit for admitting uncertainty but a chance at reward for guessing. We’ve literally built leaderboards that select for confident nonsense.

FOR BUILDERS: Don’t expect 100% accuracy. Instead, design systems to recognize uncertainty—via confidence thresholds, abstentions, retrieval, or fallback to humans/search. And when evaluating, penalize wrong answers more than honest “don’t knows.”
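The underlying expected-value argument fits in a few lines. A toy sketch of it in Python; the function names and penalty values here are illustrative, not the paper's code:

```python
def expected_score(confidence: float, wrong_penalty: float) -> float:
    """Expected score for answering when P(correct) = confidence,
    under: correct -> +1, wrong -> -wrong_penalty, abstain -> 0."""
    return confidence * 1.0 - (1.0 - confidence) * wrong_penalty

def should_answer(confidence: float, wrong_penalty: float) -> bool:
    # Answer only when the expected score beats abstaining (0), i.e.
    # when confidence > wrong_penalty / (1 + wrong_penalty).
    return expected_score(confidence, wrong_penalty) > 0.0

# Plain 0-1 benchmark scoring is wrong_penalty = 0: guessing beats
# abstaining at any confidence above zero, which is the paper's point.
assert should_answer(confidence=0.05, wrong_penalty=0.0)

# Penalizing wrong answers as much as rewarding right ones makes
# abstention optimal below 50% confidence.
assert not should_answer(confidence=0.4, wrong_penalty=1.0)
assert should_answer(confidence=0.6, wrong_penalty=1.0)
```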


📚 Resources

📘 "Empire of AI" - Karen Hao

The OpenAI exposé that explains why Microsoft needs Anthropic. I'm halfway through and enjoying it very much so far. What I like is that it seems to strike the right balance between hype and criticism.

Find on: Goodreads | Amazon US | Amazon UK

📘 "Deep Learning: Foundations and Concepts" - Christopher Bishop

I remember the two books he wrote a while ago: Neural Networks for Pattern Recognition and the more recent Pattern Recognition and Machine Learning. Great quality. His newest book is, obviously, Deep Learning: Foundations and Concepts.


That's it, I hope you enjoyed this issue as much as I did. Stay tuned! Next week we will look at Warp ADE in more detail.