A Language Without a Market
Why Wing couldn’t turn open-source love into revenue
Hey — It’s Nico.
Welcome to another Failory edition. This issue takes 5 minutes to read.
If you only have one, here are the 5 most important things:
Wing Cloud, a startup that wanted to reinvent how we build for the cloud, has shut down — learn why below.
A new framework for AI agent pricing.
OpenAI says it’ll release o3 after all, delays GPT-5.
Artisan, the ‘stop hiring humans’ AI agent startup, raises $25M.
Meta released Llama 4 and sparked controversy — learn more below.
Let’s get into it.
This Week In Startups
🔗 Resources
The most common mistakes with Pricing Pages.
Brand As An Enduring Moat.
State of Fintech in 2025.
A new framework for AI agent pricing.
📰 News
Anthropic rolls out a $200-per-month Claude subscription.
Deep Cogito emerges from stealth with hybrid AI ‘reasoning’ models.
OpenAI says it’ll release o3 after all, delays GPT-5.
Amazon unveils a new AI voice model, Nova Sonic.
💸 Fundraising
Networking startup Tailscale raises $160M.
Autonomous driving startup Nuro raises $106M.
Artisan, the ‘stop hiring humans’ AI agent startup, raises $25M.
US police drone startup Brinc Drones raises $75M.
Fail(St)ory

Cloud First Development
This week, Wing Cloud, a startup that wanted to reinvent how we build for the cloud, announced it’s shutting down.
Less than two years ago, Wing came out of stealth with $20 million in funding and a bold idea: make cloud development dramatically easier by unifying app code and infrastructure into a single programming language.
The vision got developers excited. Investors too. But in the end, it wasn’t enough to turn that excitement into a business.
What Was Wing Cloud:
If you’ve ever worked on a cloud app, you know the pain: the code lives in one place, the infrastructure lives in another, and half the time you’re jumping between five tools to deploy anything.
Wing Cloud tried to fix that. Their tool, Winglang, was a custom programming language designed to let you write both your app’s logic and its infrastructure in one place.
The code could be compiled into deployable packages for AWS Lambda, Kubernetes, Terraform, CloudFormation, and more.
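To make that concrete, here's roughly what Winglang code looked like, based on the project's public examples (a sketch; exact APIs varied across versions). Infrastructure declarations and runtime logic live in the same file, and the compiler wires them together:

```wing
bring cloud;

// Preflight: these declarations become real cloud resources
// (e.g. an SQS queue and an S3 bucket when targeting AWS).
let queue = new cloud.Queue();
let bucket = new cloud.Bucket();

// Inflight: this closure runs in the cloud at runtime. The compiler
// generates the wiring and the permissions the handler needs.
queue.setConsumer(inflight (message: str) => {
  bucket.put("greeting.txt", "Hello, {message}!");
});
```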

It also included a local cloud simulator, allowing developers to test and debug their full app before deploying anything. Plus, it offered a real-time visual view of cloud architecture and data flow—something meant to bridge the gap between developers and operators.
Winglang was open-source from the start, and quickly gained traction. Over 100 contributors joined, and a community started to form around the tool. It was one of those projects that genuinely resonated with engineers.
But as it turns out, solving a pain point for developers doesn’t always mean you’re solving a business-critical problem.
The Numbers:
📅 Founded: April 2022
💸 Funding: $20M Seed round announced in July 2023
💻 Tech: Winglang, a custom language compiling to multiple cloud platforms
👨‍💻 Community: 100+ open-source contributors
Reasons for Failure:
It solved a real problem, but not one companies paid for: In a LinkedIn post, co-founder and CEO Elad Ben-Israel explained it clearly: “While developer experience is something engineers care deeply about, it’s not always seen as business-critical by most companies.” That line sums up the core issue: Wing built a product that developers loved but that most companies weren’t prioritizing.
They never found the “right problem”: Wing wasn’t just building a devtool; they were trying to define a new way of writing cloud apps. That’s no small undertaking. According to Elad: “Over the past year, we explored various different directions in the cloud space—looking for a problem we were both passionate about and could build a company around. In the end, we didn’t find the right fit.” This suggests that even with early traction, they weren’t confident the initial problem was big enough or clear enough to scale around.
Open source was hard to monetize: Open-sourcing Winglang helped it spread, but monetizing open-source developer tools is notoriously difficult. Without an enterprise product or hosted offering that solved a painful business problem, it was tough to turn community excitement into revenue.
The market might not have been ready: Asking developers to learn a new language—no matter how elegant—is a big ask. Wing wasn’t just an incremental improvement; it was a shift in how people think about cloud development. That kind of change can take years, and early-stage startups don’t always get that much time.
Why It Matters:
Most technical founders underestimate how much of a product’s success depends on business buyers, not just developers.
Tools that developers love don’t automatically translate into viable companies. The gap between technical excitement and commercial traction is real.
Timing matters: creating a new category (like a cloud-oriented language) is a long, uphill battle. Without quick wins or revenue, it’s hard to survive long enough to educate the market.
Open source is great for distribution, but without a clear monetization path, it can lead to popularity without sustainability.
Trend

Llama 4
This week, Meta dropped their latest AI models: Llama 4. The lineup includes three new models — Scout, Maverick, and Behemoth — and they’re making some bold claims about speed, quality, and scale.
Take a look under the hood of Llama 4 Scout and Llama 4 Maverick – our most advanced AI models yet 🧵
— AI at Meta (@AIatMeta)
8:30 PM • Apr 5, 2025
Why It Matters:
Cheaper and faster: Llama 4 uses a more efficient architecture that gives you better results without the heavy compute bill.
Multimodal by default: Text and image are fused into one model. No extra hacks needed.
Massive context windows: Scout can handle 10 million tokens. That opens the door for long docs, codebases, and transcripts.
Some drama: Allegations of benchmark manipulation and a missing technical paper have raised eyebrows.
The Herd
Meta is moving to mixture-of-experts (MoE) models. That means each token activates only a fraction of the model, just the “experts” it needs, instead of running the whole network. You save compute and still get strong performance.
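As a rough sketch of the idea (my illustration in Python of a common MoE design, not Meta's implementation): a small router scores all the experts for each token, and only the top few actually execute.

```python
import numpy as np

def moe_layer(x, experts, router_weights, top_k=2):
    # Router produces one score per expert for this token.
    scores = x @ router_weights
    # Keep only the top-k experts; the rest are never computed.
    top = np.argsort(scores)[-top_k:]
    gates = np.exp(scores[top])
    gates /= gates.sum()  # softmax over the selected experts only
    # Weighted combination of just the active experts' outputs.
    return sum(g * experts[i](x) for g, i in zip(gates, top))
```

With, say, 128 experts and top_k=2, under 2% of the expert parameters run per token, which is where the compute savings come from.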
They released two models publicly:
Maverick is the flagship: 128 experts, with 17B parameters active per token. Meta claims it beats or matches top models like GPT-4o and Gemini 2.0 on tasks like coding, reasoning, and image understanding. It’s also cheaper to run than previous Llamas, which is good news for anyone running their own infra.
Scout is the smaller one, but has a standout feature: 10 million tokens of context. That’s a big leap from the usual 128K. You can now pass in entire books, product manuals, or customer chat histories without chunking.
Both are multimodal, meaning they can handle text and images natively. No wrappers, no extra models — just one system that gets it. They also trained on stills from videos, so they’re better at understanding sequences and context across frames.
Sitting behind them is Behemoth, the biggest model Meta has ever built. It’s not public yet, but it’s huge — nearly 2 trillion parameters. It was used to “teach” Scout and Maverick through distillation, a process where a giant model trains smaller ones. This is becoming a go-to strategy: train one monster, use it to spin off efficient models for real-world use.
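For intuition, here's a minimal sketch of a standard distillation loss in PyTorch (an illustration of the general technique, not Meta's actual training recipe): the student is trained to match the teacher's output distribution rather than just the hard labels.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both output distributions; the teacher's relative
    # probabilities over "wrong" answers carry useful signal.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Pull the student's distribution toward the teacher's. The T^2
    # factor keeps gradient scale comparable across temperatures.
    return F.kl_div(log_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2
```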
The Drama
Of course, no major AI release is complete without a bit of chaos.
Shortly after launch, a post popped up on Substack from someone claiming to be a former Meta employee. The anonymous author said they resigned in protest — and accused Meta of juicing the numbers to make Llama 4 look better than it actually is.
The core allegation: Meta may have trained the model on test sets, which would inflate benchmark scores. Basically, imagine a student getting the exam answers before the test and still bragging about their score.
The rumor also echoes a broader concern in AI right now — that as models get bigger and training datasets more opaque, accidental “contamination” is becoming harder to detect and easier to deny.
Meta quickly responded. Their VP of GenAI said, “That’s simply not true and we would never do that.” No technical paper has been released yet, so for now, it’s a lot of speculation, finger-pointing, and subtext.
We're glad to start getting Llama 4 in all your hands. We're already hearing lots of great results people are getting with these models.
That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were […]
— Ahmad Al-Dahle (@Ahmad_Al_Dahle)
5:49 PM • Apr 7, 2025
Whether you believe the post or not, it raises an important point: leaderboards are broken. Everyone’s optimizing for benchmarks, but if the test set leaks, the scores don’t mean much. It’s a reminder to focus less on leaderboard bragging rights and more on real-world use.
Help Me Improve Failory
How Was Today's Newsletter? If this issue were a startup, how would you rate it?
That's all for this edition.
Cheers,
Nico
