Open Source After Dependencies (Part V): Adding Guardrails Without Losing the Point
Evolutionary systems target the right audience
Up to now, this series has been deliberately optimistic.
Not naïve—but optimistic in the sense that it assumed a missing piece rather than a broken one. The core idea was simple: if LLMs make regeneration cheaper than reuse, then maybe dependencies stop being the default unit of sharing. Maybe intent, constraints, and tradeoffs take their place.
The pushback I got after the last few articles was useful because it wasn't ideological. It was practical.
“This breaks the moment it meets reality.”
This article is about what changed my mind—and what didn’t.
The First Mistake: Treating Generation as Enough
Early versions of this idea implicitly assumed that if you generate multiple variants and benchmark them, you’ll end up with something good enough.
That’s not true.
Generation alone produces plausible code, not reliable systems. The moment you treat LLM output as trustworthy, you recreate the same dependency problem—just with a different abstraction boundary.
The correction is simple but uncomfortable:
LLM-generated code must be treated as hostile by default.
Once you accept that, everything else falls into place.
Guardrails Are Not Optional
To survive contact with production, an intent-driven system needs guardrails that are explicit, enforced, and boring.
Some of the ones that turned out to matter most:
Variant budgets: Unlimited exploration feels powerful but quickly becomes noise. Putting a hard cap on variants forces clarity about why you're exploring.
Pruning before generation: Not every idea deserves a full implementation. Early filtering based on diversity and constraints saves human attention—the real scarce resource.
Adversarial tests by default: Benchmarks alone reward speed, not correctness. Fuzzing, failure injection, and worst-case scenarios have to be part of the spec, not optional add-ons.
None of this is exciting. That’s a good sign.
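To make the first two guardrails concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than an existing API: the `IntentSpec` shape, the field names, and the crude diversity filter are assumptions. The structural point is that the budget and the adversarial scenarios are validated before anything is generated, and pruning happens on ideas, not on finished implementations.

```python
from dataclasses import dataclass


@dataclass
class IntentSpec:
    """Hypothetical shape of a single intent; field names are illustrative."""
    description: str
    constraints: list[str]
    adversarial_scenarios: list[str]   # fuzzing / failure-injection cases, part of the spec
    variant_budget: int = 5            # hard cap on how many variants get generated

    def validate(self) -> None:
        # Guardrails are enforced before any generation happens.
        if not (1 <= self.variant_budget <= 10):
            raise ValueError("variant_budget must be a small, explicit cap")
        if not self.adversarial_scenarios:
            raise ValueError("adversarial scenarios are required, not optional")


def prune_before_generation(candidate_ideas: list[str], spec: IntentSpec) -> list[str]:
    """Filter candidate approaches *before* any code is generated:
    drop near-duplicates, then enforce the variant budget."""
    spec.validate()
    seen: set[str] = set()
    unique: list[str] = []
    for idea in candidate_ideas:
        key = " ".join(idea.lower().split())
        if key not in seen:            # crude diversity check; a real one would compare semantics
            seen.add(key)
            unique.append(idea)
    return unique[: spec.variant_budget]
```

A spec like this simply refuses to run when the adversarial scenarios are missing, which keeps the guardrails from being quietly skipped.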
Benchmarks Lied to Us Once. Let’s Not Let Them Do It Again.
One uncomfortable realization: benchmarks are easy to game, even unintentionally.
If you optimize for a number long enough, you stop seeing what the number doesn’t measure.
The fix wasn’t to abandon benchmarks, but to demote them:
quantitative metrics guide the search
human judgment gates what ships
real incidents override both
I started thinking of benchmarks less as truth and more as pressure. They push designs in certain directions, but they don’t decide where to stop.
That decision still belongs to humans.
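As a sketch of that ordering, assume a hypothetical per-variant report; the field names are made up, but the hierarchy is the point: incidents override, humans gate, metrics only guide.

```python
from dataclasses import dataclass


@dataclass
class VariantReport:
    """Illustrative summary of one variant's evaluation (hypothetical fields)."""
    benchmark_score: float   # quantitative signal: guides, never decides
    human_approved: bool     # explicit sign-off: gates promotion
    linked_incidents: int    # real-world failures: override everything else


def may_promote(report: VariantReport, score_floor: float = 0.8) -> bool:
    """Decide whether a variant can be promoted. Order matters."""
    if report.linked_incidents > 0:
        return False                                   # real incidents override both
    if not report.human_approved:
        return False                                   # human judgment gates
    return report.benchmark_score >= score_floor       # metrics only guide the final nudge
```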
Where Institutional Knowledge Was Quietly Leaking
The most serious flaw in the original model was what it erased: history.
Libraries don’t just contain code. They contain:
scars from past failures
weird edge cases no test captures
implicit knowledge about what doesn’t work
If regenerated code doesn’t preserve that, it resets the learning curve every time.
The fix was to stop treating failed variants as waste.
Every intent now accumulates:
explored variants
rejected paths
known failure modes
rationale for decisions
This doesn’t recreate a maintainer’s intuition—but it stops us from pretending intuition doesn’t exist.
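A rough shape of that accumulation, with illustrative field names, might look like the record below. The only design decision that matters is that rejection appends to the history instead of deleting from it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IntentHistory:
    """Illustrative record of what one intent has learned over time."""
    intent: str
    explored_variants: list[str] = field(default_factory=list)
    rejected_paths: list[dict] = field(default_factory=list)     # {"path": ..., "reason": ...}
    known_failure_modes: list[str] = field(default_factory=list)
    decisions: list[dict] = field(default_factory=list)          # rationale for each call

    def reject(self, path: str, reason: str) -> None:
        """Rejected variants are recorded, not discarded: the 'why not' is the value."""
        self.rejected_paths.append({
            "path": path,
            "reason": reason,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })
```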
A Better Question Than “Is This Secure?”
Security discussions kept circling the same question:
“Can we trust LLM-generated code?”
That turned out to be the wrong question.
A better one is:
“Can we verify it cheaply enough to be worth generating?”
Static analysis, license scanning, multi-model critique—none of these guarantee safety. But together they make risk visible, which is more than most dependencies offer today.
The goal isn’t zero risk. It’s known risk.
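Here is a sketch of what "known risk" means in practice, assuming hypothetical check functions: each check returns findings instead of a verdict, and the report just aggregates them so a human can see what was and wasn't examined.

```python
from dataclasses import dataclass
from typing import Callable

# A check takes source text and returns a list of findings (possibly empty).
Check = Callable[[str], list[str]]


@dataclass
class RiskReport:
    """Aggregated findings; the goal is visible risk, not a guarantee."""
    findings: dict[str, list[str]]

    def summary(self) -> str:
        total = sum(len(v) for v in self.findings.values())
        return f"{total} finding(s) across {len(self.findings)} check(s)"


def verify(generated_code: str, checks: dict[str, Check]) -> RiskReport:
    """Run each cheap check (static analysis, license scan, model critique, ...)
    and collect whatever it surfaces. Nothing here proves safety; it makes
    the remaining risk explicit and reviewable."""
    return RiskReport(findings={name: check(generated_code) for name, check in checks.items()})


# Illustrative stand-in check; real ones would wrap a linter, a license scanner,
# or a second model asked to critique the first model's output.
def naive_eval_check(code: str) -> list[str]:
    return ["uses eval()"] if "eval(" in code else []
```

Wiring in a real linter, license scanner, or second-model critique is plumbing; the structural point is that nothing in the pipeline ever returns "safe".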
This Changed How I Think About Maintainers
Earlier in the series, I suggested maintainers would become curators of ideas rather than caretakers of code. That’s still true—but it was incomplete.
What maintainers actually provide is context compression.
They know which paths were tried. They remember why something was rejected. They recognize failure patterns before metrics do.
The updated model doesn’t eliminate that role. It externalizes it:
into intent histories
into shared failure scenarios
into design notes that travel with the code
Maintenance doesn’t disappear. It becomes legible.
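One way to externalize that context, sketched with an illustrative file layout: the history record (or any design notes) gets serialized next to the generated module, so a reviewer reads the "why" in the same directory as the "what".

```python
import json
from pathlib import Path


def attach_context(module_path: str, history: dict) -> Path:
    """Write the intent history next to the generated module so the design
    notes travel with the code. The file name and layout are illustrative."""
    notes_path = Path(module_path).with_suffix(".intent.json")
    notes_path.write_text(json.dumps(history, indent=2, sort_keys=True))
    return notes_path
```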
The Cost We Don’t Get to Avoid
None of these corrections make the system easier.
They make it honest.
This model:
asks developers to think more
rewards judgment over convenience
makes tradeoffs explicit instead of hiding them in dependencies
That’s not a universal win.
But it does align with what experienced teams already do privately—just without forcing everyone else to inherit their undocumented context.
What Survived the Corrections
After all the pushback and revisions, a few things didn’t change:
Dependencies are still a leaky abstraction for decision-making.
LLMs are still better at enumeration than judgment.
Open source still loses something when it optimizes for traffic over understanding.
Human creativity still lives in defining what matters, not in writing glue code.
Those feel like solid ground.
Where This Leaves Me
I don’t think this paradigm replaces today’s ecosystem.
I do think it deserves to exist alongside it—especially in places where:
dependency weight dominates code size
upgrades cause more anxiety than features
teams already vendor or fork out of necessity
If nothing else, it offers a forcing function:
“If we had to own this code, would we still choose it?”
That question alone has changed how I look at dependencies.
And for now, that’s enough.