Open Source After Dependencies (Part V): Adding Guardrails Without Losing the Point
Evolutionary systems target the right audience
Up to now, this series has been deliberately optimistic.
Not naïve—but optimistic in the sense that it assumed a missing piece rather than a broken one. The core idea was simple: if LLMs make regeneration cheaper than reuse, then maybe dependencies stop being the default unit of sharing. Maybe intent, constraints, and tradeoffs take their place.
The pushback I got after the last few articles was useful because it wasn't ideological. It was practical.
“This breaks the moment it meets reality.”
This article is about what changed my mind—and what didn’t.
The First Mistake: Treating Generation as Enough
Early versions of this idea implicitly assumed that if you generate multiple variants and benchmark them, you’ll end up with something good enough.
That’s not true.
Generation alone produces plausible code, not reliable systems. The moment you treat LLM output as trustworthy, you recreate the same dependency problem—just with a different abstraction boundary.
The correction is simple but uncomfortable:
LLM-generated code must be treated as hostile by default.
Once you accept that, everything else falls into place.
Guardrails Are Not Optional
To survive contact with production, an intent-driven system needs guardrails that are explicit, enforced, and boring.
Some of the ones that turned out to matter most:
Variant budgets: Unlimited exploration feels powerful but quickly becomes noise. Putting a hard cap on variants forces clarity about why you're exploring.
Pruning before generation: Not every idea deserves a full implementation. Early filtering based on diversity and constraints saves human attention—the real scarce resource.
Adversarial tests by default: Benchmarks alone reward speed, not correctness. Fuzzing, failure injection, and worst-case scenarios have to be part of the spec, not optional add-ons.
None of this is exciting. That’s a good sign.
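To make the first two guardrails concrete, here is a minimal sketch in Python. Everything in it is illustrative rather than an existing API: the `IntentSpec` shape, the field names, and the crude diversity filter are assumptions. The structural point is that the budget and the adversarial scenarios are validated before anything is generated, and pruning happens on ideas, not on finished implementations.

```python
from dataclasses import dataclass


@dataclass
class IntentSpec:
    """Hypothetical shape of a single intent; field names are illustrative."""
    description: str
    constraints: list[str]
    adversarial_scenarios: list[str]   # fuzzing / failure-injection cases, part of the spec
    variant_budget: int = 5            # hard cap on how many variants get generated

    def validate(self) -> None:
        # Guardrails are enforced before any generation happens.
        if not (1 <= self.variant_budget <= 10):
            raise ValueError("variant_budget must be a small, explicit cap")
        if not self.adversarial_scenarios:
            raise ValueError("adversarial scenarios are required, not optional")


def prune_before_generation(candidate_ideas: list[str], spec: IntentSpec) -> list[str]:
    """Filter candidate approaches *before* any code is generated:
    drop near-duplicates, then enforce the variant budget."""
    spec.validate()
    seen: set[str] = set()
    unique: list[str] = []
    for idea in candidate_ideas:
        key = " ".join(idea.lower().split())
        if key not in seen:            # crude diversity check; a real one would compare semantics
            seen.add(key)
            unique.append(idea)
    return unique[: spec.variant_budget]
```

A spec like this simply refuses to run when the adversarial scenarios are missing, which keeps the guardrails from being quietly skipped.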
Benchmarks Lied to Us Once. Let’s Not Let Them Do It Again.
One uncomfortable realization: benchmarks are easy to game, even unintentionally.
If you optimize for a number long enough, you stop seeing what the number doesn’t measure.
The fix wasn’t to abandon benchmarks, but to demote them:
quantitative metrics guide the search
human judgment gates what ships
real incidents override both
I started thinking of benchmarks less as truth and more as pressure. They push designs in certain directions, but they don’t decide where to stop.
That decision still belongs to humans.
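As a sketch of that ordering, assume a hypothetical per-variant report; the field names are made up, but the hierarchy is the point: incidents override, humans gate, metrics only guide.

```python
from dataclasses import dataclass


@dataclass
class VariantReport:
    """Illustrative summary of one variant's evaluation (hypothetical fields)."""
    benchmark_score: float   # quantitative signal: guides, never decides
    human_approved: bool     # explicit sign-off: gates promotion
    linked_incidents: int    # real-world failures: override everything else


def may_promote(report: VariantReport, score_floor: float = 0.8) -> bool:
    """Decide whether a variant can be promoted. Order matters."""
    if report.linked_incidents > 0:
        return False                                   # real incidents override both
    if not report.human_approved:
        return False                                   # human judgment gates
    return report.benchmark_score >= score_floor       # metrics only guide the final nudge
```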
Where Institutional Knowledge Was Quietly Leaking
The most serious flaw in the original model was what it erased: history.
Libraries don’t just contain code. They contain:
scars from past failures
weird edge cases no test captures
implicit knowledge about what doesn’t work
If regenerated code doesn’t preserve that, it resets the learning curve every time.
The fix was to stop treating failed variants as waste.
Every intent now accumulates:
explored variants
rejected paths
known failure modes
rationale for decisions
This doesn’t recreate a maintainer’s intuition—but it stops us from pretending intuition doesn’t exist.
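A rough shape of that accumulation, with illustrative field names, might look like the record below. The only design decision that matters is that rejection appends to the history instead of deleting from it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IntentHistory:
    """Illustrative record of what one intent has learned over time."""
    intent: str
    explored_variants: list[str] = field(default_factory=list)
    rejected_paths: list[dict] = field(default_factory=list)     # {"path": ..., "reason": ...}
    known_failure_modes: list[str] = field(default_factory=list)
    decisions: list[dict] = field(default_factory=list)          # rationale for each call

    def reject(self, path: str, reason: str) -> None:
        """Rejected variants are recorded, not discarded: the 'why not' is the value."""
        self.rejected_paths.append({
            "path": path,
            "reason": reason,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })
```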
A Better Question Than “Is This Secure?”
Security discussions kept circling the same question:
“Can we trust LLM-generated code?”
That turned out to be the wrong question.
A better one is:
“Can we verify it cheaply enough to be worth generating?”
Static analysis, license scanning, multi-model critique—none of these guarantee safety. But together they make risk visible, which is more than most dependencies offer today.
The goal isn’t zero risk. It’s known risk.
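Here is a sketch of what "known risk" means in practice, assuming hypothetical check functions: each check returns findings instead of a verdict, and the report just aggregates them so a human can see what was and wasn't examined.

```python
from dataclasses import dataclass
from typing import Callable

# A check takes source text and returns a list of findings (possibly empty).
Check = Callable[[str], list[str]]


@dataclass
class RiskReport:
    """Aggregated findings; the goal is visible risk, not a guarantee."""
    findings: dict[str, list[str]]

    def summary(self) -> str:
        total = sum(len(v) for v in self.findings.values())
        return f"{total} finding(s) across {len(self.findings)} check(s)"


def verify(generated_code: str, checks: dict[str, Check]) -> RiskReport:
    """Run each cheap check (static analysis, license scan, model critique, ...)
    and collect whatever it surfaces. Nothing here proves safety; it makes
    the remaining risk explicit and reviewable."""
    return RiskReport(findings={name: check(generated_code) for name, check in checks.items()})


# Illustrative stand-in check; real ones would wrap a linter, a license scanner,
# or a second model asked to critique the first model's output.
def naive_eval_check(code: str) -> list[str]:
    return ["uses eval()"] if "eval(" in code else []
```

Wiring in a real linter, license scanner, or second-model critique is plumbing; the structural point is that nothing in the pipeline ever returns "safe".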
This Changed How I Think About Maintainers
Earlier in the series, I suggested maintainers would become curators of ideas rather than caretakers of code. That’s still true—but it was incomplete.
What maintainers actually provide is context compression.
They know which paths were tried. They remember why something was rejected. They recognize failure patterns before metrics do.
The updated model doesn’t eliminate that role. It externalizes it:
into intent histories
into shared failure scenarios
into design notes that travel with the code
Maintenance doesn’t disappear. It becomes legible.
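One way to externalize that context, sketched with an illustrative file layout: the history record (or any design notes) gets serialized next to the generated module, so a reviewer reads the "why" in the same directory as the "what".

```python
import json
from pathlib import Path


def attach_context(module_path: str, history: dict) -> Path:
    """Write the intent history next to the generated module so the design
    notes travel with the code. The file name and layout are illustrative."""
    notes_path = Path(module_path).with_suffix(".intent.json")
    notes_path.write_text(json.dumps(history, indent=2, sort_keys=True))
    return notes_path
```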
The Cost We Don’t Get to Avoid
None of these corrections make the system easier.
They make it honest.
This model:
asks developers to think more
rewards judgment over convenience
makes tradeoffs explicit instead of hiding them in dependencies
That’s not a universal win.
But it does align with what experienced teams already do privately—just without forcing everyone else to inherit their undocumented context.
What Survived the Corrections
After all the pushback and revisions, a few things didn’t change:
Dependencies are still a leaky abstraction for decision-making.
LLMs are still better at enumeration than judgment.
Open source still loses something when it optimizes for traffic over understanding.
Human creativity still lives in defining what matters, not in writing glue code.
Those feel like solid ground.
Where This Leaves Me
I don’t think this paradigm replaces today’s ecosystem.
I do think it deserves to exist alongside it—especially in places where:
dependency weight dominates code size
upgrades cause more anxiety than features
teams already vendor or fork out of necessity
If nothing else, it offers a forcing function:
“If we had to own this code, would we still choose it?”
That question alone has changed how I look at dependencies.
And for now, that’s enough.