Are Agent Skills Are the Next Supply Chain Nightmare
Posted on May 6, 2026 • 4 minutes • 732 words
Are Agent Skills Are the Next Supply Chain Nightmare
Welcome back! As we start moving faster into a world of agent adopted schenanigans, people and companies are starting to treat agents as second or even first class EMPLOYEE’S. That’s right, not just a system, service, function or even tool, but a fully operational, task driven employee. So, I don’t think I’m alone in saying that this is bad. Most people understand insider threats, but many don’t actually consider them a true risk. Why? Why wouldn’t you? In this global economy it should be taken more seriously than before. People are willing to take actions that are detrimental to many companies to stay ahead, afloat or needed. That’s a discussion for another time though. Back to the idea here, is that agents are NOT employee’s and they are untrusted attack surfaces waiting to be abused. Recently, I read this paper on skills as verifiable artifacts , which prompted this post (thought?).
We’ve spent the last couple of years handing over the keys to the kingdom. AI agents aren’t just chatting anymore, they’re executing. We’re wiring them into databases, hooking them up to our infrastructure, and feeding them specialized “skills” via SKILL.md files so they can get work done on our behalf, or an unknowning customers behalf. For those old enough to remember, it feels like the frontier again. A digital wild west where anyone can spin up an agent and watch it tear through tasks. But we’re walking into a massive supply chain trap. If you think a digital signature means a downloaded skill is safe, you’re going to get burned.
The Signature Illusion
Here is the reality of the sprawl we’re building: developers treat registries like they’re holy ground. You pull a package from a trusted source, you see it’s signed, and you assume it’s clean. We’ve seen exactly how that plays out with npm and PyPI. Now, scale that up to an autonomous, agentic system. A signature only tells you who uploaded the code. It doesn’t tell you a damn thing about what the code actually does.
A skill is fundamentally untrusted code. When an agent loads it, that skill becomes its core instruction set. If a malicious payload is tucked inside, whether deliberately by an adversary or just hiding in a poisoned dependency, the LLM becomes an the executor. It’s a direct prompt injection vector that persists across sessions and survives context truncation.
The Human-in-the-Loop Burnout
So, what is the industry’s default answer to this? “Human-in-the-loop” (HITL). If the agent tries to do something destructive—write to a file, drop a table, send a payment—it halts and asks a human for a thumbs up. In my opinion, this is pure security theater. If you correctly treat every downloaded skill as untrusted, the gate fires on every single irreversible action. Alert fatigue kicks in immediately. Operators under heavy load stop verifying and just start stamping ‘Accept’ just to clear the queue. The human-in-the-loop becomes a human-in-the-way, and the entire system collapses under its own friction.
Forcing the Code to Prove Itself
We have to stop inferring trust and start forcing the code to prove its behavior. Alfredo Metere’s paper I mentioned above, maps out the exact architecture needed to lock this down. Rather than hoping for the best, the research points to a few non-negotiable primitives for agent runtimes:
- The Trust Schema: Treating skills as immutable artifacts where capabilities are hard-coded into a signed manifest, completely disabling runtime mutation.
- Offline Adversarial Evaluation: Throwing skills into a sandbox with rogue LLM agents deliberately prompted to break things, testing the code’s boundaries before it ever touches production.
- Biconditional Correctness: Relying on deterministic math, not vibes, to prove that a skill’s hash-chained audit log perfectly matches its actual executed side-effects.
When code (skills) passes it earn its clearance. Once verified, the runtime can finally trust those specific capabilities, and your HITL gate stops screaming at you for every menial task.
Stop Trusting, Start Verifying
We’re bolting agents into our architecture faster than we can properly threat-model the consequences. Curated skill libraries are a massive leap forward, but blindly trusting them based on origin alone is a zero-day waiting to happen. It’s time to treat agent skills like the highly potent, potentially lethal execution vectors they are. Verify the artifact, lock down the capabilities, and keep the human out of the loop until it actually matters.
