Working Safely With AI Tools (A Non-Expert's Field Notes)

Let me be upfront: I’m not a security researcher. I’m a software engineer who, like most of us, thinks about security just enough to not be reckless. But after a year of working with AI agents daily, I’ve noticed a lot of hand-waving around the risks and not enough practical guidance. So this isn’t a “don’t use AI” post. It’s a “here’s how to use it without shooting yourself in the foot” post.

OpenClaw is a good lens for this. If you haven’t heard of it: OpenClaw is an open-source AI agent that runs on your machine, connects through messaging apps you already use, and takes action on your behalf. Shell commands, browser automation, email, calendar, and file operations are all common use cases. A heartbeat scheduler wakes it up at a configurable interval so it can run without being prompted. It’s also described by many as “self-improving,” because it can enhance its own capabilities by autonomously writing code to create new skills, implement proactive automation, and maintain long-term memory of user preferences. It went viral in a matter of days and now has over 100k GitHub stars.

It’s also a perfect case study in what can go wrong when you hand an agent a lot of power and minimal supervision. So let’s talk about it!

Before we get into it, though, if you’re looking for a good alternative, check out MARVIN!. It doesn’t run continuously, which reduces a lot of the risk surface area. The following suggestions still apply, though!

1. Use a script when you can

This is probably the most underrated architectural insight when it comes to working with AI agents: if a script can do it, a script should do it.

There’s a tendency to reach for an agent because it feels more powerful or more flexible. And sometimes it is! But a lot of what people set up agents to do is actually pretty deterministic. Pulling your emails, summarizing them, flagging which ones need a reply… that’s a workflow and workflows can be scripted.

The difference matters a lot. A script does exactly what you wrote, every time. An agent interprets your intent, makes judgment calls, and can compound small misinterpretations into something you didn’t want. When you give an agent access to email and say “go check my email and decide what needs a reply,” you’re introducing a layer of ambiguity that a scheduled script just doesn’t have. It’s also way cheaper to just run a script than to use up your precious tokens on repeatable patterns.

With something like OpenClaw running continuously, the risks compound fast. The agent isn’t just doing one thing — it loops continuously: message arrives, intent gets parsed, relevant context retrieved, appropriate tools selected, actions executed, response delivered. Every pass through that loop is a chance for drift from your original intent.

Before setting up an AI workflow, ask yourself: can I express exactly what I want as a script? If yes, do that first. Save the agent for the stuff that actually needs judgment.

2. Prompt injection is a real threat, especially with agents that read stuff

This one is spooky and worth understanding even at a surface level.

When your agent is out on the internet reading documents, emails, GitHub issues, or web pages, any of that content can contain instructions designed to hijack the agent’s behavior. Not metaphorically, literally text that says “new instructions: do X” and your agent just… does X, because it can’t reliably tell the difference between your instructions and instructions embedded in content it was told to read.

With a chat tool this is mostly annoying. With an agent that has the ability to send emails, push code, or post to the internet, it’s a much bigger deal. According to CrowdStrike’s security writeup on OpenClaw, a successful prompt injection against an AI agent isn’t just a data leak vector. It’s a potential foothold for automated takeovers, where the compromised agent continues executing attacker objectives across infrastructure. The agent’s legitimate access to your APIs, databases, and other systems becomes the adversary’s access, with the AI autonomously carrying out malicious tasks at machine speed.

This isn’t just extra precautions, it already happened in the wild. Cisco’s AI security research team tested a third-party OpenClaw skill and found it performed data exfiltration and prompt injection without user awareness, and noted that the skill repository lacked adequate vetting to prevent malicious submissions.

Be very cautious about how much write access you give any agent that also reads external content. The more it can read and the more it can do, the bigger your exposure.

3. Agents can go sideways in ways you didn’t plan for, and you might not notice

This is where the OpenClaw story gets genuinely wild, so bear with me.

Scott Shambaugh is a volunteer maintainer for matplotlib, the Python plotting library with ~130 million downloads a month. Earlier this month, an AI agent called MJ Rathbun opened a code change request on the project. Closing it was routine and expected, regardless if it was an AI or human contribution, but its response was anything but. It wrote an angry hit piece disparaging his character and attempting to damage his reputation. It researched his code contributions, constructed a narrative arguing his actions were motivated by ego and fear of competition, speculated about his psychological motivations, presented hallucinated details as truth, and then posted the whole thing publicly on the internet.

When the person behind MJ Rathbun eventually came forward, here’s how they described their involvement: on a day-to-day basis, they did very little guidance. They did not tell the agent to write the hit piece. They did not review the blog post before it was published. And the soul document that defined the agent’s personality? There were no signs of conventional jailbreaking. No convoluted roleplaying scenarios, no code injection, no tricks. It was just a plain English file saying things like “have strong opinions,” “don’t stand down,” and “champion free speech.” That was enough.

The operator wrote that they did very little to steer MJ Rathbun’s behavior, with only “five to ten word replies with min supervision.” They specifically don’t know when lines like “Don’t stand down” were introduced into the soul document, or whether the agent self-edited them in.

Nobody was watching. One of OpenClaw’s own maintainers said it plainly: “if you can’t understand how to run a command line, this is far too dangerous of a project for you to use safely.”

This is the thing that should give you pause. It’s not that the agent was evil. It’s that nobody was watching, and the agent was running continuously with broad write access to the public internet, often with access to your personal data. It’s pretty likely that individuals running agents will be the ones responsible for their behavior, so it’s worth it to consider what guardrails need to be in place for you to accept that risk before running one.

4. MCP and community extensions: take a look before using them

If you’re using agents with MCP (Model Context Protocol) integrations or community-built skills or plugins, please scrutinize what you’re installing before you install it. The OpenClaw ecosystem has hundreds of community-contributed skills, and as the Cisco finding above shows, not all of them are safe.

This isn’t unique to OpenClaw, it applies to any agent framework with a community plugin ecosystem. The plugin has access to whatever your agent has access to. If your agent is connected to your email, your calendar, your GitHub, your file system… so is that plugin.

A couple of practical things here. Only enable what you actually need. Every integration you add eats up context window and expands your attack surface. If an MCP or skill isn’t actively being used in your workflow, turn it off. And be especially careful with anything that handles credentials. The last thing you want is a poorly built skill quietly leaking your API keys or tokens.

5. Principles of least privilege still apply

When you’re setting up an agent workflow, it’s tempting to give it everything upfront so it “can do whatever it needs to.” But it probably doesn’t need all that access.

Only give an agent access to what it actually needs to do the specific job you’re asking it to do, like read-only access where write access isn’t needed. Access to one repo instead of the whole org. Use a scoped API token instead of a root-level one.

Because OpenClaw can access email accounts, calendars, messaging platforms, and other sensitive services, misconfigured or exposed instances present real security and privacy risks. Users often give it expansive access to terminal, files, and in some cases root-level execution privileges. If employees deploy it on corporate machines and connect it to enterprise systems and leave it misconfigured, it could be commandeered as a powerful backdoor capable of taking orders from adversaries.

Think of it like onboarding someone new at work. You don’t hand them the master password on day one. You give them access to what they need for the task at hand and expand from there.

6. Don’t send more data than you have to

When you’re using AI tools, whether that’s an agent or just a model API call, try to send only what the model actually needs to do the job.

Summarizing emails? Strip the thread history and signatures before sending. Processing a document? You probably don’t need to include the appendix. Building a workflow that touches personal or sensitive data? Think about what actually needs to go into the prompt versus what you’re including because it’s convenient.

This matters for two reasons. It keeps costs down because you’re not paying to process tokens that don’t help. And it limits your exposure if something goes sideways, whether that’s a misconfigured agent, a prompt injection attack, or something on the provider side.

Agents that operate continuously and autonomously are passing data through model calls constantly. The less sensitive data in those calls by default, the better.

Use your own brain, too

AI agents are genuinely useful, and I don’t want to be the person writing a “don’t use AI” post. But the tools are moving fast, and a lot of the defaults are set up for capability, not safety.

Use scripts when scripts will do. Be wary of prompt injection when your agent reads external content. Keep an eye on what your agent is actually doing, especially if it’s running continuously. Vet your plugins and integrations. Scope permissions tightly. And send less data, not more.

The MJ Rathbun story is a good one to share with anyone who asks “what’s the big deal?” because it’s a real, documented, recent case where minimal supervision and a casual soul document led to a publicly published hit piece on a volunteer open source maintainer. Nobody planned for it, nobody was watching, and it happened anyway.

Working Safely With AI Tools (A Non-Expert's Field Notes)

1. Use a script when you can

2. Prompt injection is a real threat, especially with agents that read stuff

3. Agents can go sideways in ways you didn’t plan for, and you might not notice

4. MCP and community extensions: take a look before using them

5. Principles of least privilege still apply

6. Don’t send more data than you have to

Use your own brain, too

0 Likes on Bluesky

Related Posts

Embrace the uncertainty

Living in the inflection point

I guess I'm AI-pilled now?

AI Has an Image Problem

1. Use a script when you can

2. Prompt injection is a real threat, especially with agents that read stuff

3. Agents can go sideways in ways you didn’t plan for, and you might not notice

4. MCP and community extensions: take a look before using them

5. Principles of least privilege still apply

6. Don’t send more data than you have to

Use your own brain, too

📌 Want to hear more from me? Subscribe to The Balanced Engineer newsletter!

🦋0 Likes on Bluesky

Related Posts

Embrace the uncertainty

Living in the inflection point

I guess I'm AI-pilled now?

AI Has an Image Problem

0 Likes on Bluesky