
AI Agents in Action: Foundations for Evaluation and Governance

A recent World Economic Forum report shows that 82% of organisations plan to integrate AI agents within the next three years, yet most are still in the pilot or planning phase.

Why the hesitation, if the potential is so great? Part of the answer lies in the shift from “AI that analyses” to “AI that decides and acts”. This paradigm shift isn’t technical; it’s about processes, digital maturity and organisational culture. Who has the authority to do what? Who answers when something goes wrong?

Until now, AI was a sophisticated analysis tool that gave you data, predictions and recommendations, while humans decided what to do with them. Agentic AI changes the equation: sometimes the agent no longer waits for you to decide.

Automation vs autonomy

There’s confusion between automation and autonomy, and the difference is essential if you want to understand what an AI agent can and cannot do.

Automation means a system executes predefined tasks in a predictable manner. You do X, Y happens. Always the same, in the same order. An automated workflow that does exactly what you told it, when you told it.

An autonomous system can decide on its own when and how to act to achieve an objective, adapting to context independently of human decision-making.

For example, an automated invoicing system generates the invoice on the same date, in the same format, every time. An autonomous AI agent might decide to delay sending an invoice because it detected a discrepancy in the order and requested clarification from the client.

This flexibility is the power of AI agents, and it comes with the responsibility of clearly defining boundaries: what data the agent can access, what actions it can take on its own, and where it stops and requests human approval. These boundaries aren’t optional; they must be established from the start.
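
To make that concrete, here’s a minimal sketch of what such boundaries might look like once written down in code. Everything in it is hypothetical and purely illustrative: the data sources, the action names and the £500 threshold are not from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class AgentBoundaries:
    # Data the agent is allowed to read; everything else is off limits.
    readable_data: set = field(default_factory=lambda: {"orders", "invoices"})
    # Actions the agent may take without asking anyone.
    allowed_actions: set = field(default_factory=lambda: {"send_reminder", "generate_invoice"})
    # Above this amount, the agent must stop and request human approval.
    approval_threshold_gbp: float = 500.0

    def requires_human(self, action: str, amount_gbp: float = 0.0) -> bool:
        """An action escalates to a human if it isn't whitelisted
        or if it exceeds the monetary threshold."""
        return action not in self.allowed_actions or amount_gbp > self.approval_threshold_gbp


boundaries = AgentBoundaries()
boundaries.requires_human("issue_refund")             # True: action not whitelisted
boundaries.requires_human("generate_invoice", 800.0)  # True: above the threshold
boundaries.requires_human("send_reminder")            # False: inside the boundaries
```

The point isn’t the syntax; it’s that every boundary is explicit, versioned and reviewable, rather than implied.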

Which decisions can be delegated

Not all decisions are equal, and not all can be delegated.

Think about the decisions your team makes daily. Some repeat dozens of times, follow clear rules and require no human judgement. If the request is under £500, approve automatically. If the client hasn’t responded in 48 hours, send a reminder. If the lead scores above 70, move them to sales.

These are ideal candidates for an AI agent. High volume, explicit rules, and if something goes wrong, the impact is limited and usually reversible.
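
As a sketch, rules like these can be written down explicitly. The function below encodes the three examples above; the field names are hypothetical, but the thresholds are the ones just mentioned.

```python
def triage(request: dict) -> str:
    """Routine cases the agent may handle alone; everything else goes to a human."""
    if request.get("type") == "purchase" and request.get("amount_gbp", 0) < 500:
        return "approve_automatically"
    if request.get("type") == "client_follow_up" and request.get("hours_since_reply", 0) >= 48:
        return "send_reminder"
    if request.get("type") == "lead" and request.get("score", 0) > 70:
        return "route_to_sales"
    # Anything outside the explicit rules stays with people.
    return "escalate_to_human"


triage({"type": "purchase", "amount_gbp": 320})  # 'approve_automatically'
triage({"type": "lead", "score": 55})            # 'escalate_to_human'
```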

Then there are decisions that should remain human: ambiguous or exceptional situations where context matters, context built on data and information an AI agent can’t access, and sometimes on plain human intuition. We’re also talking about decisions with significant, irreversible impact, and everything involving complex human relationships: negotiations, conflict management, team motivation.

An example from the World Economic Forum report describes a customer service agent configured to optimise for “fast resolution” that might close tickets prematurely without actually solving the client’s problem. Technically, the KPI looks excellent, but in reality the customer leaves dissatisfied. Defining success remains a human decision: the agent executes what we ask, but it may not understand what we actually wanted to achieve. Hence the need for clarity in requirements.

Concrete business risks

When we talk about risks, we’re referring to things already happening in organisations adopting AI agents without a clear framework.

The first and most common is goal misalignment: the agent optimises for what you said, not what you actually meant. I’ve already given the customer service example. Another would be a sales agent offering maximum discounts just to close deals quickly. Target achieved, but margins are down.

The second is behavioural drift: performance degrades over time without you noticing. An agent trained on data from six months ago may no longer reflect today’s market reality, and without continuous monitoring, you have no way of knowing.

The third appears when you have multiple agents working together: cascading failures. One agent miscommunicates with another, which miscommunicates with a third. In an interconnected system, a small error propagates and becomes large, a kind of Chinese whispers effect.

And perhaps most importantly: the accountability gap. When something goes wrong, who answers? The agent cannot be held accountable: it’s a computer programme with no legal personality; you can’t sack it and you can’t sue it. Responsibility remains with the people who configured, implemented and supervised it.

How the manager’s role changes

If AI agents take over some operational decisions, what’s left for management?

More than you’d think, but in a different form.

First, defining the framework. What objectives we’re pursuing, what limits exist, what rules apply, where the agent stops and requests approval. This can never be delegated. It’s management’s responsibility to define the framework, and IT’s job to implement it technically.

Then, oversight in the form of continuous monitoring, periodic audits and intervention when something isn’t right. The WEF report makes a useful distinction here: you can have human-in-the-loop, where the agent recommends but you decide, or human-on-the-loop, where the agent decides within established limits while you monitor and intervene when needed.
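
A rough sketch of the difference, with hypothetical names and a hypothetical set of limits:

```python
from enum import Enum


class Oversight(Enum):
    HUMAN_IN_THE_LOOP = "in_the_loop"  # agent recommends, human decides
    HUMAN_ON_THE_LOOP = "on_the_loop"  # agent decides within limits, human monitors


def handle(action: str, within_limits: bool, mode: Oversight) -> str:
    if mode is Oversight.HUMAN_IN_THE_LOOP:
        # The agent never acts alone; every action waits for sign-off.
        return f"RECOMMEND {action}: awaiting human approval"
    if within_limits:
        # The agent acts, but everything stays visible to the supervising human.
        return f"EXECUTE {action}: logged for human review"
    # Outside the agreed limits, even an on-the-loop agent must stop.
    return f"ESCALATE {action}: outside established limits"
```

The same action can be legitimate under one mode and premature under the other; choosing the mode is a management decision, not a technical one.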

And finally, accountability. Regardless of how autonomous the agent is, a human answers for what it does in the company’s name. Clients won’t blame an algorithm, and neither will the law.

The WEF report uses an analogy I find particularly apt: treat adopting an AI agent like hiring a new employee. You do onboarding, clearly define what they’re allowed to do and what they’re not, set and monitor performance, and gradually increase trust and responsibilities as you see results. Why would you treat an AI agent any differently?

How to adopt intelligently

From our experience with AI systems, we recommend progressive adoption. Don’t start with an agent that decides everything on its own; start with small decisions and low stakes, and gradually increase the level of involvement.

At the first level, the agent only assists: it analyses data, recommends actions, but the final decision rests with humans. Risk is minimal, and the team learns how AI thinks and where it fails. This is the phase where you build trust.

At the second level, the agent decides within clear limits. It can approve requests below a certain threshold, respond to certain types of questions, process standardised orders. Humans monitor and intervene at exceptions. Here you need complete logging and audit capability for visibility.
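
As an illustration of what that might look like, here’s a minimal sketch of a level-two decision with an audit trail. The file name, the threshold and the log fields are all hypothetical.

```python
import json
import time

AUDIT_LOG = "agent_audit.jsonl"  # hypothetical append-only log file


def decide_refund(amount_gbp: float, threshold_gbp: float = 200.0) -> str:
    """Approve small refunds autonomously; escalate the rest.
    Every decision, either way, is written to the audit log."""
    decision = "approved" if amount_gbp <= threshold_gbp else "escalated_to_human"
    entry = {
        "timestamp": time.time(),
        "action": "refund",
        "amount_gbp": amount_gbp,
        "decision": decision,
        "decided_by": "agent" if decision == "approved" else "pending_human",
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return decision
```

The log is what makes oversight possible: without it, “the agent decides within limits” is a claim you can’t verify.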

At the third level, the agent manages entire processes. Humans define the framework and verify periodically. But this only comes after concrete evidence that it works, not after promises and colourful slides.

The key here is not to skip levels. Each step provides information about what works and what doesn’t, and each level builds the trust necessary for the next.

And if something isn’t working properly, you can adjust parameters at any time. You can reduce autonomy, restrict access and add control points. Flexibility works both ways.

Delegating business decisions to AI isn’t a question of “if” but “which, when and how”. Start with simple, repetitive decisions with limited impact. Define clear boundaries, monitor constantly and gradually increase autonomy as you build confidence in the system.

The fundamental difference is that humans have something to lose; AI agents don’t. Responsibility remains with people for the framework we define, for oversight and for the results we achieve with AI’s help.

Technology serves a purpose. It isn’t the purpose itself.
