The Question

Every enterprise AI agent deployment makes an oversight decision: how much of the agent's workflow executes without human involvement before a human must review or approve? This decision is rarely made explicitly. Implementation teams configure agents for the level of autonomy that makes the workflow functional and the prototype impressive, and that configuration becomes the production configuration without a deliberate review of the risk, liability, and regulatory implications.

The result is a patchwork of oversight levels that do not reflect the actual risk profile of each action type. Agents that handle high-stakes financial commitments operate with the same autonomy as agents that route support tickets. Agents that take irreversible actions have the same oversight level as agents that generate draft communications. The oversight decision was made by accident, not by design.

This brief provides the framework for making the oversight decision deliberately — per workflow, per action type, per risk tier — with the risk, liability, and regulatory implications of each oversight level explicitly understood before the deployment goes live.

The human-in-the-loop decision is not a design preference — it is a risk, liability, and regulatory decision that must be made deliberately for each agent action type, with the consequences of each oversight level explicitly understood.


Why This Matters Now

The Air Canada precedent established the liability dimension of AI agent autonomy in concrete terms. In February 2024, the British Columbia Civil Resolution Tribunal ruled that Air Canada was liable for commitments made by its customer-facing chatbot — a chatbot that had incorrectly informed a passenger that bereavement fares could be applied retroactively. Air Canada's defense — that the chatbot was a "separate legal entity" responsible for its own statements — was rejected. The ruling established that enterprises are responsible for commitments made by AI systems operating on their behalf, regardless of whether a human made or reviewed those commitments.

The implications for agentic AI are more significant than for chatbots. Chatbots provide information. Agents take actions — create contracts, issue refunds, send communications, modify records, make purchases. If an autonomous agent takes an action that creates a legal commitment or causes financial harm, the Air Canada precedent suggests the enterprise is liable for that action, without the shield of "the AI did it."

Simultaneously, the EU AI Act's implementing guidance, published through 2025, clarified that agentic AI systems involved in consequential decisions — employment, credit, healthcare, law enforcement, and significant customer-facing commitments — must maintain human oversight mechanisms at consequential decision points. The Act does not require human approval for every agent action, but it requires that human oversight is available and exercised at points where the agent's decisions are consequential and potentially irreversible.

These two developments, liability precedent and regulatory guidance, transformed the human-in-the-loop decision from an implementation choice into one that requires legal and compliance input.


What the CURVE™ Data Shows

The 2026 Stackcurve AI Enterprise Agent Platform CURVE™ Report evaluated human-in-the-loop configuration capability as a dedicated assessment dimension, examining how granularly each platform allows oversight levels to be configured per action type and per workflow step.

The evaluation revealed a significant capability gap between enterprise platforms and point solutions. Enterprise platforms — Salesforce Agentforce, Microsoft Copilot Studio, ServiceNow AI Agents — all support workflow-step-level approval configuration, allowing administrators to designate specific actions as requiring human approval while others execute autonomously. The configuration interfaces differ in sophistication, but the capability exists natively in all three.

Point solutions and framework-based deployments — LangChain, CrewAI, custom AutoGen deployments — require custom engineering to implement human approval gates. The flexibility is higher — any approval logic can be implemented in code — but the governance burden is also higher, and the audit trail for approval events is the developer's responsibility rather than the platform's.

The CURVE™ Report specifically evaluated: approval threshold configuration (can you set a $500 approval threshold without writing code?), approval workflow integration (does the approval request route to the right human via the right channel?), escalation logic (what happens if the approver does not respond within the defined window?), and audit trail completeness for approved and rejected actions.

The full vendor rankings are in the 2026 Stackcurve AI Enterprise Agent Platform CURVE™ Report — free to download.


The Gap Most Buyers Miss

The Oversight Spectrum Is Not Binary

The framing of "human-in-the-loop vs. autonomous" implies two options. The reality is a spectrum with at least four distinct tiers, each appropriate for different action types and risk profiles:

Fully supervised: Every agent action requires human approval before execution. The agent proposes; the human approves or rejects. Highest safety, lowest throughput. This tier is appropriate for genuinely high-stakes and irreversible actions: executing legal contracts, processing large financial transactions, making patient treatment recommendations, and taking actions with significant regulatory exposure. The overhead of human approval is justified by the consequence of an erroneous autonomous action.

Human-in-the-loop at decision points: The agent operates autonomously through information gathering, analysis, and low-stakes steps. It pauses at predefined decision points and presents its reasoning and proposed action to a human approver. The human can approve, reject, or redirect. This is the most common pattern for enterprise workflows and the right default when building new agent deployments: start supervised at decision points and relax oversight incrementally as the agent's accuracy and reliability are demonstrated in production.

Exception-based oversight: The agent operates fully autonomously. Humans are notified only when the agent encounters an anomaly, triggers a predefined threshold, or takes an action that falls outside its expected operating parameters. Appropriate for high-volume, well-defined workflows where the action types are low-stakes or easily reversible — IT ticket routing, document classification, initial customer query triage, appointment scheduling. The exception-trigger configuration is the critical governance element: what anomalies trigger notification, and what is the notification path?

Fully autonomous: The agent operates without human involvement in any part of the workflow. Appropriate only for well-tested, well-scoped, genuinely reversible actions in fully understood domains with complete audit coverage. Most enterprises have very few workflows that should be fully autonomous — the combination of well-tested, well-scoped, and genuinely reversible is rare in enterprise contexts. The bar for fully autonomous operation should be high.
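
The four tiers can be read as a gate that decides, for each workflow step, whether execution must wait for a person. The following is a minimal Python sketch of that reading. The names `OversightTier`, `must_wait_for_approval`, and `must_notify_human` are hypothetical and do not correspond to any vendor's API.

```python
from enum import IntEnum


class OversightTier(IntEnum):
    """The four tiers described above; a higher value means more oversight."""
    FULLY_AUTONOMOUS = 1
    EXCEPTION_BASED = 2
    DECISION_POINTS = 3
    FULLY_SUPERVISED = 4


def must_wait_for_approval(tier: OversightTier, is_decision_point: bool) -> bool:
    """Return True if the step must pause until a human approves it."""
    if tier is OversightTier.FULLY_SUPERVISED:
        return True  # every action is proposed, never executed directly
    if tier is OversightTier.DECISION_POINTS:
        return is_decision_point  # pause only at predefined decision points
    return False  # exception-based and autonomous tiers never block up front


def must_notify_human(tier: OversightTier, is_anomaly: bool) -> bool:
    """Return True if a human should be told about the step after the fact."""
    return tier is OversightTier.EXCEPTION_BASED and is_anomaly
```

Separating "wait for approval" from "notify afterwards" keeps the difference between the two middle tiers explicit: one blocks the workflow, the other only reports on it.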

The Reversibility Test

The most useful single question for placing a workflow on the oversight spectrum is: if the agent takes the wrong action, how difficult is it to reverse? Reversibility exists on a spectrum:

  • Immediately reversible: a draft email that can be edited before sending; a document classification that can be manually corrected; a ticket assignment that can be reassigned.
  • Reversible with effort: a customer record modification that can be corrected; a service ticket created in error that can be closed; a communication sent that can be followed up with a correction.
  • Difficult to reverse: a financial transaction that requires a separate reversal process; a contract sent for signature; a public communication that has been received by its audience.
  • Irreversible: a payment settled with no recall or chargeback path; a deletion with no backup; a regulatory filing submitted.

As reversibility decreases, the required oversight tier should increase. Irreversible actions should default to full supervision. Immediately reversible actions can operate at lower oversight tiers with proportionally lower risk.
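
The reversibility test can be written down as a lookup from reversibility level to a minimum oversight tier. In the sketch below, only the mapping of irreversible actions to full supervision comes from the text above. The other three rows, and the names `Reversibility` and `minimum_tier`, are illustrative assumptions that each enterprise would replace with its own policy.

```python
from enum import Enum


class Reversibility(Enum):
    IMMEDIATE = "immediately reversible"
    WITH_EFFORT = "reversible with effort"
    DIFFICULT = "difficult to reverse"
    IRREVERSIBLE = "irreversible"


# Only the IRREVERSIBLE -> "fully supervised" row is stated in the brief.
# The remaining rows are placeholder assumptions.
MINIMUM_TIER = {
    Reversibility.IMMEDIATE: "exception-based oversight",
    Reversibility.WITH_EFFORT: "human-in-the-loop at decision points",
    Reversibility.DIFFICULT: "human-in-the-loop at decision points",
    Reversibility.IRREVERSIBLE: "fully supervised",
}


def minimum_tier(reversibility: Reversibility) -> str:
    """Look up the least oversight an action of this reversibility should get."""
    return MINIMUM_TIER[reversibility]
```

Treating the result as a minimum rather than a final answer leaves room for regulatory or financial factors to push a workflow to a stricter tier.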

The Regulatory Dimension

The EU AI Act is the most developed regulatory framework for agentic AI, but it is not the only one. DORA (the Digital Operational Resilience Act) requires financial services firms in the EU to document and test the operational resilience of the ICT systems, AI systems included, that support critical or important functions. HIPAA guidance from the HHS Office for Civil Rights in 2025 clarified that AI systems making or informing clinical decisions in US healthcare contexts require human oversight at consequential decision points. The SEC's guidance on AI use in investment advisory services requires that AI-generated recommendations be subject to human review before being communicated to clients.

The practical implication is that the appropriate oversight tier for a given workflow may be determined not by internal risk appetite, but by the regulatory framework applicable to the enterprise's industry and the action type being automated. Legal and compliance teams should review the oversight configuration for every agent deployment that touches regulated processes — not as a final sign-off, but as a substantive input to the design.


Questions Your Buying Team Should Be Asking

1. How does the platform configure approval gates at the workflow step level — without custom code?

Approval gates must be configurable by administrators, not just developers. Ask the vendor to demonstrate configuring a human approval step within a workflow using the platform's native interface. The demonstration should show: how the approval request is triggered, how it is routed to the appropriate approver, what information the approver receives about the pending action and the agent's reasoning, what the approver can do (approve, reject, redirect, or modify), and how the workflow resumes or terminates based on the approver's decision.
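
To make the demonstration checklist concrete, here is a minimal Python sketch of what an approval gate has to carry and decide. It is not any platform's interface: `ApprovalRequest`, `Decision`, and `resolve` are hypothetical names, and the returned strings stand in for whatever the workflow engine would actually do.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REDIRECT = "redirect"
    MODIFY = "modify"


@dataclass
class ApprovalRequest:
    action: str     # what the agent proposes to do
    reasoning: str  # the agent's stated reasoning, shown to the approver
    approver: str   # who the request is routed to


def resolve(request: ApprovalRequest, decision: Decision,
            new_value: Optional[str] = None) -> str:
    """Return what the workflow does next, given the approver's decision."""
    if decision is Decision.APPROVE:
        return f"execute: {request.action}"
    if decision is Decision.REJECT:
        return "terminate"
    if new_value is None:
        raise ValueError(f"{decision.value} needs a new approver or action")
    if decision is Decision.REDIRECT:
        return f"reroute to: {new_value}"
    return f"execute: {new_value}"  # MODIFY: run the approver's edited action
```

A vendor demonstration that cannot show all four decision paths, and what the approver sees before choosing one, is missing part of this structure.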

2. What happens when an approver does not respond within the required window?

Approval workflows without escalation logic create silent failures. An agent waiting for an approval that never comes will block a workflow indefinitely unless escalation is configured. Ask the vendor: what is the timeout period for pending approvals? What happens at timeout — does the workflow fail, escalate to a secondary approver, or default to a safe fallback action? Can timeout behavior be configured per workflow step? The answer reveals whether the approval system is designed for production operations or for demonstration.
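
The timeout questions above reduce to a small decision rule. The Python sketch below shows one possible shape for it, assuming a single secondary approver and a "fail visibly" fallback. The function name `on_pending_approval` and the returned strings are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def on_pending_approval(requested_at: datetime, now: datetime,
                        timeout: timedelta,
                        secondary_approver: Optional[str]) -> str:
    """Decide what happens to an approval request that has no response yet."""
    if now - requested_at < timeout:
        return "wait"
    if secondary_approver is not None:
        return f"escalate to {secondary_approver}"
    # No one left to ask: stop visibly rather than block or proceed silently.
    return "fail workflow and alert owner"
```

The important property is that every branch ends in an explicit outcome, so a pending approval can never block the workflow indefinitely without someone being told.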

3. How are the oversight tier decisions documented — and who is accountable for configuring them correctly?

The oversight configuration is a governance decision, not just a technical configuration. Ask the vendor how the platform supports documenting why each workflow step was assigned its oversight tier. Is there an audit record of the oversight configuration — who set it, when, and with what justification? In regulated industries, the ability to demonstrate that oversight tiers were deliberately configured, reviewed, and approved by appropriate stakeholders may be as important as the oversight tier itself.
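
One way to picture the audit record asked about above is as an append-only log of configuration decisions. The Python sketch below is an assumption about what such a record might hold (who, when, and why), not a description of any platform's schema. `OversightConfigRecord`, `record_tier`, and `audit_log` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a record cannot be edited after the fact
class OversightConfigRecord:
    workflow_step: str
    tier: str
    set_by: str
    approved_by: str
    justification: str
    set_at: datetime

    def __post_init__(self) -> None:
        if not self.justification.strip():
            raise ValueError("an oversight tier needs a written justification")


audit_log: list[OversightConfigRecord] = []


def record_tier(step: str, tier: str, set_by: str, approved_by: str,
                justification: str) -> OversightConfigRecord:
    """Append a new record; earlier records are kept, never overwritten."""
    entry = OversightConfigRecord(step, tier, set_by, approved_by,
                                  justification, datetime.now(timezone.utc))
    audit_log.append(entry)
    return entry
```

Rejecting an empty justification at write time is the design choice that matters here: it forces the rationale to exist when the tier is set rather than being reconstructed later.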

4. How does the platform handle agent actions that create legal commitments — specifically, does it have native integration with approval workflows in legal and finance systems?

Agent actions that create legal commitments — sending contracts for signature, issuing refunds, making purchasing commitments — should trigger approval workflows that route to the appropriate authority within the enterprise. A $10,000 purchase commitment should route to a manager with delegated purchasing authority. A contract for signature should route to legal review. Ask the vendor whether these approval routing integrations are native to the platform or require custom development. Native routing integrations indicate a platform designed for enterprise accountability. Custom routing requirements indicate a platform that has not been designed for the regulatory and liability dimensions of enterprise agent deployment.
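
The routing described above amounts to a delegation-of-authority lookup. The Python sketch below illustrates the idea under stated assumptions: the dollar limits, the role names other than those in the text, and the function `route_commitment` are all hypothetical.

```python
from decimal import Decimal

# Hypothetical delegation-of-authority table: (upper limit in USD, approver role).
PURCHASE_AUTHORITY = [
    (Decimal("500"), "team lead"),
    (Decimal("25000"), "manager with delegated purchasing authority"),
]
FALLBACK_APPROVER = "finance director"


def route_commitment(kind: str, amount: Decimal = Decimal("0")) -> str:
    """Return the role that must approve an action creating a legal commitment."""
    if kind == "contract_for_signature":
        return "legal review"
    if kind == "purchase":
        for limit, role in PURCHASE_AUTHORITY:
            if amount <= limit:
                return role
        return FALLBACK_APPROVER
    # Unknown commitment types are refused rather than silently auto-approved.
    raise ValueError(f"no approval route configured for {kind!r}")
```

With this table, `route_commitment("purchase", Decimal("10000"))` returns the manager role from the example above. Whether a platform expresses such a table natively or leaves it to custom code is exactly what the question is meant to surface.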

5. What is your recommended oversight configuration for a new deployment — and how does it change as the deployment matures?

This question tests whether the vendor has thought carefully about the production deployment lifecycle. The appropriate oversight level at initial deployment, when agent behavior is not yet well characterized, is more conservative than the appropriate level for a mature deployment with a documented accuracy record. Vendors with production deployment experience will have a recommended ramp-down approach: start with human-in-the-loop review at all decision points, relax to exception-based oversight for action types where accuracy meets defined thresholds, and move to fully autonomous operation only for action types with demonstrated reliability over a defined period.
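
A ramp-down policy of this kind can be sketched as a rule that relaxes oversight one step at a time. In the Python sketch below, the threshold values (99% accuracy, 90 days) are placeholders rather than recommendations, and applying both bars to every step is a simplifying assumption. `next_tier` is a hypothetical name.

```python
TIERS = [  # ordered from most to least oversight
    "human-in-the-loop at decision points",
    "exception-based oversight",
    "fully autonomous",
]


def next_tier(current: str, accuracy: float, days_observed: int,
              min_accuracy: float = 0.99, min_days: int = 90) -> str:
    """Relax oversight by at most one step, and only when both bars are met."""
    if accuracy < min_accuracy or days_observed < min_days:
        return current
    position = TIERS.index(current)
    return TIERS[min(position + 1, len(TIERS) - 1)]
```

Limiting each review to a single step means a workflow cannot jump from decision-point review straight to full autonomy on the strength of one good measurement period.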


The Stackcurve Take

The Air Canada precedent and the EU AI Act implementing guidance together establish that the human-in-the-loop decision is no longer a purely technical choice. It is a legal, regulatory, and operational risk decision that must involve legal and compliance stakeholders alongside technical and business teams.

The practical recommendation: every enterprise agent deployment should begin with a risk classification of each action type the agent will execute. Classify each action by reversibility, financial materiality, regulatory exposure, and reputational consequence. Map each classification to an oversight tier. Document the classification rationale. Review the configuration with legal and compliance before production deployment.
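
The classification step can be captured in a simple record per action type. The Python sketch below assumes a 1-to-4 score on each of the four dimensions and lets the worst score set the tier. The scale, the `ActionRisk` class, and the score-to-tier table are illustrative assumptions, not part of the CURVE™ methodology.

```python
from dataclasses import dataclass

TIER_BY_SCORE = {
    1: "fully autonomous",
    2: "exception-based oversight",
    3: "human-in-the-loop at decision points",
    4: "fully supervised",
}


@dataclass(frozen=True)
class ActionRisk:
    action_type: str
    reversibility: int             # 1 (trivially undone) to 4 (irreversible)
    financial_materiality: int     # 1 (negligible) to 4 (material)
    regulatory_exposure: int       # 1 (none) to 4 (regulated decision)
    reputational_consequence: int  # 1 (none) to 4 (public and lasting)
    rationale: str                 # documented reason for the scores

    def tier(self) -> str:
        """The worst dimension sets the tier; low scores elsewhere cannot offset it."""
        worst = max(self.reversibility, self.financial_materiality,
                    self.regulatory_exposure, self.reputational_consequence)
        return TIER_BY_SCORE[worst]
```

Taking the maximum rather than an average is deliberate in this sketch: an irreversible action stays fully supervised even if its financial and reputational scores are low.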

This process adds time, but less than it takes to manage the liability exposure of an autonomous agent that made a material commitment without human oversight, or the regulatory exposure of an agentic system that ran at full autonomy in a regulated process requiring human oversight at consequential decision points.

The 2026 Stackcurve AI Enterprise Agent Platform CURVE™ Report covers human-in-the-loop configuration capability, approval workflow integration, escalation logic, and audit trail completeness across all evaluated platforms, with specific guidance for regulated industries. The report is free to download.


Stackcurve Advisory Briefs are independent research. No vendor pays for placement, tier assignment, or editorial influence. The CURVE™ methodology is disclosed in full at stackcurve.net/research/methodology.