AITE M1.4-Art02 v1.0 Reviewed 2026-04-06 Open Access
M1.4 AI Technology Foundations for Transformation
AITF · Foundations

The Human-AI Collaboration Spectrum



COMPEL Specialization — AITE-WCT: AI Workforce Transformation Expert Article 2 of 35


An insurance company shows a senior workforce strategist three versions of the same underwriting workflow. In version one, the underwriter writes the entire decision memo and does not use the AI system. In version two, the underwriter drafts the decision and the AI system reviews it and suggests edits. In version three, the AI system drafts the decision and the underwriter reviews and signs off. The three versions produce similar loss ratios in the pilot. The strategist is asked which version to scale. The right answer is not “any of them”. Each of the three is a different workflow with different performance risks, different manager coaching requirements, different union exposure, and different regulatory obligations. The strategist who answers by picking the one with the best productivity number will pick the wrong one roughly half the time, because productivity is one variable among many in a workforce decision. The human-AI collaboration spectrum is the device that lets the strategist see all the variables at once. This article defines the spectrum, names the six reference points that give it resolution, and teaches the expert practitioner to place any workflow on the spectrum and to read off the governance, role, and skill implications of the placement.

The spectrum is the frame, not the answer

A common pattern in industry discussion is to pose AI adoption as a binary — “is this job being automated” or “is this task being augmented”. The binary framing is useful for executive communication and regulatory testimony, but it is a poor design tool. Workflows do not divide cleanly into automated and non-automated. They divide into configurations along a continuum. Binary thinking drives two persistent errors in workforce design.

The first is over-committing to full automation. A sponsor sees a successful pilot of AI-assisted work and, reasoning from the binary, concludes that full automation is the natural next step. The organisation over-invests in automation infrastructure before the failure modes of partial automation have been mapped. The Klarna reversal is one documented case of this error, in which an aggressive automation-first customer-service decision was subsequently partially walked back.1

The second is over-committing to status quo. A sponsor sees a successful pilot of AI-assisted work and concludes that the tool is a productivity accelerator, leaving existing roles and workflows untouched. Article 11 will discuss how retention risk accumulates under this choice — high-adjacent-skill employees experience the redesign opportunity as absent and begin to look externally. Both errors share a common root: the binary framing under-resolves the design space.

The spectrum replaces the binary with a continuum and labels the points on it. Six reference points are enough to give an expert practitioner useful resolution; more points add labelling noise without design benefit.

The six reference points

Each reference point describes a workflow configuration, not a technology. The same AI system can be configured to any of the six depending on how the work is structured around it. MIT Sloan's AI-Human Collaboration in Practice series has documented the variety across empirical case studies.2 The canonical Brynjolfsson/Li/Raymond NBER Working Paper 31161 on generative AI productivity effects illustrates how productivity gains depend heavily on the configuration rather than on the model.3

Advisor. The human produces the work; the AI provides reference, context, or commentary that the human chooses whether to use. The advisor configuration is the lowest-stakes, lowest-risk starting point. An accountant drafting a variance commentary who asks an AI system for comparable commentary from prior quarters is in an advisor configuration — the AI’s output is input, not draft.

Checker. The human produces the work; the AI inspects the work and flags issues. The underwriter drafting a decision whose AI system flags possible regulatory compliance exceptions is in a checker configuration. Responsibility for the work sits cleanly with the human. Responsibility for catching the specific failure modes the checker is tuned to detect sits jointly with the configuration designer and the human — an important point Article 29 returns to in its performance-attribution discussion.

Co-producer. The human and the AI produce the work together, in iterative passes. A software engineer writing code with an AI pair-programmer in an integrated development environment is in a co-producer configuration. Outputs are jointly authored. Attribution becomes genuinely difficult — neither the human alone nor the AI alone produced the artefact. The co-producer configuration is the hardest to govern because the attribution problem is hardest.

Approver. The AI produces the work; the human reviews every output before it is released. An AI system that drafts customer communications subject to human approval before sending is in an approver configuration. The approver configuration is typically the governance target for high-stakes decisions — medical diagnostic support, legal advice, lending decisions — because human responsibility is preserved on every output.

Supervisor. The AI produces the work; the human reviews some of the outputs. A customer-service triage system that answers low-risk tickets autonomously and escalates uncertain cases to a human reviewer is in a supervisor configuration. The supervisor configuration introduces a new risk — the supervisor is reviewing filtered outputs, and a miscalibrated filter can starve the supervisor of signal.

Delegator. The AI produces the work and releases it; the human neither reviews nor supervises individual outputs. Fully autonomous customer-service chatbots, automated content moderation, and algorithmic pricing systems are delegator configurations. Governance in the delegator configuration shifts from individual-output review to population-level monitoring, incident response, and kill-switch architecture. The delegator configuration is legitimate for appropriate work but is not a default; it must be justified.

[DIAGRAM: StageGateFlow — collaboration-spectrum-six-points — horizontal axis from human-only through advisor, checker, co-producer, approver, supervisor, delegator, to fully autonomous; each point labelled with configuration name, who produces, who reviews, and a canonical example. Primitive teaches the spectrum as a design-choice continuum.]
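
Because a placement names who produces, who reviews, and at what cadence, it can be held as a small data record rather than a label. The sketch below illustrates this in Python; the class names, fields, and example values are illustrative assumptions of this article, not a COMPEL or vendor specification.

    # A minimal sketch: the six reference points as an ordered data structure.
    # Names, fields, and values are illustrative assumptions, not a COMPEL spec.
    from dataclasses import dataclass
    from enum import IntEnum

    class Configuration(IntEnum):
        # Ordered left to right along the spectrum: higher value, more AI autonomy.
        ADVISOR = 1
        CHECKER = 2
        CO_PRODUCER = 3
        APPROVER = 4
        SUPERVISOR = 5
        DELEGATOR = 6

    @dataclass(frozen=True)
    class Placement:
        workflow: str
        configuration: Configuration
        produces: str  # "human", "joint", or "ai"
        reviews: str   # "n/a", "every output", "sampled outputs", or "population only"

    underwriting = Placement(
        workflow="underwriting decision memo",
        configuration=Configuration.APPROVER,
        produces="ai",
        reviews="every output",
    )

Treating placement as a record rather than a label is what makes the promotion criteria and review cycles discussed later in this article auditable.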

Reading governance implications off placement

The design value of the spectrum is that placement has consequences that can be read off systematically. Four consequences matter most for the workforce practitioner.

The first is accountability. In advisor, checker, and co-producer configurations, accountability for the work product sits with the human. In approver and supervisor configurations, accountability is shared and must be explicitly defined. In delegator configurations, accountability sits with the organisation operating the system, and individual-human accountability becomes weaker — which is why EU AI Act Article 14 reasoning about human oversight and human-in-the-loop design, and Article 8 reasoning about compliance obligations for high-risk systems, become directly relevant.4 A workflow placed in delegator configuration without an explicit accountability architecture is under-governed.

The second is skill profile. Each configuration demands a different skill mix. Advisor and checker configurations emphasise professional judgment unchanged by AI; co-producer emphasises collaboration skills with the tool; supervisor and approver emphasise calibrated trust and error-detection; delegator emphasises population-level analytic skills and incident-response capability. The literacy taxonomy Article 12 introduces maps to these skill profiles.

The third is manager practice. Managers coach differently for each configuration. In advisor and checker configurations the manager coaches professional practice and AI-tool fluency separately. In co-producer the manager coaches joint-work habits. In supervisor and approver the manager coaches calibration — when to trust the AI, when to override. In delegator the manager coaches population analytics and escalation judgment. Article 28 covers manager enablement in depth.

The fourth is regulatory alignment. EU AI Act Article 14 requires effective human oversight for high-risk systems, and the nature of “effective” varies with the configuration — an advisor configuration needs different oversight evidence than a delegator configuration.4 ISO/IEC 42001 Clause 7.2 competence requirements scale with configuration: the competence of an advisor-configuration user is calibrated to the advisor’s information-consumption role, while the competence of a delegator-configuration operator is calibrated to population-level monitoring skills.5

[DIAGRAM: Matrix — configuration-governance-matrix — rows: six configurations. Columns: accountability owner, skill profile primary, manager practice emphasis, regulatory evidence expected. Primitive teaches that placement determines each of the four governance dimensions.]
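
In code, the matrix is a lookup: given a placement, the four consequences can be read off mechanically. A minimal sketch follows; the entries condense the four paragraphs above, and their wording is illustrative rather than canonical.

    # A minimal sketch of the configuration-governance matrix as a lookup table.
    # Entries condense the four consequences above; wording is illustrative.
    GOVERNANCE_MATRIX = {
        "advisor":     dict(accountability="human", skill="professional judgment",
                            manager="practice and tool fluency, coached separately",
                            evidence="usage records, work-quality records"),
        "checker":     dict(accountability="human (joint for tuned failure modes)",
                            skill="professional judgment",
                            manager="practice and tool fluency, coached separately",
                            evidence="flag-handling records"),
        "co-producer": dict(accountability="human, with hard attribution",
                            skill="collaboration with the tool",
                            manager="joint-work habits",
                            evidence="joint-artefact quality records"),
        "approver":    dict(accountability="shared, explicitly defined",
                            skill="calibrated trust, error detection",
                            manager="calibration coaching",
                            evidence="per-output approval records"),
        "supervisor":  dict(accountability="shared, explicitly defined",
                            skill="calibrated trust, error detection",
                            manager="calibration coaching",
                            evidence="sampling and escalation records"),
        "delegator":   dict(accountability="organisation-level",
                            skill="population analytics, incident response",
                            manager="analytics and escalation judgment",
                            evidence="population monitoring, kill-switch tests"),
    }

    def governance_profile(configuration: str) -> dict:
        """Read the four governance consequences off a placement."""
        return GOVERNANCE_MATRIX[configuration]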

Placement is a design decision, not an observed fact

A subtle but important distinction separates expert practice from journeyman practice here. Placement is a choice the organisation makes about the workflow, not an observation about the technology. The same AI system can be placed differently for different populations, different risk levels, and different stages of a rollout. The practitioner should not ask “where does the technology want to sit” but “where should the workflow sit to balance the four consequences above”.

Two heuristics guide placement. The first is stakes-up, autonomy-down. As the consequences of errors rise — regulatory, reputational, safety, financial — placement should move leftward along the spectrum towards configurations that preserve human accountability. A medical diagnosis workflow is more appropriately placed in an approver configuration than in a delegator configuration. The second is maturity-up, autonomy-up. As the organisation accumulates evidence of a specific AI system's behaviour and builds the monitoring infrastructure to detect drift, placement can move rightward. A customer-service triage system that has operated in approver configuration for a year with stable accuracy may justifiably move to supervisor. The inverse — moving leftward when incidents occur — is Article 22's pacing discipline applied at the workflow level.
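
The two heuristics compose into a single rule: stakes set a ceiling on autonomy, and maturity earns rightward movement up to that ceiling. A minimal sketch, assuming simple 0-5 practitioner scores; the scales and the linear cap are illustrative, not COMPEL-prescribed thresholds.

    # A hedged sketch of the two placement heuristics composed into one rule.
    # The 0-5 scales and the linear ceiling are illustrative assumptions.
    SPECTRUM = ["advisor", "checker", "co-producer", "approver", "supervisor", "delegator"]

    def place(stakes: int, maturity: int) -> str:
        """stakes and maturity each scored 0-5 by the practitioner.
        Stakes-up, autonomy-down: stakes cap how far right placement may sit.
        Maturity-up, autonomy-up: evidence moves placement right, up to the cap."""
        ceiling = max(len(SPECTRUM) - 1 - stakes, 0)
        return SPECTRUM[min(maturity, ceiling)]

    place(stakes=5, maturity=4)  # -> "advisor": high stakes dominate
    place(stakes=1, maturity=4)  # -> "supervisor": evidence has earned autonomy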

A documented case to calibrate judgment

The Dutch Toeslagenaffaire child-benefits scandal illustrates what happens when placement is implicit rather than explicit. A risk-scoring algorithm operated in a configuration that was nominally supervisor (case workers reviewed outputs) but functionally delegator (review was cursory and systematic human oversight did not catch the algorithmic bias). The Dutch parliamentary inquiry of 2020–2021 documented the workforce and governance consequences.6 The case teaches three lessons the practitioner should internalise. First, a placement without explicit oversight design defaults towards delegator regardless of the organisation chart. Second, works-council and union concerns raised before and during the incident were early signal that the placement was unsustainable. Third, recovering from a misplaced workflow is far more expensive than getting the placement right initially.

The McDonald’s drive-thru voice-AI partnership with IBM, whose three-year engagement was publicly ended in June 2024, is a milder but instructive placement story — a workflow placed in delegator configuration was walked back to co-producer and then paused.7 The reversal is consistent with the stakes-up autonomy-down heuristic once the customer-experience consequences of full delegation became visible.

The expert move — placement under uncertainty

An expert practitioner is often asked to place a workflow whose behaviour under production conditions is not yet known. Three practitioner rules apply.

Rule one: start leftward, move rightward on evidence. The default is conservative placement, with explicit criteria for promotion.

Rule two: codify the placement in the role specification. The redesigned role (Article 25) names the configuration, the evidence supporting it, and the review cycle that will reconsider it.

Rule three: keep the spectrum visible in the employee-facing narrative. Employees make sense of AI integration more readily when they can see where their specific workflow sits and why. Explaining that the underwriting workflow sits at approver — with the underwriter signing every decision — is more reassuring and more accurate than saying “AI helps you work faster”.

Measurement platforms including Qualtrics, CultureAmp, Peakon, and Glint can run pulse surveys that ask employees to locate their workflow on the spectrum as they perceive it; divergence between leader placement intent and employee perception is an early signal of communication failure and a routine input to the change-methodology workstream.
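
Once pulse responses are coded to the six labels, the divergence signal is straightforward to compute. A minimal sketch, assuming responses have already been exported from whichever platform is in use; the 25 per cent alert threshold is an illustrative assumption.

    # A hedged sketch of the intent-versus-perception divergence signal.
    # The alert threshold of 0.25 is an illustrative assumption.
    from collections import Counter

    def placement_divergence(intended: str, perceived: list[str]) -> float:
        """Fraction of respondents whose perceived placement differs from
        the leader's intended placement for the workflow."""
        return 1.0 - Counter(perceived)[intended] / len(perceived)

    responses = ["approver", "approver", "delegator", "supervisor", "approver"]
    if placement_divergence("approver", responses) > 0.25:
        print("divergence signal: route to the change-methodology workstream")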

Measurement implications of placement

Each configuration demands different measurement infrastructure. The expert practitioner builds measurement into the placement decision, not after it.

Advisor and checker configurations measure professional-work quality as the primary signal and AI-usage adoption as a secondary signal. The risk is that the AI system is ignored and the configuration becomes effectively human-only; adoption data from Qualtrics, CultureAmp, Peakon, or Glint-based pulses provides a leading indicator.

Co-producer configurations are the hardest to measure because attribution between human and AI contribution is genuinely ambiguous. Jointly authored artefact quality can be measured; the fractions of quality attributable to each contributor cannot be cleanly separated. Practitioners either measure joint quality and accept attribution ambiguity, or instrument the workflow to capture the human edits over the AI draft and vice versa — a measurement investment that is only justified for high-stakes work.
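
Where the investment is justified, the instrumentation can be as simple as diffing the AI draft against the released artefact. A minimal sketch using the Python standard library; treating the unchanged-text share as an attribution proxy is an illustrative assumption, not a validated metric.

    # A hedged sketch of co-producer instrumentation: how much of the released
    # artefact was carried over unchanged from the AI draft versus human-edited.
    from difflib import SequenceMatcher

    def human_edit_fraction(ai_draft: str, released: str) -> float:
        """Share of the released text not carried over verbatim from the AI
        draft -- a rough attribution proxy, not a clean quality separation."""
        matched = sum(block.size for block in
                      SequenceMatcher(a=ai_draft, b=released).get_matching_blocks())
        return 1.0 - matched / max(len(released), 1)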

Supervisor and approver configurations measure calibration — the alignment between the human’s judgments on AI outputs and the ground-truth outcomes those outputs produced. Calibration measurement requires either long time-horizons (to observe outcomes) or proxy measurement (to observe the judgments on a validated test set). Both are legitimate; expert practice combines them.
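
Proxy calibration measurement reduces to an agreement rate between reviewer judgments and known-correct answers. A minimal sketch, assuming review logs with the two illustrative field names used below.

    # A hedged sketch of proxy calibration measurement on a validated test set.
    # The record fields "approved" and "output_correct" are illustrative names.
    def calibration_rate(reviews: list[dict]) -> float:
        """Fraction of reviewer judgments that agreed with ground truth:
        approving a correct output, or overriding a faulty one."""
        agreed = sum(1 for r in reviews
                     if r["approved"] == r["output_correct"])
        return agreed / len(reviews)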

Delegator configurations measure population-level outcomes — error rates, disparate impact, incident rates, customer complaint rates — rather than individual-output review. The measurement infrastructure shifts from case-level to cohort-level instrumentation. Observability tooling including Arize, Langfuse, WhyLabs, or MLflow monitors the model; workforce-sentiment platforms monitor the human staff whose work is affected by the delegator-configuration system.
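
At cohort level, the workforce-relevant signals reduce to per-cohort rates plus a disparate-impact check. A minimal sketch follows; using the four-fifths ratio as the flag threshold is an illustrative convention, not a regulatory determination for any jurisdiction.

    # A hedged sketch of cohort-level monitoring for a delegator configuration.
    # The 0.8 threshold mirrors the common four-fifths convention -- an
    # illustrative choice, not a regulatory determination.
    def cohort_error_rates(outcomes: list[dict]) -> dict[str, float]:
        """Per-cohort error rate from records with 'cohort' and 'error' keys."""
        totals: dict[str, int] = {}
        errors: dict[str, int] = {}
        for o in outcomes:
            totals[o["cohort"]] = totals.get(o["cohort"], 0) + 1
            errors[o["cohort"]] = errors.get(o["cohort"], 0) + int(o["error"])
        return {c: errors[c] / totals[c] for c in totals}

    def disparate_impact_flag(favourable_rates: dict[str, float]) -> bool:
        """Flag when any cohort's favourable-outcome rate falls below 80 per
        cent of the best-served cohort's rate."""
        best = max(favourable_rates.values())
        return any(rate < 0.8 * best for rate in favourable_rates.values())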

Summary

The human-AI collaboration spectrum replaces the binary of automate-versus-augment with six reference points — advisor, checker, co-producer, approver, supervisor, delegator — each with distinct accountability, skill, manager, and regulatory consequences. Placement is a design decision made under the stakes-up autonomy-down and maturity-up autonomy-up heuristics. Expert practice codifies placement in the role specification, keeps it visible to employees, and promotes placement only on evidence. Article 3 next takes the strategic decision one level up — the automation-versus-augmentation choice — and teaches the practitioner to make and defend it for whole workflows under the five-criterion framework.


Cross-references to the COMPEL Core Stream:

  • EATF-Level-1/M1.6-Art08-Workforce-Redesign-and-Human-AI-Collaboration.md — human-AI collaboration foundations
  • EATF-Level-1/M1.6-Art01-The-Human-Dimension-of-AI-Transformation.md — human-dimension frame the spectrum inhabits
  • EATP-Level-2/M2.4-Art11-Human-Agent-Collaboration-Patterns-and-Oversight-Design.md — oversight design patterns the configurations invoke


© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.

Footnotes

  1. Bloomberg, “Klarna Rehires Human Staff After Axing Customer Service Agents for AI” (26 November 2024), https://www.bloomberg.com/news/articles/2024-11-26/klarna-rehires-human-staff-after-axing-cx-agents-for-ai (accessed 2026-04-19).

  2. MIT Sloan Management Review, “AI-Human Collaboration in Practice” series, https://sloanreview.mit.edu/topic/artificial-intelligence/ (accessed 2026-04-19).

  3. Brynjolfsson, E., Li, D., and Raymond, L., “Generative AI at Work”, NBER Working Paper 31161 (April 2023, updated 2024), https://www.nber.org/papers/w31161 (accessed 2026-04-19).

  4. Regulation (EU) 2024/1689 (“EU AI Act”), Articles 8 and 14, https://eur-lex.europa.eu/eli/reg/2024/1689/oj (accessed 2026-04-19).

  5. ISO/IEC 42001:2023, Clause 7.2 Competence, https://www.iso.org/standard/81230.html (accessed 2026-04-19).

  6. Tweede Kamer der Staten-Generaal, “Ongekend onrecht — Parlementaire ondervraging kinderopvangtoeslag” (December 2020), https://www.tweedekamer.nl/kamerstukken/detail?id=2020D53175 (accessed 2026-04-19).

  7. CNBC, “McDonald’s ends its AI drive-thru test with IBM” (17 June 2024), https://www.cnbc.com/2024/06/17/mcdonalds-ends-ai-drive-thru-test-with-ibm.html (accessed 2026-04-19).