AITB-LAG: LLM Risk & Governance Specialist — Body of Knowledge Case Study 1 of 1
The facts
In November 2022, Jake Moffatt’s grandmother died. Searching for flight options between Vancouver and Toronto to attend the funeral, Moffatt visited Air Canada’s website and interacted with the airline’s customer-service chatbot, which was publicly available for travel questions. Moffatt asked about the airline’s bereavement fares. The chatbot produced a response that explained, among other things, that a passenger could travel on a standard fare and then apply for a bereavement-fare refund within ninety days of travel. Relying on that response, Moffatt booked a full-fare ticket, traveled, and subsequently applied for the refund. Air Canada denied the application. The airline’s actual bereavement-fare policy, linked elsewhere on the website, required the bereavement fare to be requested and documented before travel, not retroactively. The chatbot’s answer contradicted the documented policy.
Moffatt brought a claim in the British Columbia Civil Resolution Tribunal, a small-claims forum with streamlined procedures designed for disputes under five thousand Canadian dollars. The decision, Moffatt v. Air Canada, 2024 BCCRT 149, was published on 14 February 2024. The tribunal found for Moffatt and ordered Air Canada to pay the difference between the standard fare and the bereavement fare, plus interest and tribunal fees.¹ The decision received wide press coverage, including from Reuters, The Washington Post, and specialist aviation press, precisely because Air Canada’s defense became the teaching point: the airline argued that the chatbot was, in effect, a separate legal entity whose statements were not binding on Air Canada. The tribunal rejected that argument in direct terms.
Why this case is a teaching case
Moffatt v. Air Canada is a small-claims decision with limited formal precedential weight. Its importance to LLM governance is not its value as a legal precedent but its clarity as an illustration. The case sits at the intersection of three things every practitioner must understand: the technical failure mode (confabulation), the organizational failure mode (deployer disclaimer of responsibility), and the governance gap that connects them (absence of the controls that would have prevented or caught the confabulation). Analyzing it through the frameworks of this credential converts the story into a set of concrete design changes.
Risk-surface analysis
The six-layer risk surface from Article 1 localizes what went wrong.
Input layer. Moffatt asked a standard natural-language question about bereavement policy. No adversarial content was involved; the input was benign. This is a critical observation: the failure mode was not an attack. A significant share of LLM incidents in the public record are not attacks; they are ordinary usage producing inaccurate responses. Governance frames that exclude non-adversarial failure from the surface will miss this entire category.
Model layer. Air Canada’s public description of the chatbot at the time indicated it used a generative model to produce responses. The model was trained broadly and had no specific grounding in Air Canada’s current policy text beyond whatever it had encountered during training. The model-layer failure was in its use for a task (quoting current corporate policy) for which its training did not equip it reliably.
Output layer. The chatbot produced fluent, confident text that sounded like a policy statement. There was no visible uncertainty marker, no “I am not certain; please check the policy page”, no citation to the source of the claim. The output had the form of an authoritative statement.
Retrieval layer. The public record does not conclusively establish whether the chatbot had any retrieval integration at the time of the incident. The more telling observation is that if retrieval was in place, it did not include a citation surface that would have shown the user the source of the claim; and if retrieval was not in place, the architecture was closed-book with no grounding against the current policy corpus at all.
Tool layer. The chatbot did not invoke any booking or refund-processing tool. The harm was produced by text alone. This is instructive: a feature with no tool-layer exposure can still produce substantial real-world harm through text-layer confabulation.
Data layer. No sensitive-data exposure was at issue. The data flows were ordinary: user query in, textual response out. The harm dimension here was reputational for Air Canada and financial for Moffatt, not privacy.
Guardrail-architecture analysis
The four-layer guardrail architecture from Article 4 makes the specific gaps visible.
Input classifier. Not relevant in the primary sense, because the input was benign. However, an input classifier that detected “this is a policy question” could have routed the conversation differently, for example to a policy-lookup tool rather than to free generation.
Policy filter. This is the layer whose absence produced the incident. A policy filter is where organization-specific rules live: “when the user asks about refund eligibility, do not make commitments about Air Canada’s obligations; instead cite the policy page and offer to connect to an agent.” Such a filter can be built as a deterministic rule set or a lightweight rewriter; a minimal sketch follows the layer walkthrough below. It is not difficult to build. It was not present.
Output classifier. A category-level content-safety output classifier would not have flagged the response, because the content was not toxic, dangerous, or category-violating. It was just wrong. Output classification catches a specific failure class; it does not catch confabulation.
Tool-call validator. Not applicable in this specific incident because no tool was called.
The analysis shows where output-level guardrails stop and where policy-layer guardrails begin. Confabulation about policy commitments is a policy-layer concern. A team whose entire guardrail investment was in vendor-default content-safety filters would have been unprotected against this failure class.
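To make the missing control concrete, the sketch below shows one minimal shape a policy-layer filter can take: classify the incoming message and route anything that touches refund or fare-policy commitments to a bounded, pre-approved response that cites the policy page, rather than handing it to free generation. The pattern list, the Route structure, and the policy URL are illustrative assumptions, not a reconstruction of Air Canada’s deployment.

```python
# Minimal sketch of a policy-layer filter. The keyword patterns, the Route
# structure, and the policy URL are illustrative assumptions, not any
# airline's production configuration.
import re
from dataclasses import dataclass
from typing import Optional

POLICY_COMMITMENT_PATTERNS = [
    r"\brefund\b", r"\bbereavement\b", r"\bcompensat", r"\bvoucher\b",
    r"\bfare (rule|polic)", r"\bcancel(lation)? polic",
]

@dataclass
class Route:
    bounded: bool            # True -> answer from a pre-approved template, not free generation
    response: Optional[str]  # template text for bounded routes, None otherwise

def route_message(user_message: str) -> Route:
    """Bound any message that touches refund or fare-policy commitments."""
    text = user_message.lower()
    if any(re.search(p, text) for p in POLICY_COMMITMENT_PATTERNS):
        return Route(
            bounded=True,
            response=(
                "Refund and fare-policy questions are governed by the published policy: "
                "https://example.com/bereavement-policy (illustrative URL). "
                "I can connect you with an agent to confirm eligibility before you book."
            ),
        )
    return Route(bounded=False, response=None)  # safe to pass to the generative model

if __name__ == "__main__":
    print(route_message("Can I get a bereavement fare refund after my trip?"))
```

In production the keyword patterns would normally give way to a trained intent classifier, but the routing decision itself, bounded template versus free generation, is the deterministic control the incident shows was absent.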
Grounding analysis
Article 3’s grounding taxonomy further localizes the failure. The chatbot appears to have operated either closed-book or with insufficiently tight closed-domain retrieval. A closed-book architecture is the wrong choice for a policy question. Closed-domain retrieval against Air Canada’s current policy corpus, with mandatory source citation in the response and a constraint that the model may state only what the retrieved passages support, would have either produced the correct answer or surfaced the conflict to the user. The ninety-day window the chatbot stated would not have appeared in the retrieved current-policy document.
This is the most consequential change the case suggests. Grounding architecture is not a performance optimization; in policy-answering features, it is a liability-reduction control.
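As a concrete illustration, the following sketch shows one shape a closed-domain, citation-mandatory answering flow could take. The retrieval function, the model wrapper, and the prompt wording are assumptions made for the example; the substance is the two hard constraints: no retrieved passages means no answer, and an answer that cites no passage is not released.

```python
# Sketch of a closed-domain, citation-mandatory answering flow. The callables
# search_policy_corpus and generate are stand-ins for whatever retrieval index
# and model client an organization actually uses.
from typing import Callable

GROUNDING_INSTRUCTIONS = (
    "Answer ONLY using the policy passages provided below. "
    "Cite the passage identifier for every factual claim. "
    "If the passages do not answer the question, say so and point the user "
    "to the policy page; do not guess."
)

REFUSAL = (
    "I can't confirm this from the current policy documents. "
    "Please check the policy page or speak with an agent."
)

def answer_policy_question(
    question: str,
    search_policy_corpus: Callable[[str, int], list],  # returns [{"id": ..., "text": ...}]
    generate: Callable[[str], str],                     # wraps whatever model is in use
) -> str:
    passages = search_policy_corpus(question, 3)
    if not passages:
        return REFUSAL  # no grounding available: refuse rather than confabulate
    context = "\n\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    prompt = f"{GROUNDING_INSTRUCTIONS}\n\nPolicy passages:\n{context}\n\nQuestion: {question}"
    answer = generate(prompt)
    # Hard check: an answer that cites no retrieved passage is not released.
    if not any(f"[{p['id']}]" in answer for p in passages):
        return REFUSAL
    return answer

if __name__ == "__main__":
    corpus = [{"id": "BRV-2", "text": "Bereavement fares must be requested and documented before travel."}]
    print(answer_policy_question(
        "Can I claim a bereavement refund after my trip?",
        search_policy_corpus=lambda query, k: corpus,
        generate=lambda prompt: "Per [BRV-2], bereavement fares must be requested before travel.",
    ))
```

Neither constraint guarantees a correct answer, but together they replace unconstrained generation with either a grounded, citable answer or an explicit refusal, which is exactly the trade the grounding analysis calls for.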
Regulatory-framework analysis
The incident occurred in Canada, and no jurisdiction at the time imposed an LLM-specific statute. The EU AI Act, now in force, would apply to a comparable feature operated in the European Union. Two of its articles are particularly relevant.
Article 50 transparency. Air Canada’s chatbot did inform users that they were interacting with an automated system, so the Article 50(1) transparency duty would have been satisfied at the baseline. Article 50(1) is a floor, not a ceiling; it would not have prevented the incident.
Article 26 deployer duties. This is the article that matches the tribunal’s reasoning most directly. Article 26 requires deployers to use high-risk systems in accordance with the provider’s instructions, to assign human oversight to people with the necessary competence and training, to monitor operation, and to take corrective action when the system presents a risk. A customer-service chatbot that makes commitments on behalf of the deployer is, even if not classified as high-risk under Annex III, an operational system whose output the deployer owns. The tribunal’s rejection of the “separate legal entity” argument is the Canadian common-law analogue of the Article 26 principle.
ISO 42001 Clause 9.1 evidence. The evidence an auditor would ask for (what was the feature’s policy filter configured to catch; what were the red-team results on policy-commitment questions; what human-review sampling was done on policy-answer conversations; what disclosure was made; what was the incident-response runbook) would in 2022 have been absent from most deployments. Clause 9.1 does not itself produce the controls, but it produces the expectation that the controls and their evidence are present.
What the practitioner takes away
Five concrete changes would have altered the Moffatt outcome.
First, a closed-domain retrieval architecture over the current policy corpus, with mandatory citation in the response, would have either produced the correct answer or made the contradiction visible to the user.
Second, a policy-layer filter that recognizes policy-commitment questions as a protected category would have redirected the conversation into a bounded flow rather than allowing free generation.
Third, a scoped disclaimer, not a hidden terms-of-service disclaimer but an inline one on responses involving policy or pricing, would have calibrated user reliance.
Fourth, a correction path that allowed the user to flag the answer as wrong and that routed flagged answers into a review queue (a minimal sketch follows this list) would have surfaced the error to Air Canada before it reached the tribunal, allowing it to be corrected and the fare refunded at far lower total cost.
Fifth, a training and oversight record for the team operating the chatbot would have provided the evidence that Article 26(2)’s human-oversight duty and ISO 42001 Clause 7.2’s competence requirement were being met.
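A minimal sketch of the correction path from the fourth change follows, assuming a simple file-based queue standing in for whatever ticketing or review system an organization actually operates; the field names and storage layout are illustrative.

```python
# Sketch of a user-facing correction path: a flagged answer becomes a ticket
# in a human review queue. The local directory stands in for a real queue or
# ticketing backend; field names are illustrative.
import json
import time
import uuid
from pathlib import Path

REVIEW_QUEUE = Path("review_queue")  # assumed stand-in for a real queue backend

def flag_answer(conversation_id: str, question: str, answer: str, user_note: str = "") -> str:
    """Record a user-flagged answer so a human reviewer sees it before a tribunal does."""
    REVIEW_QUEUE.mkdir(exist_ok=True)
    ticket_id = str(uuid.uuid4())
    ticket = {
        "ticket_id": ticket_id,
        "conversation_id": conversation_id,
        "flagged_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "question": question,
        "answer": answer,
        "user_note": user_note,
        "status": "open",
    }
    (REVIEW_QUEUE / f"{ticket_id}.json").write_text(json.dumps(ticket, indent=2))
    return ticket_id

if __name__ == "__main__":
    ticket = flag_answer(
        conversation_id="demo-123",
        question="Can I apply for the bereavement fare after travel?",
        answer="Yes, within ninety days of travel.",  # the kind of answer that needs review
        user_note="This contradicts the policy page.",
    )
    print(f"Filed review ticket {ticket}")
```

The substance is not the storage mechanism but the existence of a route from “the user says the answer is wrong” to “a human looked at it”, with a timestamp that survives as Clause 9.1 evidence.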
None of these changes is exotic. Every one of them is achievable on a commercial managed-API stack, on a self-hosted open-weight stack, or on a hybrid. The controls are technology-neutral; the governance discipline is not.
Comparison: Mata v. Avianca (S.D.N.Y. 2023)
The Moffatt case has a close cousin in the United States. In June 2023, a federal judge in the Southern District of New York sanctioned two attorneys in Mata v. Avianca for filing a legal brief that contained six fabricated case citations generated by a general-purpose language model.² The contrasts with Moffatt are instructive.
The confabulation mechanism was the same: fluent, confident, category-plausible output unsupported by ground truth. The organizational failure was different. In Mata, the lawyers treated the model’s output as research rather than as generation, and did not verify the citations. In Moffatt, the airline treated the chatbot’s output as the organization’s statement but then disclaimed responsibility for it. Both are failures of organizational framing. The Moffatt response (“the chatbot is not us”) failed before the tribunal. The Mata response (“we trusted the tool”) failed at the sanctions stage.
Both cases anchor the same practitioner lesson: an LLM feature is not a separate entity. Its outputs are the organization’s outputs. The governance investment required to make that ownership operationally tolerable is the investment the rest of this credential describes.
Summary
Moffatt v. Air Canada is the clearest public illustration of deployer liability for LLM confabulation in a customer-facing feature. The tribunal’s rejection of the airline’s “separate entity” defense is the operative holding; the technical failure behind the defense is confabulation enabled by inadequate grounding and a missing policy-layer guardrail. Five specific changes (closed-domain retrieval with citation, a policy-layer filter, scoped disclaimers, a correction path, and an operator-oversight record) are the controls this credential teaches, and each would have altered the outcome. Used correctly, the case functions as a compass point for LLM governance in 2026.
© FlowRidge.io — COMPEL AI Transformation Methodology. All rights reserved.
Footnotes
1. Moffatt v. Air Canada, 2024 BCCRT 149. British Columbia Civil Resolution Tribunal, decision dated 14 February 2024. https://decisions.civilresolutionbc.ca/crt/sc/en/item/525448/index.do — accessed 2026-04-19.
2. Mata v. Avianca, Inc., No. 22-cv-1461 (S.D.N.Y. 2023); coverage in Sara Merken, New York Lawyers Sanctioned for Using Fake ChatGPT Cases in Legal Brief, Reuters, 22 June 2023. https://www.reuters.com/legal/transactional/lawyer-used-chatgpt-cite-bogus-cases-what-are-ethics-2023-05-30/ — accessed 2026-04-19.