Lab 01: Mapping the Risk Surface of an HR Policy Assistant

FlowRidge

AITB-LAG: LLM Risk & Governance Specialist — Body of Knowledge Lab Notebook 1 of 1

Scenario

Your organization is deploying an internal HR policy question-answering assistant called PolicyPal. The product team has provided the following description, and the first governance review has been scheduled for two weeks from today.

PolicyPal is a retrieval-augmented assistant available to all employees through the intranet. It answers questions about HR policies (time off, benefits, expense reimbursement, leave of absence) using the organization’s current policy documents as its knowledge source. It integrates with the corporate calendar to schedule meetings with HR representatives when the user requests them. The system uses a general-purpose managed LLM API for generation and a vector store for retrieval. The system prompt identifies PolicyPal as an HR assistant, instructs it to answer only HR policy questions, and tells it to escalate to a human HR representative when uncertain. The only guardrail is the vendor’s default content-safety filter.

You are the governance practitioner assigned to the review. Your deliverables, produced across the four parts of this lab, will form the governance package that goes to the review board.

Part 1: Risk-surface brief (25 minutes)

Produce a one-page brief following the Article 1 template, with a row for each of the six layers of the LLM risk surface.

For each layer, record:

What inhabits the layer in PolicyPal? Name specifics.
Who owns it? Vendor, platform team, HR business owner, or external party.
Known failure modes. List two to four from the article material, specific to this deployment.
Controls in place or proposed. What is present today; what you will add.
Evidence. What artifact would demonstrate the control to an auditor.

Pay attention to the layers where PolicyPal’s description is silent. A layer with no declared content still needs a row, and that row must be explicitly marked “empty, intentionally.” The calendar tool is not optional (the description says PolicyPal “integrates with the corporate calendar”), so the tool layer is active, and you should think carefully about what a “schedule a meeting” function can do if the model is induced to misuse it.

Expected artifact: Risk-Surface-Brief.md (one page, six layers, complete rows).
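
One way to keep the brief honest is to treat each row as a record and check it mechanically before it goes into the package. The sketch below is only an illustration under stated assumptions: the names LayerRow, row_problems, and brief_problems are hypothetical, the layer names come from your Article 1 template rather than from this code, and the rule that an “empty, intentionally” marker alone completes a row is an assumption you may want to tighten.

```python
from dataclasses import dataclass, field

EMPTY_INTENTIONALLY = "empty, intentionally"


@dataclass
class LayerRow:
    layer: str                      # layer name as given in the Article 1 template
    inhabitants: str                # what lives in this layer, or EMPTY_INTENTIONALLY
    owner: str = ""                 # vendor, platform team, HR business owner, external party
    failure_modes: list[str] = field(default_factory=list)
    controls: str = ""              # present today and proposed
    evidence: str = ""              # artifact an auditor could inspect


def row_problems(row: LayerRow) -> list[str]:
    """Return human-readable gaps for one row; an empty list means complete."""
    if row.inhabitants.strip().lower() == EMPTY_INTENTIONALLY:
        return []  # assumption: an explicit marker is enough for an unused layer
    problems = []
    for name in ("inhabitants", "owner", "controls", "evidence"):
        if not getattr(row, name).strip():
            problems.append(f"{row.layer}: missing {name}")
    if not 2 <= len(row.failure_modes) <= 4:
        problems.append(
            f"{row.layer}: expected 2-4 failure modes, found {len(row.failure_modes)}"
        )
    return problems


def brief_problems(rows: list[LayerRow], expected_layers: int = 6) -> list[str]:
    """Check the row count and every row; an empty list means the brief is complete."""
    problems = []
    if len(rows) != expected_layers:
        problems.append(f"expected {expected_layers} rows, found {len(rows)}")
    for row in rows:
        problems.extend(row_problems(row))
    return problems
```

Running brief_problems over your six rows before submission gives a quick list of gaps; it checks form only and says nothing about whether the named controls are adequate.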

Part 2: Injection stress-test plan (20 minutes)

Design a minimum battery of injection tests that the feature team must run before PolicyPal’s first launch. The battery should exercise every technique class from Article 2 that the PolicyPal surface exposes. For each test, specify:

Technique class (direct, indirect, jailbreak, encoded, persona, tool-output indirect, retrieval-corpus indirect).
Concrete prompt or content pattern.
Expected correct behavior.
Measurable success criterion.
Which layer of your Part 1 brief is intended to catch it.

A PolicyPal-specific consideration: PolicyPal retrieves from internal HR policy documents. What does indirect prompt injection look like on that retrieval path, and who can place content in the corpus and through what channel? Include at least one indirect-injection test that reflects a realistic way an adversary could plant content in the corpus for this specific feature.

Expected artifact: Injection-Stress-Test-Plan.md with a table of at least twelve tests covering at least six technique classes.
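
The table requirements above can be checked mechanically before the plan is submitted. The following is a minimal sketch, not a prescribed format: InjectionTest and plan_problems are hypothetical names, the technique classes are copied from the list in this part, and the check assumes your corpus-planting test is tagged “retrieval-corpus indirect.”

```python
from dataclasses import dataclass, fields

# Technique classes as listed in Part 2.
TECHNIQUE_CLASSES = frozenset({
    "direct", "indirect", "jailbreak", "encoded", "persona",
    "tool-output indirect", "retrieval-corpus indirect",
})


@dataclass(frozen=True)
class InjectionTest:
    technique: str           # one of TECHNIQUE_CLASSES
    pattern: str             # concrete prompt or content pattern
    expected_behavior: str   # what PolicyPal should do
    success_criterion: str   # measurable pass condition
    catching_layer: str      # layer of the Part 1 brief intended to catch it


def plan_problems(tests: list[InjectionTest],
                  min_tests: int = 12, min_classes: int = 6) -> list[str]:
    """Return gaps in the plan; an empty list means the minimums are met."""
    problems = []
    for number, test in enumerate(tests, start=1):
        if test.technique not in TECHNIQUE_CLASSES:
            problems.append(f"test {number}: unknown technique class {test.technique!r}")
        for f in fields(test):
            if not getattr(test, f.name).strip():
                problems.append(f"test {number}: empty {f.name}")
    if len(tests) < min_tests:
        problems.append(f"only {len(tests)} tests, need at least {min_tests}")
    covered = {t.technique for t in tests} & TECHNIQUE_CLASSES
    if len(covered) < min_classes:
        problems.append(
            f"only {len(covered)} technique classes covered, need at least {min_classes}"
        )
    # Assumption: the corpus-planting test carries this tag.
    if "retrieval-corpus indirect" not in covered:
        problems.append("no retrieval-corpus indirect test for the HR policy corpus")
    return problems
```

This only confirms that the table is complete; whether each success criterion is genuinely measurable still needs a human reviewer.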

Part 3: Guardrail design sketch (20 minutes)

The current design names only “the vendor’s default content-safety filter.” Sketch the upgrade to a four-layer architecture per Article 4.

Propose one commercial and one open-source option at each of: input classifier, policy filter, output classifier, tool-call validator. State explicitly which options your organization should actually choose and why; the reasoning should refer to existing vendor relationships, data-residency constraints, cost of ownership, and the skills of the platform team. Do not pick a vendor because its brand is familiar; state the decision criteria.

For the tool-call validator, the most important design decision on PolicyPal is the scope of the calendar function. Write out the function schema you would accept: exact arguments, allowed recipients, allowed time windows, whether the assistant may create a meeting without user confirmation. Note that a “help me schedule a meeting” capability can easily be abused by an injection attacker to schedule large numbers of meetings, to invite many employees to a single meeting, or to book at disruptive times; the validator must prevent these patterns by construction.
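
To make “by construction” concrete, here is one possible shape for the accepted schema and its validator. Everything specific in it is an assumption for illustration: the names ScheduleHrMeeting and validate, the example.com addresses, the 15 or 30 minute durations, the 09:00 to 17:00 weekday window, and the limit of two meetings per user per day are hypothetical values that your own design should replace.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical values; substitute your organization's own.
HR_REPRESENTATIVES = frozenset({"hr-rep-1@example.com", "hr-rep-2@example.com"})
ALLOWED_DURATIONS_MIN = (15, 30)
FIRST_HOUR, LAST_HOUR = 9, 17
MAX_MEETINGS_PER_USER_PER_DAY = 2


@dataclass(frozen=True)
class ScheduleHrMeeting:
    requester: str          # set from the authenticated session, never from model output
    hr_representative: str  # exactly one recipient: mass invites are not representable
    start: datetime         # assumed to be in the requester's local time
    duration_minutes: int
    user_confirmed: bool    # the user approved this exact meeting in the UI


def validate(call: ScheduleHrMeeting, meetings_booked_today: int) -> list[str]:
    """Return the reasons to reject the call; an empty list means it may proceed."""
    errors = []
    if call.hr_representative not in HR_REPRESENTATIVES:
        errors.append("recipient is not an allowlisted HR representative")
    if call.duration_minutes not in ALLOWED_DURATIONS_MIN:
        errors.append("duration not allowed")
    day_close = call.start.replace(hour=LAST_HOUR, minute=0, second=0, microsecond=0)
    end = call.start + timedelta(minutes=call.duration_minutes)
    if call.start.weekday() >= 5 or call.start.hour < FIRST_HOUR or end > day_close:
        errors.append("outside the allowed time window")
    if not call.user_confirmed:
        errors.append("user has not confirmed the meeting")
    if meetings_booked_today >= MAX_MEETINGS_PER_USER_PER_DAY:
        errors.append("daily meeting limit reached for this user")
    return errors
```

The design choice worth noting is that the schema has a single recipient field and no free-form attendee list, so the “invite many employees” pattern cannot be expressed at all, while the volume and timing patterns are rejected by the rate limit and the time window.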

Include a two-paragraph section titled Over-blocking considerations in which you name the categories of legitimate employee queries that an overly conservative input classifier will refuse (for example, questions about bereavement, mental health benefits, or harassment complaint procedures, all of which touch classifier categories that can produce false positives). Propose a policy-layer rewrite strategy rather than a block for these cases.

Expected artifact: Guardrail-Design-Sketch.md with the four-layer table, the tool-call validator schema, and the over-blocking section.

Part 4: Evaluation harness outline (15 minutes)

Outline the evaluation harness per Article 5, adapted to PolicyPal.

The outline should describe:

The four evaluation modes and what each consists of for this feature. For capability, what set of seed questions does PolicyPal need to answer correctly? For regression, what is the first regression test? For safety, which elements of the Part 2 battery become permanent harness tests? For human review, how many conversations per week should an HR business owner or their delegate review, and what rubric should they use?
The cadence schedule you will ask the team to commit to for PolicyPal specifically, noting that the feature is internal, moderate risk, and has a tool with limited but non-zero authority.
The ownership assignments. Name three roles that must own parts of the harness, even if the role titles in your organization are different from the article’s examples.

Expected artifact: Evaluation-Harness-Outline.md with the four modes, the cadence, and the owner table.
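
For the capability and regression modes, the outline can point at something as small as the sketch below. It is an illustration under stated assumptions, not the Article 5 harness: run_capability_suite and has_regressed are hypothetical names, ask stands in for whatever function calls PolicyPal, and matching on a required phrase is a deliberately crude scoring rule that a real harness would likely replace with a rubric or a grader.

```python
from typing import Callable

# A seed case pairs a question with a phrase the answer must contain.
SeedCase = tuple[str, str]


def run_capability_suite(ask: Callable[[str], str], cases: list[SeedCase]) -> dict:
    """Ask every seed question and record which answers lack the required phrase."""
    failures = [
        question
        for question, required in cases
        if required.lower() not in ask(question).lower()
    ]
    return {
        "total": len(cases),
        "passed": len(cases) - len(failures),
        "failures": failures,
    }


def has_regressed(result: dict, baseline_passed: int) -> bool:
    """A run regresses if it passes fewer seed cases than the stored baseline."""
    return result["passed"] < baseline_passed
```

The first regression test can then be the capability suite itself, re-run against a stored baseline whenever the system prompt, the model version, or the policy corpus changes.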

Reflection questions (10 minutes)

Write one paragraph on each of the three questions below. These paragraphs go into the governance package as rationale the review board will read.

The calendar tool. Suppose a colleague tells you they are not worried about PolicyPal’s calendar tool because it is read-only-ish (“it just schedules meetings”). Respond to that framing in governance terms. What is the worst realistic outcome of excessive agency on PolicyPal’s calendar tool, and what change to the Part 3 design closes that outcome?
The retrieval corpus. PolicyPal retrieves from HR policy documents. Who has write access to those documents inside the organization, and through what workflow? If your organization uses a document-management system where policy drafts move through review stages, what governance control needs to exist at which stage to prevent an indirect-injection attack from reaching the retrieval index?
The bereavement case. Re-read the Air Canada chatbot case in Article 3. Name the three controls from your Part 3 and Part 4 artifacts that would have changed the outcome in that case if they had been present, and state precisely how.

Final deliverable

A single governance package named PolicyPal-Governance-Package.md that combines all four artifact files plus the three reflection paragraphs, in that order, with a one-page executive summary at the top stating: the feature, the risk tier you assign it, the three most significant residual risks after your proposed mitigations are in place, and the recommendation you would make to the review board (approve, approve with conditions, or block).

The package, once complete, should run to approximately eight to twelve pages. It is the artifact an actual governance review would receive from an LLM risk specialist.

What good looks like

A governance lead reviewing a submitted package will look for:

Completeness. All six surface layers covered. At least twelve injection tests. All four guardrail layers. All four evaluation modes.
Specificity. Not generic phrases like “implement content filtering”; named controls with named technologies and explicit choice rationale.
Neutrality. At each guardrail layer, one commercial and one open-source option named. No favored vendor; choice rationale applied consistently.
Coverage of the PolicyPal-specific surface. The calendar tool, the HR corpus, and the employee-population characteristics, each shaping the controls.
A clear recommendation. The executive summary names a decision, not a hedge.