Scenario 03: Government / Public Sector

Responsible AI under regulatory pressure.

Halcyon Public Services Agency

  • Role: Director of AI & Data Governance: Responsible AI
  • Industry: Government / Public Sector
  • Scale: 3 ML models, 400K beneficiaries, 6 regions
  • Timeline: 12 months: audit finding to framework adoption

Halcyon Public Services Agency is a fictitious organization. This scenario is a composite drawn from patterns observed across public sector organizations deploying algorithmic decision-support tools. It is included here to illustrate strategic thinking and leadership approach.

Section 01

About Halcyon Public Services Agency

Halcyon had been delivering social programs for over three decades. In 2021, facing increasing caseloads and pressure to modernize, the agency invested in a suite of machine learning models to help caseworkers prioritize interventions, flag high-risk cases, and allocate housing supports more efficiently.

The models were built by a third-party vendor and deployed across all six regional offices within 18 months. On paper, the initiative was a success. Processing times improved. Caseworker capacity increased. Leadership presented the results at a national public sector conference.

Eighteen months later, an internal audit found something the agency had not planned for.

Leadership Landscape

  • Deputy Minister: Politically exposed and risk-averse. Primary concern was public and regulatory accountability.
  • Chief Information Officer: Technically capable but had limited experience governing AI systems in a regulatory context.
  • Director of Program Integrity: Had commissioned the internal audit after receiving complaints from frontline caseworkers about inconsistent model outputs across regions.

Section 02

The Situation

The internal audit found that three ML models in active use could not be adequately explained. Caseworkers could see the outputs (high risk, medium risk, or low risk) but had no visibility into what drove a given classification. When clients challenged decisions, caseworkers had no way to walk through the reasoning. When regional outputs diverged significantly, nobody could diagnose why.

The audit also found that the models had been validated on historical data that contained documented demographic imbalances. There was no ongoing monitoring process in place. Model performance had not been reviewed since the initial vendor sign-off.

A formal complaint had been filed with the provincial privacy commissioner. A legal review was underway. The Deputy Minister needed a plan that could be presented to the minister's office within 60 days.

The core challenge was not technical. The models were functioning as designed. The challenge was that the agency had deployed consequential AI systems, tools that influenced decisions affecting people's access to housing and financial support, without the governance infrastructure to operate them responsibly.

Section 03

The Diagnosis

The first month was spent working through four parallel workstreams: a technical review of all three models, interviews with caseworkers across all six regions, a review of vendor contracts and documentation, and a policy gap analysis against emerging federal AI accountability guidelines.

Five findings shaped everything that followed.

1. No explainability layer existed. The models produced outputs but no reasoning. Caseworkers were expected to use them as tools, but had no framework for when to trust the output and when to apply their own judgment. In practice, some caseworkers ignored the models entirely. Others followed them without question. Neither approach was documented or governed.

2. Training data issues were real but manageable. The demographic imbalances in the historical data were a known limitation that had been flagged in the original vendor report, then set aside. The models were not performing equally across client subgroups, and nobody had been tracking this.

3. Vendor accountability was unclear. The original contract did not include provisions for ongoing model monitoring, performance reporting, or documentation standards. The agency had purchased a deployment, not an accountable AI system.

4. Frontline staff had spotted the problems first. Caseworkers had been raising concerns informally for months before the audit. Those concerns had not been collected, categorized, or escalated. There was no channel for frontline feedback on AI system performance.

5. The agency had no AI governance policy. There were IT procurement policies and data privacy policies, but nothing that specifically addressed how AI systems would be selected, validated, monitored, or retired. The gap was not unique to this agency, but it was now a liability.

Section 04

The Strategic Response

Framing the Problem for Leadership

The first conversation with the Deputy Minister was about framing. This was not a technology failure. The models were doing what they were built to do. This was a governance failure, and the distinction mattered because it changed the response.

A technology failure gets fixed by the vendor. A governance failure gets fixed by building the internal capability to operate AI systems responsibly, regardless of who built them. That meant the agency needed to own this, not hand it back.

That framing was accepted and became the foundation for how the remediation was communicated internally and to the minister's office.

The Remediation Plan

Three tracks ran in parallel.

Track 1: Immediate Risk Reduction

Before any new framework was built, the three models were placed under a mandatory human review requirement. No model output could be used as the sole basis for a consequential decision until the review process was complete. This was operationally disruptive and caseworker workload increased temporarily, but it was the right call and the Deputy Minister supported it.
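The interim rule can be expressed as a simple gate in the decision workflow. The following is a minimal sketch, not the agency's actual system: the names `ModelOutput`, `CaseworkerReview`, and `decide` are hypothetical, and it assumes a review only counts when the caseworker records their own rationale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOutput:
    case_id: str
    risk_level: str  # "high", "medium", or "low", as in the scenario


@dataclass(frozen=True)
class CaseworkerReview:
    reviewer_id: str
    agrees_with_model: bool
    rationale: str  # the caseworker's own documented reasoning (assumed requirement)


def decide(output: ModelOutput, review: Optional[CaseworkerReview]) -> dict:
    """Return a decision record, refusing to act on the model output alone."""
    if review is None or not review.rationale.strip():
        # Interim rule: no consequential decision without a documented human review.
        return {"case_id": output.case_id, "status": "pending_human_review"}
    return {
        "case_id": output.case_id,
        "status": "decided",
        "model_risk_level": output.risk_level,
        "reviewer_id": review.reviewer_id,
        "followed_model": review.agrees_with_model,
        "rationale": review.rationale,
    }
```

Recording whether the caseworker followed the model also produces a record of how the tools are being used in practice, which was undocumented before the audit.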

A moratorium on deploying any additional AI tools was put in place for the duration of the review period.

Track 2: Technical Remediation

Working with the vendor and an independent technical reviewer, three changes were made to each model:

  • An explainability layer was added using SHAP (SHapley Additive exPlanations), a method that shows how much each input variable contributed to a given output. Caseworkers could now see, in plain language, why a case was flagged at a particular risk level.
  • Subgroup performance analysis was run across key demographic variables. Where performance gaps were found to exceed acceptable thresholds, the relevant model outputs were flagged for mandatory caseworker review rather than direct use.
  • A monitoring pipeline was established to track model performance on a monthly basis, with automated alerts if performance metrics shifted beyond defined thresholds.
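To make the explainability idea concrete, the sketch below computes exact Shapley values by brute force for a toy scoring function. It is an illustration under stated assumptions, not the vendor's models or the SHAP library itself: `toy_risk_score` and its feature names are hypothetical, and practical SHAP tooling uses background datasets and approximations rather than enumerating every feature subset.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    score: Callable[[Dict[str, float]], float],
    case: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Attribute score(case) - score(baseline) to each feature (exact Shapley)."""
    features = list(case)
    n = len(features)

    def value(subset) -> float:
        # Features in the subset take the case's values; the rest stay at baseline.
        mixed = {f: (case[f] if f in subset else baseline[f]) for f in features}
        return score(mixed)

    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                members = set(subset)
                total += weight * (value(members | {f}) - value(members))
        result[f] = total
    return result


def toy_risk_score(x: Dict[str, float]) -> float:
    # Hypothetical scoring function with an interaction term, for illustration only.
    return (
        0.5 * x["months_in_arrears"]
        + 0.3 * x["prior_evictions"]
        + 0.2 * x["months_in_arrears"] * x["prior_evictions"]
    )


case = {"months_in_arrears": 4.0, "prior_evictions": 1.0}
baseline = {"months_in_arrears": 0.0, "prior_evictions": 0.0}
contributions = shapley_values(toy_risk_score, case, baseline)

# Plain-language rendering, largest contribution first.
for name, amount in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name} moved the score by {amount:+.2f}")
```

The contributions always sum to the difference between the case's score and the baseline score, which is what lets a caseworker account for the whole classification rather than part of it.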

Track 3: Governance Framework

A responsible AI framework was developed for the agency, covering the full lifecycle of an AI system from procurement through retirement. The framework addressed six areas:

  • Procurement standards: what documentation, validation evidence, and accountability commitments a vendor must provide before an AI system can be considered for deployment
  • Pre-deployment validation: the internal review process any AI system must pass before going live, including bias testing, explainability review, and caseworker readiness assessment
  • Operational requirements: what ongoing monitoring, human oversight, and documentation standards apply to any AI system in active use
  • Incident response: how to identify, escalate, and respond when an AI system produces outputs that cause harm or raise accountability concerns
  • Staff training: what caseworkers need to understand about AI systems they use, including how to exercise judgment and how to raise concerns
  • Retirement criteria: the conditions under which an AI system should be taken out of service

The framework was reviewed by the provincial privacy commissioner's office before finalization. It was adopted as agency policy in month ten.

Section 05

Execution Plan

Months 1 and 2: Stabilize

  • Complete technical review of all three models
  • Implement mandatory human review requirement for all model-influenced decisions
  • Issue internal communication to caseworkers explaining the review process and interim procedures
  • Begin frontline feedback collection across all six regional offices
  • Deliver preliminary findings to the Deputy Minister for the minister's office briefing

Months 3 to 5: Remediate

  • Work with vendor to add explainability layer to all three models
  • Run subgroup performance analysis; flag and address gaps exceeding acceptable thresholds
  • Stand up monthly monitoring pipeline with automated performance alerts
  • Draft responsible AI framework; circulate for internal review across legal, privacy, program integrity, and operations
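The subgroup analysis and the monthly alerting in this phase can be sketched in a few lines. This is a hedged illustration only: the scenario does not specify which metric or thresholds were used, so recall on high-risk cases, the `max_gap` and `tolerance` parameters, and all function names here are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each record: (subgroup label, true outcome, model prediction), with 1 = high risk.
Record = Tuple[str, int, int]


def recall_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Share of truly high-risk cases the model flagged, per subgroup."""
    hits: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, actual, predicted in records:
        if actual == 1:
            positives[group] += 1
            hits[group] += int(predicted == 1)
    return {g: hits[g] / positives[g] for g in positives}


def groups_needing_review(recalls: Dict[str, float], max_gap: float) -> List[str]:
    """Subgroups whose recall trails the best-served subgroup by more than max_gap."""
    best = max(recalls.values())
    return sorted(g for g, r in recalls.items() if best - r > max_gap)


def monthly_alerts(history: List[float], baseline: float, tolerance: float) -> List[int]:
    """1-based month numbers where the metric drifted beyond tolerance from baseline."""
    return [m for m, v in enumerate(history, start=1) if abs(v - baseline) > tolerance]
```

Outputs for any subgroup returned by `groups_needing_review` would be routed to mandatory caseworker review, and any month returned by `monthly_alerts` would trigger a closer look at that model.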

Months 6 to 8: Rebuild Confidence

  • Run explainability training with caseworkers across all six regions
  • Pilot updated models with explainability outputs in two regional offices before full rollout
  • Incorporate caseworker feedback into final model adjustments
  • Submit responsible AI framework to provincial privacy commissioner for review

Months 9 to 12: Embed and Sustain

  • Full rollout of remediated models with explainability layer active
  • Adopt responsible AI framework as agency policy; integrate into procurement and IT governance processes
  • Establish AI governance committee with representation from program integrity, legal, IT, and frontline operations
  • Publish plain-language summary of the framework for public transparency
  • Lift mandatory human review requirement where subgroup performance gaps have been resolved

Section 06

Business Impact Targets

Metric | Target
Models with explainability layer active | 3 of 3 by month 8
Subgroup performance gaps exceeding threshold | Resolved before full rollout
Responsible AI framework adopted as policy | Month 10
Caseworker training completion across all regions | Over 90% by month 11
Formal complaint with privacy commissioner resolved | Month 12
New AI procurement standard in place | Month 10

Outcome

What this delivered.

The Deputy Minister had a defensible plan for the minister's office within 45 days, inside the 60-day window. That mattered: the legal and reputational exposure was real, and the timeline for a credible response was short.

The mandatory human review requirement created short-term friction but it did something else too. It forced a structured conversation with caseworkers about how they were actually using the models, which surfaced concerns that had been sitting in informal channels for over a year. That intelligence shaped the entire remediation.

The explainability layer changed how caseworkers related to the tools. Several regional offices reported that caseworkers who had been ignoring the model outputs started engaging with them once they could see the reasoning. Others who had been over-relying on them started applying more independent judgment. Both shifts were in the right direction.

The responsible AI framework outlasted the immediate crisis. It became the standard the agency applied to two subsequent technology procurements in the following fiscal year, neither of which involved AI, but both of which benefited from clearer accountability requirements.

The privacy commissioner closed the complaint file in month twelve, noting the agency's remediation approach as an example of responsible institutional response to an AI governance gap.
