Halcyon Public Services Agency | Responsible AI under regulatory pressure.

Section 01

About Halcyon Public Services Agency

Halcyon had been delivering social programs for over three decades. In 2021, facing increasing caseloads and pressure to modernize, the agency invested in a suite of machine learning models to help caseworkers prioritize interventions, flag high-risk cases, and allocate housing supports more efficiently.

The models were built by a third-party vendor and deployed across all six regional offices within 18 months. On paper, the initiative was a success. Processing times improved. Caseworker capacity increased. Leadership presented the results at a national public sector conference.

Eighteen months later, an internal audit found something the agency had not planned for.

Leadership Landscape

Deputy Minister: Politically exposed and risk-averse. Primary concern was public and regulatory accountability.
Chief Information Officer: Technically capable but had limited experience governing AI systems in a regulatory context.
Director of Program Integrity: Had commissioned the internal audit after receiving complaints from frontline caseworkers about inconsistent model outputs across regions.

Section 02

The Situation

The internal audit found that three ML models in active use could not be adequately explained. Caseworkers could see the outputs (high risk, medium risk, or low risk) but had no visibility into what drove a given classification. When clients challenged decisions, caseworkers had no way to walk through the reasoning. When regional outputs diverged significantly, nobody could diagnose why.

The audit also found that the models had been validated on historical data that contained documented demographic imbalances. There was no ongoing monitoring process in place. Model performance had not been reviewed since the initial vendor sign-off.

A formal complaint had been filed with the provincial privacy commissioner. A legal review was underway. The Deputy Minister needed a plan that could be presented to the minister's office within 60 days.

The core challenge was not technical. The models were functioning as designed. The challenge was that the agency had deployed consequential AI systems (tools that influenced decisions affecting people's access to housing and financial support) without the governance infrastructure to operate them responsibly.

Section 03

The Diagnosis

The first month was spent working through four parallel workstreams: a technical review of all three models, interviews with caseworkers across all six regions, a review of vendor contracts and documentation, and a policy gap analysis against emerging federal AI accountability guidelines.

Five findings shaped everything that followed.

1. No explainability layer existed. The models produced outputs but no reasoning. Caseworkers were expected to use them as tools, but had no framework for when to trust the output and when to apply their own judgment. In practice, some caseworkers ignored the models entirely. Others followed them without question. Neither approach was documented or governed.

2. Training data issues were real but manageable. The demographic imbalances in the historical data were a known limitation that had been flagged in the original vendor report, then set aside. The models were not performing equally across client subgroups, and nobody had been tracking this.

3. Vendor accountability was unclear. The original contract did not include provisions for ongoing model monitoring, performance reporting, or documentation standards. The agency had purchased a deployment, not an accountable AI system.

4. Frontline staff had spotted the problems first. Caseworkers had been raising concerns informally for months before the audit. Those concerns had not been collected, categorized, or escalated. There was no channel for frontline feedback on AI system performance.

5. The agency had no AI governance policy. There were IT procurement policies and data privacy policies, but nothing that specifically addressed how AI systems would be selected, validated, monitored, or retired. The gap was not unique to this agency, but it was now a liability.

Section 04

The Strategic Response

Framing the Problem for Leadership

The first conversation with the Deputy Minister was about framing. This was not a technology failure. The models were doing what they were built to do. This was a governance failure, and the distinction mattered because it changed the response.

A technology failure gets fixed by the vendor. A governance failure gets fixed by building the internal capability to operate AI systems responsibly, regardless of who built them. That meant the agency needed to own this, not hand it back.

That framing was accepted and became the foundation for how the remediation was communicated internally and to the minister's office.

The Remediation Plan

Three tracks ran in parallel.

Track 1: Immediate Risk Reduction

Before any new framework was built, the three models were placed under a mandatory human review requirement. No model output could be used as the sole basis for a consequential decision until the review process was complete. This was operationally disruptive, and caseworker workload increased temporarily, but it was the right call and the Deputy Minister supported it.

A moratorium on deploying any additional AI tools was put in place for the duration of the review period.

Track 2: Technical Remediation

Working with the vendor and an independent technical reviewer, three changes were made to each model:

An explainability layer was added using SHAP (SHapley Additive exPlanations), a method that shows how much each input variable contributed to a given output. Caseworkers could now see, in plain language, why a case was flagged at a particular risk level.
Subgroup performance analysis was run across key demographic variables. Where performance gaps were found to exceed acceptable thresholds, the relevant model outputs were flagged for mandatory caseworker review rather than direct use.
A monitoring pipeline was established to track model performance on a monthly basis, with automated alerts if performance metrics shifted beyond defined thresholds.
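
To make the explainability idea concrete, here is a minimal sketch of how Shapley-style contributions can be computed and turned into plain-language statements. It is illustrative only: the feature names, weights, baseline values, and the `risk_score` function are hypothetical and do not describe the agency's models. It enumerates feature coalitions exactly against a single baseline case, whereas the SHAP library typically uses faster approximations and a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value.
    Enumeration is only practical for a handful of features.
    """
    names = list(instance)
    n = len(names)
    values = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (instance[f] if f in coalition else baseline[f])
                    for f in names
                }
                with_feature = dict(without)
                with_feature[name] = instance[name]
                total += weight * (predict(with_feature) - predict(without))
        values[name] = total
    return values


def explain(contributions):
    """Turn contributions into short statements, largest effect first."""
    lines = []
    ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    for name, value in ranked:
        if abs(value) < 1e-9:
            continue  # feature matched the baseline, so it had no effect
        direction = "raised" if value > 0 else "lowered"
        lines.append(f"{name} {direction} the score by {abs(value):.2f}")
    return lines


# Hypothetical linear risk score (an assumption for illustration).
WEIGHTS = {"missed_payments": 0.5, "months_in_program": -0.25, "household_size": 0.125}


def risk_score(features):
    return sum(WEIGHTS[f] * features[f] for f in WEIGHTS)


# Hypothetical "typical case" baseline and the case being explained.
baseline = {"missed_payments": 1, "months_in_program": 12, "household_size": 3}
case = {"missed_payments": 4, "months_in_program": 4, "household_size": 3}

contributions = shapley_values(risk_score, case, baseline)
for line in explain(contributions):
    print(line)
```

The contributions sum to the difference between the case's score and the baseline score, which is the property that lets a caseworker account for the whole classification rather than a partial list of factors.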

Track 3: Governance Framework

A responsible AI framework was developed for the agency, covering the full lifecycle of an AI system from procurement through retirement. The framework addressed six areas:

Procurement standards: What documentation, validation evidence, and accountability commitments a vendor must provide before an AI system can be considered for deployment
Pre-deployment validation: The internal review process any AI system must pass before going live, including bias testing, explainability review, and caseworker readiness assessment
Operational requirements: What ongoing monitoring, human oversight, and documentation standards apply to any AI system in active use
Incident response: How to identify, escalate, and respond when an AI system produces outputs that cause harm or raise accountability concerns
Staff training: What caseworkers need to understand about AI systems they use, including how to exercise judgment and how to raise concerns
Retirement criteria: The conditions under which an AI system should be taken out of service

The framework was reviewed by the provincial privacy commissioner's office before finalization. It was adopted as agency policy in month ten.

Section 05

Execution Plan

Months 1 and 2: Stabilize

Complete technical review of all three models
Implement mandatory human review requirement for all model-influenced decisions
Issue internal communication to caseworkers explaining the review process and interim procedures
Begin frontline feedback collection across all six regional offices
Deliver preliminary findings to the Deputy Minister for the minister's office briefing

Months 3 to 5: Remediate

Work with vendor to add explainability layer to all three models
Run subgroup performance analysis; flag and address gaps exceeding acceptable thresholds
Stand up monthly monitoring pipeline with automated performance alerts
Draft responsible AI framework; circulate for internal review across legal, privacy, program integrity, and operations
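
The subgroup analysis and monitoring alerts in this phase can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the metric (the share of truly high-need cases that the model flagged), the group labels, the record fields, and both thresholds are hypothetical. The case study does not specify which metrics or thresholds the agency used.

```python
from collections import defaultdict


def recall_by_group(records):
    """Share of truly high-need cases the model flagged, per subgroup."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        if r["actual_high_need"]:
            positives[r["group"]] += 1
            if r["flagged_high_risk"]:
                hits[r["group"]] += 1
    return {g: hits[g] / positives[g] for g in positives}


def groups_needing_review(recalls, max_gap):
    """Subgroups whose recall trails the best-served subgroup by more than max_gap."""
    best = max(recalls.values())
    return sorted(g for g, v in recalls.items() if best - v > max_gap)


def monthly_alerts(history, baseline, tolerance):
    """Months where the tracked metric drifted beyond tolerance from the baseline."""
    return [month for month, value in history if abs(value - baseline) > tolerance]


# Hypothetical labelled outcomes for two subgroups.
records = (
    [{"group": "A", "actual_high_need": True, "flagged_high_risk": True}] * 3
    + [{"group": "A", "actual_high_need": True, "flagged_high_risk": False}] * 1
    + [{"group": "B", "actual_high_need": True, "flagged_high_risk": True}] * 2
    + [{"group": "B", "actual_high_need": True, "flagged_high_risk": False}] * 2
)

recalls = recall_by_group(records)
review_groups = groups_needing_review(recalls, max_gap=0.1)

# Hypothetical monthly values of the same metric after go-live.
history = [("month_1", 0.74), ("month_2", 0.72), ("month_3", 0.61)]
alerts = monthly_alerts(history, baseline=0.75, tolerance=0.05)

print(recalls, review_groups, alerts)
```

Outputs for any subgroup returned by `groups_needing_review` would be routed to mandatory caseworker review, and any month returned by `monthly_alerts` would trigger a performance review. In practice an agency would likely track several metrics per subgroup rather than one.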

Months 6 to 8: Rebuild Confidence

Run explainability training with caseworkers across all six regions
Pilot updated models with explainability outputs in two regional offices before full rollout
Incorporate caseworker feedback into final model adjustments
Submit responsible AI framework to provincial privacy commissioner for review

Months 9 to 12: Embed and Sustain

Full rollout of remediated models with explainability layer active
Adopt responsible AI framework as agency policy; integrate into procurement and IT governance processes
Establish AI governance committee with representation from program integrity, legal, IT, and frontline operations
Publish plain-language summary of the framework for public transparency
Lift mandatory human review requirement where subgroup performance gaps have been resolved

Section 06

Business Impact Targets

Metric	Target
Models with explainability layer active	3 of 3 by month 8
Subgroup performance gaps exceeding threshold	Resolved before full rollout
Responsible AI framework adopted as policy	Month 10
Caseworker training completion across all regions	Over 90% by month 11
Formal complaint with privacy commissioner resolved	Month 12
New AI procurement standard in place	Month 10
