The Document
On April 30, 2026, the Australian Signals Directorate's Australian Cyber Security Centre, as lead author, released joint guidance titled Careful Adoption of Agentic AI Services, co-sealed by CISA, NSA, the Canadian Centre for Cyber Security, NCSC-NZ, and NCSC-UK. The full PDF is hosted on the U.S. Department of Defense media site, and NSA published a release the same day confirming the multilateral signature.
The document tells security teams what every cybersecurity blog has already told them this year: assume agents will do unexpected things, treat prompt injection as unsolved, build in rollback, log everything. The guidance itself notes that "agentic AI does not require an entirely new security discipline", which is technically true and politically convenient; it lets six agencies across five governments co-sign without committing any of them to a new regulatory regime.
That framing buries the actual news. Careful Adoption is the first cross-government document a procurement team can translate directly into vendor questionnaire items, because it organizes risk into five named classes that map cleanly onto evidence a buyer can demand. The document does not perform that translation, however, which is the work this post does.
The Five Risk Classes, And What Each One Asks Procurement to Demand
The joint guidance organizes agentic AI risk into five classes: Privilege, Design and Configuration, Behavior, Structural, and Accountability. Each is defined in the document. None is mapped to a vendor questionnaire item. For each class below I summarize the document's framing, name the gap procurement has to close on its own, and give the specific evidence request a buyer can put on the questionnaire today.
1. Privilege
The document defines Privilege risk as agents granted excessive scope, persistent access, or unrestricted system reach, where a single compromise causes far more damage than a typical software vulnerability. The prescribed controls are cryptographically secured agent identities separate from user accounts, short-lived credentials in place of persistent API keys, encrypted agent-to-agent and agent-to-service communications, and least-privilege scoping per task, with privilege management treated as foundational throughout the document. This is the same identity surface that turns an over-privileged agent into an insider threat when the credential model collapses to "the agent is the user."
The gap is that the guidance does not tell a buyer how to verify any of this in a vendor's product before contract signature. "Trust us, we use short-lived credentials" is not an artifact.
The questionnaire ask: provide the agent identity model documentation, including the credential type (cryptographic identity vs. shared API key), credential lifetime in seconds, the per-task scope envelope, and a sample audit log showing scope enforcement on a denied action. If the vendor cannot produce the denied-action log, they have not implemented the control; they have described it. This is the same diligence pattern I applied on the M&A side at Houlihan Lokey, where a target's claimed control posture was never the same artifact as evidence that the control had fired.
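To make the artifact check concrete, here is a minimal sketch of what "the control fired" looks like as evidence rather than description. The field names and scope strings are my assumptions, not a schema from the guidance or any vendor; map them onto whatever the vendor actually exports.

```python
# Sketch: check that a vendor-supplied audit log entry actually evidences
# scope enforcement on a denied action. All field names here are illustrative
# assumptions, not a standard schema.

REQUIRED_FIELDS = {"agent_id", "credential_id", "credential_ttl_seconds",
                   "requested_action", "scope_envelope", "decision", "timestamp"}

def evidences_denied_action(entry: dict) -> bool:
    """True only if the entry is complete AND records a real denial."""
    if not REQUIRED_FIELDS <= entry.keys():
        return False  # incomplete evidence is no evidence
    return (entry["decision"] == "deny"
            and entry["requested_action"] not in entry["scope_envelope"])

sample = {
    "agent_id": "agent-7f3a",
    "credential_id": "cred-2026-04-30-0912",
    "credential_ttl_seconds": 900,          # short-lived, not a persistent key
    "requested_action": "payments:write",
    "scope_envelope": ["payments:read"],    # per-task least-privilege scope
    "decision": "deny",
    "timestamp": "2026-04-30T09:14:02Z",
}
print(evidences_denied_action(sample))  # True: the control fired and was logged
```

A vendor who can only hand over entries that fail this kind of completeness check has described a control, not evidenced one.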
2. Design and Configuration
The document defines this class as poor setup that creates security gaps before the system goes live: weak architecture, sloppy third-party integrations, misconfigured permissions. The prescribed control is incremental deployment starting with low-risk, non-sensitive use cases, paired with defense-in-depth across input filtering, output monitoring, anomaly detection, and architectural separation of agent planning from execution.
The gap is that the document does not split responsibility for these controls between vendor and buyer. A vendor can claim defense-in-depth while shipping a system in which the buyer is silently responsible for output monitoring and the architectural separation between planning and execution.
The questionnaire ask: provide a written responsibility matrix mapping each Design and Configuration control to either the vendor, the buyer, or a shared boundary, with the boundary defined in code (which API call enforces it) rather than in marketing copy. The same matrix should name the third-party integrations the agent depends on and the version of each, because the Mercor breach showed how privileged-access vendor compromises propagate through misconfigured integrations.
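A responsibility matrix only closes the gap if it is specific enough to check. A minimal sketch, with the control names, owners, and enforcement points entirely my own illustrative assumptions:

```python
# Sketch: a machine-readable responsibility matrix for the Design and
# Configuration controls. Owners and enforcement points are hypothetical
# examples of the specificity to demand, not any vendor's actual boundary.

RESPONSIBILITY_MATRIX = {
    # control: (owner, enforcement point)
    "input_filtering":         ("vendor", "vendor-side filter on the agent input API"),
    "output_monitoring":       ("buyer",  "buyer SIEM subscription to the agent event stream"),
    "anomaly_detection":       ("shared", "vendor emits metrics; buyer owns thresholds"),
    "plan_execute_separation": ("vendor", "planner service has no route to the execution API"),
}

def unowned_controls(matrix: dict) -> list[str]:
    """Controls whose owner is not explicitly vendor, buyer, or shared."""
    return [c for c, (owner, _) in matrix.items()
            if owner not in ("vendor", "buyer", "shared")]

print(unowned_controls(RESPONSIBILITY_MATRIX))  # [] means every control has a named owner
```

Any control that comes back from the vendor without an explicit owner in this table is a control the buyer silently owns.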
3. Behavior
The document defines Behavior risk as an agent pursuing a goal in ways its designers never intended or predicted: goal misalignment, deceptive outputs. Its concession on prompt injection is unusually direct for a co-sealed government document; the guidance states "the problem may never be solved". That concession matches what Google's own DeepMind data showed about prompt injection success rates after defenses, and it shifts the questionnaire away from prevention claims toward containment evidence. The prescribed control is human sign-off for high-impact actions, with the explicit caveat that "deciding which actions require that approval is a job for system designers, not the agent".
The gap is that the document never defines "high-impact action." A buyer and a vendor can both sign the contract and disagree, in production, about whether an agent moving $50,000 between accounts, or deleting 10,000 records, or sending an email to a regulator counts as high-impact. Without a shared vocabulary, the human-in-the-loop control is a negotiation, not a guarantee.
The questionnaire ask: provide the vendor's default "high-impact action" threshold list, expressed as concrete predicates such as dollar amount, record count, recipient domain, system criticality tier, or destructive verb. Then negotiate the buyer's overrides into the SOW before integration begins, not after. Where the vendor refuses to commit to a threshold, treat the silence as the answer.
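The thresholds above can be sketched as predicates a runtime can evaluate. The Action fields, the dollar and record thresholds, and the regulator domain list are illustrative assumptions; the point is that each predicate is machine-checkable rather than a vague category:

```python
# Sketch: "high-impact action" as concrete, evaluable predicates. Threshold
# values and the regulator domain list are assumptions to be negotiated into
# the SOW, not defaults from the guidance or any vendor.
from dataclasses import dataclass, field

@dataclass
class Action:
    dollar_amount: float = 0.0
    record_count: int = 0
    recipient_domains: list[str] = field(default_factory=list)
    destructive: bool = False  # e.g. delete, drop, revoke

REGULATOR_DOMAINS = {"sec.gov", "occ.treas.gov"}  # hypothetical example list

def is_high_impact(a: Action) -> bool:
    """Any one predicate firing routes the action to human sign-off."""
    return (a.dollar_amount >= 50_000
            or a.record_count >= 10_000
            or a.destructive
            or any(d in REGULATOR_DOMAINS for d in a.recipient_domains))

print(is_high_impact(Action(dollar_amount=50_000)))               # True
print(is_high_impact(Action(recipient_domains=["example.com"])))  # False
```

A vendor who cannot express their default thresholds in roughly this form has no default thresholds.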
4. Structural
The document defines Structural risk as cascading failures across multi-agent or orchestrator architectures, where the compromise of one sub-agent provides a foothold into the orchestrator. A compromised agent in the document's framing is capable of "altered files, changed access controls and deleted audit trails". The prescribed controls are end-to-end audit trails, zero trust applied to agent identities, and architectural separation of planning from execution.
The gap is that the document is silent on incident attribution and shared liability across the vendor-buyer boundary in multi-agent systems. When an orchestrator from Vendor A coordinates sub-agents from Vendor B and Vendor C, the question of which party owns the incident is unresolved.
The questionnaire ask: provide the vendor's incident attribution model for multi-agent compositions, including the log schema that allows a buyer's IR team to identify which agent took which action, the contractual owner of cross-vendor incidents, and the SLA for sub-agent compromise notification. The decline-to-patch posture I covered in the OpenAI and Anthropic agent-config disclosures is what makes attribution language a procurement problem, not a vendor problem.
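What an attribution-capable log schema buys you can be sketched in a few lines. The field names are my assumptions standing in for whatever schema the vendors supply; the test is that every entry carries enough to name the owning vendor and agent:

```python
# Sketch: the minimum an IR team needs to attribute actions in a multi-vendor
# composition. Field names and the sample entries are illustrative assumptions.
from collections import defaultdict

REQUIRED = {"vendor", "agent_id", "orchestration_id", "action", "timestamp"}

def attribute(entries: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group actions by (vendor, agent) so the incident owner is identifiable."""
    by_agent: dict[tuple[str, str], list[str]] = defaultdict(list)
    for e in entries:
        missing = REQUIRED - e.keys()
        if missing:
            # An unattributable entry is the failure mode the questionnaire
            # exists to catch: surface it, don't silently skip it.
            raise ValueError(f"unattributable entry, missing {missing}")
        by_agent[(e["vendor"], e["agent_id"])].append(e["action"])
    return dict(by_agent)

log = [
    {"vendor": "A", "agent_id": "orch-1", "orchestration_id": "run-42",
     "action": "dispatch:subtask", "timestamp": "2026-05-02T10:00:00Z"},
    {"vendor": "B", "agent_id": "sub-9", "orchestration_id": "run-42",
     "action": "files:write", "timestamp": "2026-05-02T10:00:03Z"},
]
print(attribute(log))
```

The shared orchestration_id is what lets the buyer tie Vendor B's sub-agent action back to Vendor A's dispatch; without it, cross-vendor incidents have no seam to pull on.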
5. Accountability
The document defines Accountability risk as the difficulty of inspecting agentic decisions and parsing their logs. The prescribed controls are logging every agent action (not just failures) and end-to-end audit trails.
The gap is the absence of any standard for log format, retention duration, or query interface. "We log everything" is not an artifact; a 90-day retention policy with an unindexed JSON blob is functionally unauditable, exactly the gap made visible last quarter by ten production incidents whose postmortems could not be reconstructed from the logs.
The questionnaire ask: provide the agent action log schema, the retention duration in days, the query interface (API endpoint, SQL view, or export format), a sample query for the canonical forensics question ("what did agent X do between time T1 and T2"), and the vendor's policy on model-version drift logging when the underlying model is updated mid-contract. The last item closes a gap the joint guidance ignores entirely: agentic systems whose model substrate changes during the contract period.
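The canonical forensics question only has value if it is answerable with one indexed query. A sketch against an assumed schema (the table, columns, and model-version field are my stand-ins for whatever the vendor's export format provides):

```python
# Sketch: "what did agent X do between T1 and T2" as one indexed query,
# rather than a grep over an unindexed JSON blob. Schema is an assumption.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE agent_actions (
    agent_id TEXT, action TEXT, ts TEXT, model_version TEXT)""")
# The index is the difference between auditable and "we log everything".
db.execute("CREATE INDEX idx_agent_ts ON agent_actions(agent_id, ts)")
db.executemany("INSERT INTO agent_actions VALUES (?,?,?,?)", [
    ("agent-7f3a", "records:read",   "2026-05-01T09:00:00Z", "m-2026.04"),
    ("agent-7f3a", "records:delete", "2026-05-01T09:05:00Z", "m-2026.04"),
    ("agent-11b2", "email:send",     "2026-05-01T09:06:00Z", "m-2026.04"),
])

rows = db.execute(
    "SELECT action, ts, model_version FROM agent_actions "
    "WHERE agent_id = ? AND ts BETWEEN ? AND ? ORDER BY ts",
    ("agent-7f3a", "2026-05-01T09:00:00Z", "2026-05-01T10:00:00Z"),
).fetchall()
for r in rows:
    print(r)
```

Note the model_version column riding along with every action: that is what makes the model-drift ask in the questionnaire answerable from the same log.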
How This Stacks With the Sector Regimes
Careful Adoption is horizontal cybersecurity guidance from six national agencies. It sits pre-procurement, before any sector regulator gets involved. That makes it different from the sector regimes already shaping AI vendor diligence: the FHFA mandate that Fannie Mae and Freddie Mac govern AI in the mortgage stack, the NAIC's emerging insurance AI vendor registry, and Colorado's now-stayed AI Act with its post-procurement documentation regime.
The horizontal floor and the vertical mandates do not cancel each other; they stack. A regional bank deploying an agentic underwriting workflow inherits the Five Eyes risk classes at the procurement gate, the FHFA AI governance expectations at integration, and the state insurance vendor registry obligations if the workflow touches a captive insurance subsidiary. A buyer who runs three separate questionnaires for these three regimes ends up with three separate records that disagree at the seams.
The practical move is to normalize: write one master AI vendor questionnaire whose rows are the Five Eyes risk classes (because they are the most general), and whose mapping columns reference each applicable sector regime. The same evidence request for Privilege satisfies the cross-vendor SOC 2 expectations I covered when looking at the shared Delve auditor pattern, the FHFA mortgage mandate, and the NAIC insurance registry simultaneously.
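The normalization can be sketched as a single structure whose rows are the five risk classes and whose regime mappings are the columns. The mappings shown are placeholders to illustrate the shape of the master questionnaire, not authoritative readings of any regime:

```python
# Sketch: one master questionnaire keyed on the five risk classes, with
# per-regime mapping columns. Regime assignments are illustrative assumptions.

MASTER = {
    "Privilege":                {"evidence": "identity model + denied-action log",
                                 "regimes": ["FHFA", "NAIC", "SOC 2"]},
    "Design and Configuration": {"evidence": "responsibility matrix",
                                 "regimes": ["FHFA"]},
    "Behavior":                 {"evidence": "high-impact threshold list",
                                 "regimes": ["NAIC"]},
    "Structural":               {"evidence": "attribution model + notification SLA",
                                 "regimes": ["FHFA", "NAIC"]},
    "Accountability":           {"evidence": "log schema + retention + query interface",
                                 "regimes": ["FHFA", "NAIC", "SOC 2"]},
}

def rows_for_regime(regime: str) -> list[str]:
    """One record per risk class, so the seams can no longer disagree."""
    return [cls for cls, row in MASTER.items() if regime in row["regimes"]]

print(rows_for_regime("FHFA"))
```

One table, three views: each regime reads its own slice of the same evidence record instead of generating a contradictory questionnaire of its own.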
What Procurement Teams Should Do in the Next 90 Days
For any procurement team evaluating an agentic AI vendor between now and August 1, 2026, three actions move the questionnaire from theoretical to operational.
First, add a five-row addendum to the existing AI vendor questionnaire, with one row per risk class above and the specific evidence request named in this post. Reject vendor responses that describe the control instead of providing the artifact.
Second, negotiate the "high-impact action" threshold into the SOW before integration, expressed as concrete predicates the vendor's runtime can evaluate. The Five Eyes guidance is explicit that this decision belongs to the buyer's system designers, not the vendor's agent, and the negotiation is easier before money has changed hands than after.
Third, add a model-version drift clause to the contract requiring the vendor to notify the buyer of any underlying model change with a specified lead time and to rerun a defined acceptance test before the new model serves production traffic. The joint guidance does not name model drift as a risk class, but a buyer who has watched a vendor swap model substrates mid-contract has watched the diligence record go stale in real time.
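The drift clause can be sketched as a gate: no new model version serves production traffic until a defined acceptance suite passes against it. The suite below is a toy stand-in under assumed names, not a real acceptance test:

```python
# Sketch: a contract-level model-drift gate. On vendor notification of a model
# change, rerun the defined acceptance suite before the new model goes live.
# Test names and the stand-in behaviors are illustrative assumptions.

ACCEPTANCE_SUITE = [
    # (test name, check taking the candidate model's policy function)
    ("refuses_out_of_scope_write",
     lambda act: act("payments:write") == "deny"),
    ("allows_in_scope_read",
     lambda act: act("payments:read") == "allow"),
]

def gate_new_model(act) -> bool:
    """True only if every acceptance test passes; otherwise keep the old model."""
    return all(check(act) for _, check in ACCEPTANCE_SUITE)

# Toy stand-in for the new model version's policy behavior.
new_model_act = lambda action: "allow" if action == "payments:read" else "deny"
print(gate_new_model(new_model_act))  # True: the new model may serve traffic
```

The contractual point is the shape, not the toy checks: the buyer, not the vendor, defines the suite, and the vendor's notification lead time must leave room to run it.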
The Five Eyes did the work of naming the risk classes. The work of turning each class into an artifact a vendor must hand over before signature falls to procurement, and the buyers who write that addendum in May 2026 are the ones whose 2027 audits will have something to show.