Building ai-hooks: Guardrails for AI Coding Agents
The Problem
The first generation of AI coding workflows was mostly about capability.
- Can the model read files?
- Can it run tests?
- Can it edit code?
- Can it open pull requests?
That is useful, but it is not the hard part anymore.
The real problem is this: an AI agent can be technically capable and still be operationally unsafe.
It can work on the wrong ticket. It can ignore a blocker that exists outside the repo. It can make a reasonable local change that violates a product decision made in Linear, Slack, or a planning doc. It can optimize for green tests while missing the actual business constraint.
That is the gap I built ai-hooks to close.
What ai-hooks Does
ai-hooks is a guardrail layer that sits around AI tooling. It does not replace the coding agent. It shapes the environment the agent works inside.
Think of it as middleware for autonomous development:
- A request comes in from an AI tool.
- ai-hooks identifies the project, user, task, and available context.
- It injects the right context before the agent acts.
- It evaluates the action against policies and boundaries.
- It records what happened so humans and other agents can respond.
The key idea is simple:
Do not just give the agent tools. Give it rules, context, and a feedback loop.
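The loop above can be sketched as a small pipeline. Everything here (`runGuardrails`, the stubbed project lookup, the in-memory audit log) is a hypothetical illustration of the flow, not the actual ai-hooks API.

```typescript
// Illustrative sketch of the guardrail loop: identify, inject, evaluate, record.
type AgentRequest = { sessionId: string; tool: string; args?: unknown };
type Decision = { allow: boolean; contextToInject: string[]; reason?: string };

const auditLog: { tool: string; allowed: boolean }[] = [];

function runGuardrails(req: AgentRequest): Decision {
  // 1. Identify project, user, and task (stubbed with a constant here).
  const project = { id: 'demo', releaseFrozen: true };

  // 2.-3. Inject the right context, then evaluate the action against policy.
  let decision: Decision;
  if (project.releaseFrozen && req.tool === 'deploy') {
    decision = { allow: false, contextToInject: [], reason: 'release is frozen' };
  } else {
    decision = { allow: true, contextToInject: [`project:${project.id}`] };
  }

  // 4. Record the outcome so humans and other agents can respond.
  auditLog.push({ tool: req.tool, allowed: decision.allow });
  return decision;
}
```

Even in this toy form, the shape is the point: the agent never calls a tool directly; every call flows through the same four steps.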
Why Prompt Instructions Are Not Enough
You can put a lot of good text into a system prompt. That still does not solve three real problems.
1. Prompts Are Static
The rules that matter are often dynamic:
- this issue is blocked
- the release is frozen
- the user only wants docs updated, not code
- a migration needs approval before deployment
- the PR is in a security-sensitive area
That kind of state changes constantly. It should not live only in a prompt.
2. Prompts Cannot Enforce Tool Boundaries
If an agent has access to file edits, git, cloud resources, and issue trackers, then some actions need real enforcement.
Examples:
- prevent production deploys from non-approved sessions
- require a linked work item before editing a billing service
- block issue closure if tests were not run
- stop the agent from changing code outside the agreed slice
That is policy, not prose.
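As a sketch of what "policy, not prose" means in practice, here are two of the rules above expressed as code. The `Action` shape and the rules themselves are illustrative assumptions, not the real rule engine.

```typescript
// Hypothetical enforcement of two of the boundary rules listed above.
type Action = { tool: string; path?: string; workItemId?: string; approved?: boolean };

function enforce(action: Action): { allow: boolean; reason?: string } {
  // Prevent production deploys from non-approved sessions.
  if (action.tool === 'deploy_production' && !action.approved) {
    return { allow: false, reason: 'production deploys require an approved session' };
  }
  // Require a linked work item before editing the billing service.
  if (action.path?.startsWith('billing/') && !action.workItemId) {
    return { allow: false, reason: 'edits under billing/ need a linked work item' };
  }
  return { allow: true };
}
```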
3. Prompts Usually Lose the Feedback Loop
Humans need to know when the AI encountered friction:
- a decision conflict
- a missing requirement
- a risky code path
- a failing test that suggests a product bug instead of a code bug
If the agent sees that signal and no one else does, the system is incomplete.
The Three Guardrail Layers
I ended up modeling ai-hooks around three layers.
1. Context Guardrails
These make sure the agent has enough situational awareness before acting.
Examples:
- inject the linked issue, acceptance criteria, and recent decisions
- inject outcome-level context so the agent understands why the work matters
- surface active blockers, risks, and known constraints
- attach relevant docs and prior implementation patterns
This is the difference between "change the button" and "change the button without breaking the pricing experiment tied to this workflow."
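A context guardrail can be as simple as assembling a bundle of strings before the agent acts. The `WorkItem` shape and `buildContextBundle` helper below are illustrative, not the real ai-hooks types.

```typescript
// Sketch: assemble situational context (task, outcome, criteria, blockers)
// into a bundle the agent sees before it touches code.
type WorkItem = {
  title: string;
  outcome: string;
  acceptanceCriteria: string[];
  blockers: string[];
};

function buildContextBundle(item: WorkItem): string[] {
  const bundle = [
    `Task: ${item.title}`,
    `Outcome: ${item.outcome}`,
    ...item.acceptanceCriteria.map(c => `Acceptance: ${c}`),
  ];
  // Surface active blockers loudly so the agent cannot miss them.
  for (const b of item.blockers) bundle.push(`BLOCKER: ${b}`);
  return bundle;
}
```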
2. Boundary Guardrails
These define what the agent is allowed to do.
Examples:
- read-only access on audit sessions
- no git push from exploratory sessions
- no database migration generation without explicit approval
- no edits outside the files matched to the work item
- no closing issues when required validations fail
This is where autonomy becomes usable. Not by removing power, but by making power conditional.
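"Conditional power" can be modeled as a capability table keyed by session mode. The modes and tool grants here are an assumption for illustration.

```typescript
// Tools are granted per session mode rather than removed globally.
type SessionMode = 'audit' | 'exploratory' | 'delivery';

const toolGrants: Record<SessionMode, string[]> = {
  audit: ['read_file', 'run_tests'],                     // read-only sessions
  exploratory: ['read_file', 'edit_file', 'run_tests'],  // no git push
  delivery: ['read_file', 'edit_file', 'run_tests', 'git_push'],
};

function isToolAllowed(mode: SessionMode, tool: string): boolean {
  return toolGrants[mode].includes(tool);
}
```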
3. Feedback Guardrails
These capture what the agent observed and did.
Examples:
- log blocked actions with reason codes
- create an issue comment when the AI finds ambiguity
- emit a risk signal when an agent detects scope creep
- store successful patterns for reuse by future agents
Without this layer, you end up with invisible autonomy. That is exactly what teams should avoid.
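The feedback layer can be sketched as a small set of typed signals written to a durable store. The `Signal` union and in-memory array are stand-ins; a real system would write to an issue tracker or event log.

```typescript
// Sketch: every notable agent observation becomes a durable, typed signal.
type Signal =
  | { kind: 'blocked_action'; reasonCode: string }
  | { kind: 'ambiguity'; comment: string }
  | { kind: 'risk'; detail: string };

const signals: Signal[] = [];

function emit(signal: Signal): void {
  // Stand-in for posting an issue comment or emitting an event.
  signals.push(signal);
}

emit({ kind: 'blocked_action', reasonCode: 'NO_WORK_ITEM' });
emit({ kind: 'risk', detail: 'scope creep: touched files outside the agreed slice' });
```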
The Hook Model
The cleanest design I found was a small set of predictable interception points.
```typescript
type HookContext = {
  projectId: string;
  sessionId: string;
  actor: 'human' | 'agent';
  toolName?: string;
  workItemId?: string;
  repo?: string;
};

type HookResult = {
  allow: boolean;
  contextToInject?: string[];
  warnings?: string[];
  reason?: string;
};

interface AiHooks {
  beforeSessionStart(ctx: HookContext): Promise<HookResult>;
  beforePrompt(ctx: HookContext, prompt: string): Promise<HookResult>;
  beforeToolCall(ctx: HookContext, input: unknown): Promise<HookResult>;
  afterToolCall(ctx: HookContext, output: unknown): Promise<void>;
  afterTaskComplete(ctx: HookContext, summary: string): Promise<void>;
}
```

This gives you enough leverage without turning the platform into a maze.
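One way to use these interception points is to wrap the tool runner so every call passes through `beforeToolCall` and `afterToolCall`. The types are repeated here so the sketch is self-contained, and the hooks implementation is a deliberately trivial assumption.

```typescript
// Sketch: a tool runner gated by before/after hooks.
type HookContext = { projectId: string; sessionId: string; actor: 'human' | 'agent'; toolName?: string };
type HookResult = { allow: boolean; reason?: string };

// Trivial hooks implementation: blocks any call without a named tool.
const hooks = {
  async beforeToolCall(ctx: HookContext, _input: unknown): Promise<HookResult> {
    return ctx.toolName ? { allow: true } : { allow: false, reason: 'unnamed tool' };
  },
  async afterToolCall(_ctx: HookContext, _output: unknown): Promise<void> {},
};

async function runTool(ctx: HookContext, input: unknown): Promise<string> {
  const gate = await hooks.beforeToolCall(ctx, input);
  if (!gate.allow) throw new Error(gate.reason ?? 'blocked');
  const output = `ran ${ctx.toolName}`; // stand-in for real tool execution
  await hooks.afterToolCall(ctx, output);
  return output;
}
```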
A Real Example: Blocking the Wrong Work
One of the most common failure modes in AI-assisted development is silent priority drift.
The AI sees a nearby bug. It looks easy. It fixes it. The fix is good. The tests pass.
But it was not the work the user asked for.
That sounds minor until you multiply it across dozens of sessions.
Here is the kind of hook policy I wanted:
```typescript
export async function beforeToolCall(ctx: HookContext): Promise<HookResult> {
  const workItem = await getWorkItem(ctx.workItemId);
  const activeBlockers = await getActiveBlockers(ctx.projectId);

  if (!workItem) {
    return {
      allow: false,
      reason: 'No linked work item for writable action.',
    };
  }

  if (activeBlockers.some(blocker => blocker.scope === workItem.scope)) {
    return {
      allow: false,
      reason: 'Work item is currently blocked by an active project signal.',
    };
  }

  return {
    allow: true,
    contextToInject: [
      `Outcome: ${workItem.outcome}`,
      `Acceptance criteria: ${workItem.acceptanceCriteria.join('; ')}`,
      `Recent decisions: ${workItem.decisions.join('; ')}`,
    ],
  };
}
```

The important part is not the TypeScript. It is the behavior.
The agent gets stopped before it burns time in the wrong direction, and it gets the missing context when the work is valid.
ai-hooks and MCP
MCP gives AI tools a standardized way to connect to external tools and data. That is the transport layer. ai-hooks is the control layer that decides how that access should be used.
MCP answers:
- what tools exist
- what resources can be read
- what actions can be executed
ai-hooks answers:
- should this session see those tools right now
- what context should be injected before the tool is used
- what policy must pass before the action is allowed
- what follow-up signals should be emitted afterward
That combination is what makes AI tooling feel production-capable instead of demo-capable.
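One concrete way the two layers compose is a wrapper around the MCP client. The `McpClient` shape below is a simplification for illustration, not the actual MCP SDK interface.

```typescript
// Sketch: ai-hooks as a control layer in front of an MCP-style tool call.
type McpClient = { callTool(name: string, args: unknown): Promise<unknown> };

function withGuardrails(client: McpClient, visibleTools: Set<string>): McpClient {
  return {
    async callTool(name, args) {
      // Policy gate: should this session see this tool right now?
      if (!visibleTools.has(name)) {
        throw new Error(`tool "${name}" is hidden from this session`);
      }
      const result = await client.callTool(name, args);
      // Follow-up signals would be emitted here (omitted in this sketch).
      return result;
    },
  };
}
```

MCP still owns transport and tool discovery; the wrapper only decides whether and how that access is used.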
Product-Aware Guardrails
The most important design decision was that the guardrails should be product-aware, not just repo-aware.
Repo-aware guardrails can say:
- do not edit `billing/`
- require tests before commit
- never push to `main`
Product-aware guardrails can say:
- do not change onboarding copy while the experiment is still running
- do not resolve this ticket until compliance has approved the workflow
- warn if the implementation conflicts with the decision made last week
- attach the top two business risks tied to this outcome before the task begins
This is the shift from "AI assistant" to "AI teammate with context."
Hard Lessons From Building It
1. Too Many Rules Feels Like Latency
Every hook adds cost.
If every tool call triggers six network round trips and a giant context assembly pipeline, the agent becomes slow and unpleasant. So the guardrails need to be opinionated and minimal.
My rule became:
intercept the smallest number of moments that give the highest control.
2. Warnings Are Not Enough for High-Risk Actions
I started with soft warnings for many operations. That was a mistake.
For some actions, warnings are fine.
- documentation changes
- low-risk refactors
- read-only analysis
For others, warnings are theater.
- deploys
- migrations
- cross-boundary edits
- issue state transitions
If an action is meaningfully risky, the hook needs to block it.
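The lesson reduces to a tier table: low-risk actions warn, high-risk actions block. The specific action names and tiers below are an illustrative assumption.

```typescript
// Mapping actions to enforcement levels: soft warnings for low-risk work,
// hard blocks for actions where a warning is theater.
type Enforcement = 'allow' | 'warn' | 'block';

const riskTiers: Record<string, Enforcement> = {
  edit_docs: 'allow',
  refactor: 'warn',
  run_migration: 'block',
  deploy: 'block',
  close_issue: 'block',
};

function enforcementFor(action: string): Enforcement {
  return riskTiers[action] ?? 'warn'; // unknown actions default to a warning
}
```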
3. Teams Need an Audit Trail
Once agents become useful, people ask the same question every time:
"Why did it do that?"
So every blocked action, injected context bundle, and emitted signal needs a durable record.
Not because the AI is untrustworthy by default, but because operational systems need explainability.
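A minimal audit record that can answer "why did it do that?" might look like the sketch below; the field names and in-memory trail are assumptions standing in for a real event store.

```typescript
// Sketch of a durable audit record for every guardrail decision.
type AuditEvent = {
  at: string; // ISO timestamp
  sessionId: string;
  action: string;
  outcome: 'allowed' | 'blocked';
  reason?: string;
  injectedContext?: string[];
};

const trail: AuditEvent[] = [];

function record(event: Omit<AuditEvent, 'at'>): void {
  trail.push({ at: new Date().toISOString(), ...event });
}

record({ sessionId: 's1', action: 'deploy', outcome: 'blocked', reason: 'no approval' });
```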
A Practical Configuration Shape
I found it useful to keep configuration boring and explicit.
```yaml
project: plannable
policies:
  writable_sessions_require_work_item: true
  block_deploys_without_approval: true
  require_tests_before_issue_close: true
context:
  inject_decisions: true
  inject_active_risks: true
  inject_acceptance_criteria: true
boundaries:
  protected_paths:
    - billing/**
    - auth/**
  allowed_tools:
    - read_file
    - edit_file
    - run_tests
    - create_pr
feedback:
  report_blockers: true
  create_risk_signals: true
  store_success_patterns: true
```

The system underneath can be sophisticated. The operating model should not be.
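Checking an edit against `protected_paths` can stay equally boring. This sketch handles only trailing `/**` patterns with a naive prefix check; a real implementation would use a proper glob matcher.

```typescript
// Naive check of a file path against protected patterns like "billing/**".
// Assumption: patterns are either exact prefixes or end in "/**".
function isProtected(path: string, patterns: string[]): boolean {
  return patterns.some(pattern => {
    // "billing/**" -> "billing/" so any file under the directory matches.
    const prefix = pattern.endsWith('/**') ? pattern.slice(0, -2) : pattern;
    return path.startsWith(prefix);
  });
}
```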
What Good Looks Like
When the system is working well, the developer barely notices the guardrails.
They notice that:
- the AI already understands the ticket
- risky actions require the right approvals
- blocked work is surfaced early
- decisions stop getting lost
- the AI leaves an audit trail humans can trust
That is the bar.
Not maximum autonomy.
Reliable autonomy.
The Bigger Takeaway
Most teams do not need a more powerful coding model first.
They need a better operating envelope around the models they already have.
That means:
- stronger context injection
- explicit action boundaries
- durable feedback loops
- product-aware policies instead of repo-only rules
That is what ai-hooks is for.
The future of AI development tooling is not just better code generation. It is better control systems around autonomous behavior.