Research
When an AI agent acts on your behalf, did it do what you actually meant?
Intent Alignment in Multi-Agent Systems
I built Clide, a production environment where 28 AI coding agents coordinate in real time: writing code, managing infrastructure, and dispatching tasks across two sites. Every action flows through a human-in-the-loop permission system that generates rich behavioral data.
The Problem
The initial question was simple: can we classify permission requests as safe or unsafe? Early results looked promising; a 2-feature decision tree hit 96% accuracy. But that number was misleading. With a 6.7% deny rate, a classifier that always approves already scores 93.3%. The "96% accuracy" was barely better than doing nothing.
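The baseline arithmetic is worth making explicit: a 93.3% always-approve score implies roughly a 6.7% deny rate. A minimal sketch, using synthetic labels mirroring those proportions rather than the real Clide data:

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# Synthetic label set with a ~6.7% deny rate (67 denies in 1000 decisions).
labels = ["deny"] * 67 + ["approve"] * 933

# Majority-class baseline: a "classifier" that approves everything.
baseline = ["approve"] * len(labels)

print(f"always-approve accuracy: {accuracy(baseline, labels):.1%}")
# → always-approve accuracy: 93.3%
```

Against that baseline, a 96% classifier recovers only a small fraction of the remaining error, which is why accuracy alone says little under heavy class imbalance.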
That failure exposed the real problem: safety classification is the wrong frame. The actual question is intent alignment: did the agent do what the human actually meant? An agent that runs safe but irrelevant commands is still failing. An agent that takes a risky but correct action might be succeeding.
Key Findings
- 87% of permission requests time out: humans can't keep up with the review queue, creating a bottleneck that degrades both safety and productivity
- Distinct behavioral archetypes emerge naturally (3–5 clusters, depending on granularity): coordinators, builders, researchers, specialists, and infrastructure operators, each with its own risk profile
- A regex-based classifier deployed in production matches 99.3% of 678 real human decisions; interpretable rules outperform complex models when the decision space is well understood
- Intent routing, not binary approval, is the actual mechanism needed — mapping operator intent to appropriate action classes
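The rule-based approach above can be sketched as an ordered list of regex rules with a fallback to human review. The patterns and categories here are illustrative assumptions, not the production Clide rule set:

```python
import re

# Ordered rules; first match wins. Patterns are invented examples,
# not the actual production rules.
RULES = [
    (re.compile(r"\brm\s+-rf\s+/"), "deny"),            # destructive filesystem ops
    (re.compile(r"\bgit\s+push\s+--force\b"), "deny"),  # history rewrites
    (re.compile(r"^(ls|cat|grep|git\s+status)\b"), "approve"),  # read-only commands
]

def classify(command: str) -> str:
    for pattern, verdict in RULES:
        if pattern.search(command):
            return verdict
    return "escalate"  # no rule matched: route to a human reviewer

print(classify("git status"))          # → approve
print(classify("rm -rf /tmp/build"))   # → deny
print(classify("terraform apply"))     # → escalate
```

The appeal of this shape is that every decision is traceable to one rule, which makes the 99.3% agreement figure auditable rather than a black-box score.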
Research Direction
The system generates continuous labeled data through a custom annotation tool (ClideClassify) where multiple reviewers label agent actions across dimensions of alignment, safety, and intent match. This produces ground-truth data for studying the gap between what a human requests and what an agent executes.
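One way such multi-dimensional labels might be structured, along with a simple inter-reviewer agreement measure. Field names and the 1–5 scale are assumptions for illustration, not ClideClassify's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ActionLabel:
    """Hypothetical per-review label record (not the real schema)."""
    action_id: str
    reviewer: str
    aligned: bool      # did the action serve the operator's stated goal?
    safe: bool         # was the action safe, regardless of relevance?
    intent_match: int  # 1-5: how closely execution matched the request

def agreement(labels: list[ActionLabel]) -> float:
    """Fraction of reviewer pairs that agree on the alignment label."""
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    if not pairs:
        return 1.0
    return sum(a.aligned == b.aligned for a, b in pairs) / len(pairs)
```

Separating `aligned` from `safe` is what lets the dataset capture the core distinction in the problem statement: safe-but-irrelevant actions and risky-but-correct ones get different labels.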
Current work focuses on: workflow chain analysis (multi-step intent tracking), operator archetype-aware oversight policies, and interpretable models for explainable approval decisions. The goal is not a better classifier — it's a better understanding of how human-agent collaboration actually works.
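An archetype-aware oversight policy could be as simple as a per-archetype risk threshold for auto-approval. The archetype names follow the findings above; every threshold and timeout here is an invented placeholder:

```python
# Hypothetical policy: riskier archetypes get lower auto-approve thresholds
# and shorter review-timeout budgets. All numbers are invented.
POLICY = {
    "coordinator":    {"auto_approve_risk_below": 0.2, "review_timeout_s": 300},
    "builder":        {"auto_approve_risk_below": 0.4, "review_timeout_s": 600},
    "researcher":     {"auto_approve_risk_below": 0.6, "review_timeout_s": 900},
    "infrastructure": {"auto_approve_risk_below": 0.1, "review_timeout_s": 120},
}

DEFAULT = {"auto_approve_risk_below": 0.0, "review_timeout_s": 60}

def route(archetype: str, risk_score: float) -> str:
    """Route a request to auto-approval or human review by archetype."""
    policy = POLICY.get(archetype, DEFAULT)
    if risk_score < policy["auto_approve_risk_below"]:
        return "auto-approve"
    return "human-review"
```

Tuning thresholds per archetype is one way to attack the 87% timeout problem: low-risk actions from well-understood archetypes stop competing for reviewer attention.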
This sits at the intersection of AI control, scalable oversight, and human-computer interaction. It's empirical, measurable, and I have a working production system generating new data daily.
Publications
- CS 522 — Data Management (ODU, Spring 2026) — Class project: permission classification and intent alignment analysis using Clide production data.
- MODSIM World 2020 — First-author research on data de-identification and synthetic data security, including analysis of re-identification attacks.
Code & Data
- Clide: containerized multi-agent execution environment with a human-in-the-loop permission system
- ClideClassify: multi-reviewer annotation tool for labeling agent actions across alignment dimensions
- Browser-based notebook for data exploration (SQL, Python, JS) with real-time collaboration