ericiscool.net — research.md

Research

When an AI agent acts on your behalf, did it do what you actually meant?

22,176 permission decisions
90,000+ tool calls
15,791 IRC messages
640+ commits
537 tickets
256 sessions
28 operators
47 features
Intent Alignment in Multi-Agent Systems

I built Clide, a production environment where 28 AI coding agents coordinate in real-time — writing code, managing infrastructure, and dispatching tasks across 2 sites. Every action flows through a human-in-the-loop permission system that generates rich behavioral data.
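A minimal sketch of how such a human-in-the-loop permission gate can work: agent actions block on a reviewer's verdict, and unanswered requests fall through to a default outcome after a timeout. All names here are illustrative assumptions, not Clide's actual API.

```python
import queue
import threading

class PermissionGate:
    """Hypothetical human-in-the-loop gate: agents block awaiting review."""

    def __init__(self, timeout_s=30.0, default="timeout"):
        self.timeout_s = timeout_s
        self.default = default
        self._pending = {}        # request id -> (agent, action, reply queue)
        self._lock = threading.Lock()
        self._next_id = 0

    def request(self, agent, action):
        """Block until a reviewer answers, or return the default on timeout."""
        reply = queue.Queue(maxsize=1)
        with self._lock:
            self._next_id += 1
            req_id = self._next_id
            self._pending[req_id] = (agent, action, reply)
        try:
            return reply.get(timeout=self.timeout_s)   # "approve" / "deny"
        except queue.Empty:
            return self.default                        # reviewer never answered
        finally:
            with self._lock:
                self._pending.pop(req_id, None)

    def decide(self, req_id, verdict):
        """Called from the reviewer's side to resolve a pending request."""
        with self._lock:
            entry = self._pending.get(req_id)
        if entry:
            entry[2].put(verdict)
```

The timeout default is the crux: whatever the system does when the human never answers dominates behavior once the review queue saturates.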

The Problem

The initial question was simple: can we classify permission requests as safe or unsafe? Early results looked promising — a 2-feature decision tree hit 96% accuracy. But that number was misleading. With a 0.3% deny rate, a classifier that always approves already scores 99.7%. The "96% accuracy" was actually worse than doing nothing.
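The imbalance problem is easy to demonstrate with illustrative numbers: raw accuracy rewards a classifier that never denies anything, while balanced accuracy (the mean of per-class recall) exposes it.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: chance level is 0.5 for two classes."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# 1,000 hypothetical decisions at a 0.3% deny rate
y_true = ["approve"] * 997 + ["deny"] * 3
always_approve = ["approve"] * 1000

print(accuracy(y_true, always_approve))           # 0.997 — looks great
print(balanced_accuracy(y_true, always_approve))  # 0.5 — chance level
```

Any metric that treats the 3 denies as interchangeable with the 997 approvals will look good while missing every decision that mattered.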

That failure pointed to the real question: safety classification is the wrong frame. The actual problem is intent alignment — did the agent do what the human actually meant? An agent that runs safe but irrelevant commands is still failing. An agent that takes a risky but correct action might be succeeding.

Key Findings

  • 87% of permission requests time out — humans can't keep up with the review queue, creating a bottleneck that degrades both safety and productivity
  • 3–5 behavioral archetypes emerge naturally: coordinators, builders, researchers, specialists, and infrastructure operators — each with a distinct risk profile
  • A regex-based classifier deployed in production matches 99.3% of 678 real human decisions — interpretable rules outperform complex models when the decision space is well-understood
  • Intent routing, not binary approval, is the actual mechanism needed — mapping operator intent to appropriate action classes
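The shape of an interpretable rule-based classifier like the one above can be sketched in a few lines: an ordered list of regex rules, first match wins, with a conservative default. The patterns and verdict labels below are hypothetical examples, not Clide's production rules.

```python
import re

# Ordered rules: most specific first, first match wins.
RULES = [
    (re.compile(r"\brm\s+-rf\s+/(?:\s|$)"),       "deny"),      # wiping root
    (re.compile(r"\b(?:shutdown|reboot)\b"),      "escalate"),  # infra impact
    (re.compile(r"^git\s+(?:status|diff|log)\b"), "approve"),   # read-only git
    (re.compile(r"^(?:ls|cat|grep|head|tail)\b"), "approve"),   # read-only fs
]

def classify(command: str, default: str = "escalate") -> str:
    """Return the verdict of the first matching rule, else a safe default."""
    for pattern, verdict in RULES:
        if pattern.search(command):
            return verdict
    return default
```

Every verdict traces to a single named rule, which is exactly the interpretability property a learned model gives up.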

Research Direction

The system generates continuous labeled data through a custom annotation tool (ClideClassify) where multiple reviewers label agent actions across dimensions of alignment, safety, and intent match. This produces ground-truth data for studying the gap between what a human requests and what an agent executes.
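With multiple reviewers labeling the same actions, inter-rater agreement is the natural sanity check on the ground truth. A common choice is Cohen's kappa, which corrects raw agreement for agreement expected by chance; the sketch below assumes two reviewers and illustrative label names.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two reviewers' label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick class c, summed over classes.
    expected = sum(
        count_a[c] * count_b[c] for c in set(labels_a) | set(labels_b)
    ) / n**2
    return (observed - expected) / (1 - expected)
```

Low kappa on a dimension like "intent match" would mean the dimension itself is underspecified, which is a finding in its own right.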

Current work focuses on: workflow chain analysis (multi-step intent tracking), operator archetype-aware oversight policies, and interpretable models for explainable approval decisions. The goal is not a better classifier — it's a better understanding of how human-agent collaboration actually works.

This sits at the intersection of AI control, scalable oversight, and human-computer interaction. It's empirical, measurable, and I have a working production system generating new data daily.

Publications
  • CS 522 — Data Management (ODU, Spring 2026) — Class project: permission classification and intent alignment analysis using Clide production data.
  • MODSIM World 2020 — First-author research on data de-identification and synthetic data security, including analysis of re-identification attacks.
Code & Data
Clide (shipping)

Containerized multi-agent execution environment with human-in-the-loop permission system.

↗ GitHub
ClideClassify (unreleased)

Multi-reviewer annotation tool for labeling agent actions across alignment dimensions.

ClideKitchen (unreleased)

Browser-based notebook for data exploration (SQL, Python, JS) with real-time collaboration.

© 2026 Eric White