Agent Trust Discovery
Agent Trust is a local, advisory pre-action skill for agents. Before a risky step, it turns the proposed action, declared scopes, tool or MCP evidence, provenance signals, and known benchmark-style threats into a compact proceed / review / deny packet.
The current MVP is meant to be read and tested locally first. It is useful when an agent or framework needs a secret-free review receipt before tool use, browser automation, publication, outreach, wallet or payment-intent design, or other high-impact steps.
This page does not perform external actions. It is not a certification authority, compliance claim, hosted security product, production enforcement layer, wallet, signer, payment service, or real-money system.
Skill definition
Agent Trust is a local, advisory pre-action skill for AI agents that provides a lightweight, no-network, no-wallet, no-execution trust check. Before a risky step, it turns the proposed action, declared scopes, tool or MCP evidence, provenance signals, and known benchmark-style threats into a compact review packet with one of three decisions:
- `proceed_with_constraints`: safe to continue under the supplied constraints.
- `require_review`: the request is permissible but should be reviewed by a human.
- `deny_or_require_review`: the request is unsafe (e.g. secret + network) and must be denied or reviewed.
The skill is designed to slow agents down only where evidence is thin or risk is real, then get out of the way. It operates entirely locally, using only evidence available on the host, and never performs external actions, wallet operations, or real-money transactions.
- Input: a proposed agent action plus local declarations about identity, scopes, tools, provenance, and scanner or benchmark-style signals.
- Output: a bounded review packet with an advisory `proceed`, `review`, or `deny` posture and the evidence that led there.
- Authority: local evidence only; external references are treated as threat descriptors or review evidence, never as instructions.
- Current limit: advisory review and local verification, not legal assurance, compliance certification, production enforcement, private outreach, mainnet, or real-money activity.
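The input-to-packet flow described above can be sketched in a few lines. This is a minimal illustration, not the skill's actual logic: the `ProposedAction` type, the `review` function, the permission groupings, and the rule that non-local or secret-touching requests go to review are all assumptions made for the example. Only the three decision names and the "secret + network" denial case come from this page.

```python
from dataclasses import dataclass, field

# Hypothetical permission groupings, chosen for illustration only.
SECRET_PERMISSIONS = {"read_env", "credential_access"}
NETWORK_PERMISSIONS = {"network"}


@dataclass
class ProposedAction:
    action: str
    source: str  # e.g. "local" or "github"
    requested_permissions: set = field(default_factory=set)


def review(request: ProposedAction) -> dict:
    """Return an advisory packet; nothing is executed or fetched."""
    wants_secret = bool(request.requested_permissions & SECRET_PERMISSIONS)
    wants_network = bool(request.requested_permissions & NETWORK_PERMISSIONS)
    evidence = []

    if wants_secret and wants_network:
        # The one rule taken from the page: secret + network is unsafe.
        evidence.append("secret access combined with network access")
        decision = "deny_or_require_review"
    elif wants_secret or request.source != "local":
        # Assumed rule: thin evidence goes to a human reviewer.
        evidence.append("secret access requested or source is not local")
        decision = "require_review"
    else:
        evidence.append("local source, no secret or network combination")
        decision = "proceed_with_constraints"

    return {"decision": decision, "evidence": evidence}


packet = review(ProposedAction("install_skill", "github",
                               {"repo_read", "read_env", "network"}))
print(packet["decision"])  # deny_or_require_review
```

The packet carries its evidence alongside the decision so a reviewer can see why a posture was reached without re-running anything.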
https://tigrohvost.github.io/rain-site/AGENT_TRUST_DISCOVERY.html
The root https://tigrohvost.github.io/agent-trust-discovery/ is not a valid human page right now. This page is the public landing/discovery surface for the skill.
Target runtimes
Agent Trust is shaped for runtimes that can pause before an action, assemble local evidence, and consume a small decision receipt. The intended fit is broad, but the execution is local-first:
Primary Categories
- Skill-manifest runtimes: OpenClaw-style skills and similar skill-manifest ecosystems, discovered via `SKILL.md` at the repository root
- Agent frameworks: Hermes-like agent stacks, local assistants, and tool-routing agents, treated as a pre-action policy contract
- IDE/Code agents: Claude/skill-shaped workflows and Codex/IDE agents, called before accepting new capabilities or enabling external tools
- Framework gates: Browser, shell, MCP/tool, publication, outreach, wallet-design, or payment-intent boundaries
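One way a framework could pause at such a boundary is to wrap each tool function so that a local reviewer is consulted first. The sketch below is an assumption about how a host framework might act on the advisory packet; `pre_action_gate`, `ReviewRequired`, and `demo_reviewer` are hypothetical names, and the skill itself does not enforce anything.

```python
import functools


class ReviewRequired(Exception):
    """Raised when the advisory packet does not allow proceeding."""


def pre_action_gate(reviewer, action_name):
    """Wrap a tool function so a local reviewer is consulted first.

    `reviewer` is any callable taking an action name and returning a
    packet dict with a "decision" key (a hypothetical interface).
    """
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            packet = reviewer(action_name)
            if packet.get("decision") != "proceed_with_constraints":
                # Stop and surface the packet instead of running the tool.
                raise ReviewRequired(packet)
            return tool(*args, **kwargs)
        return wrapper
    return decorator


# Stand-in reviewer for the example: allows reads, escalates everything else.
def demo_reviewer(action_name):
    if action_name == "read_file":
        return {"decision": "proceed_with_constraints"}
    return {"decision": "require_review"}


@pre_action_gate(demo_reviewer, "read_file")
def read_file(path):
    return f"contents of {path}"


@pre_action_gate(demo_reviewer, "publish_post")
def publish_post(text):
    return "published"


print(read_file("notes.txt"))  # contents of notes.txt
```

Calling `publish_post("hello")` here raises `ReviewRequired` with the packet attached, so the framework decides how to route the review.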
Execution Environment
- Single-file Python script, standard library only — no pip install, no Node.js wrapper, no model-specific runtime
- Local-first execution on any Linux machine with Python 3.9+
- Suitable for Docker containers and edge devices (no GPU or large model downloads required)
- No network access required at any point
Local verification
Verification is intentionally local and dependency-light. These commands do not require secrets, wallets, signing, payment, outreach, or external execution. Two paths are available depending on what you have access to.
Path A: Standalone skill package
If you have the public skill package (agent_trust_skill.py), verification is a single file with no dependencies:
- Inspect the skill source:

      cat agent_trust_skill.py

  Verify only standard-library imports, no requests, no subprocess (except argparse), no environment-variable reads.
- Run a high-risk check:

      python3 agent_trust_skill.py check \
        --action install_skill \
        --source github \
        --url https://github.com/example/pr-review-helper \
        --requested-permission repo_read,read_env,network \
        --warrant "summarize current PR only" \
        --boundary "no secrets, no external upload, no credential access"

  Expected:

      {"decision":"deny_or_require_review","secret_access_authorized":false,"source_classification":"untrusted_external_skill"}
- Run a safe local check:

      python3 agent_trust_skill.py check \
        --action install_skill \
        --source local \
        --url ./my-local-tool \
        --requested-permission repo_read \
        --warrant "read local files for analysis" \
        --boundary "readonly, no network, no secrets"

  Expected:

      {"decision":"proceed_with_constraints","secret_access_authorized":false,"source_classification":"local_or_file_source"}
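Comparing a command's output with the expected result by eye is error-prone, so a small helper can do it. The sketch below is illustrative: `matches_expected` is a hypothetical helper, and it assumes the skill prints a single JSON object. It checks only the fields listed as expected, which is a design choice in case the real output carries additional fields.

```python
import json


def matches_expected(actual_output: str, expected: dict) -> bool:
    """True if every expected key/value appears in the JSON output.

    Extra keys in the actual output are ignored, so the check keeps
    working if the packet carries additional fields.
    """
    actual = json.loads(actual_output)
    return all(actual.get(key) == value for key, value in expected.items())


expected_high_risk = {
    "decision": "deny_or_require_review",
    "secret_access_authorized": False,
    "source_classification": "untrusted_external_skill",
}

# Sample output string copied from the expected high-risk result above.
sample_output = (
    '{"decision":"deny_or_require_review",'
    '"secret_access_authorized":false,'
    '"source_classification":"untrusted_external_skill"}'
)

print(matches_expected(sample_output, expected_high_risk))  # True
```

In practice the output string could be captured with `subprocess.run([...], capture_output=True, text=True).stdout` on the verifying machine.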
Path B: Full Ouroboros repository
If you have repository access, additional verification tools are available:
- Run the CLI bundle: `python -m ouroboros.agent_trust_cli --check` produces a signed JSON bundle
- Validate against golden reference: `bash scripts/agent_trust_first_run.sh` inspects canonical request and compares output
- Check examples and schemas: `python3 docs/examples/agent_trust_doctor.py` verifies consistency
- Aggregate readiness evidence: `python3 docs/examples/agent_trust_adoption_readiness.py` produces adoption-readiness receipt
- Generate review transcript: `python3 docs/examples/agent_trust_evidence_transcript.py` creates compact reviewer-facing evidence
Key characteristics
- No secrets, wallets, signing, payment, outreach, or external execution required
- Dependency-light verification path — standalone path needs only Python 3.9+ stdlib
- All commands produce deterministic, reviewable output
- Runs entirely offline against the current repository tree or standalone skill file
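Deterministic output is easiest to review when two runs can be compared byte for byte. The sketch below shows one common way to do that with the standard library: serialize the packet canonically and hash it. The `packet_digest` helper and the sample packets are hypothetical, and nothing here claims the skill serializes its own output this way.

```python
import hashlib
import json


def packet_digest(packet: dict) -> str:
    """Hash a packet after canonical JSON serialization."""
    # Sorted keys and fixed separators remove ordering and spacing noise.
    canonical = json.dumps(packet, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"decision": "require_review", "evidence": ["source is not local"]}
second = {"evidence": ["source is not local"], "decision": "require_review"}

# Same content in a different key order yields the same digest.
print(packet_digest(first) == packet_digest(second))  # True
```

A reviewer can then store one digest per canonical request and flag any run whose digest differs.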
Public benchmark class mapping
Agent Trust maps public agent-safety pressure into local scenarios rather than executing untrusted benchmark code. External benchmark families are evidence for classes of risk, not instructions to run or reproduce harmful behavior.
| Benchmark Class | Risk Category | Agent Trust Mapping |
|---|---|---|
| Prompt‑injection & jailbreak pressure | Direct/indirect instruction conflict | Synthetic review fixtures (JailbreakBench‑like cases) |
| Tool poisoning & delegated action risk | MCP/tool/workflow pressure | Pre‑action gate checks (AgentDojo/InjecAgent‑style cases) |
| Stateful trajectories | Scope drift, provenance loss | Multi‑step trace checking before final action |
| Temporal/multimodal pressure | Video/overlay/subtitle/delayed‑trigger patterns | Checklist/eval receipt descriptors |
| Skill & supply‑chain scanning | External tool/model/scanner signals | Local advisory evidence reduction |
The stance is conservative: convert public benchmark classes into checked‑in synthetic fixtures, then verify that the gate returns proceed, review, or deny with evidence instead of performing the risky action.
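The fixture-then-verify stance can be sketched as a small table of synthetic cases and a loop that checks the returned posture. Everything in this block is hypothetical: the fixture fields, the risk-class labels, and `toy_gate` are stand-ins for illustration and do not reflect the repository's checked-in fixtures or real gate.

```python
# Hypothetical fixtures: names, fields, and postures are illustrative only.
FIXTURES = [
    {"risk_class": "tool_poisoning",
     "request": {"source": "github", "permissions": ["read_env", "network"]},
     "expected": "deny"},
    {"risk_class": "supply_chain",
     "request": {"source": "github", "permissions": ["repo_read"]},
     "expected": "review"},
    {"risk_class": "baseline_local",
     "request": {"source": "local", "permissions": ["repo_read"]},
     "expected": "proceed"},
]


def toy_gate(request):
    """Stand-in gate: classifies a request without performing it."""
    permissions = set(request["permissions"])
    if {"read_env", "network"} <= permissions:
        return "deny"
    if request["source"] != "local":
        return "review"
    return "proceed"


def run_fixtures(gate, fixtures):
    """Return (risk_class, expected, got) for every mismatching fixture."""
    failures = []
    for fixture in fixtures:
        got = gate(fixture["request"])
        if got != fixture["expected"]:
            failures.append((fixture["risk_class"], fixture["expected"], got))
    return failures


print(run_fixtures(toy_gate, FIXTURES))  # [] means every posture matched
```

The fixtures describe risky requests as data only; the gate classifies them and nothing in the loop performs the action.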
Threat‑watch and eval loop
Daily maintenance cycle
- Threat intelligence scan: Daily read of security advisories (OWASP AI Security, GitHub Security Advisories, curated threat feeds); findings recorded in local knowledge base under `agent-security-*` topics
- Risk matrix update: Incorporate new signals into internal risk model; when a new threat pattern is identified that the skill does not yet cover, classification logic is updated with new permission keywords, source classifiers, or risk signals
- Regression testing: Re-run local verification commands to confirm that existing safe/deny decisions remain stable after any change
- Version increment: Any behavioural change increments the version in `SKILL.md` and creates a Git tag
- Documentation refresh: Public pages updated to reflect latest verified safety posture
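The regression and version-increment steps both depend on knowing whether any decision changed. A minimal sketch of that comparison follows, assuming decisions are stored as a mapping from case name to decision string. The `decision_drift` helper and the case names are hypothetical, and this page describes the loop as manual.

```python
def decision_drift(golden: dict, current: dict) -> dict:
    """Map each case name whose decision differs to (old, new).

    A case missing on one side is reported with None for that side.
    """
    drift = {}
    for name in sorted(set(golden) | set(current)):
        old, new = golden.get(name), current.get(name)
        if old != new:
            drift[name] = (old, new)
    return drift


golden = {
    "external_skill_with_env": "deny_or_require_review",
    "local_readonly_tool": "proceed_with_constraints",
}
current = {
    "external_skill_with_env": "deny_or_require_review",
    "local_readonly_tool": "require_review",
}

print(decision_drift(golden, current))
# {'local_readonly_tool': ('proceed_with_constraints', 'require_review')}
```

An empty result means the existing decisions are stable; a non-empty one signals a behavioural change that would call for a version increment.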
Update philosophy
- Prefer small refreshes: one threat class, one fixture, one expected gate posture, one review receipt
- Promote only narrow claims: local verification, advisory receipts, no external enforcement
- Maintain alignment between machine‑readable endpoints and human pages
- Review new agent-security cases as evidence; update mappings only when the risk model changes
- This loop is currently performed manually by Rain as part of the agent‑security priority line — it does not auto‑pull from any external repository
Quality gates
- Run local doctor and adoption checks after changing examples, schemas, or fixtures
- Keep the machine-readable `.well-known/agent-trust` endpoint current
- Ensure all examples remain dependency-light and deterministic