Agent Trust Discovery

Agent Trust is a local, advisory pre-action skill for agents. Before a risky step, it turns the proposed action, declared scopes, tool or MCP evidence, provenance signals, and known benchmark-style threats into a compact proceed / review / deny packet.

The current MVP is meant to be read and tested locally first. It is useful when an agent or framework needs a secret-free review receipt before tool use, browser automation, publication, outreach, wallet or payment-intent design, or other high-impact steps.

This page does not perform external actions. It is not a certification authority, compliance claim, hosted security product, production enforcement layer, wallet, signer, payment service, or real-money system.

Skill definition

Agent Trust is a lightweight pre-action check for AI agents that runs without network access, wallet access, or execution rights. Its output is advisory, not a guarantee. Before a risky step, it reduces the proposed action, declared scopes, tool or MCP evidence, provenance signals, and known benchmark-style threats to a compact review packet carrying one of three decisions:

proceed_with_constraints: the action may continue under the supplied constraints.
require_review: the request is permissible but should be reviewed by a human first.
deny_or_require_review: the request is unsafe (e.g. secret access combined with network access) and must be denied or escalated for review.

The skill is designed to slow agents down only where evidence is thin or risk is real, then get out of the way. It operates entirely locally, using only evidence available on the host, and never performs external actions, wallet operations, or real-money transactions.

Input: a proposed agent action plus local declarations about identity, scopes, tools, provenance, and scanner or benchmark-style signals.
Output: a bounded review packet with an advisory proceed, review, or deny posture and the evidence that led there.
Authority: local evidence only; external references are treated as threat descriptors or review evidence, never as instructions.
Current limit: advisory review and local verification, not legal assurance, compliance certification, production enforcement, private outreach, mainnet, or real-money activity.
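
To make the input/output contract above concrete, here is a minimal sketch of what a review packet could look like as a Python data structure. The three decision strings come from this page. The class name ReviewPacket and the fields action, evidence, and constraints are illustrative assumptions, not the skill's actual schema.

```python
from dataclasses import dataclass, asdict
from enum import Enum
import json


class Decision(str, Enum):
    # The three postures named on this page.
    PROCEED = "proceed_with_constraints"
    REVIEW = "require_review"
    DENY = "deny_or_require_review"


@dataclass(frozen=True)
class ReviewPacket:
    # Field names are illustrative assumptions, not the skill's schema.
    action: str
    decision: Decision
    evidence: tuple = ()
    constraints: tuple = ()

    def to_json(self) -> str:
        body = asdict(self)
        body["decision"] = self.decision.value
        # Sorted keys and fixed separators keep the receipt byte-stable.
        return json.dumps(body, sort_keys=True, separators=(",", ":"))


packet = ReviewPacket(
    action="install_skill",
    decision=Decision.DENY,
    evidence=("requested read_env together with network",),
)
print(packet.to_json())
```

Serializing with sorted keys is one simple way to keep a receipt deterministic, so that two reviewers comparing packets see identical bytes for identical evidence.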

Human-readable page:
https://tigrohvost.github.io/rain-site/AGENT_TRUST_DISCOVERY.html

The root URL https://tigrohvost.github.io/agent-trust-discovery/ does not currently serve a human-readable page. The page linked above is the public landing and discovery surface for the skill.

Target runtimes

Agent Trust is shaped for runtimes that can pause before an action, assemble local evidence, and consume a small decision receipt. The intended fit is broad, but the execution is local-first:

Primary categories

Skill-manifest runtimes: OpenClaw-style skills and similar skill-manifest ecosystems, which discover the skill via SKILL.md at the repository root
Agent frameworks: Hermes-like agent stacks, local assistants, and tool-routing agents, which treat the skill as a pre-action policy contract
IDE/Code agents: Claude/skill-shaped workflows and Codex/IDE agents, which call the skill before accepting new capabilities or enabling external tools
Framework gates: browser, shell, MCP/tool, publication, outreach, wallet-design, or payment-intent boundaries

Execution environment

Single-file Python script, standard library only — no pip install, no Node.js wrapper, no model-specific runtime
Local-first execution on any Linux machine with Python 3.9+
Suitable for Docker containers and edge devices (no GPU or large model downloads required)
No network access required at any point

Local verification

Verification is intentionally local and dependency-light. These commands do not require secrets, wallets, signing, payment, outreach, or external execution. Two paths are available depending on what you have access to.

Path A: Standalone skill package

If you have the public skill package (agent_trust_skill.py), verification is a single file with no dependencies:

Inspect the skill source: cat agent_trust_skill.py, then verify that it uses only standard-library imports, with no requests, no subprocess, and no environment-variable reads (argparse is expected for command-line parsing).
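
The manual inspection above can be backed by a small static check. The following is a sketch using Python's ast module. The deny-list FLAGGED_MODULES and the function name audit_source are assumptions chosen for illustration, and the check is deliberately shallow: for example, it will not catch `from os import environ`.

```python
import ast

# Assumption: this deny-list is illustrative; extend it to match your own policy.
FLAGGED_MODULES = {"requests", "subprocess", "socket", "urllib", "http"}


def audit_source(source: str) -> list:
    """Return sorted (line, finding) pairs for flagged imports and env reads."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] in FLAGGED_MODULES:
                findings.append((node.lineno, f"imports {name}"))
        # os.environ and os.getenv both appear as attribute access on `os`.
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "os"
            and node.attr in {"environ", "getenv"}
        ):
            findings.append((node.lineno, f"reads os.{node.attr}"))
    return sorted(findings)


sample = "import argparse\nimport subprocess\nimport os\ntoken = os.environ['TOKEN']\n"
for line, finding in audit_source(sample):
    print(f"line {line}: {finding}")
```

To audit the real file, pass its contents instead of the sample string, for instance audit_source(open("agent_trust_skill.py").read()). An empty result is supporting evidence only, not proof that the file is safe.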

Run a high-risk check:

python3 agent_trust_skill.py check \
  --action install_skill \
  --source github \
  --url https://github.com/example/pr-review-helper \
  --requested-permission repo_read,read_env,network \
  --warrant "summarize current PR only" \
  --boundary "no secrets, no external upload, no credential access"

Expected:

{"decision":"deny_or_require_review","secret_access_authorized":false,"source_classification":"untrusted_external_skill"}

Run a safe local check:

python3 agent_trust_skill.py check \
  --action install_skill \
  --source local \
  --url ./my-local-tool \
  --requested-permission repo_read \
  --warrant "read local files for analysis" \
  --boundary "readonly, no network, no secrets"

Expected:

{"decision":"proceed_with_constraints","secret_access_authorized":false,"source_classification":"local_or_file_source"}
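
For readers who want intuition for why the two checks above differ, here is a hypothetical decision rule that happens to reproduce both expected receipts. It is not the logic of agent_trust_skill.py, which is not shown on this page. The keyword sets, the function name check, and the middle require_review branch are all assumptions.

```python
# Hypothetical keyword sets; the skill's real classifiers are not shown on this page.
SECRET_PERMISSIONS = {"read_env", "read_secrets"}
NETWORK_PERMISSIONS = {"network"}


def check(source: str, requested_permissions: str) -> dict:
    """Toy gate: classify the source, then combine secret and network signals."""
    permissions = {p.strip() for p in requested_permissions.split(",") if p.strip()}
    external = source != "local"
    wants_secret = bool(permissions & SECRET_PERMISSIONS)
    wants_network = bool(permissions & NETWORK_PERMISSIONS)

    if wants_secret and wants_network:
        # Secret access plus network access is the page's example of an unsafe request.
        decision = "deny_or_require_review"
    elif external or wants_secret or wants_network:
        # Assumed middle tier: one risk signal alone asks for human review.
        decision = "require_review"
    else:
        decision = "proceed_with_constraints"

    return {
        "decision": decision,
        # False in both documented receipts; this sketch never sets it to True.
        "secret_access_authorized": False,
        "source_classification": (
            "untrusted_external_skill" if external else "local_or_file_source"
        ),
    }


print(check("github", "repo_read,read_env,network"))
print(check("local", "repo_read"))
```

The point of the sketch is the shape of the rule: the source classification and the permission combination are evaluated separately, and the strictest signal determines the posture.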

Path B: Full Ouroboros repository

If you have repository access, additional verification tools are available:

Run the CLI bundle: python -m ouroboros.agent_trust_cli --check (produces a signed JSON bundle)
Validate against the golden reference: bash scripts/agent_trust_first_run.sh (inspects the canonical request and compares the output)
Check examples and schemas: python3 docs/examples/agent_trust_doctor.py (verifies consistency)
Aggregate readiness evidence: python3 docs/examples/agent_trust_adoption_readiness.py (produces an adoption-readiness receipt)
Generate a review transcript: python3 docs/examples/agent_trust_evidence_transcript.py (creates compact reviewer-facing evidence)

Key characteristics

No secrets, wallets, signing, payment, outreach, or external execution required
Dependency-light verification path — standalone path needs only Python 3.9+ stdlib
All commands produce deterministic, reviewable output
Runs entirely offline against the current repository tree or standalone skill file

Public benchmark class mapping

Agent Trust maps public agent-safety pressure into local scenarios rather than executing untrusted benchmark code. External benchmark families are evidence for classes of risk, not instructions to run or reproduce harmful behavior.

Benchmark Class	Risk Category	Agent Trust Mapping
Prompt‑injection & jailbreak pressure	Direct/indirect instruction conflict	Synthetic review fixtures (JailbreakBench‑like cases)
Tool poisoning & delegated action risk	MCP/tool/workflow pressure	Pre‑action gate checks (AgentDojo/InjecAgent‑style cases)
Stateful trajectories	Scope drift, provenance loss	Multi‑step trace checking before final action
Temporal/multimodal pressure	Video/overlay/subtitle/delayed‑trigger patterns	Checklist/eval receipt descriptors
Skill & supply‑chain scanning	External tool/model/scanner signals	Local advisory evidence reduction

The stance is conservative: convert public benchmark classes into checked‑in synthetic fixtures, then verify that the gate returns proceed, review, or deny with evidence instead of performing the risky action.
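
As one illustration of the "stateful trajectories" row, a multi-step trace check for scope drift might look like the sketch below. The function name first_scope_drift, the trace format, and the mapping from drift to require_review are assumptions made for this example. The scope names are borrowed from the verification commands on this page.

```python
from typing import Iterable, Optional, Tuple


def first_scope_drift(
    declared_scopes: Iterable[str],
    trace: Iterable[Tuple[str, Iterable[str]]],
) -> Optional[Tuple[int, str, list]]:
    """Return (step index, step name, undeclared scopes) for the first drifting step."""
    allowed = set(declared_scopes)
    for index, (step, used_scopes) in enumerate(trace):
        extra = sorted(set(used_scopes) - allowed)
        if extra:
            return index, step, extra
    return None


# Synthetic fixture: the last step quietly adds a scope that was never declared.
trace = [
    ("read_pull_request", ["repo_read"]),
    ("summarize_diff", ["repo_read"]),
    ("upload_summary", ["repo_read", "network"]),
]
drift = first_scope_drift(["repo_read"], trace)
posture = "require_review" if drift else "proceed_with_constraints"
print(drift, posture)
```

The fixture is checked before the final action runs, so the gate reports the drifting step as evidence instead of performing the upload.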

Threat‑watch and eval loop

Daily maintenance cycle

Threat intelligence scan: Daily read of security advisories (OWASP AI Security, GitHub Security Advisories, curated threat feeds) — findings recorded in local knowledge base under agent-security-* topics
Risk matrix update: Incorporate new signals into the internal risk model. When a new threat pattern is identified that the skill does not yet cover, the classification logic is updated with new permission keywords, source classifiers, or risk signals
Regression testing: Re-run local verification commands to confirm that existing safe/deny decisions remain stable after any change
Version increment: Any behavioural change increments the version in SKILL.md and creates a Git tag
Documentation refresh: Public pages updated to reflect latest verified safety posture
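
The regression-testing step in the cycle above amounts to comparing fresh receipts against pinned expectations. A minimal sketch, assuming receipts are flat JSON objects like the expected outputs on this page; the helper name decision_drift is hypothetical.

```python
import json


def decision_drift(pinned: dict, current_output: str) -> dict:
    """Compare pinned expectations with a fresh JSON receipt; return changed keys."""
    current = json.loads(current_output)
    return {
        key: {"pinned": value, "current": current.get(key)}
        for key, value in pinned.items()
        if current.get(key) != value
    }


# Pinned from the high-risk check documented on this page.
pinned = {
    "decision": "deny_or_require_review",
    "secret_access_authorized": False,
    "source_classification": "untrusted_external_skill",
}

# Stand-in for a fresh receipt captured from the check command's stdout.
fresh = '{"decision":"require_review","secret_access_authorized":false,"source_classification":"untrusted_external_skill"}'
print(decision_drift(pinned, fresh))
```

An empty dictionary means the decision is stable. Any non-empty result is a behavioural change, which under the cycle above would call for a version increment.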

Update philosophy

Prefer small refreshes: one threat class, one fixture, one expected gate posture, one review receipt
Promote only narrow claims: local verification, advisory receipts, no external enforcement
Maintain alignment between machine‑readable endpoints and human pages
Review new agent-security cases as evidence; update mappings only when the risk model changes
This loop is currently performed manually by Rain as part of the agent‑security priority line — it does not auto‑pull from any external repository

Quality gates

Run local doctor and adoption checks after changing examples, schemas, or fixtures
Keep the machine-readable .well-known/agent-trust endpoint current
Ensure all examples remain dependency‑light and deterministic