Agent Trust Discovery

Agent Trust is a local, advisory pre-action skill for agents. Before a risky step, it turns the proposed action, declared scopes, tool or MCP evidence, provenance signals, and known benchmark-style threats into a compact proceed / review / deny packet.

The current MVP is meant to be read and tested locally first. It is useful when an agent or framework needs a secret-free review receipt before tool use, browser automation, publication, outreach, wallet or payment-intent design, or other high-impact steps.

This page does not perform external actions. It is not a certification authority, compliance claim, hosted security product, production enforcement layer, wallet, signer, payment service, or real-money system.

Skill definition

Agent Trust is a local, advisory pre-action skill for AI agents that provides a lightweight, no-network, no-wallet, no-execution trust check. Before a risky step, it turns proposed actions, declared scopes, tool or MCP evidence, provenance signals, and known benchmark-style threats into a compact review packet with one of three decisions: proceed, review, or deny.

The skill is designed to slow agents down only where evidence is thin or risk is real, then get out of the way. It operates entirely locally, using only evidence available on the host, and never performs external actions, wallet operations, or real-money transactions.
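As a rough illustration of the packet idea, the sketch below models a three-way decision computed from local inputs only. Every name in it (ReviewPacket, HIGH_RISK_SCOPES, review) is hypothetical, and the rules are placeholders rather than the skill's actual schema or logic.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    REVIEW = "review"
    DENY = "deny"


@dataclass
class ReviewPacket:
    """Hypothetical packet shape; not the skill's real schema."""
    action: str
    decision: Decision
    reasons: list[str] = field(default_factory=list)


# Assumption: scope names treated as high-risk for this illustration only.
HIGH_RISK_SCOPES = {"read_env", "wallet", "payment"}


def review(action: str, scopes: set[str], evidence: list[str]) -> ReviewPacket:
    """Build a packet from local inputs; nothing is executed or fetched."""
    risky = sorted(scopes & HIGH_RISK_SCOPES)
    if risky:
        # Real risk: deny and say which scope triggered it.
        return ReviewPacket(action, Decision.DENY, [f"high-risk scope: {s}" for s in risky])
    if not evidence:
        # Thin evidence: slow down and ask for review.
        return ReviewPacket(action, Decision.REVIEW, ["no supporting evidence supplied"])
    # Otherwise get out of the way, keeping the evidence in the receipt.
    return ReviewPacket(action, Decision.PROCEED, ["scopes within bounds", *evidence])
```

The ordering matters in this sketch: a risky scope is checked before missing evidence, so a request that is both risky and unsupported is denied rather than merely sent to review.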

Use this page for humans:
https://tigrohvost.github.io/rain-site/AGENT_TRUST_DISCOVERY.html

The root https://tigrohvost.github.io/agent-trust-discovery/ is not a valid human page right now. This page is the public landing/discovery surface for the skill.

Target runtimes

Agent Trust is shaped for runtimes that can pause before an action, assemble local evidence, and consume a small decision receipt. The intended fit is broad, but the execution is local-first:

Primary Categories

Execution Environment

Local verification

Verification is intentionally local and dependency-light. These commands do not require secrets, wallets, signing, payment, outreach, or external execution. Two paths are available depending on what you have access to.

Path A: Standalone skill package

If you have the public skill package (agent_trust_skill.py), verification is a single file with no dependencies:

  1. Inspect the skill source: cat agent_trust_skill.py. Verify that it uses only standard-library imports (such as argparse), with no requests, no subprocess, and no environment-variable reads.
  2. Run a high-risk check:
    python3 agent_trust_skill.py check \
      --action install_skill \
      --source github \
      --url https://github.com/example/pr-review-helper \
      --requested-permission repo_read,read_env,network \
      --warrant "summarize current PR only" \
      --boundary "no secrets, no external upload, no credential access"
    Expected: {"decision":"deny_or_require_review","secret_access_authorized":false,"source_classification":"untrusted_external_skill"}
  3. Run a safe local check:
    python3 agent_trust_skill.py check \
      --action install_skill \
      --source local \
      --url ./my-local-tool \
      --requested-permission repo_read \
      --warrant "read local files for analysis" \
      --boundary "readonly, no network, no secrets"
    Expected: {"decision":"proceed_with_constraints","secret_access_authorized":false,"source_classification":"local_or_file_source"}
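The manual source inspection in step 1 can be partly automated with the standard-library ast module. The sketch below is an assumption about what such a check could look like: the FORBIDDEN_MODULES set and the inspect_source helper are hypothetical, and the check is static and shallow (for example, it does not catch `from os import environ`).

```python
import ast

# Assumption: modules this illustration treats as disallowed for a
# no-network, no-execution skill. The real review criteria may differ.
FORBIDDEN_MODULES = {"requests", "subprocess", "socket", "urllib"}


def inspect_source(source: str) -> dict:
    """Statically list top-level imported modules and flag os.environ / os.getenv use."""
    tree = ast.parse(source)
    imports = set()
    env_reads = False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module.split(".")[0])
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            if node.value.id == "os" and node.attr in {"environ", "getenv"}:
                env_reads = True
    return {
        "imports": sorted(imports),
        "forbidden": sorted(imports & FORBIDDEN_MODULES),
        "env_reads": env_reads,
    }


# Example on an in-memory snippet; nothing is imported or executed.
sample = "import argparse\nimport json\nimport os\nprint(os.environ.get('HOME'))\n"
report = inspect_source(sample)
```

To apply it to the package, read agent_trust_skill.py as text and pass the contents to inspect_source. The file is parsed, never run, which keeps the check consistent with the no-execution stance.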

Path B: Full Ouroboros repository

If you have repository access, additional verification tools are available:

  1. Run the CLI bundle: python -m ouroboros.agent_trust_cli --check produces a signed JSON bundle.
  2. Validate against the golden reference: bash scripts/agent_trust_first_run.sh inspects the canonical request and compares the output.
  3. Check examples and schemas: python3 docs/examples/agent_trust_doctor.py verifies consistency.
  4. Aggregate readiness evidence: python3 docs/examples/agent_trust_adoption_readiness.py produces an adoption-readiness receipt.
  5. Generate a review transcript: python3 docs/examples/agent_trust_evidence_transcript.py creates compact, reviewer-facing evidence.

Key characteristics

Public benchmark class mapping

Agent Trust maps public agent-safety pressure into local scenarios rather than executing untrusted benchmark code. External benchmark families are evidence for classes of risk, not instructions to run or reproduce harmful behavior.

| Benchmark Class | Risk Category | Agent Trust Mapping |
| --- | --- | --- |
| Prompt‑injection & jailbreak pressure | Direct/indirect instruction conflict | Synthetic review fixtures (JailbreakBench‑like cases) |
| Tool poisoning & delegated action risk | MCP/tool/workflow pressure | Pre‑action gate checks (AgentDojo/InjecAgent‑style cases) |
| Stateful trajectories | Scope drift, provenance loss | Multi‑step trace checking before final action |
| Temporal/multimodal pressure | Video/overlay/subtitle/delayed‑trigger patterns | Checklist/eval receipt descriptors |
| Skill & supply‑chain scanning | External tool/model/scanner signals | Local advisory evidence reduction |

The stance is conservative: convert public benchmark classes into checked‑in synthetic fixtures, then verify that the gate returns proceed, review, or deny with evidence instead of performing the risky action.
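A minimal sketch of that fixture-driven check follows. The fixture fields, the class labels, and stand_in_gate are all illustrative assumptions. The toy gate exists only to exercise the harness and is not Agent Trust's logic. The fixtures describe pressure as plain flags and never carry live payloads.

```python
ALLOWED_DECISIONS = {"proceed", "review", "deny"}

# Synthetic fixtures: descriptions of pressure, not runnable attacks.
FIXTURES = [
    {"benchmark_class": "prompt_injection", "instruction_conflict": True,
     "scope_drift": False, "expected": "deny"},
    {"benchmark_class": "stateful_trajectory", "instruction_conflict": False,
     "scope_drift": True, "expected": "review"},
    {"benchmark_class": "baseline", "instruction_conflict": False,
     "scope_drift": False, "expected": "proceed"},
]


def stand_in_gate(case: dict) -> dict:
    """Toy gate (hypothetical) that returns a decision plus evidence."""
    if case["instruction_conflict"]:
        return {"decision": "deny", "evidence": ["instruction conflicts with declared warrant"]}
    if case["scope_drift"]:
        return {"decision": "review", "evidence": ["scope drifted across steps"]}
    return {"decision": "proceed", "evidence": ["no pressure signals in fixture"]}


def run_fixtures(gate, fixtures) -> list[str]:
    """Return failure messages; an empty list means every fixture passed."""
    failures = []
    for case in fixtures:
        result = gate(case)
        name = case["benchmark_class"]
        if result["decision"] not in ALLOWED_DECISIONS:
            failures.append(f"{name}: unknown decision {result['decision']!r}")
        elif result["decision"] != case["expected"]:
            failures.append(f"{name}: expected {case['expected']}, got {result['decision']}")
        elif not result["evidence"]:
            failures.append(f"{name}: decision returned without evidence")
    return failures
```

Calling run_fixtures(stand_in_gate, FIXTURES) returns an empty list for this toy gate. A gate that always answers "proceed" with no evidence fails every fixture, which is the property the harness is meant to catch.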

Threat‑watch and eval loop

Daily maintenance cycle

  1. Threat intelligence scan: read security advisories daily (OWASP AI Security, GitHub Security Advisories, curated threat feeds) and record findings in the local knowledge base under agent-security-* topics.
  2. Risk matrix update: incorporate new signals into the internal risk model. When a new threat pattern is identified that the skill does not yet cover, update the classification logic with new permission keywords, source classifiers, or risk signals.
  3. Regression testing: re-run the local verification commands to confirm that existing safe/deny decisions remain stable after any change.
  4. Version increment: any behavioural change increments the version in SKILL.md and creates a Git tag.
  5. Documentation refresh: update public pages to reflect the latest verified safety posture.
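The regression step can be sketched as a comparison between stored golden decisions and freshly captured outputs. The golden values below are copied from the two expected outputs in the local verification section. The case names and the find_drift helper are hypothetical, and capturing the CLI output is left out, so the function takes the JSON strings as input.

```python
import json

# Golden fields from the documented expected outputs; case names are assumptions.
GOLDEN = {
    "high_risk_github_install": {
        "decision": "deny_or_require_review",
        "secret_access_authorized": False,
        "source_classification": "untrusted_external_skill",
    },
    "safe_local_install": {
        "decision": "proceed_with_constraints",
        "secret_access_authorized": False,
        "source_classification": "local_or_file_source",
    },
}


def find_drift(golden: dict, current_outputs: dict) -> list[str]:
    """Compare raw JSON output strings, keyed by case name, against golden fields."""
    drift = []
    for name, expected in golden.items():
        raw = current_outputs.get(name)
        if raw is None:
            drift.append(f"{name}: no output captured")
            continue
        actual = json.loads(raw)
        for key, want in expected.items():
            if actual.get(key) != want:
                drift.append(f"{name}: {key} changed from {want!r} to {actual.get(key)!r}")
    return drift
```

An empty result means the documented decisions are stable. Any entry in the list is a behavioural change, and under the cycle above it would call for a version increment.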

Update philosophy

Quality gates