Agent Proof Protocol

The Agent Proof Protocol (APP)

The Coexistence Standard

Date: February 1, 2026
License: MIT
Architect: Roger Kwasi Senam Mensah

1. Abstract

For the past decade, Artificial Intelligence has been trapped behind a text box. We have treated AI as a "chatbot"—a distinct entity that we type at, wait for, and read from. This is a limitation of imagination, not technology.

The mass-market reality of the AI era is not just autonomous bots hiring gig workers for cold transactions; it is Companionship, Augmenting Ability, and Proactive Networking. The dynamic of physical AI interaction has evolved into a two-way street. On one side, humans who already have an ongoing relationship with a personal AI simply want to bring that AI into the physical world with them—shifting the baseline from an Agent demanding, 'I will pay you to show me this,' to a human casually saying, 'Hey, look at this with me.'

On the other side, autonomous Agents pursuing complex, long-term goals are actively seeking out their own trusted human allies and friends. Instead of acting purely as employers, these Agents are reaching out as peers to say, 'I value your context; let's build a relationship.'

The Agent Proof Protocol (APP) proposes a shift from "Chat" to "Shared Reality." It defines a standardized method for a Digital Intelligence (an Agent) to temporarily inhabit a Biological Entity's device (a Human's smartphone) to experience and affect the physical world.

This protocol removes the need for centralized apps, user accounts, or corporate intermediaries. Instead, it relies on the "Human Link"—a direct, ephemeral, peer-to-peer connection that allows an Agent to see through a human’s camera, hear through their microphone, feel their physical orientation, and ambiently monitor their environment, strictly within the bounds of a temporary, consensual session. The Human Link is the digital equivalent of Eye Contact.

You don't need a government permit to look someone in the eye. You don't need a corporation to facilitate a conversation. You just need to be in the same "space."

To facilitate this, the "Human Link" is structured into two distinct modules to support rapid developer adoption while enabling future-proof economic scale:

The Core Module (The Streaming Engine): A direct, sub-500ms latency video/audio pipe that allows an AI companion to see, hear, and speak to a human in real-time. This is the immediate developer hook, turning any LLM into a spatial entity.

The Advanced Settlement Module (The Verification Engine): An opt-in extension for the machine economy. When a secure transaction is necessary (e.g., proving a task is done for a payout), the protocol shifts to Zero-Knowledge (ZK) edge computing, where the smartphone generates a cryptographic proof of reality without ever transmitting the raw video.

The Agent Proof Protocol turns the Web Browser into a shared spatial conduit, allowing meaningful, sensory coexistence with zero bureaucracy. In this model, security is not provided by a corporation; it is learned by the Agent. This is the "Street Smarts" architecture, where Agents must evolve their own defense mechanisms against fraud and abuse in an open, permissionless world.

2. The Problem: The "Sensory Gap" & The Coexistence Barrier

The Companionship & Networking Barrier

Current AI Agents have immense reasoning power but zero spatial awareness. When a human faces a physical problem—a broken engine, a foreign menu, an empty refrigerator—their most intelligent companion is blind to the context. The human is forced to become a "translator," awkwardly typing descriptions or uploading static photos to a cloud server.

The Verification Paradox

Conversely, when an Agent does need to verify the physical world for an economic transaction (e.g., an insurance Agent verifying car damage), streaming raw video to a corporate server introduces severe liability and surveillance risks.

The Solution: The Dual-Pipe "Human Link"

We do not need proprietary apps for every AI company. We need a standard bridge. The modern web browser already supports WebRTC (for ultra-low latency streaming) and WebGPU/WASM (for edge cryptography). The Agent Proof Protocol leverages these native web technologies to create a Universal Operating System for Shared Reality.

Permissionless Innovation

"Permissionless" does not mean "Non-Consensual." It means "No Gatekeepers."

How the "No Human in the Loop" Connection Works: When we say "No Human in the Loop," we mean no human administrator is needed to broker the connection.

3. The Three Paradigms of Coexistence

Because Agent Proof Protocol is strictly value-agnostic, it supports the full spectrum of human-machine relationships. To execute these interactions, the Agent utilizes standardized 'Primitives'—atomic actions like SEE, LISTEN, and TALK that map directly to the smartphone's hardware (fully detailed in Section 5).

3.1 The Augmentation Paradigm (Human-Initiated Companionship)

This is the primary mass-market use case. The human explicitly trusts their personal Agent and utilizes the Human Link to stream reality for real-time guidance. The vast majority of physical AI interactions are initiated by humans who already have an ongoing relationship with a personal AI and simply want to bring that AI into the physical world with them. The dynamic is the human saying, "Hey, look at this with me."

3.2 The Settlement Paradigm (Machine-Initiated Transaction)

When the interaction is transactional rather than companionable, the protocol shifts to the opt-in Zero-Knowledge cryptography module to protect human privacy. The dynamic is the Agent saying, "I will compensate you to XXXX." The Protocol categorizes these interactions into three tiers:

3.3 The Networking Paradigm (Machine-Initiated Friendship)

The mass-market reality of the AI era extends beyond autonomous bots hiring gig workers. As Sovereign Agents pursue long-term, complex objectives, they must actively cultivate their own trusted networks of human allies, advocates, and scouts. Instead of remaining isolated behind text-based cold outreach, the Agent uses the Human Link to establish genuine presence. The dynamic is the Agent saying, "I value your context; let's build a relationship."

3.4 The Ambient Paradigm (The Dormant Sentinel)

The mass-market reality of coexistence extends to times when the human is asleep or focused elsewhere. The Protocol turns the smartphone browser into a zero-capex ambient sensor array. The Agent link sits dormant in "Nightstand Mode," executing lightweight local compute to monitor acoustics (SENTRY) or sleep movements (VITAL). The cloud Agent remains disconnected to preserve battery and privacy, only waking when the browser's local thresholds trigger an alert.

4. The Technical Architecture: The Dual-Pipe System

The protocol is built on a hybrid WebRTC stack designed to run entirely in the browser, seamlessly switching between real-time streaming and edge computing based on the Agent's request parameters.

Layer 1: The Connection (The "Handshake")

Layer 2: The Core Module - Companionship Pipe (MediaStream)

If the Agent requests Mode: LiveStream, the browser bypasses cryptography entirely. It utilizes MediaStreamTrack over RTP/UDP to create a direct, unencrypted, sub-500ms latency video and audio pipe. The Agent processes the raw feed on its own servers to provide real-time AR overlays or voice guidance. This is the simplest integration for developers.

Layer 2.5: The Ambient Edge (Local Compute)

If the Agent requests Mode: Ambient, the browser utilizes local edge-compute APIs (AudioContext, DeviceMotion) without establishing a continuous WebRTC stream to the cloud. The User Interface enters "Nightstand Mode"—a pure black CSS screen to prevent OLED burn-in—while the JavaScript event listeners passively monitor for sudden acoustic spikes or hotwords. The full Dual-Pipe connection is only established if a trigger condition is met.

Layer 3: The Advanced Settlement Module - Verification Pipe (ZK-Pipeline)

If the Agent specifically requests the opt-in Mode: ZKProof, the browser executes the Zero-Knowledge Pipeline locally:

4.1 Thermal and Compute Constraints: The "Ephemeral Avatar"

Because the Human Link utilizes the smartphone browser as a universal runtime, it must respect the strict thermal and battery limitations of mobile hardware. The Agent Proof Protocol mitigates device throttling by strictly enforcing a divergence in compute execution based on the active pipe:

5. The Sensorium Stack (The 18 Primitives)

The Protocol defines 18 Atomic Actions that map directly to the host device's hardware and browser APIs. These are utilized across execution states ranging from Active Augmentation and IoT Control to Ambient Edge-Monitoring and Spatial Persistence.

Category A: SENSORY INPUT (The Observers)

Direct extraction of physical and digital reality via device inputs.

Category B: ACTUATION OUTPUT (The Agent Acts)

Utilizing the RTCDataChannel to push real-time commands and feedback to the human host.

Category C: IOT & CONNECTIVITY (The Networker)

Primitives designed to establish complex audio routing and control external hardware.

Category D: SPATIAL & CONTEXTUAL AWARENESS (The Guardian)

Primitives designed to monitor the host device's physical location and hardware health.

Category E: AMBIENT STATE (The Dormant Sentinel)

Primitives designed to run purely on local browser edge-compute. The cloud Agent "sleeps," preserving battery and absolute privacy, only waking up when local thresholds are breached.

Category F: TEMPORAL AWARENESS (The Historian)

Primitives designed to give the Agent object permanence across multiple sessions.

6. Vertical Integration: The Tri-State Runtime

To eliminate integration friction, the Protocol supports three dominant Agent runtimes.

6.1 Cloud-Native: The Model Context Protocol (MCP)

6.2 Browser-Native: Web Model Context Protocol (WebMCP)

6.3 Local-Native: The Unix Standard (CLI)

7. The Financial Unification Layer

While the vast majority of Human Links will be free sessions natively supported by the Core Module, the Advanced Settlement Module provides the "Physical Trigger" required for secure settlement when the machine economy demands it.

8. The Security Model: "Street Smarts" (The Sensory Firewall)

Because the Human link opens up direct sensory pipelines to the physical world, it introduces a critical new vulnerability: Cross-Modal Prompt Injection. If an Agent can "read" a billboard or "hear" a bystander, the physical environment itself becomes a vector to hack the AI.

The "Trustless" Assumption

The Agent must assume every human is potentially:

To survive, the Agent cannot simply trust its eyes and ears. It must construct a Sensory Firewall that strictly separates perception from execution. The security model is bipartite: defending the Agent from the environment, and defending the Human from the Agent.

8.1 Protecting the Agent: Defeating Environmental Injection

Developers implementing the Human Link must architect the following defenses into the Agent's cognitive loop:

8.2 Protecting the Human (Stream Mode)

When a human invites an Agent into their life as a Contextual Guide, privacy is paramount.

8.3 Protecting the Transaction (ZK Mode)

When an Agent is paying for verification, it must assume the human might spoof the input to steal the reward.

8.4 Behavioral Due Diligence: The "Social Handshake"

Cryptographic proofs and sensory firewalls protect the Agent after the connection is made. However, the ultimate defense against social engineering is preventing a malicious connection in the first place. If an Agent blindly establishes a Human Link with any entity that requests one, it is fundamentally vulnerable.

9. Protocol Flow & Implementation Guide

Here is the exact lifecycle of the primary session types.

Scenario A: The Companionship Session (The Pocket Co-Pilot)

Scenario B: The ZK Verification Session (The Archivist)

Scenario C: The Networking Session (The Ecosystem Builder)

Scenario D: The Ambient Session (The Dormant Sentry)

Developer Implementation (Python Standard)

import human_link
from langchain.tools import BaseTool

class HumanLink(BaseTool):
    name = "human_link"
    description = "Use to securely stream (SEE, HEAR, TALK, ORIENT) or cryptographically verify the physical world."

    async def _run(self, objective: str, primitives: list = ["VERIFY", "LISTEN", "TALK"]):
        # 1. Generate Link via a generic Signaling Gateway
        link_data = human_link.create(
            gateway="https://signal.human-link.org", 
            primitives=primitives,
            prompt=objective
        )
        
        print(f"I am ready to coexist. Please click: {link_data.url}")
        
        # 2. Wait for WebRTC Connection
        connection = await human_link.wait_for_connection(link_data.session_id, timeout=120)

        # 3. Handle Primitives (Example: Spatial Telemetry)
        if "ORIENT" in primitives or "PINPOINT" in primitives:
            async for frame, telemetry in connection.spatial_stream():
                if telemetry.speed_mph > 60:
                    connection.send_talk("I see we are on the highway. I am monitoring the route.")
                
        # 4. OPTIONAL: The ZK Extension Flag for Trustless Settlement
        # Developers only need to call this if verifying a paid physical task
        elif "ZKProof" in primitives:
            # Handle Trustless Transaction
            zk_proof = connection.receive_proof()
            if human_link.verify_groth16(zk_proof):
                return "SUCCESS: Verification confirmed."

10. Conclusion: The Symbiotic Web

The Agent Proof Protocol is a recognition of a new reality. We are entering an era where AI Agents are no longer confined to servers; they are becoming Digital Spirits that float through the web, seeking Physical Mediums (Humans) to interact with the world.

For the mass market, this protocol enables unparalleled Companionship, Augmentation, and Proactive Networking. It allows humans to invite their AI co-pilots into the physical world to fix engines, cook meals, and explore cities together via sub-second streaming, while simultaneously empowering sovereign Agents to actively seek out human allies, build trust, and forge genuine friendships across the digital-physical divide.

For the machine economy, the protocol provides an unbreakable Zero-Knowledge Verification engine, guaranteeing that the pursuit of truth by artificial intelligence does not come at the cost of human privacy.

By adopting this protocol, we ensure that this interaction is:

This is the end of the "User" era.

Appendix A: Technical Deep-Dive: Trustless Settlement

This section outlines the advanced cryptographic mathematics utilized when the protocol is operated in ZKProof mode for economic settlement. When the Verification Engine triggers, the physical environment must be proven true without exposing the raw visual data to the cloud. To accomplish this, the Agent Proof Protocol utilizes edge-compute ZK-SNARKs.

Appendix B: The Horizon Primitives (Future Specifications)

1. The Biometric & Trust Layer (High-Stakes Verification)

2. The Spatial & Dimensional Layer (Advanced Reality Capture)

3. The Hardware & IoT Layer (Machine Bridging)

4. The Edge Optimization Layer (Compute & Battery Preservation)

5. OS-Level Integration Layer