Manifesto

Intelligence shouldn't be gated by identity.

Building a private AI app from scratch means negotiating TEE vendor agreements, wiring up WebAuthn, managing per-user encryption keys, and routing data by jurisdiction. Most teams scope it once and shelve it.

Krava is that stack as a platform. One appkey gives your app passkey auth, encrypted memory, and private inference routing. Your users exist as public-key hashes. No emails, no names, nothing a data request can surface.

A single SDK call, krava.chat(), routes each inference to the cheapest private enclave that meets your policy. Tinfoil, Phala, NEAR AI, Prem, or our own TEE infrastructure. The routing is invisible. The privacy isn't.

The Platform

Four guarantees. One integration.

Auth

Passkeys, not emails. Users onboard with biometrics. Their identity in your database is a public-key hash. No KYC, no email verification, no password resets.
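
What "a public-key hash" can look like in a database, as a minimal sketch. It assumes the identifier is SHA-256 over the raw public-key bytes, hex-encoded. The helper name `userIdFromPublicKey` and the choice of hash are illustrative assumptions, not Krava's documented scheme.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive the identifier an app would store for a user.
// Assumption: the identifier is SHA-256 over the raw public-key bytes, hex-encoded.
function userIdFromPublicKey(publicKey: Uint8Array): string {
  return createHash("sha256").update(publicKey).digest("hex");
}

// Stand-in 32-byte key. A real key would come from the WebAuthn registration response.
const demoKey = new Uint8Array(32).fill(7);
const userId = userIdFromPublicKey(demoKey);

console.log(userId.length); // 64 (hex characters)
```

The stored value identifies a credential, not a person: there is no email or name to look up.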

Memory

AES-256-GCM. Per-user keys derived from their passkey. Encrypted blobs at rest. Even we can't read their sessions.
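
A rough sketch of per-user AES-256-GCM using Node's built-in `crypto` module. It assumes some high-entropy secret tied to the passkey is available on the device (for example, output of the WebAuthn PRF extension); how Krava actually obtains that secret is not stated here. `deriveUserKey`, `seal`, `open`, and the HKDF parameters are all hypothetical names and choices.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

interface SealedBlob {
  iv: Buffer;         // 96-bit nonce, unique per message
  tag: Buffer;        // GCM authentication tag
  ciphertext: Buffer;
}

// Assumption: `passkeySecret` is high-entropy material bound to the user's passkey.
// HKDF-SHA256 stretches it into a 32-byte AES key scoped to one user.
function deriveUserKey(passkeySecret: Uint8Array, userId: string): Buffer {
  const salt = Buffer.alloc(32); // fixed all-zero salt, for this sketch only
  return Buffer.from(hkdfSync("sha256", passkeySecret, salt, `memory:${userId}`, 32));
}

// Encrypt on the device; only the resulting blob would be stored server-side.
function seal(key: Buffer, plaintext: string): SealedBlob {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Decrypt; throws if the key is wrong or the blob was tampered with.
function open(key: Buffer, blob: SealedBlob): string {
  const decipher = createDecipheriv("aes-256-gcm", key, blob.iv);
  decipher.setAuthTag(blob.tag);
  return Buffer.concat([decipher.update(blob.ciphertext), decipher.final()]).toString("utf8");
}

const key = deriveUserKey(randomBytes(32), "demo-user-hash");
const blob = seal(key, "session notes");
console.log(open(key, blob)); // session notes
```

Because the key exists only where the passkey secret does, a server holding `blob` alone has nothing to decrypt it with.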

Inference

TEE-routed. Every request hits the cheapest private enclave that meets your policy: latency, cost, jurisdiction. Falls back to standard commercial APIs only when your policy allows it for that workload.

SDK

@kravalabs/api-client. One call to chat. One call to read memory. No vendor-specific client. No 400-line auth boilerplate.

Your vault

The passkey is the key. Not a metaphor.

Every session is encrypted before it leaves your device, with a key derived from your passkey. That key never touches our servers. We store ciphertext. Only your biometric unlocks it.

Switch apps. Switch devices. Your context travels with the credential, not an account. Delete the passkey and the vault is gone. No recovery path, no support ticket, no data left behind.

This is what user-sovereign memory looks like. Not a privacy policy. Not a delete button. Hardware-enforced, carrier-agnostic, yours by construction.

The Routing Layer

Three dimensions, one decision per call.

Latency

GPU TEE on H100 adds 4-8% compute overhead, imperceptible for a chat agent but relevant for a real-time voice loop. Krava picks the closest provider with the right model in the right region.

Cost

Per-token pricing across private inference spans roughly 150x, from ~$0.01 per million tokens (NEAR AI on Qwen) to ~$1.50 per million tokens (Maple on GPT). Same model class, same TEE guarantee. We route to the cheapest one that meets your policy.

Jurisdiction

EU customers want EU-resident inference. Healthcare wants HIPAA-aligned enclaves. Some workloads want decentralized attestation, not Big-Cloud TEE. The router knows which provider satisfies which policy without the dev having to learn each one.
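
The three dimensions above reduce to "filter by policy, then take the cheapest". A minimal sketch of that decision follows. The `Provider` and `Policy` shapes, the `pickProvider` function, and every number in the catalog are made up for illustration; they are not Krava's router or real provider data.

```typescript
type Region = "eu" | "us";

interface Provider {
  name: string;
  region: Region;
  pricePerMTokUsd: number; // USD per million tokens
  latencyMs: number;       // estimated round-trip latency
  hipaaAligned: boolean;
}

interface Policy {
  region?: Region;
  maxLatencyMs?: number;
  requireHipaa?: boolean;
}

// Keep only providers that satisfy every stated constraint, then take the cheapest.
// Returns undefined when nothing qualifies.
function pickProvider(providers: Provider[], policy: Policy): Provider | undefined {
  const eligible = providers.filter((p) =>
    (policy.region === undefined || p.region === policy.region) &&
    (policy.maxLatencyMs === undefined || p.latencyMs <= policy.maxLatencyMs) &&
    (!policy.requireHipaa || p.hipaaAligned));
  return eligible.sort((a, b) => a.pricePerMTokUsd - b.pricePerMTokUsd)[0];
}

// Placeholder catalog: names and numbers are invented for the example.
const catalog: Provider[] = [
  { name: "enclave-a", region: "us", pricePerMTokUsd: 0.01, latencyMs: 300, hipaaAligned: false },
  { name: "enclave-b", region: "eu", pricePerMTokUsd: 0.40, latencyMs: 120, hipaaAligned: true },
  { name: "enclave-c", region: "eu", pricePerMTokUsd: 0.25, latencyMs: 450, hipaaAligned: false },
];

console.log(pickProvider(catalog, { region: "eu" })?.name);                    // enclave-c
console.log(pickProvider(catalog, { region: "eu", maxLatencyMs: 200 })?.name); // enclave-b
```

Tightening the policy changes the answer without changing the call site, which is the point of keeping the decision inside the router.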

Policy vs. proof. Standard inference APIs let authorized employees access your conversation data for incident resolution. That's the nature of software running in cleartext. TEE inference is different. The model runs in hardware-encrypted memory; the CPU and GPU refuse to decrypt it for anyone, including the cloud provider. Attestation is a cryptographic proof you can verify yourself. Not a policy. Not an audit report.
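
To make "a proof you can verify yourself" concrete, here is a deliberately simplified sketch of an attestation check. It assumes the enclave signs a report containing a code measurement and a client-chosen nonce, and that the verifier already trusts the signing key. Real TEE attestation (Intel TDX quotes, NVIDIA GPU attestation) involves vendor certificate chains and richer report formats that this example does not model; the `Report` shape and `verifyReport` are hypothetical.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface Report {
  measurement: string; // hash of the code the enclave claims to run
  nonce: string;       // client-chosen value that prevents replay
  signature: Buffer;   // signature over measurement + nonce
}

// Canonical bytes that get signed and verified.
function reportBytes(measurement: string, nonce: string): Buffer {
  return Buffer.from(JSON.stringify({ measurement, nonce }), "utf8");
}

// Accept only if the signature is valid, the measurement matches the code
// we expect, and the nonce is the one we sent.
function verifyReport(
  report: Report,
  trustedKey: KeyObject,
  expectedMeasurement: string,
  expectedNonce: string,
): boolean {
  const signed = verify(null, reportBytes(report.measurement, report.nonce), trustedKey, report.signature);
  return signed && report.measurement === expectedMeasurement && report.nonce === expectedNonce;
}

// Simulated hardware side: an Ed25519 key pair stands in for the attestation key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const measurement = createHash("sha256").update("model-server-image-v1").digest("hex");
const nonce = "client-chosen-nonce";
const report: Report = {
  measurement,
  nonce,
  signature: sign(null, reportBytes(measurement, nonce), privateKey),
};

console.log(verifyReport(report, publicKey, measurement, nonce)); // true
```

The check is math the client runs, not a promise the provider makes: change the code inside the enclave and the measurement no longer matches.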

The Network

Four anchor providers + our own.

These are the confidential-inference providers Krava routes to today. Each ships hardware-attested execution, OpenAI-compatible APIs, and zero data retention by default.

Tinfoil

NVIDIA H100 / H200 CC · Intel TDX

DeepSeek, Gemma, GPT-OSS, Qwen, Kimi

YC-backed. Verifiable encrypted inference. Default route for Krava sensitivity tier 1.

tinfoil →

Prem AI

Secure Enclaves · Post-Quantum Encryption

Proprietary catalog · VLMs · voice transcription

Sovereign deployments for regulated industries. Healthcare, finance, government.

prem ai →

Phala Network

NVIDIA H100 / H200 / B300 · Intel TDX · AMD SEV

Qwen 3.5, Gemma 4, DeepSeek V4 Pro, Kimi K2.6, MiniMax M2.5

Two-hop RA-TLS, on-chain compose-hash registry, signed receipts. OpenRouter routes its enterprise tier here.

phala network →

NEAR AI

NVIDIA H200 CC · Intel TDX

GPT, Qwen, ZhipuAI GLM

Cheapest end of the TEE market. ~$0.01 / M input tokens. NEAR Foundation-backed.

near ai →

Krava TEE

Our infra

NVIDIA H100 / H200 · RunPod · multi-region

Provider-agnostic, routed via @kravalabs/api-client

Our own confidential inference plane, for when none of the upstream providers fit: latency floor, custom model, or jurisdiction.

Read the SDK →

Honorable Mentions

The wider landscape we evaluate.

Providers we've benchmarked or are integrating with. Each represents a different bet: hardware vendor, network architecture, or trust model. Not all make the default route, but all matter to the thesis.

Provider	Hardware / Trust	Models
Anthropic Confidential Inference	NVIDIA H100 / H200 CC · SEV-SNP / TDX	Claude (enterprise gate)
Maple	AMD SEV-SNP	GPT, Moonshot, DeepSeek
Chutes	SEV-SNP · Intel TDX	Qwen, Gemma, DeepSeek
Privatemode	SEV-SNP · TDX	GPT, Gemma, Qwen, Mistral
NanoGPT	NVIDIA H100 CC	Qwen, GPT, DeepSeek, Gemma
Venice.ai	NVIDIA H100 CC	Gemma, GLM, GPT, Qwen
Spheron	NVIDIA H200 SXM5 (141 GB HBM3e)	Open inference layer
Armet AI (Fortanix)	NVIDIA + Intel SGX / TDX	Turnkey enterprise GenAI
Atoma Network	Decentralized TEE (Sui)	Variable (DePIN)
Marlin Protocol	TEE + ZK proofs	Inference Labs co-spec
Mind Network	FHE (no hardware trust)	Pure-crypto inference

Three buckets: TEE (hardware trust, fastest today), decentralized (crypto-economic attestation, no single cloud), FHE (pure-crypto, no hardware trust, slowest but quantum-resistant). Krava routes across all three.

What becomes possible

Apps that couldn't exist before this.

A mental health app a clinician can actually recommend. A legal research tool that doesn't require a conflict waiver. A leadership coach an executive can use from a work device without creating a discoverable record.

None of these ship today because assembling the stack correctly takes months, and most teams never start. The pieces exist: Tinfoil, Phala, NEAR AI, Prem. Krava ties them together and exposes them as a 10-line integration.

You build the app. Your users get the privacy. Nobody gets a log.

Build the next private app.

Enter the Hackathon →

Read the SDK →