Intelligence shouldn't be gated by identity.
Building a private AI app from scratch means negotiating TEE vendor agreements, wiring up WebAuthn, managing per-user encryption keys, and routing data by jurisdiction. Most teams scope it once and shelve it.
Krava is that stack as a platform. One appkey gives your app passkey auth, encrypted memory, and private inference routing. Your users exist as public-key hashes. No emails, no names, nothing a data request can surface.
A single SDK call, krava.chat(), routes each inference to the cheapest private enclave that meets your policy. Tinfoil, Phala, NEAR AI, Prem, or our own TEE infrastructure. The routing is invisible. The privacy isn't.
Four guarantees. One integration.
Auth
Passkeys, not emails. Users onboard with biometrics. Their identity in your database is a public-key hash. No KYC, no email verification, no password resets.
Memory
AES-256-GCM. Per-user keys derived from their passkey. Encrypted blobs at rest. Even we can't read their sessions.
Inference
TEE-routed. Every request hits the cheapest private enclave that meets your policy: latency, cost, jurisdiction. Falls through to commercial APIs only when the workload allows.
SDK
@kravalabs/api-client. One call to chat. One call to read memory. No vendor-specific client. No 400-line auth boilerplate.
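To make the Auth guarantee concrete, here is a minimal sketch of a user record keyed by a public-key hash and nothing else. It is not Krava's implementation. The helper name `userIdFromPublicKey`, the choice of SHA-256 with hex encoding, and the locally generated P-256 key pair (a stand-in for a real WebAuthn credential) are all assumptions made for illustration.

```typescript
import { createHash, generateKeyPairSync } from "node:crypto";

// Hypothetical helper: turn a public key into a stable, pseudonymous user ID.
// SHA-256 + hex is an assumption; the source only says "public-key hash".
function userIdFromPublicKey(publicKeyDer: Uint8Array): string {
  return createHash("sha256").update(publicKeyDer).digest("hex");
}

// Stand-in for a passkey: a freshly generated P-256 key pair.
// A real passkey's public key would come from the WebAuthn registration ceremony.
const { publicKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });
const publicKeyDer = publicKey.export({ type: "spki", format: "der" });

// The only identifier the app database would need to store.
const userId = userIdFromPublicKey(publicKeyDer);
console.log(userId.length); // 64 hex characters
```

The point of the design is that the stored identifier can be recomputed from the credential but reveals no email, name, or other personal attribute.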
The passkey is the key. Not a metaphor.
Every session is encrypted before it leaves your device, with a key derived from your passkey. That key never touches our servers. We store ciphertext. Only your biometric unlocks it.
Switch apps. Switch devices. Your context travels with the credential, not an account. Delete the passkey and the vault is gone. No recovery path, no support ticket, no data left behind.
This is what user-sovereign memory looks like. Not a privacy policy. Not a delete button. Hardware-enforced, carrier-agnostic, yours by construction.
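One way to picture "the passkey is the key" is the sketch below: a per-user AES-256-GCM key is derived on the device and only ciphertext leaves it. This is an illustration under stated assumptions, not Krava's code. The 32-byte `passkeySecret` stands in for key material obtained from the passkey (for example via the WebAuthn PRF extension), and HKDF-SHA-256, the `"session-memory-v1"` label, and the function names are hypothetical choices.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

interface SealedBlob {
  iv: Buffer;         // 96-bit nonce, unique per encryption
  ciphertext: Buffer; // what a server would store
  tag: Buffer;        // GCM authentication tag
}

// Assumption: derive the per-user key with HKDF-SHA-256 from passkey-bound secret material.
function deriveUserKey(passkeySecret: Uint8Array, salt: Uint8Array): Buffer {
  return Buffer.from(hkdfSync("sha256", passkeySecret, salt, "session-memory-v1", 32));
}

// Encrypt on the device; only the returned blob would be uploaded.
function seal(key: Buffer, plaintext: string): SealedBlob {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt on the device; throws if the key is wrong or the blob was tampered with.
function unseal(key: Buffer, blob: SealedBlob): string {
  const decipher = createDecipheriv("aes-256-gcm", key, blob.iv);
  decipher.setAuthTag(blob.tag);
  return Buffer.concat([decipher.update(blob.ciphertext), decipher.final()]).toString("utf8");
}

const passkeySecret = randomBytes(32); // stand-in for passkey-derived material
const salt = randomBytes(16);
const key = deriveUserKey(passkeySecret, salt);
const blob = seal(key, "session notes");
const roundTrip = unseal(key, blob);
```

Because the key exists only where the passkey does, losing the passkey leaves the stored blob undecryptable, which matches the "no recovery path" behavior described above.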
Three dimensions, one decision per call.
Latency
A GPU TEE on an H100 adds 4–8% compute overhead: imperceptible for a chat agent, but relevant for a real-time voice loop. Krava picks the closest provider with the right model in the right region.
Cost
Per-token pricing across private inference spans roughly 150x, from ~$0.01/M (NEAR AI on Qwen) to ~$1.50/M (Maple on GPT). Same model class, same TEE guarantee. We route to the cheapest one that meets your policy.
Jurisdiction
EU customers want EU-resident inference. Healthcare wants HIPAA-aligned enclaves. Some workloads want decentralized attestation, not Big-Cloud TEE. The router knows which provider satisfies which policy, so developers don't have to learn each one.
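The "cheapest provider that meets your policy" rule can be sketched as a filter followed by a minimum-cost pick. This is a simplified illustration, not Krava's router. The `Provider` and `Policy` shapes, the field names, and the provider entries (`enclave-a` and so on, with made-up regions and latencies) are hypothetical placeholders rather than real provider data.

```typescript
// All names and numbers below are illustrative placeholders.
interface Provider {
  name: string;
  region: string;              // e.g. "eu", "us"
  usdPerMillionTokens: number;
  p50LatencyMs: number;
  hipaaAligned: boolean;
}

interface Policy {
  allowedRegions?: string[];
  maxLatencyMs?: number;
  requireHipaa?: boolean;
}

// A provider qualifies only if it satisfies every constraint the policy sets.
function meetsPolicy(p: Provider, policy: Policy): boolean {
  if (policy.allowedRegions && !policy.allowedRegions.includes(p.region)) return false;
  if (policy.maxLatencyMs !== undefined && p.p50LatencyMs > policy.maxLatencyMs) return false;
  if (policy.requireHipaa && !p.hipaaAligned) return false;
  return true;
}

// Among qualifying providers, choose the cheapest; undefined if none qualifies.
function pickProvider(providers: Provider[], policy: Policy): Provider | undefined {
  return providers
    .filter((p) => meetsPolicy(p, policy))
    .sort((a, b) => a.usdPerMillionTokens - b.usdPerMillionTokens)[0];
}

const providers: Provider[] = [
  { name: "enclave-a", region: "us", usdPerMillionTokens: 0.01, p50LatencyMs: 400, hipaaAligned: false },
  { name: "enclave-b", region: "eu", usdPerMillionTokens: 0.4, p50LatencyMs: 250, hipaaAligned: false },
  { name: "enclave-c", region: "eu", usdPerMillionTokens: 1.5, p50LatencyMs: 120, hipaaAligned: true },
];

const euChoice = pickProvider(providers, { allowedRegions: ["eu"] });
const voiceChoice = pickProvider(providers, { allowedRegions: ["eu"], maxLatencyMs: 150 });
```

Tightening the policy narrows the candidate set first, so a stricter latency or jurisdiction requirement can move the pick to a more expensive provider without any change to the calling code.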
Policy vs. proof.
Standard inference APIs let authorized employees access your conversation data for incident resolution. That's the nature of software running in cleartext. TEE inference is different. The model runs in hardware-encrypted memory; the CPU and GPU refuse to decrypt it for anyone, including the cloud provider. Attestation is a cryptographic proof you can verify yourself. Not a policy. Not an audit report.
Four anchor providers + our own.
These are the confidential-inference providers Krava routes to today. Each ships hardware-attested execution, OpenAI-compatible APIs, and zero data retention by default.
Tinfoil
NVIDIA H100 / H200 CC · Intel TDX
DeepSeek, Gemma, GPT-OSS, Qwen, Kimi
YC-backed. Verifiable encrypted inference. Default route for Krava sensitivity tier 1.
Prem AI
Secure Enclaves · Post-Quantum Encryption
Proprietary catalog · VLMs · voice transcription
Sovereign deployments for regulated industries. Healthcare, finance, government.
Phala Network
NVIDIA H100 / H200 / B300 · Intel TDX · AMD SEV
Qwen 3.5, Gemma 4, DeepSeek V4 Pro, Kimi K2.6, MiniMax M2.5
Two-hop RA-TLS, on-chain compose-hash registry, signed receipts. OpenRouter routes its enterprise tier here.
NEAR AI
NVIDIA H200 CC · Intel TDX
GPT, Qwen, ZhipuAI GLM
Cheapest end of the TEE market. ~$0.01 / M input tokens. NEAR Foundation-backed.
The wider landscape we evaluate.
Providers we've benchmarked or are integrating with. Each represents a different bet: hardware vendor, network architecture, or trust model. Not all make the default route, but all matter to the thesis.
| Provider | Hardware / Trust | Models |
|---|---|---|
| Anthropic Confidential Inference | NVIDIA H100 / H200 CC · SEV-SNP / TDX | Claude (enterprise gate) |
| Maple | AMD SEV-SNP | GPT, Moonshot, DeepSeek |
| Chutes | SEV-SNP · Intel TDX | Qwen, Gemma, DeepSeek |
| Privatemode | SEV-SNP · TDX | GPT, Gemma, Qwen, Mistral |
| NanoGPT | NVIDIA H100 CC | Qwen, GPT, DeepSeek, Gemma |
| Venice.ai | NVIDIA H100 CC | Gemma, GLM, GPT, Qwen |
| Spheron | NVIDIA H200 SXM5 (141 GB HBM3e) | Open inference layer |
| Armet AI (Fortanix) | NVIDIA + Intel SGX / TDX | Turnkey enterprise GenAI |
| Atoma Network | Decentralized TEE (Sui) | Variable (DePIN) |
| Marlin Protocol | TEE + ZK proofs | Inference Labs co-spec |
| Mind Network | FHE (no hardware trust) | Pure-crypto inference |
Three buckets: TEE (hardware trust, fastest today), decentralized (crypto-economic attestation, no single cloud), FHE (pure-crypto, no hardware trust, slowest but quantum-resistant). Krava's routing layer is built to span all three.
Apps that couldn't exist before this.
A mental health app a clinician can actually recommend. A legal research tool that doesn't require a conflict waiver. A leadership coach an executive can use from a work device without creating a discoverable record.
None of these ship today because assembling the stack correctly takes months, and most teams never start. The pieces exist. Tinfoil, Phala, Prem. Krava ties them together and exposes them as a 10-line integration.
You build the app. Your users get the privacy. Nobody gets a log.