Manifesto

Intelligence shouldn't be gated by identity.

Building a private AI app from scratch means negotiating TEE vendor agreements, wiring up WebAuthn, managing per-user encryption keys, and routing data by jurisdiction. Most teams scope it once and shelve it.

Krava is that stack as a platform. One appkey gives your app passkey auth, encrypted memory, and private inference routing. Your users exist as public-key hashes. No emails, no names, nothing a data request can surface.

A single SDK call, krava.chat(), routes each inference to the cheapest private enclave that meets your policy. Tinfoil, Phala, NEAR AI, Prem, or our own TEE infrastructure. The routing is invisible. The privacy isn't.

The Platform

Four guarantees. One integration.

01

Auth

Passkeys, not emails. Users onboard with biometrics. Their identity in your database is a public-key hash. No KYC, no email verification, no password resets.
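To make "identity as a public-key hash" concrete, here is a minimal sketch. The helper name `userIdFromPublicKey`, the choice of SHA-256, and the hex encoding are assumptions for illustration, not Krava's documented scheme. A locally generated P-256 key pair stands in for a real passkey.

```typescript
import { createHash, generateKeyPairSync } from "node:crypto";

// Hypothetical helper: derive a stable, pseudonymous user ID from a public key.
// SHA-256 over the DER-encoded key is an assumption, not a documented Krava format.
function userIdFromPublicKey(publicKeyDer: Uint8Array): string {
  return createHash("sha256").update(publicKeyDer).digest("hex");
}

// Stand-in for a passkey: a P-256 key pair generated locally for the demo.
const { publicKey } = generateKeyPairSync("ec", { namedCurve: "prime256v1" });
const publicKeyDer = publicKey.export({ type: "spki", format: "der" });

// The only identifier the app database would need to store.
const userId = userIdFromPublicKey(publicKeyDer);
console.log(userId.length); // 64 hex characters
```

The same public key always hashes to the same ID, so the app can recognize a returning user without ever holding an email or a name.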

02

Memory

AES-256-GCM. Per-user keys derived from each user's passkey. Encrypted blobs at rest. Even we can't read a user's sessions.
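A rough sketch of what per-user encryption could look like, using Node's built-in crypto. It assumes the passkey can release 32 bytes of secret key material (for example through the WebAuthn PRF extension); that secret is simulated with random bytes here. The function names, the HKDF step, and the `memory:` info label are illustrative assumptions, not Krava's actual implementation.

```typescript
import { createCipheriv, createDecipheriv, hkdfSync, randomBytes } from "node:crypto";

interface SealedBlob {
  iv: Buffer;         // 96-bit nonce, unique per message
  tag: Buffer;        // GCM authentication tag
  ciphertext: Buffer; // what the server would store
}

// Assumption: passkeySecret is key material released by the user's passkey.
// HKDF-SHA256 stretches it into a 256-bit AES key bound to this user.
function deriveUserKey(passkeySecret: Uint8Array, userId: string): Buffer {
  const okm = hkdfSync("sha256", passkeySecret, Buffer.alloc(0), `memory:${userId}`, 32);
  return Buffer.from(okm);
}

// Encrypt on the client; only the sealed blob leaves the device.
function seal(key: Buffer, plaintext: string): SealedBlob {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

// Decrypt; final() throws if the key is wrong or the blob was tampered with.
function unseal(key: Buffer, blob: SealedBlob): string {
  const decipher = createDecipheriv("aes-256-gcm", key, blob.iv);
  decipher.setAuthTag(blob.tag);
  return Buffer.concat([decipher.update(blob.ciphertext), decipher.final()]).toString("utf8");
}

const passkeySecret = randomBytes(32); // simulated passkey output
const userKey = deriveUserKey(passkeySecret, "user-123");
const blob = seal(userKey, "session notes");
console.log(unseal(userKey, blob)); // prints "session notes"
```

Because the key exists only where the passkey is, whoever holds the stored blob sees ciphertext and nothing else.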

03

Inference

TEE-routed. Every request hits the cheapest private enclave that meets your policy: latency, cost, jurisdiction. Falls through to commercial APIs only when the workload allows.

04

SDK

@kravalabs/api-client. One call to chat. One call to read memory. No vendor-specific client. No 400-line auth boilerplate.

Your vault

The passkey is the key. Not a metaphor.

Every session is encrypted before it leaves your device, with a key derived from your passkey. That key never touches our servers. We store ciphertext. Only your biometric unlocks it.

Switch apps. Switch devices. Your context travels with the credential, not an account. Delete the passkey and the vault is gone. No recovery path, no support ticket, no data left behind.

This is what user-sovereign memory looks like. Not a privacy policy. Not a delete button. Hardware-enforced, carrier-agnostic, yours by construction.

The Routing Layer

Three dimensions, one decision per call.

01

Latency

GPU TEE on H100 adds 4-8% compute overhead, imperceptible for a chat agent but relevant for a real-time voice loop. Krava picks the closest provider with the right model in the right region.

02

Cost

Per-token pricing across private inference spans 150x, from ~$0.01 per million tokens (NEAR AI on Qwen) to ~$1.50 per million tokens (Maple on GPT). Same model class, same TEE guarantee. We route to the cheapest one that meets your policy.

03

Jurisdiction

EU customers want EU-resident inference. Healthcare wants HIPAA-aligned enclaves. Some workloads want decentralized attestation, not Big-Cloud TEE. The router knows which provider satisfies which policy, so developers don't have to learn each one.
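The three dimensions above reduce to a filter followed by a sort. The sketch below is one way to express that decision, not Krava's router: the `Provider` and `Policy` shapes, the provider names, and every number in the catalog are hypothetical values chosen for illustration.

```typescript
type TrustModel = "tee" | "decentralized" | "fhe";

interface Provider {
  name: string;
  region: string;
  trust: TrustModel;
  usdPerMillionTokens: number;
  p50LatencyMs: number;
}

// Every field is optional: an empty policy accepts any provider.
interface Policy {
  regions?: string[];
  trust?: TrustModel[];
  maxLatencyMs?: number;
}

// Keep only providers that satisfy the policy, then take the cheapest.
// Ties on price are broken by lower latency.
function pickProvider(providers: Provider[], policy: Policy): Provider | undefined {
  const eligible = providers.filter(
    (p) =>
      (!policy.regions || policy.regions.includes(p.region)) &&
      (!policy.trust || policy.trust.includes(p.trust)) &&
      (policy.maxLatencyMs === undefined || p.p50LatencyMs <= policy.maxLatencyMs),
  );
  eligible.sort(
    (a, b) => a.usdPerMillionTokens - b.usdPerMillionTokens || a.p50LatencyMs - b.p50LatencyMs,
  );
  return eligible[0];
}

// Hypothetical catalog; names and figures are made up for the example.
const catalog: Provider[] = [
  { name: "enclave-a", region: "us", trust: "tee", usdPerMillionTokens: 0.01, p50LatencyMs: 120 },
  { name: "enclave-b", region: "eu", trust: "tee", usdPerMillionTokens: 0.4, p50LatencyMs: 80 },
  { name: "enclave-c", region: "eu", trust: "decentralized", usdPerMillionTokens: 1.5, p50LatencyMs: 40 },
];

const choice = pickProvider(catalog, { regions: ["eu"], maxLatencyMs: 100 });
console.log(choice?.name); // prints "enclave-b"
```

With no policy the cheapest provider wins outright; tightening the latency budget to 50 ms in the EU leaves only the most expensive option, which is the trade-off the policy makes explicit.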

Policy vs. proof. Standard inference APIs let authorized employees access your conversation data for incident resolution. That's the nature of software running in cleartext. TEE inference is different. The model runs in hardware-encrypted memory; the CPU and GPU refuse to decrypt it for anyone, including the cloud provider. Attestation is a cryptographic proof you can verify yourself. Not a policy. Not an audit report.
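As a simplified illustration of "a proof you can verify yourself", the sketch below checks a signed report against a code measurement and a fresh nonce. It is a stand-in only: real attestation for SEV-SNP, TDX, or NVIDIA confidential computing uses vendor-specific report formats and certificate chains. The `AttestationReport` shape, the Ed25519 key, and the payload layout are assumptions made for the example.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical, simplified report; real reports carry many more fields.
interface AttestationReport {
  measurement: string; // hash of the code loaded into the enclave
  nonce: string;       // caller-chosen value that prevents replay
  signature: Buffer;   // produced by a key that never leaves the hardware
}

// Accept the report only if it names the expected code, echoes our nonce,
// and carries a valid signature from the trusted hardware key.
function verifyReport(
  report: AttestationReport,
  hardwareKey: KeyObject,
  expectedMeasurement: string,
  expectedNonce: string,
): boolean {
  if (report.measurement !== expectedMeasurement) return false;
  if (report.nonce !== expectedNonce) return false;
  const payload = Buffer.from(`${report.measurement}|${report.nonce}`);
  return verify(null, payload, hardwareKey, report.signature);
}

// Demo: an Ed25519 key pair simulates the hardware root of trust.
const hardware = generateKeyPairSync("ed25519");
const measurement = createHash("sha256").update("model-server-v1").digest("hex");
const nonce = "nonce-42";
const report: AttestationReport = {
  measurement,
  nonce,
  signature: sign(null, Buffer.from(`${measurement}|${nonce}`), hardware.privateKey),
};

console.log(verifyReport(report, hardware.publicKey, measurement, nonce)); // prints true
```

The check runs entirely on the caller's side, which is the difference between trusting a provider's policy and verifying a signature.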

Honorable Mentions

The wider landscape we evaluate.

Providers we've benchmarked or are integrating with. Each represents a different bet: hardware vendor, network architecture, or trust model. Not all make the default route, but all matter to the thesis.

| Provider | Hardware / Trust | Models |
| --- | --- | --- |
| Anthropic Confidential Inference | NVIDIA H100 / H200 CC · SEV-SNP / TDX | Claude (enterprise gate) |
| Maple | AMD SEV-SNP | GPT, Moonshot, DeepSeek |
| Chutes | SEV-SNP · Intel TDX | Qwen, Gemma, DeepSeek |
| Privatemode | SEV-SNP · TDX | GPT, Gemma, Qwen, Mistral |
| NanoGPT | NVIDIA H100 CC | Qwen, GPT, DeepSeek, Gemma |
| Venice.ai | NVIDIA H100 CC | Gemma, GLM, GPT, Qwen |
| Spheron | NVIDIA H200 SXM5 (141 GB HBM3e) | Open inference layer |
| Armet AI (Fortanix) | NVIDIA + Intel SGX / TDX | Turnkey enterprise GenAI |
| Atoma Network | Decentralized TEE (Sui) | Variable (DePIN) |
| Marlin Protocol | TEE + ZK proofs | Inference Labs co-spec |
| Mind Network | FHE (no hardware trust) | Pure-crypto inference |

Three buckets: TEE (hardware trust, fastest today), decentralized (crypto-economic attestation, no single cloud), FHE (pure-crypto, no hardware trust, slowest but quantum-resistant). Krava routes across all three.

What becomes possible

Apps that couldn't exist before this.

A mental health app a clinician can actually recommend. A legal research tool that doesn't require a conflict waiver. A leadership coach an executive can use from a work device without creating a discoverable record.

None of these ship today because assembling the stack correctly takes months, and most teams never start. The pieces exist. Tinfoil, Phala, Prem. Krava ties them together and exposes them as a 10-line integration.

You build the app. Your users get the privacy. Nobody gets a log.

Build the next private app.