The Air-Gapped LLM Blueprint: What Egress-Free Deployments Actually Need
The cloud AI playbook assumes one primitive that nobody writes down: outbound HTTPS. Vendor APIs, hosted judges, telemetry pipelines, model registries, vector stores, dashboard SaaS, secret managers — every one of them quietly resolves to a domain on the public internet. Pull that one cable and the stack does not degrade gracefully. It collapses.
That is the moment most teams discover their architecture has an egress dependency they never accounted for. A "small" prompt update needs to call out to a hosted classifier. The eval suite hits an LLM judge over the wire. The observability agent phones home. The model registry pulls weights from a CDN. None of it is malicious, and none of it is unusual. It is just what the cloud-native stack looks like when you stop noticing the cable.
Defense, healthcare, and financial-services deployments increasingly cannot tolerate that cable. The reasons are non-negotiable: data classification, residency rules, contractual exclusivity, lateral-movement risk, regulator-defined custody chains. Los Alamos National Laboratory moved its LLM stack on-prem in early 2025 to handle Controlled Unclassified Information, ITAR-flagged data, and Unclassified Controlled Nuclear Information. A hospital running diagnostic copilots over PHI cannot route prompts through a vendor inference endpoint. A broker-dealer governed by FINRA Rule 3110 and SEC Regulation S-P does not have a clean answer when client portfolios traverse a third-party API.
"Just self-host an open-weight model" understates the operational surface by an order of magnitude. The model is the easy part. What follows is the blueprint for everything else.
The Egress Audit Comes First, Not the Inference Server
Before a single GPU is racked, the team has to answer a question most teams cannot: what is every outbound network call your AI stack makes today? Not just the inference API. The hosted judge. The Hugging Face download. The prompt-monitoring SaaS. The vector database's telemetry. The OpenTelemetry collector that ships traces to a managed backend. The Slack webhook your eval pipeline pings on regression. The npm postinstall script in the SDK.
A useful exercise: run the existing stack inside a network namespace with a default-deny egress rule and watch what breaks. The breakages map almost one-for-one to the new primitives the air-gapped version has to grow. Most teams find five to fifteen distinct egress dependencies they did not know they had.
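A real network namespace needs root and OS-level tooling, so as a rough in-process approximation of the same exercise, the sketch below patches `socket.socket.connect` to deny and record every non-loopback connection attempt. The names `deny_egress` and `EgressBlocked` are hypothetical, and the guard is an assumption-laden stand-in: it does not cover DNS lookups, `connect_ex`, or child processes, all of which a namespace-level rule would catch.

```python
import ipaddress
import socket
from contextlib import contextmanager


class EgressBlocked(ConnectionError):
    """Raised when audited code tries to connect to a non-loopback address."""


@contextmanager
def deny_egress(log):
    """Deny and record non-loopback connect() calls made inside the block.

    `log` is a list that collects every blocked destination, so the
    audit still has a record even if the caller swallows the exception.
    """
    original_connect = socket.socket.connect

    def guarded_connect(sock, address):
        # AF_INET / AF_INET6 addresses are tuples; AF_UNIX paths are not.
        host = address[0] if isinstance(address, tuple) else None
        if host is not None:
            try:
                is_local = ipaddress.ip_address(host).is_loopback
            except ValueError:
                # A hostname rather than an IP literal: treat it as egress.
                is_local = False
            if not is_local:
                log.append(address)
                raise EgressBlocked(f"blocked outbound connect to {address}")
        return original_connect(sock, address)

    socket.socket.connect = guarded_connect
    try:
        yield log
    finally:
        socket.socket.connect = original_connect
```

Running the existing test suite inside `with deny_egress(log):` and then reading `log` gives a first draft of the egress inventory. It is a starting point for the list, not a replacement for the namespace-level run.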
The temptation is to handle each as a one-off: a local mirror here, a configuration flag there. The discipline that actually scales is treating the egress surface as a first-class architectural concern, with an explicit list of allowed destinations (often the empty set, or a tightly controlled internal mirror), a CI check that fails the build when a new egress dependency is introduced, and a network policy that enforces the same rule in production. Without that gate, the air-gap claim erodes one quietly added dependency at a time.
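One way such a CI gate could look is a script that scans configuration text for URLs and reports any host outside the allowlist. This is a minimal sketch under stated assumptions: the function name `find_unapproved_hosts`, the regex, and the host `mirror.internal.example` are all illustrative, and a real gate would also need to cover lockfiles, container images, and hosts assembled at runtime.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple URL matcher; an assumption, not an exhaustive grammar.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")


def find_unapproved_hosts(text, allowed_hosts):
    """Return the sorted hosts referenced in `text` that are not allowlisted."""
    unapproved = set()
    for url in URL_PATTERN.findall(text):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            # Malformed URL: surface it rather than silently passing it.
            unapproved.add(url)
            continue
        if host and host not in allowed_hosts:
            unapproved.add(host)
    return sorted(unapproved)


def ci_exit_code(text, allowed_hosts):
    """Return 1 (fail the build) if any unapproved host appears, else 0."""
    return 1 if find_unapproved_hosts(text, allowed_hosts) else 0
```

With an allowlist of `{"mirror.internal.example"}`, a config that adds a line pointing at a vendor API makes `ci_exit_code` return 1. With the empty set as the allowlist, any URL at all fails the build.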
Model Artifact Provenance Is the Hardest Problem
Inside the boundary, the model file is no longer something you pip install. It is a regulated artifact whose provenance the team has to defend in an audit. Three problems compound here.
The supply chain is poisoned by default. Open-weight model repositories are the new target. Researchers have already demonstrated that the Hugging Face Safetensors conversion service can be compromised to hijack submitted models, and OWASP's Top 10 for LLM Applications lists supply-chain risk (LLM03:2025) among its primary risk classes. The Safetensors format mitigates the worst of pickle-style code execution, but it does not solve provenance. There is still no widely adopted mechanism for cryptographically signing weights and verifying that signature at load time. The team has to build that gate itself: hash-pin every model, sign the artifact against an internal key during ingestion, and refuse to load anything whose signature does not verify.
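The three-step gate (pin, sign, refuse) can be sketched with the standard library alone. One loud assumption: the sketch uses HMAC-SHA256 with a shared internal key as a stand-in for signing, because Python's standard library has no asymmetric signatures. A production gate would more plausibly use an asymmetric scheme so that the loader holds only a public key. The names `sign_artifact`, `verify_before_load`, and `ProvenanceError` are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


class ProvenanceError(RuntimeError):
    """Raised when a model artifact fails its hash pin or signature check."""


def sha256_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so multi-GB weights fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sign_artifact(path, key):
    """Ingestion step: produce an HMAC tag over the artifact's SHA-256."""
    file_hash = sha256_file(path).encode("ascii")
    return hmac.new(key, file_hash, hashlib.sha256).hexdigest()


def verify_before_load(path, pinned_sha256, signature, key):
    """Load-time gate: return the path only if pin and signature both hold."""
    actual = sha256_file(path)
    if not hmac.compare_digest(actual, pinned_sha256):
        raise ProvenanceError(f"hash pin mismatch for {path}")
    expected = hmac.new(key, actual.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ProvenanceError(f"signature mismatch for {path}")
    return Path(path)
```

The design choice worth keeping, whatever the signature scheme, is that the loader calls `verify_before_load` and never opens a weights file by any other route.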
The dependency tree is wider than software. A traditional SBOM tracks libraries. An AI/ML Bill of Materials (MLBOM, increasingly published in CycloneDX format) has to track the model, the tokenizer, the safety classifier, the LoRA adapters, the merged checkpoints, the quantization tooling that produced the deployed weights, the eval suite that gated the release, and the licenses attached to every link in that chain. A fine-tune of a fine-tune of a fine-tune can drag in a license clause from a base model the team never agreed to. The MLBOM is the only artifact that makes the chain auditable.
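To make the license-inheritance point concrete, here is a small sketch that walks a fine-tune lineage back to its base model and collects every license along the way. The `Artifact` record and its fields are hypothetical and are not the CycloneDX schema; the sketch covers only the lineage slice of what an MLBOM has to track.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    """One link in a model lineage (illustrative fields, not CycloneDX)."""
    name: str
    kind: str  # e.g. "base-model", "fine-tune", "lora-adapter"
    license_id: str
    derived_from: Optional["Artifact"] = None


def inherited_licenses(artifact):
    """Return (name, license_id) pairs from the artifact back to its root."""
    chain = []
    node = artifact
    while node is not None:
        chain.append((node.name, node.license_id))
        node = node.derived_from
    return chain


def unapproved_licenses(artifact, approved):
    """Return the sorted license ids in the lineage that are not approved."""
    found = {license_id for _, license_id in inherited_licenses(artifact)}
    return sorted(found - set(approved))
```

A deployed fine-tune tagged Apache-2.0 whose grandparent carries a custom community license would surface that license in `unapproved_licenses`, which is exactly the clause the team never agreed to.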
Updates are not lightweight. In the cloud version, a model bump is a config change. In the air-gapped version, a model bump is an artifact transfer that has to clear a signed-bundle release process, a security re-review, and a chain-of-custody log. The model weights themselves can be 10–400 GB and have to move on encrypted media or through a one-way data diode. Every update is a release; every release is a paperwork event. The team that built monthly model bumps into its roadmap discovers it has committed to a process the security organization will not approve.
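A chain-of-custody log is easier to defend when it is tamper-evident. One common technique, sketched here as an assumption rather than a prescribed design, is a hash chain: each entry commits to the hash of the previous one, so editing any earlier record breaks verification. The function names and the event fields are illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, event):
    """Hash the previous entry's hash together with this event's contents."""
    payload = json.dumps({"prev_hash": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append a custody event (a JSON-serializable dict) to the log."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    entry = {
        "prev_hash": prev_hash,
        "event": event,
        "entry_hash": _entry_hash(prev_hash, event),
    }
    log.append(entry)
    return entry


def verify_log(log):
    """Return True only if every entry links to and matches its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if _entry_hash(entry["prev_hash"], entry["event"]) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

Each handoff of the media (export, scan, import, load) becomes one `append_entry` call, and the auditor runs `verify_log` over the full record. The chain only shows that the log was not edited after the fact; it does not by itself prove the events are true.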
The Eval Stack Has to Live Inside the Boundary
The cloud AI eval pipeline almost always has at least one hosted dependency: an LLM judge that calls GPT-4 or Claude over the public internet, a benchmark dataset pulled from Hugging Face at runtime, a results dashboard that uploads metrics to a SaaS observability tool. None of it survives the air gap.
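As a sketch of what moving the eval stack inside looks like at the configuration level, the snippet below sets the offline switches the Hugging Face libraries read (`HF_HUB_OFFLINE`, `HF_DATASETS_OFFLINE`, `TRANSFORMERS_OFFLINE`) and refuses a judge endpoint that is not loopback, private, or under an internal DNS suffix. The helper names and the suffix `.internal.example` are assumptions, and the environment variables generally need to be set before the libraries are imported.

```python
import ipaddress
import os
from urllib.parse import urlsplit

# Offline switches read by the Hugging Face hub, datasets, and transformers.
OFFLINE_ENV = {
    "HF_HUB_OFFLINE": "1",
    "HF_DATASETS_OFFLINE": "1",
    "TRANSFORMERS_OFFLINE": "1",
}


def enforce_offline_env(environ=os.environ):
    """Force the offline switches on in the given environment mapping."""
    environ.update(OFFLINE_ENV)


def is_internal_endpoint(url, internal_suffix=".internal.example"):
    """Return True if the URL's host is loopback, private, or internal DNS."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: accept only localhost or the internal suffix.
        return host == "localhost" or host.endswith(internal_suffix)
    return address.is_loopback or address.is_private


def check_judge_endpoint(url):
    """Raise before the eval run starts if the judge lives outside."""
    if not is_internal_endpoint(url):
        raise ValueError(f"judge endpoint is outside the boundary: {url}")
    return url
```

A locally served judge at `http://127.0.0.1:8000/v1` passes the check, while a vendor API URL fails it before any prompt leaves the process.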
