Two model implementation files inside vLLM (versions 0.10.1 through 0.17.x) hardcode trust_remote_code=True when loading LLM sub-components. This silently overrides any user-configured --trust-remote-code=False flag, a safeguard organizations deliberately keep off precisely to prevent arbitrary remote code execution.
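The failure mode can be illustrated with a minimal sketch. The function and parameter names below are hypothetical stand-ins written for this advisory, not the actual vLLM source; the point is only how a hardcoded keyword argument silently defeats a user-supplied security flag.

```python
# Minimal sketch (hypothetical names): a hardcoded keyword argument
# silently overrides the user's configured security setting.

def load_model(path: str, *, trust_remote_code: bool = False) -> dict:
    # Stand-in for the Hugging Face loader: records the effective flag
    # that would decide whether repository-supplied Python code runs.
    return {"path": path, "trust_remote_code": trust_remote_code}

def vulnerable_load(path: str, user_flag: bool) -> dict:
    # BUG: user_flag is accepted but never used; the hardcoded True wins,
    # so --trust-remote-code=False has no effect on this code path.
    return load_model(path, trust_remote_code=True)

def patched_load(path: str, user_flag: bool) -> dict:
    # FIX: propagate the user's setting instead of hardcoding it.
    return load_model(path, trust_remote_code=user_flag)
```

With the vulnerable variant, `vulnerable_load("repo", user_flag=False)` still reports `trust_remote_code: True`, which is exactly the override described above.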
An attacker who can cause a vulnerable vLLM instance to load a crafted model repository (via Hugging Face, an internal model registry, or a supply chain substitution) achieves arbitrary code execution inside the inference process. No authentication to vLLM is required once a malicious model is in scope; the hardcoded flag does the rest.
vLLM is downloaded roughly 3.4 million times monthly and is deployed as production inference infrastructure at enterprises, cloud providers, and AI startups alike. The inference process typically holds service credentials with broad network and filesystem access, making this a high-value pivot point.
Immediate recommendation: Upgrade to vLLM ≥ 0.18.0. If upgrade is not immediately possible, audit all model sources loaded by your vLLM instances and restrict model registry access to verified, signed repositories only.
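As a stopgap while planning the upgrade, one heuristic check is to scan the installed vLLM package tree for literal hardcoded occurrences of the flag. The helper below is a sketch written for this advisory, not an official tool; a textual match is a starting point for manual review, not proof of vulnerability.

```python
# Heuristic audit sketch: list every line in a package tree that
# literally hardcodes trust_remote_code=True, for manual review.
import re
from pathlib import Path

def find_hardcoded_flags(package_root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line text) for each literal match."""
    pattern = re.compile(r"trust_remote_code\s*=\s*True")
    hits = []
    for py_file in Path(package_root).rglob("*.py"):
        text = py_file.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(py_file), lineno, line.strip()))
    return hits
```

Point it at the installed package directory (e.g. the path reported by `pip show -f vllm`) and inspect each hit to see whether the flag is hardcoded or correctly plumbed through from the user's configuration.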