Cropping the Official vLLM 0.11.0 Image into a Benchmark-Only Runtime

A note on building a benchmark-only container from the official vLLM image.

This project starts from the official vLLM 0.11.0 image. The goal is to preserve the behavior of the benchmark client while removing runtime content that the client does not need.

Project Overview

The project keeps the dependency set of the official vllm/vllm-openai:v0.11.0 image, along with the benchmark CLI, tokenizer support, dataset sampling, request functions, and a few small helper scripts.

The use case is a benchmark-only container that targets external OpenAI-compatible services. It makes it easier to compare service versions and run tests from a smaller runtime, while the commands stay the same as in vLLM 0.11.0.
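To make "benchmark client against an external OpenAI-compatible service" concrete, here is a minimal sketch of the two things such a client does: build a request and summarize timings. It is not the vLLM benchmark code. The function names, the default max_tokens, and the example URL are assumptions for illustration, and nothing is sent over the network.

```python
import json
import statistics
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=64):
    """Build (but do not send) a POST request for an OpenAI-compatible
    /v1/completions endpoint. All argument values are caller-supplied."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_latencies(latencies_s):
    """Convert per-request latencies in seconds to mean and median in ms."""
    return {
        "mean_ms": statistics.mean(latencies_s) * 1000.0,
        "median_ms": statistics.median(latencies_s) * 1000.0,
    }


if __name__ == "__main__":
    # Hypothetical endpoint and model name, used only to show the shape.
    req = build_completion_request("http://localhost:8000", "demo-model", "hello")
    print(req.full_url)
    print(summarize_latencies([0.10, 0.20, 0.30]))
```

The real benchmark CLI also handles tokenization, dataset sampling, and concurrency, which is why the image keeps those pieces rather than only an HTTP client.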

Image Layout

The build keeps the benchmark entry point, the dependencies needed for OpenAI-compatible requests, the tokenizer dependencies, and the helper scripts. Server components, development tools, temporary caches, and large test files are removed from the final image.
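One way to reason about a keep/remove split like this is to classify each path in the image against two pattern lists. The sketch below only illustrates that idea. The patterns are hypothetical and are not the project's actual rules; anything that matches neither list is flagged for manual review rather than deleted.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns, chosen only to mirror the categories in the text.
KEEP_PATTERNS = [
    "*/vllm/benchmarks/*",       # benchmark code
    "*/vllm/entrypoints/cli/*",  # CLI entry point
]
REMOVE_PATTERNS = [
    "*/__pycache__/*",  # bytecode caches
    "*/tests/*",        # test files
    "/root/.cache/*",   # temporary caches
    "/tmp/*",           # build leftovers
]


def classify(path):
    """Return 'keep', 'remove', or 'review' for an absolute path.

    Keep rules win over remove rules, so a benchmark file is never
    dropped by an overly broad removal pattern.
    """
    if any(fnmatchcase(path, pattern) for pattern in KEEP_PATTERNS):
        return "keep"
    if any(fnmatchcase(path, pattern) for pattern in REMOVE_PATTERNS):
        return "remove"
    return "review"


if __name__ == "__main__":
    for p in ["/tmp/build.log", "/usr/bin/python3"]:
        print(p, classify(p))
```

Checking keep rules first is a deliberate choice here: when the goal is to preserve client behavior, a false "keep" costs a little image size, while a false "remove" breaks the benchmark.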

The entry point maps to vllm.entrypoints.cli.main:main. That keeps the familiar benchmark commands available while narrowing the image's responsibility to benchmarking.
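The vllm.entrypoints.cli.main:main string follows the usual Python entry point convention of "module:attribute". As a rough sketch of what that mapping means, the helper below resolves such a string to a callable with importlib. The helper name is an assumption for illustration, and the demo uses a standard-library target so that it runs without vLLM installed.

```python
import importlib


def resolve_entry_point(spec):
    """Resolve a 'package.module:attr' string to the object it names.

    The part before ':' is imported as a module; the part after it is
    looked up attribute by attribute, so dotted attributes also work.
    """
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj


if __name__ == "__main__":
    # Standard-library example: resolves to json.dumps.
    dumps = resolve_entry_point("json:dumps")
    print(dumps([1, 2]))
```

Inside the cropped image, calling resolve_entry_point("vllm.entrypoints.cli.main:main") should return the same function that the vllm command runs, assuming the vllm package is importable there.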