This file contains the command line arguments for vLLM's OpenAI-compatible server. It is kept in a separate file for documentation purposes.
 
FrontendArgs ¶
Arguments for the OpenAI-compatible frontend server.
Source code in vllm/entrypoints/openai/cli_args.py
 class-attribute instance-attribute  ¶
 allow_credentials: bool = False
Allow credentials.
 class-attribute instance-attribute  ¶
allowed_headers: list[str] = ["*"]
Allowed headers.
 class-attribute instance-attribute  ¶
allowed_methods: list[str] = ["*"]
Allowed methods.
 class-attribute instance-attribute  ¶
allowed_origins: list[str] = ["*"]
Allowed origins.
 class-attribute instance-attribute  ¶
api_key: list[str] | None = None
If provided, the server will require one of these keys to be presented in the header.
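Clients present the key as a standard Bearer token. A minimal sketch using the openai Python client, assuming the server runs locally on port 8000 (the address and key below are placeholders):

```python
from openai import OpenAI

# Placeholder address and key: the key must match one configured
# via --api-key on the server.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="my-secret-key",
)
print(client.models.list())
```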
 class-attribute instance-attribute  ¶
 chat_template: str | None = None
The file path to the chat template, or the template in single-line form for the specified model.
 class-attribute instance-attribute  ¶
 chat_template_content_format: ChatTemplateContentFormatOption = "auto"
The format to render message content within a chat template.
- "string" will render the content as a string. Example: "Hello World"
- "openai" will render the content as a list of dictionaries, similar to OpenAI schema. Example: [{"type": "text", "text": "Hello world!"}]
 class-attribute instance-attribute  ¶
 disable_fastapi_docs: bool = False
Disable FastAPI's OpenAPI schema, Swagger UI, and ReDoc endpoint.
 class-attribute instance-attribute  ¶
 disable_frontend_multiprocessing: bool = False
If specified, the OpenAI frontend server will run in the same process as the model serving engine.
 class-attribute instance-attribute  ¶
 disable_uvicorn_access_log: bool = False
Disable uvicorn access log.
 class-attribute instance-attribute  ¶
 enable_auto_tool_choice: bool = False
Enable auto tool choice for supported models. Use --tool-call-parser to specify which parser to use.
 class-attribute instance-attribute  ¶
 enable_force_include_usage: bool = False
If set to True, usage statistics are included with every response, even when not requested by the client.
 class-attribute instance-attribute  ¶
 enable_log_outputs: bool = False
If True, log model outputs (generations). Requires --enable-log-requests.
 class-attribute instance-attribute  ¶
 enable_prompt_tokens_details: bool = False
If set to True, enable prompt_tokens_details in usage.
 class-attribute instance-attribute  ¶
 enable_request_id_headers: bool = False
If specified, the API server will add an X-Request-Id header to responses.
 class-attribute instance-attribute  ¶
 enable_server_load_tracking: bool = False
If set to True, enable tracking server_load_metrics in the app state.
 class-attribute instance-attribute  ¶
 enable_ssl_refresh: bool = False
Refresh the SSL context when SSL certificate files change.
 class-attribute instance-attribute  ¶
 enable_tokenizer_info_endpoint: bool = False
Enable the /get_tokenizer_info endpoint. May expose chat templates and other tokenizer configuration.
 class-attribute instance-attribute  ¶
 exclude_tools_when_tool_choice_none: bool = False
If specified, exclude tool definitions in prompts when tool_choice='none'.
 class-attribute instance-attribute  ¶
 h11_max_header_count: int = H11_MAX_HEADER_COUNT_DEFAULT
Maximum number of HTTP headers allowed in a request for the h11 parser. Helps mitigate header abuse. Default: 256.
 class-attribute instance-attribute  ¶
 h11_max_incomplete_event_size: int = (
    H11_MAX_INCOMPLETE_EVENT_SIZE_DEFAULT
)
Maximum size (bytes) of an incomplete HTTP event (header or body) for the h11 parser. Helps mitigate header abuse. Default: 4194304 (4 MB).
 class-attribute instance-attribute  ¶
 log_config_file: str | None = VLLM_LOGGING_CONFIG_PATH
Path to the logging config JSON file for both vLLM and uvicorn.
 class-attribute instance-attribute  ¶
 log_error_stack: bool = VLLM_SERVER_DEV_MODE
If set to True, log the stack trace of error responses.
 class-attribute instance-attribute  ¶
 lora_modules: list[LoRAModulePath] | None = None
LoRA module configurations in either 'name=path' format, JSON format, or JSON list format.
Example (old format): 'name=path'
Example (new format): {"name": "name", "path": "lora_path", "base_model_name": "id"}
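A minimal sketch of how both formats could be parsed; the LoRAModule dataclass here is an illustrative stand-in for vLLM's LoRAModulePath, not its actual implementation:

```python
import json
from dataclasses import dataclass

@dataclass
class LoRAModule:  # illustrative stand-in for LoRAModulePath
    name: str
    path: str
    base_model_name: str | None = None

def parse_lora_module(value: str) -> LoRAModule:
    """Accept either 'name=path' or a JSON object string."""
    if value.lstrip().startswith("{"):
        return LoRAModule(**json.loads(value))
    name, path = value.split("=", 1)
    return LoRAModule(name=name, path=path)

print(parse_lora_module("sql-lora=/adapters/sql"))
print(parse_lora_module('{"name": "sql-lora", "path": "/adapters/sql"}'))
```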
 class-attribute instance-attribute  ¶
 max_log_len: int | None = None
Maximum number of prompt characters or prompt token IDs printed in the log. The default of None means unlimited.
 class-attribute instance-attribute  ¶
middleware: list[str] = []
Additional ASGI middleware to apply to the app. We accept multiple --middleware arguments. The value should be an import path. If a function is provided, vLLM will add it to the server using @app.middleware('http'). If a class is provided, vLLM will add it to the server using app.add_middleware().
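For example, both importable shapes might look like this, assuming a Starlette/FastAPI app (the module, class, and header names are hypothetical):

```python
from starlette.middleware.base import BaseHTTPMiddleware

# Function form: registered by vLLM via @app.middleware("http").
async def add_example_header(request, call_next):
    response = await call_next(request)
    response.headers["X-Example"] = "1"  # hypothetical header
    return response

# Class form: registered by vLLM via app.add_middleware().
class ExampleMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        return await call_next(request)
```

Either would then be passed by import path, e.g. --middleware mypkg.middleware.add_example_header (the path is hypothetical).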
 class-attribute instance-attribute  ¶
 response_role: str = 'assistant'
The role name to return if request.add_generation_prompt=true.
 class-attribute instance-attribute  ¶
 return_tokens_as_token_ids: bool = False
When --max-logprobs is specified, single tokens are represented as strings of the form 'token_id:{token_id}' so that tokens that are not JSON-encodable can be identified.
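Recovering the integer IDs client-side is then straightforward (a sketch assuming every returned token string carries the token_id: prefix):

```python
tokens = ["token_id:15339", "token_id:1917"]  # hypothetical logprob tokens
ids = [int(t.removeprefix("token_id:")) for t in tokens]
print(ids)  # [15339, 1917]
```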
 class-attribute instance-attribute  ¶
 root_path: str | None = None
FastAPI root_path when the app is behind a path-based routing proxy.
 class-attribute instance-attribute  ¶
 ssl_ca_certs: str | None = None
The CA certificates file.
 class-attribute instance-attribute  ¶
ssl_cert_reqs: int = int(ssl.CERT_NONE)
Whether a client certificate is required (see the stdlib ssl module's CERT_* constants).
 class-attribute instance-attribute  ¶
 ssl_certfile: str | None = None
The file path to the SSL cert file.
 class-attribute instance-attribute  ¶
 ssl_keyfile: str | None = None
The file path to the SSL key file.
 class-attribute instance-attribute  ¶
 tool_call_parser: str | None = None
Select the tool call parser depending on the model that you're using. This is used to parse the model-generated tool call into OpenAI API format. Required for --enable-auto-tool-choice. You can choose any option from the built-in parsers or register a plugin via --tool-parser-plugin.
 class-attribute instance-attribute  ¶
 tool_parser_plugin: str = ''
Specify the tool parser plugin used to parse model-generated tool calls into the OpenAI API format; the parser names registered by this plugin can then be used in --tool-call-parser.
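A plugin is a Python file that registers parsers with vLLM's ToolParserManager. The skeleton below reflects that registration pattern, but the exact base-class interface varies between vLLM releases, so treat the method signature as an assumption and check the version you run:

```python
# my_tool_parser.py -- loaded with --tool-parser-plugin my_tool_parser.py
# (file and parser names are hypothetical)
from vllm.entrypoints.openai.tool_parsers import ToolParser, ToolParserManager

@ToolParserManager.register_module("my_parser")  # then: --tool-call-parser my_parser
class MyToolParser(ToolParser):
    def extract_tool_calls(self, model_output, request):
        # Turn raw model output into OpenAI-style tool calls here.
        raise NotImplementedError
```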
 class-attribute instance-attribute  ¶
 tool_server: str | None = None
A comma-separated list of host:port pairs (IPv4, IPv6, or hostname). Examples: 127.0.0.1:8000, [::1]:8000, localhost:1234. Or "demo" for demo purposes.
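Splitting such a list is slightly subtle because bracketed IPv6 literals contain colons; a minimal stdlib-only sketch (the function name is illustrative):

```python
def split_host_port(entry: str) -> tuple[str, int]:
    """Split '127.0.0.1:8000', '[::1]:8000', or 'localhost:1234'."""
    host, _, port = entry.rpartition(":")  # the last colon separates the port
    return host.strip("[]"), int(port)

for entry in "127.0.0.1:8000,[::1]:8000,localhost:1234".split(","):
    print(split_host_port(entry))
```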
 class-attribute instance-attribute  ¶
 trust_request_chat_template: bool = False
Whether to trust the chat template provided in the request. If False, the server will always use the chat template specified by --chat-template or the one from the tokenizer.
 class-attribute instance-attribute  ¶
 uds: str | None = None
Unix domain socket path. If set, host and port arguments are ignored.
 class-attribute instance-attribute  ¶
 uvicorn_log_level: Literal[
    "debug", "info", "warning", "error", "critical", "trace"
] = "info"
Log level for uvicorn.
 staticmethod  ¶
 add_cli_args(
    parser: FlexibleArgumentParser,
) -> FlexibleArgumentParser
Source code in vllm/entrypoints/openai/cli_args.py
  
LoRAParserAction ¶
Bases: Action
Source code in vllm/entrypoints/openai/cli_args.py
  
 __call__(
    parser: ArgumentParser,
    namespace: Namespace,
    values: str | Sequence[str] | None,
    option_string: str | None = None,
)
Source code in vllm/entrypoints/openai/cli_args.py
  
 create_parser_for_docs() -> FlexibleArgumentParser
 
 make_arg_parser(
    parser: FlexibleArgumentParser,
) -> FlexibleArgumentParser
Create the CLI argument parser used by the OpenAI API server.
We rely on the helper methods of FrontendArgs and AsyncEngineArgs to register all arguments instead of manually enumerating them here. This avoids code duplication and keeps the argument definitions in one place.
Source code in vllm/entrypoints/openai/cli_args.py
  
 validate_parsed_serve_args(args: Namespace)
Quick checks for model serve args that raise prior to loading.
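Putting the helpers together, programmatic use looks roughly like this; the import paths follow vLLM's current layout but may move between releases:

```python
from vllm.entrypoints.openai.cli_args import (
    make_arg_parser,
    validate_parsed_serve_args,
)
from vllm.utils import FlexibleArgumentParser

parser = make_arg_parser(FlexibleArgumentParser())
args = parser.parse_args(["--model", "facebook/opt-125m", "--port", "8000"])
validate_parsed_serve_args(args)  # raises before any model is loaded
```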