NOTE: This API server is used only to demonstrate usage of AsyncEngine and for simple performance benchmarks. It is not intended for production use; for production, we recommend our OpenAI-compatible server. We will also not accept PRs modifying this file; please change vllm/entrypoints/openai/api_server.py instead.
_generate async

_generate(request_dict: dict, raw_request: Request) -> Response

Source code in vllm/entrypoints/api_server.py
generate async

Generate completion for the request.

The request should be a JSON object with the following fields:

- prompt: the prompt to use for the generation.
- stream: whether to stream the results or not.
- other fields: the sampling parameters (see SamplingParams for details).

Source code in vllm/entrypoints/api_server.py
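As an illustration, the sketch below posts a request to this endpoint with Python's requests library. The host, port, and model name are assumptions (8000 is the demo server's default port), and the response shape is the demo server's {"text": [...]} JSON object.

    import requests

    # Assumes the demo server is already running locally, e.g. via:
    #   python -m vllm.entrypoints.api_server --model facebook/opt-125m
    # Host and port below are assumptions; 8000 is the demo server's default.
    API_URL = "http://localhost:8000/generate"

    payload = {
        "prompt": "San Francisco is a",  # the prompt to use for the generation
        "stream": False,                 # set to True to stream results instead
        # Any other fields are interpreted as sampling parameters:
        "max_tokens": 64,
        "temperature": 0.8,
    }

    response = requests.post(API_URL, json=payload)
    response.raise_for_status()
    # The demo server replies with {"text": [...]}, one string per completion.
    print(response.json()["text"])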
init_app async

init_app(args: Namespace, llm_engine: AsyncLLMEngine | None = None) -> FastAPI

Source code in vllm/entrypoints/api_server.py
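A minimal sketch of embedding the app in your own process by passing a pre-built AsyncLLMEngine, which init_app then reuses instead of constructing an engine from the CLI arguments. The model name, host, port, and the Namespace contents are assumptions; in practice, pass the args produced by the demo server's own argument parser.

    import asyncio
    from argparse import Namespace

    import uvicorn
    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.engine.async_llm_engine import AsyncLLMEngine
    from vllm.entrypoints.api_server import init_app


    async def main() -> None:
        # Build the engine ourselves; init_app will reuse it rather than
        # creating one from the CLI arguments.
        engine = AsyncLLMEngine.from_engine_args(
            AsyncEngineArgs(model="facebook/opt-125m")  # illustrative model
        )

        # The demo server's CLI defines a --root-path flag; which Namespace
        # attributes init_app actually consults is an assumption here.
        app = await init_app(Namespace(root_path=""), llm_engine=engine)

        # Serve the returned FastAPI app directly with uvicorn.
        config = uvicorn.Config(app, host="127.0.0.1", port=8000)
        await uvicorn.Server(config).serve()


    asyncio.run(main())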
run_server async

run_server(args: Namespace, llm_engine: AsyncLLMEngine | None = None, **uvicorn_kwargs: Any) -> None
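run_server builds the app via init_app and then serves it with uvicorn. Below is a sketch of invoking it programmatically, rebuilding a parser similar to the module's __main__ block; the exact flag set run_server expects on args varies by version and is an assumption here.

    import asyncio

    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.entrypoints.api_server import run_server
    from vllm.utils import FlexibleArgumentParser

    # Mirror the demo server's __main__ parser; the real one defines a few
    # more flags (e.g. SSL options) that run_server may also read, so treat
    # this as a sketch rather than a definitive invocation.
    parser = FlexibleArgumentParser()
    parser.add_argument("--host", type=str, default=None)
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--log-level", type=str, default="debug")
    parser.add_argument("--ssl-keyfile", type=str, default=None)
    parser.add_argument("--ssl-certfile", type=str, default=None)
    parser = AsyncEngineArgs.add_cli_args(parser)

    args = parser.parse_args(["--model", "facebook/opt-125m"])  # illustrative model

    # Additional keyword arguments to run_server are forwarded to uvicorn.
    asyncio.run(run_server(args))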