Transformers backend utilities.
Style (module-attribute)
 Style = Literal[
    "colwise",
    "colwise_rep",
    "rowwise",
    "rowwise_rep",
    "replicate",
]
 
 can_enable_torch_compile(vllm_config: VllmConfig) -> bool
Callable to be passed to the `enable_if` argument of `@support_torch_compile`.
Returns `True` by default, but compilation is disabled in the following situations:
- The model uses dynamic rope scaling.
Source code in vllm/model_executor/models/transformers/utils.py
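As an illustration of the rope-scaling check, here is a minimal self-contained sketch. The plain dict standing in for `VllmConfig` and the `rope_scaling`/`rope_type` key names are assumptions about the config layout, not the real vLLM objects:

```python
def can_compile(model_config: dict) -> bool:
    # Hypothetical stand-in for the real check: torch.compile stays
    # enabled unless the model's rope scaling is dynamic.
    rope_scaling = model_config.get("rope_scaling") or {}
    return rope_scaling.get("rope_type") != "dynamic"

print(can_compile({"rope_scaling": {"rope_type": "linear", "factor": 2.0}}))   # True
print(can_compile({"rope_scaling": {"rope_type": "dynamic", "factor": 2.0}}))  # False
print(can_compile({}))  # no rope scaling at all -> True
```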
  
 init_on_device_without_buffers(device: device)
A context manager under which models are initialized with all parameters on the specified device. Buffers, however, are not initialized on that device.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| device | `torch.device` | Device to initialize all parameters on. | required | 
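The pattern behind such a context manager can be sketched without torch: temporarily patch the module class's `register_parameter` so new parameters land on the target device, while `register_buffer` is left untouched. The `Module` and `Parameter` classes below are toy stand-ins for torch's, kept only for illustration:

```python
from contextlib import contextmanager

class Parameter:
    # Toy stand-in for torch.nn.Parameter (illustration only).
    def __init__(self, data, device="cpu"):
        self.data, self.device = data, device

    def to(self, device):
        return Parameter(self.data, device)

class Module:
    # Toy stand-in for torch.nn.Module.
    def __init__(self):
        self._parameters, self._buffers = {}, {}

    def register_parameter(self, name, param):
        self._parameters[name] = param

    def register_buffer(self, name, buf):
        self._buffers[name] = buf

@contextmanager
def init_on_device_without_buffers(device):
    # Patch register_parameter so every new parameter is moved to `device`;
    # register_buffer is deliberately not patched, so buffers keep their
    # default device.
    original = Module.register_parameter

    def on_device(self, name, param):
        original(self, name, param.to(device))

    Module.register_parameter = on_device
    try:
        yield
    finally:
        Module.register_parameter = original

with init_on_device_without_buffers("meta"):
    m = Module()
    m.register_parameter("w", Parameter([1.0]))  # moved to "meta"
    m.register_buffer("b", Parameter([0.0]))     # stays on "cpu"
```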
  
    
 replace_linear_class(
    linear: Linear,
    style: Style = "replicate",
    quant_config: QuantizationConfig | None = None,
    *,
    prefix: str = "",
) -> (
    ColumnParallelLinear
    | RowParallelLinear
    | ReplicatedLinear
)
Replace nn.Linear with one of vLLM's tensor parallel linear classes.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| linear | Linear | The `nn.Linear` layer to replace. | required |
| style | Style | Tensor parallel style of the new linear, e.g. "colwise". | 'replicate' | 
| quant_config | QuantizationConfig | None | Quantization config for the new linear. | None | 
Returns: The new linear.
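The style-to-class dispatch can be illustrated with a small lookup. Treating the `*_rep` styles as falling back to replication is an assumption made for this sketch, not confirmed vLLM behavior:

```python
def pick_linear_cls(style: str) -> str:
    # Hedged sketch: plain "colwise"/"rowwise" map to the tensor-parallel
    # classes; everything else (including the "*_rep" styles, by assumption
    # here) falls back to replication.
    return {
        "colwise": "ColumnParallelLinear",
        "rowwise": "RowParallelLinear",
    }.get(style, "ReplicatedLinear")

print(pick_linear_cls("colwise"))    # ColumnParallelLinear
print(pick_linear_cls("replicate"))  # ReplicatedLinear
```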
  Replace a Transformers RMSNorm with vLLM's RMSNorm.
This method assumes:

- Weight is stored as `weight`.
- Epsilon is stored as `eps` or `variance_epsilon`.
- `with_scale` indicates whether the layer has a weight (Gemma3n only).
- `var_hidden_size` is only ever used for the Intern vision encoder in vLLM, and Transformers doesn't appear to have the same concept.