device module-attribute
 device = (
    get_available_device()
    if get_available_device() != "hip"
    else "cuda"
)
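The `hip` special case exists because on ROCm builds Triton reports its backend as `hip` while PyTorch still exposes the accelerator under the `"cuda"` device type, hence the remap above. A hedged sketch of what `get_available_device()` plausibly does (the Triton driver query shown here is an assumption, not confirmed from the source):

```python
from functools import lru_cache

import triton


@lru_cache(maxsize=None)
def get_available_device() -> str:
    # Ask Triton which backend drives the current device; fall back to "cpu"
    # when no driver is active (e.g. a CPU-only install). Sketch only.
    try:
        return triton.runtime.driver.active.get_current_target().backend
    except BaseException:
        return "cpu"
```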
is_intel_alchemist module-attribute
 is_intel_alchemist = (
    is_intel and "Intel(R) Arc(TM) A" in get_device_name(0)
)
is_nvidia_hopper module-attribute
 is_nvidia_hopper = is_nvidia and (
    "NVIDIA H" in get_device_name(0)
    or get_device_capability()[0] >= 9
)
is_tma_supported module-attribute
 is_tma_supported = (
    is_nvidia
    and get_device_capability(0)[0] >= 9
    and (
        hasattr(
            language, "_experimental_make_tensor_descriptor"
        )
        or hasattr(language, "make_tensor_descriptor")
    )
)
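`language` is presumably `triton.language`; the two `hasattr` probes cover both the older experimental spelling and the newer `make_tensor_descriptor` name of Triton's TMA tensor-descriptor API. A hedged, stand-alone re-check of the same condition:

```python
import triton.language as tl

# True if the installed Triton exposes a TMA tensor-descriptor API under either
# of the two names probed above (assuming `language` refers to triton.language).
has_tma_descriptor_api = hasattr(tl, "_experimental_make_tensor_descriptor") or hasattr(
    tl, "make_tensor_descriptor"
)
```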
use_cuda_graph module-attribute
 use_cuda_graph = (
    is_nvidia and get("FLA_USE_CUDA_GRAPH", "0") == "1"
)
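`get` here is presumably an environment lookup such as `os.environ.get`, so CUDA graphs are opt-in via `FLA_USE_CUDA_GRAPH=1` and only honoured on NVIDIA devices. A hedged, self-contained recreation of the gate (the `is_nvidia` check below is an illustration, not the module's own definition):

```python
import os

import torch

# Illustrative stand-ins for the module's platform flag and env-var gate.
is_nvidia = torch.cuda.is_available() and torch.version.hip is None
use_cuda_graph = is_nvidia and os.environ.get("FLA_USE_CUDA_GRAPH", "0") == "1"

print(f"FLA CUDA graphs enabled: {use_cuda_graph}")
```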
_check_platform cached
 _check_platform() -> Literal[
    "nvidia", "amd", "intel", "musa"
]
Source code in vllm/model_executor/layers/fla/ops/utils.py
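The `cached` badge indicates the result is memoised after the first call. A plausible sketch of the platform mapping implied by the signature, building on the `get_available_device()` sketch above (the exact backend strings are assumptions):

```python
from functools import lru_cache
from typing import Literal


@lru_cache(maxsize=None)
def _check_platform() -> Literal["nvidia", "amd", "intel", "musa"]:
    # Translate the low-level Triton backend name into the vendor label used
    # by the rest of these utilities (mapping assumed, not confirmed).
    backend = get_available_device()
    if backend == "cuda":
        return "nvidia"
    if backend == "hip":
        return "amd"
    if backend == "xpu":
        return "intel"
    return "musa"
```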
   
A decorator that ensures all input tensors are contiguous and sets the active device based on the input tensors.
Source code in vllm/model_executor/layers/fla/ops/utils.py
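A minimal sketch of such a decorator, assuming it calls `.contiguous()` on every tensor argument and switches to the device of the first tensor it finds; the name `input_guard` follows the upstream flash-linear-attention utilities and is an assumption here:

```python
import functools

import torch


def input_guard(fn):
    """Hedged sketch: make tensor arguments contiguous and run on their device."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Force every tensor argument to be contiguous before the kernel runs.
        args = tuple(a.contiguous() if isinstance(a, torch.Tensor) else a for a in args)
        kwargs = {
            k: v.contiguous() if isinstance(v, torch.Tensor) else v
            for k, v in kwargs.items()
        }

        # Run under the device of the first tensor argument, if there is one.
        tensor = next(
            (a for a in (*args, *kwargs.values()) if isinstance(a, torch.Tensor)), None
        )
        if tensor is not None and tensor.is_cuda:
            with torch.cuda.device(tensor.device):
                return fn(*args, **kwargs)
        return fn(*args, **kwargs)

    return wrapper
```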
  
A decorator that caches the most recent results of a function with tensor inputs.
It stores the outputs of the decorated function for the most recently seen sets of input tensors. The cache holds a fixed number of entries (4 by default); when it is full, the oldest entry is evicted.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| fn | Callable[..., Tensor] | The function to be decorated. It should take tensor inputs and return tensor outputs. | required | 
Returns:
| Type | Description | 
|---|---|
| Callable[..., Tensor] | A wrapped version of the input function with a small, fixed-size cache of recent results. |
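A hedged sketch of a decorator with the behaviour described above: results are keyed by the identity of the argument tensors, at most four entries are kept, and the oldest is evicted first. The name `tensor_cache` follows the upstream flash-linear-attention utilities and is an assumption here:

```python
import functools
from typing import Any, Callable

import torch


def tensor_cache(fn: Callable[..., torch.Tensor]) -> Callable[..., torch.Tensor]:
    """Hedged sketch: cache recent results keyed by the identity of tensor inputs."""
    entries: list[tuple[tuple, dict, Any]] = []
    max_entries = 4  # the default cache size described above

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        for prev_args, prev_kwargs, prev_out in entries:
            # A hit requires every positional and keyword argument to be the
            # very same object as in a cached call (identity, not equality).
            if (
                len(args) == len(prev_args)
                and kwargs.keys() == prev_kwargs.keys()
                and all(a is b for a, b in zip(args, prev_args))
                and all(kwargs[k] is prev_kwargs[k] for k in kwargs)
            ):
                return prev_out
        out = fn(*args, **kwargs)
        if len(entries) >= max_entries:
            entries.pop(0)  # evict the oldest entry when the cache is full
        entries.append((args, kwargs, out))
        return out

    return wrapper
```

Keying on object identity rather than tensor value keeps each lookup cheap and avoids comparing tensor contents, which is why repeated calls with the very same input tensors are the case this cache accelerates.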