For Type 2 Devices, CXL defines two coherency “biases” that govern how CXL processes coherent data between Host- and Device-attached memory. The bias modes are referred to as Host bias and Device bias, and the operating mode can change as needed during operation of the link to optimize performance for a given task.
When a Type 2 Device (e.g., an accelerator) is working on data between the time of work submission to the Host and its subsequent completion, the Device bias mode is used to ensure that the Device can access its Device-attached memory directly without having to talk to the Host’s coherency engines. In this mode, the Device is guaranteed that the Host does not have the line cached. This gives the Device the lowest possible latency, making Device bias the main operating mode for work execution by the accelerator. The Host can still access Device-attached memory when it is in Device bias mode, but the performance will not be optimal.
The Host bias mode prioritizes coherent access from the Host to the Device-attached memory. It is typically used during work submission, when data is being written from the Host to the Device-attached memory, and during work completion, when the data is being read out of the Device-attached memory by the Host. In Host bias mode, the Device-attached memory appears to the Device just like Host-attached memory, and if the Device requires access, it is handled by a request to the Host.
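The pairing of job phases and bias modes described above can be summarized in a minimal Python sketch. The names `Bias`, `Phase`, `PHASE_BIAS`, `device_access_path`, and `job_bias_sequence` are hypothetical and chosen only for illustration; they do not come from the CXL specification, and the sketch models only the typical usage pattern, not the protocol itself.

```python
from enum import Enum


class Bias(Enum):
    HOST = "host"
    DEVICE = "device"


class Phase(Enum):
    SUBMISSION = "submission"   # Host writes input data into Device-attached memory
    EXECUTION = "execution"     # accelerator works on the data
    COMPLETION = "completion"   # Host reads results out of Device-attached memory


# Typical (not mandatory) bias used in each phase, as described in the text.
PHASE_BIAS = {
    Phase.SUBMISSION: Bias.HOST,
    Phase.EXECUTION: Bias.DEVICE,
    Phase.COMPLETION: Bias.HOST,
}


def device_access_path(bias: Bias) -> str:
    """How a Device access to its own attached memory is serviced."""
    # Device bias: direct access, no Host coherency engines involved.
    # Host bias: the access is handled by a request to the Host.
    return "direct" if bias is Bias.DEVICE else "via_host"


def job_bias_sequence() -> list[tuple[str, str]]:
    """Return (phase, bias) pairs in the order a job moves through them."""
    return [(phase.value, PHASE_BIAS[phase].value) for phase in Phase]
```

Calling `job_bias_sequence()` lists the three phases in order with the bias each one typically uses, and `device_access_path` shows why the execution phase favors Device bias: it is the only mode in which the Device reaches its memory without going through the Host.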
The bias mode can be controlled by either software or hardware via the two supported mode management mechanisms: software-assisted and hardware-autonomous. An accelerator or other Type 2 Device can choose the bias mode, and if neither mode is selected, the system defaults to the Host bias mode such that all accesses to Device-attached memory must be routed through the Host. The bias mode can be changed at the granularity of a 4 KB page and is tracked via a bias table implemented within the Type 2 Device.
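As a rough illustration of per-page tracking, the following Python sketch models a bias table with 4 KB granularity and a Host bias default. The `BiasTable` class and its methods are hypothetical names, and the set-based storage is an assumption made for readability; the actual table layout inside a Type 2 Device is implementation-specific and is not described here.

```python
from enum import Enum

PAGE_SIZE = 4096  # 4 KB bias granularity, as stated in the text


class Bias(Enum):
    HOST = 0
    DEVICE = 1


class BiasTable:
    """Toy model of a per-page bias table (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # Only Device-biased pages are stored; any page not listed is
        # treated as Host bias, matching the default described in the text.
        self._device_pages: set[int] = set()

    @staticmethod
    def page_of(addr: int) -> int:
        return addr // PAGE_SIZE

    def get(self, addr: int) -> Bias:
        if self.page_of(addr) in self._device_pages:
            return Bias.DEVICE
        return Bias.HOST

    def set_range(self, start: int, length: int, bias: Bias) -> None:
        """Set the bias of every page touched by [start, start + length)."""
        if length <= 0:
            return
        first = self.page_of(start)
        last = self.page_of(start + length - 1)
        for page in range(first, last + 1):
            # A real flip to Device bias would also have to ensure the Host
            # no longer caches lines from the page; that step is omitted.
            if bias is Bias.DEVICE:
                self._device_pages.add(page)
            else:
                self._device_pages.discard(page)
```

Because bias is tracked per page, a request that covers only part of a page still changes the mode of the whole page in this sketch, which is one reason buffers intended for accelerator work are usually easier to manage when page-aligned.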
An important feature of the CXL standard is that the coherency protocol is asymmetric. The Home caching agent resides only in the Host. Thus, the Host controls the caching of memory and resolves system-wide coherency for a given address in response to requests from attached CXL Devices. This is in contrast to existing proprietary and open coherency protocols, particularly those for CPU-to-CPU connection, which are generally symmetric and make all interconnected devices peers.
While a symmetric approach has some advantages, a symmetric cache coherency protocol is more complex, and the resulting complexity has to be handled by every Device. Devices with different architectures may take different approaches to coherency that are optimized at the micro-architecture level, which can make broad industry adoption more challenging. By using an asymmetric approach controlled by the Host, different CPUs and accelerators can easily become part of the emerging CXL ecosystem.