These implementations are enhanced with a unified memory architecture: any CPU can access memory in another CPU cluster with similar access time, so software can remain agnostic to how workloads are distributed across the processing clusters. For these cases, it is critical that CPUs in one die can access memory in the other die with minimal latency while maintaining cache coherency.
The link between the two dies will therefore often need to carry cache-coherent traffic, typically using protocols such as CXL or CCIX, while keeping link latency to a minimum.
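To illustrate what this buys the software, the sketch below assumes a Linux system where each die's CPU cluster is exposed as a NUMA node (0 and 1) and libnuma is available; these are assumptions for illustration, not part of the article. It allocates a buffer on one die's memory and accesses it from the other with ordinary loads and stores; hardware coherency over the die-to-die link makes the placement invisible to the code.

```c
/* Minimal sketch: hardware cache coherency lets software treat
 * memory on either die identically. Assumes each die's CPU cluster
 * appears as a NUMA node (0 and 1); build with -lnuma. */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }

    /* Allocate a buffer backed by the remote die's memory (node 1). */
    size_t len = 1 << 20;
    long *buf = numa_alloc_onnode(len, 1);
    if (!buf)
        return 1;

    /* Run on the local die (node 0) and access the buffer with
     * ordinary loads and stores. The coherent die-to-die link
     * (e.g. carrying CXL or CCIX traffic) makes this transparent:
     * no flushes, no copies, no special API. */
    numa_run_on_node(0);
    for (size_t i = 0; i < len / sizeof(long); i++)
        buf[i] = (long)i;

    printf("remote buffer written: buf[0]=%ld\n", buf[0]);
    numa_free(buf, len);
    return 0;
}
```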
Maintaining a single unified memory domain is typically possible if the link latency is in the range of 15 to 20 nanoseconds in each direction.
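A back-of-the-envelope calculation shows why this budget works. Assuming an illustrative local DRAM access time of roughly 80 ns (an assumption for the sake of the example, not a figure from this article), a remote access adds two link crossings, one for the request and one for the response:

```c
/* Back-of-the-envelope latency check. The local access time is an
 * illustrative assumption; the 15-20 ns per-direction figures are
 * the link latency range cited above. */
#include <stdio.h>

int main(void)
{
    double local_ns   = 80.0;   /* assumed local memory access time */
    double link_ns_lo = 15.0;   /* link latency, each direction     */
    double link_ns_hi = 20.0;

    /* Request + response = two link crossings per remote access. */
    double remote_lo = local_ns + 2 * link_ns_lo;   /* 110 ns */
    double remote_hi = local_ns + 2 * link_ns_hi;   /* 120 ns */

    printf("remote access: %.0f-%.0f ns (%.2fx-%.2fx of local)\n",
           remote_lo, remote_hi,
           remote_lo / local_ns, remote_hi / local_ns);
    return 0;
}
```

Under these assumptions a remote access lands within roughly 1.4x to 1.5x of a local one, comparable to a NUMA hop that operating systems and applications already tolerate.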
High-performance heterogeneous compute architectures may also require coherency when both sides of the link share cache memory.
Applications such as IO access, where digital processing sits in a separate die from the IO functionality for flexibility and efficiency (the IO may be electrical SerDes, optical, radio, sensors, or others), typically have no coherency requirements and are more tolerant of link latency. In these cases, IO traffic is generally carried over standard protocols such as the AXI interface.
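As an illustration, a memory-mapped IO window of this kind might be accessed as in the sketch below. The base address, span, and register offsets are hypothetical, and the /dev/mem mapping is just one way such a window could be exposed on Linux.

```c
/* Minimal sketch of non-coherent IO access: the processing die talks
 * to the IO die through a memory-mapped (AXI-style) register window.
 * All addresses and offsets here are hypothetical placeholders. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define IO_DIE_BASE  0x40000000UL   /* hypothetical AXI window   */
#define IO_DIE_SPAN  0x1000
#define REG_STATUS   0x00           /* hypothetical register map */
#define REG_TX_DATA  0x08

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return 1;

    volatile uint32_t *regs = mmap(NULL, IO_DIE_SPAN,
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, IO_DIE_BASE);
    if (regs == MAP_FAILED)
        return 1;

    /* Each access becomes a single read or write transaction over
     * the link. No cache coherency is involved (the region is mapped
     * uncached), so extra link latency only slows these individual
     * transactions rather than an entire coherency protocol. */
    uint32_t status = regs[REG_STATUS / 4];
    regs[REG_TX_DATA / 4] = 0xCAFEu;
    printf("IO die status: 0x%08x\n", status);

    munmap((void *)regs, IO_DIE_SPAN);
    close(fd);
    return 0;
}
```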
Similarly, parallel architectures such as GPUs, and some categories of heterogeneous compute where an accelerator is connected to the CPU cluster, may require only IO coherency (if the accelerator die has no cache) or no coherency at all, as shown in Figure 3.
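To make the distinction concrete, the sketch below contrasts the two cases from the driver's point of view. All of the helpers (accel_buffer_alloc, cache_flush_range, accel_start) are hypothetical stand-ins stubbed out so the example compiles; they do not correspond to any specific driver API.

```c
/* Illustrative contrast: IO-coherent vs. non-coherent accelerator. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver helpers, stubbed so the sketch builds;
 * a real platform would supply these. */
static void *accel_buffer_alloc(size_t len) { return malloc(len); }
static void cache_flush_range(void *p, size_t len) { (void)p; (void)len; }
static void accel_start(void *p, size_t len) { (void)p; (void)len; }

static void submit_job(const uint8_t *src, size_t len, int io_coherent)
{
    uint8_t *buf = accel_buffer_alloc(len);
    if (!buf)
        return;
    memcpy(buf, src, len);              /* CPU fills the buffer */

    if (!io_coherent) {
        /* Non-coherent link: the accelerator reads memory directly,
         * so the CPU must flush its dirty cache lines before handing
         * the buffer over. */
        cache_flush_range(buf, len);
    }
    /* IO-coherent link: the accelerator's reads snoop the CPU caches,
     * so no explicit cache maintenance is needed. Because the
     * accelerator die holds no cache of its own, full two-way
     * coherency is unnecessary. */
    accel_start(buf, len);
    free(buf);
}

int main(void)
{
    uint8_t data[64] = { 0xAB };
    submit_job(data, sizeof data, /* io_coherent = */ 1);
    puts("job submitted");
    return 0;
}
```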