While PCs and laptops used to handle heavy computing workloads, that work has shifted to data centers, the workhorses of our increasingly AI-driven digital world. Globally, data creation is anticipated to reach 180 zettabytes by 2025, according to Statista. For the hyperscale data centers managing all this information, high bandwidth and low latency are the cornerstones that keep the digital world turning.
Data center architectures are changing in response to these growing data demands. In the hyperscale world, the trend is toward disaggregation, in which homogeneous resources such as compute, storage, and networking sit in separate boxes. The boxes are connected via optical interconnects, and a central intelligence unit determines what a workload needs and pulls just that from each box, leaving the remaining resources free for other workloads.
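To make the idea concrete, here is a minimal, hypothetical sketch of how such an orchestrator might compose a logical server from disaggregated pools. It is not a real orchestration API; the struct names, fields, and capacities are invented purely for illustration.

```c
/* Hypothetical sketch of composable infrastructure: an orchestrator draws
 * just the resources a workload needs from separate pools. All names and
 * numbers are illustrative, not any real API. */
#include <stdio.h>
#include <stdbool.h>

typedef struct {
    int cpu_cores;      /* free cores in the compute box        */
    int memory_gib;     /* free capacity in the memory box      */
    int storage_tib;    /* free capacity in the storage box     */
} resource_pools;

typedef struct {
    int cpu_cores;
    int memory_gib;
    int storage_tib;
} workload_request;

/* Carve a logical server out of the pools; leftover capacity stays
 * available for other workloads. Returns false if any pool falls short. */
static bool compose_node(resource_pools *pools, workload_request req)
{
    if (pools->cpu_cores   < req.cpu_cores  ||
        pools->memory_gib  < req.memory_gib ||
        pools->storage_tib < req.storage_tib)
        return false;
    pools->cpu_cores   -= req.cpu_cores;
    pools->memory_gib  -= req.memory_gib;
    pools->storage_tib -= req.storage_tib;
    return true;
}

int main(void)
{
    resource_pools pools = { .cpu_cores = 512, .memory_gib = 8192, .storage_tib = 200 };
    workload_request job = { .cpu_cores = 64,  .memory_gib = 2048, .storage_tib = 10 };

    if (compose_node(&pools, job))
        printf("Node composed; %d GiB of pooled memory remains free\n",
               pools.memory_gib);
    return 0;
}
```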
Disaggregation enables memory pooling, which has become increasingly important in data centers. In a traditional data center architecture, each server has its own dedicated memory, and an application running on that server can use only that memory, capping how much any single application can consume. Today’s data-driven applications, such as large language models (LLMs) like ChatGPT, are extremely thirsty for memory: regardless of how much memory a given server is provisioned with, they will find a way to demand more. The best way out of this dilemma is to remove the memory wall and allow memory resources to be shared and pooled among multiple servers. When any program running on any server can draw on the pool according to its needs, overall memory usage becomes more efficient. And because a memory pool can offer capacity in the hundreds of terabytes, even very large requests can be satisfied, making large-memory applications practical.
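The sketch below contrasts the two models in toy form: with server-local memory, a request larger than the server's own capacity simply cannot be satisfied, while the same request succeeds against a shared pool and the capacity returns to the pool when the job finishes. The sizes and the bookkeeping are illustrative assumptions, not a real allocator.

```c
/* Illustrative comparison (not a real allocator): static per-server memory
 * versus a shared pool that any server can draw from on demand. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCAL_GIB   512ULL          /* memory attached to a single server       */
#define POOL_GIB    (300ULL * 1024) /* shared pool: hundreds of terabytes total */

static uint64_t pool_free_gib = POOL_GIB;

/* Traditional model: an application can never exceed its server's own memory. */
static bool alloc_local(uint64_t request_gib)
{
    return request_gib <= LOCAL_GIB;
}

/* Pooled model: the request is served from the shared pool... */
static bool alloc_pooled(uint64_t request_gib)
{
    if (request_gib > pool_free_gib)
        return false;
    pool_free_gib -= request_gib;
    return true;
}

/* ...and the capacity goes back to the pool when the application is done. */
static void release_pooled(uint64_t request_gib)
{
    pool_free_gib += request_gib;
}

int main(void)
{
    uint64_t llm_gib = 4096; /* an LLM job asking for 4 TiB of memory */

    printf("local-only allocation: %s\n", alloc_local(llm_gib)  ? "ok" : "fails");
    printf("pooled allocation:     %s\n", alloc_pooled(llm_gib) ? "ok" : "fails");
    release_pooled(llm_gib);
    return 0;
}
```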
In data center architectures that use accelerators, CXL provides the conduit for direct memory access, allowing accelerators to work on the same data as the processor. This avoids the need to replicate data across the system. CXL handles memory allocation by assigning the appropriate amount of memory to an application that needs it and then releasing that memory back to the pool once the application has finished with it. The result is lower latency and less software overhead, freeing the system to deliver better die-to-die communication in the multi-die designs that are becoming increasingly popular in data centers.
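From software's point of view, CXL-attached memory on Linux typically appears as a CPU-less NUMA node, so an application (or a memory-allocation service acting on its behalf) can borrow capacity and hand it back with standard NUMA APIs. The sketch below uses libnuma for this; the node number 2 is an assumption about where the CXL memory shows up on a particular system, and should be checked with numactl --hardware on the target.

```c
/* Minimal sketch: allocate from CXL-attached memory exposed as a CPU-less
 * NUMA node on Linux, then release it back to the pool.
 * Build with: gcc cxl_alloc.c -lnuma
 * CXL_NODE is an assumption; verify the node id on the target machine. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define CXL_NODE 2                      /* assumed node id of the CXL memory */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    size_t size = 1UL << 30;            /* borrow 1 GiB from the pool */
    void *buf = numa_alloc_onnode(size, CXL_NODE);
    if (!buf) {
        fprintf(stderr, "allocation on node %d failed\n", CXL_NODE);
        return 1;
    }

    memset(buf, 0, size);               /* touch the pages so they are backed */
    printf("1 GiB allocated from node %d\n", CXL_NODE);

    numa_free(buf, size);               /* return the capacity to the pool */
    return 0;
}
```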
It’s no wonder that the CXL standard, with its cache coherency and extremely low latency, is quickly gaining traction among data center SoC designers. Its computational offloading capability and interoperability with PCI Express open up a broad range of design possibilities. By enabling disaggregation of memory and other peripheral components, CXL is integral to the emerging composable memory architecture. The standard’s power efficiency is also critical in mitigating the energy demands of today’s data centers, which consume roughly 1% of the world’s energy.