The Compute Express Link™ (CXL™) Consortium recently announced the CXL 4.0 specification, marking the next major step in the evolution of coherent interconnects. Each CXL generation has advanced system design in new ways—first establishing coherency and memory attach, then expanding device management, pooling, and switching. Now the focus has shifted to bandwidth. With CXL 4.0 aligning to PCI Express 7.0 at 128 GT/s, architects are turning their attention to how data can move faster and more efficiently across CPUs, accelerators, and memory pools at rack scale.
Today, most designs still revolve around CXL 3.x fabrics. Those architectures enable coherent memory sharing and expansion, helping system designers balance performance and utilization across disaggregated components. But as AI, analytics, and high-performance computing (HPC) workloads continue to scale, attention is quickly moving to CXL 4.0-class bandwidth. Higher data rates mean not just faster signaling, but new topologies for linking compute and memory domains with the precision and throughput next-generation systems demand.
While the public specification has only recently reached CXL 4.0, many design teams have already been preparing for this transition. The step to 128-Gig SerDes operation doubles the raw link speed, but it involves more than faster signaling: it also refines how data moves through the system. Techniques such as lane bundling and streamlined ports improve effective throughput, allowing more useful data to be transferred per cycle. These architectural changes complement tighter channel requirements, better equalization, and lower energy per bit, creating a foundation for coherent fabrics that operate efficiently at very high speeds.
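To put the generational step in perspective, a back-of-the-envelope calculation shows what doubling the signaling rate means for a x16 link. This is an illustrative sketch only: it counts raw bits on the wire and ignores FLIT framing, FEC, and protocol overhead, so real effective throughput is somewhat lower.

```python
# Raw unidirectional bandwidth for PCIe 6.0/7.0-class links.
# Illustrative only: ignores FLIT framing, FEC, and protocol overhead.

def raw_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Raw unidirectional bandwidth in GB/s.

    PCIe 6.0 and later use PAM4 signaling with 1b/1b framing,
    so each transfer carries one payload bit; 8 bits per byte.
    """
    return gt_per_s * lanes / 8

for gen, rate in [("PCIe 6.0 / CXL 3.x", 64), ("PCIe 7.0 / CXL 4.0", 128)]:
    bw = raw_bandwidth_gbps(rate, lanes=16)
    print(f"{gen}: x16 @ {rate} GT/s -> {bw:.0f} GB/s per direction")
# PCIe 6.0 / CXL 3.x: x16 @ 64 GT/s -> 128 GB/s per direction
# PCIe 7.0 / CXL 4.0: x16 @ 128 GT/s -> 256 GB/s per direction
```

At 128 GT/s, a single x16 link moves roughly 256 GB/s in each direction before overhead, which is the headroom that rack-scale coherent fabrics are counting on.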
For system architects, CXL 4.0’s evolution means every layer of design—electrical, logical, and architectural—must work in concert. The technology no longer sits only between processors and memory devices; it now defines how compute clusters communicate and scale in a multi-die, multi-chip environment.
The momentum behind CXL 4.0 is best understood through its three primary use cases. Each reflects a real and growing need for coherent, high-bandwidth interconnects that stretch beyond a single package or socket.
Figure 1: CXL use cases in compute and AI expansion
The first is memory pooling and expansion, the most widely deployed CXL capability today. In this configuration, the host CPU connects to external memory controllers through CXL links, extending memory capacity and allowing multiple hosts to share resources coherently. It’s a practical approach to improving utilization and flexibility across data-center nodes. As signaling speeds rise to 128 GT/s, this model becomes even more compelling—remote memory access begins to approach local latency, allowing servers to operate with much greater efficiency.
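The utilization benefit of pooling can be sketched with a toy capacity model. The host counts and demand figures below are hypothetical, chosen only to illustrate the effect: when each server must be provisioned for its own worst case, capacity is stranded; when hosts draw from a shared CXL pool, the pool can be sized closer to the combined typical load.

```python
# Toy model of memory pooling vs. static per-host provisioning.
# All numbers are hypothetical, for illustration only.

peak_demand_gb = [300, 120, 450, 80]   # per-host worst-case demand
avg_demand_gb  = [150,  60, 200, 40]   # typical concurrent demand

# Static: every host is provisioned for its own peak, so unused
# capacity on idle hosts is stranded.
static_capacity = sum(peak_demand_gb)

# Pooled: one shared pool sized for the combined typical load plus
# headroom for the single largest burst above average (a simplifying
# assumption that bursts rarely coincide).
headroom = max(p - a for p, a in zip(peak_demand_gb, avg_demand_gb))
pooled_capacity = sum(avg_demand_gb) + headroom

print(f"static provisioning: {static_capacity} GB")   # 950 GB
print(f"pooled provisioning: {pooled_capacity} GB")   # 700 GB
```

Even in this simple model, pooling trims roughly a quarter of the provisioned DRAM; the savings grow as more hosts with uncorrelated demand share the pool.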
The second use case is CPU-to-AI accelerator connectivity. AI workloads create extraordinary data-movement challenges as models and datasets grow. CXL’s coherent, low-latency data paths allow CPUs and accelerators to share memory directly, maintaining cache consistency without costly software workarounds. At CXL 4.0 bandwidths, these links can sustain the throughput needed for large-scale training and inference, where every cycle and every bit of latency counts.
The third is CPU-to-CPU scale-out. As processors approach physical limits in die size and pin count, designers are increasingly scaling performance horizontally—linking multiple CPUs together to behave as one. High-speed, coherent CXL connections enable this expansion at the chip-to-chip or board-to-board level. The result is a more flexible, distributed compute fabric where bandwidth directly determines scalability.
Together, these three scenarios—memory pooling, CPU-to-AI connectivity, and CPU-to-CPU expansion—define why CXL 4.0 exists. Each depends on coherent, high-bandwidth communication to realize the next generation of performance and efficiency.
Because CXL builds on the same electrical foundation as PCIe, the two technologies often coexist in a system, each optimized for different goals. PCIe remains the backbone for broad device interoperability and transaction-rich I/O. It provides the structure, ordering, and flexibility required for diverse traffic and complex data management.
CXL, by contrast, is purpose-built for low-latency, coherent data movement. It strips away unnecessary layers of packetization, creating a more direct link between memory-centric endpoints. This streamlined design allows faster handoffs between compute, accelerators, and memory devices. Designers select the right tool for the task: PCIe when transaction depth and flexibility are key; CXL when coherence and latency dominate. The two work hand-in-hand to connect an increasingly heterogeneous computing environment.
For most teams, the immediate challenge isn’t whether to adopt CXL 4.0—it’s how to prepare for it while finalizing CXL 3.x and PCIe 6.0 products. The shift to PAM4 signaling and FLIT mode in the current generation was significant, demanding new methodologies in signal integrity, verification, and compliance. Now, with PCIe 7.0 and CXL 4.0 targeting even higher speeds, engineers must validate their channels, connectors, and equalization strategies earlier in the design cycle.
Fortunately, the transition from PCIe 6.x/CXL 3.x to PCIe 7.0/CXL 4.0 builds on an established base. Many of the tools, flows, and learnings from current designs carry forward. What changes is the timing and the performance envelope: faster links, tighter margins, and more concurrent design activity between generations.
Figure 2: World's First CXL 3.1 Multi-Vendor Interoperability Demo Showcases New Memory Possibilities for Hyperscale Data Centers
Synopsys helps teams navigate this shift. We already support the underlying PHY bandwidth that CXL 4.0 requires, with 128 GT/s SerDes designed for PCIe 7.0 signaling and power efficiency. Synopsys CXL IP delivers proven interoperability—demonstrated in multi-vendor demos at industry events—providing a stable platform for customers as they plan and prototype their next-generation fabrics. This foundation allows teams to characterize their 128-Gig channels now while continuing production of their existing products.
Synopsys CXL 4.0 Verification IP, now available to early-access customers, provides comprehensive compliance checking against the CXL 4.0 specification and integrates seamlessly with the PCIe 7.0 interface. It is engineered for the performance, scalability, fabric-management, and security demands of data-center workloads, especially those driving AI and machine learning. Developed in collaboration with leading industry partners, the solution builds on proven CXL architecture and adds application and configuration interfaces for streamlined access to the CXL protocol stack.
Bandwidth is now the leading factor shaping system architecture. With the arrival of the CXL 4.0 specification, the industry has entered a new phase where coherent interconnects are defined by speed, efficiency, and readiness for scale. Designers building for AI, HPC, and data-center applications can start planning today—evaluating physical channels, simulation margins, and architectural trade-offs—to ensure their designs are ready for 128 GT/s performance.
At Synopsys, we’re working alongside customers and partners to accelerate that readiness. Whether you’re ramping CXL 3.x deployments or exploring 4.0-class feasibility, our industry-leading IP portfolio, verification tools, and silicon-proven SerDes technology provide a path forward. Bandwidth may define the future—but preparation defines who gets there first.