If a processor offers a CXL interface, accelerators can have access to the same data as the processor, avoiding the need to replicate data across the system.
Here's an example of how this improves the efficiency of your system:
Imagine you are designing a security camera application. A physical camera dumps frames of data into system memory at 30, 60, 100 frames per second, or more. The processor takes those frames from memory and recognizes a face, then another, and another. It then needs to work out which face is Ted, which is Michael, and which is Sophia.
In the past, this kind of operation required a lot of back-and-forth, both in control handoffs and in data copying. The CPU would have to tell the driver to copy the frames of data from memory and deliver them to the accelerator over the system bus. Once the data had landed in a memory buffer on the accelerator, the accelerator would analyze it to determine whose faces they were. All that data would then have to travel back through the system to the CPU, which would write the names associated with the faces into memory.
With CXL, instead of the driver copying the face data over to a buffer on the accelerator through the system bus, the accelerator has direct memory access. This means that the CPU can simply send pointers to the accelerator that say (for instance), "Look at addresses 1,000,000, 1,100,000, and 1,200,000 in memory. Those are faces. Let me know who those faces are." The accelerator can update the system memory directly, labeling the faces as Ted, Michael, and Sophia without sending the data back and forth through the system.
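The contrast between the two flows can be sketched with a toy model. The Python below is only an illustration and not real CXL or driver code: a shared `bytearray` stands in for memory that both the CPU and the accelerator can address, and every name and size in it (`identify`, `copy_based_flow`, `pointer_based_flow`, `FACE_SIZE`, `NAME_SIZE`, the 4 KiB frame, and the scaled-down addresses 1000, 1100, and 1200) is a hypothetical chosen for the example.

```python
# Toy model only: a bytearray stands in for system memory that both the CPU
# and the accelerator can address. All names and sizes are hypothetical.

FACE_SIZE = 16   # bytes examined per face (assumed)
NAME_SIZE = 8    # bytes reserved per written-back name (assumed)
NAMES = {1: b"Ted", 2: b"Michael", 3: b"Sophia"}


def identify(face_bytes):
    """Stand-in for recognition: the first byte of a face is a tag for a name."""
    return NAMES.get(face_bytes[0], b"unknown")


def copy_based_flow(memory, frame_start, frame_len, face_offsets, name_addr):
    """Older flow: the driver copies the whole frame into an accelerator buffer."""
    accel_buffer = bytes(memory[frame_start:frame_start + frame_len])  # full copy
    bytes_moved = frame_len
    for i, off in enumerate(face_offsets):
        name = identify(accel_buffer[off:off + FACE_SIZE])
        # The result travels back to the CPU, which writes it into memory.
        dst = name_addr + i * NAME_SIZE
        memory[dst:dst + len(name)] = name
        bytes_moved += len(name)
    return bytes_moved


def pointer_based_flow(memory, face_addrs, name_addr):
    """Pointer flow: the CPU passes addresses; the accelerator works in place."""
    view = memoryview(memory)  # a view of the same memory, not a copy
    bytes_moved = 0
    for i, addr in enumerate(face_addrs):
        name = identify(view[addr:addr + FACE_SIZE])  # reads only the face bytes
        bytes_moved += FACE_SIZE
        dst = name_addr + i * NAME_SIZE
        view[dst:dst + len(name)] = name  # writes the name directly into memory
        bytes_moved += len(name)
    return bytes_moved


def read_names(memory, name_addr, count):
    """Read back the names written at name_addr, stripping zero padding."""
    return [
        bytes(memory[name_addr + i * NAME_SIZE:name_addr + (i + 1) * NAME_SIZE]).rstrip(b"\x00")
        for i in range(count)
    ]


FRAME_LEN = 4096               # assumed frame size for the toy model
NAME_ADDR = FRAME_LEN          # names are stored just past the frame
FACE_ADDRS = [1000, 1100, 1200]


def make_memory():
    """Build a zeroed memory image with a tag byte planted at each face address."""
    memory = bytearray(FRAME_LEN + len(FACE_ADDRS) * NAME_SIZE)
    for tag, addr in enumerate(FACE_ADDRS, start=1):
        memory[addr] = tag
    return memory


mem_a = make_memory()
copied = copy_based_flow(mem_a, 0, FRAME_LEN, FACE_ADDRS, NAME_ADDR)
mem_b = make_memory()
shared = pointer_based_flow(mem_b, FACE_ADDRS, NAME_ADDR)
print("copy-based bytes moved:   ", copied)
print("pointer-based bytes moved:", shared)
```

Both flows leave the same three names in memory. In this toy setup the copy-based flow moves the whole frame plus the names, while the pointer-based flow moves only the face bytes and the names. Real hardware adds caching and coherence traffic that the model ignores.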
With CXL, data moves only when the co-processor needs it. Even then, when the co-processor accesses a face, it does not copy the entire frame across the system bus, only the information it actually needs. This means less software overhead and lower latency, freeing your system up for better die-to-die communications.
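To get a rough feel for "only what is necessary", the following back-of-the-envelope sketch counts the bytes that would cross the link if only the cache lines covering a face were fetched. The numbers are assumptions made for illustration: a 1920x1080 frame at 3 bytes per pixel stored row by row, a 128x128-pixel face region, and 64-byte transfer granularity. The helper names `lines_touched` and `face_region_bytes` are hypothetical.

```python
CACHE_LINE = 64  # assumed transfer granularity in bytes


def lines_touched(start, length, line=CACHE_LINE):
    """Number of cache lines overlapped by the byte range [start, start + length)."""
    if length <= 0:
        return 0
    first = start // line
    last = (start + length - 1) // line
    return last - first + 1


def face_region_bytes(frame_width, bytes_per_pixel, x, y, w, h, line=CACHE_LINE):
    """Bytes fetched if only the lines covering a w-by-h box at (x, y) are read.

    Assumes a row-major frame with no padding between rows.
    """
    stride = frame_width * bytes_per_pixel
    touched = set()  # a set, so a line shared by two rows is counted once
    for row in range(y, y + h):
        start = row * stride + x * bytes_per_pixel
        first = start // line
        count = lines_touched(start, w * bytes_per_pixel, line)
        touched.update(range(first, first + count))
    return len(touched) * line


# Assumed example values, not measurements.
WIDTH, HEIGHT, BPP = 1920, 1080, 3
frame_bytes = WIDTH * HEIGHT * BPP
face_bytes = face_region_bytes(WIDTH, BPP, x=640, y=400, w=128, h=128)

print("full frame bytes: ", frame_bytes)
print("face region bytes:", face_bytes)
print("fraction of frame: {:.2%}".format(face_bytes / frame_bytes))
```

Under these assumed numbers the face region accounts for 49,152 bytes of a 6,220,800-byte frame, which is under 1% of the data. Actual traffic depends on how the accelerator caches and prefetches, which this estimate does not model.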