AI is pushing data center design into new territory. As training clusters grow from dozens to hundreds, and now thousands, of accelerators, the network can no longer be treated as a background function. It has become one of the most critical factors determining overall system performance.
In these systems, accelerators are constantly exchanging intermediate data and synchronizing at every step. It's not just about moving data fast—it's about moving it predictably. Even small delays can ripple across the system, leaving expensive compute resources idle.
Figure 1. AI Accelerator Clusters Need Tightly Orchestrated Interconnects to Scale Efficiently
Traditional Ethernet was designed as a best-effort network — highly efficient at moving large volumes of data, but built on the assumption that occasional packet loss, retries, and variable latency are acceptable. This model works well for general-purpose cloud workloads, but breaks down for AI systems.
AI workloads are constrained by tail latency: the latency of the slowest packet in the system. Even if most data arrives quickly, a single delayed packet can stall every accelerator waiting on that synchronization step. Packet loss makes this worse, because retransmissions introduce additional delay and variability that further disrupt synchronization across the cluster. AI networks require lossless or near-lossless behavior and highly deterministic latency, far beyond what traditional Ethernet was designed to provide.
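The effect of tail latency on a synchronized step can be illustrated with a short calculation. The sketch below is not a model of any real cluster: it assumes each of N accelerators independently sees a delayed packet with probability p in a given step, and both the value of `p` and the cluster sizes are hypothetical. Because a step completes only when the slowest participant finishes, the chance that a step is stalled is 1 - (1 - p)^N, which grows quickly with N.

```python
def stall_probability(n_accelerators: int, p_delay: float) -> float:
    """Probability that at least one of n independent participants is delayed."""
    return 1.0 - (1.0 - p_delay) ** n_accelerators


def step_time(latencies_us):
    """A synchronized step finishes only when the slowest transfer arrives."""
    return max(latencies_us)


if __name__ == "__main__":
    p = 0.001  # hypothetical per-accelerator chance of a delayed packet per step
    for n in (8, 64, 1024):  # hypothetical cluster sizes
        print(f"{n:5d} accelerators: {stall_probability(n, p):.1%} of steps stalled")

    # One slow transfer sets the pace for every participant in the step.
    print("step time (us):", step_time([2.0, 2.1, 1.9, 50.0]))
```

Under these assumptions, a delay that is rare for any single accelerator becomes a routine event for the cluster as a whole, which is why the tail of the latency distribution matters more than the average.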
These inefficiencies compound at scale. Bandwidth waste and excessive retransmissions translate directly into higher power consumption, increased infrastructure cost, and reduced scalability. As clusters grow to hundreds of thousands of accelerators, this becomes unsustainable.
From a system perspective, the goal of a scale-up network is to make the network effectively disappear. Accelerators should behave as if they are part of a single, unified machine, with communication that feels local rather than remote. Achieving this requires a network that is not only fast, but also predictable, efficient, and tightly integrated with the compute fabric.
Ethernet does include congestion management mechanisms like PFC and ECN, but these were designed for general-purpose networking, not for the tightly coupled, latency-sensitive communication patterns in AI scale-up systems. Several limitations emerge at scale:
These challenges have led some designs toward proprietary interconnects—but those come with tradeoffs: fragmented ecosystems, limited flexibility, and higher long-term risk.
Figure 2. ESUN vs Standard Ethernet: Architectural Enhancements
Instead of replacing Ethernet, the industry is evolving it. ESUN (Ethernet for Scale-Up Networks) adapts Ethernet with targeted enhancements for AI workloads while preserving what makes Ethernet valuable in the first place:
The enhancements themselves are narrowly scoped:
This approach lets AI systems benefit from Ethernet's scale, interoperability, and ecosystem without being constrained by its original assumptions.
One of the most immediate improvements in ESUN is at the packet level.
Traditional Ethernet traffic typically carries IP-based headers that add 28 to 48 bytes of overhead per packet (consistent with a 20-byte IPv4 or 40-byte IPv6 header plus an 8-byte UDP header). ESUN replaces this with a compact 4-byte header, significantly improving packet efficiency and throughput in environments where communication is frequent and latency-sensitive.
Figure 3. IP Header vs ESUN Header
Importantly, the essential capabilities are still there. Traffic prioritization, congestion signaling, and load balancing are all preserved, just implemented more efficiently.
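To put the header savings described above in perspective, the following sketch compares the share of bytes that carry payload for the 28-byte and 48-byte cases against the 4-byte ESUN header. The payload sizes are hypothetical, and the calculation counts only the headers named in the text; it deliberately ignores all other framing overhead, so the results are illustrative rather than measured link efficiency.

```python
def header_efficiency(payload_bytes: int, header_bytes: int) -> float:
    """Fraction of (header + payload) bytes that carry payload."""
    return payload_bytes / (payload_bytes + header_bytes)


# Header sizes taken from the text above; the labels are illustrative.
HEADERS = {
    "28-byte header": 28,
    "48-byte header": 48,
    "4-byte ESUN header": 4,
}

if __name__ == "__main__":
    for payload in (64, 256, 4096):  # hypothetical payload sizes in bytes
        row = ", ".join(
            f"{name}: {header_efficiency(payload, size):.1%}"
            for name, size in HEADERS.items()
        )
        print(f"{payload:5d} B payload -> {row}")
```

The gap is largest for small payloads, which fits the article's point that the compact header matters most where communication is frequent and fine-grained.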
ESUN also introduces two key mechanisms to make communication more deterministic:
Together, these mechanisms enable ESUN to deliver lossless, low-latency, and bandwidth-efficient communication — while preserving the openness and interoperability of Ethernet.
To help accelerate adoption, Synopsys is introducing the industry’s first complete ESUN IP solution spanning both Layer 1 and Layer 2: a fully integrated, pre-verified stack whose components are designed to work together out of the box.
The solution includes:
By delivering a complete IP solution, Synopsys removes much of the integration burden—and the risk—that typically comes with assembling high-performance networking systems.
End-to-End Integration
All components — MAC, PCS, PHY, and UE/ESUN Link Layer Controller — are co-designed and validated as a single system. By pre-verifying the complete L1/L2 stack, Synopsys eliminates the integration risk that comes with sourcing and stitching together IP from multiple vendors.
One Architecture for Scale-Up and Scale-Out
The same IP can be used for both ESUN (scale-up) and Ultra Ethernet (scale-out). This simplifies system design and allows teams to reuse a common architecture across different parts of the data center.
Optimized for Real-World Constraints
The solution is optimized for low power consumption, high performance, and area efficiency, with multi-rate configurations from 100G to 1.6T and flexible FEC modes to match diverse system requirements.
A Proven PHY Foundation
At the core is the Synopsys 224G PHY, available across multiple advanced process nodes. Synopsys has publicly demonstrated 224G silicon interoperability since October 2022, with over 30 multi-vendor demonstrations at ECOC and OFC, delivering zero post-FEC errors over channels with up to 45 dB of loss. This gives designers confidence that what works in theory will hold up in silicon.
As AI continues to scale, the expectations on the network will only increase. It needs to be fast—but also predictable, efficient, and scalable.
ESUN, backed by over 175 companies and the industry's leading hyperscalers and silicon vendors, represents the convergence of Ethernet toward this reality. Synopsys is committed to enabling this transition with the industry's first complete ESUN IP solution.
The ESUN IP solution is part of a larger Synopsys HPC IP portfolio designed to support the full AI system stack:
Together, these technologies provide the foundation for building next-generation AI systems—where compute, memory, and connectivity all scale together.