Enabling the Next Generation of AI Infrastructure with Ethernet for Scale-Up Networking (ESUN)

Abhinav Kothiala

Jun 30, 2026 / 6 min read


AI is pushing data center design into new territory. As training clusters grow from dozens to hundreds, and now thousands, of accelerators, the network can no longer be treated as a background function. It has become one of the most critical factors determining overall system performance.

In these systems, accelerators are constantly exchanging intermediate data and synchronizing at every step. It's not just about moving data fast—it's about moving it predictably. Even small delays can ripple across the system, leaving expensive compute resources idle.

Figure 1. AI Accelerator Clusters Need Tightly Orchestrated Interconnects to Scale Efficiently 

The AI Networking Challenge: It Only Takes One Slow Packet

Traditional Ethernet was designed as a best-effort network — highly efficient at moving large volumes of data, but built on the assumption that occasional packet loss, retries, and variable latency are acceptable. This model works well for general-purpose cloud workloads, but breaks down for AI systems.

AI workloads are constrained by tail latency — the slowest packet in the system. Even if most data arrives quickly, a single delayed packet can stall the entire cluster, leaving expensive accelerators idle. Packet loss makes this worse: retransmissions introduce additional delay and variability, further impacting synchronization across the cluster. AI networks require lossless or near-lossless behavior and highly deterministic latency — far beyond what traditional Ethernet was designed to provide.
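To see why a single slow packet matters so much, consider a synchronized step that completes only once every packet has arrived. The sketch below is illustrative only: the delay probability `p_slow` and the packet counts are assumed values rather than measurements, and packets are treated as independent.

```python
def prob_step_stalled(p_slow: float, packets_per_step: int) -> float:
    """Probability that at least one packet in a step is slow.

    Assumes each packet is independently slow with probability p_slow
    (a simplifying assumption, not a model of any real fabric).
    """
    return 1.0 - (1.0 - p_slow) ** packets_per_step


if __name__ == "__main__":
    # Hypothetical figure: 1 packet in 100,000 is delayed.
    p_slow = 1e-5
    for n in (1_000, 100_000, 1_000_000):
        stalled = prob_step_stalled(p_slow, n)
        print(f"{n:>9} packets/step -> P(step stalled) = {stalled:.3f}")
```

Under these assumed numbers, the chance that a step is held up grows from about 1% at a thousand packets per step to roughly 63% at a hundred thousand, which is why rare delays become the common case at scale.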

These inefficiencies compound at scale. Bandwidth waste and excessive retransmissions translate directly into higher power consumption, increased infrastructure cost, and reduced scalability. As clusters grow to hundreds of thousands of accelerators, this becomes unsustainable.

From a system perspective, the goal of a scale-up network is to make the network effectively disappear. Accelerators should behave as if they are part of a single, unified machine, with communication that feels local rather than remote. Achieving this requires a network that is not only fast, but also predictable, efficient, and tightly integrated with the compute fabric.

Why Traditional Ethernet Isn’t Enough for Scale-Up AI

Ethernet does include congestion management mechanisms like PFC and ECN, but these were designed for general-purpose networking, not for the tightly coupled, latency-sensitive communication patterns in AI scale-up systems. Several limitations emerge at scale:

  • Protocol overhead: Standard Ethernet traffic typically carries IP and UDP headers that add 28–48 bytes per packet (28 bytes with IPv4, 48 bytes with IPv6). In scale-up environments with frequent, small messages, this overhead directly reduces effective throughput.
  • No link-level error recovery: Beyond FEC, Ethernet has no link-level recovery mechanism. Residual errors must be resolved by upper-layer protocols, introducing significant latency penalties.
  • Coarse congestion management: PFC pauses an entire priority class on a link at once, so it lacks the granularity to manage individual flows within a class and can cause head-of-line blocking.

These challenges have led some designs toward proprietary interconnects—but those come with tradeoffs: fragmented ecosystems, limited flexibility, and higher long-term risk.

Figure 2. ESUN vs Standard Ethernet: Architectural Enhancements

ESUN: Evolving Ethernet for AI Scale-Up

Instead of replacing Ethernet, the industry is evolving it. ESUN (Ethernet for Scale-Up Networking) adapts Ethernet with targeted enhancements for AI workloads while preserving what makes Ethernet valuable in the first place:

  • Operational familiarity — network teams can leverage existing Ethernet expertise, tooling, and management practices rather than adopting an entirely new fabric
  • Infrastructure reuse — scale-up and scale-out traffic share the same Ethernet switching architecture and physical infrastructure, reducing cost and complexity
  • One unified fabric — Ethernet becomes the common connective tissue for both scale-up (within racks and pods) and scale-out (across the data center) as AI workloads evolve

The enhancements themselves are focused and targeted:

  • The physical layer remains standard Ethernet
  • The link layer adds lossless reliability and fine-grained congestion control
  • The network layer is streamlined to reduce overhead

This approach lets AI systems benefit from Ethernet's scale, interoperability, and ecosystem without being constrained by its original assumptions.

A Smaller Header That Makes a Big Difference

One of the most immediate improvements in ESUN is at the packet level.

Traditional Ethernet traffic relies on IP and UDP headers that add 28–48 bytes of overhead per packet (28 bytes with IPv4, 48 bytes with IPv6). ESUN replaces this with a compact, 4-byte header, significantly improving packet efficiency and throughput in environments where communication is frequent and latency-sensitive.

Figure 3. IP Header vs. ESUN Header

Importantly, the essential capabilities are still there. Traffic prioritization, congestion signaling, and load balancing are all preserved, just implemented more efficiently.
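The effect of the smaller header on small messages can be estimated with simple arithmetic. The sketch below is a rough model, not a statement of the ESUN frame format: it assumes the compact header sits where the IP/UDP headers would, that the usual Ethernet framing (preamble, MAC header, FCS, inter-packet gap) is unchanged, and that any transport headers above this layer are ignored. The function name and payload sizes are hypothetical.

```python
# Fixed per-frame Ethernet costs in bytes (standard values):
# preamble + SFD (8), MAC header (14), FCS (4), minimum inter-packet gap (12).
ETHERNET_FRAMING = 8 + 14 + 4 + 12


def wire_efficiency(payload: int, header: int) -> float:
    """Fraction of on-wire bytes that carry payload for a single frame."""
    return payload / (payload + header + ETHERNET_FRAMING)


if __name__ == "__main__":
    # Header sizes taken from the text: 28 B (IPv4/UDP), 48 B (IPv6/UDP), 4 B (compact).
    for payload in (64, 256, 1024):
        ipv4 = wire_efficiency(payload, 28)
        ipv6 = wire_efficiency(payload, 48)
        compact = wire_efficiency(payload, 4)
        print(f"{payload:>5} B payload: IPv4/UDP {ipv4:.1%}, "
              f"IPv6/UDP {ipv6:.1%}, 4-byte header {compact:.1%}")
```

Under these assumptions, a 64-byte payload goes from roughly 49% wire efficiency with IPv4/UDP headers to roughly 60% with a 4-byte header, and the gap narrows as payloads grow.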

Designed for Predictable, Lossless Communication

ESUN also introduces two key mechanisms to make communication more deterministic:

  • Link-Level Retry (LLR) detects and recovers from link errors locally at the data link layer, rather than relying on higher-layer transport protocols whose end-to-end recovery takes orders of magnitude longer. This significantly reduces tail latency and avoids costly end-to-end retransmission for errors that can be repaired on the link.
  • Credit-Based Flow Control (CBFC) replaces the coarse, class-wide pause behavior of PFC with fine-grained, per-virtual-channel congestion management. Senders transmit only when the receiver has confirmed buffer capacity, preventing overflow and enabling lossless operation without the head-of-line blocking associated with PFC.
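The credit mechanism can be pictured with a toy model. This is a minimal sketch of the general credit-based idea, not the ESUN or Ultra Ethernet specification: the class name `CreditedLink`, the one-credit-per-packet accounting, and the buffer sizes are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Optional


class CreditedLink:
    """Toy credit-based flow control with separate credits per virtual channel (VC)."""

    def __init__(self, buffer_slots_per_vc: Dict[int, int]):
        # The sender's credit counters start equal to the receiver's buffer slots.
        self.credits = dict(buffer_slots_per_vc)
        self.rx_buffers = {vc: deque() for vc in buffer_slots_per_vc}

    def try_send(self, vc: int, packet: str) -> bool:
        """Transmit only if the sender holds a credit for this VC."""
        if self.credits[vc] == 0:
            return False  # the sender holds the packet; nothing is dropped
        self.credits[vc] -= 1
        self.rx_buffers[vc].append(packet)
        return True

    def receiver_drain(self, vc: int) -> Optional[str]:
        """Receiver consumes one packet and returns a credit to the sender."""
        if not self.rx_buffers[vc]:
            return None
        packet = self.rx_buffers[vc].popleft()
        self.credits[vc] += 1
        return packet


if __name__ == "__main__":
    link = CreditedLink({0: 2, 1: 2})
    # VC 0 is congested: the receiver is not draining it.
    print([link.try_send(0, f"a{i}") for i in range(3)])  # [True, True, False]
    # VC 1 still flows even though VC 0 has run out of credits.
    print(link.try_send(1, "b0"))  # True
    link.receiver_drain(0)  # frees a slot and returns one credit
    print(link.try_send(0, "a2"))  # True
```

Because each virtual channel has its own credit counter, a stalled channel never overflows the receiver and never blocks traffic on the other channel.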

Together, these mechanisms enable ESUN to deliver lossless, low-latency, and bandwidth-efficient communication — while preserving the openness and interoperability of Ethernet. 
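Link-level retry can be sketched in the same spirit. The model below uses a simple go-back-N style replay buffer with sequence numbers; that choice, along with the class names and the ACK/NACK message shapes, is an assumption for illustration and is not taken from the ESUN specification.

```python
from collections import OrderedDict


class RetrySender:
    """Keeps every unacknowledged frame so it can be replayed on the same link."""

    def __init__(self):
        self.next_seq = 0
        self.replay = OrderedDict()  # seq -> payload, held until acknowledged

    def send(self, payload):
        seq = self.next_seq
        self.replay[seq] = payload
        self.next_seq += 1
        return seq, payload

    def on_ack(self, seq):
        """The receiver confirmed every frame up to and including seq."""
        for s in [s for s in self.replay if s <= seq]:
            del self.replay[s]

    def on_nack(self, seq):
        """The receiver asked for a replay starting at seq."""
        return [(s, p) for s, p in self.replay.items() if s >= seq]


class RetryReceiver:
    """Accepts frames strictly in order; anything else triggers a NACK."""

    def __init__(self):
        self.expected = 0
        self.delivered = []

    def receive(self, seq, payload, corrupted=False):
        if corrupted or seq != self.expected:
            return ("nack", self.expected)
        self.delivered.append(payload)
        self.expected += 1
        return ("ack", seq)


if __name__ == "__main__":
    tx, rx = RetrySender(), RetryReceiver()
    frames = [tx.send(p) for p in ("f0", "f1", "f2")]
    # Assume frame 1 arrives with an error that FEC could not correct.
    for seq, payload in frames:
        kind, n = rx.receive(seq, payload, corrupted=(seq == 1))
        if kind == "ack":
            tx.on_ack(n)
    # The receiver is still waiting for frame 1, so replay from there.
    for seq, payload in tx.on_nack(rx.expected):
        kind, n = rx.receive(seq, payload)
        if kind == "ack":
            tx.on_ack(n)
    print(rx.delivered)  # ['f0', 'f1', 'f2']
```

The recovery involves only the two ends of one link, so the endpoints' transport layer never sees the error.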

Introducing the Industry’s First Complete ESUN IP Solution

To help accelerate adoption, Synopsys is introducing the industry’s first complete ESUN IP solution spanning both Layer 1 and Layer 2: a fully integrated, pre-verified stack designed to work together out of the box.


The solution includes:

  • 1.6T multi-rate Ethernet MAC — supporting up to four independent 400G channels configurable for 1×1.6T, 2×800G, 4×400G, and 4×200G modes over 224G SerDes
  • 1.6T PCS with RS-FEC — with RS544 FEC for robust error correction and RS272 FEC for low-latency operation
  • UE/ESUN Link Layer Controller — implementing LLR and CBFC for lossless reliability and fine-grained congestion management
  • Silicon-proven 224G PHY — optimized for low power, latency, and signal integrity
  • Comprehensive Verification IP and system-level validation support
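The trade-off between the two FEC modes listed above can be illustrated with basic Reed-Solomon arithmetic. The code parameters used here, RS(544,514) and RS(272,258) over 10-bit symbols, are the values commonly cited for these code names and are an assumption in this sketch; the product datasheet is the authority on what the IP implements.

```python
from dataclasses import dataclass

SYMBOL_BITS = 10  # assumed symbol size for both codes


@dataclass(frozen=True)
class ReedSolomon:
    name: str
    n: int  # total symbols per codeword
    k: int  # data symbols per codeword

    @property
    def correctable_symbols(self) -> int:
        # A Reed-Solomon code corrects up to (n - k) / 2 symbol errors per codeword.
        return (self.n - self.k) // 2

    @property
    def codeword_bits(self) -> int:
        return self.n * SYMBOL_BITS

    @property
    def parity_overhead(self) -> float:
        return (self.n - self.k) / self.k


if __name__ == "__main__":
    # Assumed (n, k) values; not taken from the product documentation.
    for code in (ReedSolomon("RS544", 544, 514), ReedSolomon("RS272", 272, 258)):
        print(f"{code.name}: corrects {code.correctable_symbols} symbols, "
              f"{code.codeword_bits} bits per codeword, "
              f"{code.parity_overhead:.2%} parity overhead")
```

Under these assumed parameters, the shorter code has half the codeword length, so a decoder waits for fewer bits before it can correct, at the cost of correcting roughly half as many symbol errors per codeword.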

By delivering a complete IP solution, Synopsys removes much of the integration burden—and the risk—that typically comes with assembling high-performance networking systems.

Built for Integration, Performance, and Scale

End-to-End Integration

All components — MAC, PCS, PHY, and UE/ESUN Link Layer Controller — are co-designed and validated as a single system. By pre-verifying the complete L1/L2 stack, Synopsys eliminates the integration risk that comes with sourcing and stitching together IP from multiple vendors.

One Architecture for Scale-Up and Scale-Out

The same IP can be used for both ESUN (scale-up) and Ultra Ethernet (scale-out). This simplifies system design and allows teams to reuse a common architecture across different parts of the data center.

Optimized for Real-World Constraints

The solution is optimized for low power consumption, high performance, and area efficiency, with multi-rate configurations from 100G to 1.6T and flexible FEC modes to match diverse system requirements.

A Proven PHY Foundation

At the core is the Synopsys 224G PHY, available across multiple advanced process nodes. Synopsys has publicly demonstrated 224G silicon interoperability since October 2022, with over 30 multi-vendor demonstrations at ECOC and OFC, delivering zero post-FEC errors over channels with up to 45 dB of loss. This gives designers confidence that what works in theory will hold up in silicon.

Looking Ahead: Networking Built for AI

As AI continues to scale, the expectations on the network will only increase. It needs to be fast—but also predictable, efficient, and scalable.

ESUN, backed by over 175 companies and the industry's leading hyperscalers and silicon vendors, reflects Ethernet's evolution to meet these demands. Synopsys is committed to enabling this transition with the industry's first complete ESUN IP solution.

A Broader HPC Connectivity Platform

The ESUN IP solution is part of a larger Synopsys HPC IP portfolio designed to support the full AI system stack:

Together, these technologies provide the foundation for building next-generation AI systems—where compute, memory, and connectivity all scale together.

Learn more about Synopsys ESUN IP in our datasheet.
