AI scaling has exposed a simple truth: raw compute performance alone does not scale AI. As models grow larger and clusters grow denser, the fabric that connects accelerators becomes just as important as the accelerators themselves. The recently announced UALink 2.0 directly addresses this reality. Rather than treating the network as a passive transport, the 2.0 specification introduces four architectural enhancements that make the fabric itself an active participant in AI scale-up.
With these changes, UALink 2.0 extends the accelerator interconnect defined in 1.0 into an AI-aware fabric. The network is no longer just moving data between endpoints. It is actively helping the system compute. As AI continues to scale, interconnects will define system performance just as much as compute.
During training, each accelerator processes a different slice of data and computes how the model’s output should change to reduce error. These changes are captured as gradients. A gradient represents the direction and magnitude by which each model parameter should be updated to improve accuracy.
Every training step requires gradients to be exchanged and combined across dozens or hundreds of accelerators so that the model remains consistent. Gradients are inherently collective in nature because no single accelerator can update the model in isolation.
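To make the collective nature of gradients concrete, here is a toy sketch in plain Python. The gradient values are hypothetical and the code has nothing to do with the UALink protocol; it only shows that every accelerator needs the same averaged gradient before it can update its copy of the model.

```python
# Toy sketch: each accelerator computes a gradient on its own data slice.
# To keep the model consistent, every parameter update must use the
# average of all per-accelerator gradients (values are made up).
local_gradients = [
    [0.10, -0.20, 0.05],   # accelerator 0
    [0.30,  0.00, -0.15],  # accelerator 1
    [0.20, -0.10, 0.10],   # accelerator 2
    [0.40,  0.30, 0.00],   # accelerator 3
]

num_params = len(local_gradients[0])
num_accels = len(local_gradients)

# The averaged gradient that every accelerator needs before updating.
avg_gradient = [
    sum(g[p] for g in local_gradients) / num_accels
    for p in range(num_params)
]
print(avg_gradient)  # every accelerator must receive this same vector
```

No single accelerator can compute `avg_gradient` from its own slice alone, which is exactly why gradient exchange is a collective operation rather than a point-to-point one.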
Collectives are communication operations where many accelerators participate together to move or combine data in a coordinated way. Instead of one accelerator talking to another one to one, a collective involves one to many, many to one, or many to many communications with well-defined behavior.
In modern training workloads, a large fraction of network traffic comes from collective operations. When these collective operations are inefficient, training slows down regardless of how fast the compute is.
The four most commonly deployed collectives are:
Broadcast: One accelerator has some data and needs to send the same data to all other accelerators. Think of model parameters being shared at the start of a training step.
Reduce: All accelerators have a value and those values are combined using an operation like sum, max, or min. The result ends up at one destination, for example when summing partial gradients at a single accelerator.
All reduce: This is the most common in AI training. Each accelerator has a value, those values are reduced using an operation like sum, and the final result is sent back to every accelerator. This keeps all accelerators in sync.
Reduce scatter: The data is reduced across all accelerators, but each accelerator only receives a slice of the final result. This improves bandwidth efficiency for large tensors.
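The data movement of these four collectives can be sketched with ordinary Python lists, where each list entry stands in for one accelerator's memory. This models only the semantics, not the UALink wire protocol or any real transport.

```python
# Minimal sketches of the four collectives. Each accelerator is modeled
# as one entry in a list; only the data movement is illustrated.

def broadcast(values, root):
    """Every accelerator ends up with the root's value."""
    return [values[root]] * len(values)

def reduce_(values, root):
    """Values are combined (sum here); only the root holds the result."""
    total = sum(values)
    return [total if rank == root else None for rank in range(len(values))]

def all_reduce(values):
    """Values are combined and the result is returned to everyone."""
    total = sum(values)
    return [total] * len(values)

def reduce_scatter(vectors):
    """Element-wise reduce across accelerators; rank i keeps slice i."""
    n = len(vectors)
    reduced = [sum(v[i] for v in vectors) for i in range(n)]
    return [reduced[rank] for rank in range(n)]

print(broadcast([7, 0, 0, 0], root=0))                    # [7, 7, 7, 7]
print(all_reduce([1, 2, 3, 4]))                           # [10, 10, 10, 10]
print(reduce_scatter([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))  # [6, 6, 6]
```

Note how all reduce leaves every accelerator with the identical result, which is why it dominates training traffic: it is the step that keeps model replicas in sync.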
In UALink 1.0, collective operations are handled entirely in software. Accelerators exchange many point to point messages, while software libraries coordinate ordering, synchronization, and completion. This approach works, but it introduces avoidable latency and amplifies traffic across the fabric.
UALink 2.0 takes a different approach. It introduces In Network Collectives (INC), where the switches in the fabric understand collective operations and participate directly. Instead of every accelerator communicating with every other accelerator, the network can combine, replicate, and route data intelligently.
At a system level, this means the fabric itself becomes collective aware. Switches are no longer passive forwarding elements. They actively participate in broadcast, reduce, all reduce, and reduce scatter operations in a coordinated and deterministic way. The specification introduces collective primitives and block collectives that define how these operations are established, how data flows through the fabric, and how completion is tracked.
Switches maintain only the minimal state required to participate safely and efficiently, preserving determinism while avoiding unnecessary complexity. The result is lower latency, reduced traffic amplification, and significantly better scaling behavior as pod sizes grow.
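A back-of-envelope calculation shows why moving collectives into the network reduces traffic amplification. The counts below are illustrative bounds under simple assumptions (a naive full-mesh software all reduce versus a single switch-hosted reduction), not figures from the UALink specification; real algorithms and topologies will differ.

```python
# Rough message counts for an all reduce across N accelerators.
# Illustrative assumptions only, not UALink specification figures.

def naive_allreduce_msgs(n):
    # Full-mesh software approach: every accelerator sends its value
    # to every other accelerator.
    return n * (n - 1)

def in_network_msgs(n):
    # In-network approach: each accelerator sends once toward the
    # switch, which reduces the values and returns one result each.
    return 2 * n

for n in (8, 64, 256):
    print(n, naive_allreduce_msgs(n), in_network_msgs(n))
```

The gap grows quadratically with pod size, which is why collective awareness in switches matters more, not less, as pods scale.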
The next major evolution in UALink 2.0 is the security model. Its protection features aim to provide confidentiality and optional integrity (including replay protection) for data exchanged between accelerators belonging to the same virtual pod, and the model is built for multi-tenant AI systems.
UALink 1.0 included encryption and authentication at the link level. This provided basic protection, but it did not fully address multi-tenant deployments where multiple users share the same physical fabric.
UALink 2.0 introduces a comprehensive, confidential computing model. The specification formalizes the concept of pods and virtual pods. A virtual pod represents a set of accelerators that belong to a single tenant. Each virtual pod has its own security context, including encryption keys and authentication state.
UALink 2.0 supports per virtual pod keying, key derivation, and key rotation enabling long running training jobs without static keys. Encryption and authentication are applied consistently across requests, responses, and collective traffic to ensure confidentiality and integrity.
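The shape of per virtual pod keying and rotation can be sketched with a simple HMAC-based derivation. The key hierarchy, label format, and epoch scheme below are assumptions for illustration only; the actual UALink key derivation is defined by the specification.

```python
import hmac
import hashlib

# Illustrative per virtual pod key derivation using HMAC-SHA256.
# Labels and the epoch-based rotation scheme are assumptions, not the
# UALink specification's actual KDF.

def derive_key(root_key: bytes, vpod_id: int, epoch: int) -> bytes:
    label = b"vpod-key|%d|epoch|%d" % (vpod_id, epoch)
    return hmac.new(root_key, label, hashlib.sha256).digest()

root = b"\x00" * 32                        # placeholder root secret
k0 = derive_key(root, vpod_id=7, epoch=0)
k1 = derive_key(root, vpod_id=7, epoch=1)  # key rotation: bump the epoch
other = derive_key(root, vpod_id=9, epoch=0)

assert k0 != k1 and k0 != other            # rotation and tenant isolation
```

Deriving keys per virtual pod and per epoch means a long running job can rotate keys on schedule without redistributing a static secret, and one tenant's keys are never usable against another tenant's traffic.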
Switches become part of the trusted computing base when they participate in collectives. The specification defines how switches are authenticated, how cryptographic keys are derived and rotated, and how isolation is enforced across tenants. It also allows switches to actively process encrypted traffic: accelerator data that is simply being forwarded remains untouched, while collective operation traffic between an accelerator and a switch can be securely decrypted, processed, and re-encrypted as needed.
This is critical for cloud deployments, regulated environments, and any scenario where multiple workloads share infrastructure. Security is no longer an afterthought. It is designed into the fabric from the beginning.
As systems grow larger, failures become normal. Links go down. Devices reset. Partial outages happen. UALink 2.0 addresses this reality. The specification strengthens error handling and isolation. It defines how failures are detected, how traffic is contained, and how recovery occurs without taking down the entire pod. This is especially important for collective operations, where partial failures must be handled carefully to avoid data corruption or deadlock.
UALink 2.0 improves resiliency by explicitly supporting multi path routing, strengthening failure isolation and recovery, making collective operations failure aware, and enabling controlled degradation through link folding and coordinated pod level recovery.
It also enables better bandwidth utilization and improved fault tolerance. Traffic can be distributed across paths while still preserving ordering and correctness guarantees. These capabilities are not optional at scale. They are required.
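One way to preserve ordering while spreading load across paths is to map each flow deterministically to one live path. The sketch below illustrates that idea; the path names, hashing policy, and failover behavior are assumptions for demonstration, not the routing rules in the UALink specification.

```python
import hashlib

# Sketch of deterministic multi path selection with failure avoidance.
# One flow always maps to one live path, preserving per-flow ordering,
# while different flows spread across the available paths.

def pick_path(flow_id: str, paths: list, failed: set) -> str:
    live = [p for p in paths if p not in failed]
    if not live:
        raise RuntimeError("no live path: isolate and recover the pod")
    digest = hashlib.sha256(flow_id.encode()).digest()
    return live[digest[0] % len(live)]

paths = ["plane0", "plane1", "plane2", "plane3"]
p_before = pick_path("gpu3->gpu9", paths, failed=set())
p_after = pick_path("gpu3->gpu9", paths, failed={"plane1"})
print(p_before, p_after)  # the flow stays on one path per failure state
```

When a link fails, only the flows hashed onto it move; everything else keeps its path, which is the kind of controlled degradation the specification targets.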
UALink 2.0 recognizes that a fabric is not just hardware. It is a system. The specification formalizes the role of a pod controller. This entity is responsible for topology discovery, configuration, partitioning into virtual pods, lifecycle management, and health monitoring.
By defining these concepts at the specification level, UALink reduces fragmentation and encourages interoperable implementations. Operators can reason about UALink systems in a consistent way across vendors.
This is an important step toward making UALink deployable at scale without excessive custom software.
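The pod controller's responsibilities can be pictured as a small state machine over the pod. The class below is a conceptual sketch; its names, fields, and methods are illustrative assumptions and do not correspond to interfaces defined by the specification.

```python
from dataclasses import dataclass, field

# Conceptual sketch of pod controller duties: discovery, partitioning
# into virtual pods, and health monitoring. All names are illustrative.

@dataclass
class PodController:
    accelerators: list = field(default_factory=list)
    vpods: dict = field(default_factory=dict)
    health: dict = field(default_factory=dict)

    def discover(self, found: list) -> None:
        """Topology discovery: record every accelerator in the pod."""
        self.accelerators = found
        self.health = {a: "ok" for a in found}

    def partition(self, tenant: str, members: list) -> None:
        """Carve a virtual pod out of the physical pod for one tenant."""
        assert all(m in self.accelerators for m in members)
        self.vpods[tenant] = members

    def mark_failed(self, accel: str) -> None:
        """Health monitoring: contain a failure without dropping the pod."""
        self.health[accel] = "failed"

ctrl = PodController()
ctrl.discover(["acc0", "acc1", "acc2", "acc3"])
ctrl.partition("tenant-a", ["acc0", "acc1"])
ctrl.mark_failed("acc3")
print(ctrl.vpods, ctrl.health["acc3"])
```

The point of standardizing this role is that every vendor's controller exposes the same conceptual operations, so orchestration software does not need to be rewritten per implementation.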
UALink 2.0 preserves the strengths that made 1.0 successful: a clean memory semantic model, a scalable switch-based architecture, and tight alignment with Ethernet physical layers. These fundamentals remain unchanged.
At the same time, UALink 2.0 adds what modern AI systems actually need. Hardware accelerated collectives that reflect real training workloads. Strong multi-tenant security designed for shared infrastructure. Better resiliency as a first order design goal. And clear manageability for large scale deployments.
UALink 2.0 delivers these advances in an open, vendor neutral way. It enables a truly interoperable, multi-vendor, AI first trusted compute fabric that scales with the needs of the industry.
Synopsys is deeply committed to advancing UALink as a foundational technology for AI scale-up systems. As an active member of the UALink Consortium and a contributor to the specification’s evolution, Synopsys brings decades of expertise in high-speed interface IP, security, and system-level integration to support customers building scale-up architectures. This commitment is demonstrated through a complete, silicon-proven UALink IP solution, including controller, PHY, security, and verification IP, designed to meet the real-world demands of large-scale AI deployments.