Delivering on the Promise of Guaranteed Isochronous Traffic in USB 3.1

By Matthew Myers, USB Hardware Engineer, Synopsys

Introduction

Over the years, certain communication protocols have included a special class of traffic, called isochronous, that provides Quality of Service (QoS). Unlike file transfers in computer systems, isochronous data transfers do not need guaranteed delivery, but they do need guaranteed service opportunities. Typical examples of isochronous traffic are audio and video streams where the synchronization of the two must be strictly maintained and the latency must be kept to a minimum, but a slight glitch in either one does not result in devastating data corruption.

Since its introduction in 1996, the USB specification has defined the isochronous transfer type as providing QoS by having "guaranteed bandwidth" with "bounded latency." Because the host performs all of the transfer scheduling in USB, most of the responsibility for delivering on these promises falls on the host in ways that span software, hardware, and the USB protocol.

However, each time the USB protocol is enhanced with a new speed but keeps backward compatibility, modifications are often necessary in hosts and hubs. For example, when the USB 2.0 specification added "High Speed" to existing "Low/Full Speed" devices, the specification also changed the way isochronous transfers were scheduled to maintain the guaranteed bandwidth and latency. USB 2.0 hubs needed a "Transaction Translator" to bridge the gap between a High Speed upstream port and a Low/Full Speed downstream port. Hosts were forced to split their requests to Low/Full Speed devices behind these hubs in order to not affect the bandwidth and latency requirements of isochronous endpoints on High Speed devices.

The same is true with 10G USB 3.1, the first USB specification in which two separate speeds of devices coexist in the same topology on the USB 3.x wires. As shown in Figure 1, significant changes are needed to keep the promise of guaranteed bandwidth and bounded latency, including the following:

  • Hub buffering and prioritization rules which can result in packet reordering
  • Hosts, hubs, and devices include the transfer type in packet headers
  • Additional link layer credit type (Type 1 vs. Type 2) to support the separation of asynchronous (bulk and control) and periodic (interrupt and isochronous) traffic
  • Pipelined acknowledgements for isochronous IN traffic

In this article, we explain how each piece of the USB 3.1 topology needs to be modified to support isochronous traffic in a mixed speed environment.

Figure 1: Scope of USB 3.1 isochronous changes in the topology 

Hub Arbitration Rules Prioritize Isochronous

The article "Achieving 10 Gbps Data Rates in USB 3.1 Using Multiple INs and Hub Payload Buffering" described the reason for two of the major enhancements in the USB 3.1 protocol. With the addition of multiple IN transactions, multiple OUT transactions (which are already allowed in USB 3.0), and hub payload buffering to deal with rate matching, USB 3.1 hubs will find themselves in situations where they have to choose between different packets to transmit on a port (upstream or downstream).

In Figure 2, the hub has an isochronous data packet ready to transmit upstream towards the host as well as a bulk packet. Without proper rules in the hub about which packets have higher priority, it is possible that the bulk packet would end up blocking and delaying the isochronous packet, interfering with the bounded latency guarantee.

Figure 2: Host starts multiple INs; Hub needs to choose packet to transmit upstream 

In USB 3.1, if a hub port has multiple packets buffered up for transmission, it is required to service the packets using the following priority order to choose which one to transmit first:

  1. Any Transaction Packet (TP) (which includes ACK, PING, PING_RESPONSE, STALL, NRDY, ERDY)
     a. This rule primarily allows ACK TPs to be delivered quickly to devices so that they may start transmitting data upstream. It also allows PING/PING_RESPONSE to bypass other traffic, which is important because these TPs are part of the isochronous protocol.
  2. Any Data Packet (DP) which is assigned to an interrupt or isochronous endpoint
     a. This rule prioritizes periodic data packets over asynchronous data packets to satisfy the bounded latency guarantee.
  3. Any DP which is assigned to a bulk or control endpoint
     a. This rule uses a weighted round-robin mechanism between multiple bulk/control packets using a new “Arbitration Weight” packet field.

With the arbitration rules, the hub picks the isochronous packet first, resulting in one of the sources of reordering that can now occur in USB 3.1.
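
The three-level priority order can be sketched in a few lines of Python. This is only an illustrative model, not hub firmware or spec text: the Packet class, its field names, and pick_next are hypothetical, and the weighted round-robin of rule 3 is simplified to arrival order because the article does not describe how the Arbitration Weight field is evaluated.

```python
from dataclasses import dataclass

PERIODIC = {"interrupt", "isochronous"}

@dataclass
class Packet:
    kind: str            # "TP" or "DP" (hypothetical field)
    transfer_type: str   # "control", "bulk", "interrupt" or "isochronous"
    label: str = ""      # free-form name used only for this demo

def priority_class(pkt):
    """Map a packet to rule 1, 2 or 3 of the priority list."""
    if pkt.kind == "TP":
        return 1                      # rule 1: any TP
    if pkt.transfer_type in PERIODIC:
        return 2                      # rule 2: interrupt/isochronous DP
    return 3                          # rule 3: bulk/control DP

def pick_next(buffered):
    """Remove and return the packet the port should transmit next.

    Ties inside a class fall back to arrival order (an assumption that
    stands in for the weighted round-robin of rule 3).
    """
    if not buffered:
        return None
    best = min(range(len(buffered)),
               key=lambda i: (priority_class(buffered[i]), i))
    return buffered.pop(best)         # taken from the middle, not FIFO order

port = [
    Packet("DP", "bulk", "bulk0"),
    Packet("DP", "isochronous", "isoc0"),
    Packet("TP", "bulk", "ack0"),
]
order = [pick_next(port).label for _ in range(3)]
print(order)
```

Because the chosen packet may sit anywhere in the list, the model also shows why a plain FIFO is not enough for the hub's packet buffer.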

Hosts, Hubs, and Devices Include the Transfer Type in Packet Headers

To support the distinction between priority rules #2 and #3 above, packet headers (TPs and DPs) in USB 3.1 now include the transfer type in a previously reserved region of the packet format. This transfer type is set to control, bulk, isochronous, or interrupt, and it is produced by 3.1 hosts and devices for use by the hub for prioritization.

For these priority rules, there is one backward compatibility problem when a 3.0 device is connected to a 3.1 hub. For an IN transaction, the 3.0 device will be using the old packet format which has no transfer type when it transmits its DP. The hub would be unable to determine the priority of that packet versus packets from 3.1 devices, as shown in Figure 3.

USB 3.1 hubs have a new requirement to store the transfer type generated in the ACK TP from the host in a transfer type table so that they can modify the DP from the device and insert the correct transfer type. In this way, the arbitration rules work with existing 3.0 devices.
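
A minimal sketch of such a table follows. It assumes the hub keys its entries by device address and endpoint number and represents packets as plain dictionaries; both choices, and all names in the code, are hypothetical and made only for illustration.

```python
class TransferTypeTable:
    """Toy model of the per-hub transfer type table."""

    def __init__(self):
        self._table = {}

    def on_downstream_ack(self, device_addr, endpoint, transfer_type):
        # Remember the transfer type the host placed in its ACK TP.
        self._table[(device_addr, endpoint)] = transfer_type

    def on_upstream_dp(self, dp):
        """Return the DP with a transfer type filled in when one is known.

        dp is a dict with keys "device_addr", "endpoint" and
        "transfer_type"; the last is None for a DP from a USB 3.0 device.
        """
        if dp.get("transfer_type") is None:
            known = self._table.get((dp["device_addr"], dp["endpoint"]))
            if known is not None:
                dp = {**dp, "transfer_type": known}
        return dp

table = TransferTypeTable()
table.on_downstream_ack(device_addr=5, endpoint=1, transfer_type="isochronous")
legacy_dp = {"device_addr": 5, "endpoint": 1, "transfer_type": None}
print(table.on_upstream_dp(legacy_dp)["transfer_type"])
```

A DP that already carries a transfer type passes through unchanged, and a DP with no matching entry is left as-is, since the article does not say how a hub handles that case.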

Figure 3: Transfer type tables in USB 3.1 hubs support USB 3.0 devices 

Hub Arbitration Rules Require Extra Buffering that is Randomly Accessible

Interrupt/isochronous DPs have a higher absolute priority than bulk/control DPs, so this implies that hubs must have separate buffering for these two classes of packets. In fact, a hub needs separate buffering for each of the two classes per port in the following amounts (each packet size is assumed to be 1024 bytes): 

Table 1: Packet buffer requirements for USB 3.1 hubs

This is significantly more buffer space than in USB 3.0 hubs, which have an elasticity buffer that may be capable of storing 1 to 3 packets. Because TPs have a higher priority than interrupt/isochronous DPs, this buffer needs to be randomly accessible instead of a FIFO.

Separate Link Credits for Isochronous

The link layer of USB 3.0 defined the terminology of “link credits.” The link layer protocol ensures the delivery of packet headers from the transmitter to the receiver, using the credits as a backpressure mechanism to report whether the receiver has enough buffer space to accept another packet header. The receiver must be able to buffer at least 4 packet headers.

In USB 3.1, even though the hub now has separate buffers for isochronous packets to allow them to bypass bulk packets, there is still another obstacle to providing guaranteed bandwidth and latency. To demonstrate the problem, imagine a scenario where the upstream port of a hub is transmitting bulk packets toward the host. If the host stops returning link credits for a period of time, the port will be unable to transmit any more packets until the host releases a credit. Now if a device on another port transmits an isochronous packet upstream (Figure 4), it cannot continue to the host even though the arbitration rules say that the port must choose the isochronous packet over the bulk packet.

Figure 4: Isoc packet is blocked for upstream transmission due to lack of credits 

To solve this piece of the puzzle, the USB 3.1 specification adds another link credit type, which means the link layer needs buffering for 4 more packet headers. Traffic is separated into Type 1 and Type 2:

  • Type 1 traffic class applies to isochronous and interrupt DPs, all TPs, Isochronous Timestamp Packets (ITPs), and Link Management Packets (LMPs)
  • Type 2 traffic class applies to control and bulk DPs

As shown in Figure 5, asynchronous traffic that consumes all four of the Type 2 credits on a link cannot block the transmission of isochronous traffic which uses separate Type 1 credits.
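
The effect of the second credit pool can be modeled with two counters. The sketch below is a simplified illustration with hypothetical class and function names; it tracks only credit counts and ignores everything else the link layer does.

```python
PERIODIC = {"interrupt", "isochronous"}
ASYNC = {"control", "bulk"}

def credit_type(kind, transfer_type=None):
    """Return 1 or 2 following the traffic class split listed above."""
    if kind in ("TP", "ITP", "LMP"):
        return 1
    if kind == "DP" and transfer_type in PERIODIC:
        return 1
    if kind == "DP" and transfer_type in ASYNC:
        return 2
    raise ValueError(f"unknown packet: {kind!r}, {transfer_type!r}")

class LinkTransmitter:
    """Toy transmitter holding a separate credit count per traffic type."""

    def __init__(self, credits_per_type=4):
        self.credits = {1: credits_per_type, 2: credits_per_type}

    def try_send(self, kind, transfer_type=None):
        ctype = credit_type(kind, transfer_type)
        if self.credits[ctype] == 0:
            return False              # blocked until a credit comes back
        self.credits[ctype] -= 1
        return True

    def credit_returned(self, ctype):
        self.credits[ctype] += 1

tx = LinkTransmitter()
bulk_sent = [tx.try_send("DP", "bulk") for _ in range(5)]
isoc_sent = tx.try_send("DP", "isochronous")
print(bulk_sent, isoc_sent)
```

In this model the fifth bulk DP is refused because all four Type 2 credits are in use, while the isochronous DP still goes out on a Type 1 credit.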

Figure 5: Isoc packet can be transmitted upstream because of separate link credits 

Pipelined Isochronous IN Transactions

To explain the need for this final new isochronous feature in USB 3.1, take the example of an isochronous IN endpoint on a USB 3.0 device that can return 4 packets per microframe. When the entire topology is running at USB 3.0 speeds and hubs are not buffering data, it is relatively efficient for the host to request the 4 packets, 2 at a time. The only delay is the time through the hub, as shown in Figure 6. 

Figure 6: Isochronous delay through hub in USB 3.0

Given that hosts will issue multiple IN transactions and hubs will buffer payload data, there is a performance problem with isochronous transactions that interferes with the "bounded latency" guarantee when more than one device is connected behind a USB 3.1 hub. Figure 7 depicts the system inefficiencies of plugging a USB 3.0 device into a USB 3.1 hub. The host performs multiple IN transactions to an isochronous endpoint on Device 0 and a bulk endpoint on Device 1.

Figure 7: Isochronous delay through hub in USB 3.1 (without Pipelined Isoch IN)

In Figure 7, the long, inefficient delays on the host’s and device’s links occur for two reasons:

  1. Delay upstream: The hub is occasionally in the process of transmitting Device 1’s packets upstream so the isochronous packets from Device 0 are delayed until this transmission is complete. This, in turn, delays the next ACK requesting more data and the device is unable to transmit any more packets until it receives the ACK.
  2. Delay downstream: The delay through the hub for the second ACK to propagate from the host to the device.

With more devices in the topology and a larger hub depth, these inefficiencies add up. Each hub can add up to 400 ns of delay in each direction. In a five-tier system, the propagation of the ACK could incur 2 µs of downstream delay, and the propagation of the DPs could incur 2 µs of upstream delay. In addition, it is possible that each level of hub is already transmitting a packet upstream, which would cause an upstream delay of up to 5 µs (a 1K packet takes about 1 µs to transmit on the 10G link). All told, a device could see as much as 8 µs of delay between ACKs, which interferes dramatically with the bounded latency guarantee, as seen in Figure 8.
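
The individual delay components quoted above follow from simple arithmetic, sketched here in Python. The function names are hypothetical. The raw transmit time counts payload bits only and ignores line encoding and packet header overhead, which is one reason the "about 1 µs" per packet in the text is a rounded figure. The components are kept separate; how much they overlap in a real topology is not modeled.

```python
def raw_transmit_time_ns(payload_bytes=1024, link_rate_bps=10e9):
    """Time to clock the payload bits onto the link, with no overhead."""
    return payload_bytes * 8 * 1e9 / link_rate_bps

def worst_case_components(tiers=5, hub_delay_ns=400, packet_time_ns=1000):
    """Per-direction worst-case delays for a chain of `tiers` hubs.

    packet_time_ns defaults to the rounded 1 µs per 1K packet used in
    the text.
    """
    return {
        "ack_downstream_ns": tiers * hub_delay_ns,   # ACK crossing every hub
        "dp_upstream_ns": tiers * hub_delay_ns,      # DPs crossing every hub
        "busy_hub_wait_ns": tiers * packet_time_ns,  # each hub mid-packet
    }

print(raw_transmit_time_ns())
print(worst_case_components())
```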

Figure 8: Delay between ACKs interferes with bounded latency guarantee 

Therefore, USB 3.1 introduces the "Pipelined Isochronous IN" feature to remedy this problem. It means that the host can send another ACK to the isochronous endpoint, requesting more data preemptively, before the device has returned all the packets from the previous request (Figure 9). The delay between packets at the host is reduced, as well as the delay between transmissions by the device. 

Figure 9: Isochronous delay through hub in USB 3.1 (with Pipelined Isoch IN) 

There are some obvious restrictions on this behavior:

  • The host can't request more packets than the "Max Burst Size" or "Bytes Per Interval" of the endpoint (these are values that the endpoint returns as part of its descriptor when it is first being configured)
  • Once the device sets its Last Packet Flag (lpf) in a DP, the host must stop sending IN ACK TPs. The lpf is the indication that the device does not have any more data to transmit in the interval.
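
One way to picture these restrictions is as a small host-side check performed before each additional ACK. The sketch below is an interpretation, not spec behavior: it assumes the Max Burst Size limit applies to packets that are requested but not yet received, that Bytes Per Interval caps the total requested in the interval, and that every packet is 1024 bytes. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IsocInState:
    max_burst_size: int         # packets, from the endpoint descriptor
    bytes_per_interval: int     # bytes, from the endpoint descriptor
    max_packet_size: int = 1024
    requested: int = 0          # packets requested so far this interval
    received: int = 0           # packets received so far this interval
    lpf_seen: bool = False

    def may_request(self, count):
        """True if the host may send an ACK asking for `count` more DPs."""
        if self.lpf_seen:
            return False        # device has no more data this interval
        outstanding = self.requested - self.received
        if outstanding + count > self.max_burst_size:
            return False
        total_bytes = (self.requested + count) * self.max_packet_size
        return total_bytes <= self.bytes_per_interval

    def request(self, count):
        if not self.may_request(count):
            return False
        self.requested += count
        return True

    def on_dp(self, lpf=False):
        self.received += 1
        self.lpf_seen = self.lpf_seen or lpf

ep = IsocInState(max_burst_size=4, bytes_per_interval=4096)
first = ep.request(2)
second = ep.request(2)   # pipelined: sent before any data has come back
third = ep.request(1)    # refused: both limits are already reached
print(first, second, third)
```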

Relaxation of Isochronous Bursting Rules

The USB 3.0 specification required hosts to either perform a single burst or split isochronous transfers into smaller bursts of 2, 4, or 8 DPs followed by a final burst with the remaining DPs for that service interval. The classic example from the spec is the list of possibilities of how a host can burst 11 packets to or from a device:

  • One burst of 11 packets
  • One burst of 8 packets followed by a burst of 3
  • Two bursts of 4 packets followed by a burst of 3
  • Five bursts of 2 packets followed by a burst of 1
  • Eleven bursts of 1 packet
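
The list above can be reproduced with a short enumeration. This is a sketch of the rule as the article states it; the function name is hypothetical, and a sub-burst size of 1 is included only so that the output matches the last bullet.

```python
def usb30_burst_patterns(total, sizes=(8, 4, 2, 1)):
    """List burst patterns for `total` packets in one service interval.

    Each pattern is a list of burst lengths: either a single burst, or
    equal bursts of one size followed by a final burst with the rest.
    """
    if total < 1:
        raise ValueError("total must be at least 1")
    patterns = [[total]]                      # one single burst
    for size in sizes:
        full, rest = divmod(total, size)
        pattern = [size] * full + ([rest] if rest else [])
        if pattern not in patterns:           # skip duplicates for small totals
            patterns.append(pattern)
    return patterns

for pattern in usb30_burst_patterns(11):
    print(pattern)
```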

This restriction was intended to aid hosts and devices in scheduling isochronous transfers by reducing the set of possibilities to bursts in powers of two. However, this artificial restriction did not prove to be very useful, so it was removed in USB 3.1. Now, hosts can decide what bursting pattern is most efficient based on the overall isochronous schedule and the topology of the bus.

Conclusion

In USB 3.0, responsibility for isochronous bandwidth and latency guarantees was primarily left to the host controller because the bus acted like a single lane of traffic with each transaction completing sequentially. The addition of multiple INs and hub buffering in USB 3.1 requires the creation of a virtually separate bus for isochronous traffic that extends from the host, through the hubs, to the device. This virtually separate bus is supported by pipelined isochronous IN transactions, hub buffering and arbitration rules, and separate link credits. This cascading set of requirements satisfies the original promise of "guaranteed bandwidth" with "bounded latency" that was made 18 years ago.