The PCIe configuration space has traditionally been implemented in simple flip-flop-based registers. This is a good fit because a PCIe device may have six or so address decoders, various control bits, and numerous error and other status bits, all of which operate completely independently.
Flip-flop-based address decoders minimize latency, while flip-flop-based control and status registers can be routed directly to and from the relevant logic, all of which simplifies the designer’s work and makes for straightforward synthesis. Unfortunately, as the number of VFs increases, and as the number of PCIe capabilities per VF increases (particularly register-heavy features such as AER and MSI-X), the gate cost of a register implementation can become burdensome. Adding a couple of hundred fully featured VFs to a PCIe controller could add as many as 2 to 3 million gates to a design!
Since the SR-IOV specification was written to support over 64 thousand VFs in a single device, the PCI-SIG put a lot of effort into enabling implementations other than directly mapping to flip-flops. Wherever possible, control and status functionality for all VFs was consolidated in their associated PF. All the PCI Express link-level controls fall into this category – as one VF shouldn’t take down the link it shares with other VFs. Only controls that absolutely must be implemented individually for each VF (such as Bus Master Enable) are replicated. Address decoding is greatly simplified by contiguously locating all the VF copies of each PF region – so only two additional decoders per region are required for any number of VFs, rather than needing an additional decoder per VF (Figure 2). Because of that effort, most of the storage for a VF can be fairly slow and high latency in comparison to the same-clock-cycle access time for directly mapped flip-flops.
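
To make the contiguous-layout argument concrete, the following is a minimal behavioral sketch in Python of how a single window check can serve any number of VFs. It is not taken from the specification or from any particular controller: the function name `decode_vf_access`, its parameters, and the zero-based VF index are assumptions chosen for illustration. It models one region only and assumes every VF's copy of that region has the same size and that the copies are packed back to back from a common base.

```python
from typing import Optional, Tuple


def decode_vf_access(
    addr: int,
    vf_bar_base: int,      # hypothetical: start of the contiguous VF window
    vf_region_size: int,   # hypothetical: size of one VF's copy of the region
    num_vfs: int,          # hypothetical: number of enabled VFs
) -> Optional[Tuple[int, int]]:
    """Return (vf_index, offset) if addr hits the VF window, else None.

    vf_index is zero-based here purely as a convention of this sketch.
    """
    window_end = vf_bar_base + vf_region_size * num_vfs

    # One range comparison covers every VF, instead of one decoder per VF.
    if not (vf_bar_base <= addr < window_end):
        return None

    # Which VF, and where inside that VF's copy of the region.
    vf_index, offset = divmod(addr - vf_bar_base, vf_region_size)
    return vf_index, offset


if __name__ == "__main__":
    base, size, vfs = 0x1000_0000, 0x1000, 4
    print(decode_vf_access(0x1000_2010, base, size, vfs))
    print(decode_vf_access(0x1000_4000, base, size, vfs))
```

When the region size is a power of two, the `divmod` reduces to a shift and a mask, so the cost of the decode does not grow with the number of VFs. Only the window bounds change.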