Deciding on FIFO Sizes When Implementing DW Digital Cores
Ralph Grundler, CAE Manager
Introduction
One of the most common support questions asked about DesignWare Cores is, "What FIFO size should I select for my design?" The question sounds simple at first glance, but its complexity quickly becomes apparent. The answer is a balance of the system latency, the data bandwidth of the system bus, the I/O protocol overhead, and the data bandwidth of the I/O protocol bus the user is connecting to, with the IP FIFO buffering in between. In every FIFO size decision the user also needs to understand the trade-off between size (gate count) and performance (throughput).
Compound this with the fact that the user of the core needs to make this hardware configuration decision very early in the design process, and the question sometimes becomes almost impossible to answer because the designer does not yet know all the details of the system (or of the I/O protocol, for that matter). Since the cores are designed to accommodate all the system and product trade-offs, the design choices for FIFOs may become overwhelming. This article explains the basic thought process and the different strategies needed to determine the right FIFO sizes for the various DesignWare Cores.
Basic System Decisions
The first concept that needs to be established is the design goals
of the product. Hopefully your marketing team has given you this data, but
if not, you need to request it. Does the product need to support the maximum
bandwidth or the minimum gate count? If gate count is the most important factor,
then selecting the minimum FIFO size will suffice. Be sure to check the documentation
to make sure the core will work at that FIFO size. Sometimes the products provide
so much flexibility that they allow the user to configure the core
into an undesirable state.
Select the maximum FIFO size when nothing is known
about the system, protocol, or latencies, and the goal is to achieve
the best possible bandwidth in all data transmission cases. You can
always go back and reconfigure the FIFO size once you have system-level
simulations running and can experiment with bandwidth results. You can also use the default
configuration to get to that point, but be aware that it may not be useful for your
application. Typically the design needs something in between, and the user needs
to go a little deeper into the design constraints and limitations.
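As a rough illustration of experimenting with FIFO sizes in simulation, the following Python sketch models a transmit FIFO one cycle at a time and counts underruns for several candidate depths. It is a toy model, not a DesignWare tool: the function name, the periodic stall pattern, and all of the numbers are assumptions chosen only to show the shape of such an experiment.

```python
def count_underruns(depth, cycles, stall_period, stall_len,
                    fill_per_cycle, drain_per_cycle):
    """Toy cycle-based model of a TX FIFO (all parameters are hypothetical).

    The system side adds fill_per_cycle entries per cycle, except during a
    stall of stall_len cycles at the start of every stall_period cycles.
    The protocol side removes drain_per_cycle entries every cycle; a cycle
    in which it cannot do so is counted as an underrun.
    """
    level = depth  # assume the FIFO is prefilled before transmission starts
    underruns = 0
    for t in range(cycles):
        stalled = (t % stall_period) < stall_len
        if not stalled:
            level = min(depth, level + fill_per_cycle)
        if level >= drain_per_cycle:
            level -= drain_per_cycle
        else:
            underruns += 1
    return underruns


# Sweep a few candidate depths with illustrative numbers only.
for depth in (4, 8, 16, 32):
    n = count_underruns(depth, cycles=1000, stall_period=100, stall_len=10,
                        fill_per_cycle=2, drain_per_cycle=1)
    print(f"depth={depth:2d} underruns={n}")
```

In this toy setup the smaller depths underrun during each stall while the larger ones ride through it. A real system-level simulation would replace the periodic stall with the actual arbitration, bridge, and memory behavior.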
System Latencies
After the design goals have been addressed, the user then needs to understand
the system limitations. First, the user needs to calculate a good estimate of
the system latency. These latencies could include arbitration, bridges, memory
access, interrupts, etc. The system latency plus the buffer-filling capability
of the system, minus the protocol overhead, should be less than or equal to the
protocol bandwidth desired, or
system latency + buffer fill - protocol overhead <= protocol bus bandwidth
This ensures that, when transmitting, the protocol side of the core empties the
buffers no faster than the system can fill them, avoiding a FIFO underrun, and that,
when receiving, the protocol side fills the buffers no faster than the system can
empty them, avoiding a FIFO overrun.
The next system detail the user needs to consider is the system
bus bandwidth. The bandwidth on the system side should be higher than that of the
protocol side, unless you are not trying to achieve the maximum bandwidth
on the protocol bus. The faster the system bus, the less latency in filling
the FIFOs and transmitting on the protocol bus. If the system can fill the
FIFOs faster, it can make up for some of the system latency.
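One common way to turn these considerations into a first-order number is to ask how many bytes the protocol side moves while the system side is stalled for its worst-case latency. The Python sketch below does that arithmetic. It is a simplified estimate rather than a formula from any DesignWare databook; the function names and the example rates and latency are assumptions, and protocol overhead is ignored.

```python
def min_fifo_bytes(protocol_bytes_per_s: int, worst_latency_ns: int) -> int:
    """Bytes the protocol side moves during one worst-case system stall."""
    total = protocol_bytes_per_s * worst_latency_ns
    return -(-total // 1_000_000_000)  # integer ceiling division


def round_up_pow2(n: int) -> int:
    """Next power of two >= n (assumes RAM depths come in powers of two)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def system_keeps_up(system_bytes_per_s: int, protocol_bytes_per_s: int) -> bool:
    """True when the system bus is at least as fast as the protocol bus."""
    return system_bytes_per_s >= protocol_bytes_per_s


# Illustrative numbers only (assumptions, not taken from a real design).
PROTOCOL_RATE = 60_000_000   # bytes/s on the protocol bus
SYSTEM_RATE = 132_000_000    # bytes/s on the system bus
LATENCY_NS = 2_000           # worst-case system latency in nanoseconds

needed = min_fifo_bytes(PROTOCOL_RATE, LATENCY_NS)
print(needed, round_up_pow2(needed), system_keeps_up(SYSTEM_RATE, PROTOCOL_RATE))
# prints: 120 128 True
```

Integer nanoseconds are used instead of floating-point seconds so that the ceiling is exact. A real estimate would also add margin for protocol overhead and back-to-back stalls.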
Store and Forward vs. Cut-Through FIFOs
After these calculations have been reviewed, the user can
figure out what type of data flow the system can handle. "Store and Forward"
works as the name implies. The data is stored in a local buffer (FIFOs) in
the core until all of the data is received and the data integrity is checked
and then forwarded to memory by the system or a DMA controller. The clear advantage
of this type of data flow is the system does not need to handle data that is
corrupt, and if the system is not ready for the data it can wait in the local
buffer. The disadvantage of Store and Forward is that the FIFOs need to be large
enough to hold the largest possible packet; hence, some systems that are strict
on size use a "Cut-Through" data flow. In the Cut-Through data
flow, the FIFOs can be much smaller but still need to be large enough to handle
system latencies. If the total system latencies are not understood, it is safest
to use Store and Forward. (Note that this still does not guarantee maximum data
throughput on the protocol side of the core.) Cut-Through does not allow the
core to check the data integrity before the data is sent to the system, so the system
needs to be able to discard or resend data if the core "signals" that
the data was corrupted. In general, if the system bus is lightly loaded and
there are not a lot of system delays, the preferred data flow is Cut-Through;
if the user does not know these things at design time, it is safer to
use Store and Forward. In some cases, because of restrictions on the protocol
side of the bus or in standard software, the core does not offer this selection
or the software does not allow it to be used.
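The selection guidance in this section can be summarized in a short Python sketch. It encodes only the rules of thumb stated above: Store and Forward must hold the largest packet, Cut-Through must cover the system latency, and Store and Forward is the safer choice when latencies are unknown. The names `DataFlow`, `suggest_data_flow`, and `min_fifo_bytes_for` are hypothetical, as are the example sizes.

```python
from enum import Enum


class DataFlow(Enum):
    STORE_AND_FORWARD = "store-and-forward"
    CUT_THROUGH = "cut-through"


def suggest_data_flow(latency_known: bool, bus_lightly_loaded: bool) -> DataFlow:
    """Rule of thumb: Cut-Through only when the system is well understood."""
    if latency_known and bus_lightly_loaded:
        return DataFlow.CUT_THROUGH
    return DataFlow.STORE_AND_FORWARD


def min_fifo_bytes_for(flow: DataFlow, max_packet_bytes: int,
                       latency_cover_bytes: int) -> int:
    """Lower bound on FIFO size for each data flow (first-order only)."""
    if flow is DataFlow.STORE_AND_FORWARD:
        return max_packet_bytes      # must hold the largest packet
    return latency_cover_bytes       # must ride out system latency


# Illustrative numbers: a 1518-byte maximum packet and 120 bytes of
# latency cover are assumptions for the example, not recommendations.
for known, light in ((True, True), (False, True)):
    flow = suggest_data_flow(known, light)
    print(flow.value, min_fifo_bytes_for(flow, 1518, 120))
# prints: cut-through 120
#         store-and-forward 1518
```

The gap between the two results is the gate-count cost the section describes: Store and Forward pays for a full packet of storage in exchange for never passing corrupt data to the system.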
FIFO Sizing for Particular Cores
Various DesignWare Cores databooks and SolvNet articles attempt
to take some of the guesswork out of sizing the FIFOs, but with so many variables
it still takes some skill on the engineer's part. Since the issues are different
for each protocol, FIFO sizing and configuration are different for
each core. The following is a brief overview of configuring the FIFOs of
different products.
DesignWare Ethernet Core FIFO sizing is fairly conventional, as there are separate
FIFOs for the TX and RX paths. The user needs to consider whether they are
using a Store and Forward data flow or whether their application can handle the extra
work of a Cut-Through data flow.
For the DesignWare USB 2.0 Host Controller, there are several
options for the customer to consider when planning buffering. The basic configuration
options are Config1 or Config2 and there are various options within each configuration.
Config1 is the smaller/lower performance configuration option and consists of
a single FIFO that can be configured for size and threshold levels. In general,
a common starting point is 512 bytes or 1 Kbyte, with the threshold registers
set at the maximum size of the FIFO. More information on thresholding is at:
https://solvnet.synopsys.com/retrieve/016685.html.
Keep in mind that threshold levels can only be modified by the host controller
driver. If you are using standard drivers (from Microsoft, for example), select
the threshold needed (it becomes the reset default), as you cannot change the value
once the Host Driver has started. Config2 is for higher performance systems
and can buffer up to 4k of data and descriptors. This configuration can give
maximum performance but is at a higher cost in terms of gate count. See Appendix
D of the DesignWare USB 2.0 Host Controller Subsystem-AHB databook for more
details on the timing and advantages of each configuration.
When considering the configuration of the DesignWare
Hi-Speed USB OTG Controller FIFOs, the user needs to understand that
the configuration options have been designed to be very efficient with gates
by reusing logic. This makes the calculations more complex and requires the
user to better understand the USB transfers supported; however, it rewards
the user with a low gate count design. Sharing the FIFOs between the Device
and Host functions saves gates, but it also implies that the user should
select the same FIFO sizes for both functions to get maximum performance from that configuration.
Basically the number of device endpoints corresponds to the number of endpoints
the core can support in Host mode.
There are more details of this in the article at: https://solvnet.synopsys.com/retrieve/016804.html
The DesignWare IP for PCI Express Core has options similar to those of the other cores but offers one more configuration for receive buffering. Transmit buffering is Bypass only: the data from the application is transmitted directly on the PCIe link, and a copy of the data is saved in the Retry Buffer as required by the specification. For receive buffering there are three options. With Store and Forward, the core automatically returns credits and drops error packets, and the application can throttle data. For lower latency the user can select Cut-Through. In this configuration the application must discard the data in the case of an error, and the application can do only limited throttling of data. If Bypass (no FIFO) is selected, the application needs to handle error packets and cannot throttle data, but this selection has the lowest receive latency. Because many other options in the PCI Express configuration influence the sizing of the FIFOs (also called RAMs, buffers or queues in the PCI Express databook), the core gives the user the option to autosize the FIFOs. Autosize is a good starting point for testing the system. Users can always override the suggested values with their own selections.
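The three receive buffering options can be laid out as a small lookup table. The Python sketch below records only the properties stated in the paragraph above and picks the lowest-latency option that meets two application constraints. It is not a DesignWare API: the `RxMode` type, its field names, and the assumption that "limited" throttling is enough for an application that needs throttling are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RxMode:
    name: str
    errors_dropped_by: str   # "core" or "application"
    throttling: str          # "yes", "limited", or "no"
    latency_rank: int        # 1 = lowest receive latency (relative only)


# Properties as described in the text; the ranks are relative, not measured.
RX_MODES = (
    RxMode("Store and Forward", "core", "yes", 3),
    RxMode("Cut-Through", "application", "limited", 2),
    RxMode("Bypass", "application", "no", 1),
)


def lowest_latency_mode(app_handles_errors: bool,
                        needs_throttling: bool) -> RxMode:
    """Pick the lowest-latency mode the application can live with."""
    eligible = [
        m for m in RX_MODES
        if (app_handles_errors or m.errors_dropped_by == "core")
        and (not needs_throttling or m.throttling != "no")
    ]
    # Store and Forward always qualifies, so the list is never empty.
    return min(eligible, key=lambda m: m.latency_rank)


print(lowest_latency_mode(app_handles_errors=True, needs_throttling=False).name)
print(lowest_latency_mode(app_handles_errors=True, needs_throttling=True).name)
print(lowest_latency_mode(app_handles_errors=False, needs_throttling=True).name)
# prints: Bypass
#         Cut-Through
#         Store and Forward
```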
FIFO Sizing Conclusions
The FIFO sizing of the design is an important design consideration
with choices between sizing, system design and data flow. Please take the time
to consider this configuration option carefully. The configuration options
are powerful, but in the end you have created Verilog that can only be modified
by rerunning the configuration. Sometimes the cores give the user the option
to dynamically resize or reassign the memory through software, but the physical
memory size stays the same. Keep in mind that once the chip is made,
the memory cannot be made physically bigger or smaller. This may
sound obvious but is often overlooked in the hurry to tape out. So spend the
time to understand the system, the software being used and the I/O protocol to make
sure the design meets the original goals of the product.