As shown in Figure 1, a typical memory channel consists of a DDR controller that interfaces with an SoC interconnect, such as an AXI interconnect. The DDR controller converts incoming AXI transactions from the interconnect into DDR commands and schedules those commands for efficient delivery to the DDR memory through the PHY and the memory channel. The DDR PHY is the conduit between the controller and the DDR memory, and it plays a critical role in transferring data between the two reliably, without bit errors.
To ensure DDR channel robustness during mission mode, the memory interface on the SoC and the DRAM are trained during initialization after power-up. At a high level, training involves sending various patterns to the memory, exercising the channel by varying time delays and voltages for both reads (RD) and writes (WR), and then finding the optimal settings in the time and voltage domains for each RD/WR parameter. This applies to both command/address and data lanes, depending on the DDR standard and operating speed. Hence, a key requirement for a robust memory system is to train the DDR channel so that it has optimal signal integrity in both the time and voltage domains. The resulting data eyes, at the receivers in the SoC memory interface and in the DRAM, can then handle peak traffic during mission mode.
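The sweep-and-center idea described above can be sketched in a few lines. This is only a minimal illustration, not the algorithm of any particular PHY: the `passes(delay, vref)` callback, the function names, and the diamond-shaped `synthetic_eye` are all hypothetical stand-ins for writing a pattern, reading it back, and comparing on real hardware. The sketch first picks the voltage setting with the widest passing delay window, then centers the voltage at the chosen delay.

```python
from typing import Callable, List, Optional, Tuple


def longest_pass_run(results: List[bool]) -> Tuple[int, int]:
    """Return (start, length) of the longest run of True; length is 0 if none."""
    best_start, best_len = 0, 0
    start = None
    for i, ok in enumerate(results + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start, best_len


def train_eye(passes: Callable[[int, int], bool],
              delays: List[int],
              vrefs: List[int]) -> Optional[Tuple[int, int]]:
    """Sweep delay and voltage settings; return a centered (delay, vref) or None."""
    best = None  # (window width, center delay)
    for v in vrefs:
        start, length = longest_pass_run([passes(d, v) for d in delays])
        if length and (best is None or length > best[0]):
            best = (length, delays[start + (length - 1) // 2])
    if best is None:
        return None  # no passing setting found: training failed
    d_center = best[1]
    # Center the voltage within the passing window at the chosen delay.
    start, length = longest_pass_run([passes(d_center, v) for v in vrefs])
    return d_center, vrefs[start + (length - 1) // 2]


def synthetic_eye(delay: int, vref: int) -> bool:
    """Hypothetical diamond-shaped eye centered at delay 20, vref 32."""
    return 8 * abs(delay - 20) + 10 * abs(vref - 32) < 80


settings = train_eye(synthetic_eye, list(range(64)), list(range(64)))
print(settings)  # (20, 32)
```

In a real system the same sweep would be repeated per lane and per direction (RD and WR), and the pass/fail decision would come from comparing received data against the transmitted pattern.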
There are three different ways a DDR memory interface can be trained:
- By the core CPU through software (SW) or firmware (FW)
- By the PHY or controller using dedicated hardware (HW) state machines
- By the PHY using FW code
The first option (i.e., the CPU training the memory interface of every channel through SW or FW code) is very time-consuming, since it consumes precious CPU cycles that are needed to initialize other components.
The second option, although faster than the first, involves committing the training algorithms to HW state machines. Hence, it lacks the field-upgradability that the other two options offer. Additionally, fixing a bug in the HW often requires a costly and time-consuming re-spin of the SoC. This option is also design-intensive and consumes more area and power when supporting multiple DDR standards, since each standard may require its own custom algorithms and implementation. Finally, supporting complex data patterns may not be feasible from an area and power perspective. Hence, the training patterns implemented in this scheme are typically traditional, simpler patterns that toggle at a fixed frequency and do not excite signal integrity effects such as cross-talk, inter-symbol interference, and jitter to the worst-case degree.
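To make the difference between a fixed-frequency pattern and a richer one concrete, the sketch below compares a simple toggle pattern with a pseudo-random bit sequence. The text does not name a specific complex pattern; PRBS7 (polynomial x^7 + x^6 + 1) is assumed here only as a familiar example, and the function names are hypothetical. A toggle pattern never holds a level for more than one bit, whereas the pseudo-random sequence mixes short and long runs, which is the kind of variety that stresses inter-symbol interference.

```python
from typing import List


def toggle_pattern(nbits: int) -> List[int]:
    """Fixed-frequency pattern: 0, 1, 0, 1, ..."""
    return [i & 1 for i in range(nbits)]


def prbs7(seed: int = 0x7F, nbits: int = 127) -> List[int]:
    """PRBS7 bits from a 7-bit LFSR with taps at bits 7 and 6 (assumed example)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays stuck at 0")
    out = []
    for _ in range(nbits):
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | newbit) & 0x7F
        out.append(newbit)
    return out


def longest_run(bits: List[int]) -> int:
    """Length of the longest stretch of identical consecutive bits."""
    best = cur = 0
    prev = None
    for b in bits:
        cur = cur + 1 if b == prev else 1
        prev = b
        best = max(best, cur)
    return best


print(longest_run(toggle_pattern(127)))  # 1
print(longest_run(prbs7()))              # 7
```

Generating such a sequence takes only a handful of operations in FW, while a HW state machine needs dedicated logic for each pattern it supports.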
The third option, i.e., training by the PHY using FW code, is the most robust of the three.