Synopsys IP Technical Bulletin Article

DesignWare Mobile Storage Host Controller Core: Handling Timing Requirement for SD3.0 Cards

P.K. Venkataraghavan
Vishwanath Kakarla
Venkata Giri Kumar P
Devashish Dutta

This article describes the timing requirements for interfacing the DesignWare Mobile Storage (DWC_mobile_storage) Host Controller core (versions 2.20a and 2.30a) to SD cards that support different modes of operation. It explains why an external clock multiplexor structure is needed to meet the timing requirements across the various card operating modes described in the SD Memory Card Specification, Version 3.0.

Phase shift values must be determined after place and route and then used for back annotation. This article describes the structure using typical delays for a 65nm process technology. The I/O pad delays should be kept to a minimum to reduce timing stress at both the host controller and the card interface.

Timing Requirements for SD 3.0 Cards
Table 1 below shows the input timing requirements that act as constraints for the Mobile Storage Host Controller output path to the SD 3.0 card.

Mode | Maximum Freq. of Operation | Hold Time Requirement | Setup Time Requirement
SDR104 | 200 MHz | 0.8 ns | 1.4 ns
SDR50 | 100 MHz | 0.8 ns | 3 ns
DDR50 (CMD line) | 50 MHz | 0.8 ns | 6 ns
DDR50 (DAT line) | 50 MHz | 0.8 ns | 3 ns
SDR25 | 50 MHz | 2 ns | 6 ns
SDR12 | 25 MHz | 5 ns | 5 ns
Identification Mode | 400 kHz | 5 ns | 5 ns
Table 1: SD 3.0 specification for card input timings

Table 2 below shows the card output timing requirements that act as constraints for the Mobile Storage Host Controller input path from the card.

Mode | Maximum Freq. of Operation | Maximum Card Output Delay (tODLY)
SDR104 | 200 MHz | 10 ns
SDR50 | 100 MHz | 7.5 ns
DDR50 (CMD) | 50 MHz | 13.7 ns
DDR50 (DAT) | 50 MHz | 7 ns
SDR25 | 50 MHz | 14 ns
SDR12 | 25 MHz | 14 ns
Identification Mode | 400 kHz | 50 ns
Table 2: Card output delay timing requirements

For the SDR104 mode, the specification allows a maximum card output delay of 2 Unit Intervals (UI) before tuning, where one UI is the nominal duration of one bit, that is, one nominal SDCLK period. After tuning, the remaining timing uncertainty due to temperature variation is specified as 1.9 ns (-350 ps to +1550 ps).

Tuning, which resolves the 2 UI ambiguity, is performed with the new CMD19 command (a tuning command used to adjust the host sampling clock). At 200 MHz, one UI (one card clock period) is 5 ns, so the 2 UI delay is listed as 10 ns in Table 2. Because this 2 UI pre-tuning ambiguity is the worst case, larger than the 1.9 ns uncertainty that remains after tuning, it is taken as the maximum card output delay in Table 2.
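The UI arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the constants come from Tables 1 and 2 and the text above, and the sign convention of the post-tuning window (-0.35 ns to +1.55 ns) is an assumption chosen to match the quoted 1.9 ns span.

```python
# Sketch: SDR104 unit-interval arithmetic (values from Tables 1 and 2).
SDR104_CLOCK_HZ = 200e6  # maximum SDR104 card clock


def unit_interval_ns(clock_hz: float) -> float:
    """One UI = one nominal SDCLK period, in nanoseconds."""
    return 1e9 / clock_hz


ui_ns = unit_interval_ns(SDR104_CLOCK_HZ)  # 5.0 ns
pre_tuning_max_ns = 2 * ui_ns              # 2 UI = 10.0 ns, the Table 2 entry
# Assumed post-tuning drift window: -0.35 ns to +1.55 ns (1.9 ns span).
post_tuning_span_ns = round(1.55 - (-0.35), 3)
# The larger ambiguity is used as the worst-case card output delay.
worst_case_ns = max(pre_tuning_max_ns, post_tuning_span_ns)

print(ui_ns, pre_tuning_max_ns, post_tuning_span_ns, worst_case_ns)  # 5.0 10.0 1.9 10.0
```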

To use the SDR104 mode, select the FIFO size based on the number of blocks to be transferred. If one block of data is to be read or written, the FIFO size must be at least 512 bytes. If four blocks (512 bytes × 4) are to be used, select a FIFO depth of at least 2 KB when configuring the DWC_mobile_storage host controller. Under these constraints, the maximum number of blocks that can be programmed in one SDR104 transfer is:

  • 64 when the AHB data width is configured as 64 bits
  • 32 when the AHB data width is configured as 32 bits
This selection must be done during the configuration of the DWC_mobile_storage host controller.
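A small sketch of the FIFO sizing arithmetic, assuming a 512-byte block. The maximum FIFO depth of 4096 entries used here is an assumption (it is not stated in this article), chosen because it reproduces the 64- and 32-block limits listed above; the function names are illustrative only.

```python
BLOCK_BYTES = 512  # block length used in SDR104 mode


def min_fifo_bytes(num_blocks: int) -> int:
    """Smallest FIFO size, in bytes, that holds num_blocks blocks."""
    return num_blocks * BLOCK_BYTES


def max_blocks_per_transfer(fifo_depth_entries: int, ahb_width_bits: int) -> int:
    """Number of whole 512-byte blocks a FIFO of this depth and width can hold."""
    fifo_bytes = fifo_depth_entries * (ahb_width_bits // 8)
    return fifo_bytes // BLOCK_BYTES


# Assumption: a maximum FIFO depth of 4096 entries (not stated in this article).
ASSUMED_MAX_FIFO_DEPTH = 4096

print(min_fifo_bytes(1), min_fifo_bytes(4))                 # 512 2048
print(max_blocks_per_transfer(ASSUMED_MAX_FIFO_DEPTH, 64))  # 64
print(max_blocks_per_transfer(ASSUMED_MAX_FIFO_DEPTH, 32))  # 32
```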
Guidelines for Using the DesignWare Mobile Storage Host Controller in SDR104 Mode
According to the SD Memory Card Specification, Version 3.0, the SDR104 mode allows a variable output delay on the CMD and DAT lines of 0 to 2 Unit Intervals (UI), where 1 UI equals one card clock period. Because of this variable delay, the host should not stop the clock during the data phase; the clock should be stopped only between blocks of data.

The following RTL configuration requirements and programming flow for software drivers should be used in order to use the DWC_mobile_storage host controller in SDR104 mode applications:

  • Select a FIFO size that is a multiple of 512 bytes at RTL configuration time.
  • Program the BYTCNT register with a value that is a multiple of 512 bytes and less than or equal to the FIFO size.
  • Program the BLKSIZ register to 512; the block length is fixed at 512 bytes in SDR104 mode.
  • If the transfer size is larger than the FIFO size, split it into multiple transfers, each of which can be as large as the FIFO. Program each new transfer only after the data-done indication for the previous transfer has been received.
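The transfer-splitting rule can be sketched as a helper that computes the BYTCNT value for each transfer. The function name plan_sdr104_transfers is hypothetical, and the register accesses appear only as comments; a real driver would use its own register-access layer.

```python
BLOCK_BYTES = 512  # BLKSIZ is fixed at 512 in SDR104 mode


def plan_sdr104_transfers(total_bytes: int, fifo_bytes: int) -> list[int]:
    """Split a request into per-transfer BYTCNT values.

    Each value is a multiple of 512 bytes and no larger than the FIFO.
    """
    if fifo_bytes <= 0 or fifo_bytes % BLOCK_BYTES:
        raise ValueError("FIFO size must be a positive multiple of 512 bytes")
    if total_bytes % BLOCK_BYTES:
        raise ValueError("transfer size must be a multiple of 512 bytes")
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        bytcnt = min(remaining, fifo_bytes)
        chunks.append(bytcnt)
        remaining -= bytcnt
    return chunks


# Example: a 5 KB request with a 2 KB FIFO becomes three transfers.
for bytcnt in plan_sdr104_transfers(5 * 1024, 2048):
    # Hypothetical driver sequence per transfer: write BLKSIZ = 512 and
    # BYTCNT = bytcnt, issue the data command, then wait for data done
    # before programming the next transfer.
    print(f"BLKSIZ=512 BYTCNT={bytcnt}")
```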
Clock Requirements and Recommendations

There are several clock requirements and recommendations when interfacing the DWC_mobile_storage host controller with cards.

The DWC_mobile_storage host controller uses the following clocks to communicate reliably with the interfaced cards:

  • cclk_in - Clocks the logic in the Card Interface Unit (CIU) clock domain.
  • cclk_in_drv - Has the same frequency as cclk_in, but is phase shifted. Satisfies the minimum hold time requirement of 5 ns at the card inputs when operating in SDR12 or Identification mode.
  • cclk_in_sample - Has the same frequency as cclk_in, but is phase shifted. Provides a "best sampling window" for the DWC_mobile_storage host controller when reading data from the card.
Clock Generation Recommendations

The following are recommendations for clock generation:

  • The cclk_in frequency input to the DWC_mobile_storage core can be switched depending on the data rate. For example, if the interfaced card is communicating in a 50 MHz mode such as SDR25, then cclk_in = 50 MHz; if it is communicating in the SDR104 mode, then cclk_in = 200 MHz.
  • Ensure that there are no glitches on the clock inputs while switching clock frequencies. The cclk_in_drv and cclk_in_sample clock frequencies must switch along with the cclk_in frequency.
  • The cclk_in_drv and cclk_in_sample clocks are phase-shifted versions of cclk_in. The phase shift should be selectable based on the enumerated data rate, with a resolution of 90 degrees of the cclk_in clock period.
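With a 90-degree resolution, the phase shifter offers four delay settings per cclk_in period. The sketch below computes them; the function name is hypothetical. At 200 MHz and 50 MHz it yields the 1.25/2.5/3.75 ns and 5/10/15 ns delays evaluated in Tables 3 and 4, plus the zero-delay setting.

```python
def phase_shift_options_ns(cclk_in_hz: float, resolution_deg: int = 90) -> list[float]:
    """Delays (ns) selectable for cclk_in_drv / cclk_in_sample at this resolution."""
    period_ns = 1e9 / cclk_in_hz
    steps = 360 // resolution_deg  # 4 steps for 90-degree resolution
    return [round(period_ns * k / steps, 3) for k in range(steps)]


print(phase_shift_options_ns(200e6))  # [0.0, 1.25, 2.5, 3.75]
print(phase_shift_options_ns(50e6))   # [0.0, 5.0, 10.0, 15.0]
```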
You should instantiate the clock generation structure described above outside the DWC_mobile_storage host controller. The clock speed for cclk_in and the phase shifts for cclk_in_drv and cclk_in_sample should be changed according to the mode of operation (that is, the data rate) chosen at the end of enumeration. The system software must perform the frequency switch based on the enumerated data speed.

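One way system software could organize the post-enumeration switch is a per-mode lookup of cclk_in frequency, divider ratio, and cclk_in_drv delay. The values below are transcribed from Table 3, using the settings its comments suggest; the ModeClocking structure and function names are hypothetical, and the actual glitch-free reprogramming of the external clock multiplexor and phase shifters is platform specific and not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeClocking:
    """Hypothetical per-mode clock settings (values transcribed from Table 3)."""
    cclk_in_hz: float    # frequency driven onto cclk_in
    divider: int         # cclk_in / card clock ratio
    drv_delay_ns: float  # cclk_in_drv delay suggested by Table 3


MODE_CLOCKING = {
    "SDR104": ModeClocking(200e6, 1, 3.75),
    "SDR50": ModeClocking(200e6, 2, 3.75),
    "DDR50": ModeClocking(50e6, 1, 5.0),  # same delay for CMD and DAT lines
    "SDR25": ModeClocking(50e6, 1, 10.0),
    "SDR12": ModeClocking(50e6, 2, 15.0),
    "Identification": ModeClocking(50e6, 125, 15.0),
}


def card_clock_hz(mode: str) -> float:
    """Card clock that results from the cclk_in frequency and divider ratio."""
    cfg = MODE_CLOCKING[mode]
    return cfg.cclk_in_hz / cfg.divider


for mode, cfg in MODE_CLOCKING.items():
    # A real driver would now reprogram the external clock multiplexor and
    # phase shifters without glitches; that step is platform specific.
    print(f"{mode}: cclk_in {cfg.cclk_in_hz / 1e6:g} MHz, "
          f"card clock {card_clock_hz(mode) / 1e3:g} kHz, "
          f"cclk_in_drv delay {cfg.drv_delay_ns} ns")
```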

Simulation-based Solution Analysis for the Above Recommendation

The following assumptions pertain to the timing and clocking recommendations:

  • The DWC_mobile_storage host controller in the design is instantiated with one port.
  • The following delays are assumed:
    • Host-controller output path:
      • Delay_O = cclk_in to cclk_out delay = 1.4ns (Typical value)
    • Host-controller input path:
      • tODLY = cclk_out to cdata_in delay = max value (specified in Table 2)
      • Delay_I = IO_pad delay + routing delays within host controller = 2.35 ns (typical value)
      • Delay_S = Delay_O + Delay_I = 3.75 ns
      • Total turnaround delay between the host controller output and input = tODLY + Delay_S = tODLY + Delay_O + Delay_I
  • Synthesis targets a 65nm process; the setup and hold times of the flip-flops within the DWC_mobile_storage host controller are assumed to be 1 ns.
  • Figure 1 below illustrates how the DWC_mobile_storage host controller and card delays relate to each other:

    Figure 1: Clock frequency set at 50 MHz or 200 MHz, depending on card type
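Using these assumptions, the total turnaround delay for each mode follows directly from Table 2. A minimal sketch; the dictionary and function names are illustrative only.

```python
DELAY_O_NS = 1.4                      # cclk_in -> cclk_out (typical)
DELAY_I_NS = 2.35                     # I/O pad + routing in the host (typical)
DELAY_S_NS = DELAY_O_NS + DELAY_I_NS  # 3.75 ns

# Maximum card output delay tODLY per mode, in ns (Table 2).
T_ODLY_MAX_NS = {
    "SDR104": 10.0,
    "SDR50": 7.5,
    "DDR50 (CMD)": 13.7,
    "DDR50 (DAT)": 7.0,
    "SDR25": 14.0,
    "SDR12": 14.0,
    "Identification": 50.0,
}


def turnaround_ns(t_odly_ns: float) -> float:
    """Total turnaround delay = tODLY + Delay_S (Delay_O + Delay_I)."""
    return round(t_odly_ns + DELAY_S_NS, 3)


for mode, t_odly in T_ODLY_MAX_NS.items():
    print(f"{mode:15s} tODLY {t_odly:5.2f} ns -> turnaround {turnaround_ns(t_odly):6.2f} ns")
```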
DesignWare Mobile Storage Host Controller Output Path Recommendation - Test Results
Table-3 lists the simulation-based test results for the DWC_mobile_storage host controller output paths. All operating modes have at least one phase shift value that can be used to communicate with the card.

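Mode | Freq. of Operation | Input Frequency (cclk_in) | Divider | Hold Delay Introduced through Phase Shifter (cclk_in_drv) | Hold Time to Be Checked | Resultant Hold Time (ns) | Setup Time to Be Checked | Resultant Setup Time (ns) | Final Effective Result | Comments
SDR104 | 200 MHz | 200 MHz | 1 | 1.25 ns | 0.8 ns | 4.85 | 1.4 ns | 0.15 | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 2.5 ns | 0.8 ns | 1.1 | 1.4 ns | 3.9 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 3.75 ns | 0.8 ns | 2.35 | 1.4 ns | 2.65 | Pass | Suggested; keeps the resultant hold time away from 1 ns
SDR50 | 100 MHz | 200 MHz | 2 | 1.25 ns | 0.8 ns | 9.85 | 3 ns | 0.15 | Fail
SDR50 | 100 MHz | 200 MHz | 2 | 2.5 ns | 0.8 ns | 1.1 | 3 ns | 8.9 | Pass
SDR50 | 100 MHz | 200 MHz | 2 | 3.75 ns | 0.8 ns | 2.35 | 3 ns | 7.65 | Pass | Suggested; keeps the resultant hold time away from 1 ns
DDR50 (CMD line) | 50 MHz | 50 MHz | 1 | 5 ns | 0.8 ns | 3.6 | 6 ns | 16.4 | Pass | Suggested; keeps it the same as the DAT line
DDR50 (CMD line) | 50 MHz | 50 MHz | 1 | 10 ns | 0.8 ns | 8.6 | 6 ns | 11.4 | Pass
DDR50 (CMD line) | 50 MHz | 50 MHz | 1 | 15 ns | 0.8 ns | 13.6 | 6 ns | 6.4 | Pass
DDR50 (DAT line) | 50 MHz | 50 MHz | 1 | 5 ns | 0.8 ns | 3.6 | 3 ns | 6.4 | Pass | Only the 90-degree phase shift meets both setup and hold time
DDR50 (DAT line) | 50 MHz | 50 MHz | 1 | 10 ns | 0.8 ns | 8.6 | 3 ns | 1.4 | Fail
DDR50 (DAT line) | 50 MHz | 50 MHz | 1 | 15 ns | 0.8 ns | 3.6 | 3 ns | 6.4 | Fail | Start bit shifted by half a cycle
SDR25 | 50 MHz | 50 MHz | 1 | 5 ns | 2 ns | 3.6 | 6 ns | 16.4 | Pass
SDR25 | 50 MHz | 50 MHz | 1 | 10 ns | 2 ns | 8.6 | 6 ns | 11.4 | Pass | Better margin for both setup and hold time
SDR25 | 50 MHz | 50 MHz | 1 | 15 ns | 2 ns | 13.6 | 6 ns | 6.4 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 5 ns | 5 ns | 3.6 | 5 ns | 36.4 | Fail
SDR12 | 25 MHz | 50 MHz | 2 | 10 ns | 5 ns | 8.6 | 5 ns | 31.4 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 15 ns | 5 ns | 13.6 | 5 ns | 26.4 | Pass | Better margin for both setup and hold time
Identification Mode | 400 kHz | 50 MHz | 125 | 5 ns | 5 ns | 3.6 | 5 ns | 2516.4 | Fail
Identification Mode | 400 kHz | 50 MHz | 125 | 10 ns | 5 ns | 8.6 | 5 ns | 2511.4 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 15 ns | 5 ns | 13.6 | 5 ns | 2506.4 | Pass | Better margin for hold time; setup time is very large in any case
Table 3: Simulation-based test results for the host controller output path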

Dark green Pass: data integrity verified, with the best timings for both setup and hold
Light green Pass: hold time or setup time passes independently; this does not guarantee both
Red: margins below 1 ns
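The resultant hold and setup values in Table 3 follow a simple model: data changes on the cclk_in_drv edge, the card samples on cclk_out (which lags cclk_in by Delay_O = 1.4 ns), and the resulting skew is wrapped into one bit period (the clock period for SDR modes and the DDR50 CMD line, half the clock period for the DDR50 DAT line). The sketch below is a reconstruction of that arithmetic under these assumptions, not Synopsys code. It reproduces the SDR104, SDR50, DDR50, SDR25 and SDR12 rows, but it does not model the DDR50 start-bit issue that makes the 15 ns DAT-line setting fail. For Identification Mode, the table's setup values correspond to a period of about 2520 ns rather than the nominal 2500 ns, so the sketch will not match those rows exactly.

```python
def output_margins_ns(drv_delay_ns: float, bit_period_ns: float,
                      delay_o_ns: float = 1.4) -> tuple[float, float]:
    """Return (resultant hold, resultant setup) at the card input, in ns.

    Data changes on the cclk_in_drv edge; the card samples on cclk_out,
    which lags cclk_in by Delay_O. Hold is the time from the sampling
    edge to the next data change, wrapped into one bit period; setup is
    the remainder of that period.
    """
    hold = round((drv_delay_ns - delay_o_ns) % bit_period_ns, 3)
    setup = round(bit_period_ns - hold, 3)
    return hold, setup


def meets_card_input_timing(drv_delay_ns: float, bit_period_ns: float,
                            hold_req_ns: float, setup_req_ns: float) -> bool:
    """Compare against Table 1 (ignores DDR50 start-bit alignment)."""
    hold, setup = output_margins_ns(drv_delay_ns, bit_period_ns)
    return hold >= hold_req_ns and setup >= setup_req_ns


# SDR104: 5 ns bit period; Table 1 requires 0.8 ns hold and 1.4 ns setup.
for drv in (1.25, 2.5, 3.75):
    print(drv, output_margins_ns(drv, 5.0), meets_card_input_timing(drv, 5.0, 0.8, 1.4))
```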

DesignWare Mobile Storage Host Controller Input Path Solution - Test Results
Table-4 below lists the simulation results for the DWC_mobile_storage host controller input paths. All operating modes have at least one phase shift value that can be used to communicate with the card.

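Mode | Freq. of Operation | Input Frequency (cclk_in) | Divider Ratio | Sampling Delay (phase shifter) | Card Delay on cdata_in | Sampling Result (data integrity) | Available Hold Time (ns) | Available Setup Time (ns) | Final Result (data integrity, setup/hold time > 1 ns, start-bit validity) | Comment
SDR104 | 200 MHz | 200 MHz | 1 | 0 ns | 0 ns | Pass | 3.75 | 1.25 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 0 ns | 4.8 ns | Pass | 3.55 | 1.45 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 0 ns | 5 ns | Pass | 3.75 | 1.25 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 0 ns | 9.6 ns | Pass | 3.35 | 1.65 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 0 ns | 10 ns | Pass | 3.75 | 1.25 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 1.25 ns | 0 ns | Pass | 2.5 | 2.5 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 1.25 ns | 4.8 ns | Pass | 2.3 | 2.7 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 1.25 ns | 5 ns | Pass | 2.5 | 2.5 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 1.25 ns | 9.6 ns | Pass | 2.1 | 2.9 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 1.25 ns | 10 ns | Pass | 2.5 | 2.5 | Pass
SDR104 | 200 MHz | 200 MHz | 1 | 2.5 ns | 0 ns | Pass | 1.25 | 3.75 | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 2.5 ns | 4.8 ns | Pass | 1.05 | 3.95 | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 2.5 ns | 5 ns | Pass | 1.25 | 3.75 | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 2.5 ns | 9.6 ns | Pass | 0.85 | 4.15 | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 2.5 ns | 10 ns | Pass | 1.25 | 3.75 | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 3.75 ns | 0 ns | Pass | 0 | 5 | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 3.75 ns | 4.8 ns | Pass | 4.8 | 0.2 | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 3.75 ns | 5 ns | Hangs | - | - | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 3.75 ns | 9.8 ns | Pass | 4.6 | 0.4 | Fail
SDR104 | 200 MHz | 200 MHz | 1 | 3.75 ns | 10 ns | Hangs | - | - | Fail
SDR50 | 100 MHz | 200 MHz | 2 | 0 ns | 0 ns | Pass | 3.75 | 6.25 | Pass
SDR50 | 100 MHz | 200 MHz | 2 | 0 ns | 3.6 ns | Pass | 7.35 | 2.65 | Pass
SDR50 | 100 MHz | 200 MHz | 2 | 0 ns | 7.5 ns | Pass | 1.25 | 8.75 | Pass
SDR50 | 100 MHz | 200 MHz | 2 | 1.25 ns | 0 ns | Pass | 2.5 | 7.5 | Fail
SDR50 | 100 MHz | 200 MHz | 2 | 1.25 ns | 3.6 ns | Pass | 6.1 | 3.9 | Fail
SDR50 | 100 MHz | 200 MHz | 2 | 1.25 ns | 7.5 ns | Pass | 9.85 | 0.15 | Fail
SDR50 | 100 MHz | 200 MHz | 2 | 2.5 ns | 0 ns | Pass | 1.25 | 8.75 | Pass
SDR50 | 100 MHz | 200 MHz | 2 | 2.5 ns | 3.6 ns | Pass | 4.85 | 5.15 | Pass
SDR50 | 100 MHz | 200 MHz | 2 | 2.5 ns | 7.5 ns | Pass | 8.75 | 1.25 | Pass
SDR50 | 100 MHz | 200 MHz | 2 | 3.75 ns | 0 ns | Pass | 5 | 5 | Pass
SDR50 | 100 MHz | 200 MHz | 2 | 3.75 ns | 3.6 ns | Pass | 8.6 | 1.4 | Pass
SDR50 | 100 MHz | 200 MHz | 2 | 3.75 ns | 7.5 ns | Pass | 2.5 | 7.5 | Pass
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 0 ns | 0 ns | Pass | 3.75 | 6.25 | Fail | All tests fail because the total delay is more than 10 ns: 1.4 ns + 7 ns + 2.35 ns = 10.75 ns, which is greater than the allowed delay of 10 ns
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 0 ns | 3.5 ns | Pass | 7.25 | 2.75 | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 0 ns | 7 ns | Fail | 0.75 | 9.25 | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 5 ns | 0 ns | Fail | 8.75 | 1.25 | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 5 ns | 3.5 ns | Pass | 2.25 | 7.75 | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 5 ns | 7 ns | Pass | 6.4 | 3.6 | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 10 ns | 0 ns | Fail | 3.75 | 6.25 | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 10 ns | 3.5 ns | Fail | 7.25 | 2.75 | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 10 ns | 7 ns | Hangs | - | - | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 15 ns | 0 ns | Hangs | - | - | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 15 ns | 3.5 ns | Fail | 2.25 | 7.75 | Fail
DDR50 (DAT) | 50 MHz | 50 MHz | 1 | 15 ns | 7 ns | Fail | 5.75 | 4.25 | Fail
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 0 ns | 0 ns | Pass | 3.75 | 16.25 | Pass
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 0 ns | 7 ns | Pass | 10.75 | 9.25 | Pass
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 0 ns | 14 ns | Pass | 17.75 | 2.25 | Pass
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 5 ns | 0 ns | Pass | 18.75 | 1.25 | Pass
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 5 ns | 7 ns | Pass | 5.75 | 14.25 | Pass
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 5 ns | 14 ns | Pass | 12.75 | 7.25 | Pass
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 10 ns | 0 ns | Pass | 13.75 | 6.25 | Fail
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 10 ns | 7 ns | Pass | 0.75 | 19.25 | Fail
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 10 ns | 14 ns | Pass | 7.75 | 12.25 | Fail
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 15 ns | 0 ns | Pass | 8.75 | 11.25 | Pass
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 15 ns | 7 ns | Pass | 15.75 | 4.25 | Pass
SDR25, DDR50 (CMD) | 50 MHz | 50 MHz | 1 | 15 ns | 14 ns | Pass | 2.75 | 17.25 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 0 ns | 0 ns | Pass | 3.75 | 36.25 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 0 ns | 7 ns | Pass | 10.75 | 29.25 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 0 ns | 14 ns | Pass | 17.75 | 22.25 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 5 ns | 0 ns | Pass | 38.75 | 1.25 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 5 ns | 7 ns | Pass | 5.75 | 34.25 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 5 ns | 14 ns | Pass | 12.75 | 27.25 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 10 ns | 0 ns | Pass | 33.75 | 6.25 | Fail
SDR12 | 25 MHz | 50 MHz | 2 | 10 ns | 7 ns | Pass | 0.75 | 39.25 | Fail
SDR12 | 25 MHz | 50 MHz | 2 | 10 ns | 14 ns | Pass | 7.75 | 32.25 | Fail
SDR12 | 25 MHz | 50 MHz | 2 | 15 ns | 0 ns | Pass | 8.75 | 31.25 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 15 ns | 7 ns | Pass | 15.75 | 24.25 | Pass
SDR12 | 25 MHz | 50 MHz | 2 | 15 ns | 14 ns | Pass | 22.75 | 17.25 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 0 ns | 0 ns | Pass | 3.75 | 2516.3 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 0 ns | 25 ns | Pass | 28.75 | 2491.3 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 0 ns | 50 ns | Pass | 53.75 | 2466.3 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 5 ns | 0 ns | Pass | 2518.8 | 1.25 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 5 ns | 25 ns | Pass | 23.75 | 2496.3 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 5 ns | 50 ns | Pass | 48.75 | 2471.3 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 10 ns | 0 ns | Pass | 2513.8 | 6.25 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 10 ns | 25 ns | Pass | 28.75 | 2491.3 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 10 ns | 50 ns | Pass | 43.75 | 2476.3 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 15 ns | 0 ns | Pass | 8.75 | 2511.3 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 15 ns | 25 ns | Pass | 33.75 | 2486.3 | Pass
Identification Mode | 400 kHz | 50 MHz | 125 | 15 ns | 50 ns | Pass | 58.75 | 2461.3 | Pass
Table 4: Simulation test results for the host controller input path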

Green Pass: hold time or setup time passes independently; this does not guarantee both
Orange: margins below 2 ns, not as good as the margins marked in green
Red: margins below 1 ns
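The available hold and setup times in Table 4 can be approximated the same way: data launched by the card returns tODLY + Delay_S after the cclk_in edge and is sampled at the cclk_in_sample delay, wrapped into one bit period. This reconstruction is an assumption, not Synopsys code; it matches the SDR25/DDR50 (CMD) and SDR12 entries and most SDR104 entries, but not every entry in the table (some SDR50 and Identification Mode rows differ), so treat it as a first-order estimate. The 1 ns threshold mirrors the Final Result criterion in the table header; the function names are hypothetical.

```python
DELAY_S_NS = 3.75  # Delay_O + Delay_I, from the simulation assumptions


def input_margins_ns(sample_delay_ns: float, t_odly_ns: float,
                     bit_period_ns: float,
                     delay_s_ns: float = DELAY_S_NS) -> tuple[float, float]:
    """Return (available hold, available setup) at the host, in ns.

    Card data arrives tODLY + Delay_S after the cclk_in edge and is
    sampled on cclk_in_sample, delayed by sample_delay_ns. Hold is the
    time from the sampling edge to the next data change (wrapped into one
    bit period); setup is the remainder.
    """
    arrival_ns = t_odly_ns + delay_s_ns
    hold = round((arrival_ns - sample_delay_ns) % bit_period_ns, 3)
    setup = round(bit_period_ns - hold, 3)
    return hold, setup


def has_1ns_margin(hold_ns: float, setup_ns: float) -> bool:
    """Timing part of Table 4's final-result criterion (> 1 ns each)."""
    return hold_ns > 1.0 and setup_ns > 1.0


# SDR25 / DDR50 (CMD): 20 ns bit period, card delays of 0, 7 and 14 ns.
for sample in (0.0, 5.0, 10.0, 15.0):
    margins = [input_margins_ns(sample, d, 20.0) for d in (0.0, 7.0, 14.0)]
    ok = all(has_1ns_margin(h, s) for h, s in margins)
    print(f"sample delay {sample:4.1f} ns -> {margins} {'Pass' if ok else 'Fail'}")
```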

The DDR50 (DAT) timing fails because the total delay exceeds 10 ns, the DDR50 bit period (data is transferred on both edges of the 20 ns, 50 MHz clock). The total is calculated as tODLY max + Delay_S = 7 + 2.35 + 1.4 = 10.75 ns.

The following can meet the DDR50 (DAT) timing requirements:

  1. The card delay on cdata_in is greater than 0.75 ns, with cclk_in_sample phase shifted by 90 degrees (5 ns).
  2. Within DWC_mobile_storage_ciu, delay cclk_in by 1.4 ns for all modules except DWC_mobile_storage_clkcntl.v. This eliminates the phase shift between cclk_out and cclk_in.
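The DDR50 (DAT) budget can be written as a one-line check against the 10 ns bit period. This is a minimal sketch with illustrative names; the second call assumes that option 2 removes the 1.4 ns cclk_in-to-cclk_out skew from the loop entirely, which is a reading of the text rather than a simulated result.

```python
DDR50_CLOCK_PERIOD_NS = 20.0                     # 50 MHz card clock
DDR50_BIT_PERIOD_NS = DDR50_CLOCK_PERIOD_NS / 2  # data on both edges: 10 ns
T_ODLY_DAT_NS = 7.0   # Table 2, DDR50 (DAT)
DELAY_I_NS = 2.35     # I/O pad + routing (typical)
DELAY_O_NS = 1.4      # cclk_in -> cclk_out (typical)


def ddr50_dat_budget(delay_o_ns: float) -> tuple[float, bool]:
    """Return (total turnaround in ns, whether it fits in one DDR50 bit period)."""
    total_ns = round(T_ODLY_DAT_NS + DELAY_I_NS + delay_o_ns, 3)
    return total_ns, total_ns <= DDR50_BIT_PERIOD_NS


print(ddr50_dat_budget(DELAY_O_NS))  # (10.75, False)
# Assumption: option 2 aligns the internal clock with cclk_out, removing Delay_O.
print(ddr50_dat_budget(0.0))         # (9.35, True)
```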
Summary
An external clock multiplexor structure is needed to meet timing requirements across different card operating modes described in the SD 3.0 specification. The phase shift values need to be determined after the actual place and route and then used for back annotation. The key is to keep the I/O pad delay to a minimum to reduce the stress on timing for the DWC_mobile_storage host controller and card interface.

References
  1. "Physical Layer Specification", Part 1, Version 3.01, Draft 0.82