Memory designers should incorporate long-channel devices wherever possible when building custom cache memory instances for CPUs and DSPs, as this reduces leakage power in these high-speed memories. The technique is also useful when building memories for GPUs.
By nature, GPUs are datapath-intensive designs that make heavy use of FIFOs. These FIFOs are built out of memories that have one read port and one write port. Traditionally, FIFOs were built using 8-transistor bitcells to support asynchronous clocks for the read and write ports. However, today’s GPU cores have a single clock across the whole SoC. Memory designers can take advantage of this single clock to reduce area and leakage power. Instead of using 8-transistor bitcells and supporting two asynchronous clocks in the two-port memory, designers can use 6-transistor bitcells and drive both the read and write ports from a single clock. Both the read and the write operation must then complete within a single memory clock cycle, which increases the minimum cycle time. Even with this longer cycle time, 6-transistor FIFOs can support a 400 MHz clock frequency while saving substantial area and leakage on each GPU core. In the example illustrated in Table 1, replacing a High-Density Two-Port Register File (HD 2P RF), which uses an 8-transistor bitcell, with an Ultra High-Density Two-Port Register File (UHD 2P RF), which uses a 6-transistor bitcell, results in an almost 50% reduction in area and a one-third reduction in leakage.
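As a rough illustration of the single-clock scheme described above, the following Python sketch models a one-read, one-write FIFO in which both operations are serviced within the same clock cycle. It is a behavioral sketch only, not a model of any particular memory compiler or bitcell. The names `SingleClockFifo`, `tick`, and `cycle_time_ns` are hypothetical, and the read-before-write ordering within a cycle is an assumption made for the example.

```python
from collections import deque


class SingleClockFifo:
    """Behavioral model of a one-read, one-write FIFO on a single clock."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = deque()

    def tick(self, write_data=None, read=False):
        """Advance one clock cycle and return the word read, or None."""
        read_data = None
        # Assumed ordering: the read is serviced before the write,
        # and both complete within this one cycle.
        if read and self.entries:
            read_data = self.entries.popleft()
        # In this sketch, a write to a full FIFO is silently dropped.
        if write_data is not None and len(self.entries) < self.depth:
            self.entries.append(write_data)
        return read_data


def cycle_time_ns(frequency_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / frequency_mhz
```

For the 400 MHz figure quoted above, `cycle_time_ns(400)` evaluates to 2.5, so the read and the write together must fit within a 2.5 ns cycle.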