The bank groups feature used in DDR4 SDRAMs was borrowed from GDDR5 graphics memories. To understand the need for bank groups, one must first understand the concept of DDR SDRAM prefetch. Prefetch describes how many words of data are fetched each time a column command is performed with DDR memories. Because the DRAM core is much slower than the interface, the difference is bridged by accessing data in parallel and then serializing it out through the interface. For example, DDR3 prefetches eight words: every read or write operation is performed on eight words of data, which burst out of, or into, the SDRAM over four clock cycles, using both clock edges, for a total of eight consecutive transfers. Put simply, with DDR3's prefetch of eight, the interface transfers data eight times faster than the DRAM core performs column accesses.
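The prefetch arithmetic can be made concrete with a small sketch. It assumes DDR3-1600 (1600 million transfers per second per pin) as an illustrative speed grade, and the class and property names are hypothetical. The key relationship is that the core needs only one column access per `prefetch` transfers at the pins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DdrTiming:
    """Hypothetical helper relating interface speed to DRAM core speed."""
    data_rate_mts: int  # transfers per second per pin, in millions (e.g. 1600)
    prefetch: int       # words fetched from the core per column command

    @property
    def io_clock_mhz(self) -> float:
        # Double data rate: one transfer per clock edge, two per clock cycle.
        return self.data_rate_mts / 2

    @property
    def core_column_rate_mhz(self) -> float:
        # Each column access supplies `prefetch` words, so the core completes
        # only one access per `prefetch` transfers at the pins.
        return self.data_rate_mts / self.prefetch

    @property
    def burst_clock_cycles(self) -> float:
        # A burst of `prefetch` words uses both clock edges: prefetch / 2 clocks.
        return self.prefetch / 2


ddr3_1600 = DdrTiming(data_rate_mts=1600, prefetch=8)
print(ddr3_1600.io_clock_mhz)          # 800.0
print(ddr3_1600.core_column_rate_mhz)  # 200.0
print(ddr3_1600.burst_clock_cycles)    # 4.0
```

Under these assumptions, an 800 MHz interface clock is fed by a core that performs column accesses at only 200 MHz, which is the eight-to-one ratio described above.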
The downside of prefetch is that it effectively sets the minimum burst length for the SDRAM. For example, it is very difficult to achieve an efficient burst length of four words with DDR3's prefetch of eight, because a four-word burst still consumes the time slot of an eight-word burst. The bank group feature allows designers to keep a smaller prefetch while gaining performance as if the prefetch were larger.
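The cost of a burst shorter than the prefetch can be estimated with a simplified model. The sketch below assumes that every column command occupies the data bus for one full prefetch-length slot and ignores all other DRAM timing constraints; the function name is hypothetical.

```python
def bus_efficiency(requested_words: int, prefetch: int) -> float:
    """Fraction of data-bus slots carrying useful data, assuming each column
    command occupies one full prefetch-length slot (simplified model)."""
    if requested_words <= 0:
        raise ValueError("requested_words must be positive")
    # Round up to whole column commands; each moves at most `prefetch` words.
    commands = -(-requested_words // prefetch)
    slots_used = commands * prefetch
    return requested_words / slots_used


print(bus_efficiency(8, 8))  # 1.0: a full eight-word burst uses every slot
print(bus_efficiency(4, 8))  # 0.5: a four-word burst wastes half the slot
```

In this model a four-word access with a prefetch of eight leaves half of the bus time idle, which is why a smaller effective burst is hard to use efficiently.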
Because the core speed of the DRAM does not change significantly from generation to generation, the prefetch has increased with every DDR generation to deliver higher speed at the SDRAM interface. Continuing the trend, however, would have required DDR4 to adopt a prefetch of sixteen. That change would have made the DRAM die much larger because of all the additional internal wiring needed to move sixteen words in parallel, and the resulting DRAMs would have been too expensive, so designers kept costs down by not moving to a prefetch of sixteen. More importantly, a sixteen-word prefetch would not match the 64-byte cache line common in today's computers. In a typical compute environment with a 64-bit interface (72-bit with ECC) and a 64-byte cache line, a prefetch of eight combined with a burst length of eight is a better match: eight transfers of 64 data bits deliver exactly 64 bytes, whereas sixteen transfers would deliver 128 bytes. Any such mismatch between cache line size and burst length can hurt the performance of embedded systems.
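The cache-line comparison can be checked with a short calculation. This sketch assumes a 72-bit interface made up of 64 data bits plus 8 ECC bits, as on typical ECC memory modules; the function name is hypothetical.

```python
CACHE_LINE_BYTES = 64  # cache line size cited in the text


def bytes_per_burst(bus_width_bits: int, burst_length: int, ecc_bits: int = 0) -> int:
    """Payload bytes delivered by one burst. ECC bits (e.g. the extra 8 bits
    of a 72-bit interface) are excluded because they carry no data."""
    data_bits = bus_width_bits - ecc_bits
    return data_bits * burst_length // 8


for bl in (8, 16):
    size = bytes_per_burst(72, bl, ecc_bits=8)
    print(bl, size, size == CACHE_LINE_BYTES)
# 8 64 True
# 16 128 False
```

A burst length of eight fills exactly one 64-byte cache line, while a burst length of sixteen would return two lines' worth of data per access.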