AI models use a significant amount of memory, adding cost to the silicon. Training neural networks can require gigabytes to tens of gigabytes of memory, creating demand for the highest-capacity DDR offerings available. As an example, VGG-16, an image-classification neural network, requires about 9 GB of memory to train. A more accurate model, VGG-512, requires about 89 GB of memory to train. To improve the accuracy of an AI model, data scientists use larger datasets, which again either lengthens training time or increases the memory requirements of the solution. Because of the massively parallel matrix multiplication involved, the size of the models, and the number of coefficients, external memories with high-bandwidth access are required. New semiconductor interface IP such as High Bandwidth Memory (HBM2) and future derivatives (HBM2e) are seeing rapid adoption to meet these needs. Advanced FinFET technologies that enable larger on-chip SRAM arrays, together with unique configurations with custom memory-to-processor and memory-to-memory interfaces, are being developed to better replicate the human brain and address these memory constraints.
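To see where figures like the 9 GB for VGG-16 come from, a rough back-of-the-envelope estimate is useful: training must hold the weights, their gradients, optimizer state, and the forward activations for a whole batch. The sketch below illustrates that accounting; the batch size, per-image activation footprint, and optimizer choice are illustrative assumptions, not figures from this article.

```python
# Rough estimate of training memory for a large CNN such as VGG-16.
# VGG-16 has ~138M parameters (public figure); everything else here
# (Adam optimizer, 110 MB of activations per image, batch of 64) is
# an assumed, illustrative workload.

def training_memory_gb(num_params,
                       bytes_per_value=4,            # float32
                       optimizer_states=2,           # e.g., Adam keeps two moments
                       activations_per_image_mb=110, # assumed forward-pass footprint
                       batch_size=64):
    # Weights + gradients + optimizer state stay resident throughout training.
    param_bytes = num_params * bytes_per_value * (2 + optimizer_states)
    # Activations are kept for the backward pass and scale with batch size.
    activation_bytes = activations_per_image_mb * 1e6 * batch_size
    return (param_bytes + activation_bytes) / 1e9

print(f"~{training_memory_gb(138_000_000):.1f} GB")  # lands in the range cited above
```

Under these assumptions the parameters and optimizer state account for only a couple of gigabytes; the batch activations dominate, which is why larger batches and larger models push designers toward HBM2-class bandwidth and capacity.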
AI models can be compressed. Compression is a required technique to ensure the models can operate within the constrained memory architectures of SoCs at the edge in mobile phones, automobiles, and IoT applications. It is accomplished using techniques called pruning and quantization without reducing the accuracy of the results. This enables traditional SoC architectures, featuring LPDDR or in some cases no external memory, to support neural networks; however, there are power consumption and other tradeoffs. As these models are compressed, memory accesses and compute intensities become more irregular, prolonging the execution time and latency of the systems. Therefore, system designers are developing innovative, heterogeneous memory architectures.
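The sketch below shows, in a minimal form, what the two compression steps mentioned above look like on a single layer's weight matrix: magnitude pruning zeroes the smallest weights, and uniform 8-bit quantization replaces 32-bit floats with integers plus a scale factor. The sparsity level and bit width are illustrative assumptions; production flows typically fine-tune the model after these steps to recover any lost accuracy.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the smallest `sparsity` fraction of weights by magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_uniform(weights, bits=8):
    """Map float32 weights to signed integers with a single scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.round(weights / scale).astype(np.int8)
    return q, scale  # store int8 values plus scale; dequantize as q * scale

w = np.random.randn(256, 256).astype(np.float32)  # stand-in for one layer's weights
w_pruned = prune_by_magnitude(w, sparsity=0.5)
q, scale = quantize_uniform(w_pruned)
print(f"float32 bytes: {w.nbytes}, int8 bytes: {q.nbytes}")  # 4x smaller, half the values zero
```

The pruned, quantized weights take a quarter of the storage and half of them are zero, which is what allows inference to fit in LPDDR or on-chip SRAM; the tradeoff is that the resulting sparse, low-precision access patterns are exactly the irregular memory and compute behavior noted above.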