Jamil Kawa, Synopsys Fellow, Synopsys
The rapid progress of deep machine learning (ML) and artificial intelligence (AI) is changing computing at every level: hardware architecture, software, chip manufacturing, and system packaging. Two major developments have opened the door to new machine learning techniques. First, vast amounts of data, i.e., "Big Data," are available for systems to process. Second, advanced GPU architectures now support parallelized, distributed computing. Together, these developments let designers adopt new techniques that rely on intensive computing and massive amounts of distributed memory to deliver powerful new compute capabilities.
Neuromorphic computing-based machine learning uses spiking neural networks (SNNs), deep neural networks (DNNs), and restricted Boltzmann machines (RBMs). Alongside Big Data, "Big Compute" is making use of statistically based high-dimensional computing (HDC). HDC operates on patterns and supports reasoning built on associative memory and continuous learning, mimicking the sequence in which humans learn and retain memories.
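The following minimal Python sketch makes pattern-based reasoning over associative memory more concrete. It shows one common HDC scheme and assumes dense binary hypervectors with XOR binding, majority-vote bundling, and Hamming-distance lookup. The dimension, the role and value names (COLOR, red, and so on), and the `ItemMemory` class are illustrative assumptions. They do not describe any particular hardware or product.

```python
import random

DIM = 10_000  # hypervector width; an illustrative choice, not a hardware parameter


def random_hv(rng: random.Random) -> list[int]:
    """A random dense binary hypervector (each bit 0 or 1 with equal probability)."""
    return [rng.getrandbits(1) for _ in range(DIM)]


def bind(a, b):
    """XOR binding associates two hypervectors; bind(bind(a, b), b) == a."""
    return [x ^ y for x, y in zip(a, b)]


def bundle(*hvs):
    """Bitwise majority vote: the result stays similar to every input (use an odd count)."""
    half = len(hvs) / 2
    return [1 if sum(bits) > half else 0 for bits in zip(*hvs)]


def hamming(a, b):
    """Normalized Hamming distance: about 0.5 for unrelated hypervectors."""
    return sum(x != y for x, y in zip(a, b)) / DIM


class ItemMemory:
    """Hypothetical associative (content-addressable) memory: returns the closest stored label."""

    def __init__(self):
        self.items = {}

    def store(self, label, hv):
        self.items[label] = hv

    def recall(self, query):
        return min(self.items, key=lambda label: hamming(self.items[label], query))


rng = random.Random(0)
memory = ItemMemory()
for name in ["COLOR", "SHAPE", "SIZE", "red", "blue", "circle", "square", "small", "large"]:
    memory.store(name, random_hv(rng))
hv = memory.items

# Encode one object {COLOR: red, SHAPE: circle, SIZE: small} as a single hypervector.
record = bundle(bind(hv["COLOR"], hv["red"]),
                bind(hv["SHAPE"], hv["circle"]),
                bind(hv["SIZE"], hv["small"]))

# Ask "what color?": unbinding yields a noisy copy of 'red', and the item memory cleans it up.
answer = memory.recall(bind(record, hv["COLOR"]))
print(answer)  # red
```

The query vector differs from `red` in roughly a quarter of its bits but from every other stored item in about half. Nearest-neighbor recall therefore tolerates the noise, which is the pattern-matching behavior that associative memories are meant to support.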
Emerging memories include compute-in-memory SRAMs, STT-MRAMs, SOT-MRAMs, ReRAMs, CB-RAMs, and PCMs. Each type is being developed to enable a transformation in AI computation. Together, they are advancing computational capability, energy efficiency, density, and cost.
System designers face several challenges when choosing the optimal computing architecture, and the associated combination of memories, for an ML/AI application. Designers rely on traditional embedded SRAMs, caches, and register files today. However, no single generic or exotic memory solution can satisfy the new AI workloads now in development. Machine learning is projected to consume the majority of the energy these systems use, so optimizing memories for machine learning is key to meeting power budgets. This has major implications for system design.
Designers balance the requirements of their designs as they determine which of the nine major challenges are most critical at a given time:
Each of these memory challenges can be addressed in multiple ways, because there is usually more than one way to reach the same objective. Each alternative has pros and cons, including scalability implications for architectural decisions.
For example, designers must choose between SRAM and a ReRAM array for compute-in-memory. The power and scalability implications of these two options are at opposite extremes. SRAM is the right choice under three conditions: the memory block is relatively small, execution speed must be high, and integrating the in-memory compute within a system-on-chip (SoC) is the most natural option. SRAM is, however, costly in both area and power, dynamic as well as leakage. On the other hand, the highly parallelized matrix multiplications typical of deep neural networks require huge amounts of memory, and ReRAM's density advantage makes it the better fit there.
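The density argument for ReRAM rests on analog matrix-vector multiplication in a resistive crossbar. Each cell's conductance stores a weight, and input voltages drive the rows. By Ohm's and Kirchhoff's laws, each column current is the dot product of the inputs with that column's weights, so a whole matrix-vector product completes in a single read. The Python sketch below is an idealized model built on assumptions. The conductance range, the 16 programmable levels, and the differential (G+, G-) cell pairs for signed weights are illustrative choices. Wire resistance, device variation, and ADC effects are ignored.

```python
# Idealized ReRAM crossbar model; all device numbers below are assumptions.
G_MIN, G_MAX = 1e-6, 100e-6  # conductance range in siemens (assumed)
LEVELS = 16                  # programmable conductance states per cell (assumed)
STEP = (G_MAX - G_MIN) / (LEVELS - 1)


def to_conductance(w, w_max):
    """Quantize |w| in [0, w_max] onto the nearest of LEVELS conductance states."""
    level = round(abs(w) / w_max * (LEVELS - 1))
    return G_MIN + level * STEP


def program_crossbar(weights):
    """Program weights[i][j] (input row i, output column j) into differential cell pairs.

    Positive weights go to the G+ cell and negative weights to the G- cell; the
    unused cell of each pair sits at G_MIN so the offsets cancel when subtracted.
    """
    w_max = max(abs(w) for row in weights for w in row)
    g_pos = [[to_conductance(w, w_max) if w > 0 else G_MIN for w in row] for row in weights]
    g_neg = [[to_conductance(w, w_max) if w < 0 else G_MIN for w in row] for row in weights]
    return g_pos, g_neg, w_max


def crossbar_mvm(g_pos, g_neg, w_max, voltages):
    """Column currents I_j = sum_i V_i * G_ij, rescaled from current back to weight units."""
    rows, cols = len(g_pos), len(g_pos[0])
    out = []
    for j in range(cols):  # in hardware, all columns are sensed in parallel
        i_pos = sum(voltages[i] * g_pos[i][j] for i in range(rows))
        i_neg = sum(voltages[i] * g_neg[i][j] for i in range(rows))
        out.append((i_pos - i_neg) / STEP * w_max / (LEVELS - 1))
    return out


weights = [[0.5, -1.0],
           [0.25, 0.75],
           [-0.5, 0.0]]   # 3 inputs x 2 outputs
x = [1.0, 0.5, 0.2]       # input activations applied as row voltages

analog = crossbar_mvm(*program_crossbar(weights), x)
exact = [sum(x[i] * weights[i][j] for i in range(3)) for j in range(2)]
print([round(v, 3) for v in analog], [round(v, 3) for v in exact])
```

In this idealized model, the analog result differs from the exact digital product only by conductance quantization. That error is at most half a level step per weight, scaled by the input magnitudes. A physical array would add the non-idealities the sketch leaves out.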
Multi-port SRAMs play a unique role in compute-in-memory architectures. Boolean logic functions take multiple inputs, so they require the ability to read data from several addressable locations simultaneously and write the results back to the desired memory locations. Multi-port SRAMs and register files offer precisely that flexibility. Multi-port SRAMs can also be used to build the register files that GPUs rely on for efficient multi-threading.
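The benefit of extra ports shows up directly in cycle counts. The hypothetical Python model below does not describe any real memory macro. In it, a memory can perform up to `ports` reads or writes per clock cycle, and each Boolean operation needs two operand reads and one write-back. The `MultiPortMemory` class, the greedy `run_ops` scheduler, and the assumption that no job reads a location written by another job are all illustrative.

```python
import operator


class MultiPortMemory:
    """Hypothetical memory model: up to `ports` accesses (reads or writes) per cycle."""

    def __init__(self, depth, ports):
        self.data = [0] * depth
        self.ports = ports


def run_ops(mem, op, jobs):
    """Compute mem[dst] = op(mem[a], mem[b]) for each (a, b, dst) job; return cycles used.

    Greedy schedule: each cycle spends ports on pending write-backs first, then on
    operand reads. Assumes no job reads a location written by another job.
    """
    jobs = list(jobs)
    writes = []   # finished results waiting for a port: (dst, value)
    fetched = []  # operands already read for the job at jobs[0]
    cycles = 0
    while jobs or writes:
        cycles += 1
        free = mem.ports
        while writes and free:           # write back earlier results
            dst, value = writes.pop(0)
            mem.data[dst] = value
            free -= 1
        ready = []                       # results computed this cycle
        while jobs and free:             # fetch operands with the remaining ports
            a, b, dst = jobs[0]
            fetched.append(mem.data[(a, b)[len(fetched)]])
            free -= 1
            if len(fetched) == 2:
                ready.append((dst, op(*fetched)))
                fetched = []
                jobs.pop(0)
        writes.extend(ready)             # stored next cycle at the earliest
    return cycles


def demo(ports):
    mem = MultiPortMemory(depth=16, ports=ports)
    mem.data[0:8] = [0b1100, 0b1010, 0b1111, 0b0001, 0b0110, 0b0011, 0b1000, 0b1001]
    jobs = [(0, 1, 8), (2, 3, 9), (4, 5, 10), (6, 7, 11)]  # dst = src_a XOR src_b
    cycles = run_ops(mem, operator.xor, jobs)
    return cycles, mem.data[8:12]


print(demo(ports=1))  # (12, [6, 14, 5, 1])
print(demo(ports=3))  # (5, [6, 14, 5, 1])
```

With a single port, each XOR costs three cycles. With three ports, the write-back of one job overlaps the operand reads of the next, so throughput approaches one Boolean operation per cycle. The computed results are identical in both cases.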
Table 1: Comparison between emerging memories for neuromorphic computing, showing that no single memory type can be the "perfect" memory for all AI chips, but each has its own advantages
No list of memories for neuromorphic computing is complete without classical SRAM. SRAMs and register files remain the backbone of AI/ML architectures for neuromorphic computing, offering the lowest latency of any memory category. However, the overriding goal of maximizing the TOPS/W metric for AI applications on von Neumann architectures calls for parallelism. Multi-port memories can provide that parallelism, with the configuration flexibility to accommodate both compute-in-memory and near-memory computing. Synopsys actively supports research in compute-in-memory while supporting near-memory computing as the most energy-efficient yet versatile form of computing.
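Because TOPS/W divides throughput by power, it is equivalent to an energy budget per operation, and every memory access draws against that budget. The identity below uses N_ops for the operations completed in time t at average power P. These symbols are introduced here for illustration.

```latex
\[
\frac{\text{TOPS}}{\text{W}}
= \frac{N_{\text{ops}} / (10^{12}\, t)}{P}
= \frac{N_{\text{ops}}}{10^{12}\, E},
\qquad E = P\,t
\]
\[
1~\text{TOPS/W} = 10^{12}~\text{operations per joule}
\;\Longleftrightarrow\; 1~\text{pJ per operation},
\qquad
10~\text{TOPS/W} \;\Longleftrightarrow\; 100~\text{fJ per operation}
\]
```

Reading the metric as picojoules per operation makes it straightforward to compare against the energy each memory access and data transfer consumes.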
The era of Big Data and Big Compute is here. According to OpenAI, the compute demanded by deep learning has been doubling every three months for the last eight years. Neuromorphic computing with deep neural networks is driving AI growth. However, it depends heavily on compact, energy-efficient non-volatile memories whose varied features suit different situations, including STT-MRAMs, SOT-MRAMs, ReRAMs, CB-RAMs, and PCMs. Neuromorphic computing relies on new architectures, new memory technologies, and processing that is more efficient than today's architectures. It also requires compute-in-memory and near-memory computing, as well as expertise in memory yield, test, reliability, and implementation.
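Taken at face value, a three-month doubling period compounds quickly. Writing C(t) for compute demand and T_d for the doubling period (notation introduced here), one year of growth works out as follows:

```latex
\[
C(t) = C_0 \, 2^{t/T_d}, \qquad T_d = 3~\text{months}
\;\Rightarrow\;
\frac{C(t + 12~\text{months})}{C(t)} = 2^{12/3} = 2^{4} = 16
\]
```

In other words, compute demand grows roughly sixteenfold per year at that rate.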