Wireless mobile communication standards (cellular and Wi-Fi) drove up computational complexity. 3.9G modems, with multiple antennas, multiple-input multiple-output (MIMO) processing, channel aggregation, and estimation algorithms, needed more software programmability to support greater functionality on a mobile baseband chip. In parallel, compiler technology evolved to recognize DSP vector data types, match them to loop strides, and perform basic auto-vectorization of inner loops.
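As a minimal sketch of the kind of inner loop such compilers target, consider the C function below. The name `mul_add_i16` and its parameters are hypothetical and chosen only for illustration. The loop uses unit-stride accesses, and the `restrict` qualifiers promise that the arrays do not overlap, which is typically what an auto-vectorizer needs to map iterations onto SIMD lanes. Whether a given compiler actually vectorizes it depends on the target and the optimization flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any vendor library):
 * out[i] = a[i] * gain + b[i], with 16-bit inputs widened to 32 bits.
 * Unit stride and non-overlapping (restrict) arrays make the loop a
 * straightforward candidate for auto-vectorization. */
void mul_add_i16(int32_t *restrict out,
                 const int16_t *restrict a,
                 const int16_t *restrict b,
                 int16_t gain,
                 size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying so the product cannot overflow. */
        out[i] = (int32_t)a[i] * gain + b[i];
    }
}
```

Each iteration is independent of the others, so a vectorizing compiler is free to process several consecutive elements per instruction.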
As performance requirements increased, DSPs’ SIMD width increased to 16, 32, and even 64 MACs/cycle for cellular mobile and infrastructure applications. In addition, more customized instruction set architectures (ISAs) added instructions for DSP filter acceleration and matrix computation, along with new addressing modes, to improve performance further.
A dual load/store architecture with wider load and/or store units was used to sustain a higher throughput of complex vector data into the MAC units. Alongside this, register files were expanded with larger numbers of dedicated vector data register files to balance the architecture against the computation throughput and to minimize internal register pressure.
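To illustrate why two load paths matter, the following sketch shows a scalar reference version of a complex multiply-accumulate loop. The types `cplx16_t` and `cplx_acc_t` and the function `cplx_dot` are assumptions made for this example, not part of any particular DSP toolchain. Every iteration reads two complex operands and performs four real multiplies, so the MAC units stay busy only if both operands arrive each cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration: 16-bit complex samples and a
 * wide accumulator that will not overflow for realistic lengths. */
typedef struct { int16_t re, im; } cplx16_t;
typedef struct { int64_t re, im; } cplx_acc_t;

/* Complex dot product: acc += x[i] * h[i] for i in [0, n). */
cplx_acc_t cplx_dot(const cplx16_t *x, const cplx16_t *h, size_t n)
{
    cplx_acc_t acc = {0, 0};
    for (size_t i = 0; i < n; i++) {
        /* Two operand loads per iteration: x[i] and h[i]. */
        int64_t xr = x[i].re, xi = x[i].im;
        int64_t hr = h[i].re, hi = h[i].im;
        /* One complex MAC = four real multiplies and four real adds. */
        acc.re += xr * hr - xi * hi;
        acc.im += xr * hi + xi * hr;
    }
    return acc;
}
```

On a vector DSP the same loop would process many complex samples per instruction, which multiplies both the load bandwidth and the number of live registers needed.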
The VLIW architecture also expanded to support higher computation throughput. For example, to run at peak performance, an FFT function needs to issue two loads, an execution operation, and a store together (Load, Load, Execution, Store), hence a 4-issue VLIW for DSP vector operations.
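The mix of operations can be seen in a radix-2 butterfly, the core step of many FFT implementations. The sketch below is a plain scalar version; `cplx_t` and `butterfly_pass` are hypothetical names, and the slot labels in the comments only indicate which kind of VLIW slot each operation would occupy. They are not a cycle-accurate schedule, since a real kernel also fetches twiddle factors and is software-pipelined by the compiler.

```c
#include <stddef.h>

/* Hypothetical complex type for illustration. */
typedef struct { float re, im; } cplx_t;

/* One pass of radix-2 butterflies:
 *   top[i] = a[i] + w[i] * b[i]
 *   bot[i] = a[i] - w[i] * b[i]
 */
void butterfly_pass(cplx_t *restrict top, cplx_t *restrict bot,
                    const cplx_t *restrict a, const cplx_t *restrict b,
                    const cplx_t *restrict w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        cplx_t va = a[i];              /* load slot */
        cplx_t vb = b[i];              /* load slot */
        cplx_t vw = w[i];              /* twiddle factor, also a load */

        /* execution slot: complex multiply, then add and subtract */
        cplx_t t;
        t.re = vb.re * vw.re - vb.im * vw.im;
        t.im = vb.re * vw.im + vb.im * vw.re;

        top[i].re = va.re + t.re;      /* store slot */
        top[i].im = va.im + t.im;
        bot[i].re = va.re - t.re;      /* store slot */
        bot[i].im = va.im - t.im;
    }
}
```

With loads, arithmetic, and stores all present in every iteration, a machine that can issue only one of them at a time would leave the vector MAC units idle for much of the loop.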
These DSPs integrated vector and scalar engines, which can execute in parallel. Some architectures use VLIW schemes that split DSP vector operations from scalar operations, resulting in long instruction words. Other DSPs merge the vector and scalar operations within the VLIW, resulting in shorter instruction words but more complex instruction decoding. On these DSPs, the control and DSP functions are pre-mapped and scheduled by the compiler into VLIW instructions.
As computational complexity increased with the LTE category number, modem systems shifted to using multiple cores to address the varying processing requirements within the system. The processors had different architectures and performance capabilities depending on the functional component of the system in which they were used. For example, the front end of the LTE modem requires complex vector data computation for DSP functions. Separately, soft-bit domain processing operates on scalar 16-bit data, which suits a different type of architecture/ISA and is better served by smaller or task-specific processors.
With 4G (LTE-Advanced) modems, computational complexity increased about 10x compared to 3G modems. To support this, DSPs were further optimized, with additional acceleration of wireless communications algorithms through customized instructions, either as part of the base ISA or as extendable options. The addition of floating-point support, used for infrastructure MIMO computation algorithms, was another key advancement in DSP technology.