Figure 1. Different types of processing in machine learning inference applications
Memory requirements for low/mid-end machine learning inference are typically modest, thanks to limited input data rates and the use of neural networks of limited complexity. Input and output maps are often of limited size, i.e. a few tens of kilobytes or less, and the number and size of the weight kernels are also relatively small. Using the smallest possible data types for input maps, output maps, and weight kernels further reduces memory requirements.
Low/mid-end machine learning inference applications require the following types of processing:
The different types of processing may be implemented using a heterogeneous multi-processor architecture, with different processor types matched to the different processing requirements. However, for low/mid-end machine learning inference, the total compute requirement is often limited and can be handled by a single processor running at a reasonable frequency, provided it has the right capabilities. Using a single processor eliminates the area and communication overheads associated with multi-processor architectures. It also simplifies software development, as a single toolchain can be used for the complete application. It does, however, require that the processor perform all of these types of processing, i.e. DSP, neural network processing, and control processing, with excellent cycle efficiency.
Figure 2. Two types of vector MAC instructions of the DesignWare ARC EM9D processor
Figure 3. ARC EM9D processor with XY memory and address generations units (AGUs)