DesignWare® ARC® NPX Neural Processor IP family provides a high-performance, power- and area-efficient IP solution for applications requiring AI-enabled SoCs. The ARC NPX6 NPU IP is designed for broad deep learning algorithm coverage, including computer vision tasks such as object detection, image quality improvement, and scene segmentation, as well as broader AI applications such as audio and natural language processing.
The NPX6 NPU family offers multiple products to meet your specific application requirements. The architecture is based on individual cores that scale from 4K MACs to 96K MACs, delivering single-engine performance of over 250 TOPS (over 440 TOPS with sparsity). The NPX6 NPU IP includes hardware and software support for multi-NPU clusters of up to 8 NPUs, achieving 3,500 TOPS with sparsity. Advanced bandwidth features in hardware and software, together with a memory hierarchy (including L1 memory in each core and a high-performance, low-latency interconnect to a shared L2 memory), make scaling to a high MAC count possible. An optional tensor floating point unit is available for applications benefiting from BF16 or FP16 inside the neural network computations.
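The relationship between MAC count and the quoted TOPS figures can be sketched with simple arithmetic. The clock frequency below is an assumption for illustration (the text does not state one); each MAC counts as two operations (multiply plus accumulate):

```python
# Rough peak-throughput arithmetic for the NPX6 MAC configurations.
# The 1.3 GHz clock is an ASSUMED value for illustration only; each
# MAC contributes 2 ops/cycle (one multiply, one accumulate).

def peak_tops(macs_per_cycle: int, clock_ghz: float) -> float:
    """Peak throughput in TOPS: MACs/cycle x 2 ops x cycles/second / 1e12."""
    return macs_per_cycle * 2 * clock_ghz * 1e9 / 1e12

# Largest single engine: 96K MACs = 96 * 1024 = 98,304 MACs/cycle.
single_engine = peak_tops(96 * 1024, 1.3)   # roughly 256 TOPS, consistent
                                            # with the "over 250 TOPS" claim

# An 8-NPU cluster scales the dense figure roughly linearly; the quoted
# 3,500 TOPS additionally relies on sparsity acceleration.
cluster_dense = 8 * single_engine
```

This is only back-of-envelope math; achievable throughput depends on the implemented clock, utilization, and sparsity of the workload.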
To speed application software development, the ARC NPX6 NPU Processor IP is supported by the MetaWare MX Development Toolkit, a comprehensive software programming environment that includes a neural network Software Development Kit (NN SDK) and support for virtual models. The NN SDK automatically converts neural networks trained in popular frameworks, such as PyTorch or TensorFlow, or exchanged via ONNX, into optimized executable code for the NPX hardware.
The NPX6 NPU Processor IP can be used to create a range of products – from a few TOPS to 1000s of TOPS – that can be programmed with a single toolchain.
DesignWare ARC NPX6 NPU Family for AI / Neural Processing
- Scalable real-time AI / neural processor IP with up to 3,500 TOPS performance
- Supports CNNs, RNNs/LSTMs, transformers, recommender networks, etc.
- Industry leading power efficiency (up to 30 TOPS/W)
- Scales from 1 to 24 cores, each with an enhanced 4K-MAC convolution accelerator
- Tensor accelerator providing flexible activation and support of Tensor Operator Set Architecture (TOSA)
- Software Development Kit
- Automatic mixed mode quantization tools
- Bandwidth reduction through architecture and software tool features
- Latency reduction through parallel processing of individual layers
- Seamless integration with DesignWare ARC VPX vector DSPs
- High-productivity MetaWare MX Development Toolkit supports TensorFlow and PyTorch frameworks and the ONNX exchange format
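The automatic mixed-mode quantization noted above maps floating-point tensors to integer formats for efficient inference. As a minimal sketch of the underlying idea (not the MetaWare NN SDK's actual API, which is proprietary), a symmetric per-tensor INT8 quantizer looks like this:

```python
# Illustrative sketch of symmetric per-tensor INT8 quantization -- the
# basic idea behind post-training quantization tools. This is NOT the
# MetaWare NN SDK API; names and details here are hypothetical.

def quantize_int8(values):
    """Map floats to int8 using a scale derived from the per-tensor max."""
    scale = max(abs(v) for v in values) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)  # within one quantization step of the originals
```

Mixed-mode tools extend this idea by keeping accuracy-sensitive layers in higher precision (e.g., 16-bit or floating point) while quantizing the rest.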
Available configurations:

- Enhanced Neural Processing Unit providing 4,096 MACs/cycle of performance for AI applications
- Enhanced Neural Processing Unit providing 8,192 MACs/cycle of performance for AI applications
- Enhanced Neural Processing Unit providing 16,384 MACs/cycle of performance for AI applications
- Enhanced Neural Processing Unit providing 32,768 MACs/cycle of performance for AI applications
- Enhanced Neural Processing Unit providing 65,536 MACs/cycle of performance for AI applications
- Enhanced Neural Processing Unit providing 98,304 MACs/cycle of performance for AI applications
- Optional extension of NPX6 NPU tensor operations to include floating-point support with BF16 or BF16+FP16