DPU PROCESSORS
Kalray's MPPA® DPU Processors
A New Class of Processors, Specialized in Intelligent Data Processing, for Infrastructure, Compute and AI Acceleration
Coolidge™ is the third generation of Kalray’s MPPA® DPU (Data Processing Unit) processors. It natively runs multiple workloads in parallel without bottlenecks, enabling smarter, more efficient, energy-saving data-intensive applications.
Taking full advantage of Kalray’s patented MPPA® (Massively Parallel Processor Array) architecture, Coolidge™ is a scalable 80-core processor designed for intelligent data processing. It offers a unique alternative to GPUs, ASICs or FPGAs, bringing value to applications ranging from data centers to the edge and embedded systems.

Key Benefits of MPPA® DPU Processor
- High Performance Computing: Performance scalability within die, die-to-die, many dies.
- Heterogeneous Multi-processing: Parallel execution of dozens of heterogeneous critical tasks, including AI inference.
- Easy Programming: C/C++ / OpenCL™ / Linux / POSIX / RTOS
- Real-time Data Processing: High-speed I/O, RDMA-type architecture
- Power Efficiency
- Security/Safety: Determinism, Freedom from interference, Secure boot
Use Cases
USE CASE

Develop Next Gen Storage and Networking Systems
Flexible integration in state-of-the-art PCIe Gen4, 100GbE appliances:
- JBOF Target controller, I/O controller, SmartNIC, SmartSSD use cases
- 2 configurations: stand-alone or x86 CPU offloading
- Support virtualized, containerized or bare metal infrastructures
- Dynamic resource allocation for Control, Data & Management Planes
Acceleration of high-performance protocols, services and QoS:
- NVMe-oF, RoCE/RDMA, TCP/IP, NVMe, OVS/NFV protocols
- Smart Load-Balancer, Priority Flow Control, Stateless L1-L4 parsing
- RAID6: 154 Gbit/s Erasure Coding (Reed-Solomon) per cluster
- Line-rate encryption/decryption/hash (IPsec, TLS, XTS, MACsec)
- AI capability for analytics and adaptive configuration
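To make the RAID6 erasure-coding item above more concrete, here is a minimal sketch of the P/Q parity scheme commonly used for RAID 6, written in plain Python. It is not Kalray's implementation and says nothing about how the MPPA® hardware accelerates it. The field polynomial (0x11D), the generator (2) and all function names are assumptions chosen for illustration.

```python
# Illustrative RAID-6 style P/Q parity over GF(2^8).
# Assumptions (not from the source): polynomial 0x11D, generator 2,
# and every function name below.

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # reduce modulo x^8 + x^4 + x^3 + x^2 + 1
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(base: int, exp: int) -> int:
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(exp):
        result = gf_mul(result, base)
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse: a^254 equals a^-1 for non-zero a."""
    return gf_pow(a, 254)

def pq_parity(blocks):
    """Return (P, Q) for a list of equal-length data blocks."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)               # per-block coefficient g^i
        for j, byte in enumerate(block):
            p[j] ^= byte                   # P: plain XOR parity
            q[j] ^= gf_mul(coeff, byte)    # Q: weighted XOR parity
    return bytes(p), bytes(q)

def recover_two(blocks, x, y, p, q):
    """Rebuild lost data blocks x and y (x != y) from survivors, P and Q."""
    size = len(p)
    survivors = [bytes(size) if i in (x, y) else b
                 for i, b in enumerate(blocks)]
    p_s, q_s = pq_parity(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(size), bytearray(size)
    for j in range(size):
        pd = p[j] ^ p_s[j]                 # equals Dx ^ Dy
        qd = q[j] ^ q_s[j]                 # equals gx*Dx ^ gy*Dy
        dx[j] = gf_mul(gf_mul(gy, pd) ^ qd, denom_inv)
        dy[j] = pd ^ dx[j]
    return bytes(dx), bytes(dy)

if __name__ == "__main__":
    data = [b"kalray!!", b"mppa-dpu", b"coolidge", b"raid6-rs"]
    p, q = pq_parity(data)
    damaged = [data[0], None, data[2], None]   # blocks 1 and 3 are lost
    print(recover_two(damaged, 1, 3, p, q))
```

With two parity blocks, any two lost data blocks can be rebuilt. The byte-at-a-time loops here are only for readability; a real implementation would use table lookups or vector instructions.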
USE CASE

Build Accelerated Compute-Intensive Applications
Acceleration of complex workloads:
- Patented core + co-processor boosting Machine Learning Inference
- Computer Vision
- Signal Processing (e.g. FFT), Cryptography, Mathematics
Build stand-alone intelligent embedded systems:
- Multi-OS (Linux, RTOS) systems
- Support “Freedom from Interference” for mixed criticality
Build next gen Edge Computing Systems:
- Process Data at the Intelligent Edge
- Real-time analytics for automation, prediction, and control
- Easy integration into existing systems
USE CASE

With MPPA® DPU Processors, the Possibilities Are Endless, Allowing You to Innovate Without Borders
Powered by 80 cores, the MPPA® processor is a new generation of intelligent processor with unique capabilities in terms of programmability, performance, parallel execution of multiple critical tasks, energy efficiency, safety and security. Our breakthrough MPPA® technology is paving the way to a new data processing era.
It is the kind of intelligent processor that gives you the power to do more: more to propel fast-developing sectors, from 5G telecom networks and autonomous vehicles to healthcare equipment, Industry 4.0, drones, robots and more!
Technical Corner / Key Features
CORE
64-bit/32-bit architecture |
From 600MHz to 1.2 GHz |
6-issue VLIW |
16KB instruction cache / 16KB data cache with MMU |
IEEE 754-2008 Floating Point Unit (FPU) |
Square root and reciprocal operations in floating single precision |
64-bit integer multiplication (Asymmetric cryptography) |
Up to 4 execution rings |
Up to 256-bits per cycle Load/Store |
CO-PROCESSOR (ONE PER CORE)
Acceleration of INT8, INT16 or FP16 precision |
Up to 128 MAC per cycle |
CLUSTER
16 Application Cores + 1 Management/Security Core |
4 MB of Memory / L2 Cache – 600GB/s Low Latency / High Speed |
Configurable cluster/chip cache coherency & deterministic modes |
SYSTEM-ON-CHIP
5 clusters (total of 80 Application Cores + 5 Management Cores) |
Up to 1.15 TFLOPS (SP) / 384 GFLOPS (DP) |
Up to 3 TFLOPS (16-bit) / 25 TOPS (8-bit) for deep learning |
56GB/s chip-to-chip communications (16 +12.5) x 2 |
16-lane PCIe GEN4 Endpoint (EP) or Root Complex (RC) |
Bifurcation up to 8 downstream ports in RC mode |
SR-IOV up to 8 Physical Functions / 248 Virtual Functions |
Address translation and protection |
Up to 2048 MSI-X & 64 MSI interrupts |
Support for Hot Plug |
Up to 512 DMAs for multi queues / kernel bypass drivers |
Direct PCIe-to-clusters and PCIe-to-DDR transfers |
Support for NVMe and VIRTIO emulation |
64-bit DDR4/LPDDR4-3200 channels with sideband/inline ECC |
Up to two ranks per DDR4 Channel |
2 DDR channels (up to 32GB) with channel interleaving |
8×1/8×10/8×25/2×40/4×50/2×100 GbE |
RDMA over Converged Ethernet (RoCE v1 and v2) |
Jumbo Frame Support (9.6KB) |
Support for PTP/IEEE 1588v2 |
Priority Flow Control (PFC), IEEE 802.1Qbb |
Checksum offload Header & Payload |
Line rate packet classification/smart load balancing |
Hash & Round-robin based dispatch policy |
Secure Boot with authentication & encryption |
True Random Number Generators (TRNG) |
RSA, Diffie-Hellman, DSA, ECC, EC-DSA and EC-DH acceleration |
AES-128/192/256 (ECB/CBC/ICM/CTR/GCM/GMAC/CCM) |
AES-XTS for storage application |
MD5/SHA-1, SHA-2, SHA-3 |
KASUMI/SNOW 3G, ZUC |
GPIOs/UARTs/SPI/I2C/CAN/PWM |
SSI Controller for serial NOR Flash with optional boot |
SDCARD UHS-I / eMMC 4.51 memory controller |
2x USB 2.0 OTG ULPI |
JTAG IEEE 1149.1 |
16-bit Parallel Trace Interface |
Mixed-criticality support |
Lockable critical configuration |
Capability to bank memory and caches for non-interference & time-predictable execution |
L1 Cache coherency enabling/disabling |
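The peak-throughput figures in the feature list above can be sanity-checked with simple arithmetic. The sketch below assumes that the peaks are quoted at the maximum 1.2 GHz clock across all 80 application cores, that one multiply-accumulate (MAC) counts as two operations, and that the 128 MAC per cycle figure refers to 8-bit operations. None of these conventions is stated in the source, and the names are illustrative.

```python
# Back-of-the-envelope check of the datasheet peak figures.
# Assumptions (not stated in the source): peaks are quoted at 1.2 GHz,
# over the 80 application cores, with 1 MAC counted as 2 operations.

CORES = 80
CLOCK_HZ = 1.2e9
MACS_PER_CYCLE_8BIT = 128   # co-processor figure, assumed to be 8-bit
OPS_PER_MAC = 2             # one multiply plus one add

def peak_ops(cores: int, clock_hz: float, ops_per_cycle: float) -> float:
    """Peak operations per second for the whole chip."""
    return cores * clock_hz * ops_per_cycle

def implied_ops_per_cycle(peak: float, cores: int, clock_hz: float) -> float:
    """Operations per core per cycle implied by a quoted chip-level peak."""
    return peak / (cores * clock_hz)

if __name__ == "__main__":
    tops_8bit = peak_ops(CORES, CLOCK_HZ,
                         MACS_PER_CYCLE_8BIT * OPS_PER_MAC) / 1e12
    print(f"8-bit peak: {tops_8bit:.3f} TOPS")   # 24.576, quoted as 25

    for label, quoted in [("SP", 1.15e12), ("DP", 384e9), ("16-bit", 3e12)]:
        per_cycle = implied_ops_per_cycle(quoted, CORES, CLOCK_HZ)
        print(f"{label}: about {per_cycle:.2f} ops per core per cycle")
```

Under these assumptions the 8-bit figure works out to about 24.6 TOPS, consistent with the quoted "up to 25 TOPS", and the floating-point peaks correspond to roughly 12 (SP), 4 (DP) and 31 (16-bit) operations per core per cycle.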
Get Started Now!
Want to learn more about the MPPA® DPU?
Related Content
Software development environment for developing applications using open coding standards on Kalray's processors.
Kalray's programmable, low-power PCIe card that can be used in acceleration or standalone mode.
Fully programmable cards that bring the benefits of the MPPA® DPU technology to data centers for higher performance & more flexible solutions.