MPPA® DPU ARCHITECTURE
A MASSIVELY PARALLEL PROCESSOR ARRAY ARCHITECTURE
Kalray MPPA® DPU Manycore
The world is facing an explosion of data which current technologies were not initially designed for and cannot always handle efficiently. Industry needs a new type of processor. Enter the era of Intelligent DPU (Data Processing Unit) Processors.

The overall architecture of Kalray's third-generation MPPA® DPU (Data Processing Unit) processor, also known as Coolidge™, is a “Massively Parallel Processor Array”: a set of computing clusters connected to each other, to the external memory, and to the I/O interfaces via two independent interconnects.
The MPPA® DPU’s two interconnects are suited to different types of data transfers. The first is an AXI fabric bus grid, used for read/write accesses from cores to memories and peripherals connected by PCIe. The second is an RDMA NoC (Network-on-Chip), which supports data transfers to or from the Ethernet network interfaces and connects all clusters together.
The robust partitioning necessary for safe operation of the processor is carried out at the granularity of the computing cluster. It relies on the configuration of memory management units (MMUs) and memory protection units (MPUs), and on selectively deactivating network-on-chip links.
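The role of link deactivation in partitioning can be illustrated with a toy model. The sketch below is not Kalray's configuration interface: the ring topology, the cluster numbering, and the `reachable` helper are all hypothetical. The only detail taken from this page is the cluster count, since 80 cores at 16 PE cores per cluster gives five clusters. The sketch shows that once the links crossing a partition boundary are disabled, no cluster in one partition can reach a cluster in the other over the NoC.

```python
from collections import deque

def reachable(links, start):
    """Return the set of clusters reachable from `start` over enabled links."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for a, b in links:
            # Links are treated as bidirectional in this toy model.
            if a == node and b not in seen:
                seen.add(b)
                queue.append(b)
            elif b == node and a not in seen:
                seen.add(a)
                queue.append(a)
    return seen

# Hypothetical ring of five clusters; the real NoC topology is not described here.
all_links = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)}

# Deactivate the two links that cross the boundary between {0, 1} and {2, 3, 4}.
disabled = {(1, 2), (4, 0)}
enabled = all_links - disabled

partition_a = reachable(enabled, 0)
partition_b = reachable(enabled, 2)
print(partition_a, partition_b)
```

In this model the two partitions come out as {0, 1} and {2, 3, 4} with no cluster in common. On the real device, the MMU and MPU configuration would additionally restrict what each partition can access through the memory system.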

MPPA® DPU Unique Architecture
(Data Processing Unit)
Each Coolidge™ DPU (Data Processing Unit) processor cluster is partitioned into a secure area and a user application area. The secure area includes a core (RM core) dedicated to security and safety functions, associated with an isolated memory bank.
The user application area brings together 16 processing cores (PE cores) and a data movement engine (DMA), connected to a 4 MB local memory called SMEM, which is composed of 16 banks.
The 16 processing cores operate in one of two modes:
- An SMP (“Symmetric Multi-Processing”) mode intended for high-performance applications, where the PE cores behave like a single multicore CPU.
- An AMP (“Asymmetric Multi-Processing”) mode intended for real-time applications, where the PE cores behave like sixteen independent single-core CPUs.
The cores used by the Coolidge™ DPU processor all implement the same VLIW (“Very Long Instruction Word”) architecture. VLIW architectures are used in embedded processors for signal processing, where predictability matters, and they also offer increased resilience to Meltdown and Spectre security attacks. On a VLIW core, the exploitable parallelism between instructions is detected by the compiler and then made explicit in the binary code by marking the packets of instructions that can be executed in parallel. This yields cores with precise timing behavior that are more compact for a given processing capacity, which in turn allows a greater number of cores to be integrated.
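The idea of compile-time packing can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Kalray's compiler algorithm: the `bundle` function, the register names, and the `max_width` limit of four are all hypothetical, and a real VLIW compiler also reorders instructions and accounts for functional units and latencies. The sketch walks a program in order and starts a new packet (often called a bundle) whenever an instruction reads or writes a register written earlier in the current packet.

```python
def bundle(instructions, max_width=4):
    """Greedily group in-order instructions into packets with no internal
    read-after-write or write-after-write dependency.

    Each instruction is a (dest, srcs) pair of register names.
    `max_width` is a hypothetical issue width, not a Coolidge figure.
    """
    bundles = []
    current = []
    written = set()  # registers written by instructions in the current packet
    for dest, srcs in instructions:
        depends = dest in written or any(s in written for s in srcs)
        if current and (depends or len(current) == max_width):
            # Close the current packet and start a new one.
            bundles.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    ("r1", ("r10", "r11")),  # independent
    ("r2", ("r12", "r13")),  # independent of r1, so it shares a packet
    ("r3", ("r1", "r2")),    # reads r1 and r2, so it starts a new packet
    ("r4", ("r14", "r15")),  # independent of r3, so it shares a packet
]

bundles = bundle(program)
for i, b in enumerate(bundles):
    print(i, [dest for dest, _ in b])
# prints:
# 0 ['r1', 'r2']
# 1 ['r3', 'r4']
```

Because the grouping is decided before execution, the hardware does not need to discover dependencies at run time, which is consistent with the compactness and timing predictability described above.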
Each core is tightly coupled to a coprocessor specialized in mixed-precision matrix operations.
Data is transferred in 32-byte blocks between memory and the coprocessor registers, following the flow of the program executed by the core. When the coprocessor processes this data, it interprets each block as an array of four rows and a number of columns that depends on the element size: integers of 8 to 64 bits, or 16- or 32-bit floating-point numbers.
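The relationship between element size and column count follows directly from the 32-byte block and the four rows: each row holds 8 bytes. The Python sketch below works through that arithmetic. The row-major, little-endian, unsigned interpretation in `as_rows` is an assumption made for illustration only; the actual register layout is not described here.

```python
BLOCK_BYTES = 32  # block size stated above
ROWS = 4          # number of rows stated above

def columns_for(element_bits):
    """Number of columns in a 4-row view of a 32-byte block."""
    row_bits = (BLOCK_BYTES // ROWS) * 8  # 64 bits per row
    return row_bits // element_bits

def as_rows(block, element_bits):
    """View a 32-byte block as 4 rows of unsigned integers.

    Row-major order and little-endian elements are assumptions.
    """
    assert len(block) == BLOCK_BYTES
    size = element_bits // 8
    cols = columns_for(element_bits)
    values = [int.from_bytes(block[i:i + size], "little")
              for i in range(0, BLOCK_BYTES, size)]
    return [values[r * cols:(r + 1) * cols] for r in range(ROWS)]

shapes = {bits: (ROWS, columns_for(bits)) for bits in (8, 16, 32, 64)}
print(shapes)
# prints: {8: (4, 8), 16: (4, 4), 32: (4, 2), 64: (4, 1)}

block = bytes(range(32))
print(as_rows(block, 8)[1])
# prints: [8, 9, 10, 11, 12, 13, 14, 15]
```

The same 32 bytes therefore appear as a 4×8 array of 8-bit elements, a 4×4 array of 16-bit elements, a 4×2 array of 32-bit elements, or a 4×1 array of 64-bit elements.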
Operating on two-dimensional data allows the coprocessor to achieve high computational intensity, up to sixteen dot products between vectors of eight elements and sixteen accumulations per cycle.
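One operand shape consistent with these figures is a 4×8 matrix multiplied by an 8×4 matrix and accumulated into a 4×4 result: that gives sixteen dot products of eight elements each, and sixteen accumulations. This shape is an assumption inferred from the numbers above, not a documented description of the coprocessor instruction, and the `matmul_accumulate` function below is purely illustrative Python.

```python
def matmul_accumulate(acc, a, b):
    """acc += a @ b, counting dot products and multiply-accumulates."""
    rows, inner, cols = len(a), len(b), len(b[0])
    dot_products = 0
    multiplies = 0
    for i in range(rows):
        for j in range(cols):
            # One dot product between row i of a and column j of b.
            dot = sum(a[i][k] * b[k][j] for k in range(inner))
            acc[i][j] += dot  # one accumulation per output element
            dot_products += 1
            multiplies += inner
    return dot_products, multiplies

# Assumed shapes: a is 4x8, b is 8x4, acc is 4x4.
a = [[i + 1] * 8 for i in range(4)]      # row i is filled with i + 1
b = [[1] * 4 for _ in range(8)]          # all ones
acc = [[0] * 4 for _ in range(4)]

dot_products, multiplies = matmul_accumulate(acc, a, b)
print(dot_products, multiplies)
# prints: 16 128
print(acc[0], acc[3])
# prints: [8, 8, 8, 8] [32, 32, 32, 32]
```

Under this assumption, a single such operation performs 128 multiplications, which illustrates why operating on two-dimensional blocks gives a higher computational intensity than operating on one vector at a time.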
Kalray’s solution offers both very high computing power, capable of processing considerable data volumes while minimizing energy consumption, and on-the-fly heterogeneous processing capabilities. The main product is a standard PCIe card, highly configurable for composability needs, based on Kalray’s MPPA® DPU (Data Processing Unit) processor and implementing leading-edge interfaces, including PCIe Gen4 x16 and 2x100G Ethernet. As an example, coupled with the latest PCIe Gen4 AMD processor, the MPPA® DPU can be used as an accelerator delivering a full-duplex bandwidth of up to 256 Gbit/s.
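The 256 Gbit/s figure can be checked against the interface specifications with a short calculation. The PCIe 4.0 values used below (16 GT/s per lane and 128b/130b line encoding) come from the PCIe standard rather than from this page, and the reading that 256 Gbit/s refers to the raw rate per direction is an interpretation. Protocol overhead above the line encoding is ignored.

```python
PCIE_GEN4_GT_PER_LANE = 16   # GT/s per lane, per direction (PCIe 4.0)
PCIE_LANES = 16              # x16 link, as stated above
PCIE_ENCODING = 128 / 130    # 128b/130b line encoding used by PCIe 3.0 and later

ETHERNET_PORTS = 2           # 2x100G Ethernet, as stated above
ETHERNET_GBPS_PER_PORT = 100

# Raw line rate in each direction of the PCIe link.
pcie_raw_gbps = PCIE_GEN4_GT_PER_LANE * PCIE_LANES
# Rate left after line encoding, before packet and protocol overhead.
pcie_encoded_gbps = pcie_raw_gbps * PCIE_ENCODING
# Aggregate Ethernet line rate in each direction.
ethernet_gbps = ETHERNET_PORTS * ETHERNET_GBPS_PER_PORT

print(pcie_raw_gbps)                 # prints: 256
print(round(pcie_encoded_gbps, 1))   # prints: 252.1
print(ethernet_gbps)                 # prints: 200
```

The raw PCIe Gen4 x16 rate matches the quoted 256 Gbit/s in each direction, and it exceeds the 200 Gbit/s aggregate of the two Ethernet ports.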
One of the key benefits of the Massively Parallel Processor Array architecture used on manycore processors is the ability to scale from 1 to N clusters. This was one of the key motivations behind the invention of manycore processors: integrating hundreds of cores into a single piece of silicon, with a high capacity to scale, overcomes the traditional limitations of multicore processors in this area. Scalability also exists at chip level: multiple MPPA® DPU processors can be combined to increase the performance of a system. Whether in a monolithic or a multi-chip implementation, this cluster scalability is unique to the MPPA® DPU architecture.
In conclusion, taking full advantage of Kalray’s patented MPPA® (Massively Parallel Processor Array) architecture and 16nm FinFET technology, the MPPA® DPU Coolidge™ processor is a scalable 80-core processor designed for intelligent systems. It offers a unique alternative to conventional approaches such as GPUs, ASICs and FPGAs, bringing unique value to multiple applications, from data centers to edge and embedded systems.

COOLIDGE™ MPPA® DPU Processor
The Massively Parallel Processor Array (MPPA®) is Kalray’s ground-breaking manycore technology, giving DPU chips more processing power with less power consumption.
Related Content
Software development environment for developing applications using open coding standards on Kalray's processors.
Kalray's programmable, low-power PCIe card that can be used in acceleration or standalone mode.
This Kalray presentation describes Kalray’s KV3 VLIW core (key component of the MPPA® DPU processor) and experience in the development of its LLVM compiler backend.
Fully programmable cards that bring the benefits of the MPPA® DPU (Data Processing Unit) technology to data centers for higher performance & more flexible solutions.