The world is facing an explosion of data which current technologies were not initially designed for and cannot always handle efficiently. Industry needs a new type of processor. Enter the era of Intelligent Processors.
The third generation of Kalray's MPPA® processor, also known as Coolidge™, is built on a “Massively Parallel Processor Array” architecture: computing clusters are connected to each other, to the external memory and to the I/O interfaces through two independent interconnects.
Each of the MPPA®'s two interconnects is suited to a different type of data transfer. The first is an AXI Fabric bus grid, used for read/write accesses from the cores to the memories and to the peripherals connected by PCIe. The second is an RDMA NoC (Network-on-Chip), which supports data transfers to and from the Ethernet network interfaces and connects all the clusters together.
The robust partitioning necessary for safe operation of the processor is carried out at the granularity of the computing cluster. It is based on the configuration of memory management units (MMUs) and memory protection units (MPUs), and on enabling or disabling individual network-on-chip links.
MPPA® Unique Architecture
An Innovative Cluster Partition
The Coolidge™ processor cluster is partitioned between a secure area and a user application area. The secure area includes a core (RM core) dedicated to security and safety functions, associated with an isolated memory bank.
The user application area brings together 16 processing cores (PE cores) and a data movement engine (DMA), connected to a 4 MB local memory called SMEM, which is composed of 16 banks.
The 16 processing cores operate in one of two modes:
- An SMP (“Symmetric Multi-Processing”) mode intended for high-performance applications, where the PE cores behave like a single multicore CPU.
- An AMP (“Asymmetric Multi-Processing”) mode intended for real-time applications, where the PE cores behave like sixteen independent single-core CPUs.
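As a rough illustration of the user application area and its two modes, the sketch below models one cluster in Python. The class, function and field names are hypothetical and do not correspond to any Kalray API. Only the counts (16 PE cores, 4 MB of SMEM in 16 banks) come from the text, and "4 MB" is assumed to mean 4 MiB.

```python
from dataclasses import dataclass

PE_CORES = 16                   # processing cores per cluster (from the text)
SMEM_BYTES = 4 * 1024 * 1024    # assumption: "4 MB" means 4 MiB
SMEM_BANKS = 16                 # SMEM banks per cluster (from the text)


@dataclass(frozen=True)
class ClusterView:
    """How software sees the PE cores of one cluster (hypothetical model)."""
    mode: str
    cpus: tuple  # one entry per logical CPU, each a tuple of PE core ids


def cluster_view(mode: str) -> ClusterView:
    cores = tuple(range(PE_CORES))
    if mode == "SMP":
        # One logical multicore CPU made of all 16 PE cores.
        return ClusterView(mode, (cores,))
    if mode == "AMP":
        # Sixteen independent single-core CPUs.
        return ClusterView(mode, tuple((core,) for core in cores))
    raise ValueError(f"unknown mode: {mode!r}")


# Capacity of one SMEM bank if the memory is split evenly across banks.
bank_bytes = SMEM_BYTES // SMEM_BANKS

if __name__ == "__main__":
    print(len(cluster_view("SMP").cpus), "logical CPU in SMP mode")
    print(len(cluster_view("AMP").cpus), "logical CPUs in AMP mode")
    print(bank_bytes // 1024, "KiB per SMEM bank")
```

Under the even-split assumption, each of the 16 banks holds 256 KiB.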
Kalray's VLIW Core
The cores used by the Coolidge™ processor all implement the same architecture, of the VLIW (“Very Long Instruction Word”) type. VLIW architectures are used in embedded processors for signal processing and for their predictability, and they offer increased resilience to Meltdown and Spectre security attacks. On a VLIW core, the parallelism that can be exploited between instructions is detected by the compiler and then made explicit in the binary code, by marking the packets of instructions that can be executed in parallel. This yields cores with precise timing behavior that are more compact for a given processing capacity, which in turn allows a greater number of cores to be integrated.
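To make the idea of compiler-marked instruction packets concrete, here is a toy Python sketch of greedy, in-order bundling. It is not Kalray's compiler or instruction set: the instruction format, the register names and the `width` parameter are all hypothetical, and the only hazard it checks is an instruction reading or writing a register already written earlier in the same packet. It assumes that all reads in a packet happen before any write.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str     # human-readable form, for display only
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction


def bundle(instrs, width=4):
    """Pack instructions, in program order, into packets that can issue together.

    An instruction starts a new packet when the current one is full, or when
    it reads or writes a register written by an instruction already in it.
    """
    bundles, current, written = [], [], set()
    for ins in instrs:
        conflict = ins.dest in written or any(s in written for s in ins.srcs)
        if current and (conflict or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        bundles.append(current)
    return bundles


program = [
    Instr("add r1 = r2, r3", "r1", ("r2", "r3")),
    Instr("mul r4 = r5, r6", "r4", ("r5", "r6")),
    Instr("sub r7 = r1, r4", "r7", ("r1", "r4")),  # needs r1 and r4: new packet
    Instr("ld  r8 = [r9]",   "r8", ("r9",)),
]

if __name__ == "__main__":
    for i, packet in enumerate(bundle(program)):
        print(f"packet {i}: " + " | ".join(ins.text for ins in packet))
```

Because the grouping is fixed at compile time, the core does not need hardware to discover parallelism at run time, which is what keeps it compact and its timing predictable.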
Tightly Coupled Accelerators
The solution adopted is to tightly couple each core to a coprocessor specialized in mixed-precision matrix operations.
Data is transferred in blocks of 32 bytes between the memory and the registers of the coprocessor, following the flow of the program executed by the core. When the coprocessor processes this data, it interprets each block as an array of four rows and a number of columns that depends on the element size: integers of 8 to 64 bits, or 16-bit or 32-bit floating-point numbers.
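The relationship between element size and column count can be worked out directly. The short Python sketch below assumes that elements are packed back to back with no padding, so each of the four rows holds 32 / 4 = 8 bytes; the function name is illustrative only.

```python
BLOCK_BYTES = 32   # size of one transferred block (from the text)
ROWS = 4           # rows per block (from the text)


def columns(element_bits: int) -> int:
    """Columns per row for a given element width, assuming dense packing."""
    row_bits = (BLOCK_BYTES // ROWS) * 8   # 8 bytes = 64 bits per row
    if row_bits % element_bits:
        raise ValueError("element width must divide the row width")
    return row_bits // element_bits


if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        print(f"{bits:2d}-bit elements: {ROWS} x {columns(bits)} array")
```

Under this assumption a block holds a 4 x 8 array of 8-bit elements, 4 x 4 of 16-bit, 4 x 2 of 32-bit, or 4 x 1 of 64-bit elements.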
Operating on two-dimensional data allows the coprocessor to achieve high computational intensity, up to sixteen dot products between vectors of eight elements and sixteen accumulations per cycle.
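One way to read this figure, offered here as an interpretation rather than a description of the actual hardware, is as a 4 x 8 matrix multiplied by an 8 x 4 matrix and accumulated into a 4 x 4 result: sixteen output entries, each a dot product of two eight-element vectors added to a running sum. The Python sketch below spells out that arithmetic with plain lists; the function name is hypothetical.

```python
def matmul_accumulate(acc, a, b):
    """Return acc + a @ b for matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [
            acc[i][j] + sum(a[i][k] * b[k][j] for k in range(inner))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


a = [[1] * 8 for _ in range(4)]      # 4 x 8 operand
b = [[2] * 4 for _ in range(8)]      # 8 x 4 operand
acc = [[10] * 4 for _ in range(4)]   # 4 x 4 accumulator

out = matmul_accumulate(acc, a, b)   # every entry is 10 + 8 * (1 * 2) = 26

# Work per step under this interpretation: 16 dot products of length 8.
multiply_accumulates = 4 * 4 * 8     # 128

if __name__ == "__main__":
    print(out[0])
    print(multiply_accumulates, "multiply-accumulates per step")
```

The high computational intensity comes from reuse: 64 input elements feed 128 multiply-accumulate operations.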
Very High-speed Interfaces
Kalray’s solution combines very high computing power, capable of processing considerable data volumes while minimizing energy consumption, with on-the-fly heterogeneous processing capabilities. The main product is a standard PCIe card, highly configurable for composability needs, based on Kalray’s MPPA® processor and implementing leading-edge interfaces, including PCIe Gen4 x16 and 2x100G Ethernet. For example, coupled with the latest PCIe Gen4 AMD processor, the MPPA® can be used as an accelerator delivering a full-duplex bandwidth of up to 256 Gbit/s.
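The 256 Gbit/s figure is consistent with the raw line rate of a PCIe Gen4 x16 link in each direction. The Python sketch below shows that arithmetic, assuming the quoted number is the raw per-direction rate; the payload estimate only accounts for 128b/130b line encoding and ignores protocol overhead, so real throughput would be lower.

```python
GT_PER_LANE = 16        # PCIe Gen4 signaling rate, in GT/s per lane
LANES = 16              # x16 link (from the text)
ENCODING = 128 / 130    # 128b/130b line encoding used by PCIe Gen3 and later

# Raw rate per direction; a full-duplex link offers this both ways at once.
raw_gbit_s = GT_PER_LANE * LANES

# Upper bound after line encoding, before any packet or protocol overhead.
encoded_gbit_s = raw_gbit_s * ENCODING

# Aggregate Ethernet line rate of the 2x100G interfaces, for comparison.
ethernet_gbit_s = 2 * 100

if __name__ == "__main__":
    print(f"PCIe raw:     {raw_gbit_s} Gbit/s per direction")
    print(f"PCIe encoded: {encoded_gbit_s:.1f} Gbit/s per direction")
    print(f"Ethernet:     {ethernet_gbit_s} Gbit/s per direction")
```

On these numbers the PCIe link has headroom over the two 100G Ethernet ports combined.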
Scalability: From Cluster to Cluster IP to Multi-processors
One of the key benefits of the Massively Parallel Processor Array architecture used in manycore processors is the ability to scale from 1 to N clusters. This was one of the key motivations behind the invention of manycore processors: integrating hundreds of cores into a single piece of silicon, with a capacity to scale that overcomes the traditional limitations of multicore processors in this area. Scalability also exists at chip level: multiple MPPA® processors can be combined to increase the performance of a system. Whether in a monolithic or a multi-chip implementation, this cluster scalability is unique to the MPPA® architecture.
In conclusion, taking full advantage of Kalray’s patented MPPA® (Massively Parallel Processor Array) architecture and 16 nm FinFET technology, the MPPA® Coolidge™ processor is a scalable 80-core processor designed for intelligent systems. It offers an alternative to conventional approaches such as GPUs, ASICs and FPGAs, and serves multiple applications, from data centers to edge and embedded systems.