February 03, 2021
Open Standards and High Performance Programming: Offloading on Manycore Architecture (Part 1/2)
Kalray is proud to have officially reached OpenCL™ conformance at the close of 2020 for Coolidge™, Kalray’s 3rd generation of MPPA® (Massively Parallel Processor Array) DPU intelligent processor.
Kalray MPPA® DPU intelligent processors are a new generation of processors specialized in intelligent data processing from cloud to edge. They can capture and analyze massive data flows on the fly and interact with the outside world in real time. These processors can run demanding AI algorithms alongside a wide range of other processing and control tasks, such as intensive mathematical algorithms, signal processing, and network or storage software stacks.
Let’s explore why it is important for our users that Coolidge™ DPU now runs a conformant implementation of OpenCL™, and how we offer open standard programming for high performance and flexible applications on manycore processors.
The Complex Choice of Programming Framework
High-performance systems increasingly call for disruptive hardware architectures and innovative software programming models. For users of embedded compute systems, the challenge lies in selecting both the appropriate hardware technologies and the programming models for computer vision, neural networks, machine learning…
That selection usually has to account for:
- Re-use of legacy code
- Ease of finding highly qualified engineers
- Flexibility for porting from one hardware architecture to another
- Long term maintenance
- Rapid prototyping up to productization…
Some solution providers propose a proprietary framework and API, while others implement a defined standard API for a fully open framework. Such frameworks need to support high-level interfaces for several types of applications and help users initialize, use and “combine” these applications.
The Deliberate Choice of Open Standards
At Kalray, we are convinced that Open Standards answer the requirements set out above. This is why Kalray’s Software Development Kit, AccessCore® SDK, relies heavily on Open Standards and why we worked directly with Khronos (www.khronos.org) to select the most appropriate programming solution for parallel architectures and performance offloading.
In this post, you will find why OpenCL™ was the obvious choice for an efficient, open, portable, well-known and extensible programming model!
The definition of OpenCL™:
OpenCL™ (Open Computing Language) is an open, royalty-free standard for cross-platform, parallel programming of diverse accelerators found in supercomputers, cloud servers, personal computers, mobile devices and embedded platforms. OpenCL™ greatly improves the speed and responsiveness of a wide spectrum of applications in numerous market categories including professional creative tools, scientific and medical software, vision processing, and neural network training and inferencing (from www.khronos.org/opencl).
It is a fit. Let’s confirm this.
An OpenCL™ platform typically combines one processor, which we will call the “host”, with a dedicated hardware element for algorithm acceleration, which we will call the “device”. A device can be a GPU, an FPGA and, of course, an MPPA® processor.
The host is in charge of launching acceleration requests on the device and of explicitly managing all data transfers between itself and the device. The device executes these requests and provides the results. Each request consists of the code to execute, which we will call a “kernel”, and the data.
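The host/device split can be sketched in a few lines. The following is a plain-Python emulation, not the OpenCL™ API and not Kalray's implementation: `ToyDevice`, `write_buffer`, `enqueue_kernel` and `read_buffer` are hypothetical names chosen for illustration. It only shows the order of operations: the host copies inputs to the device, requests a kernel execution, then copies the results back.

```python
class ToyDevice:
    """Hypothetical stand-in for an accelerator with its own memory."""

    def __init__(self):
        # Device memory is separate from host memory: buffers live here by name.
        self.memory = {}

    def write_buffer(self, name, host_data):
        # Explicit host -> device transfer (a copy, not a shared reference).
        self.memory[name] = list(host_data)

    def enqueue_kernel(self, kernel, global_size, *buffer_names):
        # Run the kernel once per index; a real device would do this in parallel.
        buffers = [self.memory[n] for n in buffer_names]
        for global_id in range(global_size):
            kernel(global_id, *buffers)

    def read_buffer(self, name):
        # Explicit device -> host transfer (again a copy).
        return list(self.memory[name])


def vector_add(global_id, a, b, out):
    # The "kernel": each invocation handles one element.
    out[global_id] = a[global_id] + b[global_id]


def offload_vector_add(a, b):
    # Host side: transfer inputs, launch the request, fetch the result.
    device = ToyDevice()
    device.write_buffer("a", a)
    device.write_buffer("b", b)
    device.write_buffer("out", [0] * len(a))
    device.enqueue_kernel(vector_add, len(a), "a", "b", "out")
    return device.read_buffer("out")


if __name__ == "__main__":
    print(offload_vector_add([1, 2, 3], [10, 20, 30]))
```

In a real OpenCL™ host program, these steps correspond roughly to calls such as `clCreateBuffer`, `clEnqueueWriteBuffer`, `clEnqueueNDRangeKernel` and `clEnqueueReadBuffer`.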
Mapping OpenCL™ on MPPA® DPU (Data Processing Unit) Processor
Let’s explain why OpenCL™ is such a good fit for our high-performance hardware architecture. If you don’t know OpenCL™, take this as a quick introduction; if you are an OpenCL™ expert, you will want to jump to the details of how the MPPA® DPU maps to each of the OpenCL™ models.
OpenCL™ relies on three main concepts:
- The Platform: Defining the hardware topology being used
- The Execution Model: Defining the way programs will execute on the Platform
- The Memory Model: Defining the way the memory will be used on the Platform during Execution
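To make the Execution and Memory Models a little more concrete, here is a minimal plain-Python sketch of standard OpenCL™ terminology, under stated assumptions: a one-dimensional index space, a zero global offset, and a global size that is a multiple of the work-group size. The function names `ndrange` and `group_sums` are hypothetical, and nothing here reflects how the MPPA® DPU maps these concepts. It shows how a work-item's global ID follows from its work-group ID and local ID, and how a per-group scratch area plays the role of local memory.

```python
def ndrange(global_size, local_size):
    """Yield (group_id, local_id, global_id) for a 1-D index space."""
    if global_size % local_size != 0:
        raise ValueError("this sketch needs global_size to be a multiple of local_size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            # With a zero global offset: global = group * local_size + local.
            yield group_id, local_id, group_id * local_size + local_id


def group_sums(data, local_size):
    """Sum each work-group's slice of `data` through a per-group scratch area."""
    results = []       # plays the role of global memory (visible to all groups)
    local_mem = None   # plays the role of local memory (one per work-group)
    for group_id, local_id, global_id in ndrange(len(data), local_size):
        if local_id == 0:
            local_mem = [0] * local_size   # fresh scratch area for each group
        value = data[global_id]            # a value private to this work-item
        local_mem[local_id] = value
        if local_id == local_size - 1:
            # The last work-item of the group publishes the group's result.
            results.append(sum(local_mem))
    return results


if __name__ == "__main__":
    print(list(ndrange(4, 2)))
    print(group_sums([1, 2, 3, 4, 5, 6], 3))
```

The emulation runs work-items one after another; on a real device they run concurrently, so a kernel would need a work-group barrier before one work-item reads what the others wrote to local memory.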
Let’s be pragmatic and present here the OpenCL™ definition (on the left) and the mapping on MPPA® architecture (on the right) for each of these concepts so you have a clear view of it.
Advantages to Users
We are seeing an explosion of demanding applications that require a tremendous range of advanced computing capabilities. So far, the focus has been on executing these applications on one dedicated type of architecture, the GPU (initially conceived for graphics workloads). As the industry’s needs expand to neural networks, algebra calculation and computer vision algorithms, better-suited architectures are being developed and used.
Here enters Kalray’s MPPA® DPU intelligent processor, which provides high performance for heterogeneous computation while keeping a homogeneous architecture. The challenges, as mentioned above, are for users to be able to re-use already developed applications, to port them and to evaluate the benefits of our architecture (execution time, latency, power consumption…). In addressing these challenges, we must also minimize the learning curve and the maintenance burden of a new language, whilst reducing the need for extensive training.
By adopting an open standard, Kalray makes MPPA® DPU adoption easy for developers. They can re-use legacy code, they already know the programming environment, they find familiar configuration capabilities, and they are even used to the optimization methods.
By achieving OpenCL™ conformance for Coolidge™, our 3rd generation of MPPA® DPU intelligent processor, Kalray ensures that our users can rely on our implementation as much as on those from other major actors in the industry. In addition, as a Khronos member, we take part in Khronos Working Groups to contribute to the evolution and adoption of these Open Standards.
Stay tuned, our upcoming post will jump into more technical details!