About Stéphane Cordova

Stéphane Cordova is Vice President, Embedded Technology Business Unit at Kalray. Stéphane is a successful executive with 20 years of combined experience in sales, marketing, business development and business unit management in the semiconductor industry, covering wireless, video and embedded applications. Prior to his current position, Stéphane held various business and management roles at STMicroelectronics and ST-Ericsson, where he worked in business development with major OEMs and platform makers for mobile applications and the multimedia industry.

Kalray and Qosmos Announce Partnership

Partnership to Benefit Security Applications and Data Center Providers

San Francisco, Calif. – February 29, 2016 – Kalray, a leading provider of acceleration solutions for data centers, and Qosmos®, the leader in deep packet inspection …

Towards an Explicit RTM Stencil Computation Framework on Kalray TurboCard2

Reverse time migration (RTM) seismic imaging algorithms are very IO- and compute-intensive, and can benefit from accelerators such as GPGPUs or manycore processors.

One of the main concerns with accelerators is integrating them with legacy code while still leveraging the standard programming models the industry has relied upon: code written in FORTRAN and parallelized with MPI and OpenMP-3 is not always easy to port to accelerators.

Most of the time, the most computationally intensive part of an algorithm such as RTM can be reduced to a few small parts of the code, mostly made up of loop nests, called kernels. The problem then becomes how to move (or offload) those kernels to the accelerators and how to integrate the offloaded kernels with the rest of the application.
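To make the notion of a kernel concrete, here is a minimal sketch in C of the kind of loop nest involved: one explicit time step of an acoustic wave equation using a 7-point stencil. It is an illustration only, not the kernel used in this work; the names `wave_step` and `idx`, the unit grid spacing and the low-order stencil are all assumptions made for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened index into an nx*ny*nz grid stored with x varying fastest. */
static size_t idx(size_t x, size_t y, size_t z, size_t nx, size_t ny)
{
    return (z * ny + y) * nx + x;
}

/* One explicit time step on the interior points:
 *   next = 2*cur - prev + c2dt2 * laplacian(cur)
 * with a 7-point stencil and unit grid spacing (illustrative assumptions).
 * Boundary points are left untouched. */
void wave_step(float *next, const float *cur, const float *prev,
               size_t nx, size_t ny, size_t nz, float c2dt2)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    for (size_t z = 1; z < nz - 1; z++)
        for (size_t y = 1; y < ny - 1; y++)
            for (size_t x = 1; x < nx - 1; x++) {
                size_t i = idx(x, y, z, nx, ny);
                float lap = cur[i - 1] + cur[i + 1]             /* x neighbours */
                          + cur[i - nx] + cur[i + nx]           /* y neighbours */
                          + cur[i - nx * ny] + cur[i + nx * ny] /* z neighbours */
                          - 6.0f * cur[i];
                next[i] = 2.0f * cur[i] - prev[i] + c2dt2 * lap;
            }
}
```

Each output point needs only a handful of arithmetic operations but reads seven neighbouring values, which is why such kernels tend to be limited by data movement rather than by computation.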

To achieve this, some approaches, such as OpenACC and OpenMP-4, extend well-known programming models such as OpenMP-3 to support offloading to accelerators. One shortcoming of this “#pragma-based” approach is that, being too high-level, it does not always make it possible to extract the full performance of the accelerators.
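As an illustration of the “#pragma-based” style, the following sketch offloads a simple 3-point stencil with an OpenMP 4 `target` construct. The function name `smooth_offload` and the stencil weights are assumptions chosen for the example, and nothing here is specific to any particular accelerator.

```c
#include <assert.h>

/* 3-point smoothing stencil offloaded with an OpenMP 4 target region.
 * If the compiler has no OpenMP support, the pragma is ignored and the
 * loop simply runs sequentially on the host. */
void smooth_offload(float *out, const float *in, int n)
{
    assert(n >= 3);
    /* map(to:) copies the input to the device; map(tofrom:) keeps the
     * boundary values of out, which the loop never writes. */
    #pragma omp target teams distribute parallel for \
        map(to: in[0:n]) map(tofrom: out[0:n])
    for (int i = 1; i < n - 1; i++)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}
```

The directive says what to offload and which arrays to move, but nothing about how the data is tiled or staged in the accelerator's local memory, which is the level of control the text argues is missing.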

For example, explicit RTM schemes using stencil computation models are mostly IO-bound on current accelerator architectures and could benefit from strategies such as cache blocking or time skewing, but those optimization strategies need to be explicitly described for a specific architecture. These optimizations can be hard to implement, and they defeat the main goal of those approaches, which is to be accelerator-independent.
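A minimal sketch of cache blocking, assuming a 2D 5-point stencil for brevity: the blocked sweep visits the grid tile by tile so that the rows a tile needs stay in fast memory while it is processed. The function names and the tile-size parameters `bx` and `by` are hypothetical; suitable tile sizes depend on the target architecture, which is exactly the architecture-specific tuning described above.

```c
#include <assert.h>
#include <stddef.h>

/* 5-point average for one interior point of an nx-wide grid. */
static float point5(const float *in, size_t i, size_t nx)
{
    return 0.2f * (in[i] + in[i - 1] + in[i + 1] + in[i - nx] + in[i + nx]);
}

/* Reference sweep: plain row-by-row traversal of the interior. */
void sweep_naive(float *out, const float *in, size_t nx, size_t ny)
{
    assert(nx >= 3 && ny >= 3);
    for (size_t y = 1; y < ny - 1; y++)
        for (size_t x = 1; x < nx - 1; x++)
            out[y * nx + x] = point5(in, y * nx + x, nx);
}

/* Cache-blocked sweep: same arithmetic, but the interior is traversed
 * in tiles of at most bx by by points. */
void sweep_blocked(float *out, const float *in, size_t nx, size_t ny,
                   size_t bx, size_t by)
{
    assert(nx >= 3 && ny >= 3 && bx > 0 && by > 0);
    for (size_t y0 = 1; y0 < ny - 1; y0 += by) {
        size_t y1 = (y0 + by < ny - 1) ? y0 + by : ny - 1; /* clip tile */
        for (size_t x0 = 1; x0 < nx - 1; x0 += bx) {
            size_t x1 = (x0 + bx < nx - 1) ? x0 + bx : nx - 1;
            for (size_t y = y0; y < y1; y++)
                for (size_t x = x0; x < x1; x++)
                    out[y * nx + x] = point5(in, y * nx + x, nx);
        }
    }
}
```

Both sweeps produce the same values; only the traversal order differs. Time skewing goes one step further by also tiling across time steps, and is not shown here.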

An alternative approach is to provide domain-specific languages or libraries that let scientists concentrate on the model itself while experts optimize for each platform. Several tools and libraries exist for stencil computation, such as ArrayLib, Pluto or Pochoir, but they do not target accelerators and they can be intrusive.

To facilitate explicit stencil computations, such as explicit RTM schemes, on the new TurboCard2 accelerator, we decided to concentrate our efforts on a stencil library that abstracts and optimizes the domain decomposition and the data distribution on the accelerators directly from the host, while letting programmers implement their kernels using their usual tools and models.
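To give a sense of what host-side domain decomposition involves, here is a purely hypothetical sketch in C. It is not the API of the library described above: the type `tile_t` and the function `tile_range` are invented for illustration. It splits a 1D domain of `n` points into `parts` tiles and, for each tile, computes both the points the tile updates and the wider range, including halo points, that would have to be transferred to the accelerator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile descriptor (not an actual library type). */
typedef struct {
    size_t own_begin, own_end;   /* points this tile updates: [own_begin, own_end) */
    size_t load_begin, load_end; /* points to transfer, halo included */
} tile_t;

/* Range of tile k when n points are split into `parts` nearly equal tiles,
 * each needing `halo` extra points on either side (clipped at the domain
 * edges). The first n % parts tiles get one extra point. */
tile_t tile_range(size_t n, size_t parts, size_t k, size_t halo)
{
    assert(parts > 0 && k < parts && n >= parts);
    size_t base = n / parts, rem = n % parts;
    tile_t t;
    t.own_begin  = k * base + (k < rem ? k : rem);
    t.own_end    = t.own_begin + base + (k < rem ? 1 : 0);
    t.load_begin = (t.own_begin > halo) ? t.own_begin - halo : 0;
    t.load_end   = (t.own_end + halo < n) ? t.own_end + halo : n;
    return t;
}
```

For instance, with 10 points, 3 tiles and a halo of 1, the middle tile updates points 4 to 6 but needs points 3 to 7 as input. A library of the kind described would hide this bookkeeping, and the corresponding transfers, from the kernel author.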