
Reduce TCO in the Data Center

Tim Lieber

5 min read

DPU Use Cases and Calculations

We’ve discussed the role a DPU can play in driving efficiency in the data center, and how DPUs work in harmony with GPUs and CPUs in data center architecture.

Now, let’s explore an important aspect of DPU adoption – the savings and TCO benefits.

Efficiency Is Everything

The most important reason DPUs have emerged is to reduce total cost of ownership (TCO). DPUs must create savings by taking work away from other processors in a data center. This could be network offload, compute offload, or data services offload. The savings should be both CAPEX (reducing the cost of components or eliminating components) and OPEX (reducing the utility bill and increasing reliability by simplifying the data center). A DPU must use less power than other processing units, must cost less than other processing units, and must improve the reliability of the data center. For a DPU to be a DPU, it must contribute to several, if not all, of these TCO challenges, achieving the best performance efficiency per watt and per dollar.

DPUs – A Complementary Game-Changer

The DPU must be more efficient than other processors in the data center for the work that it will offload. In a modern data center, the CPU (central processing unit) and GPU (graphics processing unit) are the main processors from which a DPU takes over work. The optimal modern data center has all three processor types working in unison. We believe that there is a place for all three to coexist, each doing the work best suited to its capabilities. The Kalray DPU is an excellent data manager. It keeps the data safe, secure, and available for the heavy data crunching of the GPU or for general use by all the applications running on the CPU.

The Practical Value of DPUs in x86 Offloading

To calculate value, we need to consider each of the above use cases separately. DPUs can be companions to CPUs and GPUs, i.e., reside in the servers of the CPU/GPU farm and offload networking and storage functions, including storage management. They can also work independently of another processor, as is the case when they are integrated into a JBOF to offload the same storage functions. Both offer savings, to different degrees. Each data center is different, with different economies of scale, but there is a recipe for calculating savings that can be applied to each scenario to give a clearer picture. In the server environment, the DPU must at a minimum replace all NIC functionality. In reality, to create value, the DPU must assume all network functions, including NIC and CPU networking tasks.

At maximum, you want the DPU to take over the NIC functions and all the management of local or remote storage. This is where the real value can add up. Think of an enterprise data center where the NIC and Fibre Channel HBA are combined into one. The DPU should be able to perform all forms of data security, protection, reduction, or placement required of the server. The x86 should be completely out of the picture for these services.

The savings exist along a continuum between minimum and maximum. At a minimum, the server eliminates the NIC and offloads some network data processing from the x86. Call these savings X%. For most data centers, savings increase when extra storage function work is also offloaded from the x86. Instead of X% savings from network offload alone, you get (X+Y)% savings from combined network and storage offload, where Y is the reduction in the cost of storage management.

DPUs Reduce Software License Costs 

There is a third component of savings for some data centers. If the data center relies on some purchased or licensed software to provide data services, eliminating those additional licenses and passing control to the DPU creates another opportunity for savings, which we’ll call Z.

Z could be the number of servers in the data center, or storage devices in the data center, times the license charge. For many data centers, Z is a substantial number.

The way to look at the totality of these savings is that either X% or (X+Y)% fewer servers are required to perform the same function in the data center, plus any reduction in licensing fees (Z) that the DPU makes possible. X and Y are relatively easy to determine in each data center scenario. Calculate your CPU savings for network offload and storage offload, then determine how many servers can be eliminated by bringing the remaining servers back to full utilization without the data services. And since the cost of the DPU is about the same as that of a SmartNIC, and it takes about the same or less power, there’s no real cost added to the servers that remain in the data center. The result? Lots of upside in savings with little downside, as long as the DPU integrates cleanly into a data center’s existing infrastructure. A DPU should integrate into a data center without burdening it with additional cost or engineering changes to the hardware or software stack.
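
The X, Y, and Z recipe can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Kalray tool: the function name, the field names, and every input number are hypothetical. It assumes X and Y are whole-number percentages of x86 capacity spent on data services, and that Z is a flat license charge per server.

```python
from dataclasses import dataclass


@dataclass
class OffloadSavings:
    servers_after: int       # servers still needed after offload
    servers_eliminated: int  # servers freed by offloading X (+ Y)
    license_savings: int     # Z: license charges no longer paid


def estimate_offload_savings(servers, x_pct, y_pct=0, license_fee=0):
    """Estimate savings from moving data services to DPUs.

    x_pct: % of x86 capacity spent on networking (X).
    y_pct: % of x86 capacity spent on storage management (Y).
    license_fee: hypothetical per-server license charge, used for Z.
    """
    offloaded_pct = x_pct + y_pct
    if not 0 <= offloaded_pct < 100:
        raise ValueError("X + Y must be at least 0 and below 100 percent")
    # The application work is unchanged, so the remaining servers only
    # have to cover the share of capacity that was not data services.
    # Integer ceiling division avoids floating-point rounding.
    servers_after = -(-servers * (100 - offloaded_pct) // 100)
    return OffloadSavings(
        servers_after=servers_after,
        servers_eliminated=servers - servers_after,
        # Assumes one license per original server, all eliminated.
        license_savings=servers * license_fee,
    )


# Hypothetical inputs: 1,000 servers, X = 10%, Y = 20%, fee = 500.
network_only = estimate_offload_savings(1000, x_pct=10)
network_and_storage = estimate_offload_savings(
    1000, x_pct=10, y_pct=20, license_fee=500
)
print(network_only)
print(network_and_storage)
```

With these made-up inputs, network offload alone frees 100 servers, while network plus storage offload frees 300 and removes 500,000 in license charges. Real values of X, Y, and Z have to be measured in each data center.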

The Practical Value of DPUs in NVMe-oF Remote Storage

In the following use case, DPUs replace the x86 server in a storage array. The TCO and benefit statement of this scenario may be even more attractive for data centers that can integrate the architecture.

The simple economics are favorable, because the x86-less storage array requires much less power (maybe 50% less) and costs less. If the data center takes advantage of disaggregating local SSDs in its NVMe storage server to eliminate local storage in servers, then the TCO calculation includes those efficiencies as well.

In the case of an NVMe storage array, instead of having some number of SSDs per server with poor utilization of performance, capacity, or both, the disaggregated model has some smaller fraction of the total NVMe SSDs in remote arrays sharing the performance and capacity across the client server pool. Since the storage array now handles all storage management, this use case also offloads storage from the client server farm, resulting in fewer servers. This scenario creates fabric savings in addition to the reduced TCO calculation.
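
The effect of pooling on SSD count can be estimated with simple arithmetic. The sketch below is an assumption-laden illustration, not a sizing method from the article: it considers capacity only (not performance), assumes all SSDs have the same capacity, and uses a hypothetical function name and made-up utilization figures.

```python
def pooled_ssd_count(local_ssds, local_util_pct, pooled_util_pct):
    """SSDs a shared pool needs to hold the data now on local SSDs.

    local_ssds: total SSDs currently installed across all servers.
    local_util_pct: average capacity utilization of those SSDs (%).
    pooled_util_pct: utilization the shared pool is allowed to reach (%).
    """
    if not 0 <= local_util_pct <= 100:
        raise ValueError("local_util_pct must be between 0 and 100")
    if not 0 < pooled_util_pct <= 100:
        raise ValueError("pooled_util_pct must be above 0 and at most 100")
    # Used capacity expressed in "SSD-percent" so the math stays integer.
    used = local_ssds * local_util_pct
    # Ceiling division: a partly filled SSD still has to be bought.
    return -(-used // pooled_util_pct)


# Hypothetical farm: 100 servers x 4 local SSDs, each 25% full,
# consolidated into remote arrays that run at 80% utilization.
local = 100 * 4
pooled = pooled_ssd_count(local, local_util_pct=25, pooled_util_pct=80)
print(local, pooled)
```

In this invented example, 400 lightly used local SSDs shrink to 125 pooled SSDs. Any protection overhead added in the array would raise that number.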

Let’s look at an example in which the server is required to protect the data before storing it remotely. The server must break the data up into chunks, calculate parity chunks, and then ship everything to a remote storage pool. This results in significant amplification in IOPS and bandwidth consumption, not to mention server x86 and DDR resource consumption.
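
To make the amplification concrete, here is a minimal sketch of server-side protection. It assumes the simplest possible scheme, k data chunks plus a single XOR parity chunk; the article does not say which protection scheme is used, and the function names are hypothetical.

```python
def split_with_parity(data, k):
    """Split data into k equal chunks plus one XOR parity chunk."""
    if k < 1 or not data:
        raise ValueError("need k >= 1 and non-empty data")
    chunk_len = -(-len(data) // k)              # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")   # zero-pad the last chunk
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    parity = bytearray(chunk_len)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte                   # CPU and memory work on the x86
    return chunks + [bytes(parity)]


def amplification(data, k):
    """Return (I/O factor, byte factor) versus one unprotected write."""
    pieces = split_with_parity(data, k)
    io_factor = len(pieces)                     # separate remote writes
    byte_factor = sum(len(p) for p in pieces) / len(data)
    return io_factor, byte_factor


# Hypothetical 4 KiB write protected as 4 data chunks + 1 parity chunk.
payload = bytes(range(256)) * 16
print(amplification(payload, 4))
```

For this 4 KiB example, one application write becomes five remote writes carrying 1.25 times the bytes. Schemes with more parity chunks amplify further.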

As an alternative, data can be shipped to the DPU storage pool to let the DPU deal with the protection steps, including distributing the data per applicable QoS-based rules of the data set to either local or remote storage arrays.

All the TCO savings mentioned are additive, and none are exclusive. Each TCO-boosting function can be adopted separately or as part of a complete integration.

Adopting QLC NVMe Devices with DPU Technology

Another use case worth exploring involves using the Kalray DPU as a technology insulator. The DPU performs this role in addition to standard offload functions, either as a companion to an x86 or standalone in a JBOF.

A good example of this is the incorporation of ZNS QLC-based SSDs into the data center. SMR HDDs are very similar in their behavior, and it took 8–10 years before they were adopted in the data center. The reason is that file systems, operating systems, and in some cases applications needed to be modified to handle the write algorithms of the SMR HDD. The same holds true for ZNS SSDs and QLC SSDs in general.

The DPU is a perfect candidate to completely virtualize QLC SSDs (and future PLC SSDs), since it can handle highly random writes of any block size and manage the back-end sequentialization these SSDs require. With DPUs, any data center can take advantage of the TCO savings of the latest technologies without making changes to its software stack.
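
The idea of turning random writes into sequential ones can be illustrated with a toy log-structured translation layer. This is a conceptual sketch only and does not describe how the Kalray DPU is implemented: the class and its methods are hypothetical, blocks have a fixed size, zones live in memory, and garbage collection of overwritten blocks is not modeled.

```python
class ZonedLogTranslator:
    """Maps random logical block writes onto append-only zones."""

    def __init__(self, num_zones, zone_size):
        self.num_zones = num_zones
        self.zone_size = zone_size           # blocks per zone
        self.zones = [[] for _ in range(num_zones)]
        self.open_zone = 0                   # zone currently being filled
        self.mapping = {}                    # logical block -> (zone, offset)

    def write(self, lba, block):
        """Append the block at the write pointer, wherever lba points."""
        if len(self.zones[self.open_zone]) == self.zone_size:
            if self.open_zone + 1 == self.num_zones:
                raise RuntimeError("no free zone (no garbage collection here)")
            self.open_zone += 1
        zone = self.zones[self.open_zone]
        offset = len(zone)                   # the zone's write pointer
        zone.append(block)                   # always sequential on the media
        # An overwrite just remaps the lba; the old copy becomes stale.
        self.mapping[lba] = (self.open_zone, offset)
        return self.open_zone, offset

    def read(self, lba):
        zone, offset = self.mapping[lba]
        return self.zones[zone][offset]


# Random logical addresses land in strictly sequential physical order.
ftl = ZonedLogTranslator(num_zones=2, zone_size=4)
placements = [ftl.write(lba, f"v{i}") for i, lba in enumerate([70, 3, 70, 12, 5])]
print(placements)
print(ftl.read(70))
```

The host keeps issuing writes to arbitrary addresses, while the backing zones only ever see appends. That is the property that lets the software stack above stay unchanged.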

For Max TCO Efficiency, Don’t Skip the DPU

DPUs are highly complementary to CPUs and GPUs and must coexist with them to realize maximum TCO efficiency. A flexible DPU can be adopted in many different workloads and architectures. But not all DPUs are created equal. The best ones are highly programmable, with hardwire and computational capabilities that let them adapt to each data center’s needs and use cases.

As a DPU pioneer, Kalray stands out in the DPU landscape with a unique architecture that offers exceptional capabilities, with the best performance per watt and per dollar in the market. The many-core architecture of the Kalray DPU makes it highly adaptable for implementing any or all of the TCO-saving measures detailed above.


Next Up: DPUs & Data-Centric Workflows


Lead Solutions Architect, Kalray

Tim Lieber is a Lead Solution Architect with Kalray working with product management and engineering. His role is to innovate product features utilizing the Kalray MPPA DPU to solve data center challenges with solutions which improve performance and meet aggressive TCO efficiency targets. Tim has been with Kalray for approximately 4 years. He has worked in the computing and storage industry for 40+ years in innovation, leadership and architectural roles.
