MK edited this page Oct 9, 2019 · 12 revisions

pFaces is a generic, cloud-ready, multi-compute-platform accelerator for parallel programs. It assumes a heterogeneous model of compute platforms, as shown in the next figure. A hardware configuration (HWC) is a network of compute nodes. Compute nodes (CNs) are containers of compute devices. Compute devices are responsible for running the kernels of pFaces, and they can be of various types. For example, a compute device can be a CPU with multiple cores, a GPU, or a HW-accelerator card. Each compute device contributes one or more processing elements (PEs) to the HWC, such as the cores inside a CPU. Each PE can run one or more threads of serial instructions.
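The hierarchy described above (HWC, compute nodes, compute devices, processing elements) can be pictured as a small set of nested data structures. The following is only a minimal sketch: every type and function name in it (`DeviceType`, `ComputeDevice`, `ComputeNode`, `HardwareConfiguration`, `totalProcessingElements`) is hypothetical and chosen for illustration, not taken from the pFaces SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; they are not part of the pFaces SDK.
enum class DeviceType { CPU, GPU, HWA };

// A compute device contributes one or more processing elements (PEs).
struct ComputeDevice {
    std::string name;
    DeviceType type;
    std::size_t processingElements;
};

// A compute node (CN) is a container of compute devices.
struct ComputeNode {
    std::string hostname;
    std::vector<ComputeDevice> devices;
};

// A hardware configuration (HWC) is a network of compute nodes.
struct HardwareConfiguration {
    std::vector<ComputeNode> nodes;
};

// Sum the PEs contributed by every device on every node of the HWC.
inline std::size_t totalProcessingElements(const HardwareConfiguration& hwc) {
    std::size_t total = 0;
    for (const ComputeNode& node : hwc.nodes) {
        for (const ComputeDevice& device : node.devices) {
            // The model says each device contributes at least one PE.
            assert(device.processingElements >= 1);
            total += device.processingElements;
        }
    }
    return total;
}
```

In this picture, a cluster of two machines is an HWC with two `ComputeNode` entries, and the parallelism available to a kernel is the sum of the PEs over all of their devices.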

pFaces has an acceleration engine that accepts kernels representing the code to be accelerated. The user develops the kernel in C++ and OpenCL. pFaces comes with an SDK that users can use to develop kernels independently of the pFaces accelerator. Once the kernel is developed, it can be passed to pFaces as a loadable object (a.k.a. plugin), which pFaces executes on the HWC. Users can also control the behavior of their kernel via text configuration files, without modifying the kernel object. The following figure shows the main components inside pFaces that load the supplied kernel and run it on the HWC.

Features of pFaces

  • A task-independent engine allowing for extensible usage and automated acceleration of any compute-intensive algorithm.
  • Implements a hybrid MPI/OpenCL model targeting non-uniform clusters of heterogeneous parallel compute devices, including CPUs, GPUs and HW-accelerators.
  • Automated identification and utilization of node-level computation power and memory matching the case under acceleration.
  • Supports automated online compilation of source kernels to match and exploit the features of the executing devices.
  • Customizable logging engine.
  • Native support for Linux, Windows and MacOS operating systems.

pFaces is intended to be a commercial acceleration ecosystem. We provide here a demo version of pFaces with the following limitations:

  • Running within a single compute-node. This means: you cannot use this demo version of pFaces within a cluster of multiple interconnected machines.
  • Running pFaces with one CPU, one GPU, or one HWA. This means: you cannot combine multiple devices to increase the number of processing elements. The targeted device cannot have more than 20 compute units (e.g., 20 processor cores in an Intel CPU).
  • We provide only binaries that are compatible with Ubuntu 16.04, Ubuntu 18.04, Windows x64, and MacOS.
  • Kernel auto-tuning is disabled.
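As a quick way to reason about whether a planned run fits the demo version, the limits listed above can be written as a single predicate. This is a sketch under stated assumptions: `RunRequest`, `kDemoMaxComputeUnits` and `fitsDemoLimits` are hypothetical names invented here, and pFaces itself may enforce the limits differently.

```cpp
#include <cstddef>

// Hypothetical description of a planned run; not a pFaces type.
struct RunRequest {
    std::size_t computeNodes;  // machines taking part in the run
    std::size_t devices;       // compute devices used on the node
    std::size_t computeUnits;  // compute units of the targeted device
    bool autoTuning;           // whether kernel auto-tuning is requested
};

// Upper bound on compute units stated for the demo version.
constexpr std::size_t kDemoMaxComputeUnits = 20;

// True if the request stays within the demo limits: one node, one device,
// at most 20 compute units, and no kernel auto-tuning.
constexpr bool fitsDemoLimits(const RunRequest& r) {
    return r.computeNodes == 1
        && r.devices == 1
        && r.computeUnits >= 1
        && r.computeUnits <= kDemoMaxComputeUnits
        && !r.autoTuning;
}

// A single 20-core CPU fits; a two-machine cluster does not.
static_assert(fitsDemoLimits(RunRequest{1, 1, 20, false}), "single 20-unit device fits");
static_assert(!fitsDemoLimits(RunRequest{2, 1, 8, false}), "multi-node runs do not fit");
```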

Interested users/developers can request exclusive, non-redistributable, EULA-covered access to a fully functional version by contacting Mahmoud Khaled personally.

Kernels in pFaces

Kernels in pFaces are developed in C++ and in an extensible version of OpenCL that uses some inline pFaces-specific code. Any legacy OpenCL kernel function can be reimplemented to work with pFaces.

The C++ code is used to coordinate the execution of the OpenCL code, implement task-parallel jobs using C++ threading, and communicate with pFaces. The compiled version of the C++ code is called the kernel driver. The kernel driver may also render the kernel's OpenCL code based on the provided configuration files.
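One simple way to picture a driver "rendering" OpenCL code from configuration values is placeholder substitution in the kernel source before it is compiled. The sketch below assumes a made-up `@@KEY@@` placeholder syntax and a hypothetical helper `renderKernelSource`; neither is taken from the pFaces SDK, whose actual mechanism may differ.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper, not a pFaces SDK function: replaces every occurrence of
// "@@key@@" in an OpenCL source template with the matching configuration value.
inline std::string renderKernelSource(std::string source,
                                      const std::map<std::string, std::string>& config) {
    for (const auto& entry : config) {
        const std::string placeholder = "@@" + entry.first + "@@";
        std::size_t pos = 0;
        while ((pos = source.find(placeholder, pos)) != std::string::npos) {
            source.replace(pos, placeholder.size(), entry.second);
            pos += entry.second.size();  // continue after the inserted text
        }
    }
    return source;
}
```

With a configuration of `{"REAL": "float", "DIM": "3"}`, a template line such as `__global @@REAL@@* x` would become `__global float* x`, so the same kernel source can be specialized per run without touching the compiled driver.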

The OpenCL code is used to implement data-parallel tasks. Upon launching the kernel in pFaces, and after the driver configures the OpenCL code, pFaces takes care of compiling the OpenCL code on the fly to match the HW and the task. pFaces then starts to execute a list of instructions provided by the driver. This list specifies the order of execution between the task-parallel jobs, which live in the pre-built C++ code, and the data-parallel jobs, which live in the OpenCL code just compiled on the fly. While executing the instructions, pFaces collects profiling information and later hands it to the user as a report.
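The execution flow described above, an ordered list of task-parallel and data-parallel jobs that is run while timing information is collected, can be sketched as follows. All names here (`JobKind`, `Instruction`, `ProfileEntry`, `runInstructions`) are hypothetical and only illustrate the idea; the real pFaces interface is not shown on this page and may look quite different. A `std::function` stands in for either a C++ job or an OpenCL kernel launch.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration; the real pFaces SDK interface may differ.
enum class JobKind { TaskParallel, DataParallel };

struct Instruction {
    std::string name;
    JobKind kind;
    std::function<void()> action;  // stands in for a C++ job or an OpenCL kernel launch
};

struct ProfileEntry {
    std::string name;
    JobKind kind;
    std::chrono::nanoseconds elapsed;
};

// Run the instructions in the order given by the driver and time each one.
inline std::vector<ProfileEntry> runInstructions(const std::vector<Instruction>& program) {
    std::vector<ProfileEntry> report;
    report.reserve(program.size());
    for (const Instruction& instruction : program) {
        const auto start = std::chrono::steady_clock::now();
        if (instruction.action) {  // skip instructions without an attached job
            instruction.action();
        }
        const auto stop = std::chrono::steady_clock::now();
        report.push_back(ProfileEntry{
            instruction.name, instruction.kind,
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)});
    }
    return report;
}
```

The returned vector plays the role of the profiling report: one entry per instruction, in execution order, with the measured wall-clock time of that step.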

The following figure shows what an example kernel driver can do to run a parallel program. All the boxes in the diagram represent calls to pFaces to achieve the required action.