
Framework Design


This page is currently outdated, but will be updated soon!

This page contains a description of FAST.


Overview

The framework consists of six main layers, as illustrated with colors in the figure above. The bottom layer is the actual hardware, i.e. the CPUs and GPUs. The second layer consists of drivers for this hardware, which are provided by the hardware manufacturers. Next is the library layer in turquoise, which includes several libraries needed by the framework. The libraries in this layer are:

  • OpenCL - An open standard for parallel programming on heterogeneous systems, including multi-core CPUs, GPUs and FPGAs. It is supported by most processor manufacturers, including AMD, NVIDIA and Intel.
  • OpenGL - A cross-platform library for visualization.
  • GL Extension Wrangler (GLEW) - A library for handling OpenGL extensions.
  • Eigen - A fast cross-platform linear algebra library.
  • Qt - A cross-platform graphical user interface (GUI) toolkit.
  • Boost - A C++ utility library.

Next is the core of the framework which is split into several groups:

  • Data objects - Objects for data (both static and dynamic) such as images and meshes, which enable the synchronized processing of such data on a set of heterogeneous devices.
  • Importers/Exporters - Data import and export objects for different formats such as MetaImage (.mhd), raw, ITK and VTK.
  • Streamers - Objects that enable streaming of data.
  • Algorithms - A set of commonly used filtering, segmentation and registration algorithms.
  • Visualization - A set of renderers such as image, segmentation, volume, slice and mesh renderers.
  • Tests - A set of tests for the framework which ensures that all parts of the framework are working properly.
  • Benchmarks - Mechanisms for measuring, assimilating and reporting the runtime of all operations in the framework.

Finally, on top is the application. The framework may be used both as a stand-alone application, which enables benchmarking and testing of a heterogeneous system, and as an external library for other medical image computing applications.

The execution pipeline

FAST uses a demand-driven execution pipeline similar to those used in ITK and VTK. This means that the processing steps are first linked together to form a pipeline, which is not executed until the update method is called on one of the processing objects in the pipeline. This can happen in two ways:

  • Explicitly by calling the update method on an object in the pipeline.
  • Implicitly by a renderer, which calls update on its input connections repeatedly.

A pipeline consists of process objects, which extend the abstract base class ProcessObject. A process object performs processing and may have zero, one or several input connections that connect it to other process objects. Most process objects produce data objects, which extend the abstract base class DataObject. These data objects are added to output ports, which are used to connect a process object to the input ports of other process objects. The following example shows how a process object A is connected to a process object B in FAST.

processObjectA->setInputConnection(processObjectB->getOutputPort());

In this example, process object A uses the output of process object B as input.
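
As a slightly larger, illustrative sketch of a demand-driven pipeline: the importer and filter class names, the New() factory and setFilename() below are assumptions used for illustration; only setInputConnection, getOutputPort and update are described on this page.

// Illustrative pipeline; class names and setFilename() are assumptions.
MetaImageImporter::pointer importer = MetaImageImporter::New();
importer->setFilename("image.mhd");

GaussianSmoothingFilter::pointer filter = GaussianSmoothingFilter::New();
filter->setInputConnection(importer->getOutputPort());

// Nothing has executed yet; execution is triggered on demand:
filter->update(); // the importer executes first, then the filter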

Similar to the newest version of VTK (version 6), FAST uses a pipeline where the data objects are not explicitly part of the pipeline. The figure below shows an example of a simple pipeline with these two types of objects.

The data objects have an internal timestamp (a positive integer) which is updated every time the data is changed. Each process object has a list of timestamps, one per input connection, recording which version of each input data object was used the last time the process object executed. In addition, each process object has a flag indicating whether it has been modified or not; this flag is set when, for example, a parameter or an input connection is changed.

When the update method is called on a process object, it will first call update on all of its input connections, which in turn call update on their input connections. Thus update is called backwards through the pipeline until a process object with no input connections is encountered (e.g. an importer object). If a process object is modified, or one of its parent data objects has a changed timestamp, the object will re-execute by calling its execute method. Each process object implements its own execute method, while the update method is the same for all process objects.
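
As a rough sketch, a new process object could look something like this. Only the fact that each process object implements its own execute() is stated above; the class name, member names and the modified flag shown here are assumptions.

// Sketch of a process object; ThresholdFilter and mIsModified are assumed names.
class ThresholdFilter : public ProcessObject {
    public:
        void setThreshold(float threshold) {
            mThreshold = threshold;
            mIsModified = true; // a later update() will now trigger execute()
        }
    private:
        void execute() {
            // read the input data object, write the thresholded result
            // to the output port
        }
        float mThreshold;
};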

Data organization on heterogeneous devices

Data organization and synchronization is one of the key components of this framework, as everything is built on top of it. Image data is represented by an object called Image, which is used for both 2D and 3D image data. An image object represents an image on all devices (CPUs, GPUs etc.), and its data is guaranteed to be coherent on all devices after being altered. Thus, if an image is changed on one device, the change is propagated to the other devices before the data is used on those devices.

Images of different data types

Medical images are represented in different formats. Some common examples are: Ultrasound (unsigned 8 bit integer), CT (signed/unsigned 16 bit integer) and MR (unsigned 16 bit integer).

The framework currently supports the following data formats for images:

  • TYPE_FLOAT - 32 bit floating point number
  • TYPE_UINT8 - 8 bit unsigned integer
  • TYPE_INT8 - 8 bit signed integer
  • TYPE_UINT16 - 16 bit unsigned integer
  • TYPE_INT16 - 16 bit signed integer

An image can also have multiple channels, or components, and currently 1-4 channels are supported.
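
For illustration, creating a one-channel 2D ultrasound-like image could look roughly like this. The Image class and the TYPE_UINT8 constant are described above; the create call, its signature and the device object are assumptions.

// Illustrative only; the create call is an assumed API and 'device' is obtained elsewhere.
Image::pointer image = Image::New();
image->create2DImage(512, 512, TYPE_UINT8, 1, device); // 512x512, 1 channel, 8 bit unsigned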

Data access

Two forms of data access are possible in the framework: 1) read-only and 2) read and write. The general rule is that several devices can perform read-only operations on an image concurrently. However, if a device needs to write to an image, only that device can have access to the image at that time. This ensures data coherency across devices. Thus, if a device wants to write to an image, it has to wait for all other operations on that image to finish, and while it is writing to the image, no other device can read from or write to that image.

To facilitate this, several DataAccess objects are introduced:

  • OpenCLBufferAccess - provides access to an OpenCL buffer of the image. The pixels are stored as x + y*width (2D) and x + y*width + z*width*height (3D).
  • OpenCLImageAccess2D - provides access to an OpenCL 2D image object. Not valid for 3D images.
  • OpenCLImageAccess3D - provides access to an OpenCL 3D image object. Not valid for 2D images.
  • ImageAccess - provides access to a data array (void*) on the host. The pixels are stored as x + y*width (2D) and x + y*width + z*width*height (3D).

These objects are created by calling one of the following methods on the Image object:

  • getOpenCLBufferAccess()
  • getOpenCLImageAccess2D()
  • getOpenCLImageAccess3D()
  • getImageAccess()

Arguments to these methods are the device that wants to access the image and the type of access desired (read-only or read-and-write). From these objects an OpenCL Image (texture on a GPU) or Buffer object can be retrieved, which is needed to perform OpenCL computations on the image, or a pointer to main memory can be requested for doing processing on the CPU using C++. The DataAccess objects also have methods for releasing the access, thus enabling other devices to perform write operations on the image. The access is also released in the destructor of the access object, to prevent programmers from creating a deadlock. When the access is released, the OpenCL Image/Buffer object pointer is invalidated to ensure that the program can no longer manipulate the data. However, this does not delete the actual data on the device. When access to an object is requested, the framework will check that any previous access objects have been released. If not, an exception is thrown. This ensures that the user does not hold multiple accesses to one image object at the same time.
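
A hedged sketch of this access pattern: the getImageAccess and getOpenCLImageAccess2D methods are listed above, but the access-type arguments, the exact argument lists and the get()/release() method names are assumptions.

// Read-write access on the host; enum values and get()/release() names are assumptions.
ImageAccess access = image->getImageAccess(ACCESS_READ_WRITE);
float* pixels = (float*)access.get(); // pointer to the pixel data in main memory
// ... modify pixels on the CPU ...
access.release(); // other devices may now write to the image

// Read-only access to the OpenCL 2D image object for use in a kernel
OpenCLImageAccess2D clAccess = image->getOpenCLImageAccess2D(ACCESS_READ, device);
cl::Image2D* clImage = clAccess.get();
// if release() is never called, the destructor of clAccess releases the access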

Data change

Every time an image is changed on a device, the change should be reflected on the other devices as well. However, this doesn't have to be done immediately after the change is finished. The updating of data can be done the next time the data is requested on another device. This is often referred to as lazy loading. The benefit of lazy loading is that the number of data transfers can be reduced. However, the drawback is that there will be a transfer cost the next time some processing has to be performed on another device which doesn't have the updated data.

Thus, each image object has a set of flags indicating whether the data (in the form of OpenCL buffers, OpenCL images and the host data array) is up to date for each device. When one device has changed some data, these flags are set to false for all devices except the device on which the change was performed. The next time the data is requested on a device, the flag is checked, and if it is false a data transfer is started and the flag is set to true for that device.
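
Conceptually, this lazy synchronization can be pictured as follows. This is simplified pseudocode with assumed names, not the actual implementation.

// Simplified sketch of the per-device bookkeeping described above.
if(!mDataIsUpToDate[device]) {
    transferDataTo(device);         // copy from a device holding valid data
    mDataIsUpToDate[device] = true; // mark this device's copy as current
}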

Data streaming

Streamers are objects that provide access to dynamic data, for instance real-time images from an ultrasound probe or a series of images stored on disk. The output of a streamer is an object called DynamicData, which has a method called getNextFrame() that returns a data object. The streamers read data into the DynamicData object in a separate thread so that processing and data streaming can be performed concurrently. The process objects in FAST are designed to handle both static and dynamic/temporal data without any change to the code. A small usage sketch follows the list of streaming modes below.

Streamers have three different streaming modes:

  • STREAMING_MODE_NEWEST_FRAME_ONLY - This will only keep the newest frame in the DynamicData object, so that its size is always 1.
  • STREAMING_MODE_PROCESS_ALL_FRAMES - This will keep all frames in the DynamicData object, but will remove the frame from the object after it has been returned by the getNextFrame() method.
  • STREAMING_MODE_STORE_ALL_FRAMES - This will store all frames in the DynamicData object.
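
For illustration, using a streamer could look roughly like this. DynamicData, getNextFrame() and the streaming mode constant are described above; the streamer class name and the setFilenameFormat, setStreamingMode and getOutput methods are assumptions.

// Illustrative streamer usage; several method names are assumptions.
MetaImageStreamer::pointer streamer = MetaImageStreamer::New();
streamer->setFilenameFormat("frame_#.mhd");
streamer->setStreamingMode(STREAMING_MODE_PROCESS_ALL_FRAMES);

DynamicData::pointer dynamicImage = streamer->getOutput();
Image::pointer frame = dynamicImage->getNextFrame(); // returns the next available frame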

Visualization

The Qt framework is chosen as the graphical user interface (GUI) toolkit because:

  • Popular C++ framework, also in the medical domain.
  • Cross-platform. Supports Windows, Linux and Mac.
  • Supports multi-threading. (However, the Qt main/event loop is limited to be run in the main thread)
  • Allows creating widgets (QGLWidget) that can be rendered to directly by OpenGL.
  • Supports event handling (keyboard and mouse).
  • Object oriented (C++).

Currently, the framework has two types of built-in windows: SimpleWindow and DualViewWindow. SimpleWindow has only one view, while DualViewWindow has two views. A view shows one visualization and may be either 2D or 3D. All windows are implemented using a QWidget, while the View object is based on QGLWidget, a widget that can be rendered to directly by OpenGL. The rendering is done by renderer objects. Several different types of renderers are available in the framework (a small usage sketch follows the list):

  • Image renderer - For displaying 2D images.
  • Segmentation renderer - For displaying an image segmentation in 2D.
  • Mesh renderer - Renders a mesh of triangles.
  • Slice renderer - Extracts data from a volume in an arbitrary plane using trilinear interpolation.
  • Volume renderer - Creates an image of a volume using ray casting.
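
A minimal visualization sketch: SimpleWindow and the renderer types are described above, while the ImageRenderer class name and the addRenderer and start methods are assumptions.

// Illustrative rendering setup; addRenderer()/start() are assumed method names.
ImageRenderer::pointer renderer = ImageRenderer::New();
renderer->setInputConnection(importer->getOutputPort()); // importer from an earlier sketch

SimpleWindow::pointer window = SimpleWindow::New();
window->addRenderer(renderer);
window->start(); // runs the Qt event loop; the renderer calls update() on its input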

Tests

FAST also comes with a large set of unit and system tests; these are all located in the Tests folder and use data that can be downloaded separately. The testing framework Catch is used, which consists of only a single header file (catch.hpp). FAST is compiled and tested continuously as new code is contributed to its git repository. This is done on three systems with different CPUs, GPUs and operating systems to ensure that the framework works properly on different platforms and hardware.
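
A Catch test case has the following general shape. TEST_CASE and CHECK_THROWS are real Catch macros; the FAST calls inside are the same illustrative assumptions used in the earlier sketches.

#include "catch.hpp"

TEST_CASE("Importer throws when the file does not exist", "[fast][importer]") {
    MetaImageImporter::pointer importer = MetaImageImporter::New();
    importer->setFilename("does_not_exist.mhd"); // assumed setter, as above
    CHECK_THROWS(importer->update());
}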
