Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve testing infrastructure. #871

Merged
merged 1 commit into from
Dec 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
131 changes: 131 additions & 0 deletions doc/developer_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -91,3 +91,134 @@ release. Once this is done perform a final round of updates:
* Update BlueBrain Spack recipe to use the archive and not the Git commit.
* Update the upstream Spack recipe.

## Writing Tests
### Generate Multi-Dimensional Test Data
Input array of any dimension and type can be generated using the template class
`DataGenerator`. For example:
```
auto dims = std::vector<size_t>{4, 2};
auto values = testing::DataGenerator<std::vector<std::array<double, 2>>::create(dims);
Copy link
Collaborator

@tdegeus tdegeus Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a library for this! For example, you can use the NumPy equivalent xtensor. Writing and maintaining some extra custom code is extra work and error-prone. I would say that testable data are as simple as

xt::xarray<int> a = xt::arange<int>(5 * 4 * 3).reshape({5, 4, 3}); 

Then you can use functions like ravel, ravel_index, ... just like in NumPy. No new API, no new code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to test the following:

template<class Container>
void check_write_dataset(const Container& container) {
  file.createDataSet("foo", container);
}

Hence, I need to able to create arrays of all supported types

std::vector<T>
std::vector<std::array<T, 4>>
std::array<std::vector<T>, 4>
boost::
// xt:: (once supported in core)

and so forth, and be able to fill them with exactly the correct values. Being able to have any container filled with sensible values, requires being able to a) allocate containers of a given shape, b) assign values. Those two feature make up a good chunk of the PR. Then there very minor niceties like copying, etc.

The second issue is that I need a somewhat uniform API for creating the product of template<class T> Container with a bunch of different scalar types T. That's the other large chunk of the PR.

How does an xt::array help in this situation? Yes, I can test XTensor, but I can't test writing of any of the other containers. If I use XTensor arrays to write, I can read the other containers, but I still can't check that the values are correct, because I don't have access to some API unifying trait to hide the difference of a[i][j], a(i,j) etc. (gets worse in 3D).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a pressing reason to use a custom implementation, of course, it's fine. Anyway, this is not a public implementation, nor will change a lot.

I just thought it would be easier to write some casters but avoid implementing (un)ravel(index) functions and equality operators.

```
Generates an `std::vector<std::array<double, 2>>` initialized with suitable
values.

If "suitable" isn't specific enough, one can specify a callback:
```
auto callback = [](const std::vector<size_t>& indices) {
return 42.0;
}

auto values = testing::DataGenerator<std::vector<double>>::create(dims, callback);
```

The `dims` can be generated via `testing::DataGenerator::default_dims` or by
using `testing::DataGenerator::sanitize_dims`. Remember, that certain
containers are fixed size and that we often compute the number of elements by
multiplying the dims.

### Generate Scalar Test Data
To generate a single "suitable" element use template class `DefaultValues`, e.g.
```
auto default_values = testing::DefaultValues<double>();
auto x = testing::DefaultValues<double>(indices);
```

### Accessing Elements
To access a particular element from an unknown container use the following trait:
```
using trait = testing::ContainerTraits<std::vector<std::array<int, 2>>;
// auto x = values[1][0];
auto x = trait::get(values, {1, 0});
Copy link
Collaborator

@tdegeus tdegeus Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the command above:

auto x = a(1, 0);

there is also

auto y = a.flat(10)

to read directly from the flat storage.

(comment applicable below)


// values[1][0] = 42.0;
trait::set(values, {1, 0}, 42.0);
```

### Utilities For Multi-Dimensional Arrays
Use `testing::DataGenerator::allocate` to allocate an array (without filling
it) and `testing::copy` to copy an array from one type to another. There's
`testing::ravel`, `testing::unravel` and `testing::flat_size` to compute the
position in a flat array from a multi-dimensional index, the reverse and the
number of element in the multi-dimensional array.

### Deduplicating DataSet and Attribute
Due to how HighFive is written testing `DataSet` and `Attribute` often requires
duplicating the entire test code because somewhere a `createDataSet` must be
replaced with `createAttribute`. Use `testing::AttributeCreateTraits` and
`testing::DataSetCreateTraits`. For example,
```
template<class CreateTraits>
void check_write(...) {
// Same as one of:
// file.createDataSet(name, values);
// file.createAttribute(name, values);
CreateTraits::create(file, name, values);
}
```

### Test Organization
#### Multi-Dimensional Arrays
All tests for reading/writing whole multi-dimensional arrays to datasets or
attributes belong in `tests/unit/tests_high_five_multi_dimensional.cpp`. This
includes write/read cycles; checking all the generic edges cases, e.g. empty
arrays and mismatching sizes; and checking non-reallocation.

Read/Write cycles are implemented in two distinct checks. One for writing and
another for reading. When checking writing we read with a "trusted"
multi-dimensional array (a nested `std::vector`), and vice-versa when checking
reading. This matters because certain bugs, like writing a column major array
as if it were row-major can't be caught if one reads it back into a
column-major array.

Remember, `std::vector<bool>` is very different from all other `std::vector`s.

Every container `template<class T> C;` should at least be checked with all of
the following `T`s that are supported by the container: `bool`, `double`,
`std::string`, `std::vector`, `std::array`. The reason is `bool` and
`std::string` are special, `double` is just a POD, `std::vector` requires
dynamic memory allocation and `std::array` is statically allocated.

Similarly, each container should be put inside an `std::vector` and an
`std::array`.

#### Scalar Data Set
Write-read cycles for scalar values should be implemented in
`tests/unit/tests_high_five_scalar.cpp`.

#### Data Types
Unit-tests related to checking that `DataType` API, go in
`tests/unit/tests_high_data_type.cpp`.

#### Selections
Anything selection related goes in `tests/unit/test_high_five_selection.cpp`.
This includes things like `ElementSet` and `HyperSlab`.

#### Strings
Regular write-read cycles for strings are performed along with the other types,
see above. This should cover compatibility of `std::string` with all
containers. However, additional testing is required, e.g. character set,
padding, fixed vs. variable length. These all go in
`tests/unit/test_high_five_string.cpp`.

#### Specific Tests For Optional Containers
If containers, e.g. `Eigen::Matrix` require special checks those go in files
called `tests/unit/test_high_five_*.cpp` where `*` is `eigen` for Eigen.

#### Memory Layout Assumptions
In HighFive we make assumptions about the memory layout of certain types. For
example, we assume that
```
auto array = std::vector<std::array<double, 2>>(n);
doube * ptr = (double*) array.data();
```
is a sensible thing to do. We assume similar about `bool` and
`details::Boolean`. These types of tests go into
`tests/unit/tests_high_five_memory_layout.cpp`.

#### H5Easy
Anything `H5Easy` related goes in files with the appropriate name.

#### Everything Else
What's left goes in `tests/unit/test_high_five_base.cpp`. This covers opening
files, groups, dataset or attributes; checking certain pathological edge cases;
etc.
70 changes: 70 additions & 0 deletions tests/unit/create_traits.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
#pragma once

namespace HighFive {
namespace testing {

/// \brief Trait for `createAttribute`.
///
/// The point of these is to simplify testing. The typical issue is that we
/// need to write the tests twice, one with `createDataSet` and then again with
/// `createAttribute`. This trait allows us to inject this difference.
struct AttributeCreateTraits {
using type = Attribute;

template <class Hi5>
static Attribute get(Hi5& hi5, const std::string& name) {
return hi5.getAttribute(name);
}


template <class Hi5, class Container>
static Attribute create(Hi5& hi5, const std::string& name, const Container& container) {
return hi5.createAttribute(name, container);
}

template <class Hi5>
static Attribute create(Hi5& hi5,
const std::string& name,
const DataSpace& dataspace,
const DataType& datatype) {
return hi5.createAttribute(name, dataspace, datatype);
}

template <class T, class Hi5>
static Attribute create(Hi5& hi5, const std::string& name, const DataSpace& dataspace) {
auto datatype = create_datatype<T>();
return hi5.template createAttribute<T>(name, dataspace);
}
};

/// \brief Trait for `createDataSet`.
struct DataSetCreateTraits {
using type = DataSet;

template <class Hi5>
static DataSet get(Hi5& hi5, const std::string& name) {
return hi5.getDataSet(name);
}

template <class Hi5, class Container>
static DataSet create(Hi5& hi5, const std::string& name, const Container& container) {
return hi5.createDataSet(name, container);
}

template <class Hi5>
static DataSet create(Hi5& hi5,
const std::string& name,
const DataSpace& dataspace,
const DataType& datatype) {
return hi5.createDataSet(name, dataspace, datatype);
}

template <class T, class Hi5>
static DataSet create(Hi5& hi5, const std::string& name, const DataSpace& dataspace) {
auto datatype = create_datatype<T>();
return hi5.template createDataSet<T>(name, dataspace);
}
};

} // namespace testing
} // namespace HighFive
Loading