Skip to content

Configure The Test Suite

Emmanuel Mathot edited this page Mar 18, 2024 · 13 revisions

The command line tools rely on a configuration file in YAML format in which information about the various providers is configured. The YAML format is hierarchical and this page will describe each of the nodes for every level.

The software suite contains a sample configuration file that is ready to use apart from the user credentials which have to be replaced with correct ones.

The following sections explain he configuration which consists of two main nodes:

global node

The following settings have a general scope and apply across providers.

  • country_shapefile_path: Path to the shapefile containing country borders; used for building complex queries using country coverage in TC202 and TC212.

The following settings are related to processing-related test scenarios and apply across providers.

  • docker_config: The location of the Docker configuration file (config.json) containing the authentication. This file is required to authenticate with the CDAB Docker repository to install the image for the testing on the virtual machine. This type of file can be obtained by running the following command and authenticating with the repository username and password.
  • connect_retries: Number of attempts to connect to virtual machines via SSH.
  • connect_interval: Interval between attempts to connect to virtual machines via SSH (in seconds, fraction also possible).
  • ca_certificate: An array of optional certificates to be installed on virtual machines used for processing.
  • max_retention_hours: The number of hours after which previously created virtual machines are considered idle (in case they were not deleted properly during test execution, check takes place before creating a new VM on a service provider).

service_providers node

Under this node, each key correspond to a configured Target Site in the benchmark. Each section has a key node that should be identified after the Target Site. Under that key node there are several node for settings related to that provider.

  • max_catalogue_thread: Maximum number of thread querying the target catalogue service in parallel.

  • max_download_thread: Maximum number of thread downloading the target download service in parallel.

  • data: A node with further settings related to catalogue search and download.

  • compute: A node with further settings related to remote execution of test scenarios.

  • storage: A node with further settings related to storage of user-produced data.

The following sections explain the more complex nodes more in detail.

service_providers.<name>.data node

These settings regard the data access of each provider. They must be comprehensive and thus may be complex, especially the definition of collections. It is recommeded to use the sample configuration file and adjust only the credentials. These settings are relevant for all Test Scenarios.

  • url: Data Access Entry Point, usually the URL to the catalogue.

  • credentials: Data Access Credentials, usually in the form username:password or apiKey.

  • s3_key_id: Data Access Object Storage key id used for download using S3 protocol.

  • s3_secret_key: Data Access Object Storage secret key used for download using S3 protocol.

  • catalogue: Data Access Catalogue configuration of the Target Site. This section defines all the catalog and collections available at the Target Site and used for the benchmark.

service_providers.<name>.data.catalogue node

This section has a unique node sets that declares all the Catalogue Sets of the Target Sites. in this section, each set is identified by an arbitrary key and the is additionally classified according to the set type:

  • type: defines the set type.
    • baseline: The catalogue collections' set refers to Common Baseline Sets defined globally in the data node. It requires the reference_set_id setting to set the Baseline set identifier. Those sets are used in TC5O1, TC5O2, TC6O1 & TC6O2).
    • local: The catalogue collections' set is locally managed by the Target Site and defines the Online data sets. Those sets are used is TC503 to measure the Data Offer Consistency.
  • reference_set_id: (Mandatory for type baseline) The reference set id link the current set with a globally defined Common Baseline Sets.
  • reference_target_site: The reference Target Site for the Comparative Catalog Function of the Test Case. This is the key used to defined the Target Site in the service_providers node.

service_providers.<name>.compute node

These settings configure the processing within the cloud infrastructure of the service providers. They are relevant for Remote Test Scenarios.

The cdab-remote-client uses OpenStack which all DIAS providers support. Most of the values for the various keys can be obtained from the OpenStack dashboard of the cloud environment in question (by inspecting the file clouds.yaml that can be downloaded under API Access > Download OpenStack RC File > OpenStack clouds.yaml File), others have to be set with knowledge of concrete items that are configured on the cloud environment in question.

  • connector: The value has to be openstack to which it defaults (making it optional in this case).
  • auth_url: Authentication access point. Obtain value from auth_url key in clouds.yaml.
  • username: Cloud username (same username as for access to the OpenStack dashboard).
  • password: Cloud password (password for user).
  • project_id: Project ID. Obtain value from project_id key in clouds.yaml. This setting is optional.
  • project_name: Project name. Obtain value from project_name key in clouds.yaml.
  • user_domain_name: User domain name. Obtain value from user_domain_name key in clouds.yaml.
  • region_name: Authentication region name. Obtain value from region_name key in clouds.yaml. This setting is optional.
  • interface: Interface. Obtain value from interface key in clouds.yaml.
  • identity_api_version: Identity API version, Obtain value from identity_api_version key in clouds.yaml.
  • volume_api_version: Volume API version (set value to 2 if version 3 is not supported).
  • key_name: Name of predefined public key for SSH connection to new virtual machine. On the OpenStack dashboard, key pairs can be created on the OpenStack dashboard and the private key can be downloaded, check under Compute > Key Pairs.
  • image_name: Name of image to be used for new virtual machine. On the OpenStack dashboard, choose from Compute > Images.
  • flavor_name: Name of flavour (hardware characteristics) for new virtual machine. On the OpenStack dashboard, check under Compute > Instances > Launch Instance > Flavours/Flavors. The value can also be an array, e.g. ['flavourA', 'flavourB']
  • network_name: Name of network to which new virtual machine is connected. On the OpenStack dashboard, check under Network > Networks. This setting is optional. The value can also be an array, e.g. ['networkA', 'networkB']
  • security_group: Name of security group for new virtual machine (optional; this setting might be necessary in order to permit remote access to virtual machines). On the OpenStack dashboard, check under Network > Security groups, if available. This setting is optional.
  • floating_ip: Explicitly assign floating IP (set this to True if public IP addresses are not assigned automatically at the creation of a virtual machine and otherwise to False). On the OpenStack dashboard, check under Network > Floating IPs, if available.
  • floating_ip_network: Network from which to assign floating IP. This setting is optional.
  • private_key_file: Location of the private key file for SSH connections to virtual machine (must correspond to public key in key_name).
  • remote_user: User on virtual machine for SSH connections.
  • use_volume: Create an external volume for docker image and test execution; this is useful for flavours that have very limited main disk. The size of the additional disk is 100 GB.
  • private_key_file: Location of the private key file for SSH connections to virtual machine (must correspond to public key in key_name).
  • remote_user: User on virtual machine for SSH connections.
  • vm_name: Preferred name of virtual machines to be created (sequential number is appended).
  • cost_monthly: Monthly cost for VM of specified flavour (instance type or machine type, see sections below). The default is 0. If there is more than one flavour, the value has to be an array of the same size.
  • cost_hourly: Hourly cost of VM of specified flavour. The default is 0. If there is more than one flavour, the value has to be an array of the same size.
  • currency: Payment currency. The default is EUR.

service_providers.<name>.storage node

These settings configure storage access for the service providers. They are relevant for TS07.

For OpenStack-compatible storage offers, the following settings can be used (same as in the service_providers.<name>.storage section):

  • auth_url: Authentication access point.
  • username: Cloud username.
  • password: Cloud password (password for user).
  • project_id: Project ID.
  • project_name: Project name.
  • user_domain_name: User domain name.

For storage offers supporting the S3 protocol, the following settings have to be used instead:

  • s3_service_url: The service URL for S3 access.
  • s3_key_id: The S3 key ID, a combination of numbers and letters, usually 20 characters long.
  • s3_secret_key: The corresponding S3 secret key, another string consisting of numbers, letters and slashes, usually 40 characters long.

In all cases, settings for the size range of the test content can be made:

  • min_upload_size: Minimum size of generated file to upload (and subsequently download), in MB.
  • max_upload_size: Maximum size of generated file to upload (and subsequently download), in MB.

data node

This global section defines the It has a unique node sets that declares all the Common Baseline Sets of the benchmark. in this section, each set is identified by an arbitrary key used then globally among all service_providers.<name>.data.catalogue node as a reference.

Common Configuration Nodes

This section describes common configuration node.

Catalogue or Data set

This configuration section define a set of data collections. It is used globally in the data node to declare all the Common Baseline Sets of the benchmark and in the [service_providers.<name>.data.catalogue node](#serviabout

💡 More information about the sets and their relationships in the Catalogue Sets page.

Each value of the sets dictionary contains:

  • name: Name and label of the set
  • collections: this dictionary section declares all the Data collection of the set. Each collection is identified by an arbitrary key.
  • parameters: this array section declares all the OpenSearch Parameters common to all collections.

Data collection

Each value of the collections dictionary contains:

  • label: Label of the collection used in the benchmark reporting to label the collections metrics
  • parameters: this array section declares all the OpenSearch Parameters to build the catalogue query that defines the collection

OpenSearch parameter

Each collection in a set is defined by the filters used to build the catalogue query towards the Target Site. Since all target Sites do not used the same filters, the OpenSearch and its extensions are used as a common standard to define those filters. The Test Suiste then maps the filters accordingly for each Target Site's data access catalogue search function.

Each item of the parameters array contains:

  • key: unique key in the array identifying the filter
  • full_name: OpenSearch Fully Qualified Name of the filter in the form {namespace}name. The following namespaces and their filters are supported:
  • value: value of the filter
  • label: label of the filter used in the benchmark reporting to label the metrics.