Test 2 Files

This document describes the structure of the files generated by the execution of Scior Test 2. For a complete description of Test 2, please access this link.

Each section below covers a different type of file. The files generated here are equivalent to those generated in Test 1, so this document relies on the descriptions in that test's documentation and highlights only the differences. The definitions of the nomenclature used for the files presented here can be accessed in this link.

Test 2 was executed in both the complete and incomplete Scior modes (more information about the Scior execution modes is presented in this link), generating the content in two folders inside each of the catalog's datasets: tt002_ac and tt002_an, respectively. These folders contain the same types of documents described here, but with the different content that resulted from each execution mode.

Execution Statistics csv Files

The execution statistics csv files generated by Test 2 have the same structure as the ones generated in Test 1 (i.e., all columns available in those files are also available in the files of this test), with the addition of the following column:

  • percentage: integer that indicates the percentage of the model's classes that were used as input for Test 2

To access the complete description of the execution statistics csv file in Test 1, click here.

This file is generated according to the pattern statistics_XXX_tt002_MM_txYYY_exZZZ_pcKKK.csv (e.g., statistics_aguiar2018rdbs-o_tt002_an_tx001_ex001_pc005.csv, for the first execution using 5 percent of classes as input of the taxonomy 001 of the dataset aguiar2018rdbs-o).
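The naming pattern above can be split back into its fields with a regular expression. The following is a minimal sketch, not part of the Scior-Tester: the function name parse_statistics_name is hypothetical, and the three-digit, zero-padded width of the taxonomy, execution, and percentage fields is an assumption inferred from the example file name.

```python
import re

# Assumed layout, inferred from the documented example file name:
# three-digit, zero-padded taxonomy, execution, and percentage fields.
STATISTICS_NAME = re.compile(
    r"^statistics_(?P<dataset>.+)_tt002_(?P<mode>ac|an)"
    r"_tx(?P<taxonomy>\d{3})_ex(?P<execution>\d{3})_pc(?P<percentage>\d{3})\.csv$"
)


def parse_statistics_name(file_name):
    """Split a Test 2 statistics file name into its fields (hypothetical helper)."""
    match = STATISTICS_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a Test 2 statistics file name: {file_name!r}")
    fields = match.groupdict()
    # Convert the numeric fields from zero-padded strings to integers.
    for key in ("taxonomy", "execution", "percentage"):
        fields[key] = int(fields[key])
    return fields


fields = parse_statistics_name(
    "statistics_aguiar2018rdbs-o_tt002_an_tx001_ex001_pc005.csv"
)
```

Here fields["mode"] is "an" (the tt002_an folder) and fields["percentage"] is 5, matching the example given above.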

Execution Times csv Files

As with the execution statistics files, the percentage column is the only difference between the execution times files generated for Test 1 and for Test 2. The new column indicates the percentage of classes used as input for the results in that line.

To access the complete description of the execution times csv file in Test 1, click here.

This file is generated according to the pattern times_XXX_tt002_MM_txYYY_exZZZ_pcKKK.csv (e.g., times_aguiar2018rdbs-o_tt002_an_tx001_ex001_pc005.csv, for the first execution using 5 percent of classes as input of the taxonomy 001 of the dataset aguiar2018rdbs-o).

Settings csv Files

This file extends the one presented in Test 1 (available here), registering not only information about the hardware and software used, but also the configuration set for performing Test 2.

A single file named settings_XXX_tt002_MM.csv is generated once for each model (e.g., settings_aguiar2018rdbs-o_tt002_an.csv).

In addition to the columns described for the Settings csv file in Test 1, the corresponding file for Test 2 contains the following columns:

  • minimum_allowed_number_classes
  • percentage_initial
  • percentage_final
  • percentage_rate
  • number_of_executions_per_dataset_per_percentage

These columns correspond to the variables that must be set by the user before executing Test 2, as described here.

Because this file is created only once for each test in a dataset, it always has exactly two lines: the first with the headers and the second with the values.
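Since the settings file holds one header line and one line of values, it can be read as a single dictionary. The sketch below assumes a comma-delimited file; the function name read_settings is hypothetical, and the sample values are invented for illustration (only the Test 2 columns are shown, while a real file also carries the Test 1 columns).

```python
import csv
import io

# Columns that Test 2 adds to the settings file, as listed above.
TEST2_SETTINGS_COLUMNS = (
    "minimum_allowed_number_classes",
    "percentage_initial",
    "percentage_final",
    "percentage_rate",
    "number_of_executions_per_dataset_per_percentage",
)


def read_settings(stream):
    """Return the single row of values of a settings file as a dict."""
    rows = list(csv.DictReader(stream))
    if len(rows) != 1:
        raise ValueError(f"expected one row of values, found {len(rows)}")
    row = rows[0]
    missing = [name for name in TEST2_SETTINGS_COLUMNS if name not in row]
    if missing:
        raise ValueError(f"missing Test 2 columns: {missing}")
    return row


# Hypothetical content: the values below are invented for illustration.
SAMPLE = (
    "minimum_allowed_number_classes,percentage_initial,percentage_final,"
    "percentage_rate,number_of_executions_per_dataset_per_percentage\n"
    "20,10,90,10,5\n"
)
settings = read_settings(io.StringIO(SAMPLE))
```

The values come back as strings, so a caller would convert them (for example with int) before using them numerically.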

Inconsistencies csv File

This file reports each execution of Test 2 that leads to an inconsistency in the dataset taxonomy being evaluated. A single file is created for the whole test. The csv file contains the following columns:

  • taxonomy_name: a string representing the name of the taxonomy file in which the inconsistency was detected.
  • percentage: integer that indicates the percentage of the model's classes that were used as input for Test 2 when the inconsistency was found.
  • execution_number: the first column of the csv file. It registers the number of the test execution in which the inconsistency was detected.

This file is generated according to the pattern inconsistencies_tt002_MM.csv (e.g., inconsistencies_tt002_ac.csv).

The Scior-Tester creates this file only when Test 2 detects at least one inconsistency during its executions.
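Because the inconsistencies file may not exist, code that consumes it should treat a missing file as "no inconsistencies" rather than as an error. The following is a minimal sketch under that reading; the function names and the folder argument are hypothetical (this document does not state where the file is written), and a comma-delimited file is assumed.

```python
import csv
from collections import Counter
from pathlib import Path


def load_inconsistencies(folder, mode):
    """Return the rows of inconsistencies_tt002_<mode>.csv, or [] if absent."""
    path = Path(folder) / f"inconsistencies_tt002_{mode}.csv"
    if not path.exists():
        # The file is only created when at least one inconsistency was found.
        return []
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_by_percentage(rows):
    """Count the reported inconsistencies for each input percentage."""
    return dict(Counter(int(row["percentage"]) for row in rows))
```

Rows are read by column name, so the sketch does not depend on the order of the columns in the file.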

Results yaml Files

There are no structural differences between the results yaml files generated for Test 1 and for Test 2. To access the complete description of the results yaml files in Test 1, click here.

This file is generated according to the pattern complete_XXX_tt002_MM_txYYY_exZZZ_pcKKK.yaml (e.g., complete_aguiar2018rdbs-o_tt002_an_tx001_ex001_pc005.yaml, for the first execution using 5 percent of classes as input of the taxonomy 001 of the dataset aguiar2018rdbs-o).

Results csv Files

There are no structural differences between the results csv files generated for Test 1 and for Test 2. To access the complete description of the results csv files in Test 1, click here.

This file is generated according to the pattern simple_XXX_tt002_MM_txYYY_exZZZ_pcKKK.csv (e.g., simple_aguiar2018rdbs-o_tt002_an_tx001_ex001_pc005.csv, for the first execution using 5 percent of classes as input of the taxonomy 001 of the dataset aguiar2018rdbs-o).

Knowledge Matrix csv Files

There are no structural differences between the knowledge matrix csv files generated for Test 1 and for Test 2. To access the complete description of the knowledge matrix csv files in Test 1, click here.

This file is generated according to the pattern matrix_XXX_tt002_MM_txYYY_exZZZ_pcKKK.csv (e.g., matrix_aguiar2018rdbs-o_tt002_an_tx001_ex001_pc005.csv, for the first execution using 5 percent of classes as input of the taxonomy 001 of the dataset aguiar2018rdbs-o).
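The statistics, times, complete, simple, and matrix files share the same suffix and differ only in prefix and extension, so the names of all files for one execution can be derived together. This is a sketch rather than the Scior-Tester's own code: the function name execution_file_names is hypothetical, and the three-digit zero padding is an assumption taken from the example file names.

```python
# Prefix and extension of each per-execution file, from the documented patterns.
PER_EXECUTION_FILES = {
    "statistics": "csv",
    "times": "csv",
    "complete": "yaml",
    "simple": "csv",
    "matrix": "csv",
}


def execution_file_names(dataset, mode, taxonomy, execution, percentage):
    """Build the per-execution file names for one Test 2 run (hypothetical helper)."""
    if mode not in ("ac", "an"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumed: numeric fields are zero-padded to three digits, as in the examples.
    suffix = (
        f"{dataset}_tt002_{mode}"
        f"_tx{taxonomy:03d}_ex{execution:03d}_pc{percentage:03d}"
    )
    return {
        prefix: f"{prefix}_{suffix}.{extension}"
        for prefix, extension in PER_EXECUTION_FILES.items()
    }


names = execution_file_names("aguiar2018rdbs-o", "an", 1, 1, 5)
```

With these arguments, names["matrix"] reproduces the example file name given above for the knowledge matrix file.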

Divergences csv Files

There are no structural differences between the Divergences csv files generated for Test 1 and for Test 2. To access the complete description of the Divergences csv files in Test 1, click here.

This file is generated in the catalog/ folder according to the pattern divergences_tt002_MM.csv (e.g., divergences_tt002_an.csv).