The Scior-Tester Test 2 evaluates the amount of information inferred through successively increasing the quantity of input provided.
Like in Test 1, only eligible classes are counted and used as possible inputs in Test 2. Eligible classes are the ones with a gUFO ontological category different from the string "other" (click here for more information).
All tests implemented in the Scior-Tester execute Scior as a package. The tests call the Scior function run_scior_tester. This function is a lighter version of the regular Scior function: it implements the same rules and follows the same logic, but it does not accept all arguments and does not output information (neither on screen nor in files).
The aim of this test is to evaluate how information discovery varies as the amount of input provided increases. By analyzing the generated dataset, a user should be able to answer the following general research questions:
- On average, how much information must a user provide to a model in order to have more inferred information than asserted information? I.e., at which point does a user have more information discovered than provided?
- On average, how much information must a user provide to a model so that all information about the model is completely known?
- On average, how much information must a user provide to a model to discover whether it is consistent or not? I.e., at which point can a user detect that a model is inconsistent?
Regarding utility for specific models, the analysis of the data resulting from Test 2 can answer questions like:
- Were inconsistencies or incompleteness found in the model?
- Do the inferred classifications match the original classifications?
The Tester does not perform any analysis; it only provides the data to be analyzed further by interested users. We provide the resulting files generated by an execution of Test 2 in a dedicated repository created for hosting this dataset. Any user can access this dataset using this link. Please note that, unlike Test 1, whose execution always generates the same resulting data, different executions of Test 2 generate different datasets, even when using the same configuration, because the input classes are selected at random.
Test 2 requires some variables to be set before its execution. First, the user must specify whether the Tester should treat the models to be tested as complete or incomplete (more information in this link), which is done by editing the variable is_complete in the file __init__.py. For the Tester, Scior's is_automatic configuration is always set to true.
The other variables to be configured in the same file are:
- MINIMUM_ALLOWED_NUMBER_CLASSES: an integer corresponding to the minimum number of classes that a taxonomy must have to be tested. I.e., if the taxonomy has fewer classes than this value, it is skipped by Test 2
- PERCENTAGE_INITIAL: a float number corresponding to the first percentage of classes used as input to be tested (e.g., 10, corresponding to 10% of the classes of a taxonomy)
- PERCENTAGE_FINAL: a float number corresponding to the last percentage of classes used as input to be tested (e.g., 90, corresponding to 90% of the classes of a taxonomy)
- PERCENTAGE_RATE: a float number that represents the rate at which PERCENTAGE_INITIAL is increased until it reaches the PERCENTAGE_FINAL value (e.g., 10, corresponding to an increase of 10 percentage points for each execution round)
- NUMBER_OF_EXECUTIONS_PER_DATASET_PER_PERCENTAGE: an integer representing the number of times an execution with the same percentage of input (and therefore the same input quantity) is repeated using different input classes (e.g., 5 executions)
As an example, consider the following scenario:
- Two datasets A and B, with A having two taxonomies A1 and A2 with 5 and 20 eligible classes, respectively, and with B having a single taxonomy B1 with 30 eligible classes
- The following values for the configuration variables: MINIMUM_ALLOWED_NUMBER_CLASSES = 10, PERCENTAGE_INITIAL = 10, PERCENTAGE_FINAL = 90, PERCENTAGE_RATE = 20, NUMBER_OF_EXECUTIONS_PER_DATASET_PER_PERCENTAGE = 5
Here, we have:
- The following percentages of inputs are going to be executed: 10, 30, 50, 70, and 90
- For each of these five percentages, Scior is going to be executed 5 times with different classes as input
- A1 will not be tested, as it has only 5 eligible classes, a number lower than MINIMUM_ALLOWED_NUMBER_CLASSES
- A2 is going to be tested with the following quantity of input for each percentage: 2 (10%), 6 (30%), 10 (50%), 14 (70%), and 18 (90%)
- B1 is going to be tested with the following quantity of input for each percentage: 3 (10%), 9 (30%), 15 (50%), 21 (70%), and 27 (90%)
- For 10%, A2 is going to be executed 5 times with different sets of two classes as inputs (e.g., classes A11 and A12 in execution 1, classes A11 and A15 in execution 2, classes A12 and A13 in execution 3, etc.)
In this example, every percentage yields a whole number of classes. If the resulting number of classes is a decimal number, it is rounded to its nearest integer value.
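The arithmetic of the example above can be reproduced with a short Python sketch. The helper names percentage_schedule and input_sizes are hypothetical and are not taken from the Tester's code. The threshold check uses >= on the assumption that a taxonomy with exactly the minimum number of classes is still tested, and the schedule stops once the next value would exceed PERCENTAGE_FINAL.

```python
def percentage_schedule(initial, final, rate):
    """Return the tested percentages, from initial up to final in steps of rate."""
    schedule = []
    current = initial
    while current <= final:  # assumption: stop once the next value would exceed final
        schedule.append(current)
        current += rate
    return schedule


def input_sizes(num_eligible, schedule):
    """Map each percentage to the number of input classes, rounded with round()."""
    return {pct: round(num_eligible * pct / 100) for pct in schedule}


MINIMUM_ALLOWED_NUMBER_CLASSES = 10
taxonomies = {"A1": 5, "A2": 20, "B1": 30}  # eligible classes per taxonomy

schedule = percentage_schedule(10, 90, 20)
plan = {
    name: input_sizes(count, schedule)
    for name, count in taxonomies.items()
    if count >= MINIMUM_ALLOWED_NUMBER_CLASSES  # assumption: the minimum itself is allowed
}

for name, sizes in plan.items():
    print(name, sizes)
```

Under these assumptions, A1 is absent from plan, and the sizes computed for A2 and B1 match the quantities listed in the example.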
For each possible completeness configuration (complete or incomplete), the Tester creates a different folder inside each dataset folder for storing the test's resulting files. The folders created by Test 2 are tt002_ac (when is_complete equals true) and tt002_an (when is_complete equals false).
Test 2 is executed for each taxonomy (of all datasets) that has more eligible classes than MINIMUM_ALLOWED_NUMBER_CLASSES.
For every taxonomy (ttl files generated by the build function), the Tester creates a list of valid input classes (the eligible classes) in the same way Test 1 does (click here for more information). The size of the input list is derived from the percentage currently being tested. If the resulting number of classes is a decimal number, it is rounded to the nearest integer using the Python function round.
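A minimal sketch of how such an input list could be built is shown below. The function build_input_list and its parameters are hypothetical, and random.sample is only an assumed way of drawing the classes, not necessarily what the Tester does. One detail worth keeping in mind is that Python 3's round rounds exact halves to the nearest even integer, so 2.5 becomes 2 while 7.5 becomes 8.

```python
import random


def build_input_list(eligible_classes, percentage, rng=None):
    """Pick a random subset of the eligible classes, sized by the percentage."""
    rng = rng or random.Random()
    size = round(len(eligible_classes) * percentage / 100)
    return rng.sample(eligible_classes, size)  # distinct elements, no repetition


eligible = [f"class_{i}" for i in range(25)]

# 10% of 25 classes is 2.5, which round() turns into 2 (half to even).
inputs_10 = build_input_list(eligible, 10, random.Random(0))

# 30% of 25 classes is 7.5, which round() turns into 8 (half to even).
inputs_30 = build_input_list(eligible, 30, random.Random(0))

print(len(inputs_10), len(inputs_30))
```

Passing a seeded random.Random instance, as done here, makes a single draw repeatable, which can help when comparing runs.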
Each tested percentage is executed NUMBER_OF_EXECUTIONS_PER_DATASET_PER_PERCENTAGE times and, for each of these executions, a new input list is created (with the same size but containing different elements). The first percentage tested is PERCENTAGE_INITIAL. After finishing the test of the current percentage, the software increases the current percentage by PERCENTAGE_RATE percentage points and repeats this process until the current percentage equals PERCENTAGE_FINAL.
If Scior reports at least one inconsistency, the Tester registers it in a specific file (more information in this link). After that, the Tester interrupts the current execution and starts the next one.
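The overall procedure described above can be sketched as a pair of nested loops. This is an illustration only: run_test2, run_once and fake_run are hypothetical names, run_once stands in for the call to Scior, and signalling an inconsistency through a ValueError is an assumption made for the sketch rather than Scior's documented behavior.

```python
import random


def run_test2(eligible_classes, initial, final, rate, executions, run_once, rng=None):
    """Run every percentage `executions` times; return (records, inconsistencies)."""
    rng = rng or random.Random()
    records, inconsistencies = [], []
    percentage = initial
    while percentage <= final:
        size = round(len(eligible_classes) * percentage / 100)
        for execution in range(1, executions + 1):
            inputs = rng.sample(eligible_classes, size)  # new input list per execution
            try:
                result = run_once(inputs)  # hypothetical stand-in for the Scior call
            except ValueError as error:  # assumption: inconsistency raised as an exception
                inconsistencies.append((percentage, execution, str(error)))
                continue  # interrupt this execution and start the next one
            records.append((percentage, execution, result))
        percentage += rate
    return records, inconsistencies


def fake_run(inputs):
    # Placeholder behavior: report an inconsistency whenever "c0" is an input.
    if "c0" in inputs:
        raise ValueError("inconsistent model")
    return len(inputs)


classes = [f"c{i}" for i in range(20)]
records, inconsistencies = run_test2(classes, 10, 90, 20, 5, fake_run, random.Random(1))
print(len(records), len(inconsistencies))
```

With five percentages and five executions each, every one of the 25 runs ends up either in records or in inconsistencies.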
The execution of Test 2 generates the following files:
- Execution statistics csv files
- Execution times csv files
- Inconsistencies csv file
- Results yaml files
- Results csv files
- Knowledge Matrix csv files
- Divergences csv files
These files are organized in a structure similar to the one created by Test 1, which is represented here. Note, however, that the structure does not contain Execution Summary csv files and that the file names contain the substring tt002 instead of tt001.
You can find the complete description of all output files generated in Test 2 by accessing its corresponding page at the repository with the Scior tests resulting datasets.
It is necessary to first execute the Tester's build function to create the structure for the tests. For running the Scior-Tester's build function, follow the instructions provided in this link.
To execute the Scior-Tester Test 2, first set the configuration values in the file __init__.py. After that, use the following command:
python ./src/scior_tester.py -r2
Note: the instructions provided here may not work properly in the Tester's current implementation. Please refer to issue #14.