Dataset with the results of Scior tests performed with the Scior-Tester automation tool on the OntoUML/UFO Catalog.
The Scior-Dataset is composed of files with results of Scior tests performed via the Scior-Tester on the OntoUML/UFO Catalog.
The FAIR Model Catalog for Ontology-Driven Conceptual Modeling Research, OntoUML/UFO Catalog for short, is a structured, open-source catalog of OntoUML and UFO ontology models. The catalog was conceived to allow collaborative work and to be easily accessible to all its users. Its goal is to support empirical research in OntoUML and UFO, as well as in conceptual modeling in general, by providing high-quality, curated, structured, and machine-processable data on why, where, and how different modeling approaches are used. The catalog offers a diverse collection of conceptual models, created by modelers with varying modeling skills, for a range of domains and purposes.
The tests were performed using the automation tool named Scior-Tester, which runs on top of Scior. Scior, short for Identification of Ontological Categories for OWL Ontologies, is a software tool that aims to support the semi-automatic semantic improvement of lightweight web ontologies. We aim to achieve this semantic improvement by associating concepts of gUFO, a lightweight implementation of the Unified Foundational Ontology (UFO), with OWL entities. The aim of gUFO is "to provide a lightweight implementation of the Unified Foundational Ontology (UFO) suitable for Semantic Web OWL 2 DL applications".
This document presents the structure of the files generated during the Scior-Tester execution. For a complete understanding of the tests (scope, objectives, implementation, etc.), please refer to the Scior-Tester description file.
The resulting datasets are published to share with the community data that can be analyzed in different ways, even though all executed tests are fully reproducible.
- Nomenclature of Files and Folders
- Build Generated Files
- Tests – Generated Files and their Descriptions
- Related Repositories
- Contributors
- Acknowledgements
Nomenclature of Files and Folders
To avoid long file and directory names, all content available in the datasets of this repository follows the nomenclature presented here:
- Numbers with up to three digits are always written with three digits, zero-padded (e.g., 001). Numbers with more than three digits are written without additional digits
- All numbers must be attached directly to their corresponding item (e.g., tt001 for test 1)
- The following words must be replaced by their corresponding abbreviations:
  - test: tt
  - taxonomy: tx
  - execution: ex
  - percentage: pc
- The Scior parameters must be represented using the following abbreviations:
  - automatic: a
  - interactive: i
  - complete: c
  - incomplete: n
- The automation parameter (a or i) must come first, followed by the completion parameter (c or n)
- The two parameters must be written together as a single item (e.g., ac, in, etc.)
- File names must not contain spaces; spaces must be replaced by hyphens
- Different items in a file name must be separated by underscores
- The following item order must be used whenever possible: file name, dataset name, test name/number, test parameters, taxonomy number, execution number, percentage number
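As an illustration of these rules, the following Python sketch assembles a name from its items. The helper build_name and its parameters are hypothetical (they are not part of the Scior-Tester), and absent items are simply skipped, which is an assumption about how optional items are handled.

```python
from typing import Optional

# Abbreviations of the Scior parameters defined by the nomenclature rules.
AUTOMATION = {"automatic": "a", "interactive": "i"}
COMPLETENESS = {"complete": "c", "incomplete": "n"}


def pad(number: int) -> str:
    """Zero-pad to three digits; numbers with more digits are left unchanged."""
    return f"{number:03d}"


def build_name(dataset_name: str, file_name: str = "",
               test: Optional[int] = None, automation: Optional[str] = None,
               completeness: Optional[str] = None, taxonomy: Optional[int] = None,
               execution: Optional[int] = None, percentage: Optional[int] = None) -> str:
    """Hypothetical helper joining the items, in the recommended order, with underscores."""
    items = []
    if file_name:
        items.append(file_name.replace(" ", "-"))  # spaces become hyphens
    items.append(dataset_name.replace(" ", "-"))
    if test is not None:
        items.append("tt" + pad(test))
    if automation and completeness:
        # Automation parameter first, completion parameter second (e.g., "ac").
        items.append(AUTOMATION[automation] + COMPLETENESS[completeness])
    if taxonomy is not None:
        items.append("tx" + pad(taxonomy))
    if execution is not None:
        items.append("ex" + pad(execution))
    if percentage is not None:
        items.append("pc" + pad(percentage))
    return "_".join(items)
```

For instance, build_name("aguiar2018rdbs-o", file_name="data", taxonomy=1) + ".csv" yields data_aguiar2018rdbs-o_tx001.csv, which matches the data file name used as an example in this document.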
Build Generated Files
The Scior-Tester creates a directory for each tested dataset of the catalog. Besides the folders holding the results of the performed tests, each directory contains two kinds of files that the Scior-Tester generates to be used as input for the tests. To generate these files, the Tester decomposes the original taxonomy of a dataset into its (possibly multiple) independent taxonomies, i.e., isolated groups of classes related to each other through specialization/generalization relations. Both kinds of files are presented in this document, as well as a hash register file.
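Conceptually, this decomposition amounts to finding the connected components of the specialization graph when subclass edges are treated as undirected. The sketch below assumes the taxonomy is already available as (subclass, superclass) pairs of class names; independent_taxonomies is a hypothetical illustration, not the Scior-Tester's implementation, and classes without any specialization relation are not considered.

```python
from collections import defaultdict, deque


def independent_taxonomies(subclass_pairs):
    """Group classes into connected components of the specialization graph.

    subclass_pairs: iterable of (subclass, superclass) tuples.
    Returns a list of sets, one set of class names per independent taxonomy.
    """
    # Treat rdfs:subClassOf edges as undirected: two classes belong to the same
    # taxonomy if any chain of specializations/generalizations connects them.
    neighbours = defaultdict(set)
    for sub, sup in subclass_pairs:
        neighbours[sub].add(sup)
        neighbours[sup].add(sub)

    seen = set()
    taxonomies = []
    for start in sorted(neighbours):  # sorted for a deterministic order
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:  # breadth-first search over the undirected graph
            node = queue.popleft()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        taxonomies.append(component)
    return taxonomies
```

A model with the pairs (B, A), (C, A), and (E, D) would thus be split into two taxonomies, {A, B, C} and {D, E}, each of which would become its own numbered file.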
Each XXX_txYYY.ttl file (with XXX being the dataset name and YYY ranging from 001 to the number of independent taxonomies available in the dataset's OntoUML model) contains an isolated taxonomical graph in OWL (in Turtle syntax) extracted from the OWL taxonomy provided in the tested catalog dataset. An example of a generated taxonomy file is aguiar2018rdbs-o_tx001.ttl.
For instance, a single model with two disconnected hierarchies of concepts generates two files, each containing only the following terms: rdfs:subClassOf, owl:Class, and rdf:type.
To generate the concepts' URIs, the Scior-Tester uses the same namespace for all taxonomies of all datasets: http://taxonomy.model/
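To show what such a file may look like, the Python sketch below writes one independent taxonomy as Turtle using only rdf:type, owl:Class, and rdfs:subClassOf, with the namespace above bound to the empty prefix. It is a hypothetical illustration: taxonomy_to_turtle is not the Scior-Tester's code, it assumes class names are already valid Turtle local names (no spaces or special characters), and in practice an RDF library would handle prefixes and escaping.

```python
NAMESPACE = "http://taxonomy.model/"

PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    f"@prefix : <{NAMESPACE}> .\n"
)


def taxonomy_to_turtle(subclass_pairs):
    """Serialize one taxonomy using only rdf:type, owl:Class, and rdfs:subClassOf."""
    classes = sorted({name for pair in subclass_pairs for name in pair})
    lines = [PREFIXES]
    # Every class is declared as an owl:Class.
    for name in classes:
        lines.append(f":{name} rdf:type owl:Class .")
    # Every specialization becomes an rdfs:subClassOf triple.
    for sub, sup in sorted(subclass_pairs):
        lines.append(f":{sub} rdfs:subClassOf :{sup} .")
    return "\n".join(lines) + "\n"


print(taxonomy_to_turtle([("Student", "Person")]))
```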
Each data_XXX_txYYY.csv file (with XXX being the dataset name and YYY ranging from 001 to the number of independent taxonomies available in the dataset's OntoUML model) contains information about all the classes that are part of the taxonomical graph with the corresponding number (e.g., the file data_aguiar2018rdbs-o_tx001.csv refers to the taxonomy saved in the file aguiar2018rdbs-o_tx001.ttl). Any comparison between the results of a test and the input data should use this file, as it contains the source data.
The generated csv file contains the following columns:
- class_name: name of the OntoUML class as it appears in the original model (i.e., without namespace)
- ontouml_stereotype: the class's OntoUML stereotype as assigned by its modeler
- gufo_classification: the class's OntoUML stereotype mapped to a gUFO endurant type
- is_root: Boolean value indicating whether the class is a root node in the taxonomical graph (i.e., it has no superclasses)
- is_leaf: Boolean value indicating whether the class is a leaf node in the taxonomical graph (i.e., it has no subclasses)
- is_intermediate: Boolean value indicating whether the class is an intermediate node in the taxonomical graph (i.e., it has both superclasses and subclasses)
- number_superclasses: the total number of direct and indirect superclasses of the class
- number_subclasses: the total number of direct and indirect subclasses of the class
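The structural columns follow directly from the specialization relations. As a sketch, they can be derived from (subclass, superclass) pairs as below. The class_rows helper is hypothetical, not the Scior-Tester's code; ontouml_stereotype and gufo_classification are omitted because they come from the original model and the mapping, and a superclass reachable through several paths is assumed to be counted once.

```python
from collections import defaultdict


def reachable(start, edges):
    """All classes reachable from start by repeatedly following edges (direct and indirect)."""
    found, stack = set(), [start]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in found:
                found.add(nxt)
                stack.append(nxt)
    found.discard(start)  # a class is not counted as its own super/subclass
    return found


def class_rows(subclass_pairs):
    """Hypothetical derivation of the structural columns of a data_XXX_txYYY.csv file."""
    supers, subs = defaultdict(set), defaultdict(set)
    for sub, sup in subclass_pairs:
        supers[sub].add(sup)  # direct superclasses
        subs[sup].add(sub)    # direct subclasses
    rows = []
    for name in sorted(set(supers) | set(subs)):
        n_super = len(reachable(name, supers))
        n_sub = len(reachable(name, subs))
        rows.append({
            "class_name": name,
            "is_root": n_super == 0,
            "is_leaf": n_sub == 0,
            "is_intermediate": n_super > 0 and n_sub > 0,
            "number_superclasses": n_super,
            "number_subclasses": n_sub,
        })
    return rows
```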
As every class must be a root, a leaf, or an intermediate node, this file would be inconsistent if:
- (is_root OR is_leaf OR is_intermediate) is False, or if
- (is_root AND is_leaf AND is_intermediate) is True
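These conditions can be checked mechanically. The Python sketch below reads a data file's contents with the standard csv module and reports offending classes. It assumes Boolean cells are written as "True"/"False"; the last two checks (root flag versus superclass count, leaf flag versus subclass count) are derived from the column definitions rather than stated in the rules, and the function names are hypothetical.

```python
import csv
import io


def parse_bool(cell: str) -> bool:
    # Assumption: Boolean cells are serialized as "True"/"False".
    return cell.strip().lower() == "true"


def inconsistent_classes(csv_text: str) -> list:
    """Return names of classes whose flags break the consistency rules."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        root = parse_bool(row["is_root"])
        leaf = parse_bool(row["is_leaf"])
        intermediate = parse_bool(row["is_intermediate"])
        if not (root or leaf or intermediate):
            problems.append(row["class_name"])  # no node type at all
        elif root and leaf and intermediate:
            problems.append(row["class_name"])  # every node type at once
        elif root != (int(row["number_superclasses"]) == 0):
            problems.append(row["class_name"])  # root flag disagrees with the count
        elif leaf != (int(row["number_subclasses"]) == 0):
            problems.append(row["class_name"])  # leaf flag disagrees with the count
    return problems
```

To check a real file, pass the text returned by open(path, newline="", encoding="utf-8").read(); an empty list means no inconsistency was found.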
This file, named taxonomies.csv, contains information about all taxonomies created for all datasets during the build function. Its aim is to present this information to users in a simple way, so that they can analyze it when creating tests or handling test results.
The generated csv file contains the following columns:
- taxonomy_name: a string with the name of the taxonomy file (e.g., abrahao2018agriculture-operations_tx001.ttl)
- dataset_name: a string with the name of the dataset that contains this taxonomy (e.g., abrahao2018agriculture-operations)
- num_mapped_classes: an integer representing the number of classes in the taxonomy whose classification is different from the string "other"
- num_other_classes: an integer representing the number of classes in the taxonomy classified with the string "other"
- num_classes: an integer representing the total number of classes in the taxonomy
Note that the sum of num_mapped_classes and num_other_classes must equal num_classes. The classifications counted by these fields are related to the mapping of OntoUML stereotypes to gUFO types.
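The relation between the data files and taxonomies.csv can be sketched in Python as follows: one function derives a summary row from a data file's contents, and another lists the rows of a taxonomies.csv that break the sum rule above. Both functions are hypothetical, and the assumption that the string "other" appears in the gufo_classification column is ours, not stated by the Scior-Tester documentation.

```python
import csv
import io


def taxonomy_summary(taxonomy_name: str, dataset_name: str, data_csv_text: str) -> dict:
    """Derive one taxonomies.csv row from the contents of a data_XXX_txYYY.csv file."""
    rows = list(csv.DictReader(io.StringIO(data_csv_text)))
    # Assumption: "other" is stored in the gufo_classification column.
    num_other = sum(1 for row in rows if row["gufo_classification"] == "other")
    return {
        "taxonomy_name": taxonomy_name,
        "dataset_name": dataset_name,
        "num_mapped_classes": len(rows) - num_other,
        "num_other_classes": num_other,
        "num_classes": len(rows),
    }


def rows_breaking_sum(taxonomies_csv_text: str) -> list:
    """Return taxonomy names for which mapped + other classes != total classes."""
    broken = []
    for row in csv.DictReader(io.StringIO(taxonomies_csv_text)):
        mapped = int(row["num_mapped_classes"])
        other = int(row["num_other_classes"])
        if mapped + other != int(row["num_classes"]):
            broken.append(row["taxonomy_name"])
    return broken
```

A row produced by taxonomy_summary satisfies the sum rule by construction, so rows_breaking_sum is mainly useful for checking a taxonomies.csv file obtained elsewhere.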
A single taxonomies.csv file, located in the /catalog folder, is created after the build function completes.
For traceability, the Scior-Tester provides a function that generates SHA256 hashes of the files it creates and of the files from which they originate. The whole dataset contains a single csv register file, named hash_sha256_register.csv, with four columns; a new row is added every time the Tester creates new files. The columns are:
- file_name: complete path of the file being hashed
- file_hash: SHA256 hash of the file
- source_file_name: file used as a source for generating the file being hashed
- source_file_hash: SHA256 hash of the source file
For example, a user who wants to know whether they are using the same source data to generate their results can compute the SHA256 hash of the files they are using and check whether it exists in the hash register file.
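A user-side version of this check could look like the Python sketch below, which uses the standard hashlib and csv modules. It assumes the register has a header row with the four column names listed above and stores hashes as lowercase hexadecimal digests; sha256_of and find_in_register are hypothetical names.

```python
import csv
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_in_register(file_path: str, register_path: str = "hash_sha256_register.csv") -> list:
    """Return register rows whose file_hash or source_file_hash matches the given file."""
    target = sha256_of(file_path)
    with open(register_path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.DictReader(handle)
                if target in (row["file_hash"], row["source_file_hash"])]
```

An empty result means the local file does not match any file registered by the Tester, so the results may have been generated from different data.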
Tests – Generated Files and their Descriptions
Currently, datasets generated from the execution of two tests are available. Please use the following links to access the test descriptions and results.
Related Repositories
- Scior: software for identification of ontological categories for OWL ontologies.
- Scior-Tester: used for automating tests on Scior.
- Scior-Dataset: contains data resulting from the Scior-Tester.
- OntoUML/UFO Catalog: source of models used for the performed tests.
Contributors
- PhD. Pedro Paulo Favato Barcelos
- PhD. Tiago Prince Sales
- MSc. Elena Romanenko
- Prof. PhD. Giancarlo Guizzardi
- MSc. Gal Engelberg
- MBA Dan Klein
Acknowledgements
This work is a collaboration between the Free University of Bozen-Bolzano, the University of Twente, and Accenture Israel Cybersecurity Labs.