This Maven project generates a description logics gold standard for evaluating knowledge graph embeddings in terms of their ability to learn specific logical expressions. Multiple options exist to generate a gold standard. For the evaluation, this framework can be used.
Version | Name | Location | Based On | Size Classes | Other Parameters |
---|---|---|---|---|---|
v1 | dbpedia | /results/v1/dbpedia | DBpedia 2021-09 | 50, 500, 5000 | - |
v1 | synthetic_ontology | /results/v1/synthetic_ontology | - | 1000 | |
Given multiple SPARQL queries defined in the query directory, this project generates a gold standard that can be used for machine learning extensions.
The gold standard is intended to be used with a DBpedia embedding.
You can find the current version of the gold standard in the results directory.
The DBpedia SPARQL GS (see above) is a real gold standard but it is not perfect. The synthetic gold standard option allows for generating synthetic "lab-grown" graphs for precisely evaluating knowledge graph embeddings.
There are three flavors: random graph generation (very expensive), constructed graph generation (in an easy and a hard variant), and ontology-based graph generation (recommended).
You can find the current version of the constructed gold standards in the results directory.
Test Case Collection
The first directory level in the results directory is called a test case collection
in the implementation. Examples of test case collections are `tc01` and `tc02`.
A test case collection is a collection of multiple test case groups (see below).
All test cases in a test case collection test for the same DL construct.
Test Case Group
A test case group is a domain-bound group of test cases. Examples of test case
groups are `people` and `movies`.
Note that the synthetic results have only one test case group (e.g. `synthetic_ontology`).
In the synthetic case, there is additionally a file `graph.nt` (the generated graph)
and a directory `dgl-ke-graph` (the same graph in the DGL-KE format).
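For a quick sanity check of a generated `graph.nt`, the following Python sketch counts triples per predicate. It relies only on the N-Triples convention of one triple per line and assumes that the terms are separated by whitespace, as typical serializers emit them. The function name `predicate_counts` is illustrative and not part of this project.

```python
from collections import Counter


def predicate_counts(nt_path):
    """Count triples per predicate in an N-Triples file (one triple per line)."""
    counts = Counter()
    with open(nt_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            # Assumption: subject, predicate and object are whitespace separated.
            _subject, predicate, _rest = line.split(None, 2)
            counts[predicate] += 1
    return counts
```

The total number of triples is then `sum(predicate_counts(path).values())`.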
Size Group
A size group collects the test cases of one size, where the size is the number of
positives and the number of negatives, respectively.
Example: Structure of the DBpedia Gold Standard Directory
v1/
|
-- dbpedia/
   |
   -- tc01/
      |
      -- books/
      |  |
      |  -- 50/
      |  |  |
      |  |  -- negatives.txt
      |  |  |
      |  |  -- positives.txt
      |  |  |
      |  |  -- train_test/
      |  |     |
      |  |     -- test.txt
      |  |     |
      |  |     -- train.txt
      |  |
      |  -- 500/
      |  |
      |  -- ...
      |
      -- cities/
      |
      -- ...
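As a minimal sketch of how a single size directory might be consumed, the following Python snippet reads `positives.txt` and `negatives.txt` and compares their entry counts with the size encoded in the directory name. It assumes one entity per line, which the layout suggests but this document does not state. The names `load_entities` and `check_size_dir` are illustrative and not part of this project.

```python
from pathlib import Path


def load_entities(path):
    """Return the non-empty, stripped lines of a file (assumed: one entity per line)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def check_size_dir(size_dir):
    """Compare the positive/negative counts with the size given by the directory name."""
    size_dir = Path(size_dir)
    expected = int(size_dir.name)  # e.g. ".../tc01/books/50" -> 50
    positives = load_entities(size_dir / "positives.txt")
    negatives = load_entities(size_dir / "negatives.txt")
    return {
        "expected": expected,
        "positives": len(positives),
        "negatives": len(negatives),
        "matches": len(positives) == expected and len(negatives) == expected,
    }
```

A mismatch is not necessarily an error in the data; it may simply mean that the one-entity-per-line assumption does not hold for the file at hand.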
Example: Structure of the Synthetic Gold Standard Directory
v1/
|
-- synthetic_ontology/
   |
   -- tc01/
   |  |
   |  -- synthetic_ontology/
   |     |
   |     -- 1000/
   |     |  |
   |     |  -- negatives.txt
   |     |  |
   |     |  -- positives.txt
   |     |  |
   |     |  -- train_test/
   |     |     |
   |     |     -- test.txt
   |     |     |
   |     |     -- train.txt
   |     |
   |     |
   |     -- graph.nt
   |     |
   |     -- ontology.nt
   |     |
   |     -- dgl-ke-graph/
   |
   -- tc02/
   |
   -- ...
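Both layouts share the pattern `<test case collection>/<test case group>/<size>/`, so the available test cases can be enumerated generically. The Python sketch below does this by looking for `positives.txt` three levels below a gold standard root such as `results/v1/dbpedia`. The function name `iter_test_cases` is illustrative and not part of this project.

```python
from pathlib import Path


def iter_test_cases(root):
    """Yield (collection, group, size, path) for every size directory below root."""
    root = Path(root)
    # Size directories are identified by the positives.txt file they contain.
    for positives in sorted(root.glob("*/*/*/positives.txt")):
        size_dir = positives.parent
        collection, group, size = size_dir.relative_to(root).parts
        yield collection, group, size, size_dir
```

Files that sit next to the size directories, such as `graph.nt` in the synthetic case, are ignored because they do not match the three-level pattern.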
The CLI can be used to generate a gold standard without editing code in an IDE.
You can call the help menu via `-h`/`--help`.
-a,--analyze
-b,--branchingFactor <arg>   Only valid for synthetic generators. The
                             parameter specifies the branching factor for
                             the ontology class tree.
-c,--classes <arg>           Only valid for synthetic generators. The
                             parameter specifies the number of classes to
                             be used.
-d,--directory <arg>         The test directory that shall be written or
                             analyzed (-a). If the directory shall be
                             written, it must not exist yet.
-e,--edges <arg>             Only valid for synthetic generators. The
                             parameter specifies the number of edges to
                             be used.
-h,--help                    Print this help message.
-m,--maxTriples <arg>        Only valid for synthetic generators. The
                             parameter specifies the maximum number of
                             triples per node.
-n,--nodeFactor <arg>        Only valid for synthetic generators. The
                             total nodes factor determines the maximum
                             number of nodes in a graph (totalNodesFactor
                             * nodesOfInterest).
-q,--queries <arg>           The directory where the queries reside. If
                             parameter -q is not specified, the synthetic
                             query module is used.
-s,--sizes <arg>             The sizes for the test cases, space
                             separated.
-t,--timeout <arg>           Time out in seconds for queries.
-tc,--tc_group <arg>         The test case group such as 'cities'; space
                             separated.
-tcc,--tc_collection <arg>   The test case collection such as 'tc01';
                             space separated.
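To illustrate how the options combine, the following Python sketch assembles a command line for a query-based run. Only the flags are taken from the help text above. The jar name `gold-standard-generator.jar`, the launch via `java -jar`, and the example paths are assumptions made for illustration.

```python
import shlex

JAR = "gold-standard-generator.jar"  # hypothetical artifact name, not from this project


def build_command(directory, sizes=None, queries=None, timeout=None, jar=JAR):
    """Assemble a `java -jar` call from the options listed in the help text."""
    command = ["java", "-jar", jar, "-d", str(directory)]
    if queries is not None:
        command += ["-q", str(queries)]  # without -q the synthetic query module is used
    if sizes:
        command += ["-s", *[str(size) for size in sizes]]  # sizes are space separated
    if timeout is not None:
        command += ["-t", str(timeout)]  # timeout in seconds
    return command


# Example paths are placeholders; the target directory must not exist yet.
print(shlex.join(build_command("./my_gold_standard", sizes=[50, 500],
                               queries="./queries", timeout=300)))
```

Keeping the command as a list of arguments makes it possible to pass it to `subprocess.run` directly once the real artifact name is known.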
- Java 17
- Maven