Testing and Benchmarking
The scalatest library is currently used for testing. All (unit) tests are written using scalatest functionality and are stored in the ./src/test/scala/ directory. Optional resources for those tests can be stored in the ./src/test/resources/ directory, or a subdirectory thereof (depending on the semantic coherence of the resource data).
All tests should extend the LeoTestSuite class, which automatically imports some testing utility functionality into the current namespace. A general template for tests looks like the following:
/** Documentation of the test ... */
class MyTestSuite extends LeoTestSuite {
  // some constant definitions

  // include external test resources by
  val resource = fromFile(getClass.getResource("relative path to resource"))
  // or obtain a directory with multiple resources in it (for loading it later in the code)
  val resourceDir = getClass.getResource("relative path to directory").getPath

  // Tests begin
  /** A simple test with name "Test1" */
  test("Test1" /*, tags ... */) {
    // Test code
  }

  /** Multiple tests by looping */
  for (/* e.g. iterate over resourceDir */) {
    test(...) {
      // Test code
    }
  }
  // Tests end
}
All paths to testing resources are to be given relative to ./src/test/resources/, i.e. the statement
getClass.getResource("problems/").getPath
would create a reference to the directory located at ./src/test/resources/problems/.
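For example, a problem file shipped with the tests can be read as follows (a minimal sketch; the concrete file name is taken from the parser test suite shown further below):

import scala.io.Source.fromFile

// resolves to ./src/test/resources/problems/SYN000-1.p
val path = getClass.getResource("/problems/SYN000-1.p").getPath
val contents = fromFile(path).mkString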
All functionality from TestUtility (also located in the test package leo) is directly usable from any test suite that extends LeoTestSuite. This functionality includes:
- The pre-defined function time measures the number of microseconds used for a particular calculation. For any value a of type A it returns a tuple (a', t) of type (A, Long) where the first element is the result of the calculation and the second element is the measured computation time. Usage:

  class MyTestSuite extends LeoTestSuite {
    // Other stuff
    test(...) {
      val (erg, time) = time(somefunction(...))
      // erg contains the value of somefunction(...)
      // time contains the computation time
    }
  }
- The function getFreshSignature returns a reference to the global signature with the additional assertion that all user-defined constants have been deleted, i.e. the global signature then contains only the predefined HOL constants (fixed constants such as "or" and "not", and defined constants such as "and").
- The function resetTermBank clears the TermBank, i.e. the term store.
- The function resetBlackBoard clears the formula store of the BlackBoard.
- WARNING: The function test will automatically execute all three of the above reset methods in order to guarantee a clean test environment. If this is not what you want in your test suite, you need to override this behavior manually by adding
  override def before(fun: => Any) {}
  to your code (see the sketch after this list).
- Additionally, the TestUtility contains some more methods for "pretty printing" and other utilities. Please look into the corresponding trait for more information.
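A minimal sketch of a suite that disables the automatic resets and performs them manually instead (the test body is purely illustrative; the utility members used are the ones listed above):

class MyCleanupSuite extends LeoTestSuite {
  // Disable the automatic reset that `test` performs before each test
  override def before(fun: => Any) {}

  test("Manual environment reset") {
    val sig = getFreshSignature // global signature without user-defined constants
    resetTermBank               // clear the term store
    resetBlackBoard             // clear the BlackBoard's formula store
    // ... test code operating on the clean environment ...
  }
}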
There are currently three test tags in Leo-III:
- Checked
- Ignored
- Benchmark
Only tests tagged as "Checked" will be called when invoking mvn test
, those tagged with "Ignored" are explicitly excluded from the mvn test
command. The "Benchmark" tag's purpose is to show that a certain test does not directly test "correctness" of a function, but rather runs a performance test of some kind. These tests tend to consume a lot of time, this is why tests tagged as "Benchmark" are also not run by default via mvn test
.
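Tags are passed as additional arguments to test; for instance (test names purely illustrative):

test("A fast correctness test", Checked) { /* ... */ }
test("A long-running performance measurement", Benchmark) { /* ... */ }

A complete example using the Checked tag is the following parser test suite: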
package leo
package modules.parsers

import scala.util.parsing.input.CharArrayReader
import scala.io.Source._

/**
 * This suite tests parsing of the SYN000-sample problems of the TPTP library.
 * The module [[leo.modules.parsers.TPTP]], i.e., the method parseFile(), is tested.
 * A test succeeds if the parser can successfully parse the input and
 * fails otherwise (i.e. if a parse error occurs).
 *
 * @author Alexander Steen
 * @since 22.04.2014
 * @note Updated 04.03.2015 -- Only test SYN000-files that cover most of the syntax features.
 */
class ParserTestSuite extends LeoTestSuite {
  val source = getClass.getResource("/problems").getPath
  val problem_suffix = ".p"

  val problems = Seq(
    "SYN000-1" -> "TPTP CNF basic syntax features",
    "SYN000+1" -> "TPTP FOF basic syntax features",
    "SYN000_1" -> "TPTP TF0 basic syntax features",
    "SYN000^1" -> "TPTP THF basic syntax features",
    "SYN000^2" -> "TPTP THF advanced syntax features",
    "SYN000+2" -> "TPTP FOF advanced syntax features",
    "SYN000_2" -> "TPTP TF0 advanced syntax features",
    "SYN000=2" -> "TPTP TFA with arithmetic advanced syntax features"
  )

  for (p <- problems) {
    test(p._2, Checked) {
      printHeading(s"Parsing test for ${p._2}")
      Out.output(s"## Parsing ${p._1} ...")
      val parsed = TPTP.parseFile(new CharArrayReader(fromFile(source + "/" + p._1 + problem_suffix).toArray))
      if (parsed.isLeft) {
        fail(s"FAILED. Cause: ${parsed.left.get}")
      } else {
        val res = parsed.right.get
        Out.output(s"Parsing succeeded. Parsed ${res.getFormulaeCount} formulae and ${res.getIncludeCount} include statements.")
      }
    }
  }
}
A simple Python script is located at ./src/test/resources/processBenchmark.py which calculates statistical information about a batch of input data. If you want to use this script, the input file(s) must have the form:
<Identifier>\t<Value>
<Identifier>\t<Value>
<Identifier>\t<Value>
...
where \t indicates a tab character, <Identifier> is some arbitrary string not containing any tab character (\t), and <Value> is some numerical value.
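Such an input file could, for instance, be produced from a benchmark test using the time utility described above. A minimal sketch, with an illustrative dummy computation and output file name (not part of the existing test suite):

import java.io.{File, PrintWriter}

class MyBenchmarkSuite extends LeoTestSuite {
  test("Some performance measurement", Benchmark) {
    val out = new PrintWriter(new File("benchmark.txt")) // illustrative output file
    for (i <- 1 to 10) {
      // time returns the computation's result together with the measured duration
      val (_, t) = time((1 to 100000).map(x => x.toLong * x).sum)
      out.println(s"Run$i\t$t") // one <Identifier>\t<Value> line per run
    }
    out.close()
  }
}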
The call to ./src/test/resources/processBenchmark.py <file> will then calculate the minimum, maximum, sum, mean, median and standard deviation of the values inside the file.
If multiple files are given, as in ./src/test/resources/processBenchmark.py <file1> <file2> ..., the script will group the results of the input files according to their <Identifier> and calculate the respective statistics for each identifier's series of values.
Suppose file A contains
Prob1 123
Prob2 234
Prob3 432
Prob4 534
then, the call ./src/test/resources/processBenchmark.py A
will yield:
Min  Max  Sum   Mean    Median  Std
123  534  1323  330.75  333.0   161.309291425
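Judging from these numbers, the reported Std is the population standard deviation (dividing by n rather than n - 1); a quick Scala check reproduces the output above:

// illustrative check of the reported Mean and Std (population formula)
val xs   = Seq(123.0, 234.0, 432.0, 534.0)
val mean = xs.sum / xs.size                                            // 330.75
val std  = math.sqrt(xs.map(x => math.pow(x - mean, 2)).sum / xs.size) // 161.3092914...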
Now suppose file A contains
Prob1 123
Prob2 234
Prob3 432
Prob4 534
and file B
contains
Prob1 127
Prob2 236
Prob3 123
Prob4 132
then, the call ./src/test/resources/processBenchmark.py A B
will yield:
Key    Min  Max  Sum  Mean   Median  Std
Prob4  132  534  666  333.0  333.0   201.0
Prob2  234  236  470  235.0  235.0   1.0
Prob3  123  432  555  277.5  277.5   154.5
Prob1  123  127  250  125.0  125.0   2.0

       Min  Max  Sum   Mean*   Median*
Global 123  534  1941  242.62  256.25
(*: mean of means and median of medians, respectively)