
Added libtest_json_output #2234

Closed · wants to merge 2 commits

Conversation

@Gilnaa commented Dec 4, 2017

Rendered

Proposed implementation:
rust-lang/rust#46450 (comment)

@Centril added the T-cargo, T-compiler, and T-dev-tools labels Dec 5, 2017
@ehuss (Contributor) commented Dec 6, 2017

Would there be a way to determine which file a test is coming from? I'd like to display errors inline, but just having the test name doesn't seem to be enough information to do that.

@Gilnaa (Author) commented Dec 6, 2017

Currently, this is something the test infrastructure does not provide, and I'm not sure if it can be easily added.

@ehuss (Contributor) commented Dec 6, 2017

After our discussion at alternate-libtest-output-format/6121 I looked at the implementation to see how difficult it would be to add filename/line number information. I don't have any real experience with the Rust internals, so it's difficult for me to say for sure, but it looks like it shouldn't be too difficult to extend TestDesc to include that information.

I'm not sure if extending this is outside the scope of this RFC. However, my concern is that the "name" parameter is not unique. If we add file/line later, would every JSON message duplicate that information?

As an aside, adding line number information would also help with implementing rust-lang/rust#45765.
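
(For concreteness, a sketch of what an extended TestDesc might look like; the first four fields are libtest's at the time, as quoted later in this thread, and the source-location fields are hypothetical additions the #[test] expansion could fill in via file!() and line!():)

pub struct TestDesc {
    pub name: TestName,
    pub ignore: bool,
    pub should_panic: ShouldPanic,
    pub allow_fail: bool,
    // Hypothetical additions discussed in this thread:
    pub source_file: &'static str, // from file!()
    pub start_line: u32,           // from line!()
}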

@Gilnaa (Author) commented Dec 7, 2017

I think it's a great idea, and I would love to take care of it, but it is a bit outside the scope of this RFC.

I'm no expert on APIs, but I think it might not be an API break to add another field to the JSON output.

@jan-hudec commented

@Gilnaa, what is the proposed format based on? Is it the same as some already established test framework? If yes, which one? If not, why not?

@Centril (Contributor) commented Dec 8, 2017

In this RFC, don't forget to account for the needs of crates providing property-based testing, such as proptest and quickcheck. These may be helped by having a notion of "number of test cases passed" (which is different from the number of #[test]s passed).

@Gilnaa (Author) commented Dec 9, 2017

@jan-hudec I admit I did not do a lot of research, but I can't say I know of a testing framework using JSON. Mostly I found that gtest can output XML, and I found TAP.

The proposed format is mostly based on suggestions from other RFCs.

@Centril If I understand correctly, it is currently impossible to do what you suggest, since there is no indication of when an assert succeeds.
Asserts aren't part of libtest; the harness just catches panics.

@killercup (Member) left a review comment

I think this needs a formal and full specification of the JSON schema used.


Using this flag, the format can be selected from one of the following:
- `pretty`: The default setting; print test results in a detailed manner.
- `terse`: Equivalent to `-q`; display one character per test instead of one line.
killercup (Member):

Is this a new addition as well?

Gilnaa (Author):

Not really, it's just the old -q that libtest already has.
It was present in the PR as well.


Each JSON object starts with the "type" property, which is one of the following:

- `suite`: Events regarding the whole testing suite; consists of a header specifying the amount of tests to be performed,
killercup (Member):

There can be multiple test suites, right? It seems like currently we run unit tests, integration tests, and doc tests separately.

Gilnaa (Author):

Not as far as I'm aware.
In the context of libtest there's nothing really called a "testing suite"; there are just slices of function pointers.
Maybe the name is misleading.


- `suite`: Events regarding the whole testing suite; consists of a header specifying the amount of tests to be performed,
and a footer with results summary.
- `test`: A change in a test's state, along with its name. Can be one of the following:
killercup (Member):

What is "its name" here? A unique identifier for the test? We have one of these, right? I assume it only needs to be unique in the current suite.

Gilnaa (Author):

Essentially it's the test's name plus its module path.
I'm pretty sure they're unique, since Rust paths mostly are, but that needs a second check.

- `allowed_failure`
- `timeout`
- `bench`: Benchmark results, specifying the median time and the standard deviation
- `test_output`: The stdout of a failed test, if available.
killercup (Member):

This was part of a lengthy discussion in today's dev-tools meeting. I'll write a top-level comment on this later.


- This proposal adds a new API to which the toolchain must adhere, increasing the chance of accidental breakage in the future.

# Rationale and alternatives
killercup (Member):

Can you also add "Prior Art" with links to all the stuff that other people did? (Can be collected from comments here!) #1284 mentioned https://github.com/rubyworks/tapout/wiki/TAP-Y-J-Specification for example

@matklad (Member) commented Dec 11, 2017

Some prior art links:

TAP:

http://testanything.org/tap-specification.html
https://github.com/rubyworks/tapout/wiki/TAP-Y-J-Specification

JSON format from Dart:

https://github.com/dart-lang/test/blob/master/doc/json_reporter.md

@matklad (Member) commented Dec 11, 2017

Hm, and of course #1284 :)

@Centril (Contributor) commented Dec 11, 2017

@Gilnaa Test runners for PBT (property-based testing) may themselves count how many (generated) test cases passed, were ignored, etc. See proptest for an example. There just needs to be an API that can report this back in some organized fashion.

@killercup (Member) commented

As mentioned in #2234 (comment), there was a bit of a discussion around how best to capture a test's output (to stdout/stderr). The use case is that the user wants to see immediately (in their IDE, which consumes the JSON-based test output) when a test outputs something, so they don't need to wait for the whole (possibly very slow) test to run to see if there was an error. Basically, printf debugging of tests.

Technically, each test needs to run in its own process so it can have its own stdout/stderr buffers (threads share stdout/stderr!). I believe this is already the case (as cargo test is able to capture stdout/stderr).

Depending on our goals we can either capture this and output the buffer contents after the test is finished, or stream the output.

I propose the following:

  1. Capture output

    1. Enabled by default.
    2. Stdout/stderr content is output together with the test ok/failed event: { "type": "test", "event": "ok" | "failed", "name": string, "output": { "stdout": string, "stderr": string } } (see the sketch after this list).

    One possible problem here is that the output may be very long. But assuming serde_json is as fast as promised, even 100MB of stdout buffer should not take a second to serialize.

  2. Streaming output

    1. Enabled if --nocapture is specified (a flag that already exists).
    2. As soon as a complete line is written to a test's stdout or stderr buffer, output a JSON document of the following schema: { "type": "test", "event": "stdout" | "stderr", "content": "…" }

    The requirement "as soon as a complete line is written" could probably be relaxed to "on every stdout/stderr flush", but line-based output seemed more consistent.
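
(A minimal sketch of how a reporter might serialize both shapes, assuming serde/serde_json with 2017-era serde_derive; the Event type and its fields are illustrative, not the RFC's schema:)

#[macro_use]
extern crate serde_derive;
extern crate serde_json;

// Captured output attached to the final ok/failed event (capture mode).
#[derive(Serialize)]
struct Captured {
    stdout: String,
    stderr: String,
}

// One JSON document per line; the enum tag becomes "type": "test".
#[derive(Serialize)]
#[serde(tag = "type", rename_all = "lowercase")]
enum Event {
    Test {
        event: String, // "ok" | "failed" | "stdout" | "stderr"
        name: String,
        #[serde(skip_serializing_if = "Option::is_none")]
        output: Option<Captured>, // capture mode
        #[serde(skip_serializing_if = "Option::is_none")]
        content: Option<String>, // streaming mode
    },
}

fn emit(event: &Event) {
    // e.g. {"type":"test","event":"ok","name":"foo::bar","output":{...}}
    println!("{}", serde_json::to_string(event).unwrap());
}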

Another interesting issue that was brought up: what if a test starts a thread and uses println! in that thread? I think this should work by default; all tier 1 platforms share stdout/stderr between threads.

@matklad what do you think?

@matklad (Member) commented Dec 11, 2017

@killercup reading the source of libtest reveals a couple of problems:

  1. Tests are run in separate threads. There's an io::set_print hack to capture stdout (stderr is dropped on the floor), but it won't work for threads spawned from the tests themselves (see the sketch after this list).

  2. With parallel execution, the "test started" event is not emitted until the test ends, which means you can't learn which tests are in progress.
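
(A hedged sketch of the capture mechanism in question; io::set_print is a perma-unstable std internal whose exact signature has changed over time:)

use std::io::{self, Write};
use std::sync::{Arc, Mutex};

// Shared buffer the runner installs as the test thread's stdout sink.
struct Sink(Arc<Mutex<Vec<u8>>>);

impl Write for Sink {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.0.lock().unwrap().write(data)
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// Inside the runner, roughly (unstable internal API, signature approximate):
//   let data = Arc::new(Mutex::new(Vec::new()));
//   io::set_print(Some(Box::new(Sink(data.clone()))));
// println! on this thread now lands in `data`, but the override is
// thread-local, so threads spawned *by the test* still hit real stdout.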

@killercup's suggestion looks excellent, if we can implement it :)

@killercup (Member) commented Dec 11, 2017

@matklad Interesting. (I'll leave it at that…) As these are limitations of the test runner, and this RFC is more about the format (or, well, it should be :)), do you think we can live with them for now?

Dart's JSON format looks really great, by the way. I don't think we need all of it, but it may be worthwhile to try to keep our schema compatible with theirs. Their streaming output schema only differentiates between "print" and "skip message", so we'll need to expand that.

@matklad (Member) commented Dec 11, 2017

do you think we can live with them for now?

I am afraid that with these limitations IntelliJ won't be able to take advantage of the JSON format at all :( So this might be useful for someone, but not for us.

Actually, if not for these limitations, we would have parsed the current text-based output: it's not as pretty as parsing JSON, but certainly doable.

I do think that we need to find at least one client (not necessarily IntelliJ) for this feature before stabilizing it.

@killercup (Member) commented

Actually, if not for these limitations, we would have parsed current text-based output: it's not as pretty as parsing JSON, but certainly doable.

Wait, which limitation exactly? Does the thread-local stdout-capture workaround not work well enough to allow streaming, or is the "test started" thing messing everything up?

I do think this is independent of the RFC, though, and should actually be fixed regardless of what happens here.

Let me pick up something from the dev-tools team meeting: I proposed for this to become an experimental RFC (or eRFC) because, while the format itself is not decided on, the motivation and the need for machine-readable test output are there. We settled on landing this behind an unstable feature flag, iterating on the design, and then using this RFC to stabilize it.

If we need to refactor libtest to do this, this will take longer than I'd like, but would still fit into that plan.

I do think that we need to find at least one client (not necessarily IntelliJ) for this feature before stabilizing it.

I agree. A good first demo might be a CLI tool that reads the JSON test output and converts it to human-readable output similar to what we currently print, but demo CLI tools are of course not the primary target of this!

I also think IntelliJ could use this feature before output streaming is available, but I'm of course in no position to tell you what to do :) (My only argument: Displaying output after the test run is still strictly better than not supporting running tests at all.)

@matklad (Member) commented Dec 11, 2017

Wait, which limitation exactly?

With parallel execution, the "test started" event is not emitted until the test ends, which means you can't learn which tests are in progress.

This one really throws a wrench into test reporting. I might be misreading the code though.

(My only argument: Displaying output after the test run is still strictly better than not supporting running tests at all.)

We do support running tests; we just can't generate nice reports and show which tests are in progress, which are done, and which have not started yet.

@matklad (Member) commented Dec 11, 2017

Wait, "printing test started after the test is done" is actually only relevant for human reporter (because you want test foo::bar::baz ... and ok or failure bits to be on the same line), this limitation should be easily lifted for JSON messages, and with that, we will be able to improve test integration on the IntelliJ side, though the handling of stdout/stderr would be suboptimal.

@killercup (Member) commented Dec 12, 2017 via email

@Gilnaa (Author) commented Dec 12, 2017

A lot of the issues mentioned here and on the PR actually touch on core problems with libtest.

It seems like this RFC should not be merged until there's an effort to redesign libtest.

@killercup I'm all for putting the output of a test inline; I'm not sure why I didn't think of that. Maybe I just followed the pretty-printing too closely.

About serde: I wrote a quick trial of libtest using serde, but didn't manage to compile it. Something about serde_derive not being able to use/find the proc_macro crate. (Sounds like something to do with the compiler's stages, IDK.)

@Centril I'm not familiar with the code proptest generates, but if it boils down to generating #[test]-decorated functions, I'm not sure how libtest could currently implement what you're suggesting.

Nonetheless, we might want the JSON API to support it, so that alternative test runners that do support this can output more information.

I'll try to update according to your comments soon (+demo implementation), but I'm traveling a lot this week and so am laptop-less.

@Gilnaa Gilnaa closed this Dec 12, 2017
@Gilnaa (Author) commented Dec 12, 2017

It seems some of the problems with libtest stem from the fact that it uses panics (mainly through asserts) to indicate failures, which is problematic for a few reasons.

Among other things, it prevents the existence of "soft asserts" (gtest calls these EXPECT and Catch calls them CHECK, as opposed to the fatal ASSERT/REQUIRE) or any other kind of test-specific indication.

Not to say libtest shouldn't try to catch panics (it should), but panicking is not the best way to communicate. (I can attest that I'm not the best communicator when I'm panicking.)

It's not the prettiest solution, but maybe we could add an option for a #[test] function to receive some kind of struct/callback from libtest as a parameter:

#[test]
fn my_test(t: test::HandlerSomething) {
    t.require(/* cond */);
    t.assert(/* cond */);
}

But this is also problematic, since libtest is not stable at all (and currently test functions are effectively blind to their context).

@Gilnaa Gilnaa reopened this Dec 12, 2017
@Centril (Contributor) commented Dec 12, 2017

@Gilnaa That is what proptest does right now, since that's the only game in town. But there's nothing stopping proptest or quickcheck from changing this in order to provide such information, since the TestRunner already keeps a tally.

For PBT, I think this would be useful (perhaps too much):

{
  "type": "test_output",
  "name": "reverse_reverse_eq",
  "output": "thread 'will_fail' panicked at 'assertion failed: false', f.rs:12:1\nnote: Run with `RUST_BACKTRACE=1` for a backtrace.\n",
  "runner": "proptest",
  "runner_type": "property_based_testing",
  "minimal_input": "Point { x: 1, y: 0 }",
  "passed": 78,
  "ignored": 10,
  "required": 100
}

And some events:

{ "type": "test", "runner": "proptest", "runner_type": "property_based_testing", "event": "shrinking", "name": "complicated" }
{ "type": "test", "runner": "proptest", "runner_type": "property_based_testing", "event": "shrinking", "name": "simplified" }
{ "type": "test", "runner": "proptest", "runner_type": "property_based_testing", "event": "shrinking", "name": "found_minimal" }

Also, on an unrelated note: I think it would be helpful if the trait introduced in the reference-level explanation had more motivation and explanation. You motivate that there's a problem to solve, but not why the proposed trait solves it in the best way possible; the rationale should also talk about this.

@Gilnaa (Author) commented Dec 12, 2017

Yeah, I'll add an explanation.

The trait is basically modeled after libtest's current behavior and design, which is described mostly by the following callback (and then some):

https://github.com/Gilnaa/rust/blob/dcd49b9aad7b79c2c32c2bc94ee36aca81b6af92/src/libtest/lib.rs#L748

If we keep the current design of libtest (which we do not have to, since the library is unstable), this trait is the most proper solution; but if there's a major refactoring, the technical details of this RFC will be slightly wrong.

@Centril (Contributor) commented Dec 12, 2017

@Gilnaa As long as the changes you propose are reasonably forward-compatible, so that they could support the extra information I listed in the future, I think that's fine.

Having runner and runner_type allows parsing logic in the IDE tailored to the particular runner.

@Gilnaa (Author) commented Dec 12, 2017

I'll admit I'm not sure what these are for, or what the IDE should do with them; can you elaborate?

@Centril (Contributor) commented Dec 12, 2017

@Gilnaa Given runner or runner_type, you may have another field, extra, which can be parsed depending on runner. In extra you may then put:

"extra": {
  // The minimal failing input (or none if all tests passed)
  "minimal_input": "Point { x: 1, y: 0 }",
  // Denotes how many actual randomly generated test cases that passed.
  "passed": 78, 
  // Denotes how many randomly generated test cases that failed.
  "failed": 3,
  // Dito for the number that was filtered out by the test runner because of some  .filter(|x| predicate(x)) logic
  // See: https://docs.rs/proptest/0.3.2/proptest/strategy/trait.Strategy.html#method.prop_filter    
  "ignored": 10,
  // The number of generated cases that was required to pass the test.
  "required": 100,
  // The time it took in microseconds (or nanoseconds) to generate a test case on average:
  "avg_gen": 10,
  // The time it took to execute the function under test on average:
  "avg_exec": 55,
}

or some string with useful information that we can't think of right now.

The IDE or the command line should dump these bits to the user in some nicely readable way.
You can use this information to:

  1. Tell if a generator is really slow, so that you can speed it up and improve testing performance.
  2. Ditto with avg_exec.
  3. passed/ignored/required can help you tell if the number of tests should be cranked up or, in conjunction with avg_gen, reduced, etc.
  4. ...

Without these statistics, your testing may be slow, but you have no way to improve things because you lack the relevant information.

@Gilnaa (Author) commented Dec 12, 2017

And are they supposed to be free-form fields or a closed set?

@Centril (Contributor) commented Dec 12, 2017

@Gilnaa I was thinking the former, as different runners may provide different sets of information... though it could be a closed set of optional fields, so you can range between a minimal set and a maximal set. A set of free-form fields may be more expressive, but would harm the consistency of analysis.

@Gilnaa (Author) commented Dec 12, 2017

Not sure there's a need to specify it for each test, right? You could add it to the first (header-like) JSON line.
Otherwise I'm pretty apathetic, tbh, as long as it's in some kind of "extra" field, which the parser can easily ignore if it wants.

@matklad (Member) commented Dec 12, 2017

@Centril I'd rather punt on the needs of property-based tests for now: even if we support this in the JSON protocol, I doubt there will be a lot of clients that understand this information. Certainly there's no UI in IntelliJ where we could show it, so the best a property test can do is print this information in a human-readable form to stdout. And it's always possible to add more fields to existing messages, or to add new types of messages.

Disclaimer: I am myself addicted to property-based testing and I find hypothesis a godsend, but I want to have some form of machine-readable output sooner rather than later, because this has been blocked on custom-everything forever.

@Centril (Contributor) commented Dec 12, 2017

@Gilnaa To be explicit/clear, each #[test] would have its own unique instance of the extras field with values determined at runtime (well, required may perhaps be determined statically for a specific #[test]).

@matklad I'm down with that so long as we are forward compatible with adding the things I enumerated. I don't wish to be in the way of stabilization =)

@Gilnaa (Author) commented Dec 12, 2017

@Centril yeah, of course the extras could be "anywhere"; I was talking about the runner type.
Actually, the linked Dart specification mentions that any field can be added without breaking compatibility, so there's that.

Edit:
Does serde_json support unmapped fields during parsing?
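
(Aside: yes; serde's derived Deserialize ignores unknown fields by default, unless a struct opts in to #[serde(deny_unknown_fields)], which is what makes "any field can be added without breaking compatibility" workable for consumers. A minimal sketch with an illustrative event struct:)

#[macro_use]
extern crate serde_derive;
extern crate serde_json;

#[derive(Deserialize)]
struct TestEvent {
    #[serde(rename = "type")]
    kind: String, // `type` is a keyword, hence the rename
    name: String,
    // Any other fields in the document ("runner", "extra", ...) are
    // silently ignored during deserialization.
}

fn main() {
    let line = r#"{ "type": "test", "name": "foo", "runner": "proptest" }"#;
    let event: TestEvent = serde_json::from_str(line).unwrap();
    assert_eq!(event.kind, "test");
}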

@Centril (Contributor) commented Dec 12, 2017

Would the following be amenable to you?

#[non_exhaustive]
pub enum RunnerClassification {
    Unit,
    PBT,
    // ...
}

pub struct RunnerIdentifier {
    identifier: String,
    kind: RunnerClassification, // `type` is a keyword; or perhaps just String.
}

pub struct TestDesc {
    pub name: TestName,
    pub ignore: bool,
    pub should_panic: ShouldPanic,
    pub allow_fail: bool,
    pub runner: Option<RunnerIdentifier>, // <-- Added, rest is the same for this type.
}

pub struct TestResult {
    classification: TestResultClassification,
    extras: HashMap<String, String>, // Or some similar type for free-form key-value extras.
}

pub enum TestResultClassification {
    TrOk,
    TrFailed,
    TrFailedMsg(String),
    TrIgnored,
    TrAllowedFail,
    TrMetrics(MetricMap),
    TrBench(BenchSamples),
}

// OutputFormatter remains as is.

If the trait is stabilized as currently specified in the RFC, then we are not forward-compatible, since changes to TestDesc and TestResult would be breaking changes. At the very minimum, I think the output formatter must expect to receive extras in the protocol, as well as a way to identify the test runner so it can apply formatting logic based on a set of known runners. The current default runner would then use runner: None and extras: HashMap::new().

Also: re-checking the trait, I'm not sure how it supports the "event notifications" you are proposing in the guide-level explanation.

@jan-hudec commented Dec 12, 2017

@Gilnaa,

fn my_test(t: test::HandlerSomething)

The context could live in thread-local storage. That would avoid the need to pass an object around. It would not easily allow doing checks from threads spawned by the test, but neither do panics in the current state (see the sketch below).
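
(A minimal sketch of this idea; TestContext and soft_check are hypothetical names, not libtest API:)

use std::cell::RefCell;

// Hypothetical per-test state the harness would install before running
// the test body and collect afterwards.
struct TestContext {
    failures: Vec<String>,
}

thread_local! {
    static CONTEXT: RefCell<Option<TestContext>> = RefCell::new(None);
}

// A soft check: record the failure and let the test keep running.
fn soft_check(cond: bool, msg: &str) {
    CONTEXT.with(|ctx| {
        if let Some(ctx) = ctx.borrow_mut().as_mut() {
            if !cond {
                ctx.failures.push(msg.to_string());
            }
        }
    });
}

// As noted above, the context is per-thread: checks made from threads
// spawned by the test would not see it.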

@Gilnaa (Author) commented Dec 12, 2017

@Centril the trait is mostly an implementation detail and should not be relevant to the resulting JSON; it is not public, and thus not an API that can break.
It is not meant to be some generic output formatter for generic test runners (which is a good idea, but unrelated).

The events are logged mostly via write_result, but also via write_timeout (see the sketch below).
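
(For context, a rough sketch of the shape such a formatter might have; write_result and write_timeout are the methods named above, while the surrounding methods and signatures are illustrative guesses, not the RFC's exact trait:)

use std::io;

trait OutputFormatter {
    // Suite header: how many tests are about to run.
    fn write_run_start(&mut self, test_count: usize) -> io::Result<()>;
    // Per-test events, as mentioned above.
    fn write_result(&mut self, desc: &TestDesc, result: &TestResult) -> io::Result<()>;
    fn write_timeout(&mut self, desc: &TestDesc) -> io::Result<()>;
    // Suite footer: the results summary.
    fn write_run_finish(&mut self) -> io::Result<()>;
}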

@jan-hudec Yeah, that's a great idea.
I thought about somehow having assert use this in test configurations, but that seems like asking for trouble.

@Centril (Contributor) commented Dec 12, 2017

Oh, I see. I thought it was generic over all runners and all formats, hence my confusion. Can you make this clearer in the text?

@Gilnaa (Author) commented Dec 12, 2017

Yes, just to clarify:

This RFC is intended only for the simple, plain libtest to produce simple JSON.
Hopefully tonight I'll have some time, and I will update the RFC to reflect all of your comments.

@nrc (Member) commented Dec 13, 2017

We discussed this yesterday in the dev-tools meeting. We thought that the associated Rust PR should land without an RFC. Long-term, machine-readable output should be part of the larger plan for pluggable test runners. We will probably want to specify some kind of standardised output or API and this RFC is probably a good starting point. However, I think we should get some experience with the initial implementation and work out the larger story around pluggable test runners. Therefore, I'm closing this RFC for now.

@nrc nrc closed this Dec 13, 2017
@petrochenkov added the T-dev-tools label Jan 30, 2018