Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metricbeat] Testing: Allow use of different directory to test data instead of the expected one #34467

Merged
merged 25 commits into from
Feb 20, 2023
Merged

[Metricbeat] Testing: Allow use of different directory to test data instead of the expected one #34467

merged 25 commits into from
Feb 20, 2023

Conversation

constanca-m
Copy link
Contributor

@constanca-m constanca-m commented Feb 2, 2023

What does this PR do?

Offers the possibility to define the read and write folder for the files used for mbtest.TestDataFiles. The previous default folder remains the same.

Why is it important?

At this moment, each metricset that uses mbtest.TestDataFiles, needs to have the directory _meta/testdata. However, sometimes a metricset can be using the same file as another. See this change where multiple metricsets need to have the same files.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Outcome

Example: when running the testing function inside directory module/kubernetes/state_cronjob:

func TestData(t *testing.T) {
	mbtest.TestDataFiles(t, "kubernetes", name)
}

The results should be:

=== RUN   TestData
--- PASS: TestData (0.50s)
=== RUN   TestData/ksm.v2.4.2.plain
    testdata.go:216: Testing _meta/testdata/ksm.v2.4.2.plain file
    testdata.go:286: Expected _meta/testdata/ksm.v2.4.2.plain-expected.json file
    --- PASS: TestData/ksm.v2.4.2.plain (0.12s)
=== RUN   TestData/ksm.v2.5.0.plain
    testdata.go:216: Testing _meta/testdata/ksm.v2.5.0.plain file
    testdata.go:286: Expected _meta/testdata/ksm.v2.5.0.plain-expected.json file
    --- PASS: TestData/ksm.v2.5.0.plain (0.13s)
=== RUN   TestData/ksm.v2.6.0.plain
    testdata.go:216: Testing _meta/testdata/ksm.v2.6.0.plain file
    testdata.go:286: Expected _meta/testdata/ksm.v2.6.0.plain-expected.json file
    --- PASS: TestData/ksm.v2.6.0.plain (0.13s)
=== RUN   TestData/ksm.v2.7.0.plain
    testdata.go:216: Testing _meta/testdata/ksm.v2.7.0.plain file
    testdata.go:286: Expected _meta/testdata/ksm.v2.7.0.plain-expected.json file
    --- PASS: TestData/ksm.v2.7.0.plain (0.12s)
PASS

However, if specified a custom prefix in the config.yml file:

path: "../_meta/testdata"

Running the function:

func TestData(t *testing.T) {
	mbtest.TestDataFiles(t, "kubernetes", name)
}

Would lead to:

=== RUN   TestData
--- PASS: TestData (0.49s)
=== RUN   TestData/ksm.v2.4.2.plain
    testdata.go:216: Testing ../_meta/testdata/ksm.v2.4.2.plain file
    testdata.go:286: Expected _meta/testdata/ksm.v2.4.2.plain-expected.json file
    --- PASS: TestData/ksm.v2.4.2.plain (0.12s)
=== RUN   TestData/ksm.v2.5.0.plain
    testdata.go:216: Testing ../_meta/testdata/ksm.v2.5.0.plain file
    testdata.go:286: Expected _meta/testdata/ksm.v2.5.0.plain-expected.json file
    --- PASS: TestData/ksm.v2.5.0.plain (0.14s)
=== RUN   TestData/ksm.v2.6.0.plain
    testdata.go:216: Testing ../_meta/testdata/ksm.v2.6.0.plain file
    testdata.go:286: Expected _meta/testdata/ksm.v2.6.0.plain-expected.json file
    --- PASS: TestData/ksm.v2.6.0.plain (0.12s)
=== RUN   TestData/ksm.v2.7.0.plain
    testdata.go:216: Testing ../_meta/testdata/ksm.v2.7.0.plain file
    testdata.go:286: Expected _meta/testdata/ksm.v2.7.0.plain-expected.json file
    --- PASS: TestData/ksm.v2.7.0.plain (0.11s)
PASS

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m constanca-m self-assigned this Feb 2, 2023
@constanca-m constanca-m requested a review from a team as a code owner February 2, 2023 17:36
@constanca-m constanca-m requested review from rdner and cmacknz and removed request for a team February 2, 2023 17:36
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Feb 2, 2023
@mergify
Copy link
Contributor

mergify bot commented Feb 2, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @constanca-m? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m constanca-m added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Feb 2, 2023
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Feb 2, 2023
@elasticmachine
Copy link
Collaborator

elasticmachine commented Feb 2, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-02-14T14:37:08.058+0000

  • Duration: 55 min 46 sec

Test stats 🧪

Test Results
Failed 0
Passed 4877
Skipped 957
Total 5834

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m constanca-m requested a review from a team as a code owner February 6, 2023 08:36
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@constanca-m
Copy link
Contributor Author

Hey @elastic/elastic-agent-data-plane, can you have a look at this? Thanks.

@cmacknz cmacknz requested review from belimawr and removed request for cmacknz February 6, 2023 20:52
@constanca-m constanca-m requested review from a team, ChrsMark and tetianakravchenko and removed request for a team February 8, 2023 14:20
Copy link
Contributor

@tetianakravchenko tetianakravchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it actually required for the purpose of this PR to delete tests from the openmentrics module? I didn't check why do we have 2 folders with tests, but I think that it shouldn't part of this PR, can it be split to 2 PRs: move openmetrics changes to different PR?

CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
metricbeat/module/openmetrics/collector/collector_test.go Outdated Show resolved Hide resolved
@tetianakravchenko
Copy link
Contributor

@constanca-m also looking on my comment from your previous PR - #34448 (comment), maybe it would be easier to add a symlink files for the metricbeat/module/kubernetes/_meta/test/ksm.v2.4.2 instead?

@constanca-m
Copy link
Contributor Author

@constanca-m also looking on my comment from your previous PR - #34448 (comment), maybe it would be easier to add a symlink files for the metricbeat/module/kubernetes/_meta/test/ksm.v2.4.2 instead?

I am not sure what that implies @tetianakravchenko

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition in the test helper looks good in general if it is to reduce code/file duplication. I didn't check it in detail but if the testing suite doesn't break it should be fine. I left some nits to be considered to improve the changes a bit.

The removed files from the openmetrics module should not be removed, those are actual unit tests.

These changes should also be depicted at https://github.com/elastic/beats/blob/main/metricbeat/mb/testing/data/README.md.

CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
metricbeat/module/openmetrics/collector/collector_test.go Outdated Show resolved Hide resolved
metricbeat/mb/testing/testdata.go Outdated Show resolved Hide resolved
metricbeat/mb/testing/testdata.go Show resolved Hide resolved
metricbeat/mb/testing/testdata.go Show resolved Hide resolved
metricbeat/mb/testing/testdata.go Show resolved Hide resolved
constanca-m and others added 7 commits February 9, 2023 11:13
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
@@ -0,0 +1,24 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reason of generation data.json now? for both samelabeltestdata and testdata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, that was already in the code. It was added because I ran the tests locally.

Copy link
Contributor

@tetianakravchenko tetianakravchenko Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was already in the code.

what do you mean? those files were not in samelabeltestdata and testdata folders before your changes. And as I see in other modules this file is not present (example - https://github.com/elastic/beats/tree/main/metricbeat/module/prometheus/collector/_meta/testdata, similar files structure with the openmetrics module)

Copy link
Contributor Author

@constanca-m constanca-m Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, should have been more clear. There are these lines to create that file. Since I ran the tests locally, the files were created.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if running tests locally from main branch (as described in this doc) data.json are not created on the metricsets level, please double check why it gets created after your changes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be a breaking change for our documentation, it should be changed

@elastic/elastic-agent-data-plane could you please review?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am interpreting this correctly, we are proposing to stop updating the data.json file that is used to render example events published in our official documentation? For example https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-redis-info.html that was already linked?

If this understanding is correct then,

  1. We should keep the data.json field up to date. Users are likely relying on these pages to know what fields exist in the events for querying and processing. Removing this isn't helping our users, even if it may be convenient for us.
  2. If we have discovered that data.json is not being maintained or generated consistently, then that seems like a bug causing the documentation to be out date and should be fixed.

Copy link
Contributor Author

@constanca-m constanca-m Feb 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am interpreting this correctly, we are proposing to stop updating the data.json file that is used to render example events published in our official documentation? For example https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-redis-info.html that was already linked?

Yes, you're correct. But the way we are referencing these files is wrong. And the one that you linked is not affected by this, because there are no files referenced there generated by this function. But going back to the ones that are actually affected... We can see a good example in this changed module openmetrics. We have two new data.json files: one for samelabeltestdata (created by running this test) and another for testdata (created by running this test). Previously, running these two tests would result in just one data.json file (running one function after the other, is overwriting what the other did), misplaced in this directory.

So if:

  1. We want to always create just one data.json file and never have the possibility to create more than one for different configs, this PR is breaking that.
  2. If we want to fix it, this PR is needed for that, but that requires changing the path for the data.json files in some of the modules. If this is not done, the referenced data.json files in the docs will stop being updated.

I would say that as it is now, this is a bug and we should fix it.

But if we want to keep doing 1., then updating it would be very easy as all the docs files are auto generated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went and looked at how the data.json files are actually pulled into our documentation. Picking the Redis info one that is already referenced (arbitrarily, I know it isn't affected) the data.json file is included in the docs with this asciidoc syntax:

==== Fields
For a description of each field in the metricset, see the
<<exported-fields-redis,exported fields>> section.
Here is an example document generated by this metricset:
[source,json]
----
include::../../../module/redis/info/_meta/data.json[]
----
:edit_url!:

You can see that we are explicitly linking to the data.json file to generate an example event.

I think the only two requirements here are:

  1. We can generate sample events for the metricsets in each module.
  2. We do not remove or break the example events that are already in our documentation.

We can change the name of the .json file and the number of .json files generated as long as we update the links in the documentation to account for that.

We want to always create just one data.json file and never have the possibility to create more than one for different configs, this PR is breaking that

I don't think only having one data.json file is a requirement. We can link to multiple different files as needed.

If we want to fix it, this PR is needed for that, but that requires changing the path for the data.json files in some of the modules. If this is not done, the referenced data.json files in the docs will stop being updated

We can change the path to the generated files as needed, we just need to also update the relative paths referenced in the .asciidoc files that are used to generate the documentation as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see that we are explicitly linking to the data.json file to generate an example event.

Yes, I noticed. To avoid changing the path then (I think this would be very simple, we would just need to change this part of the template), I set a default one specifically for the data.json file. By default this path is always the same, so if there are two tests overwritting this file (like it is happening with the module openmetrics), at least there is a chance now to change the path for one of the files if it is needed.

So this way we keep updating the old data.json files and give an option if we want to create more than one.

@tetianakravchenko
Copy link
Contributor

could you please add information about settings path and writepath to https://github.com/elastic/beats/blob/main/metricbeat/mb/testing/data/README.md#available-settings-in-configyml, and what are the default values

@constanca-m
Copy link
Contributor Author

could you please add information about settings path and writepath to https://github.com/elastic/beats/blob/main/metricbeat/mb/testing/data/README.md#available-settings-in-configyml, and what are the default values

Yes, good idea @tetianakravchenko

@gizas
Copy link
Contributor

gizas commented Feb 10, 2023

"../_meta/testdata"

Why dont we put specific examples in the README. Always an example talks better. An example with or without writepath can be nice

@gizas
Copy link
Contributor

gizas commented Feb 10, 2023

For my understanding, our we going to change the _meta/testdata of all our datastreams to a common folder and update the confi.yml files, as part of another PR?

@constanca-m
Copy link
Contributor Author

For my understanding, our we going to change the _meta/testdata of all our datastreams to a common folder and update the confi.yml files, as part of another PR?

No, the folders don't need update nor the config files. The only thing this changes is that the data.json file is going to be written in the expected folder, instead of the default one. I wasn't thinking in changing the data.json files since they are working where they are. @gizas

@gizas
Copy link
Contributor

gizas commented Feb 10, 2023

I was referring to a common folder eg. for all state_datasets. Maybe this way we can avoid repetition. And again yes only to another PR. But ok not something important

@constanca-m
Copy link
Contributor Author

constanca-m commented Feb 10, 2023

I was referring to a common folder eg. for all state_datasets. Maybe this way we can avoid repetition. And again yes only to another PR. But ok not something important

I think the data.json is different for all state_* because they have different fields @gizas

But the goal of this PR is to move all the repeated files we use for testing those metricsets.

@@ -0,0 +1,24 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be a breaking change for our documentation, it should be changed

@elastic/elastic-agent-data-plane could you please review?

@@ -26,7 +26,8 @@ An alternative is to just run from metricbeat `mage mockedTests` to achieve the
### Available settings in `config.yml`

- `path`: (string) Path to reach the directory containing the files to read from. Default is "_meta/testdata".
- `wirtepath`: (string) Path to the directory where the expected files are going to be written. Default is "_meta/testdata".
- `writepath`: (string) Path to the directory where the expected files are going to be written. Default is "_meta/testdata".
- `datapath`: (string) Path to the directory where the data.json file is going to be written. Default is "_meta".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please somewhere add a comment why we have data.json file and that is used/might be used for our public documentation (or link to the place where it is mentioned, but I couldn't find it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try, but I am not sure I understand the reasoning for that file since it was already there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about adding smth like data.json file is used to render example events published in our official documentation
at least for me it wasn't clear at the moment of asking why we generated those files in this comment #34467 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done that! I was thinking something more complex, but as you wrote it, it makes perfect sense. I don't want to link to the line code because I would have to use one fixed commit and that might end up not making sense if we are on the branch main and are sent to a possible very old one.

Copy link
Contributor

@tetianakravchenko tetianakravchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Make sure you have a green CI before merging!

@constanca-m constanca-m merged commit af41650 into elastic:main Feb 20, 2023
@constanca-m constanca-m deleted the refactor-test-data-mb branch February 20, 2023 09:00
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
…nstead of the expected one (#34467)

* Allow change of directory.

Signed-off-by: constanca-m <constanca.manteigas@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants