Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Logs+] Implement Data Source APIs #156413

Merged
merged 22 commits into from
May 23, 2023

Conversation

Kerry350
Copy link
Contributor

@Kerry350 Kerry350 commented May 2, 2023

Summary

Research: https://github.com/elastic/logs-dev/issues/97
Requirements: https://github.com/elastic/logs-dev/issues/94
Implementation: https://github.com/elastic/logs-dev/issues/95

This implements two APIs ("Enumerate integrations" and "Enumerate data streams") under the Fleet plugin, and specifically the /epm namespace given the input here: https://github.com/elastic/logs-dev/issues/94

The Enumerate Integrations API can be queried like so (example with all parameters):

GET /api/fleet/epm/packages/installed?nameQuery=system&pageSize=5&type=logs&pageAfter=["system"]&sortDirection=asc

The Enumerate Data Streams API can be queried like so (example with all parameters):

GET /api/fleet/epm/data_streams?uncategorisedOnly=true&type=logs&sortDirection=desc&datasetQuery=beat

I've also left some inline review comments.

Notes / questions

  • There is an uncategorisedOnly option as part of the Data Streams API to exclude streams which are owned and managed by Fleet (so this would be streams that adhere to the spec 100%, but don't have the managed by property). This will be necessary for Discover (and perhaps other uses) to make a differentiation.

  • We are trying to make this API performant, and we therefore want to paginate for the first API (we can't on the second one as the Data Streams APIs aren't flexible enough).

    • I think this has the caveat that we can't filter by type at the top level of the package Saved Objects query, as we would need to query for objects where "one of the values of es_index_patterns contains XYZ...". The inner data streams are filtered, but we'll likely end up with type-less integrations to display in our log source selector. We can't just filter them server side as this will obviously alter the pagination. Will disabling / greying out these entries be acceptable UX? Is there some combination of querying on the SOs that I've missed?

      • ℹ️💭 It looks like we can use a nested filter on a combination of the installed_es.type and installed_es.id properties to achieve this and maintain top level pagination integrity.
  • Similar to above, as we're trying to make this as performant as possible it's hard to query certain properties without taking a performance hit. In particular there isn't an easy way to query for icons. Icons are usually queried via Registry.fetchList(), this fetches all integrations from the registry. To use this we would have to forego our pagination benefits on the SO side, as the API doesn't support pagination. There is a fetchInfo method, but this queries for one. There isn't a way to pass an array of IDs (as an example) to these APIs, so we'd have to make one call per package. I imagine we'll use a page size of around 30, so this would be another 30 requests. The current data streams API (the one not under the /epm namespace) suffers from this same problem, of having to query from many locations and combine the information. If we use the PackageIcon component from Fleet with tryAPI=true we could at least defer the loading, giving priority to getting the names and datasets on the page, however this still ultimately ends up with multiple requests to api/fleet/epm/packages/<package_name>/<version> - but shifted from server to client. This component does make optimisations such as using loading=lazy on the img but it's likely our selector will be displaying everything in the viewport anyway once it's open. The icons seem fairly important within our UX, how should we tackle this? Adding capabilities to this API: https://github.com/elastic/package-registry/blob/f1b859c61c243b445873ac938236c64ffbb3a7f6/search.go#L27 seems fairly difficult.

  • The permissions model within Fleet is very custom. There is the use of FleetAuthz and this allows the use of a custom Saved Objects client that is "configured to use kibana_system privileges instead of end-user privileges". I don't think we want to have these strict authorisation requirements for consuming these APIs within things like Discover. Rather, we'd want to only require permissions such as view_index_metadata. I've used the asCurrentUser version of the ES client (will require view_index_metadata), and the "standard" savedObjects.client (as in, from the router core context, not Fleet's custom version). This will still require the Kibana > Management > Integrations > Read permission as a minimum to query for the package Saved Objects. Are we okay with these two permission requirements? Even living outside of the Fleet plugin this would be the case. Has this introduced any security issues?

    • ℹ️💭 We will defer this question to Fleet. Maybe the SO client scopes to kibana_system user so that the end user doesn't need specific permissions. We'll need to clarify.

@apmmachine
Copy link
Contributor

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@Kerry350 Kerry350 force-pushed the 2653-implement-data-source-API branch from a08f399 to c497919 Compare May 3, 2023 11:10
@Kerry350 Kerry350 force-pushed the 2653-implement-data-source-API branch from 46110f0 to ed121b5 Compare May 9, 2023 12:28
@@ -21,14 +21,17 @@ export const LIMITED_CONCURRENCY_ROUTE_TAG = 'ingest:limited-concurrency';

// EPM API routes
const EPM_PACKAGES_MANY = `${EPM_API_ROOT}/packages`;
const EPM_PACKAGES_INSTALLED = `${EPM_API_ROOT}/packages/installed`;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -24,3 +24,5 @@ export interface DataStream {
serviceName: string;
} | null;
}

export type PackageDataStreamTypes = 'logs' | 'metrics' | 'traces' | 'synthetics' | 'profiling';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly not the best naming, or location? But when looking at other Fleet code I could only really find the DATA_STREAM_INDEX_PATTERN constant, but not a type I could reuse.

@@ -46,6 +49,26 @@ export interface GetPackagesResponse {
response?: PackageList;
}

interface InstalledPackage {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fairly minimal, and doesn't necessarily align with the advice given, r.e.:

I do think we should make sure the response body is somewhat aligned with the rest of the Fleet API's patterns though for consistency

Extra properties would have required extra queries to other locations (e.g. the registry), which is why this is so minimal.

Copy link
Contributor

@juliaElastic juliaElastic May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to keep the API response minimal to avoid extra queries, we can add more fields later if needed for other use cases.

dataset: datasetQuery ? `*${datasetQuery}*` : '*',
});

const filteredDataStreams = uncategorisedOnly
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the best name? But we need some sort of option that can differentiate between wanting all data streams or only wanting uncategorised ones, this will be important for the work within Discover.

? [
buildFunctionNode(
'nested',
`${PACKAGES_SAVED_OBJECT_TYPE}.attributes.installed_es`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use installed_es here as other properties within the SO aren't mapped, and therefore aren't queryable. We need something top level so that we can take advantage of pagination on the Saved Objects.

@Kerry350 Kerry350 marked this pull request as ready for review May 9, 2023 14:47
@Kerry350 Kerry350 requested a review from a team as a code owner May 9, 2023 14:47
@Kerry350 Kerry350 self-assigned this May 9, 2023
@Kerry350 Kerry350 added the release_note:skip Skip the PR/issue when compiling release notes label May 9, 2023
@kpollich kpollich requested a review from juliaElastic May 9, 2023 14:56
@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label May 12, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@juliaElastic juliaElastic removed the request for review from kpollich May 22, 2023 11:30
@Kerry350
Copy link
Contributor Author

Kerry350 commented May 22, 2023

@juliaElastic I think this is ready for another look (once green).

  • fleetAuthz has been added back for both routes, with readPackageInfo required.

    • getDataStreams() uses the ES client scoped to currentUser, the same as here.
    • getInstalledPackages() uses the custom internal SO client (with custom audit logging).
  • I did have some issues with the integration tests. I think there are tests that leak state between suites (in this case installed packages), as I get different results when using grep vs running the whole EPM suite. To avoid this I've added filters to the requests, but this might be worth an issue? (I initially started investigating, but the feedback loop is slow when running the whole suite).

Otherwise everything has been actioned.

@juliaElastic
Copy link
Contributor

juliaElastic commented May 22, 2023

@Kerry350 Looks great! Added a few comments.

I think we should test at least manually that the permissions work as expected. Is this the expected behavior?

  • User with Integrations/Read privilege -> Can query data streams and installed packages API
  • User without Integrations/Read privilege -> Gets a permission error.
  • Data streams API - which packages should we test with to verify that the response contains all expected results?
  • I did have some issues with the integration tests. I think there are tests that leak state between suites (in this case installed packages), as I get different results when using grep vs running the whole EPM suite. To avoid this I've added filters to the requests, but this might be worth an issue? (I initially started investigating, but the feedback loop is slow when running the whole suite).

I would appreciate if you opened an issue in kibana, we can investigate.

@Kerry350
Copy link
Contributor Author

Kerry350 commented May 22, 2023

I think we should test at least manually that the permissions work as expected. Is this the expected behavior?

Yep, I've given this a test locally manually with a specific role and user, but a second pair of eyes is appreciated.

User with Integrations/Read privilege -> Can query data streams and installed packages API

Correct 👍 For the data streams one they will also need the view_index_metadata privilege.

User without Integrations/Read privilege -> Gets a permission error.

Correct 👍

Data streams API - which packages should we test with to verify that the response contains all expected results?

I've been using elastic-package for my testing, running with ./elastic-package stack up and then manually verifying that the correct data streams are returned based on what the Saved Objects contain, using this Dev Tools query:

GET .kibana_ingest/_search
{
  "query": {
    "term": {
      "type": {
        "value": "epm-packages"
      }
    }
  }
}

I would appreciate if you opened an issue in kibana, we can investigate.

R.e. tests and this issue I think I've found the issue, although I still seem to be getting the occasional leaky state issue when running my tests.

When I did a log of the packages I was getting returned in my tests the two rogue (in the sense they had nothing to do with that file of tests) packages I could see were:

   {
      "name":"deprecated",
      "version":"0.1.0",
      "status":"installed",
      "dataStreams":[
         {
            "name":"logs-deprecated.test-*",
            "title":"test"
         }
      ]
   },
   {
      "name":"multiple_versions",
      "version":"0.3.0",
      "status":"installed",
      "dataStreams":[
         {
            "name":"logs-multiple_versions.test-*",
            "title":"test"
         }
      ]
   }

Looking at index.js these are the files that are run before get:

loadTestFile(require.resolve('./delete'));
loadTestFile(require.resolve('./list'));
loadTestFile(require.resolve('./setup'));

I managed to find the deprecated and multiple_versions packages being installed, but not uninstalled, in setup.ts. I added an after() to setup to uninstall these two packages. For the most part this seems to have fixed the issue with running the whole EPM suite vs grep for just the get tests. Occasionally I'd see something weird, but it took quite a few runs. Unfortunately because of the time it takes to run all of the EPM tests this was as reliable as I could make things :(

I'll let CI run and see what sort of results I get - I expect there to be 3 packages involved in these new tests now (and before the addition of the cleanup I'd get 5).

@juliaElastic
Copy link
Contributor

juliaElastic commented May 23, 2023

I tried to test the privileges by creating a new user + role with view_index_metadata and Integrations:Read privilege, but nothing comes back from the /data_streams API. The /packages/installed API works as expected.

The data streams exist, indexed a doc in one of the system data streams.

image
POST /logs-system.application-default/_doc
{
                  "@timestamp": "2015-01-01",
                  "logs_test_name": "test",
                  "data_stream": {
                    "dataset": "system.test_logs",
                    "namespace":"default:",
                    "type": "logs"
                  }
                }

Also tested without the Integrations privilege, and the APIs return 403 as expected.

@Kerry350
Copy link
Contributor Author

I tried to test the privileges by creating a new user + role with view_index_metadata and Integrations:Read privilege, but nothing comes back from the /data_streams API. The /packages/installed API works as expected.

Interesting 🤔 I've just double checked with the same and things are working.

Request:

Screenshot 2023-05-23 at 11 17 27

Auth:

Screenshot 2023-05-23 at 11 17 35

Index privileges:

Screenshot 2023-05-23 at 11 17 51

Kibana privileges:

Screenshot 2023-05-23 at 11 18 09

Was your request possibly missing the uncategorisedOnly=false parameter? By default this is true, so you'll only get "unmanaged" streams. Then again if you manually created a data stream, and indexed a doc, I'd assume this would be missing the meta managed properties anyway?

As a side note: maybe this should be flipped to the default being false not true?

@juliaElastic
Copy link
Contributor

Was your request possibly missing the uncategorisedOnly=false parameter? By default this is true, so you'll only get "unmanaged" streams. Then again if you manually created a data stream, and indexed a doc, I'd assume this would be missing the meta managed properties anyway?

Yes, this was it, with the parameter I get the expected results.

GET kbn:/api/fleet/epm/data_streams?uncategorisedOnly=false
{
  "items": [
    {
      "name": "logs-elastic_agent.apm_server-default"
    },
    {
      "name": "logs-system.application-default"
    },
    {
      "name": "metrics-nginx.stubstatus-1"
    }
  ]
}

As a side note: maybe this should be flipped to the default being false not true?

Yes, it sounds more intuitive to me to return all results without any parameters.

Copy link
Contributor

@juliaElastic juliaElastic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tested locally.

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
fleet 1043 1045 +2

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
fleet 129.7KB 129.8KB +90.0B
Unknown metric groups

API count

id before after diff
fleet 1157 1159 +2

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
securitySolution 399 403 +4
total +6

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
securitySolution 479 483 +4
total +6

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @Kerry350

@Kerry350 Kerry350 merged commit 34f21be into elastic:main May 23, 2023
@kibanamachine kibanamachine added v8.9.0 backport:skip This commit does not require backporting labels May 23, 2023
delanni pushed a commit to delanni/kibana that referenced this pull request May 25, 2023
## Summary

Research: https://github.com/elastic/observability-dev/issues/2620
Requirements: https://github.com/elastic/observability-dev/issues/2639
Implementation: https://github.com/elastic/observability-dev/issues/2653

This implements two APIs ("Enumerate integrations" and "Enumerate data
streams") under the Fleet plugin, and specifically the `/epm` namespace
given the input here:
https://github.com/elastic/observability-dev/issues/2639#issuecomment-1521689096

The Enumerate Integrations API can be queried like so (example with all
parameters):

`GET
/api/fleet/epm/packages/installed?nameQuery=system&pageSize=5&type=logs&pageAfter=["system"]&sortDirection=asc`

The Enumerate Data Streams API can be queried like so (example with all
parameters):

`GET
/api/fleet/epm/data_streams?uncategorisedOnly=true&type=logs&sortDirection=desc&datasetQuery=beat`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v8.9.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants