diff --git a/CHANGELOG.md b/CHANGELOG.md index 0dd139a81baf..cd237d67232b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -28,6 +28,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) ### 📝 Documentation +* [MD] Add design documents of multiple data source feature [#2538](https://github.com/opensearch-project/OpenSearch-Dashboards/pull/2538) ### 🛠 Maintenance ### 🪛 Refactoring diff --git a/src/plugins/data_source/README.md b/src/plugins/data_source/README.md index cfda79a2908f..c8cad1d2717b 100755 --- a/src/plugins/data_source/README.md +++ b/src/plugins/data_source/README.md @@ -5,50 +5,62 @@ An OpenSearch Dashboards plugin This plugin introduces support for multiple data sources into OpenSearch Dashboards and provides related functions to connect to OpenSearch data sources. ## Configuration + Update the following configuration in the `opensearch_dashboards.yml` file to apply changes. Refer to the schema [here](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/src/plugins/data_source/config.ts) for supported configurations. 1. The dataSource plugin is disabled by default; to enable it: -`data_source.enabled: true` + `data_source.enabled: true` 2. The audit trail is enabled by default for logging the access to data source; to disable it: -`data_source.audit.enabled: false` + `data_source.audit.enabled: false` - - Current auditor configuration: -``` +- Current auditor configuration: + +```yml data_source.audit.appender.kind: 'file' data_source.audit.appender.layout.kind: 'pattern' data_source.audit.appender.path: '/tmp/opensearch-dashboards-data-source-audit.log' ``` 3. 
The default encryption-related configuration parameters are: -``` + +```yml data_source.encryption.wrappingKeyName: 'changeme' data_source.encryption.wrappingKeyNamespace: 'changeme' -data_source.encryption.wrappingKey: [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] +data_source.encryption.wrappingKey: + [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] ``` + Note that if any of the encryption keyring configuration values change (wrappingKeyName/wrappingKeyNamespace/wrappingKey), none of the previously-encrypted credentials can be decrypted; therefore, credentials of previously created data sources must be updated to continue use. **What are the best practices for generating a secure wrapping key?** WrappingKey is an array of 32 random numbers. Read [more](https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator) about best practices for generating a secure wrapping key. ## Public + The public plugin is used to enable and disable the features related to multi data source available in other plugins. e.g. data_source_management, index_pattern_management - Add as a required dependency for whole plugin on/off switch - Add as opitional dependency for partial flow changes control ## Server + The provided data source client is integrated with default search strategy in data plugin. When data source id presented in IOpenSearchSearchRequest, data source client will be used. ### Data Source Service -The data source service will provide a data source client given a data source id and optional client configurations. + +The data source service will provide a data source client given a data source id and optional client configurations. Currently supported client config is: + - `data_source.clientPool.size` Data source service uses LRU cache to cache the root client to improve client pool usage. 
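The LRU caching of root clients mentioned above can be pictured with the sketch below. It is a minimal illustration under stated assumptions, not the plugin's implementation: `PooledClient` and `LruClientPool` are hypothetical names, the pool is keyed by endpoint, and a plain `Map` stands in for the LRU library the plugin uses.

```typescript
// Hypothetical stand-in for an OpenSearch root client; only close() matters here.
interface PooledClient {
  endpoint: string;
  close(): void;
}

// Minimal LRU pool (illustrative only). A Map iterates in insertion order,
// so the first key is always the least recently used entry.
class LruClientPool {
  private readonly cache = new Map<string, PooledClient>();
  private readonly maxSize: number;
  private readonly createClient: (endpoint: string) => PooledClient;

  constructor(maxSize: number, createClient: (endpoint: string) => PooledClient) {
    this.maxSize = maxSize;
    this.createClient = createClient;
  }

  get(endpoint: string): PooledClient {
    const existing = this.cache.get(endpoint);
    if (existing !== undefined) {
      // Re-insert so this entry becomes the most recently used one.
      this.cache.delete(endpoint);
      this.cache.set(endpoint, existing);
      return existing;
    }
    const client = this.createClient(endpoint);
    this.cache.set(endpoint, client);
    if (this.cache.size > this.maxSize) {
      // Evict and close the least recently used client.
      const oldestKey = this.cache.keys().next().value;
      if (oldestKey !== undefined) {
        this.cache.get(oldestKey)?.close();
        this.cache.delete(oldestKey);
      }
    }
    return client;
  }

  get size(): number {
    return this.cache.size;
  }
}
```

In this sketch, the pool size plays the role that `data_source.clientPool.size` plays in the configuration; closing a client on eviction keeps idle connections from accumulating.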
+ #### Example usage: + In the RequestHandler, get an instance of the client using: + ```ts client: OpenSearchClient = await context.dataSource.opensearch.getClient(dataSourceId); @@ -57,21 +69,35 @@ apiCaller: LegacyAPICaller = context.dataSource.opensearch.legacy.getClient(data ``` ### Data Source Client Wrapper + The data source saved object client wrapper overrides the write related action for data source object in order to perform validation and encryption actions of the authentication information inside data source. ### Cryptography Client + The research for choosing a suitable stack can be found in: [#1756](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1756) + #### Example usage: + ```ts //Encrypt const encryptedPassword = await this.cryptographyClient.encryptAndEncode(password); //Decrypt const decodedPassword = await this.cryptographyClient.decodeAndDecrypt(password); ``` + --- ## Development See the [OpenSearch Dashboards contributing -guide](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/CONTRIBUTING.md) for instructions -setting up your development environment. +guide](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/main/CONTRIBUTING.md) for instructions setting up your development environment. + +### Design Documents + +- [High level design doc](./docs/high_level_design.md) +- [User stories](./docs/user_stories.md) +- [client management detail design](./docs/client_management_design.md) + +### Integrate with multiple data source feature + +TODO: [#2455](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/2455) diff --git a/src/plugins/data_source/docs/client_management_design.md b/src/plugins/data_source/docs/client_management_design.md new file mode 100644 index 000000000000..d97c36f4156e --- /dev/null +++ b/src/plugins/data_source/docs/client_management_design.md @@ -0,0 +1,295 @@ +# [OSD Multi Data Source] Client Management + +## 1. 
Problem Statement + +This design is part of the OSD multi data source project [[RFC](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388)], where we need to manage and expose clients. Connections are established by creating OpenSearch clients. Callers can then use these clients to interact with any data source (OpenSearch is the only data source type in scope at this phase). + +**Overall, the critical problems we are solving are:** + +1. How to set up connections (clients) for different data sources? +2. How to expose data source clients to callers through clean interfaces? +3. How to maintain backwards compatibility if the user turns off this feature? +4. How to manage multiple clients/connections efficiently without consuming all the memory? (P1) + +## 2. Requirements + +1. **Accessibility**: + 1. Clients need to be accessible by other OSD plugins or modules through interfaces, in all stages of the plugin lifecycle, e.g. “Setup” and “Start”. + 2. Clients should be accessible by plugins through the request handler context. +2. **Client Management**: Clients need to be reused in a resource-efficient way so as not to harm performance (P1) +3. **Backwards compatibility**: if a user enables this feature and later disables it, any related logic should be able to take in this config change and handle all use cases: + 1. Either switching to connect to the default OpenSearch cluster + 2. Or blocking the connection to the data source and throwing an error +4. **Auditing:** Need to log user queries on different data sources, for troubleshooting or log analysis + +## 3. Architecture/Dataflow + +- We are adding a new service in core to manage data source clients and expose an interface for plugins and modules to access data source clients. +- The existing OpenSearch service and saved object service are not supposed to be affected by this change. + +#### 3.1 Dataflow of the plugin (using the viz plugin as an example) call sequence to retrieve data from any datasource. 
+ +![img](./img/client_management_dataflow.png) + +#### 3.2 Architecture Diagram + +![img](./img/client_management_architecture.png) + +## 4. Detailed Design + +### 4.0 Answers to some critical design questions + +**1.** **How to set up connections (clients) for different datasources?** +Similar to how OSD currently talks to the default OpenSearch cluster by creating an OpenSearch Node.js client with the [opensearch-js](https://github.com/opensearch-project/opensearch-js) library, we create a client for each datasource. The critical params that differentiate data sources are `url` and `auth`. + +```ts +// Client library: https://github.com/opensearch-project/opensearch-js +const { Client } = require('@opensearch-project/opensearch'); + +const dataSourceClient = new Client({ + node: url, + auth: { + username, + password, + }, + ...otherClientOptions +}); + +dataSourceClient.search() +dataSourceClient.ping() +``` + +**2. How to expose datasource clients to callers through clean interfaces?** +We create an `opensearch_data_service` in core, similar to the existing `opensearch_service`, which provides the client for the default OpenSearch cluster. This new service is dedicated to providing clients for data sources. Following the same paradigm, we can register this new service to `CoreStart` and `CoreRouteHandlerContext` in order to expose data source clients to plugins and modules. The interface is exposed from the new service, so it does not interfere with any existing services and keeps the interface clean. + +``` +// Existing +const defaultClient: OpenSearchClient = core.opensearch.client.asCurrentUser + +// With opensearch_data_service added +const dataSourceClient: OpenSearchClient = core.opensearchData.client +``` + +**3. How to maintain backwards compatibility if the user turns off this feature?** +For context, a user can only turn off this feature by updating `osd.yml` and rebooting. Configs are accessible from `ConfigService` in core. + +1. 
**Browser side**: if the datasource feature is turned off, the browser should detect the config change and update the UI so that requests cannot be submitted to a datasource. If a request is not submitted to a datasource, the logic won’t return a datasource client at all. +2. **Server side**: a user may still submit a request to a datasource manually, on purpose, or a plugin may try to access the datasource client from the server side. In the corresponding core service we’ll have a **flag** that maps to the **enable_multi_datasource** boolean config, and we throw an error if the API is called while this feature is turned off. + +**4. How to manage multiple clients/connections efficiently without consuming all the memory?** + +- P0: we keep a map of unique clients, with no size limit. +- For datasources with different endpoints, use client pooling (e.g. an LRU cache). +- For data sources with the same endpoint but different users, use the connection pooling strategy (child client) provided by opensearch-js. + +### 4.1 Create `core → opensearch_data_service.ts -> class: OpenSearchDataService` + +- Extend from [OpenSearchService](https://github.com/opensearch-project/OpenSearch-Dashboards/blob/3d6dd638d021f383a4c6ab750c83a1d30d3787b3/src/core/server/opensearch/opensearch_service.ts#L60) to reuse utility functions +- Instance variables + - SavedObjectClient + - DataSourceClusterClient +- Add `savedObject` as dependency, e.g. + - interface StartDeps { + savedObjects: InternalSavedObjectsServiceStart; + } +- **setup()**: + - Override setup() so that it does not initialize anything unrelated to datasource, such as the scoped client, internal client and legacy-related clients + - Initialize `DataSourceClient` object + - **return:** nothing +- **start():** + - **input**: `{ savedObjects, auditTrail }: StartDeps` + - Initialize saved object client. 
+ - **return:** createOrFindClient() +- **stop():** close all datasource clients and child clients +- Other + - Register this service to related interfaces such as `CoreStart/CoreSetup` + - Create corresponding service interfaces such as `InternalOpenSearchDataServiceStart` + +### 4.2 Refactor `core → opensearch -> client` module + +Currently the [`core-opensearch`](https://github.com/opensearch-project/OpenSearch-Dashboards/tree/d7004dc5b0392477fdd54ac66b29d231975a173b/src/core/server/opensearch) module contains 2 major parts. + +- **opensearch_service**: holds a `ClusterClient` instance +- **[client module](https://github.com/opensearch-project/OpenSearch-Dashboards/tree/d7004dc5b0392477fdd54ac66b29d231975a173b/src/core/server/opensearch/client)**: the utilities and interfaces for creating `ClusterClient` + - internalClient: read only. (created as OSD internal user, system user) + - ScopedClient: read only. (as current user) + - asScoped(): function that creates child clients of the read-only ScopedClient for the current user + +We’ll only make changes in the client module. + +**4.2.1 Create `IDataSourceClient` Interface** + +Similar to the existing `IClusterClient` + +```ts +export interface IClusterClient { + readonly asInternalUser: OpenSearchClient; + asScoped: (request: ScopeableRequest) => IScopedClusterClient; +} +``` + +`IDataSourceClient` represents an OpenSearch data source API client created by the platform. It allows calling APIs on behalf of the user defined in the datasource saved object. + +```ts +export interface IDataSourceClient { + asDataSource: (dataSourceId: string) => Promise<OpenSearchClient>; + close: () => Promise<void>; +} +``` + +**4.2.2 Create `DataSourceClient` Class** + +- implements `IDataSourceClient` +- Add local variable **isDataSourceEnabled** + - The value of the flag is mapped to the boolean configuration “enable_multi_datasource” in osd.yml. The flag determines whether the feature is enabled. 
If turned off, any access to dataSourceClient will be rejected with an error +- Add local variable **rootDataSourceClientCollection** + - Map (initialize as empty or take user config to add Clients) +- Implement the new function `asDataSource` as shown in the `IDataSourceClient` interface above. Its params and return type are defined there + + - **Functionality** + - Throw an error if **isDataSourceEnabled == false** + - Look up the client pool by datasource id, and return the client if it exists + - Use the `Saved_Object` client to retrieve data source info from the OSD system index and parse it to a `DataSource` object + - Call credential manager utilities to **decrypt** user credentials from the `DataSource` object + - Create a client using the dataSource metadata and the decrypted user credentials + - \*Optimization: if the endpoint is the same but the user credential differs, we’ll leverage the opensearch-js connection pooling strategy to create the client via `.child()` + +### 4.3 Register datasource client to core context + +This lets plugins access the data source client via the request handler, for example by `core.client.search(params)`. It’s a very common use case for a plugin to access a cluster while handling a request. In fact, the data plugin uses it in its search module to get the client, which is covered in detail in section 4.4. + +**4.3.1 Update `RequestHandlerContext` interface** + +- **param** + - **dataSourceId**: needed to retrieve **datasource info** for either creating a new client or looking up the client pool +- **return type:** OpenSearchClient + ```ts + export interface RequestHandlerContext { + core: { + savedObjects: { + client: SavedObjectsClientContract; + typeRegistry: ISavedObjectTypeRegistry; + }; + opensearch: { + client: IScopedClusterClient; + legacy: { + client: ILegacyScopedClusterClient; + }; + }; + opensearchData: { + getClient(dataSourceId: string): Promise<OpenSearchClient>; // method + }; + ... 
+ }; + } + ``` + +**4.3.2 Update** **`CoreOpenSearchRouteHandlerContext`** **class** + +- Create class `CoreOpenSearchDataRouteHandlerContext` + + ```ts + class CoreOpenSearchDataRouteHandlerContext { + constructor( + private readonly opensearchDataStart: InternalOpenSearchDataServiceStart + ) {} + + // Resolves the client for the given data source; lookup errors propagate to the caller + public async getClient(dataSourceId: string): Promise<OpenSearchClient> { + return await this.opensearchDataStart.client.asDataSource(dataSourceId); + } + } + ``` + +- Register to `CoreRouteHandlerContext` + + ```ts + export class CoreRouteHandlerContext { + #auditor?: Auditor; + + readonly opensearch: CoreOpenSearchRouteHandlerContext; + readonly savedObjects: CoreSavedObjectsRouteHandlerContext; + readonly uiSettings: CoreUiSettingsRouteHandlerContext; + readonly dataSource: CoreOpenSearchDataRouteHandlerContext; + + constructor( + private readonly coreStart: InternalCoreStart, + private readonly request: OpenSearchDashboardsRequest + ) { + this.savedObjects = new CoreSavedObjectsRouteHandlerContext( + this.coreStart.savedObjects, + this.request + ); + this.opensearch = new CoreOpenSearchRouteHandlerContext( + this.coreStart.opensearch, + this.request, + ); + this.uiSettings = new CoreUiSettingsRouteHandlerContext( + this.coreStart.uiSettings, + this.savedObjects + ); + this.dataSource = new CoreOpenSearchDataRouteHandlerContext( + this.coreStart.opensearchData + ); + } + } + + ``` + +### 4.4 Refactor data plugin search module to call core API to get datasource client + +`Search strategy` is the low-level API of the data plugin search module. It retrieves clients and queries OpenSearch. It needs to be refactored to switch between the default client and the datasource client, depending on whether a request is sent to a datasource or not. + +Currently, the search module of the data plugin retrieves the default client to interact with OpenSearch through this API call. 
Ref: [opensearch-search-strategy.ts](https://github.com/opensearch-project/opensearch-dashboards/blob/e3b34df1dea59a253884f6da4e49c3e717d362c9/src/plugins/data/server/search/opensearch_search/opensearch_search_strategy.ts#L75) + +```ts +const client: OpenSearchClient = core.opensearch.client.asCurrentUser; +// use API provided by opensearch-js lib to interact with OpenSearch +client.search(params); +``` + +Similarly, we’ll have the following for the datasource use case. `asCurrentUser` doesn’t make sense for a datasource, because the client is always created, or looked up in the client pool, with the user credential defined in the datasource. + +```ts +let client: OpenSearchClient; +if (request.dataSource) { + // dataSourceId identifies the data source this request targets (see 4.3.1) + client = await core.opensearchData.getClient(dataSourceId); +} else { + // existing logic to retrieve default client + client = core.opensearch.client.asCurrentUser; +} + +// use API provided by opensearch-js lib to interact with OpenSearch +client.ping() +client.search(params) +``` + +### 4.5 Client Management + +When loading a dashboard with visualizations, each visualization sends at least 1 request to the server side to retrieve data. With the multiple data source feature enabled, multiple requests are sent to multiple datasources, which requires multiple clients. If we return a new client **per request**, it will soon fill up the memory and sockets with idle clients. Of course we can close a client at any time, but the connection is supposed to be kept alive for easy reload and periodic data pulling. Therefore, we need a better solution to manage clients efficiently. + +#### P0: **const dataSourceClientPool = Map()** + +- Keep all datasource clients in a Map +- A Map enables easy lookup. The input for getting a data source client is `dataSourceId`. If a client was already created for the same datasource, we can easily find it and return it to the caller. Otherwise we create a new client, return it to the caller, and add it to the Map. 
+- While stopping the service, we can close all the connections by looping the Map and calling `client.close()` for each. +- For data sources with same endpoint, but different user, use connection pooling strategy (child client) provided by opensearch-js. + +#### P1: Client pooling by LRU cache + +- key: data source endpoint +- value: OpenSearch client object +- configurable pool size +- use existing js lru-cache lib + +```ts +import LRUCache from 'lru-cache'; + +export class OpenSearchClientPool { + private cache?: LRUCache + ... +``` + +## 5. Audit & Logging + +[#1986](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1986) diff --git a/src/plugins/data_source/docs/high_level_design.md b/src/plugins/data_source/docs/high_level_design.md new file mode 100644 index 000000000000..da8152f3019e --- /dev/null +++ b/src/plugins/data_source/docs/high_level_design.md @@ -0,0 +1,147 @@ +# OSD Multiple Data Source Support HLD + +OpenSearch Dashboards is designed and implemented to only work with one single OpenSearch cluster. This documents discusses the design to enable OpenSearch Dashboards to work with multiple OpenSearch endpoints, which can be a centralized data visualization and analytics application. + +For more context, see RFC [Enable OpenSearch Dashboards to support multiple OpenSearch clusters](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388) + +## User Stories + +[OpenSearch Dashboards Multiple OpenSearch Data Source Support User Stories](user_stories.md) + +From very high level, we propose to introduce `data-source` as a new OSD saved object type. + +## Terminologies + +- **Dashboards metadata**: refers to data documents saved in the opensearch_dashboards index. Equivalent to Dashboards **saved objects**. +- **User data**: in this document, user data refers to the log, metrics or search catalog data that saved in OpenSearch, users run analysis against these user data with OpenSearch Dashboards. 
+- **Data source**: an Opensearch endpoint, it could be a on-prem cluster, or AWS managed OpenSearch domain or a serverless collection, which stores the user log/metrics data for visualization and analytics purpose. + - in this document, we may also refer data source as a new type of OSD saved objects, which is a data model to describe a data source, including endpoint, auth info, capabilities etc. + +## Scope + +We are targeting to release the multiple data source support in OpenSearch 3.0 preview as an experimental feature, and make it GA over a few minor version throughout 3.x versions. + +### Preview Scope + +- data source only support basic authentication with OpenSearch + - API key, JWT, Sigv4 and other auth types are out of scope +- data source will only work with visualizations, and discover + - plugins like AD/Alerting/ISM doesn’t work with data source + - DevTool console maybe in scope depending on the progress and resource + - Observability visualizations are out of scope +- data source support can be enabled/disable based on config in OSD yml config file +- multiple data source project doesn’t change existing security experience + - e.g. if a user have access to a security tenant, they will be able to use the data sources defined in that tenant + +### GA Scope + +- Support all Elasticsearch 7.10 DSL/API compatible data sources, including customer self managed Elasticsearch 7.10, OpenSearch 3.x clusters, AWS managed OpenSearch and Elasticsearch 7.10 domains. OpenSearch Serverless collections. + - Support Basic auth, AWS SigV4 signing with Data sources +- OpenSearch Dashboards plugins such as Alerting/AD etc. 
can work with each data source depending on the data source capability +- Observability visualizations are out of scope +- Support of different (major) versions of ES/OpenSearch data sources is out of scope + +## Requirements + +### Functional requirements + +- OSD users should be able to dynamically add/view/update/remove OpenSearch data sources using UI and API +- OSD users should be able to save/update/remove credentials( username/password in preview, and AWS Sigv4 in GA) +- OSD users can create index pattern with specific data source +- Data source credentials should be handled securely +- OSD users can put data visualizations of different data sources into one dashboard +- OpenSearch analytics and management functions (such as AD, ISM and security) can work with specific data source to manage those functions in corresponding data source + - such as user can choose a data source and then edit/view Anomaly detectors and security roles with OpenSearch Dashboards +- OSD should be able to work with self managed and AWS managed + +### Limitations + +- One index pattern can only work with one data source +- One visualization will still only work with one index pattern +- Plugins like AD and alerting will only work with one data source at any point of time + +## Design + +### Introducing data source saved object model + +Generally, OSD works with 2 kinds of data: + +1. User data, such as application logs, metrics, and search catalog data in data indices. +2. OSD metadata, which are the saved objects in `.opensearch_dashboards` index + +Currently both OSD metadata and user data indices are saved in the same OpenSearch cluster. However in the case to support OSD to work with multiple OpenSearch data sources, OSD metadata index will be stored in one OpenSearch cluster, and user data indices will be saved in other OpenSearch clusters. Thus we will need to differentiate OSD metadata operations and user data access. 
+ +The OSD admin will still define an OpenSearch cluster in the `opensearch.hosts` config in the `opensearch_dashboards.yml` file. It will be used as the OSD metadata store, and OSD metadata will still be saved in the `.opensearch_dashboards` index in this OpenSearch cluster. + +Regarding user data access, we propose to add a new “data-source” saved object type, which describes a data source connection, such as + +- cluster endpoint +- auth info, like auth types and credentials to use when accessing the data source +- data source capabilities, such as whether the data source supports AD/ISM etc. + +Users can dynamically add data sources in OSD using the UI or API, and OSD will save the data source saved objects in its metadata index. Users can then do their work with their data sources. For example, when OSD needs to access user data on behalf of the customer, the customer will need to specify a data source id; OSD can then fetch the data source info from its metadata store and send the request to the corresponding data source endpoint. + +So the Dashboards and OpenSearch setup may look like: ![img](./img/hld_setup_diagram.png) + +Refer to the proposed solution in [#1388](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388) for the data modeling of data source + +### Data source integration + +[opensearch_service](https://github.com/opensearch-project/OpenSearch-Dashboards/tree/main/src/core/server/opensearch) is one of the core modules of OpenSearch Dashboards. It is a singleton instance in OSD which manages the OSD connection with the backend OpenSearch endpoint. It adds another level of abstraction over the OpenSearch client, and provides a set of interfaces for other OSD modules and plugins to interact with OpenSearch, for example running DSL queries or calling arbitrary OpenSearch APIs. + +Currently, OSD only works with one OpenSearch cluster; the OSD metadata index and user data indices are stored in the same OpenSearch cluster. 
So the OSD [saved object service](https://github.com/opensearch-project/OpenSearch-Dashboards/tree/main/src/core/server/saved_objects), which is the core OSD module that handles all OSD metadata operations, also relies on `opensearch_service` interfaces to work with OpenSearch. + +With multi-datasource, we will need to split the `opensearch_service` for these 2 use cases. We propose to fork a new `metadata_client` from the existing `opensearch_service` to manage the metadata store connection, so that `saved_objects_service` can use `metadata_client` to perform saved objects operations. We then repurpose the `opensearch_service` to serve the user data access use cases. The new `opensearch_service` will expose the following interface to allow other OSD components to interact with a specific data source cluster. + +``` +core.opensearch.withDataSource().callAsCurrentUser(searchParams) +``` + +OSD plugins like the data plugin and the alerting plugin will need to introduce the data source concept into their use cases, letting users specify a data source when using their functions, and then switch to this new opensearch interface when calling OpenSearch APIs or executing queries. + +### Multi-datasource support in visualizations + +OpenSearch Dashboards currently has 3 major saved object types: index pattern, visualization and dashboard + +- Visualization work starts with an index pattern. An index pattern is a level of data abstraction. It describes a set of data indices and their data schema. +- OSD users can create data visualizations against an index pattern. A visualization includes the OpenSearch DSL query, aggregation and a reference to an index pattern, as well as graph metadata such as legend and labels. When rendering a visualization graph, the visualization executes the query and aggregation against that specific index pattern, and draws the graph according to the graph settings. +- Then OSD users can place a set of visualizations into a dashboard. 
An OSD dashboard describes the layout and controls (time picker, field filters) of all visualizations on the dashboard. + +To support multiple data sources in OSD, we will add “data source” to the index pattern model. An index pattern will have a `dataSourceId` field, so that it refers to a data source. + +An index pattern can only refer to one data source, while one data source can be used by multiple index patterns. +With this new “data source” reference in index patterns, OSD users will need to create data sources in OSD, then select a data source when creating index patterns. The visualization and dashboard creation experience will remain the same. + +For OSD multiple data source user experience, refer to [OpenSearch Dashboards Multiple OpenSearch Data Source Support User Stories](https://quip-amazon.com/VXQ0AhpPs3gU) + +The OSD visualization rendering flow will look like the following with multi-datasource support: ![image](./img/hld_vis_flow.png) + +### Backward Compatibility + +We plan to release this multi-datasource support as an experimental feature with OpenSearch 3.0. OpenSearch Dashboards admins will be able to enable or disable the multi-datasource feature using configurations in `opensearch_dashboards.yml`. + +If multi-datasource is enabled, OSD users will be able to see all data source related features and APIs, so that they can manage their data sources and build visualizations and dashboards with data sources. If multi-datasource is disabled, users will not see anything related to data sources, and their OSD experience will remain the same as with a single data source. + +If an OSD admin enables multi-datasource for an existing OSD service, users will still be able to use their existing index patterns and visualizations, which will by default fetch data from the same endpoint as their metadata store. 
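The fallback described above (index patterns without a data source keep using the metadata store endpoint) can be sketched as follows. This is a minimal illustration, not the actual saved object schema: apart from `dataSourceId`, every type, field and function name here is an assumption made for the example.

```typescript
// Hypothetical, simplified shapes; only `dataSourceId` comes from the design text.
interface DataSourceRecord {
  id: string;
  endpoint: string;
}

interface IndexPatternRecord {
  title: string;
  dataSourceId?: string; // absent on index patterns created before multi-datasource
}

// Pick the endpoint a query for this index pattern should be sent to.
function resolveEndpoint(
  pattern: IndexPatternRecord,
  dataSources: Map<string, DataSourceRecord>,
  defaultEndpoint: string
): string {
  if (pattern.dataSourceId === undefined) {
    // Existing index patterns keep querying the cluster that hosts the metadata store.
    return defaultEndpoint;
  }
  const dataSource = dataSources.get(pattern.dataSourceId);
  if (dataSource === undefined) {
    throw new Error(`Unknown data source: ${pattern.dataSourceId}`);
  }
  return dataSource.endpoint;
}
```

Making `dataSourceId` optional in the sketch is one way to keep pre-existing index patterns valid without a migration.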
+ +If an OSD service has multi-datasource enabled and already has index patterns created with a remote data source, the admin will not be able to disable the multi-datasource feature. OSD will fail to start if it detects data sources in the saved objects while multi-datasource is disabled. + +### Security + +#### Data source access control + +The multi-datasource project doesn’t plan to change the security (authN & authZ) controls for OSD. The `data-source` is a new type of saved object, so access control for `data source` will work the same way as for other saved objects such as index patterns and visualizations. + +Based on existing OpenSearch and OSD security implementations, OSD saved objects access control is implemented via `security tenants`. OpenSearch users are mapped to a set of roles, and each role has corresponding permission to access certain tenants. If a user has permission to access a tenant, they will be able to access all saved objects in that tenant. With this mechanism, if a user creates a data source in a shared tenant, other users who have access to that shared tenant will be able to see the data source object and see/create visualizations with the data source. + +#### Data source credential handling + +Credentials are part of the data source object, and will be saved in the OSD metadata index. OSD will use those credentials to authenticate with the data source when executing queries. These credentials will need to be encrypted regardless of whether OSD has access control or not. + +We will use a symmetric key to encrypt the credentials before saving the data source into the OSD metadata index, and use the same key to decrypt them when OSD needs to authenticate with the corresponding data source. For the open source release, we will allow admins to configure the encryption key in the `opensearch_dashboards.yml` file. We will also provide the option to integrate with a key store, such as AWS KMS, to use keys from the key store. 
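To make the symmetric-key idea concrete, here is a minimal sketch using AES-256-GCM from Node's built-in `crypto` module. It is not the plugin's cryptography client (the chosen stack is discussed in the issue linked below); the helper names and the `iv | tag | ciphertext` layout are assumptions made for this example only.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Hypothetical helper: encrypt a credential with a 32-byte symmetric key.
// Output layout (an assumption of this sketch): base64(iv | authTag | ciphertext).
function encryptCredential(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const authTag = cipher.getAuthTag(); // 16 bytes by default
  return Buffer.concat([iv, authTag, ciphertext]).toString('base64');
}

// Hypothetical helper: decrypt with the same key; throws if the key or data is wrong.
function decryptCredential(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, 'base64');
  const iv = raw.subarray(0, 12);
  const authTag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(authTag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

Because the same key is needed for decryption, changing the configured key makes previously stored credentials unreadable, which is why credentials have to be re-entered after a key change.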
+ +For more about the credential encryption/decryption strategy, refer to [#1756](https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1756) + +#### Auditing + +As part of the security effort, OSD needs to log all use of data sources, so that admins have a clear view of which OSD users accessed a data source and queried data from it. The audit log could be saved in the metadata store or in local logs for potential auditing work. diff --git a/src/plugins/data_source/docs/img/client_management_architecture.png b/src/plugins/data_source/docs/img/client_management_architecture.png new file mode 100644 index 000000000000..f99b52a38c4c Binary files /dev/null and b/src/plugins/data_source/docs/img/client_management_architecture.png differ diff --git a/src/plugins/data_source/docs/img/client_management_dataflow.png b/src/plugins/data_source/docs/img/client_management_dataflow.png new file mode 100644 index 000000000000..955a4a0132bb Binary files /dev/null and b/src/plugins/data_source/docs/img/client_management_dataflow.png differ diff --git a/src/plugins/data_source/docs/img/cm_flow.png b/src/plugins/data_source/docs/img/cm_flow.png new file mode 100644 index 000000000000..97eab85f03bc Binary files /dev/null and b/src/plugins/data_source/docs/img/cm_flow.png differ diff --git a/src/plugins/data_source/docs/img/dsm_flow.png b/src/plugins/data_source/docs/img/dsm_flow.png new file mode 100644 index 000000000000..98ea5b3619ac Binary files /dev/null and b/src/plugins/data_source/docs/img/dsm_flow.png differ diff --git a/src/plugins/data_source/docs/img/hld_setup_diagram.png b/src/plugins/data_source/docs/img/hld_setup_diagram.png new file mode 100644 index 000000000000..15854999b395 Binary files /dev/null and b/src/plugins/data_source/docs/img/hld_setup_diagram.png differ diff --git a/src/plugins/data_source/docs/img/hld_vis_flow.png b/src/plugins/data_source/docs/img/hld_vis_flow.png new file mode 100644 index 
000000000000..08bf027ffc10 Binary files /dev/null and b/src/plugins/data_source/docs/img/hld_vis_flow.png differ diff --git a/src/plugins/data_source/docs/user_stories.md b/src/plugins/data_source/docs/user_stories.md new file mode 100644 index 000000000000..6601c477d8dd --- /dev/null +++ b/src/plugins/data_source/docs/user_stories.md @@ -0,0 +1,73 @@ +# OpenSearch Dashboards Multiple OpenSearch Data Source Support User Stories + +Today, OpenSearch Dashboards (OSD) can only connect to a single OpenSearch cluster, whose endpoint is configured in the `opensearch_dashboards.yml` config file. We want to allow OSD users to dynamically add/update/remove OpenSearch compatible endpoints, and then do their analytics work with data in those OpenSearch data stores. + +RFC: https://github.com/opensearch-project/OpenSearch-Dashboards/issues/1388 + +This document discusses the user experience of the OSD multiple data source support. + +## User Story + +### Current user experience + +- OpenSearch Dashboards admins set up the OSD service and configure the OpenSearch endpoint in `opensearch_dashboards.yml` + - Both the OSD metadata index (`opensearch_dashboards` index) and data indices are saved in the same OpenSearch cluster +- OSD users can work with visualizations; usually they will + - Create/update index patterns + - Create/update visualizations, each of which is built on top of one index pattern + - Create/update dashboards using a group of visualizations + - Run ad hoc queries against an index pattern using the Discover feature + - View index patterns/visualizations/dashboards +- OSD users can work with analytics functions, such as Alerting and AD + +### Expected user experience with multiple data source + +We are planning to introduce a new `data-source` model to describe an OpenSearch data source, and to let index patterns refer to a `data-source`. 
+ +- OpenSearch Dashboards admins set up the OSD service and configure the OpenSearch **metadata store endpoint** in `opensearch_dashboards.yml` + - the metadata store OpenSearch cluster only saves the `opensearch_dashboards` index; data indices can be saved in other OpenSearch stores +- Users will need to have a data source before they can do any visualization or analytics work with OSD + - Users can create/update/view data sources + - Users need to specify a data source when creating new index patterns; the data source cannot be changed after the index pattern is created + - The create/update experience for visualizations and dashboards remains the same as it is today. + - The view experience for index patterns/visualizations/dashboards remains the same as it is today. +- When users want to work with analytics features like AD and Alerting, they need to specify a data source to work with. (We may consider adding a default data source concept.) + +## UI Change + +Multiple data source support and the introduction of the data source model require several UI changes in OSD + +### Data source management + +![img](./img/dsm_flow.png) + +Data source, as a new saved object type, should have a management page, like index pattern. + +We will need to + +- add a new data source entry in the stack management Nav app, with a data source list table +- add a data source detail page, to show detailed information about a specific data source, such as URL, auth type, endpoint capabilities etc. + +### Credential management + +![img](./img/cm_flow.png) + +A credential is used to establish an authenticated connection to another data source. Typical credentials are user/password for basic auth and IAM auth for AWS-specific authentication. + +Credential management provides a way for users to add/edit/remove the credentials used in data source management + +### Index Pattern + +- Index pattern creation flow: With data sources, users will need to specify which data source to use when creating a new index pattern. 
+- Index pattern detail page: On the index pattern detail page, we will need to show which data source this index pattern uses +- Data source selector for plugins: when OSD users work with analytics functions like Alerting and AD, we will want to allow them to switch between data sources + +## Appendix + +### Data source security + +For the initial launch with the OpenSearch 3.0 preview, we do not plan to change the security design of OpenSearch. + +Users need to provide an endpoint URL, username and password when creating a data source. The OSD service will encrypt the username and password when storing them in the metadata store. + +Data source is a new type of OSD saved object. In the current OpenSearch security model, access control on data source documents is the same as on other saved object documents. In other words, data source docs will be accessible by any user who has access to the tenant.