Integrate the search functionality with the datahub api end point #2983

Closed
1 task
PriyaBasker23 opened this issue Jan 17, 2024 · 6 comments

PriyaBasker23 commented Jan 17, 2024

User Story

As a developer, I want search functionality that connects to DataHub and retrieves a list of tables.

Value / Purpose

The central aim is to empower users with the capability to search and locate items effectively.

Hypothesis

Enabling this functionality will aid in testing the connection to the datahub and obtaining the desired search results.

Proposal

  1. Implement search functionality by calling the DataHub API directly for initial results.
  2. Optionally, use a Python library for convenience, so that the search can be integrated into the FastAPI layer later.
    **Note: the following steps are not needed now, but are for later.**
    a. Develop a dedicated endpoint in FastAPI to handle the search functionality.
    b. Invoke the Python search function from that endpoint.
  3. Keep the approach flexible, allowing for iterative improvements and adjustments as needed.

Definition of done

  • A functional search capability that is ready to be used by the frontend.
@PriyaBasker23 PriyaBasker23 converted this from a draft issue Jan 17, 2024
@MatMoore MatMoore self-assigned this Jan 19, 2024
@MatMoore MatMoore moved this from Todo to In Progress in Data Catalogue Jan 19, 2024

MatMoore commented Jan 19, 2024

DatahubGraph docs:
https://datahubproject.io/docs/python-sdk/clients#datahub.ingestion.graph.client.DataHubGraph

This is intended for ingestion only, so I don't think the built-in methods (e.g. get_search_results) are useful to us.

We can drop down to the GraphQL layer with execute_graphql(query, variables=None), which takes these parameters:

  • query (str)
  • variables (Optional[Dict])

Or we can avoid DataHubGraph entirely and use a plain GraphQL library https://graphql.org/code/#python-client

The endpoint we want to be using is https://datahubproject.io/docs/graphql/queries/#searchacrossentities
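A minimal sketch of calling searchAcrossEntities through anything shaped like execute_graphql. The `search` function, the `GraphQLExecutor` alias and the cut-down `SEARCH_QUERY` are hypothetical names for illustration, and the sketch assumes the executor returns the response data as a plain dict keyed by `searchAcrossEntities`.

```python
from typing import Any, Callable, Dict, Optional

# Assumed shape of the executor: (query, variables) -> dict of response data.
# DataHubGraph.execute_graphql could be passed in, as could a wrapper around
# a plain GraphQL client.
GraphQLExecutor = Callable[[str, Optional[Dict[str, Any]]], Dict[str, Any]]

# Deliberately small query; the real one would request many more fields.
SEARCH_QUERY = """
query Search($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    start
    count
    total
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
"""


def search(
    execute: GraphQLExecutor, query: str = "*", start: int = 0, count: int = 10
) -> Dict[str, Any]:
    """Run a search for datasets and data products and return the raw result."""
    variables = {
        "input": {
            "types": ["DATASET", "DATA_PRODUCT"],
            "query": query,
            "start": start,
            "count": count,
        }
    }
    response = execute(SEARCH_QUERY, variables)
    return response["searchAcrossEntities"]
```

Taking the executor as an argument keeps the choice between DataHubGraph and a plain GraphQL library open, and makes the function easy to test with a fake.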

We will need to pass in filter information, sort order, pagination information.
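As a rough sketch, the input could be assembled like this. `build_search_input` is a hypothetical helper, and the `orFilters` and `sortInput` field names (and their nested shapes) are assumptions that should be checked against the SearchAcrossEntitiesInput docs for the DataHub version we run. The `tags` filter field in the usage line is illustrative only.

```python
from typing import Any, Dict, List, Optional


def build_search_input(
    query: str = "*",
    page: int = 1,
    page_size: int = 10,
    filters: Optional[Dict[str, List[str]]] = None,
    sort_field: Optional[str] = None,
    ascending: bool = True,
) -> Dict[str, Any]:
    """Build the `input` variable for searchAcrossEntities."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")

    search_input: Dict[str, Any] = {
        "types": ["DATASET", "DATA_PRODUCT"],
        "query": query,
        # Pagination: DataHub takes an offset and a page size.
        "start": (page - 1) * page_size,
        "count": page_size,
    }
    if filters:
        # Assumption: a single AND group, where every field must match one
        # of its listed values.
        search_input["orFilters"] = [
            {"and": [{"field": f, "values": v} for f, v in filters.items()]}
        ]
    if sort_field:
        # Assumption: sort criteria are passed under sortInput.sortCriterion.
        search_input["sortInput"] = {
            "sortCriterion": {
                "field": sort_field,
                "sortOrder": "ASCENDING" if ascending else "DESCENDING",
            }
        }
    return search_input
```

For example, `build_search_input(query="court", page=2, filters={"tags": ["urn:li:tag:custody"]})` would ask for results 10 to 19 restricted to that tag.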

We should expect to get back

  • search results
  • total number
  • search suggestions
  • some aggregations (TBC)

There are some search options here, but I will start with the defaults:
https://datahubproject.io/docs/graphql/inputObjects#searchflags

@MatMoore

I'm planning to make the search method reasonably flexible (i.e. return a lot of details about each search result) so we can experiment with the frontend side, and then we can remove stuff we don't need later to optimise the query.

Some reckons

  • Search results could return datasets or data products
  • Search results should include every property that we set when registering data products & tables
  • Search results should return edited descriptions if available (see editableProperties field)
  • Search results should contain any breadcrumb information (list of names and urns)
  • The query should return everything needed to render pagination
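To illustrate the description and breadcrumb points, here is a small sketch of how the frontend layer might read them from a single result entity. Both helper names are hypothetical, and the sketch assumes breadcrumbs arrive as a `browsePathV2.path` list of `name` / `entity.urn` pairs and that any of these fields may be null.

```python
from typing import Any, Dict, List, Optional


def best_description(entity: Dict[str, Any]) -> Optional[str]:
    """Prefer the edited description, falling back to the ingested one."""
    editable = (entity.get("editableProperties") or {}).get("description")
    original = (entity.get("properties") or {}).get("description")
    return editable or original


def breadcrumbs(entity: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Return the browse path as a list of {name, urn} pairs."""
    path = (entity.get("browsePathV2") or {}).get("path") or []
    return [
        {"name": step.get("name"), "urn": (step.get("entity") or {}).get("urn")}
        for step in path
    ]
```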

Here's a starter query that combines dataset results with data product results

Request

            {
                searchAcrossEntities(
                    input: {types: [DATASET,DATA_PRODUCT], query: "*", start: 0, count: 10}
                ) {
                    start
                    count
                    total
                    searchResults {
                        entity {
                            type
                            ... on Dataset {
                                urn
                                type
                                platform {
                                    name
                                }
                                name
                                properties {
                                    name
                                    qualifiedName
                                    description
                                    customProperties {
                                        key
                                        value
                                    }
                                    created
                                    lastModified
                                }
                                editableProperties {
                                    description
                                }
                                browsePathV2 {
                                    path {
                                        name
                                        entity {
                                            urn
                                        }
                                    }
                                }
                                tags {
                                    tags {
                                        tag {
                                            urn
                                            properties {
                                                name
                                                description
                                            }
                                        }
                                    }
                                }
                                lastIngested
                                domain {
                                    domain {
                                        urn
                                        id
                                        properties {
                                            name
                                            description
                                        }
                                    }
                                }
                            }
                            ... on DataProduct {
                                urn
                                type
                                properties {
                                    name
                                    description
                                    customProperties {
                                        key
                                        value
                                    }
                                    numAssets
                                }
                                domain {
                                    domain {
                                        urn
                                        id
                                        properties {
                                            name
                                            description
                                        }
                                    }
                                }
                                tags {
                                    tags {
                                        tag {
                                            urn
                                            properties {
                                                name
                                                description
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }

Response

{
  "searchAcrossEntities": {
    "start": 0,
    "count": 10,
    "total": 10000,
    "searchResults": [
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:6cc5cbc4-c002-42c3-b80b-ed55df17d39f",
          "properties": {
            "name": "Use of force",
            "description": "Prisons in England and Wales are required to record all instances of Use of Force within their establishment. Use of Force can be planned or unplanned and may involve various categories of control and restraint (C&R) techniques such as physical restraint or handcuffs.\n\nPlease refer to [PSO 1600](https://www.gov.uk/government/publications/use-of-force-in-prisons-pso-1600) for the current guidance.",
            "customProperties": [],
            "numAssets": 7
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:3dc18e48-c062-4407-84a9-73e23f768023",
              "id": "3dc18e48-c062-4407-84a9-73e23f768023",
              "properties": {
                "name": "HMPPS",
                "description": "HMPPS is an executive agency that carries out sentences given by the courts, in custody and the community, and rehabilitates people through education and employment."
              }
            }
          },
          "tags": {
            "tags": [
              {
                "tag": {
                  "urn": "urn:li:tag:custody",
                  "properties": {
                    "name": "custody",
                    "description": "Data about prisons and prisoners. Not just NOMIS!"
                  }
                }
              }
            ]
          }
        }
      },
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:common-platform",
          "properties": {
            "name": "common-platform",
            "description": null,
            "customProperties": [],
            "numAssets": 231
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "id": "8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "properties": {
                "name": "HMCTS",
                "description": "HM Courts and Tribunals Service is responsible for the administration of criminal, civil and family courts and tribunals in England and Wales."
              }
            }
          },
          "tags": null
        }
      },
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:contracts",
          "properties": {
            "name": "contracts",
            "description": null,
            "customProperties": [],
            "numAssets": 18
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:edc1ed2e-b5b2-4a0f-a8d1-ce123d2ebd3c",
              "id": "edc1ed2e-b5b2-4a0f-a8d1-ce123d2ebd3c",
              "properties": {
                "name": "HR",
                "description": null
              }
            }
          },
          "tags": null
        }
      },
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:delius",
          "properties": {
            "name": "delius",
            "description": null,
            "customProperties": [],
            "numAssets": 496
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "id": "8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "properties": {
                "name": "HMCTS",
                "description": "HM Courts and Tribunals Service is responsible for the administration of criminal, civil and family courts and tribunals in England and Wales."
              }
            }
          },
          "tags": null
        }
      },
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:f3416fac-2ad1-4b61-9a92-fbcb48649bc3",
          "properties": {
            "name": "External movements",
            "description": "Data from Book a Secure Move - movements to and from prisons. This includes transfers between prisons, court appearances, release on temporary license (RoTL), movements to and from police custody and any other type of movement supported by the BaSM service.\n\nA **movement** will involve one or more **journey**, each with a \"from\" and \"to\" location (prison, police station, court etc.). Journeys have **people** associated with them, and people have a **profile**. [Movements](https://datahub.apps-tools.development.data-platform.service.justice.gov.uk/dataset/urn:li:dataset:(urn:li:dataPlatform:glue,hmpps_book_secure_move_api_prod.moves,PROD)) and [journeys](https://datahub.apps-tools.development.data-platform.service.justice.gov.uk/dataset/urn:li:dataset:(urn:li:dataPlatform:glue,hmpps_book_secure_move_api_prod.journeys,PROD)) will trigger several **events**.\n\nNote that movements will include remand or unsentenced prisoners.",
            "customProperties": [],
            "numAssets": 10
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:3dc18e48-c062-4407-84a9-73e23f768023",
              "id": "3dc18e48-c062-4407-84a9-73e23f768023",
              "properties": {
                "name": "HMPPS",
                "description": "HMPPS is an executive agency that carries out sentences given by the courts, in custody and the community, and rehabilitates people through education and employment."
              }
            }
          },
          "tags": {
            "tags": [
              {
                "tag": {
                  "urn": "urn:li:tag:custody",
                  "properties": {
                    "name": "custody",
                    "description": "Data about prisons and prisoners. Not just NOMIS!"
                  }
                }
              }
            ]
          }
        }
      },
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:family_derived",
          "properties": {
            "name": "family_derived",
            "description": null,
            "customProperties": [],
            "numAssets": 6
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "id": "8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "properties": {
                "name": "HMCTS",
                "description": "HM Courts and Tribunals Service is responsible for the administration of criminal, civil and family courts and tribunals in England and Wales."
              }
            }
          },
          "tags": null
        }
      },
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:familyman",
          "properties": {
            "name": "familyman",
            "description": null,
            "customProperties": [],
            "numAssets": 36
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "id": "8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "properties": {
                "name": "HMCTS",
                "description": "HM Courts and Tribunals Service is responsible for the administration of criminal, civil and family courts and tribunals in England and Wales."
              }
            }
          },
          "tags": null
        }
      },
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:familyman_cases",
          "properties": {
            "name": "Family court cases",
            "description": "Data from FamilyMan - the Family Courts case management system.\n\nContains data on **cases**, **people** and the **events** associated with cases.",
            "customProperties": [
              {
                "key": "product-source",
                "value": "datahub-cli-test"
              }
            ],
            "numAssets": 3
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "id": "8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "properties": {
                "name": "HMCTS",
                "description": "HM Courts and Tribunals Service is responsible for the administration of criminal, civil and family courts and tribunals in England and Wales."
              }
            }
          },
          "tags": {
            "tags": [
              {
                "tag": {
                  "urn": "urn:li:tag:court",
                  "properties": null
                }
              },
              {
                "tag": {
                  "urn": "urn:li:tag:adoption",
                  "properties": null
                }
              },
              {
                "tag": {
                  "urn": "urn:li:tag:test",
                  "properties": {
                    "name": "test",
                    "description": "dsffsd"
                  }
                }
              }
            ]
          }
        }
      },
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:fines_enforcement",
          "properties": {
            "name": "fines_enforcement",
            "description": null,
            "customProperties": [],
            "numAssets": 8
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "id": "8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "properties": {
                "name": "HMCTS",
                "description": "HM Courts and Tribunals Service is responsible for the administration of criminal, civil and family courts and tribunals in England and Wales."
              }
            }
          },
          "tags": null
        }
      },
      {
        "entity": {
          "type": "DATA_PRODUCT",
          "urn": "urn:li:dataProduct:mags_curated",
          "properties": {
            "name": "mags_curated",
            "description": null,
            "customProperties": [],
            "numAssets": 12
          },
          "domain": {
            "domain": {
              "urn": "urn:li:domain:8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "id": "8bc42de3-eba1-4fdc-8842-17a0d4d0fda3",
              "properties": {
                "name": "HMCTS",
                "description": "HM Courts and Tribunals Service is responsible for the administration of criminal, civil and family courts and tribunals in England and Wales."
              }
            }
          },
          "tags": null
        }
      }
    ]
  }
}
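A response shaped like the one above has a few things the frontend needs to handle: `start`/`count`/`total` for pagination, `tags` being null, and tags whose `properties` are null. The helpers below are a hypothetical sketch of that handling, not part of any DataHub library; falling back to the last segment of the tag urn is an assumption about what we would want to display.

```python
import math
from typing import Any, Dict, List


def pagination_info(search_response: Dict[str, Any]) -> Dict[str, int]:
    """Derive page numbers from the start/count/total fields."""
    start = search_response["start"]
    count = search_response["count"]
    total = search_response["total"]
    if count <= 0:
        # Avoid dividing by zero when no page size was requested.
        return {"current_page": 1, "total_pages": 0, "total": total}
    return {
        "current_page": start // count + 1,
        "total_pages": math.ceil(total / count),
        "total": total,
    }


def tag_labels(entity: Dict[str, Any]) -> List[str]:
    """List display names for an entity's tags, tolerating null fields."""
    tags = (entity.get("tags") or {}).get("tags") or []
    labels = []
    for item in tags:
        tag = item["tag"]
        properties = tag.get("properties") or {}
        # Fall back to the last urn segment, e.g. "court" for urn:li:tag:court.
        labels.append(properties.get("name") or tag["urn"].rsplit(":", 1)[-1])
    return labels
```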

@MatMoore

Missing functionality from ingestion:

  1. Owners and maintainers (see https://datahubproject.io/docs/graphql/enums/#ownershiptype)
  2. Custom properties (dpiaRequired, retentionPeriod, email, version)
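One way to track the custom properties gap is to check each search result for the expected keys. This is a hypothetical sketch: `missing_custom_properties` is an illustrative name, and it assumes the keys would appear under `properties.customProperties` as `key`/`value` pairs with exactly the names listed above.

```python
from typing import Any, Dict, List, Sequence

# Keys taken from the list above; the exact spelling in DataHub is an assumption.
EXPECTED_KEYS = ("dpiaRequired", "retentionPeriod", "email", "version")


def missing_custom_properties(
    entity: Dict[str, Any], expected: Sequence[str] = EXPECTED_KEYS
) -> List[str]:
    """Return the expected custom property keys that the entity lacks."""
    custom = (entity.get("properties") or {}).get("customProperties") or []
    present = {item["key"] for item in custom}
    return [key for key in expected if key not in present]
```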

@MatMoore MatMoore moved this from In Progress to Done in Data Catalogue Jan 25, 2024
@tom-webber tom-webber added this to the Search and filtering working in custom DataHub front-end milestone Jan 25, 2024

This issue is being marked as stale because it has been open for 60 days with no activity. Remove stale label or comment to keep the issue open.

@github-actions github-actions bot added the stale label Mar 26, 2024

github-actions bot commented Apr 2, 2024

This issue is being closed because it has been open for a further 7 days with no activity. If this is still a valid issue, please reopen it, Thank you!

@github-actions github-actions bot closed this as not planned Apr 2, 2024
Status: Done ✅