Add Performance DB Table Construction Documentation #139
Merged
186 changes: 186 additions & 0 deletions
docs/data-operations-manual/Reference/Performance-db-tables-construction.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,186 @@ | ||
## Performance Database Table Construction Documentation | ||
|
||
This document explains how the tables in the [performance database](https://datasette.planning.data.gov.uk/performance/) are constructed, including their base tables and the source of each column. | ||
|
||
The Performance Database is designed to store and analyze performance-related metrics extracted from the source database [digital_land](https://datasette.planning.data.gov.uk/digital-land). The tables of this DB are: | ||
|
||
>1. `endpoint_dataset_issue_type_summary` | ||
>2. `endpoint_dataset_resource_summary` | ||
>3. `endpoint_dataset_summary` | ||
>4. `provision_summary` | ||
>5. `reporting_historic_endpoints` | ||
>6. `reporting_latest_endpoints` | ||
|
||
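Because the performance database is published through Datasette, each table listed above can also be read as JSON by appending `.json` to its URL. The following is a minimal sketch of building such a URL; the helper name `table_json_url` and the example filter value are illustrative assumptions, not part of the project.

```python
from urllib.parse import urlencode

# Base URL of the performance database, taken from the links in this document.
BASE_URL = "https://datasette.planning.data.gov.uk/performance"


def table_json_url(table, filters=None, size=100):
    """Build a Datasette JSON URL for one table (hypothetical helper).

    `filters` maps column names to exact-match values; `_shape=array`
    asks Datasette for a plain list of row objects.
    """
    params = dict(filters or {})
    params["_shape"] = "array"
    params["_size"] = str(size)
    return f"{BASE_URL}/{table}.json?{urlencode(params)}"


# Example value only: "conservation-area" is used purely for illustration.
url = table_json_url("provision_summary", {"dataset": "conservation-area"})
print(url)
```

The resulting URL can be fetched with any HTTP client; the sketch stops short of making a network request so that it runs offline.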
### 1. Table: [endpoint_dataset_issue_type_summary](https://datasette.planning.data.gov.uk/performance/endpoint_dataset_issue_type_summary) | ||
|
||
**Purpose**: To summarize the issues associated with each dataset and its endpoint, for each organisation. | ||
|
||
**Base Tables**: | ||
- `issue`: Contains records of issues related to resources sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `resource`: Holds information about the resources linked to the endpoints sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `issue_type`: Defines types of issues sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
|
||
**Columns**: | ||
- `organisation`: Extracted from the `provision` table. | ||
- `organisation_name`: Extracted from the `provision` table. | ||
- `cohort`: Extracted from the `provision` table. | ||
- `dataset`: Extracted from the `provision` table. | ||
- `collection`: Extracted from the `reporting_historic_endpoints` table. | ||
- `pipeline`: Extracted from the `reporting_historic_endpoints` table. | ||
- `endpoint`: Extracted from the `reporting_historic_endpoints` table. | ||
- `endpoint_url`: Extracted from the `reporting_historic_endpoints` table. | ||
- `resource`: Extracted from the `reporting_historic_endpoints` table. | ||
- `resource_start_date`: Extracted from the `reporting_historic_endpoints` table. | ||
- `resource_end_date`: Extracted from the `reporting_historic_endpoints` table. | ||
- `latest_log_entry_date`: Extracted from the `reporting_historic_endpoints` table. | ||
- `count_issues`: Count of issues from the `issue` table. | ||
- `date`: Current date when the query is executed. | ||
- `issue_type`: Type of issues from the `issue_type` table. | ||
- `severity`: Severity of issues from the `issue_type` table. | ||
- `responsibility`: Responsibility assigned to the issues from the `issue_type` table. | ||
- `fields`: Concatenated fields from the `issue` table. | ||
|
||
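The `count_issues`, `fields` and `date` columns described above can be sketched as a grouped query. This is only an illustration run against an in-memory SQLite database: the reduced table schemas, the sample rows and the grouping keys are assumptions, and the real build query may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, heavily reduced schemas; the real tables have more columns.
conn.executescript("""
CREATE TABLE issue (resource TEXT, dataset TEXT, field TEXT, issue_type TEXT);
CREATE TABLE issue_type (issue_type TEXT, severity TEXT, responsibility TEXT);
INSERT INTO issue VALUES
  ('res-1', 'tree', 'geometry',   'invalid geometry'),
  ('res-1', 'tree', 'point',      'invalid geometry'),
  ('res-1', 'tree', 'start-date', 'invalid date');
INSERT INTO issue_type VALUES
  ('invalid geometry', 'error', 'external'),
  ('invalid date',     'error', 'external');
""")

rows = conn.execute("""
SELECT i.resource, i.dataset, i.issue_type, it.severity, it.responsibility,
       COUNT(*)                       AS count_issues,  -- issues per group
       GROUP_CONCAT(DISTINCT i.field) AS fields,        -- concatenated fields
       DATE('now')                    AS date           -- date the query runs
FROM issue i
JOIN issue_type it ON it.issue_type = i.issue_type
GROUP BY i.resource, i.dataset, i.issue_type, it.severity, it.responsibility
ORDER BY i.issue_type
""").fetchall()

for row in rows:
    print(row)
```

Each output row corresponds to one resource and issue type, with the number of matching issues and the comma-separated list of affected fields.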
|
||
### 2. Table: [endpoint_dataset_resource_summary](https://datasette.planning.data.gov.uk/performance/endpoint_dataset_resource_summary) | ||
|
||
**Purpose**: To summarize resources associated with endpoints, including mapping and non-mapping fields. Mapping fields represent those that require conversion to align with the internal system's accepted field names (e.g., converting ID to reference), while non-mapping fields already match the required names and can be accepted directly without modification. | ||
|
||
**Base Tables**: | ||
- `provision`: Contains information about the datasets provisioned for each organisation, including the organisation, cohort, and dataset, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `reporting_historic_endpoints`: Stores historical data on endpoints, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `column_field`: A mapping table that connects columns from an endpoint to their corresponding fields in the dataset's specification (e.g. `UID` -> `reference`), sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
|
||
**Columns**: | ||
- `organisation`: Extracted from the `provision` table. | ||
- `organisation_name`: Extracted from the `provision` table. | ||
- `cohort`: Extracted from the `provision` table. | ||
- `dataset`: Extracted from the `provision` table. | ||
- `collection`: Extracted from the `reporting_historic_endpoints` table. | ||
- `pipeline`: Extracted from the `reporting_historic_endpoints` table. | ||
- `endpoint`: Extracted from the `reporting_historic_endpoints` table. | ||
- `endpoint_url`: Extracted from the `reporting_historic_endpoints` table. | ||
- `resource`: Extracted from the `reporting_historic_endpoints` table. | ||
- `resource_start_date`: Extracted from the `reporting_historic_endpoints` table. | ||
- `resource_end_date`: Extracted from the `reporting_historic_endpoints` table. | ||
- `latest_log_entry_date`: Extracted from the `reporting_historic_endpoints` table. | ||
- `mapping_field`: Generated using conditional aggregation from `column_field` to identify fields that map correctly to columns. | ||
- `non_mapping_field`: Generated using conditional aggregation from `column_field` for fields that do not map correctly. | ||
|
||
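The conditional aggregation behind `mapping_field` and `non_mapping_field` can be sketched as follows. The sketch follows the definition in the Purpose paragraph (a mapping field is one whose source column name differs from the field name); the three-column `column_field` schema and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: one row per (resource, source column, target field).
conn.executescript("""
CREATE TABLE column_field (resource TEXT, "column" TEXT, field TEXT);
INSERT INTO column_field VALUES
  ('res-1', 'UID',       'reference'),
  ('res-1', 'name',      'name'),
  ('res-1', 'geom',      'geometry'),
  ('res-2', 'reference', 'reference');
""")

rows = conn.execute("""
SELECT resource,
       -- fields whose source column had to be renamed
       GROUP_CONCAT(CASE WHEN "column" <> field THEN field END) AS mapping_field,
       -- fields whose source column already matched the field name
       GROUP_CONCAT(CASE WHEN "column" =  field THEN field END) AS non_mapping_field
FROM column_field
GROUP BY resource
ORDER BY resource
""").fetchall()

for row in rows:
    print(row)
```

`GROUP_CONCAT` ignores the `NULL` values produced by a `CASE` without an `ELSE`, so each column lists only the fields that satisfy its condition, and is `NULL` when none do.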
|
||
|
||
### 3. Table: [endpoint_dataset_summary](https://datasette.planning.data.gov.uk/performance/endpoint_dataset_summary) | ||
|
||
**Purpose**: To summarize endpoint information, including the latest statuses and exceptions. | ||
|
||
**Base Tables**: | ||
- `endpoint`: Contains information about endpoints sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `source`: Provides centralized metadata for datasets across different organisations, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `log`: Contains logs related to endpoint performance sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
Review comment: typo? |
||
|
||
**Columns**: | ||
- `endpoint`: Extracted from the `endpoint` table. | ||
- `endpoint_url`: Extracted from the `endpoint` table. | ||
- `organisation`: Extracted from the `source` table. | ||
- `dataset`: Extracted from the resource-dataset mapping. | ||
- `latest_status`: Extracted from the `log` table. | ||
- `latest_exception`: Extracted from the `log` table. | ||
- `entry_date`: Entry date of the endpoint. | ||
- `end_date`: End date of the endpoint. | ||
- `latest_resource_start_date`: Start date of the most recent resource. | ||
|
||
|
||
### 4. Table: [provision_summary](https://datasette.planning.data.gov.uk/performance/provision_summary) | ||
|
||
**Purpose**: This table provides an overview of the provision metrics for each dataset associated with an organisation. It tracks the number of active and erroring endpoints and counts issues by severity (error, warning, and notice) and by responsibility (internal or external). | ||
|
||
**Base Tables**: | ||
- `provision`: Contains information about the datasets provisioned for each organisation, including the organisation, cohort, and dataset, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `organisation`: Contains the names and details of organisations, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `issue`: Contains records of issues related to resources sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `resource`: Holds information about the resources linked to the endpoints sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `issue_type`: Defines types of issues sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `column_field`: A mapping table that connects columns from an endpoint to their corresponding fields in the dataset's specification (e.g. `UID` -> `reference`), sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `endpoint`: Contains information about endpoints sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `source`: Provides centralized metadata for datasets across different organisations, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `log`: Contains logs related to endpoints, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `reporting_historic_endpoints`: Stores historical data on endpoints, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
|
||
**Columns**: | ||
- `organisation`: Extracted from the `provision` table. | ||
- `organisation_name`: Extracted from the `organisation` table. | ||
- `dataset`: Extracted from the `provision` table. | ||
- `active_endpoint_count`: Calculated from the `endpoint` table. | ||
- `error_endpoint_count`: Calculated from the `endpoint` table. | ||
- `count_issue_error_internal`: Count of internal error issues calculated from `issue` and `issue_type`. | ||
- `count_issue_error_external`: Count of external error issues calculated from `issue` and `issue_type`. | ||
- `count_issue_warning_internal`: Count of internal warning issues calculated from `issue` and `issue_type`. | ||
- `count_issue_warning_external`: Count of external warning issues calculated from `issue` and `issue_type`. | ||
- `count_issue_notice_internal`: Count of internal notice issues calculated from `issue` and `issue_type`. | ||
- `count_issue_notice_external`: Count of external notice issues calculated from `issue` and `issue_type`. | ||
|
||
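The six `count_issue_*` columns follow one pattern: a conditional sum over each combination of severity and responsibility from `issue_type`. A minimal sketch of that pattern is shown below. The simplified `issue` table (which carries `organisation` directly), the placeholder issue types `type-1` to `type-3`, and the sample rows are all assumptions; the real query joins through more tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name
# Assumed, simplified schemas with hypothetical sample data.
conn.executescript("""
CREATE TABLE issue (organisation TEXT, dataset TEXT, issue_type TEXT);
CREATE TABLE issue_type (issue_type TEXT, severity TEXT, responsibility TEXT);
INSERT INTO issue VALUES
  ('org-a', 'tree', 'type-1'),
  ('org-a', 'tree', 'type-1'),
  ('org-a', 'tree', 'type-2'),
  ('org-a', 'tree', 'type-3');
INSERT INTO issue_type VALUES
  ('type-1', 'error',  'external'),
  ('type-2', 'error',  'internal'),
  ('type-3', 'notice', 'internal');
""")

severities = ("error", "warning", "notice")
responsibilities = ("internal", "external")

# One SUM(CASE ...) expression per severity/responsibility pair.
count_columns = ",\n       ".join(
    f"SUM(CASE WHEN it.severity = '{s}' AND it.responsibility = '{r}' "
    f"THEN 1 ELSE 0 END) AS count_issue_{s}_{r}"
    for s in severities
    for r in responsibilities
)

sql = f"""
SELECT i.organisation, i.dataset,
       {count_columns}
FROM issue i
JOIN issue_type it ON it.issue_type = i.issue_type
GROUP BY i.organisation, i.dataset
"""
summary = conn.execute(sql).fetchone()
print(dict(summary))
```

Using `ELSE 0` keeps every counter numeric, so a combination with no issues reports `0` rather than `NULL`.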
|
||
### 5. Table: [reporting_historic_endpoints](https://datasette.planning.data.gov.uk/performance/reporting_historic_endpoints) | ||
|
||
**Purpose**: To store historical data on endpoints, including their organisation, dataset, and status. | ||
|
||
**Base Tables**: | ||
- `endpoint`: Contains information about endpoints sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `source`: Provides centralized metadata for datasets across different organisations, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `log`: Contains logs related to endpoints, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `organisation`: Contains the names and details of organisations, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `source_pipeline`: Contains information on the pipeline associated with each source sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `resource`: Holds information about the resources linked to the endpoints sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
|
||
**Columns**: | ||
- `organisation`: Extracted from the `source` table. | ||
- `name`: Extracted from the `organisation` table. | ||
- `organisation_name`: Extracted from the `organisation` table. | ||
- `dataset`: Extracted from the `source_pipeline` table. | ||
- `collection`: Extracted from the `source` table. | ||
- `pipeline`: Extracted from the `source_pipeline` table. | ||
- `endpoint`: Extracted from the `log` table. | ||
- `endpoint_url`: Extracted from the `endpoint` table. | ||
- `licence`: Extracted from the `source` data. | ||
- `latest_status`: Extracted from the `log` table. | ||
- `latest_exception`: Extracted from the `log` table. | ||
- `resource`: Extracted from the `log` table. | ||
- `latest_log_entry_date`: Extracted using the **max** function on `entry_date` from the `log` table. | ||
- `endpoint_entry_date`: Extracted from the `endpoint` table. | ||
- `endpoint_end_date`: Extracted from the `endpoint` table. | ||
- `resource_start_date`: Extracted from the `resource` table. | ||
- `resource_end_date`: Extracted from the `resource` table. | ||
|
||
|
||
### 6. Table: [reporting_latest_endpoints](https://datasette.planning.data.gov.uk/performance/reporting_latest_endpoints) | ||
|
||
**Purpose**: To store the most recent data on endpoints, providing the latest active endpoint data per organisation and pipeline. | ||
|
||
**Base Tables**: | ||
- `endpoint`: Contains information about endpoints sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `source`: Provides centralized metadata for datasets across different organisations, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `log`: Contains logs related to endpoints, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `organisation`: Contains the names and details of organisations, sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `source_pipeline`: Contains information on the pipeline associated with each source sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
- `resource`: Holds information about the resources linked to the endpoints sourced from the [digital_land](https://datasette.planning.data.gov.uk/digital-land). | ||
|
||
**Columns**: | ||
- `organisation`: Extracted from the `source` table. | ||
- `name`: Extracted from the `organisation` table. | ||
- `organisation_name`: Extracted from the `organisation` table. | ||
- `dataset`: Extracted from the `dataset` table. | ||
- `collection`: Extracted from the `source` table. | ||
- `pipeline`: Extracted from the `source_pipeline` table. | ||
- `endpoint`: Extracted from the `log` table. | ||
- `endpoint_url`: Extracted from the `endpoint` table. | ||
- `licence`: Extracted from the `source` data. | ||
- `latest_status`: Extracted from the `log` table. | ||
- `days_since_200`: Calculated from the `log` table as the number of days since the last "200 OK" status, using subquery `t2`. | ||
- `latest_exception`: Extracted from the `log` table. | ||
- `resource`: Extracted from the `log` table. | ||
- `latest_log_entry_date`: Extracted using the **max** function on `entry_date` from the `log` table. | ||
- `endpoint_entry_date`: Extracted from the `endpoint` table. | ||
- `endpoint_end_date`: Extracted from the `endpoint` table. | ||
- `resource_start_date`: Extracted from the `resource` table. | ||
- `resource_end_date`: Extracted from the `resource` table. | ||
- `rn`: Row number for identifying unique records. | ||
|
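The `rn` and `days_since_200` columns can be illustrated with a window function and a date difference. This is a sketch under stated assumptions: the reduced `endpoints` and `log` schemas, the ordering by `endpoint_entry_date`, the `rn = 1` filter and the fixed reference date are all hypothetical. Only the subquery name `t2` comes from the column description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
# Assumed, reduced schemas with hypothetical sample data.
conn.executescript("""
CREATE TABLE endpoints (organisation TEXT, pipeline TEXT,
                        endpoint TEXT, endpoint_entry_date TEXT);
CREATE TABLE log (endpoint TEXT, entry_date TEXT, status TEXT);
INSERT INTO endpoints VALUES
  ('org-a', 'tree', 'ep-old', '2023-01-01'),
  ('org-a', 'tree', 'ep-new', '2024-01-01');
INSERT INTO log VALUES
  ('ep-old', '2024-03-01', '200'),
  ('ep-new', '2024-03-01', '200'),
  ('ep-new', '2024-03-05', '404');
""")

sql = """
WITH t2 AS (
  -- days between the reference date and the last successful (200) log entry
  SELECT endpoint,
         CAST(julianday(:today) - julianday(MAX(entry_date)) AS INTEGER)
           AS days_since_200
  FROM log
  WHERE status = '200'
  GROUP BY endpoint
),
ranked AS (
  -- number the endpoints of each organisation/pipeline, newest first
  SELECT e.organisation, e.pipeline, e.endpoint, t2.days_since_200,
         ROW_NUMBER() OVER (PARTITION BY e.organisation, e.pipeline
                            ORDER BY e.endpoint_entry_date DESC) AS rn
  FROM endpoints e
  LEFT JOIN t2 ON t2.endpoint = e.endpoint
)
SELECT organisation, pipeline, endpoint, days_since_200, rn
FROM ranked
WHERE rn = 1
"""
# A fixed reference date keeps the example reproducible.
latest = conn.execute(sql, {"today": "2024-03-11"}).fetchall()
print(latest)
```

Keeping only `rn = 1` leaves one row per organisation and pipeline, which is one plausible way for `rn` to identify unique records.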
Would be useful to confirm here if there are any restrictions on resources. Does the table only include active resources, and does it include resources even when they don't have any issues?