A Python package to manage Google Cloud Data Catalog Custom Entries by loading metadata from external sources. It currently supports the CSV and JSON file formats.
It is built on top of GoogleCloudPlatform/datacatalog-connectors and, unlike the existing connectors, ingests metadata without connecting to any system other than Data Catalog. Known use cases include validating Custom Entries ingestion workloads before coding their specific features, and loading metadata into development or PoC environments.
If you need Tags in addition to Entries to validate your model or workload, consider giving datacatalog-custom-model-manager a try.
- 1. Environment setup
- 2. Manage Custom Entries
- 3. How to contribute
1. Environment setup

Using virtualenv is optional, but strongly recommended unless you use Docker.

Creating a dedicated folder is also recommended, so all related files reside in the same place and the next instructions are easier to follow:

```bash
mkdir ./datacatalog-custom-entries-manager
cd ./datacatalog-custom-entries-manager
```
All paths starting with `./` in the next steps are relative to the `datacatalog-custom-entries-manager` folder.
Install virtualenv, create and activate an environment, and then install the package:

```bash
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
pip install --upgrade datacatalog-custom-entries-manager
```
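To confirm the install step worked, a quick check run from inside the activated virtualenv can query the installed distribution metadata. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None
```

Calling `installed_version("datacatalog-custom-entries-manager")` should return a version string after the `pip install` above, and `None` if the package is missing from the active environment.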
Docker may be used as an alternative way to run `datacatalog-custom-entries-manager`. In this case, please disregard the virtualenv setup instructions above and clone the repository instead, so the Docker image can be built from its source:

```bash
git clone https://github.com/ricardolsmendes/datacatalog-custom-entries-manager
cd ./datacatalog-custom-entries-manager
```
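Create a service account and grant it the following roles:

- Data Catalog EntryGroup Owner
- Data Catalog Entry Owner
- Data Catalog Viewer

Download a JSON key for the service account and save it as `./credentials/datacatalog-custom-entries-manager.json`.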
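Point Application Default Credentials to the key file by setting the environment variable below. This step can be skipped if you're using Docker.

```bash
export GOOGLE_APPLICATION_CREDENTIALS=./credentials/datacatalog-custom-entries-manager.json
```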
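Before running a sync, it can help to confirm that `GOOGLE_APPLICATION_CREDENTIALS` points at a readable service account key. The sketch below is a hypothetical pre-flight helper (`check_credentials` is not part of the package); it relies only on the standard layout of Google service account JSON keys, which declare `"type": "service_account"` and carry a `client_email` field.

```python
import json
import os
from pathlib import Path


def check_credentials(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return the service account e-mail from the key file named by env_var.

    Raises a descriptive error if the variable is unset, the file is missing,
    or the file does not look like a service account key.
    """
    key_path = os.environ.get(env_var)
    if not key_path:
        raise RuntimeError(f"{env_var} is not set")

    path = Path(key_path)
    if not path.is_file():
        raise FileNotFoundError(f"{env_var} points to a missing file: {path}")

    with path.open(encoding="utf-8") as key_file:
        key = json.load(key_file)

    # Google service account JSON keys declare "type": "service_account".
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service account key")
    return key.get("client_email", "<unknown>")
```

After the `export` above, `check_credentials()` should return the e-mail of the service account that was granted the roles listed earlier.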
2. Manage Custom Entries

- SCHEMA

The metadata schema used to synchronize Custom Entries from a CSV file is presented below. Use one line per Entry, adding as many lines as needed to describe all the Data Catalog Entries you need.
| Column | Description | Mandatory |
|---|---|---|
| user_specified_system | Indicates the Entry source system | ✓ |
| group_id | ID of the Entry Group the Entry belongs to | ✓ |
| linked_resource | The resource a metadata Entry refers to | ✓ |
| display_name | Display information: a short name to identify the Entry (the `entry_id` is generated as a normalized version of the display name) | ✓ |
| description | Can consist of several sentences that describe the Entry contents | ✗ |
| user_specified_type | A custom value indicating the Entry type | ✓ |
| created_at | The creation time of the underlying resource, not of the Data Catalog Entry (format: YYYY-MM-DDTHH:MM:SSZ) | ✗ |
| updated_at | The last-modified time of the underlying resource, not of the Data Catalog Entry (format: YYYY-MM-DDTHH:MM:SSZ) | ✗ |
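The rules the table encodes (mandatory columns and the timestamp format) can be checked locally before syncing. The following is a minimal sketch, assuming the CSV has a header row with the column names above; `validate_rows` and `to_entry_id` are hypothetical helpers. In particular, `to_entry_id` only illustrates one plausible normalization (lowercase, runs of other characters collapsed to `_`, no leading digit, at most 64 characters); the package's actual rule may differ.

```python
import csv
import io
import re
from datetime import datetime
from typing import List

# Mandatory columns as listed in the schema table; the others are optional.
MANDATORY = ("user_specified_system", "group_id", "linked_resource",
             "display_name", "user_specified_type")
TIMESTAMP_COLUMNS = ("created_at", "updated_at")
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # YYYY-MM-DDTHH:MM:SSZ


def to_entry_id(display_name: str) -> str:
    """Hypothetical display_name -> entry_id normalization (assumption).

    Keeps only lowercase letters, digits and underscores, makes sure the ID
    does not start with a digit and truncates it to 64 characters.
    """
    normalized = re.sub(r"[^a-z0-9]+", "_", display_name.lower()).strip("_")
    if not normalized or normalized[0].isdigit():
        normalized = "_" + normalized
    return normalized[:64]


def validate_rows(csv_text: str) -> List[str]:
    """Return human-readable problems found in CSV text (empty list if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in MANDATORY if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems: List[str] = []
    # Row 1 is the header, so data rows start at 2.
    for row_no, row in enumerate(reader, start=2):
        for column in MANDATORY:
            if not (row.get(column) or "").strip():
                problems.append(f"row {row_no}: empty {column}")
        for column in TIMESTAMP_COLUMNS:
            value = (row.get(column) or "").strip()
            if not value:
                continue  # optional column left blank
            try:
                datetime.strptime(value, TIMESTAMP_FORMAT)
            except ValueError:
                problems.append(f"row {row_no}: bad {column} {value!r}")
    return problems
```

For example, under this assumed rule `to_entry_id("Sales Report - 2020")` yields `sales_report_2020`, and a row with `created_at` set to `2020-01-01` is reported because it lacks the time part.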
- SAMPLE INPUT
- See sample-input/csv for reference;
- The Data Catalog Sample Custom Entries spreadsheet (Google Sheets) might help you create and export a CSV file.
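If you prefer generating the input programmatically instead of exporting it from a spreadsheet, the standard `csv` module is enough. This sketch assumes the column order of the schema table and a header row; `write_entries_csv` and the sample values in the usage note are hypothetical.

```python
import csv
from typing import Dict, Iterable

# Column order follows the schema table; the header row names each column.
COLUMNS = ["user_specified_system", "group_id", "linked_resource",
           "display_name", "description", "user_specified_type",
           "created_at", "updated_at"]


def write_entries_csv(path: str, entries: Iterable[Dict[str, str]]) -> int:
    """Write a header plus one line per Entry; return the number of Entries."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        # restval fills optional columns that an entry omits with "".
        writer = csv.DictWriter(csv_file, fieldnames=COLUMNS, restval="")
        writer.writeheader()
        for entry in entries:
            writer.writerow(entry)  # keys outside COLUMNS raise ValueError
            count += 1
    return count
```

For instance, `write_entries_csv("entries.csv", [{"user_specified_system": "my_system", "group_id": "my_group", "linked_resource": "//my-system/sales", "display_name": "Sales", "user_specified_type": "table"}])` writes the header plus one Entry, leaving the optional columns empty.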
- COMMANDS
Python + virtualenv

```bash
datacatalog-custom-entries sync \
  --csv-file <CSV-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>
```

Docker

```bash
docker build --rm --tag datacatalog-custom-entries-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-DIR>:/credentials --volume <CSV-FILE-DIR>:/data \
  datacatalog-custom-entries-manager sync \
  --csv-file /data/<CSV-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>
```
- STRUCTURE
The metadata structure used to synchronize Custom Entries from a JSON file is presented below. Use as many objects as needed to describe all the Data Catalog Entries you need.
```json
{
  "userSpecifiedSystems": [
    {
      "name": "STRING",
      "entryGroups": [
        {
          "id": "STRING",
          "entries": [
            {
              "linkedResource": "STRING",
              "displayName": "STRING",
              "description": "STRING (optional)",
              "type": "STRING",
              "createdAt": "STRING (optional, format: YYYY-MM-DDTHH:MM:SSZ)",
              "updatedAt": "STRING (optional, format: YYYY-MM-DDTHH:MM:SSZ)"
            }
          ]
        }
      ]
    }
  ]
}
```
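The JSON structure carries the same information as the CSV schema, with the system and the Entry Group factored out into nesting levels. The sketch below flattens it into one record per Entry to make that correspondence explicit. The key-to-column mapping (`ENTRY_FIELDS`) is inferred from the field names and is an assumption, not necessarily how the package processes the file internally; `flatten` and `flatten_file` are hypothetical helpers.

```python
import json
from typing import Any, Dict, List

# Assumed mapping from JSON entry keys to the CSV column names.
ENTRY_FIELDS = {
    "linkedResource": "linked_resource",
    "displayName": "display_name",
    "description": "description",
    "type": "user_specified_type",
    "createdAt": "created_at",
    "updatedAt": "updated_at",
}


def flatten(document: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return one flat, CSV-like record per Entry in the nested structure."""
    rows: List[Dict[str, str]] = []
    for system in document.get("userSpecifiedSystems", []):
        for group in system.get("entryGroups", []):
            for entry in group.get("entries", []):
                row = {"user_specified_system": system["name"],
                       "group_id": group["id"]}
                for json_key, column in ENTRY_FIELDS.items():
                    # Optional keys (description, createdAt, updatedAt) default to "".
                    row[column] = entry.get(json_key, "")
                rows.append(row)
    return rows


def flatten_file(path: str) -> List[Dict[str, str]]:
    """Load a JSON input file and flatten it."""
    with open(path, encoding="utf-8") as json_file:
        return flatten(json.load(json_file))
```

One nested system with one Entry Group holding two entries produces two flat records, each repeating the system name and group ID.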
- SAMPLE INPUT
- See sample-input/json for reference.
- COMMANDS
Python + virtualenv

```bash
datacatalog-custom-entries sync \
  --json-file <JSON-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>
```

Docker

```bash
docker build --rm --tag datacatalog-custom-entries-manager .
docker run --rm --tty \
  --volume <CREDENTIALS-FILE-DIR>:/credentials --volume <JSON-FILE-DIR>:/data \
  datacatalog-custom-entries-manager sync \
  --json-file /data/<JSON-FILE-PATH> \
  --project-id <YOUR-PROJECT-ID> --location-id <YOUR-LOCATION-ID>
```
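When scripting several syncs, the documented commands differ only in the input flag (`--csv-file` or `--json-file`). The sketch below builds and runs the command with `subprocess`. Choosing the flag from the file extension is an assumption of this sketch, not behavior of the CLI itself, and `build_sync_command` and `run_sync` are hypothetical names.

```python
import shutil
import subprocess
from typing import List

CLI = "datacatalog-custom-entries"


def build_sync_command(input_file: str, project_id: str,
                       location_id: str) -> List[str]:
    """Build the documented sync command, picking the flag by file extension."""
    lowered = input_file.lower()
    if lowered.endswith(".csv"):
        file_flag = "--csv-file"
    elif lowered.endswith(".json"):
        file_flag = "--json-file"
    else:
        raise ValueError(f"expected a .csv or .json file, got {input_file!r}")
    return [CLI, "sync", file_flag, input_file,
            "--project-id", project_id, "--location-id", location_id]


def run_sync(input_file: str, project_id: str, location_id: str) -> int:
    """Run the CLI and return its exit code; it must be on PATH."""
    command = build_sync_command(input_file, project_id, location_id)
    if shutil.which(CLI) is None:
        raise FileNotFoundError(f"{CLI} not found; is the virtualenv active?")
    return subprocess.run(command, check=False).returncode
```

Assuming the virtualenv is active and the credentials are exported, `run_sync("entries.csv", "my-project", "us-central1")` returns the CLI exit code; the project and location values here are placeholders.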
3. How to contribute

Please make sure to take a moment and read the Code of Conduct.
Please report bugs and suggest features via GitHub Issues.
Before opening an issue, search the tracker for possible duplicates. If you find a duplicate, please add a comment saying that you encountered the problem as well.
Please make sure to read the Contributing Guide before making a pull request.