For each platform/resource combination (so for each ResourceConnector), we want to execute a separate script that keeps the database of our Metadata Catalogue in sync with the metadata of that platform. We want to execute this script every X seconds (probably minutes).
Main considerations:

- It should be easy to implement a new connector
- It should be easy to monitor the connectors
- It should be easy to retry synchronizing resources that threw an error

This could be placed in a separate repository. To keep it simple, let's keep it in the current repo for now.
Create `src/connectors/synchronization.py`.

It should expect the following command line arguments (a minimal parsing sketch follows after this list):

- `from: datetime | None` - only relevant for the first run. The first run will start from this datetime.
- `connector: str` - the path to the ResourceConnector.
- `connect-db` (either `connect-db` or `connect-url` must be present) - if present, the database of the Metadata Catalogue will be updated directly.
- `connect-url: str` (either `connect-db` or `connect-url` must be present) - if present, the Metadata Catalogue will be updated using the REST API.
- `working-dir: str` - a path. This will contain a subdirectory for this ResourceConnector with:
  - the logging
  - a .csv with the failed resources (datetime, identifier, reason)
  - a file containing the next datetime from which the data should be retrieved

When it runs, it should synchronize from the datetime given by the `working-dir/last-datetime` file (or, if that file does not exist, from the command line argument) to the current datetime.
If this is the first run, the run should be split into batches.
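
A minimal sketch of the argument parsing and the datetime bookkeeping, using Python's standard argparse. The flag names, the `last-datetime` file name, and the error handling are assumptions, not decisions fixed by this issue:

```python
# synchronization.py -- minimal sketch; flag names, file names and error handling are assumptions.
import argparse
from datetime import datetime
from pathlib import Path


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Synchronize one platform with the Metadata Catalogue.")
    parser.add_argument("--from", dest="from_datetime", type=datetime.fromisoformat, default=None,
                        help="Start datetime, only used on the very first run.")
    parser.add_argument("--connector", required=True,
                        help="Path to the ResourceConnector (e.g. a dotted module path).")
    parser.add_argument("--connect-db", action="store_true",
                        help="Update the database of the Metadata Catalogue directly.")
    parser.add_argument("--connect-url", default=None,
                        help="Update the Metadata Catalogue through the REST API at this URL.")
    parser.add_argument("--working-dir", required=True, type=Path,
                        help="Directory holding the logs, the failed-resources .csv and the last-datetime file.")
    args = parser.parse_args()
    if args.connect_db == bool(args.connect_url):
        parser.error("Specify exactly one of --connect-db and --connect-url.")
    return args


def determine_window(working_dir: Path, first_run_start: datetime | None) -> tuple[datetime, datetime]:
    """Return (from_incl, to_excl): resume from the last-datetime file, or fall back to --from."""
    last_datetime_file = working_dir / "last-datetime"  # assumed file name
    if last_datetime_file.exists():
        from_incl = datetime.fromisoformat(last_datetime_file.read_text().strip())
    elif first_run_start is not None:
        from_incl = first_run_start
    else:
        raise ValueError("No last-datetime file found and no --from given for the first run.")
    return from_incl, datetime.now()
```

Splitting the first run into batches could then be a loop over fixed-size time windows between `from_incl` and `to_excl`, writing the last-datetime file after every batch so an interrupted run can resume.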
Update the resource connectors (a minimal interface sketch follows after this list):

- They should have a `fetch` method with parameters `from_incl` and `to_excl`.
- The fetch should return the same as now, but also possibly a `FailedResource` with an identifier and a reason.
- They should have a `retry` method with parameter `identifier`.
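
A sketch of what that interface could look like. The `FailedResource` fields follow the .csv columns above, while the class and method bodies are placeholders, not the repository's actual base classes:

```python
# Hypothetical connector interface; the real base class and resource types in the repo may differ.
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator


@dataclass
class FailedResource:
    datetime_: datetime   # when the failure happened
    identifier: str       # identifier of the resource on the platform
    reason: str           # why fetching or converting it failed


class ExampleDatasetConnector:  # would derive from the existing ResourceConnector base class
    def fetch(self, from_incl: datetime, to_excl: datetime) -> Iterator["object | FailedResource"]:
        """Yield every resource modified in [from_incl, to_excl).

        Instead of raising on a single bad resource, yield a FailedResource so that
        synchronization.py can log it to the failed-resources .csv and continue.
        """
        ...

    def retry(self, identifier: str) -> "object | FailedResource":
        """Fetch one resource by identifier; used by retry.py."""
        ...
```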
Create a simple `synchronize.sh` that:

- will be run from a crontab
- takes the same arguments as the .py
- adds a file lock, making sure that this process is not running multiple times concurrently
- calls the .py
- logs to the working directory. This should be a separate log, containing only "Starting run", "Run ended", and "Run not possible because another process is already running".
Create `src/connectors/retry.py` (can be left out of scope for the first PR).

This should retry all failures that are present in the failed-resources .csv (a minimal sketch follows after this list).

Command line arguments:

- `connector`
- `connect-db`
- `connect-url`
- `working-dir`
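
A minimal sketch of the retry flow, assuming the `FailedResource` class sketched above and a hypothetical `failed_resources.csv` file name:

```python
# retry.py -- minimal sketch; it takes the same connector/connect-db/connect-url/working-dir
# arguments as synchronization.py. The file name and the return handling are assumptions.
import csv
from pathlib import Path


def retry_failed_resources(connector, working_dir: Path) -> list[tuple[str, str]]:
    """Retry every identifier listed in the failed-resources .csv.

    Returns (identifier, reason) pairs that failed again, so the .csv can be rewritten
    with only the remaining failures.
    """
    csv_path = working_dir / "failed_resources.csv"  # assumed file name
    if not csv_path.exists():
        return []
    still_failing: list[tuple[str, str]] = []
    with csv_path.open(newline="") as csv_file:
        for _row_datetime, identifier, _reason in csv.reader(csv_file):
            result = connector.retry(identifier)
            if hasattr(result, "reason"):  # a FailedResource, as sketched above
                still_failing.append((identifier, result.reason))
            # otherwise: store `result` via the database session or the REST API, same as for fetch()
    return still_failing
```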