Synchronization of database for each resource/platform #16

josvandervelde · 2023-03-22T10:52:35Z

For each platform/resource combination (that is, for each ResourceConnector), we want to run a separate script that keeps the database of our Metadata Catalogue in sync with the metadata of that platform. The script should run every X seconds (probably minutes).

Main considerations:

- It should be easy to implement a new connector
- It should be easy to monitor the connectors
- It should be easy to retry synchronizing resources that threw an error

This could be placed in a separate repository. To keep it simple, let's keep it in the current repo for now.

Create src/connectors/synchronization.py
- It should expect command line arguments:
  - from: datetime | None - only relevant for the first run. The first run will start with this datetime.
  - connector: str - the path to the ResourceConnector
  - connect-db (either connect-db or connect-url must be present) - if present, the database of the Metadata Catalogue will be updated directly
  - connect-url (either connect-db or connect-url must be present): str - if present, the Metadata Catalogue will be updated using the REST API.
  - working-dir: str - a path. This will contain a subdirectory for this ResourceConnector with
    - the logging
    - a .csv with the failed resources (datetime, identifier, reason)
    - a file containing the next datetime from which the data should be retrieved
- On each run, it should process the period from the datetime stored in the working-dir/last-datetime file
  (or, if that file does not exist, from the `from` command line argument) up to the current datetime.
  If this is the first run, the period should be split into batches.
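The argument handling and batching described above could be sketched as follows. This is only an illustration, not the actual synchronization.py: the function names, the `--` flag spelling, treating connect-db as a boolean flag, and reading the datetime file as ISO 8601 text are all assumptions.

```python
import argparse
from datetime import datetime, timedelta
from pathlib import Path


def parse_args(argv=None):
    """Parse the arguments listed in the issue (flag spelling is an assumption)."""
    parser = argparse.ArgumentParser(description="Synchronize one ResourceConnector")
    parser.add_argument("--from", dest="from_", type=datetime.fromisoformat, default=None)
    parser.add_argument("--connector", required=True, help="path to the ResourceConnector")
    # Assumption: "either connect-db or connect-url must be present" is modelled
    # as a required, mutually exclusive group.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--connect-db", action="store_true")
    target.add_argument("--connect-url", type=str)
    parser.add_argument("--working-dir", required=True, type=Path)
    return parser.parse_args(argv)


def start_datetime(state_file: Path, from_arg):
    """Prefer the stored datetime; fall back to the command line argument."""
    if state_file.exists():
        return datetime.fromisoformat(state_file.read_text().strip())
    if from_arg is None:
        raise ValueError("first run: no stored datetime and no --from argument")
    return from_arg


def batches(start: datetime, end: datetime, size: timedelta):
    """Yield consecutive (from_incl, to_excl) windows that cover [start, end)."""
    current = start
    while current < end:
        upper = min(current + size, end)
        yield current, upper
        current = upper
```

A run would then loop over `batches(start, now, size)`, call the connector for each window, and write `to_excl` back to the datetime file after each batch, so that an interrupted first run can resume where it stopped.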
Update the resource connectors.
- They should have a fetch method with parameters from_incl and to_excl
- The fetch should return the same as it does now, but may also return a FailedResource with an identifier and a reason
- They should have a retry method with parameter identifier
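A possible shape for this connector interface is sketched below. The method and parameter names (fetch, retry, from_incl, to_excl, identifier) and FailedResource come from the list above; using a plain dict as the resource type and the `InMemoryConnector` class are assumptions made only to keep the example self-contained.

```python
import abc
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Union


@dataclass
class FailedResource:
    identifier: str
    reason: str


# Assumption: a plain dict stands in for whatever resource type fetch returns today.
Resource = dict


class ResourceConnector(abc.ABC):
    @abc.abstractmethod
    def fetch(self, from_incl: datetime, to_excl: datetime) -> Iterator[Union[Resource, FailedResource]]:
        """Yield every resource changed in [from_incl, to_excl), or a failure."""

    @abc.abstractmethod
    def retry(self, identifier: str) -> Union[Resource, FailedResource]:
        """Fetch a single resource that failed earlier."""


class InMemoryConnector(ResourceConnector):
    """Hypothetical connector backed by a dict, for illustration only."""

    def __init__(self, records: dict):
        self._records = records  # identifier -> {"modified": datetime, "name": str}

    def _convert(self, identifier: str):
        record = self._records[identifier]
        if "name" not in record:
            return FailedResource(identifier, "missing name")
        return {"identifier": identifier, "name": record["name"]}

    def fetch(self, from_incl, to_excl):
        for identifier, record in self._records.items():
            if from_incl <= record["modified"] < to_excl:
                yield self._convert(identifier)

    def retry(self, identifier):
        if identifier not in self._records:
            return FailedResource(identifier, "not found")
        return self._convert(identifier)
```

Returning failures as values rather than raising lets one bad resource be written to the failed-resources .csv while the rest of the batch is still processed.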
Create a simple synchronize.sh that:
- will be run from a crontab
- takes the same arguments as the .py
- adds a file lock, so that this process never runs more than once at the same time
- calls the .py
- logs into the working directory. This should be a separate log, containing only "Starting run", "Run ended" and "Run not possible because another process is already running"
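synchronize.sh itself would be a shell script, but the locking and logging behaviour it needs can be illustrated in Python. The sketch below uses a non-blocking `fcntl.flock` (Unix only); the file names `synchronize.lock` and `run.log` and the function names are assumptions.

```python
import fcntl
from datetime import datetime
from pathlib import Path


def try_lock(lock_path: Path):
    """Return an open, exclusively locked file handle, or None if already locked."""
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def log(working_dir: Path, message: str) -> None:
    """Append one timestamped line to the separate run log."""
    with open(working_dir / "run.log", "a") as f:
        f.write(f"{datetime.now().isoformat()} {message}\n")


def run_once(working_dir: Path, job) -> bool:
    """Run job() unless another process holds the lock; return whether it ran."""
    working_dir.mkdir(parents=True, exist_ok=True)
    lock = try_lock(working_dir / "synchronize.lock")
    if lock is None:
        log(working_dir, "Run not possible because another process is already running")
        return False
    try:
        log(working_dir, "Starting run")
        job()
        log(working_dir, "Run ended")
        return True
    finally:
        lock.close()  # closing the handle releases the lock
```

Because the lock is tied to the open file handle, it is released automatically if the process crashes, so a stale lock file cannot block later cron runs.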
Create src/connectors/retry.py (can be left out of scope for the first PR)
- This should retry all failures that are present in the failed resources .csv.
- command line arguments
  - connector
  - connect-db
  - connect-url
  - working-dir
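The core loop of such a retry script might look like the sketch below. It assumes the failed-resources .csv has no header row and exactly the three columns named earlier (datetime, identifier, reason), and that the connector's retry method returns either a resource or a FailedResource. Saving the successfully retried resources through connect-db or connect-url is left out.

```python
import csv
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class FailedResource:
    identifier: str
    reason: str


def retry_failures(connector, csv_path: Path) -> int:
    """Retry every row of the failures file; keep only the rows that fail again.

    Returns the number of resources that were retried successfully.
    """
    with csv_path.open(newline="") as f:
        rows = list(csv.reader(f))

    still_failing = []
    for _failed_at, identifier, _reason in rows:
        result = connector.retry(identifier)
        if isinstance(result, FailedResource):
            now = datetime.now(timezone.utc).isoformat()
            still_failing.append([now, result.identifier, result.reason])
        # else: the resource would be saved to the Metadata Catalogue here

    # Rewrite the file so that it lists only the remaining failures.
    with csv_path.open("w", newline="") as f:
        csv.writer(f).writerows(still_failing)
    return len(rows) - len(still_failing)
```

Rewriting the file on every pass keeps it a list of open failures only, which makes it usable for monitoring as well.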


josvandervelde added this to AIoD API Mar 22, 2023

josvandervelde converted this from a draft issue Mar 22, 2023

arejula27 mentioned this issue Jul 14, 2023

Feature/connector sync #100

Merged

josvandervelde added a commit that referenced this issue Aug 18, 2023

Updated the synchronization and connectors to be more like described in #16 (commit 9e3e8c2)

josvandervelde moved this from Todo to Done in AIoD API Dec 19, 2023

josvandervelde closed this as completed Dec 19, 2023
