Fix/rebuild db (#322)
* Do not setup the database if rebuild-db is set to no

* Log a warning when the database does not exist and is not created

* Fix typo in warning

* Rename parameter and options for clarity and update out-of-date docs
PGijsbers authored Oct 29, 2024
1 parent 5b40190 commit f4b2d1d
Showing 4 changed files with 73 additions and 45 deletions.
41 changes: 15 additions & 26 deletions README.md
@@ -178,33 +178,22 @@ for a production instance.

See [authentication README](authentication/README.md) for more information.

### Populating the Database
### Creating the Database

By default, the app will create a database on the provided MySQL server.
You can change this behavior through the **build-db** command-line parameter,
it takes the following options:
* never: *never* creates the database, not even if there does not exist one yet.
Use this only if you expect the database to be created through other means, such
as MySQL group replication.
* if-absent: Creates a database only if none exists. (default)
* drop-then-build: Drops the database on startup to recreate it from scratch.
**THIS REMOVES ALL DATA PERMANENTLY. NO RECOVERY POSSIBLE.**
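As a reading aid, the three `build-db` options can be sketched as a small pure function that maps an option to the startup steps it would trigger. This is a simplified illustration based on the README text and the `create_app` changes in this commit, not the project's code: `BuildDb`, `plan_startup`, and the step names are hypothetical.

```python
from enum import Enum


class BuildDb(Enum):
    """Hypothetical mirror of the three `--build-db` choices."""

    NEVER = "never"
    IF_ABSENT = "if-absent"
    DROP_THEN_BUILD = "drop-then-build"


def plan_startup(build_db: str, database_present: bool) -> list[str]:
    """Return the (illustrative) startup steps for a given option.

    Raises ValueError if `build_db` is not one of the three known options.
    """
    mode = BuildDb(build_db)
    if mode is BuildDb.NEVER:
        # Nothing is created; a missing database only produces a warning.
        return [] if database_present else ["warn-database-missing"]

    steps = []
    if mode is BuildDb.DROP_THEN_BUILD:
        # Destructive: all existing data is lost.
        steps.append("drop-database")
    steps += [
        "create-database-if-not-exists",
        "create-tables-if-missing",
        "seed-platforms-if-none",
    ]
    return steps


if __name__ == "__main__":
    for option in ("never", "if-absent", "drop-then-build"):
        print(option, plan_startup(option, database_present=False))
```

Only `drop-then-build` ever includes a drop step, which matches the README's warning that it is the one option that destroys data.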

By default, the app will connect to the database and populate it with a few items if there is no data present.
You can change this behavior through parameters of the script:

* **rebuild-db**: "no", "only-if-empty", "always". Default is "only-if-empty".
* no: connect to the database but don't make any modifications on startup.
* only-if-empty: if the database does not exist, create it. Then, if the tables do not exist, create them.
Then, if the tables are empty, populate according to `populate`.
* always: drop the configured database and rebuild its structure from scratch.
Effectively a `DROP DATABASE` followed by a `CREATE DATABASE` and the creation of the tables.
The database is then repopulated according to `populate`.
**Important:** data in the database is not restored. All data will be lost. Do not use this option
if you are not sure if it is what you need.

* **populate-datasets**: one or multiple of "example", "huggingface", "zenodo" or "openml".
Default is nothing. Specifies what data to add the database, only used if `rebuild-db` is
"only-if-empty" or "always".
* nothing: don't add any data.
* example: registers two datasets and two publications.
* openml: registers datasets of OpenML, this may take a while, depending on the limit (~30
minutes).

* **populate-publications**: similar to populate-datasets. Only "example" is currently implemented.

* **limit**: limit the number of initial resources with which the database is populated. This
limit is per resource and per platform.
### Populating the Database
To populate the database with some examples, run the `connectors/fill-examples.sh` script.
When using `docker compose` you can easily do this by running the "examples" profile:
`docker compose --profile examples up`

## Usage

3 changes: 2 additions & 1 deletion docker-compose.yaml
@@ -15,9 +15,10 @@ services:
- ${AIOD_REST_PORT}:8000
volumes:
- ./src:/app:ro
stdin_open: true # docker run -i
command: >
python main.py
--rebuild-db only-if-empty
--build-db if-absent
--reload
healthcheck:
test: ["CMD", "python", "-c", "import requests; requests.get('http://localhost:8000')"]
20 changes: 19 additions & 1 deletion src/database/setup.py
@@ -6,6 +6,7 @@
import sqlmodel
from sqlalchemy import text, create_engine
from sqlmodel import SQLModel, select
from sqlalchemy.exc import OperationalError

from config import DB_CONFIG
from connectors.resource_with_relations import ResourceWithRelations
@@ -16,7 +17,7 @@
from routers import resource_routers


def drop_or_create_database(delete_first: bool):
def create_database(*, delete_first: bool):
url = db_url(including_db=False)
engine = create_engine(url, echo=False) # Temporary engine, not connected to a database
with engine.connect() as connection:
@@ -26,6 +27,23 @@ def drop_or_create_database(delete_first: bool):
connection.execute(text(f"CREATE DATABASE IF NOT EXISTS {database}"))


def database_exists() -> bool:
"""Checks whether the database defined in the configuration exists."""
url = db_url(including_db=True)
# Using the singleton defined in `Session.py` may be cleaner, but I could
# not find documentation that ensures me that creating the engine there and
# then potentially re-creating the database later is safe.
# Since this function is only supposed to be called once, using a separate
# Engine object does not seem problematic.
engine = create_engine(url, echo=False)
try:
with engine.connect() as _:
pass
except OperationalError:
return False
return True


def _get_existing_resource(
session: sqlmodel.Session, resource: AIoDConcept, clazz: type[SQLModel]
) -> AIoDConcept | None:
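The connect-and-catch pattern used by `database_exists` can be tried without a MySQL server. The sketch below applies the same idea to SQLite using only the standard library. It is an analogy under that assumption, and `sqlite_database_exists` is a hypothetical helper, not part of this commit. Opening the file read-only (`mode=ro`) matters here, because a default SQLite connection would create the missing file instead of failing.

```python
import sqlite3
from pathlib import Path


def sqlite_database_exists(path: str) -> bool:
    """Return True if the SQLite file at `path` can be opened read-only."""
    # mode=ro makes SQLite refuse to create a missing file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        connection = sqlite3.connect(uri, uri=True)
    except sqlite3.OperationalError:
        # The file could not be opened, so treat the database as absent.
        return False
    connection.close()
    return True
```

One caveat carries over to the MySQL version: an `OperationalError` may also be raised for reasons other than a missing database, such as an unreachable server or rejected credentials, so a `False` result likely means "could not connect" rather than strictly "does not exist".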
54 changes: 37 additions & 17 deletions src/main.py
@@ -6,6 +6,7 @@
"""

import argparse
import logging

import pkg_resources
import uvicorn
@@ -20,7 +21,7 @@
from database.model.platform.platform import Platform
from database.model.platform.platform_names import PlatformName
from database.session import EngineSingleton, DbSession
from database.setup import drop_or_create_database
from database.setup import create_database, database_exists
from routers import resource_routers, parent_routers, enum_routers, uploader_routers
from routers import search_routers
from setup_logger import setup_logger
@@ -31,10 +32,18 @@ def _parse_args() -> argparse.Namespace:
parser = argparse.ArgumentParser(description="Please refer to the README.")
parser.add_argument("--url-prefix", default="", help="Prefix for the api url.")
parser.add_argument(
"--rebuild-db",
default="only-if-empty",
choices=["no", "only-if-empty", "always"],
help="Determines if the database is recreated.",
"--build-db",
default="if-absent",
choices=["never", "if-absent", "drop-then-build"],
help="""
Determines if the database is created:\n
- never: *never* creates the database, not even if there does not exist one yet.
Use this only if you expect the database to be created through other means, such
as MySQL group replication.\n
- if-absent: Creates a database only if none exists.\n
- drop-then-build: Drops the database on startup to recreate it from scratch.
THIS REMOVES ALL DATA PERMANENTLY. NO RECOVERY POSSIBLE.
""",
)
parser.add_argument(
"--reload",
@@ -111,18 +120,29 @@ def create_app() -> FastAPI:
"scopes": KEYCLOAK_CONFIG.get("scopes"),
},
)
drop_or_create_database(delete_first=args.rebuild_db == "always")
AIoDConcept.metadata.create_all(EngineSingleton().engine, checkfirst=True)
with DbSession() as session:
existing_platforms = session.scalars(select(Platform)).all()
if not any(existing_platforms):
session.add_all([Platform(name=name) for name in PlatformName])
session.commit()

# this is a bit of a hack: instead of checking whether the triggers exist, we check
# whether platforms are already present. If platforms were not present, the db is
# empty, and so the triggers should still be added.
add_delete_triggers(AIoDConcept)
if args.build_db == "never":
if not database_exists():
logging.warning(
"AI-on-Demand database does not exist on the MySQL server, "
"but `build_db` is set to 'never'. If you are not creating the "
"database through other means, such as MySQL group replication, "
"this likely means that you will get errors or undefined behavior."
)
else:

drop_database = args.build_db == "drop-then-build"
create_database(delete_first=drop_database)
AIoDConcept.metadata.create_all(EngineSingleton().engine, checkfirst=True)
with DbSession() as session:
existing_platforms = session.scalars(select(Platform)).all()
if not any(existing_platforms):
session.add_all([Platform(name=name) for name in PlatformName])
session.commit()

# this is a bit of a hack: instead of checking whether the triggers exist, we check
# whether platforms are already present. If platforms were not present, the db is
# empty, and so the triggers should still be added.
add_delete_triggers(AIoDConcept)

add_routes(app, url_prefix=args.url_prefix)
return app
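One detail worth checking in the new `--build-db` help text: `argparse`'s default `HelpFormatter` re-wraps help strings and collapses whitespace, so the embedded `\n` sequences would not appear as line breaks in `--help` output unless a formatter such as `argparse.RawTextHelpFormatter` is set. The sketch below shows the difference with a shortened help string. `render_help`, `HELP_TEXT`, and the fixed width of 80 are assumptions for the demonstration, not part of this commit.

```python
import argparse
from functools import partial

HELP_TEXT = (
    "Determines if the database is created:\n"
    "- never: do not create it.\n"
    "- if-absent: create it only if none exists."
)


def render_help(formatter_class) -> str:
    """Build a parser with a `--build-db` option and return its help output."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        # Fix the width so the output does not depend on the terminal size.
        formatter_class=partial(formatter_class, width=80),
    )
    parser.add_argument(
        "--build-db",
        default="if-absent",
        choices=["never", "if-absent", "drop-then-build"],
        help=HELP_TEXT,
    )
    return parser.format_help()


if __name__ == "__main__":
    # Default formatter: the bullet lines are merged into one wrapped paragraph.
    print(render_help(argparse.HelpFormatter))
    # Raw formatter: each bullet stays on its own line.
    print(render_help(argparse.RawTextHelpFormatter))
```

If the per-option layout in the help output matters, passing `formatter_class=argparse.RawTextHelpFormatter` to `ArgumentParser` would be one way to keep it.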