Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: 🎸 add concept of Resource #784

Merged
merged 29 commits into from
Feb 10, 2023
Merged
Show file tree
Hide file tree
Changes from 17 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
52b9df8
feat: 🎸 add concept of Resource
severo Feb 7, 2023
7be733d
ci: 🎡 fix unit tests for the job
severo Feb 8, 2023
203d789
refactor: 💡 remove dead code
severo Feb 8, 2023
6aa2398
Merge branch 'main' into create_resources_concept
severo Feb 8, 2023
073809c
test: 💍 give more time to avoid timeout to mongodb
severo Feb 8, 2023
5da80a8
refactor: 💡 rename resource into resources
severo Feb 8, 2023
2f7bef4
refactor: 💡 use InitVar to avoid mutating fields
severo Feb 8, 2023
7d047b0
fix: 🐛 release the resources on shutdown
severo Feb 8, 2023
c7b9de9
fix: 🐛 avoid using InitVar since they are hard to use
severo Feb 8, 2023
c4d755b
test: 💍 fix parameter name
severo Feb 8, 2023
59c78b9
fix: 🐛 no need to call .allocate, it's done in __post_init__
severo Feb 8, 2023
9d4039f
feat: 🎸 use primitive parameters, add release, add tests
severo Feb 8, 2023
8952cb5
Merge branch 'main' into create_resources_concept
severo Feb 8, 2023
4681b97
style: 💄 fix style
severo Feb 8, 2023
12a52b1
Merge branch 'main' into create_resources_concept
severo Feb 9, 2023
d99d0e4
refactor: 💡 rename AssetsDirectory to AssetsStorageAccess
severo Feb 9, 2023
2301eb9
fix: 🐛 allocate assets storage for all the workers
severo Feb 9, 2023
318f1a2
Merge branch 'main' into create_resources_concept
severo Feb 10, 2023
e3317c9
refactor: 💡 change env var names for coherence
severo Feb 10, 2023
0c5f89a
fix: 🐛 missing env var
severo Feb 10, 2023
8613ecc
ci: 🎡 fix env var
severo Feb 10, 2023
a7afa28
refactor: 💡 replace LogResource with init_logging
severo Feb 10, 2023
ebe20fb
feat: 🎸 add init_assets_directory method
severo Feb 10, 2023
1a8fb1e
refactor: 💡 remove StorageResource + remove mongo.py
severo Feb 10, 2023
37889e9
fix: 🐛 fix details to make the CI pass
severo Feb 10, 2023
66233a7
test: 💍 fix tests
severo Feb 10, 2023
235d3cf
refactor: 💡 remove dead code
severo Feb 10, 2023
16554f0
test: 💍 remove dead code
severo Feb 10, 2023
a4396c6
style: 💄 fix style
severo Feb 10, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/_unit-tests-python.yml
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ jobs:
env:
CACHE_MONGO_URL: mongodb://localhost:${{ env.mongo-port }}
QUEUE_MONGO_URL: mongodb://localhost:${{ env.mongo-port }}
MONGODB_MIGRATION_MONGO_URL: mongodb://localhost:${{ env.mongo-port }}
run: poetry run python -m pytest -s --cov=./ --cov-report=xml:./coverage.xml --cov-report=term tests
- name: Prepare codecov flag (slash "/" is not allowed)
id: remove-slash
Expand Down
1 change: 1 addition & 0 deletions chart/templates/worker/config-names/_container.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
# overridden
value: {{ .Values.configNames.queue.maxJobsPerNamespace | quote }}
volumeMounts:
{{ include "volumeMountAssetsRW" . | nindent 2 }}
{{ include "volumeMountCache" . | nindent 2 }}
securityContext:
allowPrivilegeEscalation: false
Expand Down
1 change: 1 addition & 0 deletions chart/templates/worker/config-names/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ spec:
spec:
{{- include "image.imagePullSecrets" . | nindent 6 }}
initContainers:
{{ include "initContainerAssets" . | nindent 8 }}
{{ include "initContainerCache" . | nindent 8 }}
containers: {{ include "containerWorkerConfigNames" . | nindent 8 }}
nodeSelector: {{ toYaml .Values.configNames.nodeSelector | nindent 8 }}
Expand Down
2 changes: 2 additions & 0 deletions chart/templates/worker/dataset-info/_container.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@
# value: {{ .Values.queue.maxJobsPerNamespace | quote }}
# overridden
value: {{ .Values.datasetInfo.queue.maxJobsPerNamespace | quote }}
volumeMounts:
{{ include "volumeMountAssetsRW" . | nindent 2 }}
securityContext:
allowPrivilegeEscalation: false
resources: {{ toYaml .Values.datasetInfo.resources | nindent 4 }}
Expand Down
2 changes: 2 additions & 0 deletions chart/templates/worker/dataset-info/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ spec:
labels: {{ include "labels.datasetInfo" . | nindent 8 }}
spec:
{{- include "image.imagePullSecrets" . | nindent 6 }}
initContainers:
{{ include "initContainerAssets" . | nindent 8 }}
containers: {{ include "containerWorkerDatasetInfo" . | nindent 8 }}
nodeSelector: {{ toYaml .Values.datasetInfo.nodeSelector | nindent 8 }}
tolerations: {{ toYaml .Values.datasetInfo.tolerations | nindent 8 }}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@
- name: PARQUET_AND_DATASET_INFO_URL_TEMPLATE
value: {{ .Values.parquetAndDatasetInfo.urlTemplate | quote }}
volumeMounts:
{{ include "volumeMountAssetsRW" . | nindent 2 }}
{{ include "volumeMountCache" . | nindent 2 }}
securityContext:
allowPrivilegeEscalation: false
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,9 @@ spec:
labels: {{ include "labels.parquetAndDatasetInfo" . | nindent 8 }}
spec:
{{- include "image.imagePullSecrets" . | nindent 6 }}
initContainers: {{ include "initContainerCache" . | nindent 8 }}
initContainers:
{{ include "initContainerAssets" . | nindent 8 }}
{{ include "initContainerCache" . | nindent 8 }}
containers: {{ include "containerWorkerParquetAndDatasetInfo" . | nindent 8 }}
nodeSelector: {{ toYaml .Values.parquetAndDatasetInfo.nodeSelector | nindent 8 }}
tolerations: {{ toYaml .Values.parquetAndDatasetInfo.tolerations | nindent 8 }}
Expand Down
2 changes: 2 additions & 0 deletions chart/templates/worker/parquet/_container.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@
# value: {{ .Values.queue.maxJobsPerNamespace | quote }}
# overridden
value: {{ .Values.parquet.queue.maxJobsPerNamespace | quote }}
volumeMounts:
{{ include "volumeMountAssetsRW" . | nindent 2 }}
securityContext:
allowPrivilegeEscalation: false
resources: {{ toYaml .Values.parquet.resources | nindent 4 }}
Expand Down
2 changes: 2 additions & 0 deletions chart/templates/worker/parquet/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ spec:
labels: {{ include "labels.parquet" . | nindent 8 }}
spec:
{{- include "image.imagePullSecrets" . | nindent 6 }}
initContainers:
{{ include "initContainerAssets" . | nindent 8 }}
containers: {{ include "containerWorkerParquet" . | nindent 8 }}
nodeSelector: {{ toYaml .Values.parquet.nodeSelector | nindent 8 }}
tolerations: {{ toYaml .Values.parquet.tolerations | nindent 8 }}
Expand Down
2 changes: 2 additions & 0 deletions chart/templates/worker/sizes/_container.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@
# value: {{ .Values.queue.maxJobsPerNamespace | quote }}
# overridden
value: {{ .Values.sizes.queue.maxJobsPerNamespace | quote }}
volumeMounts:
{{ include "volumeMountAssetsRW" . | nindent 2 }}
securityContext:
allowPrivilegeEscalation: false
resources: {{ toYaml .Values.sizes.resources | nindent 4 }}
Expand Down
2 changes: 2 additions & 0 deletions chart/templates/worker/sizes/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ spec:
labels: {{ include "labels.sizes" . | nindent 8 }}
spec:
{{- include "image.imagePullSecrets" . | nindent 6 }}
initContainers:
{{ include "initContainerAssets" . | nindent 8 }}
containers: {{ include "containerWorkerSizes" . | nindent 8 }}
nodeSelector: {{ toYaml .Values.sizes.nodeSelector | nindent 8 }}
tolerations: {{ toYaml .Values.sizes.tolerations | nindent 8 }}
Expand Down
1 change: 1 addition & 0 deletions chart/templates/worker/split-names/_container.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
# overridden
value: {{ .Values.splitNames.queue.maxJobsPerNamespace | quote }}
volumeMounts:
{{ include "volumeMountAssetsRW" . | nindent 2 }}
{{ include "volumeMountCache" . | nindent 2 }}
securityContext:
allowPrivilegeEscalation: false
Expand Down
1 change: 1 addition & 0 deletions chart/templates/worker/split-names/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ spec:
spec:
{{- include "image.imagePullSecrets" . | nindent 6 }}
initContainers:
{{ include "initContainerAssets" . | nindent 8 }}
{{ include "initContainerCache" . | nindent 8 }}
containers: {{ include "containerWorkerSplitNames" . | nindent 8 }}
nodeSelector: {{ toYaml .Values.splitNames.nodeSelector | nindent 8 }}
Expand Down
1 change: 1 addition & 0 deletions chart/templates/worker/splits/_container.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
# overridden
value: {{ .Values.splits.queue.maxJobsPerNamespace | quote }}
volumeMounts:
{{ include "volumeMountAssetsRW" . | nindent 2 }}
{{ include "volumeMountCache" . | nindent 2 }}
securityContext:
allowPrivilegeEscalation: false
Expand Down
4 changes: 3 additions & 1 deletion chart/templates/worker/splits/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,9 @@ spec:
labels: {{ include "labels.splits" . | nindent 8 }}
spec:
{{- include "image.imagePullSecrets" . | nindent 6 }}
initContainers: {{ include "initContainerCache" . | nindent 8 }}
initContainers:
{{ include "initContainerAssets" . | nindent 8 }}
{{ include "initContainerCache" . | nindent 8 }}
containers: {{ include "containerWorkerSplits" . | nindent 8 }}
nodeSelector: {{ toYaml .Values.splits.nodeSelector | nindent 8 }}
tolerations: {{ toYaml .Values.splits.tolerations | nindent 8 }}
Expand Down
21 changes: 8 additions & 13 deletions jobs/mongodb_migration/src/mongodb_migration/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,42 +6,37 @@
from environs import Env
from libcommon.config import CacheConfig, CommonConfig, QueueConfig

from mongodb_migration.database_migrations import connect_to_database

MONGODB_MIGRATION_MONGO_DATABASE = "datasets_server_maintenance"
MONGO_DATABASE_MONGO_URL = "mongodb://localhost:27017"
MONGODB_MIGRATION_MONGO_URL = "mongodb://localhost:27017"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the defined class at line 14, It might be DATABASE_MIGRATION_MONGO_URL value
And also attribute name at line 9 could be DATABASE_MIGRATION_MONGO_DATABASE
Just to keep the consistency between NAMESPACE_ -> Class Config as it is done in other configurations classes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks



@dataclass
class MongodbMigrationConfig:
class DatabaseMigrationsConfig:
mongo_database: str = MONGODB_MIGRATION_MONGO_DATABASE
mongo_url: str = MONGO_DATABASE_MONGO_URL

def __post_init__(self):
connect_to_database(database=self.mongo_database, host=self.mongo_url)
mongo_url: str = MONGODB_MIGRATION_MONGO_URL

@staticmethod
def from_env() -> "MongodbMigrationConfig":
def from_env() -> "DatabaseMigrationsConfig":
env = Env(expand_vars=True)
with env.prefixed("MONGODB_MIGRATION_"):
return MongodbMigrationConfig(
return DatabaseMigrationsConfig(
mongo_database=env.str(name="MONGO_DATABASE", default=MONGODB_MIGRATION_MONGO_DATABASE),
mongo_url=env.str(name="MONGO_URL", default=MONGO_DATABASE_MONGO_URL),
mongo_url=env.str(name="MONGO_URL", default=MONGODB_MIGRATION_MONGO_URL),
)


@dataclass
class JobConfig:
cache: CacheConfig = field(default_factory=CacheConfig)
common: CommonConfig = field(default_factory=CommonConfig)
mongodb_migration: MongodbMigrationConfig = field(default_factory=MongodbMigrationConfig)
database_migrations: DatabaseMigrationsConfig = field(default_factory=DatabaseMigrationsConfig)
queue: QueueConfig = field(default_factory=QueueConfig)

@staticmethod
def from_env() -> "JobConfig":
return JobConfig(
common=CommonConfig.from_env(),
cache=CacheConfig.from_env(),
mongodb_migration=MongodbMigrationConfig.from_env(),
database_migrations=DatabaseMigrationsConfig.from_env(),
queue=QueueConfig.from_env(),
)
4 changes: 4 additions & 0 deletions jobs/mongodb_migration/src/mongodb_migration/constants.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright 2022 The HuggingFace Authors.

DATABASE_MIGRATIONS_MONGOENGINE_ALIAS = "maintenance"
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,12 @@
import types
from typing import Generic, Type, TypeVar

from mongoengine import Document, DoesNotExist, connect
from mongoengine import Document, DoesNotExist
from mongoengine.fields import StringField
from mongoengine.queryset.queryset import QuerySet

from mongodb_migration.constants import DATABASE_MIGRATIONS_MONGOENGINE_ALIAS

# START monkey patching ### hack ###
# see https://github.com/sbdchd/mongo-types#install
U = TypeVar("U", bound=Document)
Expand All @@ -27,12 +29,6 @@ def __get__(self, instance: object, cls: Type[U]) -> QuerySet[U]:

# END monkey patching ### hack ###

DATABASE_ALIAS = "maintenance"


def connect_to_database(database: str, host: str) -> None:
connect(db=database, alias=DATABASE_ALIAS, host=host)


class DatabaseMigration(Document):
"""A database migration that has already been executed.
Expand All @@ -44,7 +40,7 @@ class DatabaseMigration(Document):

meta = {
"collection": "databaseMigrations",
"db_alias": DATABASE_ALIAS,
"db_alias": DATABASE_MIGRATIONS_MONGOENGINE_ALIAS,
}
version = StringField(required=True)
description = StringField(required=True)
Expand Down
28 changes: 22 additions & 6 deletions jobs/mongodb_migration/src/mongodb_migration/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,18 +3,34 @@

import sys

from libcommon.resources import (
CacheDatabaseResource,
LogResource,
QueueDatabaseResource,
)

from mongodb_migration.collector import MigrationsCollector
from mongodb_migration.config import JobConfig
from mongodb_migration.plan import Plan
from mongodb_migration.resources import MigrationsDatabaseResource

if __name__ == "__main__":
job_config = JobConfig.from_env()
collected_migrations = MigrationsCollector().get_migrations()
try:
Plan(collected_migrations=collected_migrations).execute()
sys.exit(0)
except Exception:
sys.exit(1)
with (
LogResource(log_level=job_config.common.log_level),
# ^ first resource to be acquired, in order to have logs as soon as possible
CacheDatabaseResource(database=job_config.cache.mongo_database, host=job_config.cache.mongo_url),
QueueDatabaseResource(database=job_config.queue.mongo_database, host=job_config.queue.mongo_url),
MigrationsDatabaseResource(
database=job_config.database_migrations.mongo_database, host=job_config.database_migrations.mongo_url
),
):
collected_migrations = MigrationsCollector().get_migrations()
try:
Plan(collected_migrations=collected_migrations).execute()
sys.exit(0)
except Exception:
sys.exit(1)

# See:
# https://blog.appsignal.com/2020/04/14/dissecting-rails-migrationsl.html
Expand Down
25 changes: 25 additions & 0 deletions jobs/mongodb_migration/src/mongodb_migration/resources.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright 2022 The HuggingFace Authors.

from dataclasses import dataclass, field

from libcommon.resources import DatabaseResource

from mongodb_migration.constants import DATABASE_MIGRATIONS_MONGOENGINE_ALIAS


class MigrationsDatabaseConnectionFailure(Exception):
pass


@dataclass
class MigrationsDatabaseResource(DatabaseResource):
"""
A resource that represents a connection to the migrations database.

Args:
database (:obj:`str`): The name of the database.
host (:obj:`str`): The host of the database. It must start with ``mongodb://`` or ``mongodb+srv://``.
"""

mongoengine_alias: str = field(default=DATABASE_MIGRATIONS_MONGOENGINE_ALIAS, init=False)
31 changes: 10 additions & 21 deletions jobs/mongodb_migration/tests/conftest.py
Original file line number Diff line number Diff line change
@@ -1,29 +1,18 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright 2022 The HuggingFace Authors.

from pytest import MonkeyPatch, fixture
from environs import Env
from pytest import fixture

from mongodb_migration.config import JobConfig


# see https://github.com/pytest-dev/pytest/issues/363#issuecomment-406536200
@fixture(scope="session")
def monkeypatch_session():
monkeypatch_session = MonkeyPatch()
monkeypatch_session.setenv("CACHE_MONGO_DATABASE", "datasets_server_cache_test")
monkeypatch_session.setenv("QUEUE_MONGO_DATABASE", "datasets_server_queue_test")
monkeypatch_session.setenv("MONGODB_MIGRATION_MONGO_DATABASE", "datasets_server_maintenance_test")
yield monkeypatch_session
monkeypatch_session.undo()
def env() -> Env:
return Env(expand_vars=True)


@fixture(scope="session", autouse=True)
def app_config(monkeypatch_session: MonkeyPatch) -> JobConfig:
job_config = JobConfig.from_env()
if (
"test" not in job_config.cache.mongo_database
or "test" not in job_config.queue.mongo_database
or "test" not in job_config.mongodb_migration.mongo_database
):
raise ValueError("Test must be launched on a test mongo database")
return job_config
@fixture(scope="session")
def mongo_host(env: Env) -> str:
try:
return env.str(name="MONGODB_MIGRATION_MONGO_URL")
except Exception as e:
raise ValueError("MONGODB_MIGRATION_MONGO_URL is not set") from e
14 changes: 12 additions & 2 deletions jobs/mongodb_migration/tests/test_plan.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright 2022 The HuggingFace Authors.

from typing import List, Optional, Type
from typing import Iterator, List, Optional, Type

import pytest

Expand All @@ -11,10 +11,20 @@
)
from mongodb_migration.migration import IrreversibleMigration, Migration
from mongodb_migration.plan import Plan, SavedMigrationsError
from mongodb_migration.resources import MigrationsDatabaseResource


@pytest.fixture(scope="module")
def migrations_database_resource(mongo_host: str) -> Iterator[MigrationsDatabaseResource]:
database = "datasets_server_migrations_test"
if "test" not in database:
raise ValueError("Test must be launched on a test mongo database")
with MigrationsDatabaseResource(database=database, host=mongo_host) as resource:
yield resource


@pytest.fixture(autouse=True)
def clean_mongo_database() -> None:
def clean_mongo_database(migrations_database_resource: MigrationsDatabaseResource) -> None:
_clean_maintenance_database()


Expand Down
20 changes: 20 additions & 0 deletions jobs/mongodb_migration/tests/test_resources.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright 2022 The HuggingFace Authors.

from mongoengine import Document
from mongoengine.fields import StringField

from mongodb_migration.resources import MigrationsDatabaseResource


def test_cache_database(mongo_host: str) -> None:
resource = MigrationsDatabaseResource(database="test_migrations_database", host=mongo_host)

class User(Document):
name = StringField()
meta = {"db_alias": resource.mongo_connection.mongoengine_alias}

assert len(User.objects()) == 0 # type: ignore
# clean
User.drop_collection() # type: ignore
resource.release()
Loading