better file/folder organization for DB scripts across all NApps #61

Open
viniarck opened this issue Sep 28, 2023 · 5 comments
Labels
enhancement New feature or request priority_medium Medium priority

Comments

@viniarck
Member

Currently these NApps have the following scripts:

  • flow_manager:
├── scripts
│   ├── drop_compound_index.py
│   ├── pipeline_related.py
│   ├── README.md
│   └── storehouse_to_mongo.py
  • topology:
├── scripts
│   ├── pipeline_related.py
│   ├── README.md
│   ├── storehouse_to_mongo.py
│   ├── unset_active.py
│   └── vlan_pool.py
  • mef_eline:
├── scripts
│   ├── 001_rename_priority.py
│   ├── 002_unset_spf_attribute.py
│   ├── 003_vlan_type_string.py
│   ├── README.md
│   └── storehouse_to_mongo.py

Each NApp's scripts README.md has further information about each script, which is great for understanding when to use it. But it's becoming difficult to link these from release notes and to work out which version upgrade a script is needed for. Also, the ordering of the scripts isn't immediately clear. On mef_eline we started using a 3-digit prefix, which helped, but so far this pattern is only used on mef_eline.

This issue is to discuss a proposal and then stick with a new pattern. It needs to solve the following problems:

  • It must be clear which version the script is needed for.
  • It must be clear and easy to derive the order of the execution of the scripts.
@viniarck viniarck added enhancement New feature or request 2023.2 Kytos-ng 2023.2 priority_medium Medium priority labels Sep 28, 2023
@viniarck
Member Author

viniarck commented Oct 2, 2023

Here's a naming convention for folders and scripts that makes it easy to identify the version, the sequence, and how to use each script:

scripts/db/<version>/README.md
scripts/db/<version>/<\d{3}>_script_name.py
  • Each NApp will have a scripts/db/<version> dir for the DB migration scripts of a specific version. The README.md will provide brief information about each script and how to execute it.
  • Each DB script name will follow the pattern <\d{3}>_script_name.py, with a 3-digit prefix starting at 000, so the sequence of execution can be followed if needed. Since the versions also increase monotonically, even if you have to traverse all the dirs you can derive a chronological sequence regardless of file metadata attrs.
  • scripts/db/README.md can also provide general pre-requisites that apply to all scripts, to avoid repetition across README files.

Here's an example of how flow_manager's current scripts would be organized with this convention:

scripts/db/2023.1.0/README.md
scripts/db/2023.1.0/000_drop_compound_index.py
scripts/db/2023.1.0/001_pipeline_related.py
scripts/db/2022.3.3/README.md
scripts/db/2022.3.3/000_drop_compound_index.py
scripts/db/2022.2.0/README.md
scripts/db/2022.2.0/000_storehouse_to_mongo.py
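
To illustrate how the execution order could be derived from this layout alone, here is a minimal sketch. The helper names (`version_key`, `execution_order`) are hypothetical and not part of any NApp; it assumes versions are dot-separated integers and that only files matching the 3-digit prefix pattern are scripts.

```python
import re
from pathlib import PurePosixPath

# Matches names such as 000_drop_compound_index.py (3-digit prefix, then a name).
SCRIPT_RE = re.compile(r"^(\d{3})_\w+\.py$")


def version_key(version: str) -> tuple:
    """Turn '2023.1.0' into (2023, 1, 0) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def execution_order(paths: list) -> list:
    """Sort script paths by version dir first, then by 3-digit prefix."""
    scripts = []
    for raw in paths:
        path = PurePosixPath(raw)
        match = SCRIPT_RE.match(path.name)
        if match is None:
            continue  # skips README.md and anything not following the pattern
        scripts.append((version_key(path.parent.name), int(match.group(1)), raw))
    return [raw for _, _, raw in sorted(scripts)]


paths = [
    "scripts/db/2023.1.0/README.md",
    "scripts/db/2023.1.0/000_drop_compound_index.py",
    "scripts/db/2023.1.0/001_pipeline_related.py",
    "scripts/db/2022.3.3/README.md",
    "scripts/db/2022.3.3/000_drop_compound_index.py",
    "scripts/db/2022.2.0/README.md",
    "scripts/db/2022.2.0/000_storehouse_to_mongo.py",
]
for script in execution_order(paths):
    print(script)
```

With the flow_manager listing above, this prints the 2022.2.0 script first and the two 2023.1.0 scripts last, without relying on file timestamps.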

Let me know if you have any other suggestions to consider; otherwise we'll go with this one.

@Ktmi
Contributor

Ktmi commented Oct 2, 2023

We should also consider including DB version information somewhere in the DB. It could be as simple as a collection, with entries containing the name of a collection, and the version id of that collection. We can then use that information during migrations to validate the migration.

@viniarck
Member Author

viniarck commented Oct 2, 2023

> We should also consider including DB version information somewhere in the DB. It could be as simple as a collection, with entries containing the name of a collection, and the version id of that collection. We can then use that information during migrations to validate the migration.

That's a good idea, @Ktmi. We could reserve a migrations collection for it where each object would have this structure:

from datetime import datetime

from pydantic import BaseModel

class MigrationDoc(BaseModel):
    napp_id: str
    id: str  # unique, mapped to the underlying Mongo doc _id
    collection: str
    inserted_at: datetime
    updated_at: datetime

Here id will be a uuid.uuid4() str value, which keeps BSON serialization simple, and it is pre-defined when the DB script is created. Finding out whether a migration has been applied is a matter of looking up its _id, and listing which migrations have been applied is a matter of fetching the entire collection sorted by inserted_at.

Another benefit of keeping track of the id is that it helps scripts be idempotent: a script simply queries first whether its _id has already been inserted. If a script needs to be augmented for some unexpected reason, a new script will be generated, so this collection will behave pretty much like an immutable collection when it comes to its objects. In the core we can export a controller managing this collection for NApps to use. Let me know if you have any other suggestions.
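
The idempotency check described above could look roughly like the sketch below. It is not the kytos core controller: `run_once` is a hypothetical helper, `InMemoryCollection` is a hypothetical stand-in that only mimics the `find_one` and `insert_one` methods of a pymongo collection so the example runs without a database, and `MIGRATION_ID` is an example value of the kind a script author would generate once with `uuid.uuid4()`.

```python
from datetime import datetime, timezone


class InMemoryCollection:
    """Hypothetical stand-in for the migrations collection (keyed by _id)."""

    def __init__(self):
        self.docs = {}

    def find_one(self, query):
        return self.docs.get(query["_id"])

    def insert_one(self, doc):
        if doc["_id"] in self.docs:
            raise ValueError("duplicate _id")
        self.docs[doc["_id"]] = dict(doc)


# Example value only; pre-defined once when the DB script is written.
MIGRATION_ID = "3f2b8c1e-9d4a-4c6b-8e2f-1a7b5c9d0e34"


def run_once(migrations, migration_id, napp_id, collection, migrate):
    """Run migrate() only if migration_id has not been recorded yet."""
    if migrations.find_one({"_id": migration_id}) is not None:
        return False  # already applied, nothing to do
    migrate()
    # Record the migration only after it has succeeded.
    now = datetime.now(timezone.utc)
    migrations.insert_one({
        "_id": migration_id,
        "napp_id": napp_id,
        "collection": collection,
        "inserted_at": now,
        "updated_at": now,
    })
    return True


migrations = InMemoryCollection()
calls = []
first = run_once(migrations, MIGRATION_ID, "kytos/flow_manager", "flows",
                 lambda: calls.append("migrated"))
second = run_once(migrations, MIGRATION_ID, "kytos/flow_manager", "flows",
                  lambda: calls.append("migrated"))
print(first, second, len(calls))  # True False 1
```

Running the same script twice leaves a single record and performs the migration once, which is the behavior the _id lookup is meant to guarantee.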

@viniarck viniarck removed the 2023.2 Kytos-ng 2023.2 label Oct 30, 2023
@viniarck
Member Author

viniarck commented Nov 1, 2023

We'll move forward with the proposed approach.

@viniarck
Member Author

viniarck commented May 2, 2024

The idea with the MigrationDoc document in a migrations collection is to make it easy to keep track of which migrations have already been applied. For instance, let's say these migrations have been applied on kytos/flow_manager (maybe we can also add an optional description string field):

rs0 [direct: primary] napps> db.migrations.find()
[
  {
    _id: 'cdfe8cd0-58f7-4942-a6e1-08c2d72c45be',
    napp_id: 'kytos/flow_manager',
    collection: 'flows',
    updated_at: ISODate("2024-05-02T18:09:29.603Z"),
    inserted_at: ISODate("2024-05-02T18:09:29.603Z")
  },
  {
    _id: 'f26dc0f5-207a-4fa8-ac51-4b198e84c1ca',
    napp_id: 'kytos/flow_manager',
    collection: 'flows',
    updated_at: ISODate("2024-05-02T18:10:06.992Z"),
    inserted_at: ISODate("2024-05-02T18:10:06.992Z")
  }
]

So, if a user is trying to execute a new flow_manager DB script from its scripts folder, all the script has to do before performing any updates or writes to the collection is check that its _id hasn't been inserted yet. This means that whenever we create a new DB script, that script should also insert into this collection once the migration has succeeded. On kytos core we could also provide some CRUD ops for this collection. In the DB scripts we can also provide a sort of force option (implemented in the script) for users who want to overwrite anyway; in most cases they won't.
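
One way the force option could be wired into a DB script is sketched below. The `--force` flag name and the `parse_args`, `decide`, and `main` helpers are assumptions for illustration, and a plain set of ids stands in for a lookup against the migrations collection.

```python
import argparse


def parse_args(argv):
    """Parse the (hypothetical) command-line options of a DB script."""
    parser = argparse.ArgumentParser(description="Hypothetical DB script skeleton")
    parser.add_argument(
        "--force",
        action="store_true",
        help="run even if this script's _id is already recorded",
    )
    return parser.parse_args(argv)


def decide(already_applied: bool, force: bool) -> str:
    """Return what the script should do given the recorded state."""
    if not already_applied:
        return "apply"
    return "reapply" if force else "skip"


def main(argv, applied_ids, migration_id):
    # applied_ids stands in for querying the migrations collection by _id.
    args = parse_args(argv)
    return decide(migration_id in applied_ids, args.force)


applied = {"cdfe8cd0-58f7-4942-a6e1-08c2d72c45be"}
print(main([], applied, "cdfe8cd0-58f7-4942-a6e1-08c2d72c45be"))           # skip
print(main(["--force"], applied, "cdfe8cd0-58f7-4942-a6e1-08c2d72c45be"))  # reapply
print(main([], applied, "f26dc0f5-207a-4fa8-ac51-4b198e84c1ca"))           # apply
```

Keeping the decision in a small pure function makes the default (skip when already applied) easy to test, while the override stays an explicit choice by the operator.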
