-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Associate PublishableEntities with Collections [FC-0062] #216
Changes from 6 commits
534e8d9
1fda3e5
a2cd234
7ec9ba5
0e15dd0
00111bc
b723d79
2fe2e70
e03daaa
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
""" | ||
Open edX Learning ("Learning Core"). | ||
""" | ||
__version__ = "0.11.2" | ||
__version__ = "0.11.3" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,19 +4,26 @@ | |
from __future__ import annotations | ||
|
||
from django.db.models import QuerySet | ||
from django.db.transaction import atomic | ||
from django.utils import timezone | ||
|
||
from .models import Collection | ||
from ..publishing import api as publishing_api | ||
from ..publishing.models import PublishableEntity | ||
from .models import Collection, CollectionPublishableEntity | ||
|
||
# The public API that will be re-exported by openedx_learning.apps.authoring.api | ||
# is listed in the __all__ entries below. Internal helper functions that are | ||
# private to this module should start with an underscore. If a function does not | ||
# start with an underscore AND it is not in __all__, that function is considered | ||
# to be callable only by other apps in the authoring package. | ||
__all__ = [ | ||
"add_to_collections", | ||
"create_collection", | ||
"get_collection", | ||
"get_collections", | ||
"get_learning_package_collections", | ||
"get_entity_collections", | ||
"remove_from_collections", | ||
"update_collection", | ||
] | ||
|
||
|
@@ -26,16 +33,28 @@ def create_collection( | |
title: str, | ||
created_by: int | None, | ||
description: str = "", | ||
entities_qset: QuerySet[PublishableEntity] = PublishableEntity.objects.none(), # default to empty set, | ||
) -> Collection: | ||
""" | ||
Create a new Collection | ||
""" | ||
collection = Collection.objects.create( | ||
learning_package_id=learning_package_id, | ||
title=title, | ||
created_by_id=created_by, | ||
description=description, | ||
) | ||
|
||
with atomic(): | ||
collection = Collection.objects.create( | ||
learning_package_id=learning_package_id, | ||
title=title, | ||
created_by_id=created_by, | ||
description=description, | ||
) | ||
|
||
added = add_to_collections( | ||
Collection.objects.filter(id=collection.id), | ||
entities_qset, | ||
created_by, | ||
) | ||
if added: | ||
collection.refresh_from_db() # fetch updated modified date | ||
|
||
return collection | ||
|
||
|
||
|
@@ -70,6 +89,95 @@ def update_collection( | |
return collection | ||
|
||
|
||
def add_to_collections( | ||
collections_qset: QuerySet[Collection], | ||
entities_qset: QuerySet[PublishableEntity], | ||
created_by: int | None = None, | ||
) -> int: | ||
""" | ||
Adds a QuerySet of PublishableEntities to a QuerySet of Collections. | ||
|
||
Records are created in bulk, and so integrity errors are deliberately ignored: they indicate that the entity(s) | ||
have already been added to the collection(s). | ||
|
||
Returns the number of entities added (including any that already exist). | ||
""" | ||
collection_entities = [] | ||
entity_ids = entities_qset.values_list("pk", flat=True) | ||
collection_ids = collections_qset.values_list("pk", flat=True) | ||
|
||
for collection_id in collection_ids: | ||
for entity_id in entity_ids: | ||
collection_entities.append( | ||
CollectionPublishableEntity( | ||
collection_id=collection_id, | ||
entity_id=entity_id, | ||
created_by_id=created_by, | ||
) | ||
) | ||
|
||
created = [] | ||
if collection_entities: | ||
created = CollectionPublishableEntity.objects.bulk_create( | ||
collection_entities, | ||
ignore_conflicts=True, | ||
) | ||
|
||
# Update the modified date for all the provided Collections. | ||
# (Ignoring conflicts means we don't know which ones were modified.) | ||
collections_qset.update(modified=timezone.now()) | ||
pomegranited marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
return len(created) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this a useful number to return? If it includes the conflicts, doesn't it always return There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's true.. What should this method return instead? I had thought that I should filter out any entities that aren't in the same learning package as the collection, so that would change this count. But since there's potentially more than one collection/learning package, I did that filtering in the REST API. Do you think it makes sense to enforce the "a collection may only contain entities in their learning package" here instead? It'll mean fetching more data in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, that's a fantastic callout. We should absolutely enforce it at the Learning Core layer, so people don't shoot themselves in the foot. It's very possible we'll eventually allow folks to add stuff from other LearningPackages into Collections, but we're going to have to be very deliberate when we do and think through the lifecycle issues when those LearningPackages disappear. We definitely don't want to let it happen by accident because other apps used Learning Core directly instead of going through the Libraries REST API. |
||
|
||
|
||
def remove_from_collections( | ||
collections_qset: QuerySet[Collection], | ||
entities_qset: QuerySet[PublishableEntity], | ||
pomegranited marked this conversation as resolved.
Show resolved
Hide resolved
|
||
) -> int: | ||
""" | ||
Removes a QuerySet of PublishableEntities from a QuerySet of Collections. | ||
|
||
PublishableEntities are deleted from each Collection, in bulk. | ||
|
||
Collections which had entities removed are marked with modified=now(). | ||
|
||
Returns the total number of entities deleted. | ||
""" | ||
total_deleted = 0 | ||
entity_ids = entities_qset.values_list("pk", flat=True) | ||
collection_ids = collections_qset.values_list("pk", flat=True) | ||
modified_collection_ids = [] | ||
|
||
for collection_id in collection_ids: | ||
num_deleted, _ = CollectionPublishableEntity.objects.filter( | ||
collection_id=collection_id, | ||
entity__in=entity_ids, | ||
).delete() | ||
|
||
if num_deleted: | ||
modified_collection_ids.append(collection_id) | ||
|
||
total_deleted += num_deleted | ||
|
||
# Update the modified date for the affected Collections | ||
Collection.objects.filter(id__in=modified_collection_ids).update(modified=timezone.now()) | ||
|
||
return total_deleted | ||
|
||
|
||
def get_entity_collections(learning_package_id: int, entity_key: str) -> QuerySet[Collection]: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [speculation (please do not block PR for this)]: It might be nice if we could a custom Manager on the through-model so that if someone has an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh absolutely -- I am interested to see the other use cases for Collections besides what we're supporting here. But will leave that for a day when it's needed, thank you :) |
||
""" | ||
Get all collections in the given learning package which contain this entity. | ||
|
||
Only enabled collections are returned. | ||
""" | ||
entity = publishing_api.get_publishable_entity_by_key( | ||
learning_package_id, | ||
key=entity_key, | ||
) | ||
return entity.collections.filter(enabled=True).order_by("pk") | ||
|
||
|
||
def get_learning_package_collections(learning_package_id: int) -> QuerySet[Collection]: | ||
pomegranited marked this conversation as resolved.
Show resolved
Hide resolved
|
||
""" | ||
Get all collections for a given learning package | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Generated by Django 4.2.15 on 2024-08-29 00:05 | ||
|
||
import django.db.models.deletion | ||
from django.conf import settings | ||
from django.db import migrations, models | ||
|
||
import openedx_learning.lib.validators | ||
|
||
|
||
class Migration(migrations.Migration): | ||
|
||
dependencies = [ | ||
migrations.swappable_dependency(settings.AUTH_USER_MODEL), | ||
('oel_publishing', '0002_alter_learningpackage_key_and_more'), | ||
('oel_collections', '0002_remove_collection_name_collection_created_by_and_more'), | ||
] | ||
|
||
operations = [ | ||
migrations.CreateModel( | ||
name='CollectionPublishableEntity', | ||
fields=[ | ||
('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')), | ||
('created', models.DateTimeField(auto_now_add=True, validators=[openedx_learning.lib.validators.validate_utc_datetime])), | ||
('collection', models.ForeignKey(on_delete=django.db.models.deletion.CASCADE, to='oel_collections.collection')), | ||
('created_by', models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.SET_NULL, to=settings.AUTH_USER_MODEL)), | ||
('entity', models.ForeignKey(on_delete=django.db.models.deletion.RESTRICT, to='oel_publishing.publishableentity')), | ||
], | ||
), | ||
migrations.AddField( | ||
model_name='collection', | ||
name='entities', | ||
field=models.ManyToManyField(related_name='collections', through='oel_collections.CollectionPublishableEntity', to='oel_publishing.publishableentity'), | ||
), | ||
migrations.AddConstraint( | ||
model_name='collectionpublishableentity', | ||
constraint=models.UniqueConstraint(fields=('collection', 'entity'), name='oel_collections_cpe_uniq_col_ent'), | ||
), | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,68 @@ | ||
""" | ||
Core models for Collections | ||
|
||
TLDR Guidelines: | ||
1. DO NOT modify these models to store full version snapshots. | ||
2. DO NOT use these models to try to reconstruct historical versions of | ||
Collections for fast querying. | ||
|
||
If you're trying to do either of these things, you probably want a new model or | ||
app. For more details, read on. | ||
Comment on lines
+9
to
+10
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This adds a lot of context!! Thank you! ❤️ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh definitely.. @ormsbee 's notes are always worth preserving :) |
||
|
||
The goal of these models is to provide a lightweight method of organizing | ||
PublishableEntities. The first use case for this is modeling the structure of a | ||
v1 Content Library within a LearningPackage. This is what we'll use the | ||
Collection model for. | ||
|
||
An important thing to note here is that Collections are *NOT* publishable | ||
entities themselves. They have no "Draft" or "Published" versions. Collections | ||
are never "published", though the things inside of them are. | ||
|
||
When a LibraryContentBlock makes use of a Content Library, it copies all of | ||
the items it will use into the Course itself. It will also store a version | ||
on the LibraryContentBlock -- this is a MongoDB ObjectID in v1 and an integer in | ||
v2 Libraries. Later on, the LibraryContentBlock will want to check back to see | ||
if any updates have been made, using its version as a key. If a new version | ||
exists, the course team has the option of re-copying data from the Library. | ||
|
||
ModuleStore based v1 Libraries and Blockstore-based v2 libraries both version | ||
the entire library in a series of snapshots. This makes it difficult to have | ||
very large libraries, which is an explicit goal for Modular Learning. In | ||
Learning Core, we've moved to tracking the versions of individual Components to | ||
address this issue. But that means we no longer have a single version indicator | ||
for "has anything here changed"? | ||
|
||
We *could* have put that version in the ``publishing`` app's PublishLog, but | ||
that would make it too broad. We want the ability to eventually collapse many v1 | ||
Libraries into a single Learning Core backed v2 Library. If we tracked the | ||
versioning in only a central location, then we'd have many false positives where | ||
the version was bumped because something else in the Learning Package changed. | ||
So instead, we're creating a new Collection model inside the LearningPackage to | ||
track that concept. | ||
|
||
A critical takeaway is that we don't have to store snapshots of every version of | ||
a Collection, because that data has been copied over by the LibraryContentBlock. | ||
We only need to store the current state of the Collection, and increment the | ||
version numbers when changes happen. This will allow the LibraryContentBlock to | ||
check in and re-copy over the latest version if the course team desires. | ||
|
||
That's why these models only store the current state of a Collection. Unlike the | ||
``components`` app, ``collections`` does not store fully materialized snapshots | ||
of past versions. This is done intentionally in order to save space and reduce | ||
the cost of writes. Collections may grow to be very large, and we don't want to | ||
be writing N rows with every version, where N is the number of | ||
PublishableEntities in a Collection. | ||
|
||
MVP of these models does not store changesets, but we can add this when there's a | ||
use case for it. The number of rows in these changesets would grow in proportion | ||
to the number of things that are actually changing (instead of copying over | ||
everything on every version). This is could be used to make it easier to figure out | ||
what changed between two given versions of a Collection. A LibraryContentBlock | ||
in a course would have stored the version number of the last time it copied data | ||
from the Collection, and we can eventually surface this data to the user. Note that | ||
while it may be possible to reconstruct past versions of Collections based off of | ||
this changeset data, it's going to be a very slow process to do so, and it is | ||
strongly discouraged. | ||
""" | ||
from __future__ import annotations | ||
|
||
|
@@ -9,10 +72,11 @@ | |
|
||
from ....lib.fields import MultiCollationTextField, case_insensitive_char_field | ||
from ....lib.validators import validate_utc_datetime | ||
from ..publishing.models import LearningPackage | ||
from ..publishing.models import LearningPackage, PublishableEntity | ||
|
||
__all__ = [ | ||
"Collection", | ||
"CollectionPublishableEntity", | ||
] | ||
|
||
|
||
|
@@ -79,6 +143,12 @@ class Collection(models.Model): | |
], | ||
) | ||
|
||
entities: models.ManyToManyField[PublishableEntity, "CollectionPublishableEntity"] = models.ManyToManyField( | ||
PublishableEntity, | ||
through="CollectionPublishableEntity", | ||
related_name="collections", | ||
) | ||
|
||
class Meta: | ||
verbose_name_plural = "Collections" | ||
indexes = [ | ||
|
@@ -96,3 +166,42 @@ def __str__(self) -> str: | |
User-facing string representation of a Collection. | ||
""" | ||
return f"<{self.__class__.__name__}> ({self.id}:{self.title})" | ||
|
||
|
||
class CollectionPublishableEntity(models.Model): | ||
""" | ||
Collection -> PublishableEntity association. | ||
""" | ||
collection = models.ForeignKey( | ||
Collection, | ||
on_delete=models.CASCADE, | ||
) | ||
entity = models.ForeignKey( | ||
PublishableEntity, | ||
on_delete=models.RESTRICT, | ||
) | ||
created_by = models.ForeignKey( | ||
settings.AUTH_USER_MODEL, | ||
on_delete=models.SET_NULL, | ||
null=True, | ||
blank=True, | ||
) | ||
created = models.DateTimeField( | ||
auto_now_add=True, | ||
validators=[ | ||
validate_utc_datetime, | ||
], | ||
) | ||
|
||
pomegranited marked this conversation as resolved.
Show resolved
Hide resolved
pomegranited marked this conversation as resolved.
Show resolved
Hide resolved
|
||
class Meta: | ||
constraints = [ | ||
# Prevent race conditions from making multiple rows associating the | ||
# same Collection to the same Entity. | ||
models.UniqueConstraint( | ||
fields=[ | ||
"collection", | ||
"entity", | ||
], | ||
name="oel_collections_cpe_uniq_col_ent", | ||
) | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[question] Is it a common use case that we'll want to add to multiple Collections simultaneously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't think so, but that's what was in the ticket: openedx/modular-learning#227
@bradenmacdonald can you comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was just trying to indicate that the API needs to support having an item linked to multiple collections, but it doesn't necessarily have to support linking multiple collections to an item in a single API call. If it simplifies the implementation to only operate by adding/removing a single collection at a time, that's fine with me.
There is only one use case I can think of where we might want to associate/dissociate from multiple collections simultaneously and that's import/export. But we haven't planned that part out yet at all.