feat(tags): handle case-insensitive tags and remove orphans #1937

laurent-laporte-pro · 2024-02-16T17:30:49Z

What's new

Change the search and update methods in StudyMetadataRepository class to handle case-insensitive tags and remove orphans.
Update the Alembic migration script to handle case-insensitive tags.
Change the filtering in the front-end to handle case-insensitive tags.

alembic/versions/dae93f1d9110_populate_tag_and_study_tag_tables_with_.py

antarest/study/repository.py

hdinia · 2024-02-19T15:52:49Z

antarest/study/repository.py

    def update_tags(self, study: Study, new_tags: t.Sequence[str]) -> None:
        """
        Updates the tags associated with a given study in the database,
-        replacing existing tags with new ones.
+        replacing existing tags with new ones (case-insensitive).

        Args:
            study: The pre-existing study to be updated with the new tags.
            new_tags: The new tags to be associated with the input study in the database.
        """
-        existing_tags = self.session.query(Tag).filter(Tag.label.in_(new_tags)).all()
-        new_labels = set(new_tags) - set([tag.label for tag in existing_tags])
-        study.tags = [Tag(label=tag) for tag in new_labels] + existing_tags
+        new_upper_tags = {tag.upper(): tag for tag in new_tags}
+        existing_tags = self.session.query(Tag).filter(func.upper(Tag.label).in_(new_upper_tags)).all()
+        for tag in existing_tags:
+            if tag.label.upper() in new_upper_tags:
+                new_upper_tags.pop(tag.label.upper())
+        study.tags = [Tag(label=tag) for tag in new_upper_tags.values()] + existing_tags
        self.session.merge(study)
        self.session.commit()
+        # Delete any tag that is not associated with any study.
+        # Note: If tags are to be associated with objects other than Study, this code must be updated.
+        self.session.query(Tag).filter(~Tag.studies.any()).delete(synchronize_session=False)  # type: ignore
+        self.session.commit()
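
To make the behavior of this change easier to follow outside the ORM, here is a minimal sketch using the standard-library `sqlite3` module. The two-table schema (`tag`, `study_tag`) and the helpers `make_db`, `tags_of` and `all_tags` are assumptions made for illustration, not the project's actual models. It mirrors the two steps in the diff: reuse an existing tag when the labels match case-insensitively, then delete tags no study references.

```python
import sqlite3
import typing as t


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tag (label TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE study_tag (study_id TEXT NOT NULL, tag_label TEXT NOT NULL)")
    return conn


def update_tags(conn: sqlite3.Connection, study_id: str, new_tags: t.Sequence[str]) -> None:
    """Replace a study's tags, matching existing labels case-insensitively."""
    # Upper-cased label -> label as typed (the last spelling wins on collisions).
    wanted = {tag.upper(): tag for tag in new_tags}
    conn.execute("DELETE FROM study_tag WHERE study_id = ?", (study_id,))
    for upper, label in wanted.items():
        # Note: SQLite's UPPER() only folds ASCII letters by default.
        row = conn.execute("SELECT label FROM tag WHERE UPPER(label) = ?", (upper,)).fetchone()
        if row is None:
            conn.execute("INSERT INTO tag (label) VALUES (?)", (label,))
            stored = label
        else:
            stored = row[0]  # reuse the spelling already stored
        conn.execute("INSERT INTO study_tag (study_id, tag_label) VALUES (?, ?)", (study_id, stored))
    # Delete any tag that is no longer associated with a study.
    conn.execute("DELETE FROM tag WHERE label NOT IN (SELECT tag_label FROM study_tag)")
    conn.commit()


def tags_of(conn: sqlite3.Connection, study_id: str) -> t.List[str]:
    """Return the labels attached to one study, sorted."""
    rows = conn.execute(
        "SELECT tag_label FROM study_tag WHERE study_id = ? ORDER BY tag_label", (study_id,)
    )
    return [row[0] for row in rows]


def all_tags(conn: sqlite3.Connection) -> t.List[str]:
    """Return every label in the tag table, sorted."""
    return [row[0] for row in conn.execute("SELECT label FROM tag ORDER BY label")]
```

In this sketch, tagging a second study with "NODEJS" when "NodeJs" already exists attaches the stored "NodeJs" rather than creating a new row, and a tag dropped by its last study disappears from the `tag` table.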


This normalization logic does not account for whitespace, which can lead to unintended duplicates. For example:
" Test", "Test", "Test " => all three are accepted and returned as => " TEST", "Test", "test "

Also, inputs are mutated, which is bad practice for UX: if a user enters a tag as "NodeJs" and expects its case to be preserved, they may be surprised or confused to see it displayed as "NODEJS" in the application.

The same issue arises with special characters: if a user enters "&Test", you end up with another duplicate. If we don't consider this a duplicate, then that is fine.

Please add whitespace trimming and special character normalization to the tag processing logic.
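
One way to read this request, sketched under assumptions: collapse whitespace, compare labels case-insensitively, and keep the first spelling seen so the user's casing is not rewritten. The function name `dedupe_tags` is hypothetical, and `str.casefold()` is used here instead of the `upper()` call in the diff. Special characters are left untouched, so "&Test" stays distinct from "Test".

```python
import typing as t


def dedupe_tags(tags: t.Iterable[str]) -> t.List[str]:
    """Trim and collapse whitespace, then drop case-insensitive duplicates.

    The first spelling encountered is kept as-is, so "NodeJs" is never
    turned into "NODEJS".
    """
    seen: t.Dict[str, str] = {}
    for raw in tags:
        # split() with no argument drops leading/trailing whitespace and
        # collapses internal runs, so " my   tag " becomes "my tag".
        label = " ".join(raw.split())
        if not label:
            continue  # ignore tags that are empty after trimming
        seen.setdefault(label.casefold(), label)
    return list(seen.values())
```

With this sketch, `dedupe_tags([" Test", "Test", "test "])` yields a single tag, "Test".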

I changed the PUT /v1/studies/{uuid} API endpoint to normalize whitespace in tags. I prefer to do this normalization upstream, at the endpoint level, rather than downstream, when the tags are saved to the database. In particular, this makes it possible to check tag length and return a 422 error when the length is invalid.

I don't understand what you mean by "Also inputs are mutated". In fact, in the web interface, tag deduplication should be improved by making it case-insensitive and normalizing whitespace. Currently, deduplication is case-sensitive only and does not normalize whitespace. That is something I don't know how to do in React. If you can take care of it, you are welcome to.

However, I think we should not overdo it with tags, and allowing special characters is not a problem.
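
The endpoint-level approach described in this reply could look roughly like the following. This is a framework-free sketch, not the code in the PR: `MAX_TAG_LENGTH`, `TagValidationError` and `validate_tags` are hypothetical names, and the length limit is an arbitrary placeholder. In a web framework, the raised error would be translated into a 422 response.

```python
import typing as t

MAX_TAG_LENGTH = 40  # hypothetical limit, chosen only for this example


class TagValidationError(ValueError):
    """Raised when a tag is rejected; an endpoint would map this to HTTP 422."""


def validate_tags(tags: t.Sequence[str]) -> t.List[str]:
    """Normalize whitespace in each tag and enforce a length limit."""
    cleaned: t.List[str] = []
    for raw in tags:
        label = " ".join(raw.split())  # trim and collapse whitespace
        if not 1 <= len(label) <= MAX_TAG_LENGTH:
            raise TagValidationError(
                f"Tag {raw!r} must be between 1 and {MAX_TAG_LENGTH} characters long"
            )
        cleaned.append(label)
    return cleaned
```

Validating before the repository is called means `update_tags` only ever sees clean labels, and the client gets an explicit error instead of a silently altered tag.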

    def test_tag_capitalization_on_update(self, client: TestClient, user_access_token: str, study_id: str) -> None:
        """
        Test to handle specific edge cases like "test" mutating to "Test" or "node" to "Node".
        """
        # Test cases for specific mutations
        test_cases = [("test", "Test"), ("node", "Node")]
        for original_tag, expected_mutation in test_cases:
            res = client.put(
                f"/v1/studies/{study_id}",
                headers={"Authorization": f"Bearer {user_access_token}"},
                json={"tags": [original_tag]},
            )
            assert res.status_code == 200, res.json()
            actual = res.json()
            assert set(actual["tags"]) == {
                expected_mutation
            }, f"Expected {expected_mutation} but got {set(actual['tags'])}"

This test, which reproduces the anomaly, should make things clearer. It is an edge case and does not break anything (it is just odd from a UX standpoint), but at least you are aware that it exists.

Note: there are also cases where the API returns upper case although the input was in mixed case.

Edit: we debugged this with @laurent-laporte-pro. The behavior is somewhat random and cannot be reproduced in unit tests, so it is probably due to the development environment. Testing on the integration server is fine: the issue does not reproduce there.

hdinia

Here's a little feedback in the hope that it will help:

User Input Mutation: Mutating user inputs can lead to unexpected UX issues, including perceived bugs and user confusion.
Incomplete Normalization: The proposed normalization does not adequately address edge cases such as whitespace.

Recommendations:

Implement Validation: Instead of normalizing inputs, validate them for duplicates and format issues directly at the point of entry. We may need to resolve existing duplicates as well. In this case, normalization can help, or a database migration could be implemented to remove all duplicates.
This validation will occur both on the frontend (user input checks and instant feedback, as shown in the screenshot below) and on the backend (checking against database values).

If needed, I can commit the front-end part; let me know.

Example: (screenshot not included)

Preserve User Input Integrity: This approach maintains user input avoiding UX issues and bug-like behavior.
Adopting a validation-centric strategy would significantly improve user experience by providing clear, immediate feedback (error messages) and avoiding the pitfalls of input mutation.

Could you provide further clarification on the rationale behind the current approach? I apologize if I've misunderstood any aspects of the implementation.
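
The validation-centric strategy recommended in this comment can be sketched as a check that reports a problem instead of rewriting the input. The sketch is in Python for consistency with the back-end code; the front-end check would follow the same logic. The function name `check_new_tag` and the message texts are hypothetical.

```python
import typing as t


def check_new_tag(existing: t.Iterable[str], candidate: str) -> t.Optional[str]:
    """Return an error message if the candidate tag is rejected, else None.

    The candidate is never modified: the caller either stores it exactly
    as typed or shows the message to the user.
    """
    if candidate != candidate.strip():
        return "Tag must not start or end with whitespace."
    if not candidate:
        return "Tag must not be empty."
    for label in existing:
        # Case-insensitive comparison against the tags already present.
        if label.casefold() == candidate.casefold():
            return f"Tag already exists as {label!r}."
    return None
```

For example, with "NodeJs" already present, entering "nodejs" produces an "already exists" message, while "Python" is accepted unchanged.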

laurent-laporte-pro added front-end back-end labels Feb 16, 2024

laurent-laporte-pro added this to the v2.16.5 milestone Feb 16, 2024

laurent-laporte-pro requested review from skamril and mabw-rte February 16, 2024 17:30

laurent-laporte-pro self-assigned this Feb 16, 2024

pull-request-size bot added the size/L label Feb 16, 2024

laurent-laporte-pro force-pushed the bugfix/handle-case-insensitive-tags branch from 3052565 to e01c8de Compare February 16, 2024 17:41

skamril previously approved these changes Feb 19, 2024

mabw-rte requested changes Feb 19, 2024

alembic/versions/dae93f1d9110_populate_tag_and_study_tag_tables_with_.py (resolved)

antarest/study/repository.py (outdated, resolved)

laurent-laporte-pro dismissed skamril’s stale review via 54d6c24 February 19, 2024 13:00

laurent-laporte-pro force-pushed the bugfix/handle-case-insensitive-tags branch from e01c8de to 54d6c24 Compare February 19, 2024 13:00

hdinia requested changes Feb 19, 2024

mabw-rte previously approved these changes Feb 19, 2024

laurent-laporte-pro dismissed mabw-rte’s stale review via b7c063a February 19, 2024 19:55

laurent-laporte-pro added 4 commits February 22, 2024 09:49

feat(tags): handle case-insensitive tags

07e2d90

- Change the search and update methods in `StudyMetadataRepository` class to handle case-insensitive tags. - Update the Alembic migration script to handle case-insensitive tags.

feat(tags): remove orphan tags on update

a87af8f

feat(ui-tags): handle case-insensitive tags in filtering

bdc29cd

feat(api-tags): normalize whitespaces around tags and check string length

ab7502f

laurent-laporte-pro force-pushed the bugfix/handle-case-insensitive-tags branch from b7c063a to ab7502f Compare February 22, 2024 08:49

feat(study-search): improve access to session in update_tags

dc252a1

hdinia approved these changes Feb 23, 2024

laurent-laporte-pro merged commit 3396f2e into dev Feb 23, 2024
6 of 7 checks passed

laurent-laporte-pro deleted the bugfix/handle-case-insensitive-tags branch February 23, 2024 07:54

laurent-laporte-pro mentioned this pull request Feb 23, 2024

fix(tags): resolve issue with study.additional_data.patch attribute reading #1944

Merged

3 tasks

laurent-laporte-pro modified the milestones: v2.16.5, v2.16.6, v2.17 Feb 23, 2024
