Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tags): handle case-insensitive tags and remove orphans #1937

Merged
merged 5 commits into from
Feb 23, 2024

Conversation

laurent-laporte-pro
Copy link
Contributor

What's new

  • Change the search and update methods in StudyMetadataRepository class to handle case-insensitive tags and remove orphans.
  • Update the Alembic migration script to handle case-insensitive tags.
  • Change the filtering in the font-end to handle case-insensitive tags.

skamril
skamril previously approved these changes Feb 19, 2024
Comment on lines 280 to 300
def update_tags(self, study: Study, new_tags: t.Sequence[str]) -> None:
"""
Updates the tags associated with a given study in the database,
replacing existing tags with new ones.
replacing existing tags with new ones (case-insensitive).

Args:
study: The pre-existing study to be updated with the new tags.
new_tags: The new tags to be associated with the input study in the database.
"""
existing_tags = self.session.query(Tag).filter(Tag.label.in_(new_tags)).all()
new_labels = set(new_tags) - set([tag.label for tag in existing_tags])
study.tags = [Tag(label=tag) for tag in new_labels] + existing_tags
new_upper_tags = {tag.upper(): tag for tag in new_tags}
existing_tags = self.session.query(Tag).filter(func.upper(Tag.label).in_(new_upper_tags)).all()
for tag in existing_tags:
if tag.label.upper() in new_upper_tags:
new_upper_tags.pop(tag.label.upper())
study.tags = [Tag(label=tag) for tag in new_upper_tags.values()] + existing_tags
self.session.merge(study)
self.session.commit()
# Delete any tag that is not associated with any study.
# Note: If tags are to be associated with objects other than Study, this code must be updated.
self.session.query(Tag).filter(~Tag.studies.any()).delete(synchronize_session=False) # type: ignore
self.session.commit()
Copy link
Member

@hdinia hdinia Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not account for whitespaces in this normalization logic, which can lead to unintended duplicates. For example:
" Test","Test","Test " => will all pass and return => " TEST","Test","test "

Also inputs are mutated , and this is a bad practice regarding UX. if a user enters a tag as "NodeJs" expecting case sensitivity to be preserved, they might be surprised or confused to see it displayed as "NODEJS" in the application.

The same issue arises with special chars, if a user inputs "&Test", you will end up with another duplicate, if we dont consider this a duplicate then its ok.

Please add whitespace trimming and special character normalization to the tag processing logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

J'ai modifié le point d'accès de l'API PUT /v1/studies/{uuid} afin de normaliser les espaces des tags. Je préfère effectuer cette normalisation en amont, au niveau du endpoint, plutôt qu'en aval, au moment de l'enregistrement des tags dans la base de données. Cela permet notamment de vérifier la longueur des tags et de renvoyer une erreur 422 en cas de longueur incorrecte.

Je ne comprends pas ce que tu veux dire par "Also inputs are mutated". En fait, dans l'interface web, il serait nécessaire d'améliorer la déduplication des tags en les rendant insensibles à la casse et en normalisant les espaces. Actuellement, la déduplication fonctionne uniquement de manière sensible à la casse et sans normalisation des espaces. C'est quelque chose que je ne sais pas faire en React. Si tu peux t'en charger, tu es le bienvenu.

Par contre, je pense qu'il ne faut pas trop en faire au sujet des tags et que permettre les caractères spéciaux n'est pas pénalisant.

Copy link
Member

@hdinia hdinia Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    def test_tag_capitalization_on_update(self, client: TestClient, user_access_token: str,
                                             study_id: str) -> None:
        """
        Test to handle specific edge cases like "test" mutating to "Test" or "node" to "Node".
        """

        # Test cases for specific mutations
        test_cases = [("test", "Test"), ("node", "Node")]
        for original_tag, expected_mutation in test_cases:
            res = client.put(
                f"/v1/studies/{study_id}",
                headers={"Authorization": f"Bearer {user_access_token}"},
                json={"tags": [original_tag]},
            )
            assert res.status_code == 200, res.json()
            actual = res.json()
            assert set(actual["tags"]) == {
                expected_mutation}, f"Expected {expected_mutation} but got {set(actual['tags'])}"

Tu comprendras mieux avec ce test qui reproduit l'anomalie. C'est un Edge case et cela ne break rien (juste bizarre niveau UX) mais au moins vous êtes au courant que c'est la.

note: il y a aussi des cas ou l'API renvoie du UPPER alors que l'input etais en case "mixte"

Edit: on a debeug avec @laurent-laporte-pro, c'est un cas un peu aléatoire on ne reproduit pas en TU donc surement dû à l'environnement de dev, tester sur le serveur d'intégration c'est OK on ne reproduit pas.

Copy link
Member

@hdinia hdinia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a little feedback in the hope that it will help:

  • User Input Mutation: Mutating user inputs can lead to unexpected UX issues, including perceived bugs and user confusion.

  • Incomplete Normalization: The proposed normalization solution does not adequately address edge cases (whitespace)

Recommendations:

  • Implement Validation: Instead of normalizing inputs, validate them for duplicates and format issues directly at the point of entry. We may need to resolve existing duplicates as well. In this case, normalization can help, or a database migration could be implemented to remove all duplicates.
  • This validation will occur both on the frontend (user input checks and instant feedback, as shown in the screenshot below) and on the backend (checking against database values).

if needed I can commit the front-end part, let me know

Example:
image

  • Preserve User Input Integrity: This approach maintains user input avoiding UX issues and bug-like behavior.

  • Adopting a validation-centric strategy would significantly improve user experience by providing clear, immediate feedback (error messages) and avoiding the pitfalls of input mutation.

Could you provide further clarification on the rationale behind the current approach? I apologize if I've misunderstood any aspects of the implementation.

mabw-rte
mabw-rte previously approved these changes Feb 19, 2024
- Change the search and update methods in `StudyMetadataRepository` class to handle case-insensitive tags.
- Update the Alembic migration script to handle case-insensitive tags.
@laurent-laporte-pro laurent-laporte-pro merged commit 3396f2e into dev Feb 23, 2024
6 of 7 checks passed
@laurent-laporte-pro laurent-laporte-pro deleted the bugfix/handle-case-insensitive-tags branch February 23, 2024 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants