-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(tags): handle case-insensitive tags and remove orphans #1937
Conversation
3052565
to
e01c8de
Compare
alembic/versions/dae93f1d9110_populate_tag_and_study_tag_tables_with_.py
Show resolved
Hide resolved
e01c8de
to
54d6c24
Compare
antarest/study/repository.py
Outdated
def update_tags(self, study: Study, new_tags: t.Sequence[str]) -> None: | ||
""" | ||
Updates the tags associated with a given study in the database, | ||
replacing existing tags with new ones. | ||
replacing existing tags with new ones (case-insensitive). | ||
|
||
Args: | ||
study: The pre-existing study to be updated with the new tags. | ||
new_tags: The new tags to be associated with the input study in the database. | ||
""" | ||
existing_tags = self.session.query(Tag).filter(Tag.label.in_(new_tags)).all() | ||
new_labels = set(new_tags) - set([tag.label for tag in existing_tags]) | ||
study.tags = [Tag(label=tag) for tag in new_labels] + existing_tags | ||
new_upper_tags = {tag.upper(): tag for tag in new_tags} | ||
existing_tags = self.session.query(Tag).filter(func.upper(Tag.label).in_(new_upper_tags)).all() | ||
for tag in existing_tags: | ||
if tag.label.upper() in new_upper_tags: | ||
new_upper_tags.pop(tag.label.upper()) | ||
study.tags = [Tag(label=tag) for tag in new_upper_tags.values()] + existing_tags | ||
self.session.merge(study) | ||
self.session.commit() | ||
# Delete any tag that is not associated with any study. | ||
# Note: If tags are to be associated with objects other than Study, this code must be updated. | ||
self.session.query(Tag).filter(~Tag.studies.any()).delete(synchronize_session=False) # type: ignore | ||
self.session.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not account for whitespaces in this normalization logic, which can lead to unintended duplicates. For example:
" Test","Test","Test "
=> will all pass and return => " TEST","Test","test "
Also inputs are mutated , and this is a bad practice regarding UX. if a user enters a tag as "NodeJs" expecting case sensitivity to be preserved, they might be surprised or confused to see it displayed as "NODEJS" in the application.
The same issue arises with special chars, if a user inputs "&Test", you will end up with another duplicate, if we dont consider this a duplicate then its ok.
Please add whitespace trimming and special character normalization to the tag processing logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
J'ai modifié le point d'accès de l'API PUT /v1/studies/{uuid}
afin de normaliser les espaces des tags. Je préfère effectuer cette normalisation en amont, au niveau du endpoint, plutôt qu'en aval, au moment de l'enregistrement des tags dans la base de données. Cela permet notamment de vérifier la longueur des tags et de renvoyer une erreur 422 en cas de longueur incorrecte.
Je ne comprends pas ce que tu veux dire par "Also inputs are mutated". En fait, dans l'interface web, il serait nécessaire d'améliorer la déduplication des tags en les rendant insensibles à la casse et en normalisant les espaces. Actuellement, la déduplication fonctionne uniquement de manière sensible à la casse et sans normalisation des espaces. C'est quelque chose que je ne sais pas faire en React. Si tu peux t'en charger, tu es le bienvenu.
Par contre, je pense qu'il ne faut pas trop en faire au sujet des tags et que permettre les caractères spéciaux n'est pas pénalisant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_tag_capitalization_on_update(self, client: TestClient, user_access_token: str,
study_id: str) -> None:
"""
Test to handle specific edge cases like "test" mutating to "Test" or "node" to "Node".
"""
# Test cases for specific mutations
test_cases = [("test", "Test"), ("node", "Node")]
for original_tag, expected_mutation in test_cases:
res = client.put(
f"/v1/studies/{study_id}",
headers={"Authorization": f"Bearer {user_access_token}"},
json={"tags": [original_tag]},
)
assert res.status_code == 200, res.json()
actual = res.json()
assert set(actual["tags"]) == {
expected_mutation}, f"Expected {expected_mutation} but got {set(actual['tags'])}"
Tu comprendras mieux avec ce test qui reproduit l'anomalie. C'est un Edge case et cela ne break rien (juste bizarre niveau UX) mais au moins vous êtes au courant que c'est la.
note: il y a aussi des cas ou l'API renvoie du UPPER alors que l'input etais en case "mixte"
Edit: on a debeug avec @laurent-laporte-pro, c'est un cas un peu aléatoire on ne reproduit pas en TU donc surement dû à l'environnement de dev, tester sur le serveur d'intégration c'est OK on ne reproduit pas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a little feedback in the hope that it will help:
-
User Input Mutation: Mutating user inputs can lead to unexpected UX issues, including perceived bugs and user confusion.
-
Incomplete Normalization: The proposed normalization solution does not adequately address edge cases (whitespace)
Recommendations:
- Implement Validation: Instead of normalizing inputs, validate them for duplicates and format issues directly at the point of entry. We may need to resolve existing duplicates as well. In this case, normalization can help, or a database migration could be implemented to remove all duplicates.
- This validation will occur both on the frontend (user input checks and instant feedback, as shown in the screenshot below) and on the backend (checking against database values).
if needed I can commit the front-end part, let me know
-
Preserve User Input Integrity: This approach maintains user input avoiding UX issues and bug-like behavior.
-
Adopting a validation-centric strategy would significantly improve user experience by providing clear, immediate feedback (error messages) and avoiding the pitfalls of input mutation.
Could you provide further clarification on the rationale behind the current approach? I apologize if I've misunderstood any aspects of the implementation.
- Change the search and update methods in `StudyMetadataRepository` class to handle case-insensitive tags. - Update the Alembic migration script to handle case-insensitive tags.
b7c063a
to
ab7502f
Compare
What's new
StudyMetadataRepository
class to handle case-insensitive tags and remove orphans.