Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make dataset alias comparisons case-insensitive #537

Merged

Conversation

rmn-boiko
Copy link
Contributor

@rmn-boiko rmn-boiko commented Mar 5, 2024

Description

Closes: Make dataset names and aliases comparisons case-insensitive

Checklist before requesting a review

  • CHANGELOG.md updated
  • API changes are backwards-compatible
  • Workspace layout changes include a migration
  • Documentation update PR: <link or N/A>
  • Dataset pipelines update scheduled if needed

@rmn-boiko rmn-boiko self-assigned this Mar 5, 2024
@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch from 984c4a0 to dd03903 Compare March 5, 2024 16:16
@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch 2 times, most recently from 0ab958b to 4f0a219 Compare March 5, 2024 17:42
@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch from 4f0a219 to 24ff5c4 Compare March 5, 2024 17:54
@rmn-boiko rmn-boiko marked this pull request as ready for review March 5, 2024 18:14
Copy link
Contributor

@zaychenko-sergei zaychenko-sergei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I can see that you've touched the casing of accounts as well. On today account names are case-sensitive in all implementations of AuthenticationProvider trait. These would need to take into account that "Wasya" and "wasya" are the same accounts. Before you start digging into this direction, plz make a quick check if GitHub accounts are case-sensitive or not. It seems they are case-insensitive, but double check.

  2. I think what is done in the repositories implementation is not enough. You need to make methods like resolve_dataset_ref to be case-insensitive as well. Note that in multi-tenant repos account names are used in datasets folder names, and in single-tenant local repo the dataset name is used as name of folder. Basically, the entire infrastructure of dataset repositories should be lowercasing all objects and references. This, btw, would fix some potential bugs on Windows platform, where file names are case-insensitive.

  3. I'd like to see more tests. Repository changes for sure. Might be interesting to ensure our HTTP and GraphQL APIs are properly aligned with casing - in HTTP case you need to be careful with URL matching middlewares.

Copy link
Member

@s373r s373r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a minor comment added

@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch 2 times, most recently from 7c158aa to bce4a15 Compare March 7, 2024 19:44
@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch from bce4a15 to c4788ba Compare March 7, 2024 20:03
@zaychenko-sergei
Copy link
Contributor

Summary on the decisive discussion:

  • a note on case sensitivity of name components was added explicitly to ODF protocol
  • we should preserve original casing of the name components everywhere, but make case-insensitive lookups and case-insensitive name uniqueness checks, as well as ignore-case patterns
  • note that RepoName component is also important, this wasn't mentioned before, as we've only touched DatasetName and AccountName.

@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch from 52c5e93 to 9594777 Compare March 27, 2024 18:14
@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch from d2160c0 to 37dbbd2 Compare March 27, 2024 19:18
@rmn-boiko rmn-boiko requested a review from sergiimk March 27, 2024 21:39
@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch from 6162ef5 to b657167 Compare March 31, 2024 17:22
Copy link
Member

@sergiimk sergiimk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things mostly look good, but I have doubts in dataset_repository_local_fs.rs. I suggested one extra test to see if my worries are correct.

Sorry you had to deal with all the ugly complexity of the local fs repo - I really hope it will soon go away after we stop storing datasets in account/name directories.

@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch 6 times, most recently from 601690b to 60267d6 Compare April 2, 2024 19:54
@rmn-boiko rmn-boiko force-pushed the feature/469-alias-case-insensitive-comparisons branch from 60267d6 to 7a2eb00 Compare April 2, 2024 20:16
@rmn-boiko rmn-boiko requested a review from sergiimk April 2, 2024 20:27
@zaychenko-sergei
Copy link
Contributor

Problem:

image

I would expect that filter works without case sensitivity

@zaychenko-sergei
Copy link
Contributor

A similar problem
image

@zaychenko-sergei zaychenko-sergei merged commit 5e0ccd9 into master Apr 3, 2024
6 checks passed
@zaychenko-sergei zaychenko-sergei deleted the feature/469-alias-case-insensitive-comparisons branch April 3, 2024 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make dataset names and aliases comparisons case-insensitive
4 participants