This issue was moved to a discussion.
Fuzzy searching #1270
Comments
I'll add the little bit that I know to try to explain why. The git2 crate's text search vector (minus the README) looks like:
You'll notice that the values for

I like the idea of a hybrid approach, but I'd be curious how it would affect query speed. It would be done by ordering on a function of the ts_rank_cd result and the other signals (all-time downloads, etc.). I'm curious whether Diesel can do this. Alternatively, in this case, #1266 would have included the title in the ranking, since all trigrams of the search were in the package title. That could still miss relevant results, though, so it's not a catch-all. As another alternative, you could search by keyword
Yes, it can.
I'm happy to experiment. Can you give me some specific queries you'd like tried?
It is: https://crates.io/keywords/git. I'm definitely open to suggestions for better exposing that.

Probably related: a search for "ssh" only returns the probably most mature
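The trigram idea mentioned above for #1266 can be sketched in Rust as follows. This is a minimal sketch that assumes behavior like PostgreSQL's pg_trgm extension (lowercase the text, split on non-alphanumeric characters, pad each word with two leading spaces and one trailing space, then take every 3-character window); whether #1266 works exactly this way is an assumption. The functions `trigrams`, `similarity`, and `title_covers_query` are hypothetical helpers, not crates.io code.

```rust
use std::collections::HashSet;

/// Approximates pg_trgm-style trigram extraction: lowercase, split on
/// non-alphanumeric characters, pad each word as "  word ", then collect
/// every 3-character window.
fn trigrams(s: &str) -> HashSet<String> {
    let mut out = HashSet::new();
    for word in s
        .to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        let padded: Vec<char> = format!("  {} ", word).chars().collect();
        for window in padded.windows(3) {
            out.insert(window.iter().collect::<String>());
        }
    }
    out
}

/// Shared trigrams divided by the union of both trigram sets.
fn similarity(a: &str, b: &str) -> f64 {
    let (ta, tb) = (trigrams(a), trigrams(b));
    let shared = ta.intersection(&tb).count();
    let union = ta.union(&tb).count();
    if union == 0 {
        0.0
    } else {
        shared as f64 / union as f64
    }
}

/// Hypothetical check: does every trigram of the query appear in the title?
fn title_covers_query(query: &str, title: &str) -> bool {
    trigrams(query).is_subset(&trigrams(title))
}

fn main() {
    println!("{:.2}", similarity("git", "git2")); // prints 0.50
    println!("{}", title_covers_query("git", "git2")); // prints false
}
```

Under these padding rules, "git" inside "git2" is not fully covered because the word-final trigram `"it "` is missing, while a title containing the standalone word "git" does cover the query.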
I recently searched crates.io for "git" using the default sorting of relevance. I expected to find the git2 crate, but instead found the git crate.
Below is a screenshot of the exact match currently found when searching with https://crates.io/search?q=git.
Searching for "git2" directly with https://crates.io/search?q=git2 produces the desired result with an exact match.
I took a brief look at the crates.io source last night, and it looks like the search controller uses the PostgreSQL ts_rank_cd text search function for the default search. I'm not familiar enough with the Cover Density Ranking algorithm to explain why, or whether, it produces the results above, but it might be a starting point for digging deeper.

Relevance seems like a tricky term here. The default search probably does produce the most relevant package from a text-similarity standpoint, but not necessarily the most relevant one for me as a programmer looking for a git library to use. Maybe a hybrid approach that considers text relevance, all-time downloads, and recent downloads would produce something closer to what I expected.
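The hybrid approach suggested above could look roughly like the linear blend below. This is a sketch under assumptions: the `Candidate` and `Weights` types, the weight values, and the `ln(1 + downloads)` damping are all hypothetical choices, and the numbers describe two made-up crates rather than real crates.io data. The text score stands in for an already-computed ts_rank_cd value.

```rust
/// Hypothetical per-crate inputs; field names are illustrative only.
struct Candidate {
    name: &'static str,
    text_rank: f64, // stands in for a precomputed ts_rank_cd score
    all_time_downloads: u64,
    recent_downloads: u64,
}

/// Hypothetical weights; real values would need tuning against real queries.
struct Weights {
    text: f64,
    all_time: f64,
    recent: f64,
}

/// Linear blend of text relevance and log-damped download counts.
/// ln_1p(x) = ln(1 + x), so a crate with zero downloads gets no boost.
fn hybrid_score(c: &Candidate, w: &Weights) -> f64 {
    w.text * c.text_rank
        + w.all_time * (c.all_time_downloads as f64).ln_1p()
        + w.recent * (c.recent_downloads as f64).ln_1p()
}

/// Returns candidate names sorted by descending hybrid score.
fn rank(candidates: &[Candidate], w: &Weights) -> Vec<&'static str> {
    let mut scored: Vec<(f64, &'static str)> = candidates
        .iter()
        .map(|c| (hybrid_score(c, w), c.name))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().map(|(_, name)| name).collect()
}

fn main() {
    // Made-up numbers for two hypothetical crates.
    let candidates = [
        Candidate {
            name: "exact_match_crate",
            text_rank: 0.6,
            all_time_downloads: 1_000,
            recent_downloads: 50,
        },
        Candidate {
            name: "popular_crate",
            text_rank: 0.4,
            all_time_downloads: 1_000_000,
            recent_downloads: 100_000,
        },
    ];
    let text_only = Weights { text: 1.0, all_time: 0.0, recent: 0.0 };
    let hybrid = Weights { text: 1.0, all_time: 0.05, recent: 0.05 };
    println!("{:?}", rank(&candidates, &text_only)); // ["exact_match_crate", "popular_crate"]
    println!("{:?}", rank(&candidates, &hybrid)); // ["popular_crate", "exact_match_crate"]
}
```

The log damping keeps a crate with millions of downloads from drowning out text relevance entirely, while still letting popularity break near-ties. In PostgreSQL the same idea would become an ORDER BY over an expression that combines the ts_rank_cd result with the download columns.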