Brainstorming search results that are autosuggested and shown on results page #2421
Comments
It is actually a black box! Full text search is a complex problem and we solve it with the "fulltext" module of MySQL, our database system; some pretty arcane (but thorough) documentation is here: https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html It does seem we can tune/adjust it, though. There is, for example, a "natural language" option which attempts to algorithmically determine "relevance" -- https://dev.mysql.com/doc/refman/5.7/en/fulltext-natural-language.html We use this fulltext feature on this line: Line 29 in 47b3004
It does look like we could "turn on" natural language mode by making that say:
    Revision.where('MATCH(node_revisions.body, node_revisions.title) AGAINST(? IN NATURAL LANGUAGE MODE)', query)
We may also need to then add ordering by:
    # quote the user's query before splicing it into the SQL string
    Revision.select('node_revisions.body, node_revisions.title, MATCH(node_revisions.body, node_revisions.title) AGAINST(' + Revision.connection.quote(query.to_s) + ' IN NATURAL LANGUAGE MODE) AS score')
      .where('MATCH(node_revisions.body, node_revisions.title) AGAINST(? IN NATURAL LANGUAGE MODE)', query)
It might take some testing out. Would you like to try this out? I have to point out that I do NOT know what will happen. The documentation for "natural language" says:
As to the second issue, --
How might we break this down a bit? Do you mean that you'd like to show a mix of types, or that you'd like to show explanatory information about what different types are? Thanks! |
I tested the above query and it does run, although again, I'm not super clear on how it works. But it'd be pretty easy to put it into production if you'd like! |
What I'd like to see in the auto-suggest is a list of search terms based on weight (popular, busy pages first). On the results page I would like to see keyword results weighted by relevance (popularity, whether the word in question is included in a tag or a title, etc), and then sorted by type (note, profile, question, comment, etc). I would then like to be able to search within the keyword results (say, I'm interested in spectrometers, but would like to narrow down my search to find examples of how they've been used in schools) |
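To make the request above a little more concrete, here is a rough sketch in plain Ruby of "sort by type, then search within the results". It is only an illustration: the `Result` struct, its fields, `TYPE_ORDER`, `group_by_type`, and `narrow` are all hypothetical names, not part of the existing codebase, and the relevance `score` is assumed to come from whichever ranking gets chosen.

```ruby
# Hypothetical result record; the field names are assumptions for illustration.
Result = Struct.new(:title, :type, :body, :score)

# Display order for result types, taken from the list in the comment above.
TYPE_ORDER = %w[note profile question comment].freeze

# Group results by type (known types first, in TYPE_ORDER), and sort each
# group so the highest-scoring results come first.
def group_by_type(results)
  grouped = results.group_by(&:type)
  ordered_types = (TYPE_ORDER & grouped.keys) + (grouped.keys - TYPE_ORDER)
  ordered_types.map { |t| [t, grouped[t].sort_by { |r| -r.score }] }.to_h
end

# "Search within results": keep only results whose title or body mentions
# the extra term (case-insensitive substring match).
def narrow(results, term)
  needle = term.downcase
  results.select { |r| "#{r.title} #{r.body}".downcase.include?(needle) }
end
```

For the spectrometer example, one might call `narrow(results, "schools")` on the results of a "spectrometer" search and then pass the remainder to `group_by_type` for display.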
Hi, Bronwen, thanks. Let's break this into separate features:
Thanks! This is super helpful. |
And for the second one up there, do you mean not "relevance" as is defined in my comment above about "natural language search" but a definition of popularity such as "likes" or "views"? |
I think we'd probably want to create a rubric for relevance that could include likes/views, but also weights results based on KIND of page (a wiki page with the search term in the title might always show up higher on a list than, say, a comment). One example where we're struggling with kinds of results is a search for "open hour". On our website, this search brings up 15 research notes in the auto suggest, and two research notes on the keyword search, but none of them direct to our Open Hour page. I do think a popularity ranking would help with this, and might be simpler than introducing a semantic search feature, but I can see either offering improvements. When I perform the same search on Google (without boolean operators), I see a list of results that starts with our main open hour page, followed by items tagged with "openhour" and "open-hour", followed by links to pages for individual open hours. This would seem to be a sensible rubric for page-type sorting (provided that it's still possible to browse or narrow searches for all occurrences of a search term on our site) |
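One way such a rubric could look, as a minimal sketch in plain Ruby: combine popularity (likes and views) with a per-kind weight and a bonus when the search term appears in the title. Everything here is an assumption made for illustration: the `Page` struct, the weights in `KIND_WEIGHT`, the `TITLE_MATCH_BONUS` value, and the use of a logarithm to damp very large view counts would all need tuning against real searches like "open hour".

```ruby
# Hypothetical page record; the field names are assumptions for illustration.
Page = Struct.new(:title, :kind, :likes, :views)

# Assumed weights: wiki pages outrank notes, which outrank comments.
KIND_WEIGHT = {
  "wiki" => 3.0, "note" => 2.0, "question" => 1.5, "comment" => 1.0
}.freeze

# Assumed flat bonus when the query appears in the page title.
TITLE_MATCH_BONUS = 10.0

# Score = kind weight * (damped popularity + title bonus).
def rubric_score(page, query)
  popularity = Math.log(1 + page.likes) + Math.log(1 + page.views)
  in_title = page.title.downcase.include?(query.downcase)
  title_bonus = in_title ? TITLE_MATCH_BONUS : 0.0
  KIND_WEIGHT.fetch(page.kind, 1.0) * (popularity + title_bonus)
end

# Highest-scoring pages first.
def rank(pages, query)
  pages.sort_by { |p| -rubric_score(p, query) }
end
```

With weights like these, a modestly popular wiki page titled "Open Hour" can outrank a much more viewed research note that merely mentions an open hour, which is the behaviour described in the Google comparison.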
Cool - super helpful. I think there's probably a way to do a more complex
ranking (maybe not Google-level PageRank but something); however, I wonder if
we could take a few proposals, make them testable, and examine the results.
For example it'd be pretty easy to set up views-based or likes-based
ordering, and not much harder to do natural language relevance as I
outlined above. If we made an option to view results for a given search
query in all three, we could see which seems to work better for us.
If that sounds good, we can start those code changes and have something to
look at in a week or so; what do you think of that as a next step? We could
tackle this iteratively and look at more advanced search rubrics as a
follow-up?
Thanks!!
|
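The "view the same query in all three orderings" idea could be prototyped along these lines. This is a sketch in plain Ruby, not the code from the actual change: `Hit`, `ORDERINGS`, and `compare_orderings` are hypothetical names, and `relevance` stands in for the score that MySQL's natural language mode would return for each row.

```ruby
# Hypothetical search hit; `relevance` stands in for the MySQL MATCH score.
Hit = Struct.new(:title, :views, :likes, :relevance)

# One sort key per proposal; negated so larger values come first.
ORDERINGS = {
  views:     ->(h) { -h.views },
  likes:     ->(h) { -h.likes },
  relevance: ->(h) { -h.relevance }
}.freeze

# Return the result titles under each ordering so they can be compared
# side by side for a given query.
def compare_orderings(hits)
  ORDERINGS.map { |name, key| [name, hits.sort_by(&key).map(&:title)] }.to_h
end
```

Calling `compare_orderings(hits)` returns a hash keyed by `:views`, `:likes`, and `:relevance`, each holding the titles in that order, which makes it easy to eyeball which ranking puts the expected page on top.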
Ah, sorry for the late response, but I think that it would be great to try some of these. I think at some point we're going to need the ability to work with boolean operators (whether that's through additional search fields or allowing for more than one word or phrase in the field), but I think any of these options would help get us closer to understanding where things are going haywire in the existing search. Plus-one to trying all three! |
Work now ongoing in #2518 -- this will result in:
Soon! (update: now live on the site!) |
Hi, this needs some review and reorganization now that the above searches work -- @bronwen9 and @ebarry -- thanks for your help so far! Some additional steps might be:
Also just cleaning up the lead of this issue a bit or starting a new one with our next steps clearly laid out would be helpful! Thanks! |
As the dynamic search work is upcoming (as per your original schedule), I'm not sure if this one is on your radar, @milaaraujo and @stefannibrasil -- what do you think? |
We have a few things to finish this week; we are planning to start working on improving the search next week! |
I have some notes to share with you, but I need to organize them better first xD |
So I left some maybe not super helpful comments on #3286 -- and just pulling it back here, I want to highlight that one of the questions we try to answer may need to be:
Make sense? |
Update: this is a long conversation and there are some next steps being broken out. Please continue to use this issue for brainstorming! Thanks :)
Original issue continues below:
Please describe the problem
The system that chooses and ranks autosuggested results is mysterious, and seems like a black box.
Autosuggested results have a display limit of 15 assorted content types, but do not provide an overview of Public Lab resources on a topic.
I expect to understand what the results mean.
Please show us where to look
The Search box in the menu bar