Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve textSearch_profiles result #3134

Merged

Conversation

milaaraujo
Copy link
Collaborator

@milaaraujo milaaraujo commented Jul 24, 2018

Hey everyone!

We are opening this request to fix the issue #3069.

  1. We created a new endpoint recentprofiles that returns the profiles with most recent activity.
  2. We fixed recentPeople method. In one of our latest PR (Remove API Search endpoints duplication #3045) we did the following change that was not correct. Because in this part of the code the limit is for the node search. So we undid this change and added a count to control the number of profiles returned (line 290).

nodes = Node.all.order("changed DESC").limit(100).distinct -> nodes = Node.all.order("changed DESC").limit(srchString).distinct

  1. We don't think the name of the recentPeople method is appropriate for the peopleLocations endpoint, since the method returns the location of people with most recent activity with a specific tag. Can we change it to peopleLocations? So maybe we can use recentPeople for our new method.

We would like some feedback about our strategy here, thank you! :)

fixes #3069
@publiclab/reviewers @publiclab/soc @jywarren

@ghost ghost assigned milaaraujo Jul 24, 2018
@ghost ghost added the in progress label Jul 24, 2018
@milaaraujo
Copy link
Collaborator Author

milaaraujo commented Jul 24, 2018

Moved from #3126

Hey @jywarren, I implemented a new version of profiles endpoint following your idea.

If we call /api/srch/profiles?srchString=Jeff the api perform the default search.
If we call /api/srch/profiles?srchString=Jeff&order=recentdesc the api returns more recently updated profiles for matching text.
That way I do not take the risk of breaking anything that already uses the endpoint. I'm going to open a new PR with these changes.

@plotsbot
Copy link
Collaborator

plotsbot commented Jul 24, 2018

1 Warning
⚠️ New migrations added. Please update schema.rb.example by overwriting it with a copy of the up-to-date db/schema.rb. Also, be aware to preserve the MySQL-specific conditions for full-text indices.
1 Message
📖 @milaaraujo Thank you for your pull request! I’m here to help with some tips and recommendations. Please take a look at the list provided and help us review and accept your contribution! And don’t be discouraged if you see errors – we’re here to help.

Generated by 🚫 Danger

@milaaraujo
Copy link
Collaborator Author

Hey @jywarren, regarding your question from the other PR (# 3126):

Have you tried requesting both and comparing their responses?

I created a silly test to illustrate the difference between the two ways of using the profiles endpoint.

  1. First, I created users, in this order: user5, user4, user1, user2 and user3.
  2. Then I created an activity (question) for users, in this order: user5, user4 and user3.
  3. If I call http://localhost:3000/api/srch/profiles?srchString=test, then:

{
"items": [
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test3",
"docTitle": "test3",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test2",
"docTitle": "test2",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test1",
"docTitle": "test1",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test4",
"docTitle": "test4",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test5",
"docTitle": "test5",
"docSummary": "",
"docScore": 0
}
],
"srchParams": {
"srchString": "test",
"seq": null,
"showCount": null,
"pageNum": null,
"tagName": null
}
}

  1. If I call http://localhost:3000/api/srch/profiles?srchString=test&order=recentdesc, then:

{
"items": [
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test3",
"docTitle": "test3",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test4",
"docTitle": "test4",
"docSummary": "",
"docScore": 0
},
{
"docId": 0,
"docType": "user",
"docUrl": "/profile/test5",
"docTitle": "test5",
"docSummary": "",
"docScore": 0
}
],
"srchParams": {
"srchString": "test",
"seq": null,
"showCount": null,
"pageNum": null,
"tagName": null
}
}

@milaaraujo milaaraujo force-pushed the profiles-endpoint-most-recently-people branch 3 times, most recently from 3970a28 to 6c6c84a Compare July 25, 2018 05:53
@jywarren
Copy link
Member

Looks like a good moment to address this optimization issue too! #3147

def getRecentProfiles
sresult = DocList.new

nodes = Node.all.order("changed DESC").limit(100).distinct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried about the speed of this query. Could we collect up the uids, then run:

User.find_by(uid: uids.uniq).where(status: 1)

Then, we may also want to get recent edits, so what if we instead of doing Node, we searched Revision - that way we get both notes and wiki revisions?

users = SrchScope.find_users(srchString, limit = nil)
users =
if order == "recentdesc"
filterMatchingUserName(getRecentProfiles, srchString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I worry we might not get any results if we filter after selecting the most recent 100. Is there another way to do this, like finding matching usernames first, sorting by last activity using the user table -- i think there's a last activity column, but not sure when we update that...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might take some kind of clever join and group query, like, joining revisions to the node table, and sorting by the revision timestamp...

@jywarren
Copy link
Member

Hi, sorry, we've created a conflict while fixing #3148 - but it should be pretty easy to resolve. Thanks!

@stefannibrasil
Copy link

@jywarren agree with your comments! we will talk with our coaches today to help us improve the query, will keep you updated!

@stefannibrasil stefannibrasil force-pushed the profiles-endpoint-most-recently-people branch 2 times, most recently from c01cae5 to 45ccc31 Compare July 26, 2018 04:26
@stefannibrasil stefannibrasil changed the title New version of 'profiles' endpoint and fixing 'recentPeople' endpoint [WIP] version of 'profiles' endpoint and fixing 'recentPeople' endpoint Jul 26, 2018
@stefannibrasil
Copy link

@jywarren we started doing the sorting and ordering today, still need to work on some things, but we believe we are targeting your concerns on them.

We are passing the a order_by and sort_direction as a parameter to the search/profiles? so we can use it on them.

The approach that our coaches are suggesting us to so is searching the profiles, joining them with nodes and then ordering them by the latest activities (ASC or DESC).

@ghost ghost removed the in progress label Jul 26, 2018
@ghost ghost added the in progress label Jul 26, 2018
@stefannibrasil stefannibrasil force-pushed the profiles-endpoint-most-recently-people branch from 45ccc31 to 90603f5 Compare July 27, 2018 03:50
@stefannibrasil
Copy link

hi @jywarren could you review here, please? We worked today on improving the textSearch_profiles query. We have some questions, could you please take a look at the comments along the code?

We will work on the recentProfiles later after we finish this one. Also tomorrow we will write a test.

@jywarren
Copy link
Member

Great! First, looks like a missing or extra "end" in this file?

SyntaxError: /app/app/services/search_service.rb:251: syntax error, unexpected keyword_end

Reading now...


if search_criteria.order_by == "recent"
user_scope = user_scope.joins(:revisions)\
.order("timestamp #{search_criteria.sort_direction}")\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess i have a couple thoughts here. In general, I think it's best not to be adding activeRecord filters or conditions to an object that's had them added at an earlier level of abstraction (like, within SrchScope.find_users), because you can't see them when you're editing, so there's a strong possibility of doing things redundantly or in conflict. Better to do all your ActiveRecord stuff in one place as much as possible.

Second, I wonder if there's a way to reduce the # of levels of abstraction. Here it seems like we do:

  1. A call arrives at app/api/srch/search.rb for get :profiles do
  2. its sent to app/services/execute_search.rb for execute()
  3. that's sent to app/services/search_service.rb for textSearch_profiles()
  4. that's sent to app/services/srch_scope.rb for SrchScope.find_users()
  5. finally an ActiveRecord call is made: User.search()...

Each file is now much more readable, but tracing the whole thing is a bit harder. The readability improvements are super, and the files are cleaner and less redundant. Great work! Now, I wonder if we could try flattening the system without losing those advantages? One way to try (and we must admit it's possible every layer is important! But let's try.) is to list why each layer is needed.

  1. where incoming calls get routed
  2. utility 'execute' function
  3. back-end standard service available inside the app
  4. scope?
  5. do the ActiveRecord call

I'm not saying there is an obvious or easy answer here. And if we stick with 5 layers, that's fine! But this is how I'd recommend we look at the question of "what complexity is necessary" and how readable it is. Make sense? What do you think?

To your narrower question of the optimization, I think it makes sense that you're doing an initial call for users, then making a more complex version of it by joining revisions and then sorting using that new joined table. This looks great! Because the incoming query is already going to be pretty efficient, we think, and then we're just joining something and adding an ordering. How do you like the results it returns?

@stefannibrasil stefannibrasil force-pushed the profiles-endpoint-most-recently-people branch 2 times, most recently from 04a9b81 to d210f2d Compare July 29, 2018 01:04
@jywarren
Copy link
Member

jywarren commented Aug 9, 2018

The only difference is that the Typeahead returns results with different info(tagId instead of docId, which I still don't understand why)

As to this, I think it's not a meaningful difference, although originally i think the idea was that it would return some representation of a tag or topic, while docId would represent a page on the site. I think we should aim consolidate them to remove ambiguity...

@jywarren jywarren mentioned this pull request Aug 9, 2018
@jywarren
Copy link
Member

jywarren commented Aug 9, 2018

I'm going to open a new PR to get Travis to run against the with-bio version of User.search...

@stefannibrasil
Copy link

on travis it works only with bio... so you need to pass a username as the param to see the errors. For example, if you pass "jeff" it returns nil, but if you pass "something interesting about jeff" it returns something. At least that's how it was before

@stefannibrasil
Copy link

stefannibrasil commented Aug 9, 2018

okay, about the meeting, tomorrow we can all attend at 5:00 pm EET, so can we confirm your -virtual- presence? :)

@jywarren
Copy link
Member

jywarren commented Aug 9, 2018 via email

@jywarren
Copy link
Member

jywarren commented Aug 9, 2018

I think i have an idea about searching for "jeff"... hang on...did it happen only for "jeff" or also for "steff"?

Do you think you could try to reproduce the Travis error in a separate PR? Or was it exactly the error we're seeing here: #3224 ?

@jywarren
Copy link
Member

jywarren commented Aug 9, 2018

And was the error you saw in test/unit/user_test.rb or in /test/functional/search_api_test.rb#L95 like in my PR?

@stefannibrasil
Copy link

yes, it's Eastern European Time! So, tomorrow 5:30 EET. See you there!

@jywarren
Copy link
Member

jywarren commented Aug 9, 2018 via email

@stefannibrasil stefannibrasil force-pushed the profiles-endpoint-most-recently-people branch 4 times, most recently from d42a395 to 10441b4 Compare August 11, 2018 05:01
@stefannibrasil stefannibrasil force-pushed the profiles-endpoint-most-recently-people branch from 10441b4 to 42c5e32 Compare August 11, 2018 05:24
@jywarren
Copy link
Member

OK, I modified the test PR at #3224 to build on your latest commits from this PR, and see what the issue was with the nil results. But I see you've modified to go back to user.search searching bio and username... did that work for you? It's passing Travis, and we can do the "switch to only username searching via parameter" variant in a separate PR if we want -- let's just get this PR merged!

Can you give me an update and I am happy to check in on it once I land.. it'll be sometime tomorrow ✈️

If it's ready to go, I approve and can merge it!

@stefannibrasil
Copy link

stefannibrasil commented Aug 14, 2018

Hi, @jywarren thanks for the approval and for the help! Saturday we were able to make it work searching for username and bio!! You can try now with our last commit if you want to check it out too! We don't know exactly where things started going wrong. I guess the errors that Travis was showing confused us and we missed something small and everything became a snowball of errors...

Yeah, I agree that it's better to create a new PR for the autocomplete feature, will start now!

So... I can't believe I'm saying this, LOL, but yes this can be merged!! 🎉 thank you!!

@@ -42,7 +43,7 @@ class User < ActiveRecord::Base
after_destroy :destroy_drupal_user

def self.search(query)
User.where('MATCH(username, bio) AGAINST(?)', query)
User.where('MATCH(bio, username) AGAINST(? IN BOOLEAN MODE)', query + '*')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jywarren gold lesson from this PR about the MySQL fulltext search: the 'IN BOOLEAN MODE' modifier with the * wildcard.

https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

really glad that this is working now, thanks for the help!

@jywarren jywarren merged commit 17827d1 into publiclab:master Aug 14, 2018
@ghost ghost removed the ready label Aug 14, 2018
@jywarren
Copy link
Member

Congratulations!!! 👍🏼⚡⚡

@milaaraujo milaaraujo deleted the profiles-endpoint-most-recently-people branch October 31, 2018 04:18
SrinandanPai pushed a commit to SrinandanPai/plots2 that referenced this pull request May 5, 2019
* Version 1: implementing new version of profiles endpoint and fixing recentPeople endpoint

* Code climate

* Code climate

* Code climate [2]

* Code climate [3]

* Code climate [4]

* Refactoring getRecentProfiles method

* fixing order_by X sort_by

* Trying to fix travis error

* trying to fix travis again

* using find_users

* Fix User.search query

* Fixing conflict

* Giving better names for the methods

* Hacking travis
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Show most recently updated people on Search API
4 participants