-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve peopleLocations endpoint: people_locations method #3217
Improve peopleLocations endpoint: people_locations method #3217
Conversation
Generated by 🚫 Danger |
This looks great. I'd love a review from one other person maybe? @publiclab/reviewers |
app/services/search_service.rb
Outdated
user_scope = User.where('rusers.status <> 0')\ | ||
.joins(:user_tags)\ | ||
.where('value LIKE "lat:%"')\ | ||
.joins(:revisions)\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a bit heavy if all you're wanting is the timestamp. I wonder if you could select just the timestamp, but you might have to do .select('node_revisions.timestamp',
plus every other column (Not sure if you can wildcard *
multiple columns...) so then the way to get past that could be to get the absolute minimum number of columns with a select()
, collect the nid
s with nids = query.collect(&:nid)
and then run a second query, like:
User.where(nid: nids)
This way the original query is very narrowly restricted, then a second query collects the full User records. This is a bit messy looking but you can see how it might result in a more efficient initial query followed by a narrower second query which gets all the User
table bits and pieces you're looking for, without the messy joins. Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I don't know if it makes sense! 😝
Actually, under rails, the query being executed is: SELECT DISTINCT "rusers".* FROM "rusers" INNER JOIN "user_tags" ON "user_tags"."uid" = "rusers"."id" INNER JOIN "node" ON "node"."uid" = "rusers"."id" INNER JOIN "node_revisions" ON "node_revisions"."nid" = "node"."nid" WHERE (rusers.status <> 0) AND (value LIKE "lat:%") ORDER BY node_revisions.timestamp DESC LIMIT ?
. We are only selecting the data from the user table, which is what we need at the end of the day, and we use the timestamp just to sort the results. Of course, I do not know exactly how mysql runs this query, but it is unlikely that it actually returns the entire secondary table to only then select what the query explicitly asks for...
Hmm, well I guess I meant not the whole table meaning all the rows, but I
meant every column, and the revisions table has some bigger columns like
the node body content. So I think it would be a relatively big amount of
data to return. Maybe this optimization is more than we need, and I'm just
being paranoid...
@namangupta01 what do you think, you were recently thinking about a query
optimization?
…On Thu, Aug 9, 2018, 1:08 AM Camila Araújo ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In app/services/search_service.rb
<#3217 (comment)>:
> sresult = DocList.new
- nodes = Node.all.order("changed DESC").limit(srchString).distinct
- users = []
- nodes.each do |node|
- if node.author.status != 0
- if tagName.blank?
- users << node.author.user
- else
- users << node.author.user if node.author.user.has_tag(tagName)
- end
- end
+ user_scope = User.where('rusers.status <> 0')\
+ .joins(:user_tags)\
+ .where('value LIKE "lat:%"')\
+ .joins(:revisions)\
To be honest I don't know if it makes sense! 😝
Actually, under rails, the query being executed is: SELECT DISTINCT
"rusers".* FROM "rusers" INNER JOIN "user_tags" ON "user_tags"."uid" =
"rusers"."id" INNER JOIN "node" ON "node"."uid" = "rusers"."id" INNER JOIN
"node_revisions" ON "node_revisions"."nid" = "node"."nid" WHERE
(rusers.status <> 0) AND (value LIKE "lat:%") ORDER BY
node_revisions.timestamp DESC LIMIT ?. We are only selecting the data
from the user table, which is what we need at the end of the day, and we
use the timestamp just to sort the results. Of course, I do not know
exactly how mysql runs this query, but it is unlikely that it actually
returns the entire secondary table to only then select what the query
explicitly asks for...
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#3217 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AABfJ6LlqpcIAYfU6Xn_f2YqmLFXsEQWks5uO8OjgaJpZM4VzLMP>
.
|
But what I'm saying is that we are not returning every column of the revisions table. We are returning -> However, @namangupta01 what do you think, you were recently thinking about a query optimization? [2] 😄 |
I think I'm at the limit of my knowledge of mysql too! I thought that joining essentially adds new columns for each of the joined tables columns. But I'm trying to test this to see what the raw msyql result is, and whether we're getting revision body content in the results... hang on a sec and apologies! |
Hmm: > user_scope = User.where('rusers.status <> 0').joins(:user_tags).where('value LIKE "lat:%"').joins(:revisions).order("node_revisions.timestamp DESC")
# ActiveRecord::ConfigurationError: Can't join 'User' to association named 'revisions'; perhaps you misspelled it? |
Oh wait, duh - |
I see the query as: User Load (17.8ms) SELECT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` = `rusers`.`id` INNER JOIN `node` ON `node`.`uid` = `rusers`.`id` INNER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` WHERE (rusers.status <> 0) AND (value LIKE "lat:%") ORDER BY node_revisions.timestamp DESC LIMIT 11 let me think... |
OK, full SQL output (just running the generated query in the command-line +----+----------+-------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+-------------+--------------------+---------------------+------------------+---------------+------------------+---------------+---------------------+---------------------+-------------------+-------+-----------+-----------------+--------------------+-----------------+------+--------------+--------+
| id | username | email | crypted_password | password_salt | persistence_token | login_count | failed_login_count | last_request_at | current_login_at | last_login_at | current_login_ip | last_login_ip | created_at | updated_at | openid_identifier | role | reset_key | photo_file_name | photo_content_type | photo_file_size | bio | token | status |
+----+----------+-------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+-------------+--------------------+---------------------+------------------+---------------+------------------+---------------+---------------------+---------------------+-------------------+-------+-----------+-----------------+--------------------+-----------------+------+--------------+--------+
| 1 | Bob | bob@publiclab.org | 4d7b3086073fdc9d7647f598d9e036e0e6e210b680bf6ea402631ae28afc1a92d8c9640923ed9d8d738c341b264ae147aa06bbf4024fc319ab76554fbd3d0b52 | a517e2d46393a2a1ae1fd29cc5153abee965973ed572f72bbeca977d5b81be51fdbb8a115b6d4ace6c6cefa05fecd2f5dbf57669092613cfa734b4b41c66ee82 | 0f4d6908dc76f59302b57fd35b77661a2938d7481727f611086e1ab347669337bb122e32b4b3f905f1547fb29725c10cf7133455cf8f2492888e67e64187b2f0 | 0 | 0 | 2018-01-26 01:24:55 | NULL | NULL | NULL | NULL | 2018-01-26 01:24:55 | 2018-01-26 01:24:55 | NULL | basic | NULL | NULL | NULL | NULL | | abcdefg12345 | 1 |
+----+----------+-------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+-------------+--------------------+---------------------+------------------+---------------+------------------+---------------+---------------------+---------------------+-------------------+-------+-----------+-----------------+--------------------+-----------------+------+--------------+--------+
1 row in set (0.00 sec) |
Oddly, it seems different when I do this: User.where('rusers.status <> 0').joins(:user_tags).where('value LIKE "lat:%"').joins(:revisions).order("node_revisions.timestamp DESC").select('rusers.*, node_revisions.timestamp').limit(2) User Load (12.4ms) SELECT rusers.*, node_revisions.timestamp FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` = `rusers`.`id` INNER JOIN `node` ON `node`.`uid` = `rusers`.`id` INNER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` WHERE (rusers.status <> 0) AND (value LIKE "lat:%") ORDER BY node_revisions.timestamp DESC LIMIT 2 That outputs: +----+----------+-------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+-------------+--------------------+---------------------+------------------+---------------+------------------+---------------+---------------------+---------------------+-------------------+-------+-----------+-----------------+--------------------+-----------------+------+--------------+--------+------------+
| id | username | email | crypted_password | password_salt | persistence_token | login_count | failed_login_count | last_request_at | current_login_at | last_login_at | current_login_ip | last_login_ip | created_at | updated_at | openid_identifier | role | reset_key | photo_file_name | photo_content_type | photo_file_size | bio | token | status | timestamp |
+----+----------+-------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+-------------+--------------------+---------------------+------------------+---------------+------------------+---------------+---------------------+---------------------+-------------------+-------+-----------+-----------------+--------------------+-----------------+------+--------------+--------+------------+
| 1 | Bob | bob@publiclab.org | 4d7b3086073fdc9d7647f598d9e036e0e6e210b680bf6ea402631ae28afc1a92d8c9640923ed9d8d738c341b264ae147aa06bbf4024fc319ab76554fbd3d0b52 | a517e2d46393a2a1ae1fd29cc5153abee965973ed572f72bbeca977d5b81be51fdbb8a115b6d4ace6c6cefa05fecd2f5dbf57669092613cfa734b4b41c66ee82 | 0f4d6908dc76f59302b57fd35b77661a2938d7481727f611086e1ab347669337bb122e32b4b3f905f1547fb29725c10cf7133455cf8f2492888e67e64187b2f0 | 0 | 0 | 2018-01-26 01:24:55 | NULL | NULL | NULL | NULL | 2018-01-26 01:24:55 | 2018-01-26 01:24:55 | NULL | basic | NULL | NULL | NULL | NULL | | abcdefg12345 | 1 | 1522444233 |
| 1 | Bob | bob@publiclab.org | 4d7b3086073fdc9d7647f598d9e036e0e6e210b680bf6ea402631ae28afc1a92d8c9640923ed9d8d738c341b264ae147aa06bbf4024fc319ab76554fbd3d0b52 | a517e2d46393a2a1ae1fd29cc5153abee965973ed572f72bbeca977d5b81be51fdbb8a115b6d4ace6c6cefa05fecd2f5dbf57669092613cfa734b4b41c66ee82 | 0f4d6908dc76f59302b57fd35b77661a2938d7481727f611086e1ab347669337bb122e32b4b3f905f1547fb29725c10cf7133455cf8f2492888e67e64187b2f0 | 0 | 0 | 2018-01-26 01:24:55 | NULL | NULL | NULL | NULL | 2018-01-26 01:24:55 | 2018-01-26 01:24:55 | NULL | basic | NULL | NULL | NULL | NULL | | abcdefg12345 | 1 | 1522444233 |
+----+----------+-------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+-------------+--------------------+---------------------+------------------+---------------+------------------+---------------+---------------------+---------------------+-------------------+-------+-----------+-----------------+--------------------+-----------------+------+--------------+--------+------------+ |
hmm, things get stranger but maybe clearer?...
|
Hmm, so I think i'm now stuck. In response to the error in the last one, i think it's saying we need to add |
I guess I suspect the original version without the extra 2.4.1 :022 > User.where('rusers.status <> 0').joins(:user_tags).where('value LIKE "lat:%"').joins(:revisions).order("node_revisions.timestamp DESC").limit(2).distinct User Load (2.3ms) SELECT DISTINCT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` = `rusers`.`id` INNER JOIN `node` ON `node`.`uid` = `rusers`.`id` INNER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` WHERE (rusers.status <> 0) AND (value LIKE "lat:%") ORDER BY node_revisions.timestamp DESC LIMIT 2
ActiveRecord::StatementInvalid: Mysql2::Error: Expression #1 of ORDER BY clause is not in SELECT list, references column 'plots.node_revisions.timestamp' which is not in SELECT list; this is incompatible with DISTINCT: SELECT DISTINCT `rusers`.* FROM `rusers` INNER JOIN `user_tags` ON `user_tags`.`uid` = `rusers`.`id` INNER JOIN `node` ON `node`.`uid` = `rusers`.`id` INNER JOIN `node_revisions` ON `node_revisions`.`nid` = `node`.`nid` WHERE (rusers.status <> 0) AND (value LIKE "lat:%") ORDER BY node_revisions.timestamp DESC LIMIT 2 |
This is hard!!! 😭 |
OK, we're also affected by this: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html I'm going to take a short break here, probs a good idea! |
OK, thinking thinking. Although we could potentially solve this, our intent is to return people who are recently active. Thinking outside the box, what if we simply sorted by |
I checked here, we are using this version of mariadb:
|
I think the last commit will solve the problem for mysql! 🤞 But Travis is failing with this new weird error:
😩 |
oh exciting... i'll try it out myself too -- let me see about the travis
error maybe we can restart it...
…On Fri, Aug 10, 2018 at 1:29 PM Camila Araújo ***@***.***> wrote:
I think the last commit will solve the problem for mysql! 🤞
But Travis is failing with this new weird error:
ERROR: Service 'web' failed to build: The command '/bin/sh -c bundle
install --jobs 4' returned a non-zero code: 5
😩
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#3217 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AABfJ6tsyDD_0XAuONI2eqbYOYGlgZQ4ks5uPcLdgaJpZM4VzLMP>
.
|
Looks like a timeout issue, maybe travis servers lost internet connection for a bit. Restarting!
|
My last commit didn't work on Travis, so I'm removing the changes. |
67bd2e5
to
9aa3ef5
Compare
Hey Jeff, now I'm running mysql locally and I could reproduce the error you were getting. That fix I tried early really solved the problem for mysql... But it failed on Travis. 1 - This query works for mariaDB, Sqlite and Travis:
2 - This one works for mysql:
3 - So I'm trying now this logic (last commit). It works for mysql and sqlite, but failed on Travis:
The conclusion today is that it is not possible to make everyone happy! My question now is which database does Travis use and which database is used in production? If a code passes without errors by Travis, does that mean it will run smoothly in production? If this is true, can we leave the version 2 that works on Travis? |
Hey, @jywarren! More insights here about this... I'm using mysql to run the site locally and a lot of things (not related to our work) are broken, for example when I try to create a new question: So I don't think we are using mysql in production, right? I think it makes more sense to keep the version that works on Travis, what do you think? |
Just a note that we are using Mariadb 10.2 in production. Not sure why you are seeing this issue! Thanks for your hard work! |
Hey @icarito! I was testing locally with mysql, because Jeff pointed out he was seeing some errors in my query using mysql... But then I discovered that other things also break when I test with mysql. Now I'm focusing on leaving the code error-free testing with MariaDB (which is what's in production). My last commit seems to be working okay! Now that Jeff is on vacation, could you please review it for me? 😄 |
Thanks this is really thorough and good, I'm planning on doing a more
thorough review later today before finally taking off. I haven't forgotten!
😄
…On Sun, Aug 12, 2018, 2:14 PM Camila Araújo ***@***.***> wrote:
Hey @icarito <https://github.com/icarito>! I was testing locally with
mysql, because Jeff pointed out he was seeing some errors in my query using
mysql... But then I discovered that other things also break when I test
with mysql.
Now I'm focusing on leaving the code error-free testing with MariaDB
(which is what's in production). My last commit seems to be working okay!
Now that Jeff is on vacation, could you please review it for me? 😄
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3217 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AABfJ-SdO1fU1Buhx2F_fvLLpVVR3y_uks5uQHBUgaJpZM4VzLMP>
.
|
OK, going through this now. First, on the While Does In general it's nice to use generic-enough SQL features that it'll work in all similar database systems. But we're already pushing this a bit since we use full-text search, for example. So if Thanks for looking through this so carefully! A very evidence-driven decision making process. If you try Thanks!!! |
(also just want to note this error here for future reference; e4a72e5 produces the following): - [ ] There was a test error at: Failure: test_search_recent_people_functionality_having_specified_tagName(Minitest::Result) [/usr/local/bundle/gems/mysql2-0.5.2/lib/mysql2/client.rb:131]: ActiveRecord::StatementInvalid: Mysql2::Error: Operand should contain 1 column(s): SELECT 'rusers'.* FROM 'rusers' INNER JOIN 'user_tags' ON 'user_tags'.'uid' = 'rusers'.'id' WHERE (user_tags.value LIKE 'tool:barometer') AND ('rusers'.'id') IN (SELECT DISTINCT rusers.*,node_revisions.timestamp, rusers.id FROM 'rusers' INNER JOIN 'user_tags' ON 'user_tags'.'uid' = 'rusers'.'id' INNER JOIN 'node' ON 'node'.'uid' = 'rusers'.'id' INNER JOIN 'node_revisions' ON 'node_revisions'.'nid' = 'node'.'nid' WHERE (rusers.status <> 0) AND (value LIKE "lat:%") ORDER BY node_revisions.timestamp DESC) LIMIT 100 app/services/search_service.rb:178:in 'people_locations' app/services/execute_search.rb:23:in 'execute' app/services/execute_search.rb:3:in 'by' app/api/srch/search.rb:112:in 'execute' app/api/srch/search.rb:100:in 'block (2 levels) in <class:Search>' <a href='https://github.com/milaaraujo/plots2/tree/improve-peopleLocations-endpoint/test/functional/search_api_test.rb#L214'>test/functional/search_api_test.rb:214</a>:in 'block in <class:SearchApiTest>' |
Hey @jywarren! I know that the
The
So the effective response from the database is not exactly the same. If you don't think this is a problem, I can certainly send a new commit (using |
And I see you are worried about the support for mysql. But I don't think at this point, considering the code we have in production now, we really don't support mysql. I run |
We'll, if you want to try select() with includes, we can try narrowing, but
otherwise I think the includes() version will be fine. Thanks!
…On Tue, Aug 14, 2018, 12:41 AM Camila Araújo ***@***.***> wrote:
And I see you are worried about the support for mysql. But I don't think
at this point, considering the code we have in production now, we really
support mysql. I run rake test yesterday in the master branch, using
mysql, and it pointed out 43 errors (or something close to this, I can't
remember)! Did you try to do this before? Or maybe I'm using a wrong mysql
version...
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3217 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AABfJ6etqcMJ2aJdte30OBTsa7QOWH0Pks5uQfJ5gaJpZM4VzLMP>
.
|
@jywarren, done. |
Fantastic! |
…3217) * version 1 * Version 2 * changing mothod name: recentPeople -> people_locations * Adding distinct * trying to fix mysql error * code climate fixes * removing mysql fixes (it didn't work) * ActiveRecord::Base.connection.adapter_name == 'Mysql2' * Version that works on Travis * using 'includes()'
Hey everybody!
We are opening this request to fix the issue #3196. Here we implemented a new version of
recentPeople
method.Now we first select the users that have lat/lon and then we join the users table with the revisions table to order them by most recent activities. We also change the name of the method from
recentPeople
topeople_locations
.We would like some feedback, thanks! :)
fixes #3196
@publiclab/reviewers @publiclab/soc @jywarren