Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very slow slug generation queries with latest BSON and mongo driver #235

Closed
mzikherman opened this issue Jan 30, 2017 · 3 comments · Fixed by #238
Closed

Very slow slug generation queries with latest BSON and mongo driver #235

mzikherman opened this issue Jan 30, 2017 · 3 comments · Fixed by #238
Labels

Comments

@mzikherman
Copy link
Collaborator

mzikherman commented Jan 30, 2017

Updating the mongo driver to 2.4 from 2.1 ( and bson gem to 4.2 to 3.2 coming from the mongo driver update), caused some major performance issues in our database.

Specifically, I was able to reproduce a query that's kicked off in order to generate a unique slug, no longer being able to efficiently use an index (even though the regular expression is prefixed, which according to the docs should be able to use an index).

Here's some explain output:

Before upgrade:

{"queryPlanner"=>
  {"plannerVersion"=>1,
   "namespace"=>"gravity_development.artworks",
   "indexFilterSet"=>false,
   "parsedQuery"=>
    {"_slugs"=>
      #<BSON::Regexp::Raw:0x007ff6f0fa4cd8
       @options=0,
       @pattern="^alex\\-reisfar(?:-(\\d+))?$">},
   "winningPlan"=>
    {"stage"=>"KEEP_MUTATIONS",
     "inputStage"=>
      {"stage"=>"FETCH",
       "filter"=>
        {"_slugs"=>
          #<BSON::Regexp::Raw:0x007ff6f0f96de0
           @options=0,
           @pattern="^alex\\-reisfar(?:-(\\d+))?$">},
       "inputStage"=>
        {"stage"=>"IXSCAN",
         "keyPattern"=>{"_slugs"=>1},
         "indexName"=>"_slugs_1",
         "isMultiKey"=>true,
         "isUnique"=>true,
         "isSparse"=>false,
         "isPartial"=>false,
         "indexVersion"=>1,
         "direction"=>"forward",
         "indexBounds"=>
          {"_slugs"=>
            ["[\"alex-reisfar\", \"alex-reisfas\")",
             "[/^alex\\-reisfar(?:-(\\d+))?$/, /^alex\\-reisfar(?:-(\\d+))?$/]"]}}}},
   "rejectedPlans"=>[]},
 "executionStats"=>
  {"executionSuccess"=>true,
   "nReturned"=>1,
   "executionTimeMillis"=>0,
   "totalKeysExamined"=>2,
   "totalDocsExamined"=>1,

Notice that the index is being used, and totalDocsExamined and totalKeysExamined is tiny, as expected.

After the upgrade:

{"plannerVersion"=>1,
   "namespace"=>"gravity_development.artworks",
   "indexFilterSet"=>false,
   "parsedQuery"=>
    {"_slugs"=>
      #<BSON::Regexp::Raw:0x007fee695c4fe0
       @options="m",
       @pattern="^alex\\-reisfar(?:-(\\d+))?$">},
   "winningPlan"=>
    {"stage"=>"KEEP_MUTATIONS",
     "inputStage"=>
      {"stage"=>"FETCH",
       "filter"=>
        {"_slugs"=>
          #<BSON::Regexp::Raw:0x007fee695c44f0
           @options="m",
           @pattern="^alex\\-reisfar(?:-(\\d+))?$">},
       "inputStage"=>
        {"stage"=>"IXSCAN",
         "keyPattern"=>{"_slugs"=>1},
         "indexName"=>"_slugs_1",
         "isMultiKey"=>true,
         "isUnique"=>true,
         "isSparse"=>false,
         "isPartial"=>false,
         "indexVersion"=>1,
         "direction"=>"forward",
         "indexBounds"=>
          {"_slugs"=>
            ["[\"\", {})",
             "[/^alex\\-reisfar(?:-(\\d+))?$/m, /^alex\\-reisfar(?:-(\\d+))?$/m]"]}}}},
   "rejectedPlans"=>[]},
 "executionStats"=>
  {"executionSuccess"=>true,
   "nReturned"=>1,
   "executionTimeMillis"=>29681,
   "totalKeysExamined"=>1280090,
   "totalDocsExamined"=>730625,

Notice the totalKeys and totalDocs has now exploded to the entire collection. It still says 'IXSCAN', so it is recognizing that there is an index to use, but it's not able to efficiently use it.

Now, you may see that the actual query that's being generated is slightly different in the two versions. Specifically, the 'multiline' option is now specified: (that is the m option in the query). That has been added in mongodb/bson-ruby@d42c6ec , which seems to implicitly add this option to all regular expressions.

I'm not 100% sure, but it does seem as though this option would make it impossible to efficiently use the index, since you lose any benefit from prefixing, as you'd still need to scan entire strings to find newlines, and thus not be able to use any range bounding from the index. Indeed, adding this option explicitly in the older versions of the gems which don't automatically add them recreates this 'bad' behavior.

So, I'm filing an issue in this repo since, essentially, queries kicked off by slug-related code here (and using latest versions of the mongo driver and one of its dependencies, the BSON gem) no longer efficiently use the _slugs index, and so are unsustainable for larger collections. Furthermore, I believe that the BSON gem upgrade with the multiline option always being added is what's making this broken.

@dblock
Copy link
Collaborator

dblock commented Jan 30, 2017

This makes sense. You should open an issue in bson-ruby against that commit as well, because it's clearly a performance regression.

Then, is there a way to write a test for this in mongoid-slug (eg. examining the query results and making sure it hit an index or the number of documents scanned was 1 when there're 2 records in the DB for example), and then way to modify the query and remove that option in mongoid-slug?

@dblock dblock added the bug? label Jan 30, 2017
@mzikherman
Copy link
Collaborator Author

This makes sense. You should open an issue in bson-ruby against that commit as well, because it's clearly a performance regression.

Will do, wasn't sure if I should but I'll file something there and see if anyone has something to add.

Then, is there a way to write a test for this in mongoid-slug (eg. examining the query results and making sure it hit an index or the number of documents scanned was 1 when there're 2 records in the DB for example), and then way to modify the query and remove that option in mongoid-slug?

I will investigate this, thanks!

@mzikherman
Copy link
Collaborator Author

As an FYI, I have a branch where I've switched to use Regexp::Raw over Ruby's Regexp to get around the multiline option performance hit.

The former is a class provided by BSON in order to get around this. see mongodb/bson-ruby#71 for that discussion.

I've just commented there mongodb/bson-ruby#71 (comment) . My branch still has 2 failing specs around embedded documents, and I was unsuccessful in debugging that (besides thinking that there might be a further Mongoid issue).

When I get that answered, I should be able to come back here and finish this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants