
mongoengine killed mongodb performance when used with pymongo 3.x #1446

Closed
anih opened this issue Dec 27, 2016 · 8 comments · Fixed by #2702

Comments

@anih
Contributor

anih commented Dec 27, 2016

The root problem is that MongoEngine switched from PyMongo's ensure_index to the create_index method inside MongoEngine's ensure_indexes. ensure_index had a simple caching mechanism, but create_index doesn't have one.

This switch, combined with the fact that ensure_indexes is called on each doc.save(), means MongoDB has to handle a createIndex command, which is quite a heavy operation; in our case this resulted in a 50% drop in performance.

ensure_index was added in #812 because there was a problem with a test, and IMHO it was a workaround. A better solution could be to add a proxy method drop_database, similar to drop_collection, which would reset cls._collection on Document.

Later I will prepare a pull request with such a change.
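
A minimal sketch of what such a proxy could look like; the helper name is hypothetical, and _get_db()/_collection are MongoEngine internals whose exact shape varies by version:

    # Hypothetical sketch, not actual MongoEngine API: a drop_database
    # proxy analogous to the existing drop_collection(), which already
    # resets the cached collection handle on the class.
    def drop_database_for(doc_cls):
        db = doc_cls._get_db()            # MongoEngine internal helper
        db.client.drop_database(db.name)  # PyMongo 3.x Database.client
        doc_cls._collection = None        # forget the cached collection so it
                                          # is lazily re-created (and re-indexed)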

@anih
Contributor Author

anih commented Dec 27, 2016

[image: chart from monitoring]

@wojcikstefan
Member

wojcikstefan commented Dec 27, 2016

Thanks for the thorough report, @anih! I'm looking forward to a PR. IMHO the entire index-ensuring logic is troubling right now; while convenient during development, it kills production systems. I touched on it briefly in the comments on #357.

@anih
Contributor Author

anih commented Dec 29, 2016

#1457 is a first draft of the changes. Unfortunately I went too far and also changed how switching DBs and collections works, but the previous approach wasn't thread-safe and was quite dirty. Tests should pass, but I didn't write any new ones, as I'd prefer to get feedback on whether the changes are going in the right direction.

@boazin

boazin commented Apr 4, 2017

Any news about this? Any workaround?

@wojcikstefan
Member

Full history of the issue: Previously, ensure_indexes could be called many times over because PyMongo maintained a local cache of the indexes with a TTL of 5 minutes (see their v2.8 docs). Then, that method was deprecated and instead you could use create_index with a cache_for param (v2.9 docs). Finally, in v3.0 the cache_for param was removed (https://jira.mongodb.org/browse/PYTHON-861). As that issue said:

The difference between ensure_index and create_index is that ensure_index consults an index "cache" before sending a create index operation to the server. This causes hard to debug race conditions when dropping and immediately re-creating an index, and provides no real benefits. To avoid these problems we're deprecating the method. Use create_index instead.

Most likely the best way to fix this issue is to implement some of the ideas mentioned in #357.
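
To illustrate what was lost, here is a minimal application-level sketch of that old TTL-cache behavior (this is not MongoEngine or PyMongo code; the names are made up, and it deliberately reintroduces the race condition described in the quote above):

    import time

    _index_cache = {}   # (collection name, repr of keys) -> last-ensured time
    CACHE_TTL = 300     # seconds, mirroring PyMongo 2.x's 5-minute default

    def ensure_index_cached(collection, keys, ttl=CACHE_TTL, **kwargs):
        """Call create_index only if it wasn't called for these keys
        within the last `ttl` seconds."""
        cache_key = (collection.name, repr(keys))
        now = time.monotonic()
        last = _index_cache.get(cache_key)
        if last is not None and now - last < ttl:
            return                        # recently ensured; skip the round trip
        collection.create_index(keys, **kwargs)
        _index_cache[cache_key] = now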

@rrmerugu

rrmerugu commented Oct 6, 2017

I'm also facing the same issue (based on my understanding).

I'm using mongoengine==0.13.0. I have a collection with 28 million documents, and every time a new document is created, MongoDB re-indexes the entire collection.

db.currentOp() shows this message:

            "query" : {
                "createIndexes" : "keyword",
                "indexes" : [ 
                    {
                        "unique" : true,
                        "background" : false,
                        "sparse" : false,
                        "key" : {
                            "text" : 1
                        },
                        "name" : "text_1"
                    }
                ],
                "writeConcern" : {}
            },
            "msg" : "Index Build Index Build: 26834459/28427263 94%",
            "progress" : {
                "done" : 26834459,
                "total" : 28427263
            },

My model is:

    class Article(Document):
        text = StringField(required=True, unique=True)

    # I'm not using any indexes in meta

How can I avoid re-indexing the entire collection on every save()? This actually blocks me from reading the database. I don't want to remove unique=True, though. Any thoughts?
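
One workaround, using MongoEngine's documented auto_create_index meta option (also mentioned in the next comment), is to skip index creation on save() and ensure the indexes explicitly once at startup; a sketch, with 'mydb' as a placeholder database name:

    from mongoengine import Document, StringField, connect

    class Article(Document):
        text = StringField(required=True, unique=True)
        meta = {'auto_create_index': False}   # don't ensure indexes on save()

    connect('mydb')            # placeholder database name
    Article.ensure_indexes()   # build the unique index once, at startup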

@bagerard
Collaborator

The issue is quite old, but I'm trying to understand the impact. The MongoDB index documentation mentions:

Recreating an Existing Index
If you call db.collection.createIndex() for an index that already exists, MongoDB does not recreate the index.

So unless you modify the indexes, subsequent calls to create_indexes aren't actually re-creating them. That leaves us with the overhead of a few Python calls (involved in the ensure_indexes dance), plus the create_index operation and its round trip to the database server, which should amount to a few milliseconds. A quick test with and without auto_create_index gives me a factor of 2 to 3 in performance (inserting 10,000 documents into an empty collection: ~4 vs. ~13 seconds).

Long story short: it is still valuable to improve this, but it looks like skipping the call to create_indexes on every .save() will only be noticeable when save() is called thousands of times.
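
For reference, a rough sketch of that kind of measurement; the database name, document shape, and index are placeholders, and it assumes a local mongod:

    import time
    from mongoengine import Document, StringField, connect

    class Item(Document):
        name = StringField()
        meta = {
            'indexes': ['name'],
            'auto_create_index': True,   # flip to False to compare
        }

    connect('benchdb')       # placeholder database
    Item.drop_collection()

    start = time.monotonic()
    for i in range(10000):
        Item(name='item-%d' % i).save()
    print('inserted 10,000 documents in %.1fs' % (time.monotonic() - start))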

@victorct-pronto

Any updates on this? I'm not seeing a big performance hit on my database, but I'm seeing at least 15 ms added to every request that creates a document.
