Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamped index management #357

Open
wojcikstefan opened this issue Jun 4, 2013 · 12 comments
Open

Revamped index management #357

wojcikstefan opened this issue Jun 4, 2013 · 12 comments

Comments

@wojcikstefan
Copy link
Member

wojcikstefan commented Jun 4, 2013

Hi @rozza!

Just wanted to pick your brain about the default index creation. Seems like non-background indexes are only useful at the very early stage of a project. When any of the collections grow fairly large (say above 1M documents), createIndex blocks the database for a significant amount of time.

It's a bit annoying to add {'index_background': True} in each Document's meta and sometimes you're just not able to control it that way. E.g. changing allow_inheritance will create a new non-background index (although I may be wrong whether _cls-related indexes respect meta's index_background or not).

I'd argue that { background: true } should be a default option in MongoEngine. What do you think?

@rozza
Copy link
Contributor

rozza commented Jun 5, 2013

Hi @wojcikstefan I'm going to ponder this one!

I like it conceptually, however I'm not sold on have auto index generation as you say it can take time and maybe undesirable in production. Maybe one to review for 0.9.

@wojcikstefan
Copy link
Member Author

wojcikstefan commented Jun 5, 2013

By "auto index generation" I meant the current status quo, i.e. MongoEngine ensures all of the defined indexes are actually created in the db at runtime. If you add something to meta = {'indexes': [...]}, it basically means the new index will be automatically created (and lock the whole db for a while, if the collection is fairly large).

@xqliang
Copy link

xqliang commented Dec 13, 2016

+1

@wojcikstefan
Copy link
Member Author

wojcikstefan commented Dec 13, 2016

I have one more request related to indexes - there should be an easy way to globally disable automatic index creation for all documents. While it's a useful feature during development, it's dangerous for production environments... You want full control over when and what index is created (and how).

It's also better to decouple index definitions from application logic if you have a reasonably big engineering team, because it allows indexes to be managed by a separate DevOps team.

@mclate
Copy link

mclate commented Mar 7, 2017

Wanted to bring this one back again as we also have just experienced pretty significant downtime due to automated foreground index creation on production. Just as @wojcikstefan said, at some point it is desirable to disable automated index creation completely and let devops control this.
Maybe add this as an option to a connection definition (Among with one for background indexes)?

@wojcikstefan
Copy link
Member Author

Few requirements for the updated index management:

  • By default we shouldn't create any indexes. For any serious library, safe defaults are much more important than fastest setup possible. This would also allow people to handle indexes outside of MongoEngine, which might be desirable.
  • It should be easy to enable automatic background or foreground indexing in the dev environment for smoother development.
  • We should support an intuitive way to set up all kinds of indexes (sparse, geo, text, unique, partial, TTL, etc.).
  • There should be a way to "diff" the MongoEngine index spec vs the state of the database, so that you can easily learn e.g. if your production database has some gratuitous indexes or if any indexes defined in the code are missing from the db.
  • Given how crucial correct indexes are to performance, their definition should be as explicit as possible, leaving no room for confusion. E.g. right now the way we handle _cls is quite implicit. We should scrutinize that logic and see if there's room for more clarity (even if through better documentation).

Anything else y'all might think of @lafrech @thedrow @mclate @gukoff ?

@wojcikstefan

This comment has been minimized.

@jtharpla
Copy link
Contributor

jtharpla commented Mar 7, 2017

Speaking from experience, I'd definitely prefer no automated index creation as a sane default intended for production with the option to do automated background index creation if desired. Automated foreground index creation should be clearly documented as intended for development environments only.

@wojcikstefan wojcikstefan changed the title Indexes created in the background by default Revamped index management Apr 30, 2017
@mindojo-victor
Copy link
Contributor

@wojcikstefan I agree with your suggestions. What about also removing field arguments related to indexes, so that all index info would be in one place -- _meta (not sure how to deal with primary)?

@wojcikstefan
Copy link
Member Author

not sure how to deal with primary

I think the _id: 1 index is the the only one that can be created implicitly, i.e. we don't need to explicitly define it in meta['indexes']. That's because we can't control it in any way or stop its creation. MongoDB docs say: "MongoDB creates a unique index on the _id field during the creation of a collection. (...) You cannot drop this index on the _id field."

What about also removing field arguments related to indexes, so that all index info would be in one place -- _meta

On the one hand, I agree. The more scattered index creation is, the harder it is to grasp what indexes are defined in your code (which is particularly dangerous if indexes are auto-created).

On the other hand, people might be used to index-related arguments on fields. For example, latest Django still offers Field options like db_index and unique.

Is dropping field options like unique, unique_with, and sparse, and forcing users to instead define such indexes in the meta dict too drastic of a change? It seems to me like it would make index definitions clearer and more centralized, but lmk what you think @bagerard, @erdenezul, @thomasst, @mpessas, @tsx.

@mpessas
Copy link

mpessas commented Jun 24, 2019

Is dropping field options like unique, unique_with, and sparse, and forcing users to instead define such indexes in the meta dict too drastic of a change? It seems to me like it would make index definitions clearer and more centralized

I agree it would make index definitions more explicit, but it is against what most people coming to mongoengine from other "ORM"s would expect. That said, as long as it is fine breaking backwards compatibility, dropping those field options sounds good to me.

@wojcikstefan
Copy link
Member Author

The more I think about it, the more I like the idea of defining all of the indexes in meta['indexes']. This makes scanning for indexes in the code easier and follows The Zen of Python in that "There should be one-- and preferably only one --obvious way to do it."

This is similar to defining all of your indexes via Meta.indexes in Django or via the Index construct in SQLAlchemy (and forgoing index-related field/column kwargs).

That said, as long as it is fine breaking backwards compatibility, dropping those field options sounds good to me.

I don't mind introducing a breaking change if it's done alongside a major version bump. Perhaps we could get to MongoEngine v1.0.0 after this and some performance/functional/cleanup improvements...

@bagerard what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants