MongoDB: when to call ensureIndex?

When should I call ensureIndex? Before inserting a single record, after inserting a single record, or before calling find()?




It seems my comment has been a little misunderstood, so I'll clarify. It doesn't really matter when you call it so long as it's called at some point before you call find() for the first time. In other words, it doesn't really matter when you create the index, as long as it's there before you expect to use it.

A common pattern I've seen a lot is coding the ensureIndex call at the same time (and in the same place) as the find() call. ensureIndex checks whether the index exists and creates it if it doesn't. There is undoubtedly some overhead (albeit very small) in calling ensureIndex before every call to find(), so it's preferable not to do this.

I do call ensureIndex in code to simplify deployments and to avoid having to manage the db and codebase separately. For me, the ease of deployment outweighs the redundancy of subsequent calls to ensureIndex.

I'd recommend calling ensureIndex once, when your application starts.
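A minimal sketch of the "once, at startup" approach, assuming each collection object exposes the shell-style ensureIndex(keys) method. The helper name ensureIndexesAtStartup, the REQUIRED_INDEXES list, and the collection and field names are all hypothetical and only serve as illustration.

```javascript
// Hypothetical list of the indexes the application relies on.
var REQUIRED_INDEXES = [
  { collection: 'users',  keys: { email: 1 } },
  { collection: 'events', keys: { date: 1 } }
];

// Call once from the application's startup path.
// `getCollection` is assumed to map a collection name to an object
// with a shell-style ensureIndex(keys) method.
function ensureIndexesAtStartup(getCollection, indexes) {
  var ensured = 0;
  indexes.forEach(function (spec) {
    // A no-op on the server side if the index already exists.
    getCollection(spec.collection).ensureIndex(spec.keys);
    ensured += 1;
  });
  return ensured; // number of ensureIndex calls issued
}
```

In the mongo shell this could be invoked as ensureIndexesAtStartup(function (name) { return db.getCollection(name); }, REQUIRED_INDEXES);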

It doesn't matter when, and you only have to do it once. If you want to batch-insert a large amount of data into an empty collection, it is best to create the index after the inserts; otherwise it doesn't really matter.

You only need to do this once. Example:

db.table.insert({foo: 'bar'});
var foo = db.table.findOne({foo: 'bar'}); // => no index on foo yet: full collection scan
db.table.ensureIndex({foo: 1});           // create the index once
foo = db.table.findOne({foo: 'bar'});     // => uses the index on foo
db.table.insert({foo: 'foo'});            // new documents are added to the index automatically
foo = db.table.findOne({foo: 'foo'});     // => uses the index on foo

If you add an index beforehand, every insert/update/delete call has to modify each index as well. So, from an optimization standpoint, you probably want to put off creating it for as long as possible, up until you start issuing queries. However, from a functional standpoint, it doesn't matter.
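The ordering described here (insert first, index afterwards) could be sketched as follows. loadThenIndex is a hypothetical helper, and coll is assumed to expose the shell-style insert(doc) and ensureIndex(keys) methods, as db.table does in the mongo shell.

```javascript
// Hypothetical helper: bulk-load documents, then build the index once.
function loadThenIndex(coll, docs, keys) {
  // Insert everything first, so no secondary index has to be
  // updated on each individual insert.
  docs.forEach(function (doc) {
    coll.insert(doc);
  });
  // Build the index once the data is in place, before any queries run.
  coll.ensureIndex(keys);
  return docs.length; // number of documents inserted
}
```

In the shell, a call such as loadThenIndex(db.table, docs, {foo: 1}) would apply this to the collection used in the example above.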

I typically put my ensureIndex() calls within an init block for the part of my application that manages communication with MongoDB. I also wrap those ensureIndex() calls in a check for the existence of a collection I know must exist for the application to function, and run them only if that collection is not there yet. This way, the ensureIndex() calls are made only once: the first time the application is run against a specific MongoDB instance.

I've read elsewhere an opinion against putting ensureIndex() calls in application code, as other developers can mistakenly change them and alter the DB (the indexes), but wrapping it in a check for a collection's existence helps to guard against this.

Java MongoDB driver example:

import com.mongodb.BasicDBObjectBuilder;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import java.util.Set;

// `mongo` is assumed to be an already-connected client from the legacy driver
DB db = mongo.getDB("databaseName");
Set<String> existingCollectionNames = db.getCollectionNames();

// init collections; ensureIndexes only if creating collection
// (let application set up the db if it's not already)
DBCollection coll = db.getCollection("collectionName");
if (!existingCollectionNames.contains("collectionName")) {
    // ensure indexes...
    coll.ensureIndex(BasicDBObjectBuilder.start().add("date", 1).get());
    // ...
}

If you have a collection that has millions of records and you are building multiple compound indexes with auto-indexing turned off, then you MUST make sure that you invoke ensureIndexes() well before your first find query, ideally synchronously, i.e. the query is issued only after the ensureIndexes() method has returned.

The mode (foreground vs. background) in which indexes are built adds extra complexity. Foreground mode locks the complete db while it is building the indexes, whereas background mode allows you to keep querying the db. However, building an index in background mode takes extra time.
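As a sketch of choosing the mode explicitly: the shell's ensureIndex accepts an options document as its second argument, and {background: true} requests a background build. The helper name ensureIndexInBackground is hypothetical, and coll is assumed to be a shell collection object such as db.table.

```javascript
// Hypothetical helper: request a background index build.
function ensureIndexInBackground(coll, keys) {
  // background: true lets other operations proceed during the build,
  // at the cost of a slower build.
  return coll.ensureIndex(keys, { background: true });
}
```

Leaving the option out gives the default foreground build.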

So you must make sure that indexes have been created successfully. You can use db.currentOp() to check progress of ensureIndexes() while it is still creating indexes.
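A rough sketch of reading that progress, assuming the result of db.currentOp() has an inprog array whose entries may carry opid, ns, msg, and a progress document with done and total counters. The exact shape of the entries varies by server version, so treat the field handling below as an assumption; summarizeProgress is a hypothetical helper.

```javascript
// Hypothetical helper: list the in-progress operations that report progress.
// `currentOpResult` is assumed to be the document returned by db.currentOp().
function summarizeProgress(currentOpResult) {
  var ops = currentOpResult.inprog || [];
  return ops.filter(function (op) {
    // Keep only operations that expose usable progress counters.
    return op.progress && op.progress.total > 0;
  }).map(function (op) {
    return {
      opid: op.opid,
      ns: op.ns,
      msg: op.msg,
      percent: Math.floor(100 * op.progress.done / op.progress.total)
    };
  });
}
```

In the shell, summarizeProgress(db.currentOp()) could be called repeatedly while the build runs; an index build should show up among the entries, identifiable by its ns and msg fields.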
