Asynchronous initialization of Node.js module

I'd like to initialize module in asynchronous way and come up with couple of ideas. I need DB object with list of collections from Mongo and other data, but list of files in ./ will do for brevity.

I can't export function or class because I need require('db') to return same object everytime.


First and simplest what came to my mind is to assign module.exports to Object and populate it later:

var exports = {};
module.exports = exports;

require('fs').readdir('.', function(err, files) {
  exports.error = err;
  exports.files = files;
});

Bad thing — I don't really know from outside when list is ready and no good way to check for errors.


Second way I've comed up with is to inherit EventEmitter and notify everyone that DB is ready or error occured. If everything ok - keep going.

var events = require('events');
var util = require('util');

function Db() {
  events.EventEmitter.call(this);
  this.ready = false;
  this.files = null;
  this.initialize();
}

util.inherits(Db, events.EventEmitter);

Db.prototype.initialize = function() {
  if (this.ready)
    return this.emit('ready');

  var self = this;
  require('fs').readdir('.', function(err, files) {
    if (err)
      return self.emit('error', err);

    self.files = files;
    self.ready = true;
    self.emit('ready');
  });
};

module.exports = new Db();

And now I think that's more reasonable:

// db.js
var exports = {init: init};
module.exports = exports;

function init(callback) {
  callback = (typeof callback === 'function') ? callback : function() {};
  require('fs').readdir('.', function(err, files) {
    delete exports.init;
    exports.result = files; // that's pretty much what I need,
                            // so don't mind result slightly differs
                            // from previous cases
    callback(err);
  });
}
// main.js
var db = require('./db');

// check for `db.init` presence maybe...

db.init(function(err) {
  return err ? console.error('Bad!')
             : console.log(db); // It works!
});

What should I pick and why? How bad is such idea in general and my options in particular?

Thanks for feedback.

Answers


TL;DR: Use readdirSync() instead of readdir() if you're just planning to read local files at startup time. If you're planning to actually read data from remote database or do any I/O at runtime, use your option #2 - the callback. Explanation and code examples below.

Detailed explanation:

While at first this might seem like a module/dependecy/require-related question, it's really not. It's a generic question of how to handle asynchronous code. Let me explain:

require() is basically the only synchronous function widely used throughout node that deals with I/O (it requires other modules from filesystem). Synchronous means it actually returns it's data as return value, instead of calling a callback.

The most basic 101 rule in asynchronous programming is:

You can never take an asynchronous piece of code and create a synchronous API for it.

require uses a special synchronous version of readFile called readFileSync. Since modules are really only loaded at the start of the program, the fact that it blocks the node.js execution while it's reading the module is not a problem.

In your example however, you try to perform additional asynchronous I/O - readdir() done during the require stage. Thus, you either need to use synchronous version of this command or the API needs to change...

So there's the background to your problem.

You identified the two basic options:

  1. using a promise (which is essentially the same as your EventEmitter example)
  2. using a callback (your second example shows this well) and a third is:
  3. using a synchronous version of the readdir() command called readdirSync()

I would use the option #3 for simplicity reason - but only if you're planning to just read a couple files at startup time as your example implies. If later your DB module is actually going to connect to a database - or if you're planning to do any of this at runtime, jump the boat now and go with async API.

Not many people remember this anymore, but promises were actually the original default of how to handle async in node.js. In node 0.1.30 however promisses were removed and replaced by a standardized callback with the function(err, result) signature. This was done largely for simplicity reasons.

These days, vast majority of your async calls takes this standard callback as the last parameter. Your database driver does it, your web framework does it - it's everywhere. You should stay with the prevalent design and use it too.

The only reason to prefer promises or events is if you have multiple different results that can happen. For example a socket can be opened, receive data, be closed, flushed etc.

This is not your case. Your module always does the same (reads some files). So option #2 it is (unless you can stay synchronous).

Finally, here are the two winning options rewritten slightly:

Synchronous option: good just for local filesystem at startup time

// db.js
var fs = require('fs');
exports = fs.readdirSync('.');

// main.js
var db = require('./db');
// insert rest of your main.js code here

Asynchronous option: for when you want to use DBs etc.

// db.js
var fs = require('fs'), cached_files;

exports.init = function(callback) {
  if (cached_files) {
    callback(null, cached_files);
  } else {
    fs.readdir('.', function(err, files) {
      if (!err) {
        cached_files = files;
      }
      callback(err, files);
    });
  }
};

// main.js
require('./db').init(function(err, files) {
  // insert rest of your main.js code here
});

In general it's very bad idea to have any state in module. Modules should expose functions, not data (yes, this requires to change your code structure a bit). Just pass references to your data to a modules functions as parameters.

(edit: just realised that this is approach from your last example. My vote for it)

module1:

module.exports = function(params, callback) { ... }

module2:

var createSomething = require('module1');
module.exports = function(params, callback) { 
   ...
   var db = createSomething(params, function(err, res) {
       ...
       callback(err, res);
   }
}

main code:

var createSomethingOther = require('module2');
createSomethingOther(err, result) {
    // do stuff
}

On my side such module is a function that takes callback (and if internally configured with promises also returns promise (see https://github.com/medikoo/deferred));

The only problem with callback is that by convention it always should be invoked in nextTick, so even when you call your module function when all the data is gathered you should still call your callback in next tick with result set.


Need Your Help

Back button re-submit form data ($_POST)

php forms post button back

My problem is that the back button causes the browser to say something like "Page expired" when the previous page was created by a form.