Node.js 100% CPU - gettimeofday call

I've got a long running node.js process that sometimes jumps to 100% CPU and stops responding to requests. The most recent time it did this I attached strace to the process, and here's what I saw:

Process 19523 attached - interrupt to quit
gettimeofday({1394558574, 234192}, NULL) = 0
gettimeofday({1394558574, 235381}, NULL) = 0
gettimeofday({1394558574, 306460}, NULL) = 0
mmap(0x3edab823a000, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x3edab823a000
munmap(0x3edab823a000, 811008)          = 0
munmap(0x3edab8400000, 237568)          = 0
mmap(0x3edab8300000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3edab8300000
gettimeofday({1394558574, 316435}, NULL) = 0
gettimeofday({1394558574, 316576}, NULL) = 0
gettimeofday({1394558574, 316677}, NULL) = 0
gettimeofday({1394558574, 316775}, NULL) = 0
gettimeofday({1394558574, 316879}, NULL) = 0
gettimeofday({1394558574, 316978}, NULL) = 0
gettimeofday({1394558574, 317439}, NULL) = 0
gettimeofday({1394558574, 317537}, NULL) = 0
gettimeofday({1394558574, 318139}, NULL) = 0
gettimeofday({1394558574, 318234}, NULL) = 0
gettimeofday({1394558574, 318343}, NULL) = 0
gettimeofday({1394558574, 318437}, NULL) = 0
gettimeofday({1394558574, 318530}, NULL) = 0
gettimeofday({1394558574, 318624}, NULL) = 0
gettimeofday({1394558574, 319135}, NULL) = 0
gettimeofday({1394558574, 319648}, NULL) = 0
gettimeofday({1394558574, 319769}, NULL) = 0
gettimeofday({1394558574, 319975}, NULL) = 0
futex(0x7f5b380008c8, FUTEX_WAKE_PRIVATE, 1) = 1
gettimeofday({1394558574, 322266}, NULL) = 0
gettimeofday({1394558574, 322426}, NULL) = 0
gettimeofday({1394558574, 322520}, NULL) = 0
gettimeofday({1394558574, 322759}, NULL) = 0
gettimeofday({1394558574, 322853}, NULL) = 0
gettimeofday({1394558574, 322995}, NULL) = 0
futex(0x7f5b380008c8, FUTEX_WAKE_PRIVATE, 1) = 1
gettimeofday({1394558574, 417614}, NULL) = 0
gettimeofday({1394558575, 386566}, NULL) = 0
gettimeofday({1394558575, 387704}, NULL) = 0
gettimeofday({1394558575, 463410}, NULL) = 0
mmap(0x24cfd260f000, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x24cfd260f000
munmap(0x24cfd260f000, 987136)          = 0
munmap(0x24cfd2800000, 61440)           = 0
mmap(0x24cfd2700000, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x24cfd2700000

That's a lot of gettimeofday calls, and not much else! What might cause node.js to get stuck like this?

UPDATE: I upgraded from an older version of node.js to 10.29 (I think) and this went away. I've just upgraded to 10.33 and the problem has come back. I've made a little more progress on debugging it this time out. First strace:

$ sudo strace -p 11812 -c
Process 11812 attached - interrupt to quit
Process 11812 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000192           0      2780           gettimeofday
  0.00    0.000000           0      1390           getrusage
  0.00    0.000000           0        31           futex
  0.00    0.000000           0      1390           clock_gettime
------ ----------- ----------- --------- --------- ----------------
100.00    0.000192                  5591           total

From Node.js: How to attach to a running process and to debug the server with a console? I found out about attaching the node debugger to a running process, and here's where I got with that:

$ sudo kill -s SIGUSR1 11812
$ sudo node debug -p 11812
connecting... ok
break in timers.js:79
  77 }
  78
  79 function listOnTimeout() {
  80   var msecs = this.msecs;
  81   var list = this;
debug> bt
#0 timers.js:79:23

So it looks like it is definitely related to a timer, but I don't know how to move further up the stack to find out where in my code the problem is getting triggered.

Answers


My guess is that someone manually implemented a blocking "setTimeout". This could happen if someone doesn't want to release control of the main JS thread in an effort to prevent a potential race condition.

For obvious reasons this is a bad practice for production code, but I've used it on occasion in debugging to force asynchronous processes to execute in a specific order.

You could look for something silly like this:

var mockAsyncFunction = function (cb) {
    setTimeout(function () {
        cb(null, 'dummy_result')
    }, 5000);
};

var myResult = null;
mockAsyncFunction(function (err, result) {
    myResult = result;
});
var timeOut = 10000; // timeout in 10 sec.
var timeStart = new Date().getTime();


while (1) {
    if (new Date().getTime() - timeStart > 10000) {
        break;
    }
}
console.log('DONE');

or something more nefarious with nextTick recursion like:

var timeStart = new Date().getTime();

var recurseUntilDone = function () {
    if (new Date().getTime() - timeStart < 10000) {
        process.nextTick(recurseUntilDone);
    } else {
        console.log('Done recursing');
    }
};

recurseUntilDone();

Use node-inspector to be able to pause your code when the CPU is 100% - my bet is also on some badly implemented loop checking state until a certain time has passed, but they can be hard to find.

Attach the debugger with --debug when starting node (ie. node index.js --debug) and in a separate window run node-inspector. Use Chrome to connect to your debug session (the url is outputted from node-inspector command) and wait until the problem occurs and pause the execution and you should be able to find it.


We saw this too, in both production and development. Also looking for an answer. We started investigating the node run-time, but the problem is so infrequent it's not getting any priority. Because it's completely cpu bound, no system calls, it's hard capture with an strace.

It won't be timed loop checking state, because a new Date().getTime() loop does not make any calls to gettimeofday (in node v0.10.29; it just does a long series of nanosleeps in one pid and just futex calls in the other. Clever, actually). Same for Date.now().


don't know if it's related, but here's a simple way to crush a node process: use an array as a queue when it has data in it. The symptoms are very very slow processing rate, and 100% cpu use.

//a = [];
a = new Array();

for (i=0; i<87370; i++) a.push({});

for (i=0; i<1000000; i++) {
    a.shift();
    a.push({});
    if (i%1000 === 0) {
    process.stdout.write(".");
    //global.gc();
    }
}

// node v0.10.29: 10k:  0.15 sec
//        80k:  0.17 sec
//        87369: 0.17 sec
//            87370: instant 41k, then .9k per second
//        87400: Array: instant 41k, then .9k per second
//        87400: []: instant 28k, then .9k per second
//        87400: Array, pushing 1 or 'x': .9k per second from the start
// node v0.11.13: 10k: 1.94 sec
//        16683: 3.87 sec
//        16884: uniform 237.16 sec

I ran perf record -F 997 -g node-v0.11.13 --perf-basic-prof script.js but pretty much all the time is ascribed to Builtin_ArrayShift (5-6 different hex locations called from ~)

Note that the behavior has a toggle point, below which it's fast, above it's slow. The 0.10 and 0.11 behavior is different; 0.11 seems to have two toggle points (3 different speeds of dots). The toggle points don't seem to stay the same between groups of runs.

Does not matter if global.gc() is invoked or not (with the --expose_gc switch). Does not matter if the push/shift is in a continuable or in a blocking loop like above.

Edit: what's weird is that this is both data and loop count dependent. Shorter loops run faster (even though they're chunked into 1k batches).

Also, when this condition exists, shift/push to other (empty) lists is also slowed to the same .9k ops/sec (node v0.10.29), but curiously they can be overlapped: 10 concurrent shift/append loops will each insert .9k items/sec, ie a total of 9k/sec. (So there are resources for 10x more throughput -- an internal throttle then?)

Edit: not an internal throttle; the slow thread is using a solid chunk of 0.8 - 1.2 ms of cpu, and all other threads are blocked until it finishes. Since concurrent threads must yield periodically (I used setImmediate), they keep getting blocked behind the cpu hog and make correspondingly slow progress.


explicit delete from larger hashes or arrays, like cleanup of an internal data container, could also cause symptoms like this. The inherent node operation can be abysmally slow:

h = {}
for (i=0; i<200000; i++) h[i] = i;     // 25,000,000 / sec
for (i=0; i<1000; i++) delete h[i];    // 11,000 / sec
for (i=0; i<200000; i++) delete h[i];  // 7,700 / sec

a = new Array();
for (i=0; i<200000; i++) a[i] = i;     // 50,000,000 / sec
for (i=0; i<1000; i++) delete a[i];    // 10,000 / sec
for (i=0; i<200000; i++) delete a[i];  // 8,000 / sec
// and out of curiousity...
for (i=0; i<200000; i++) a[i];         // 250,000,000 / sec
for (i=1; i<200000; i++) a[i-1] = a[i]; // 180,000,000 / sec

deleting from back to front is 2x faster, but that's still very slow. (note: assigning into a new Array is 50m/s, but into [] is only 25m/s)

The slowdown is O(n^2): doubling the array size to 400k quadruples the runtime. Each delete operation is O(n) of the size of the hash/array.

The strace of a 400k run (100 seconds) shows no sytem calls to speak of, and does not match the gettimeofday() pattern seen above.

...
14:08:36.327886 munmap(0x44200000, 802816) = 0
14:08:36.328145 munmap(0xf7100000, 1183744) = 0
14:08:36.328483 clock_gettime(CLOCK_REALTIME, {1418584116, 328495018}) = 0
14:08:36.328540 clock_gettime(CLOCK_REALTIME, {1418584116, 328551680}) = 0
14:08:36.328581 clock_gettime(CLOCK_REALTIME, {1418584116, 328592628}) = 0
14:08:36.328633 futex(0x8e79a7c, FUTEX_WAKE_PRIVATE, 1) = 1
14:09:03.130336 clock_gettime(CLOCK_REALTIME, {1418584143, 130406483}) = 0
14:09:03.130479 clock_gettime(CLOCK_REALTIME, {1418584143, 130503357}) = 0
14:09:03.130560 futex(0x8e79a7c, FUTEX_WAKE_PRIVATE, 1) = 1
14:09:37.090127 clock_gettime(CLOCK_REALTIME, {1418584177, 90195108}) = 0
14:09:37.090271 clock_gettime(CLOCK_REALTIME, {1418584177, 90296760}) = 0
14:09:37.090355 futex(0x8e79a7c, FUTEX_WAKE_PRIVATE, 1) = 1
14:10:17.588552 mmap2(0x3f302000, 4231168, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x3f302000
14:10:17.588694 munmap(0x3f302000, 1040384) = 0
14:10:17.588768 munmap(0x3f709000, 8192) = 0
...

Need Your Help

Using '-webkit-overflow-scrolling: touch' hides content while scrolling/dragging

css webkit mobile-safari

as the title says I'm having the problem with content getting hidden while scrolling/dragging the content when using the CSS property -webkit-overflow-scrolling: touch.