How to invert a permutation array in numpy

Given a self-indexing (not sure if this is the correct term) numpy array, for example:

a = np.array([3, 2, 0, 1])

This represents this permutation (=> is an arrow):

0 => 3
1 => 2
2 => 0
3 => 1

I'm trying to make an array representing the inverse transformation without doing it "manually" in python, that is, I want a pure numpy solution. The result I want in the above case is:

array([2, 3, 1, 0])

Which is equivalent to

0 <= 3                0 => 2
1 <= 2       or       1 => 3
2 <= 0                2 => 1
3 <= 1                3 => 0

It seems so simple, but I just can't think of how to do it. I've tried googling, but haven't found anything relevant.


The inverse of a permutation p of np.arange(n) is the array of indices s that sort p, i.e.

p[s] == np.arange(n)

must be all true. Such an s is exactly what np.argsort returns:

>>> p = np.array([3, 2, 0, 1])
>>> np.argsort(p)
array([2, 3, 1, 0])
>>> p[np.argsort(p)]
array([0, 1, 2, 3])

Sorting is an overkill here. This is just a single-pass, linear time algorithm with constant memory requirement:

from __future__ import print_function
import numpy as np

p = np.array([3, 2, 0, 1])
s = np.empty(p.size, dtype=np.int32)
for i in np.arange(p.size):
    s[p[i]] = i

print('s =', s)

The above code prints

 s = [2 3 1 0]

as required.

The rest of the answer is concerned with the efficient vectorization of the above for loop. If you just want to know the solution, jump to the end of this answer.

(The original answer from Aug 27, 2014; the timings are valid for NumPy 1.8. An update with NumPy 1.11 follows later.)

A single-pass, linear time algorithm is expected to be faster than np.argsort; interestingly, the trivial vectorization (s[p] = xrange(p.size), see index arrays) of the above for loop is actually slightly slower than np.argsort as long as p.size < 700 000 (well, on my machine, your mileage will vary):

import numpy as np

def np_argsort(p):
    return np.argsort(p)

def np_fancy(p):
    s = np.zeros(p.size, p.dtype) # np.zeros is better than np.empty here, at least on Linux
    s[p] = xrange(p.size) 
    return s

def create_input(n):
    indices = np.arange(n, dtype = np.int32)
    return np.random.permutation(indices)

From my IPython notebook:

p = create_input(700000)
%timeit np_argsort(p)
10 loops, best of 3: 72.7 ms per loop
%timeit np_fancy(p)
10 loops, best of 3: 70.2 ms per loop

Eventually the asymptotic complexity kicks in (O(n log n) for argsort vs. O(n) for the single-pass algorithm) and the single-pass algorithm will be consistently faster after a sufficiently large n = p.size (threshold is around 700k on my machine).

However, there is a less straightforward way to vectorize the above for loop with np.put:

def np_put(p):
    n = p.size
    s = np.zeros(n, dtype = np.int32)
    i = np.arange(n, dtype = np.int32)
    np.put(s, p, i) # s[p[i]] = i 
    return s

Which gives for n = 700 000 (the same size as above):

p = create_input(700000)
%timeit np_put(p)
100 loops, best of 3: 12.8 ms per loop

This is a nice 5.6x speed up for next to nothing!

To be fair, np.argsort still beats the np.put approach for smaller n (the tipping point is around n = 1210 on my machine):

p = create_input(1210)
%timeit np_argsort(p)
10000 loops, best of 3: 25.1 µs per loop
%timeit np_fancy(p)
10000 loops, best of 3: 118 µs per loop
%timeit np_put(p)
10000 loops, best of 3: 25 µs per loop

This is most likely because we allocate and fill in an extra array (at the np.arange() call) with the np_put approach.

Although you didn't ask for a Cython solution, just out of curiosity, I also timed the following Cython solution with typed memoryviews:

import numpy as np
cimport numpy as np

def in_cython(np.ndarray[np.int32_t] p):    
    cdef int i
    cdef int[:] pmv
    cdef int[:] smv 
    pmv = p
    s = np.empty(p.size, dtype=np.int32)
    smv = s
    for i in xrange(p.size):
        smv[pmv[i]] = i
    return s


p = create_input(700000)
%timeit in_cython(p)
100 loops, best of 3: 2.59 ms per loop

So, the np.put solution is still not as fast as possible (ran 12.8 ms for this input size; argsort took 72.7 ms).

Update on Feb 3, 2017 with NumPy 1.11

Jamie, Andris and Paul pointed out in comments below that the performance issue with fancy indexing was resolved. Jamie says it was already resolved in NumPy 1.9. I tested it with Python 3.5 and NumPy 1.11 on the machine that I was using back in 2014.

def invert_permutation(p):
    s = np.empty(p.size, p.dtype)
    s[p] = np.arange(p.size)
    return s


p = create_input(880)
%timeit np_argsort(p)
100000 loops, best of 3: 11.6 µs per loop
%timeit invert_permutation(p)
100000 loops, best of 3: 11.5 µs per loop

A significant improvement indeed!


All in all, I would go with the

def invert_permutation(p):
    '''The argument p is assumed to be some permutation of 0, 1, ..., len(p)-1. 
    Returns an array s, where s[i] gives the index of i in p.
    s = np.empty(p.size, p.dtype)
    s[p] = np.arange(p.size)
    return s

approach for code clarity. In my opinion, it is less obscure than argsort, and also faster for large input sizes. If speed becomes an issue, I would go with the Cython solution.

I'd like to offer a tiny bit more background to larsmans correct answer. The reason why argsort is correct can be found when you use the representation of a permutation by a matrix. The mathematical advantage to a permutation matrix P is that the matrix "operates on vectors", i.e. a permutation matrix times a vector permutes the vector.

Your permutation looks like:

import numpy as np
a   = np.array([3,2,0,1])
N   = a.size
rows = np.arange(N)
P   = np.zeros((N,N),dtype=int)
P[rows,a] = 1

[[0 0 0 1]
 [0 0 1 0]
 [1 0 0 0]
 [0 1 0 0]]

Given a permutation matrix, we can "undo" multipication by multiplying by it's inverse P^-1. The beauty of permutation matrices is that they are orthogonal, hence P*P^(-1)=I, or in other words P(-1)=P^T, the inverse is the transpose. This means we can take the indices of the transpose matrix to find your inverted permutation vector:

inv_a = np.where(P.T)[1]
[2 3 1 0]

Which if you think about it, is exactly the same as finding the indices that sort the columns of P!

Need Your Help

Search on multiple collections in MongoDB

mongodb mongodb-query aggregation-framework

I know the theory of MongoDB and the fact that is doesn't support joins, and that I should use embeded documents or denormalize as much as possible, but here goes:

WebSocket on IE10 giving a SecurityError

javascript websocket internet-explorer-10

I am currently developing a website under IE10 (on Windows 8), using WebSockets in JavaScript. It runs fine under Firefox 18 and Chrome 25, but on IE10 I get a SecurityError when I establish the