Parallelization of calls to scipy RectBivariateSpline
I'm working on a python code where I need to evaluate a 2D spline at an arbitrary set of points many times. The code looks like this:
    spline = scipy.interpolate.RectBivariateSpline(...)
    for i in range(1000000):
        x_points, y_points = data.get_output_points(i)
        vals = spline.ev(x_points, y_points)
        """ do stuff with vals """
There is no overlap of the output points. I would like to parallelize this using threads or some kind of shared memory since data.get_output_points uses a lot of memory. Naively, I tried spawning 10 threads and giving them each 1/10 of that loop. However, this doesn't give me any speed-up over running with a single thread.
I profiled the code, and it is spending all of its time in fitpack2.py:674(__call__), which is the _BivariateSplineBase evaluation function. It seems like I'm running into some GIL issue which is preventing the threads from running independently.
How can I get around the GIL issue and parallelize this? Is there a way to call into the fitpack routines that will parallelize well, or a different spline that I could use? My input grid is uniform and oversampled, but my output points can be anywhere. I have tried using RegularGridInterpolator (linear interpolation) which has good enough, although not ideal, performance, but it parallelizes poorly using threads.
EDIT: Here is what I mean by naive thread parallelization:
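    import threading

    def worker(start, end):
        for i in range(start, end):
            x_points, y_points = data.get_output_points(i)
            vals = spline.ev(x_points, y_points)
            """ do stuff with vals """

    # Thread.start() returns None, so keep the Thread objects and start them separately.
    # range() excludes its end, so the second half starts at 500000, not 500001.
    t1 = threading.Thread(target=worker, args=(0, 500000))
    t2 = threading.Thread(target=worker, args=(500000, 1000000))
    t1.start()
    t2.start()
    t1.join()
    t2.join()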
There are multiple ways to run work in parallel in Python while avoiding the GIL, such as process-based parallelism with the multiprocessing module.
See here for more
And yes, you are hitting the GIL bottleneck.