Python multiprocessing load balancer
Short question: Is it possible to have N worker processes and a balancer process that finds a worker that is currently idle and passes a UnitOfWork to it?
Long question: Imagine a class like this, which will be subclassed for certain tasks:
    class UnitOfWork:
        def __init__(self, **some_starting_parameters):
            pass

        def init(self):
            # open connections, etc.
            pass

        def run(self):
            # do the job
            pass
Start the balancer and the worker processes:
    balancer = LoadBalancer()
    workers = balancer.spawn_workers(10)
Deploy work (the balancer should find an idle worker and pass the task to it; if every worker is busy, it should add the UOW to a queue and wait until a worker is free):
    balancer.work(UnitOfWork(some=parameters))
    # internally: find a free worker, pass the UOW to it, call uow.init() and uow.run()
Is this possible (or is it crazy)?
PS I'm familiar with the multiprocessing Process class and with process pools, but:

- Every Process instance starts a process (yep :) ), and I want a fixed number of workers
- I want a Process instance that can execute generic work
You don't need any smarts in the balancer; the Queue alone will do what you want. Throw each unit of work into the queue, and have the workers loop, taking a single work unit from the queue and processing it on each iteration. I don't think there's any problem passing an instance of UnitOfWork through the queue.
If you have a fixed amount of work to be done, you can create a "no more work to be done" work unit (a "poison pill") that tells a worker to shut down, and after all the regular work is put into the queue, put as many poison pills into the queue as you have workers.
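A minimal sketch of that pattern, assuming a picklable UnitOfWork like the one in the question (the worker function and the use of None as the poison pill are illustrative choices, not part of the question's API):

    import multiprocessing

    class UnitOfWork:
        def __init__(self, **params):
            self.params = params

        def init(self):
            pass  # open connections, etc.

        def run(self):
            print("processing", self.params)

    def worker(queue):
        # Loop forever, taking one work unit per iteration.
        while True:
            uow = queue.get()
            if uow is None:  # poison pill: time to shut down
                break
            uow.init()
            uow.run()

    if __name__ == "__main__":
        num_workers = 10
        queue = multiprocessing.Queue()
        procs = [multiprocessing.Process(target=worker, args=(queue,))
                 for _ in range(num_workers)]
        for p in procs:
            p.start()

        for i in range(100):              # enqueue the actual work
            queue.put(UnitOfWork(some=i))
        for _ in range(num_workers):      # one poison pill per worker
            queue.put(None)

        for p in procs:
            p.join()

Because all the workers pull from the same queue, an idle worker picks up the next unit automatically; there is no explicit load-balancing step at all.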
I suggest you take a look at multiprocessing.Pool(), because I believe it solves exactly this problem. It runs N worker processes, and as each worker finishes a task it is handed the next one. There is no need for poison pills; it is very simple.
I have always used the .map() method on the pool.
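For example, a sketch along these lines (the run_unit_of_work wrapper, the task list, and the UnitOfWork stub are illustrative, not a fixed API):

    import multiprocessing

    class UnitOfWork:
        def __init__(self, **params):
            self.params = params

        def init(self):
            pass  # open connections, etc.

        def run(self):
            print("processing", self.params)

    def run_unit_of_work(params):
        # One call handles one unit of work; the pool feeds the next
        # task to whichever worker process becomes free.
        uow = UnitOfWork(**params)
        uow.init()
        uow.run()

    if __name__ == "__main__":
        tasks = [{"some": i} for i in range(100)]
        pool = multiprocessing.Pool(processes=10)
        pool.map(run_unit_of_work, tasks)  # blocks until every task is done
        pool.close()
        pool.join()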
EDIT: Here is an answer I wrote to another question where I used multiprocessing.Pool().