Serving real-time data in a Python web application
I am writing a Python web application with the Flask framework, WSGIServer, and gevent-websocket.
I have a thread pool of workers doing heavy processing work, which then insert the completed data into a MongoDB database. I want to show users on the site a real-time stream of new data from MongoDB.
What I have done at the moment is open a socket to connect with the client and poll MongoDB for new data every 3 seconds as shown here:
Are there any limitations in the way this has been written? Are there more efficient or best-practice alternatives that would produce a more seamless stream for users? I want the application to be able to handle many more requests to the database.
There are two problems with your approach. First, each client connected to this Flask server polls the database separately, so with 100 clients connected you run 100 queries every 3 seconds. It is better to have one background thread poll the database every 3 seconds and pass the new data to the other threads. echo_socket could wait on a global Condition variable that the background thread notifies after each update.
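A minimal sketch of that single-poller idea, using only the standard library. The names `Broadcaster`, `fetch_new`, and `wait_for_update` are hypothetical, and `fetch_new` stands in for whatever MongoDB query you already run. It assumes that under gevent you have monkey-patched `threading` (or swapped in gevent's own primitives), and it keeps only the latest batch, so a handler that is slow to come back can miss an intermediate one.

```python
import threading


class Broadcaster:
    """One poller fetches new data; any number of handlers wait for it."""

    def __init__(self, fetch_new, interval=3.0):
        self._fetch_new = fetch_new   # hypothetical callable returning a list of new documents
        self._interval = interval
        self._cond = threading.Condition()
        self._stop = threading.Event()
        self._version = 0             # incremented once per non-empty batch
        self._latest = []

    def poll_once(self):
        """Run the query once and wake all waiting handlers if it returned data."""
        docs = self._fetch_new()
        if docs:
            with self._cond:
                self._latest = docs
                self._version += 1
                self._cond.notify_all()

    def run(self):
        """Body of the background thread: poll, then sleep until the next round."""
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval)

    def stop(self):
        self._stop.set()

    def wait_for_update(self, seen_version, timeout=None):
        """Block until a batch newer than seen_version exists, or until timeout."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > seen_version, timeout)
            if self._version > seen_version:
                return self._version, list(self._latest)
            return seen_version, []
```

You would start `run` once in a daemon thread at application startup. Inside echo_socket, each client then loops on `version, docs = broadcaster.wait_for_update(version)` and sends `docs` over its socket, so the database sees one query per interval regardless of how many clients are connected.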
The second problem is that you're short-polling MongoDB when you could be long-polling. With long-polling, the query waits on the server for new data instead of returning immediately, which lowers the latency between a message arriving in the database and your broadcasting it to users, and it reduces load on the server. See Rick Copeland's blog post on MongoDB pub/sub for inspiration.
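One common way to long-poll MongoDB is a tailable cursor on a capped collection. The sketch below is illustrative only: `open_tailable_cursor` and `follow` are hypothetical helper names, and it assumes your workers write to a capped collection and that `_id` values increase with insertion order (roughly true for default ObjectIds).

```python
import time


def open_tailable_cursor(collection, last_id=None):
    """Open a tailable, awaiting cursor on a capped PyMongo collection."""
    # Imported here so follow() below can be used without PyMongo installed.
    from pymongo import CursorType

    # Assumption: _id grows with insertion order, so this resumes after last_id.
    query = {} if last_id is None else {"_id": {"$gt": last_id}}
    return collection.find(query, cursor_type=CursorType.TAILABLE_AWAIT)


def follow(cursor, on_document, idle_sleep=1.0):
    """Hand each new document to on_document for as long as the cursor is alive."""
    while cursor.alive:
        for doc in cursor:
            on_document(doc)
        # The cursor returned no documents this round; pause briefly before retrying.
        time.sleep(idle_sleep)
```

When `follow` returns, the cursor has died and the caller should open a new one, passing the last `_id` it saw. Combined with a single background thread, `on_document` is where you would notify the waiting echo_socket handlers.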