Performance of event-source

I'm currently working on a large project that requires a server-sent events implementation. I've decided to use the EventSource transport for it, and started with a simple chat. Currently the client side listens only for a new-chat-message event, but the project will have many more events in the future. First of all, I'm really concerned about the server-side script and the loop in it; second, I'm not sure that using a MySQL database as storage (in this case, for chat messages) is actually good practice. The current loop sends out new messages as they appear in the database:

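// ORDER BY id so that the last row processed is always the newest one.
$statement = $connect->prepare("SELECT id, event, user, message FROM chat WHERE id > :last_event_id ORDER BY id");
while(TRUE) {
    try {
        $statement->execute(array(':last_event_id' => $lastEventId));
        $result = $statement->fetchAll();
        foreach($result as $row) {
            echo "id: " . $row['id'] . "\n";
            echo "event: " . $row['event'] . "\n";
            echo "data: |" . $row['user'] . "| >>> \n";
            echo "data: " . $row['message'] . "\n\n";
            // Remember the real id instead of incrementing a counter:
            // auto-increment ids can have gaps, so $lastEventId++ can re-send rows.
            $lastEventId = $row['id'];
        }
    } catch(PDOException $PDOEX) {
        echo $PDOEX->getMessage();
    }
    ob_flush();
    flush();
    usleep(10000); // 10000 microseconds = 0.01 s between polls
}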

From what I've read, such a loop is inevitable, and my task is to optimize its performance. Currently I prepare the statement once, outside the while() loop, and use a (hopefully) reasonable usleep().
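
The loop above writes the SSE wire format by hand and assumes $lastEventId is already set. A minimal sketch of two hypothetical helpers, formatSseEvent() and lastEventIdFrom() (names are illustrative, not from the question), shows two details worth handling: a message containing a newline must be split into several "data:" lines, and on reconnect the browser sends the last id it received in the Last-Event-ID header, which PHP exposes as $_SERVER['HTTP_LAST_EVENT_ID'].

```php
<?php
// Hypothetical helpers for the question's loop (illustrative names).

// Format one SSE event. A field ends at a newline, so multi-line data is
// split into several "data:" lines; the browser joins them back with "\n".
function formatSseEvent(int $id, string $event, string $data): string
{
    $out = "id: $id\n" . "event: $event\n";
    foreach (preg_split('/\r\n|\r|\n/', $data) as $line) {
        $out .= "data: $line\n";
    }
    return $out . "\n"; // a blank line dispatches the event
}

// Where $lastEventId can come from: the Last-Event-ID request header that
// EventSource sends when it reconnects. Defaults to 0 on a first connection.
function lastEventIdFrom(array $server): int
{
    return isset($server['HTTP_LAST_EVENT_ID']) ? (int) $server['HTTP_LAST_EVENT_ID'] : 0;
}
```

Used inside the loop, `$lastEventId = lastEventIdFrom($_SERVER);` before the while() and `echo formatSseEvent((int) $row['id'], $row['event'], "|" . $row['user'] . "| >>> \n" . $row['message']);` inside the foreach produce the same two data lines as the original echo statements.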

So, my questions for those who have experience with server-sent events:

  1. Is such a technique reasonable for moderately loaded websites (1000-5000 users online)?
  2. If so, is there any way to boost performance?
  3. Could the MySQL database be a bottleneck in this case?

I'd appreciate any help: the question is quite complex, and searching hasn't turned up any tips or ways to test it.

Answers


Will all 1000+ users be connected simultaneously? And are you using Apache with PHP? If so, I think the thing you should really be concerned about is memory: each user is holding open a socket, an Apache process, and a PHP instance. You'll need to measure this for your own setup, but if we say 20MB each, that is 20GB of memory for 1000 users. If you tighten things so each process is 12MB, that is still 12GB per 1000 users. (An m2.xlarge EC2 instance has 17GB of memory, so if you budget one of those per 500-1000 users I think you will be okay.)
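
The memory estimate above is simple multiplication; a small sketch makes it easy to rerun with your own measured per-process figure. It assumes one Apache/PHP process per connected user, uses decimal units (1 GB = 1000 MB) as the text does, and treats the 20 MB and 12 MB values as the estimates quoted above rather than measurements. The function names are hypothetical.

```php
<?php
// Rough memory budget for long-lived SSE connections under Apache + PHP,
// assuming one process per connected user. 20 MB and 12 MB are the
// estimates quoted in the answer, not measurements.
function memoryNeededGb(int $users, int $mbPerProcess): float
{
    return $users * $mbPerProcess / 1000; // decimal GB, as in the text
}

// How many such processes fit in a server's RAM (ignoring OS, MySQL, etc.).
function usersPerServer(int $serverMb, int $mbPerProcess): int
{
    return intdiv($serverMb, $mbPerProcess);
}

foreach ([20, 12] as $mb) {
    printf(
        "%d MB/process: %.0f GB per 1000 users, %d users on a 17 GB server\n",
        $mb,
        memoryNeededGb(1000, $mb),
        usersPerServer(17000, $mb)
    );
}
```

This prints 850 users at 20 MB and 1416 at 12 MB; because it ignores memory used by the OS and other services, a budget of 500-1000 users per 17 GB machine leaves some headroom.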

In contrast, with your 10 second poll time, CPU usage is very low. For the same reason, I would not imagine polling the MySQL DB will be the bottleneck, but at that level of use I would consider having each DB write also do a write to memcached. Basically, if you don't mind throwing a bit of hardware at it, your approach looks doable. It is not the most efficient use of memory, but if you are familiar with PHP it will probably be the most efficient use of programmer time.


UPDATE: Just saw the OP's comment and realized that usleep(10000) is 0.01 s, not 10 s. Oops! That changes everything:

  • Your CPU usage is now high!
  • You need a set_time_limit(0) at the top of your script: with such a tight loop you are going to hit the default 30-second execution time limit very quickly.
  • Instead of polling a DB, you should use a notification queue service.
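
The first bullet can be quantified. A back-of-the-envelope sketch, assuming every connected user runs their own copy of the question's loop and ignoring the time each query takes (so these are upper bounds); pollsPerSecond() is a hypothetical name:

```php
<?php
// Query load from the question's loop, assuming each connected user runs
// its own copy. usleep() takes microseconds; query time is ignored, so
// these figures are upper bounds.
function pollsPerSecond(int $sleepMicroseconds): float
{
    return 1000000 / $sleepMicroseconds;
}

$sleep = 10000; // the question's usleep(10000)
printf("usleep(%d) sleeps %.2f s between polls\n", $sleep, $sleep / 1000000);
foreach ([1000, 5000] as $users) {
    // Each open connection issues its own SELECT on every iteration.
    printf("%d users: up to %.0f queries/s\n", $users, $users * pollsPerSecond($sleep));
}
```

That is up to 100 SELECTs per second per user, or 100,000 to 500,000 per second for 1000-5000 users, compared with 0.1 per second per user if the sleep really were 10 seconds.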

I'd use the queue service instead of memcached, and you could either find something off the shelf or write something custom in PHP fairly easily. You can still keep MySQL as the main DB and have your queue service poll MySQL; the difference is that you only have one process polling it intensively, not one thousand. The queue service is a simple socket server that accepts a connection from each of your front-facing PHP scripts. Each time its polling finds a new message, it broadcasts it to all the clients that have connected to it. (There are different ways to architect it, but I hope that gives you the general idea.)
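
A minimal sketch of such a queue service, under several assumptions: it uses PHP's stream functions (stream_socket_server(), stream_select()), sends one newline-terminated message per line (so messages should be JSON-encoded or otherwise newline-free), and takes a hypothetical $fetchNewMessages callable standing in for the question's `SELECT ... WHERE id > :last_event_id` query. The names broadcast() and runQueueService() are illustrative.

```php
<?php
// Hypothetical broadcast step: write one message to every connected
// front-end script and drop connections whose write fails.
function broadcast(array $clients, string $message): array
{
    $alive = [];
    foreach ($clients as $key => $client) {
        // One message per line, so front-ends can read it with fgets().
        $written = @fwrite($client, $message . "\n");
        if ($written === false || $written === 0) {
            fclose($client); // client went away
        } else {
            $alive[$key] = $client;
        }
    }
    return $alive;
}

// Hypothetical main loop: ONE process polls MySQL (via $fetchNewMessages)
// and fans new messages out to every connected front-end script.
function runQueueService(string $address, callable $fetchNewMessages): void
{
    $server = stream_socket_server($address, $errno, $errstr);
    if ($server === false) {
        throw new RuntimeException("Cannot listen on $address: $errstr");
    }
    $clients = [];
    while (true) {
        // Wait up to 0.01 s for a new front-end connection, then poll the DB.
        $read = [$server];
        $write = null;
        $except = null;
        if (stream_select($read, $write, $except, 0, 10000) > 0) {
            $conn = stream_socket_accept($server);
            if ($conn !== false) {
                $clients[] = $conn;
            }
        }
        foreach ($fetchNewMessages() as $message) {
            $clients = broadcast($clients, $message);
        }
    }
}
```

Only this one process keeps the 0.01 s polling interval, so the database sees about 100 queries per second in total regardless of how many users are connected.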

Over in the front-facing PHP script, you use a socket_select() call with a 15-second timeout. It only wakes up when there is data (or when the timeout expires), so it uses zero CPU the rest of the time. (The 15-second timeout is so you can send SSE keep-alives.)
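
A sketch of the front-facing side, assuming the newline-per-message protocol of a queue service like the one described above. It uses stream_select(), the stream counterpart of socket_select(), so it needs no extra extension; the function names and the 'tcp://127.0.0.1:8000' address are hypothetical placeholders. On timeout it sends an SSE comment line (one starting with ":"), which the browser ignores but which keeps the connection alive.

```php
<?php
// Wait up to $timeoutSec for one newline-terminated message from the queue
// service. Returns the message without its newline, or null on timeout.
function waitForMessage($queueConn, int $timeoutSec): ?string
{
    $read = [$queueConn];
    $write = null;
    $except = null;
    $ready = stream_select($read, $write, $except, $timeoutSec);
    if ($ready === false) {
        throw new RuntimeException('stream_select() failed');
    }
    if ($ready === 0) {
        return null; // timed out: no data arrived
    }
    $line = fgets($queueConn);
    if ($line === false) {
        throw new RuntimeException('queue service closed the connection');
    }
    return rtrim($line, "\n");
}

// Hypothetical SSE endpoint; the queue address is a placeholder.
function runEventStream(string $queueAddress = 'tcp://127.0.0.1:8000'): void
{
    set_time_limit(0); // long-lived script: disable the execution time limit
    header('Content-Type: text/event-stream');
    header('Cache-Control: no-cache');
    $queueConn = stream_socket_client($queueAddress, $errno, $errstr);
    if ($queueConn === false) {
        throw new RuntimeException("Cannot reach queue service: $errstr");
    }
    while (true) {
        $message = waitForMessage($queueConn, 15);
        if ($message === null) {
            echo ": keep-alive\n\n"; // SSE comment, ignored by the browser
        } else {
            echo "data: " . $message . "\n\n";
        }
        if (ob_get_level() > 0) {
            ob_flush();
        }
        flush();
        if (connection_aborted()) {
            break; // browser went away
        }
    }
}
```

Between messages the script blocks inside stream_select() instead of sleeping and re-querying MySQL, which is where the CPU saving comes from.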


(Source for the 20MB and 12MB figures)

