Flatten/merge overlapping time intervals
I have a 'Service' table with millions of rows. Each row corresponds to a service provided by a staff in a given date and time interval (Each row has a unique ID). There are cases where a staff might provide services in overlapping time frames. I need to write a query that merges overlapping time intervals and returns the data in the format shown below.
I tried grouping by StaffID and Date fields and getting the Min of BeginTime and Max of EndTime but that does not account for the non-overlapping time frames. How can I accomplish this? Again, the table contains several million records so a recursive CTE approach might have performance issues. Thanks in advance.
ID StaffID Date BeginTime EndTime 1 101 2014-01-01 08:00 09:00 2 101 2014-01-01 08:30 09:30 3 101 2014-01-01 18:00 20:30 4 101 2014-01-01 19:00 21:00
StaffID Date BeginTime EndTime 101 2014-01-01 08:00 09:30 101 2014-01-01 18:00 21:00
Here is another sample data set with a query proposed by a contributor. http://sqlfiddle.com/#!6/bfbdc/3
The first two rows in the results set should be merged into one row (06:00-08:45) but it generates two rows (06:00-08:30 & 06:00-08:45)
I only came up with a CTE query as the problem is there may be a chain of overlapping times, e.g. record 1 overlaps with record 2, record 2 with record 3 and so on. This is hard to resolve without CTE or some other kind of loops, etc. Please give it a go anyway.
The first part of the CTE query gets the services that start a new group and are do not have the same starting time as some other service (I need to have just one record that starts a group). The second part gets those that start a group but there's more then one with the same start time - again, I need just one of them. The last part recursively builds up on the starting group, taking all overlapping services.
Here is SQLFiddle with more records added to demonstrate different kinds of overlapping and duplicate times.
I couldn't use ServiceID as it would have to be ordered in the same way as BeginTime.
;with flat as ( select StaffID, ServiceDate, BeginTime, EndTime, BeginTime as groupid from services S1 where not exists (select * from services S2 where S1.StaffID = S2.StaffID and S1.ServiceDate = S2.ServiceDate and S2.BeginTime <= S1.BeginTime and S2.EndTime <> S1.EndTime and S2.EndTime > S1.BeginTime) union all select StaffID, ServiceDate, BeginTime, EndTime, BeginTime as groupid from services S1 where exists (select * from services S2 where S1.StaffID = S2.StaffID and S1.ServiceDate = S2.ServiceDate and S2.BeginTime = S1.BeginTime and S2.EndTime > S1.EndTime) and not exists (select * from services S2 where S1.StaffID = S2.StaffID and S1.ServiceDate = S2.ServiceDate and S2.BeginTime < S1.BeginTime and S2.EndTime > S1.BeginTime) union all select S.StaffID, S.ServiceDate, S.BeginTime, S.EndTime, flat.groupid from flat inner join services S on flat.StaffID = S.StaffID and flat.ServiceDate = S.ServiceDate and flat.EndTime > S.BeginTime and flat.BeginTime < S.BeginTime and flat.EndTime < S.EndTime ) select StaffID, ServiceDate, MIN(BeginTime) as begintime, MAX(EndTime) as endtime from flat group by StaffID, ServiceDate, groupid order by StaffID, ServiceDate, begintime, endtime