T-SQL Master Itzik Ben-Gan covered different solutions in SQL Magazine quite a few years ago, and there are many other articles out there about it.
I thought I would give a current example we were solving where this came into play. For the ED Dashboard we have a few charts that break down different metrics by day. We currently do not know any specifics about the shift the provider may have worked, but we wanted to try and assign the cases to the appropriate start of the provider day as best as we could.
In the below example we can see that generally it is simple to assign the day by using the Contact Start Time of the case. However, Case 14/15 and 32/35 we really want assigned to the prior day since the provider was continuously seeing patients the entire time.
The SQL Script is attached below for you to run, but I will go through it here.
First, we will generate sample data based on above.
Second, in order to determine continuous hours we need to generate a table of dates by hour.
For our test I just need the hours for anything between our minimum considered dates (Contact Start Date) and maximum considered dates (Contact End Date).
The results look something like this with each hour represented and the number of hours. Number of hours is important here for us to determine the islands and it is simply the difference in hours between a known starting point (1/1/1900) and the Date Hour.
Next we determine the islands, which I will break down below
Our first query – CTE_DistinctProviderHours – determines all of the distinct hours for our provider between their associated cases contact start date and contact end dates. If no contact end date is available then they get credit for the contact start date hour only. For example Case 1 for provider one the distinct hours are 7am, 8am, 9am on April 1.
The next query – CTE_ProviderIslandAssignment – is where the islands are actually determined. We know our Number Of Hours represent the actual distinct hours the provider saw patients and it is incrementing based on a known rate, 1 hour. By sequencing all of the distinct hours for the provider we have a second value incrementing by one. When both the hours and the sequence are incrementing at the same rate we can determine the island by subtracting the sequence from the hour, as illustrated below.
Finally, if we group by the Provider and RowNumber above we can take the MIN and MAX of the NumberOfHours to have our ranges.
Using the ranges we can easily assign the date we want the cases reported to. If we wanted we could further aggregate by provider and case reporting date with the MIN Start Range and MAX End Range.
Sample Code: SQL Provider Islands.zip
Have you used gaps and island logic before?
Share your thoughts in the comments below.