Filling gaps in time-series data
When working with time-series data, there can be gaps in the data due to missing data or inactivity.
Typically, we don't want those gaps to exist when we query the data. In this case, the WITH FILL
clause can come in handy.
This guide discusses how to use WITH FILL
to fill gaps in your time-series data.
Setup
Imagine we've got the following table that stores metadata on images generated by a GenAI image service:
Let's import some records:
Querying by bucket
We're going to explore the images created between 00:24:03
and 00:24:04
on the 24th March 2023, so let's create some parameters for those points in time:
Next, we'll write a query that groups the data into 100ms buckets and returns the count of images created in that bucket:
The result set only includes the buckets where an image was created, but for time-series analysis, we might want to return each 100ms bucket, even if it doesn't have any entries.
WITH FILL
We can use the WITH FILL
clause to fill in these gaps.
We'll also specify the STEP
, which is the size of the gaps to fill.
This defaults to 1 second for DateTime
types, but we'd like to fill gaps of 100ms in length, so let's an interval of 100ms as our step value:
We can see that the gaps have been filled with 0 values in the count
column.
WITH FILL...FROM
There is, however, still a gap at the beginning of the time range, which we can fix by specifying FROM
:
We can see from the results that the buckets from 00:24:03.000
to 00:24:03.500
all now appear.
WITH FILL...TO
We're still missing some buckets from the end of the time range though, which we can fill by providing a TO
value.
TO
is not inclusive, so we'll add a small amount to the end time to make sure that it's included:
The gaps have all now been filled and we have entries for every 100 ms from 00:24:03.000
to 00:24:05.000
.
Cumulative count
Let's say we now want to keep a cumulative count of the number of images created across the buckets.
We can do this by adding a cumulative
column, as shown below:
The values in the cumulative column aren't working how we'd like them to.
WITH FILL...INTERPOLATE
Any rows that have 0
in the count
column also have 0
in the cumulative column, whereas we'd rather it use the previous value in the cumulative
column.
We can do this by using the INTERPOLATE
clause, as shown below:
That looks much better.
And now to finish it off, let's add a bar chart using the bar
function, not forgetting to add our new column to the INTERPPOLATE
clause.