Application time series with postgres materialized views
27 Jul 2015
data
postgresql
sql
bi
When building an application that manages a huge amount of data, it’s often a requirement to provide to the users a way to visualize how data grow over the time.
Let’s consider we are making a payment processing app for merchants with the following database schema:
With few test data:
On the merchant dashboard, we want to display a wonderful chart showing them the total amount of transaction per month for the whole year.
The natural way to extract that kind of data is to consider aggregating over the normalized application schema:
That results in something like:
But the query took 1068.265 ms to execute, which is really too slow for a web app…
Store a time serie within a materialized view:
As in our dashboard we also want to allow the user to zoom in a specific month and a specific day, we will then consider that the unit serie will be: total amount / merchant / day.
So the idea here is to create a (materialized) view to store our time serie:
And a unique index on it:
Now, retrieving our per month data for a whole year (with total) is as simple as:
That results in:
within 1.906 ms!
Or, for example, to query transaction amount per day for last month with total:
That returns:
within 0.113 ms.
Refresh strategy
Unfortunately, postgresql still doesn’t provide a way to partially refresh a view, but 9.4 introduced a concurrently parameter that allows to run a non blocking refesh:
Scale up considerations
put the views in physical volumes separated from the application tables.
query the views from dedicated slave nodes.
make per time granularity views, e.g.: per hour, per day, per month.