Cumulative Monthly Totals and Postgresql
I am trying to calculate number of users, cumulatively for the dellstore2 database. Looking at answers here, and other forums, I used this
select date_trunc('month',orderdate), sum(count(distinct(customerid))) over (order by date_trunc('month',orderdate)) from orders group by date_trunc('month',orderdate)
2004-01-01 00:00:00.0 979 2004-02-01 00:00:00.0 1,952 2004-03-01 00:00:00.0 2,922 2004-04-01 00:00:00.0 3,898 2004-05-01 00:00:00.0 4,873 2004-06-01 00:00:00.0 5,846 2004-07-01 00:00:00.0 6,827 2004-08-01 00:00:00.0 7,799 2004-09-01 00:00:00.0 8,765 2004-10-01 00:00:00.0 9,745 2004-11-01 00:00:00.0 10,710 2004-12-01 00:00:00.0 11,681
Each month is
979 973 970 976 975 973 981 972 966 980 965 971
It seems to be totaling fine, looking at the first few items. But when I ran
select count(distinct(customerid)) from orders
for the entire thing, I get
which does not agree with the last item in the first output 11,681. I guess the calculation above cannot determine uniqueness across months. What is the fastest way for this calculation, preferably without using self-joins?
Instead of selecting directly from orders, you could use a subquery like so:
SELECT OrderDate, SUM(COUNT(DISTINCT customerid)) OVER (ORDER BY OrderDate) FROM ( SELECT CustomerID, DATE_TRUNC('MONTH', MIN(OrderDate)) AS OrderDate FROM Orders GROUP BY CustomerID ) AS Orders GROUP BY OrderDate
I think this would work as required.
If you still needed both methods (i.e. distinct and running total) you could use this:
SELECT OrderDate, COUNT(DISTINCT CustomerID) AS MonthTotal, SUM(COUNT(DISTINCT customerid)) OVER (ORDER BY OrderDate) AS CumulativeTotal, SUM(COUNT(DISTINCT CASE WHEN OrderNumber = 1 THEN customerid END)) OVER (ORDER BY OrderDate) AS CumulativeDistinctTotal FROM ( SELECT CustomerID, OrderDate, ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY OrderDate) AS OrderNumber FROM Orders ) AS Orders GROUP BY OrderDate