SQL query not finding correct answer
I am trying to figure out website visits. Every visit within 30 minutes should count as one visit for that user.
My table looks like this
TimeUser, Userid, OrderID 10/7/2013 14:37:14 _26Tf-0PjaS0dpiZXB61Rg 151078706 10/7/2013 14:39:59 _26Tf-0PjaS0dpiZXB61Rg 151078706 10/7/2013 14:40:35 _26Tf-0PjaS0dpiZXB61Rg 151078706 10/11/2013 0:09:23 _2MrGz4L_d5AF3UHpP-oJQ 151078706 10/2/2013 20:55:05 _4Pb2wEwiQomUny_XwVuvQ 151078706 10/2/2013 20:55:06 _4Pb2wEwiQomUny_XwVuvQ 151078706 10/2/2013 20:55:06 _4Pb2wEwiQomUny_XwVuvQ 151078706
In this case 151078706 should return 3 visits. I think my SQL query looks right, but when I check my answer with my Excel created Visits number, some of orders off by 5%. I am hundred percent sure Excel numbers are correct.
Here is what I have so far. If anyone sees any issue with my query please correct me. And also if there any other better ways to find visits?
SET @row_num=0, @temp_row=1; SELECT orderidtable.orders, count(orderidtable.users) FROM (SELECT temptab.temprow, temptab.userid users, temptab.orderid orders, temptab.TimeUser FROM (SELECT @row_num := @row_num + 1 AS rownumber, TimeUser, userid, orderid FROM order.order_dec ORDER BY orderid, userid, timeuser) subtable , (SELECT @temp_row:= @temp_row+1 AS temprow, Timeuser, userid, orderid FROM ORDER.order_dec ORDER BY orderid, userid, timeuser) temptab WHERE (subtable.rownumber=temptab.temprow AND abs(Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser))>=1800) OR (subtable.rownumber=temptab.temprow AND subtable.userid<>temptab.userid) OR (subtable.rownumber=temptab.temprow AND subtable.orderid<>temptab.orderid)) orderidtable GROUP BY orderidtable.orders
Numbering the rows is a right strategy; your query is going wrong in where condition.
Algorithm to solve it would be:
- Number the rows ordering by orderid, userid, timeuser. Make two copies (subtable and temptable) of this dataset as you are already doing.
Join these tables on following condition:
subtable.rownumber =temptab.temprow + 1
What we trying to do here is to join the tables in a manner such that a row of subtable joins with a row of temptable with rownumber 1 lesser than its own. We are doing it to be capable of comparing consecutive time of visits of an user to an Ad. (You have already done it by setting @row_num=0, @temp_row=1). This is the only condition we should apply to the JOIN.
Now in the SELECT statement use CASE statement like below
(CASE WHEN subtable.orderid = temptable.orderid AND subtable.userid = temptable.userid AND (Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser))< 1800 THEN 0 ELSE 1) As IsVisit
Now in an outer query GROUP BY order_id and in SELECT sum up IsVisit.
Let me know should you need more clarity or let me know if it worked.
Addendum: From the previous query you can try replacing the where condition as subtable.rownumber = temptab.temprow + 4 and in SELECT statement replace the CASE statement of above query with the following:
(CASE WHEN subtable.orderid = temptable.orderid AND subtable.userid = temptable.userid AND (Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser))< 900 THEN 1 ELSE 0) As IsVisit
Take UNION of the result set returned by previous query and this one, and then apply GROUP BY.
One issue I see: Your query is overly complex. What about this?
Now then, both your original and this query will err when there's a visit near midnight, and another visit right shortly after it - in this case, both queries will count them as 2 visits when they really should be counted as one, if I understood your request correctly. From this simplified query, though, it should be easy for you to do the required change.
SELECT orderidtable.OrderID, COUNT(orderidtable.UserID) visits FROM ( SELECT Timeuser, Userid, OrderID FROM order.order_dec SubTab1 WHERE NOT EXISTS ( SELECT 1 FROM order.order_dec SubTab2 WHERE SubTab1.OrderID = SubTab2.OrderID AND SubTab2.TimeUser > SubTab2.TimeUser AND Time_To_Sec(SubTab2.TimeUser) BETWEEN Time_To_Sec(SubTab1.OrderID) AND Time_To_Sec(SubTab1.OrderID)+1800 ) ) orderidtable GROUP BY orderidtable.OrderID
I think just one time table full scan is sufficient for what you want as follows.
You can test here. http://www.sqlfiddle.com/#!2/a5dbcd/1.
Although my Query is not tested on many sample data, I think minor change is needed if it has bugs.
SELECT MAX(current_uv) AS uv FROM ( SELECT orderid, userid, timeuser, IF(orderid != @prev_orderid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser, @prev_orderid := orderid AS prev_orderid, IF(userid != @prev_userid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser2, @prev_userid := userid AS prev_userid, IF(TO_SECONDS(timeuser) - @prev_timeuser > 1800, @current_uv := @current_uv + 1, @current_uv) AS current_uv, @prev_timeuser := TO_SECONDS(timeuser) AS prev_timeuser3 FROM order_dec, (SELECT @prev_orderid := 0, @prev_userid = '', @prev_timeuser := 0, @current_uv := 0) init ORDER BY orderid, userid, timeuser ) x;