SQL query not finding correct answer

I am trying to figure out website visits. Every visit within 30 minutes should count as one visit for that user.

My table looks like this

TimeUser,             Userid,    OrderID

10/7/2013 14:37:14  _26Tf-0PjaS0dpiZXB61Rg  151078706
10/7/2013 14:39:59  _26Tf-0PjaS0dpiZXB61Rg  151078706
10/7/2013 14:40:35  _26Tf-0PjaS0dpiZXB61Rg  151078706
10/11/2013 0:09:23  _2MrGz4L_d5AF3UHpP-oJQ  151078706
10/2/2013 20:55:05  _4Pb2wEwiQomUny_XwVuvQ  151078706
10/2/2013 20:55:06  _4Pb2wEwiQomUny_XwVuvQ  151078706
10/2/2013 20:55:06  _4Pb2wEwiQomUny_XwVuvQ  151078706

In this case 151078706 should return 3 visits. I think my SQL query looks right, but when I check my answer with my Excel created Visits number, some of orders off by 5%. I am hundred percent sure Excel numbers are correct.

Here is what I have so far. If anyone sees any issue with my query please correct me. And also if there any other better ways to find visits?

SET @row_num=0,
    @temp_row=1;


SELECT orderidtable.orders,
       count(orderidtable.users)
FROM
  (SELECT temptab.temprow,
          temptab.userid users,
          temptab.orderid orders,
          temptab.TimeUser
   FROM
     (SELECT @row_num := @row_num + 1 AS rownumber, TimeUser,
                                                    userid,
                                                    orderid
      FROM order.order_dec
      ORDER BY orderid,
               userid,
               timeuser) subtable ,
     (SELECT @temp_row:= @temp_row+1 AS temprow, Timeuser,
                                                 userid,
                                                 orderid
      FROM
      ORDER.order_dec
      ORDER BY orderid,
               userid,
               timeuser) temptab
   WHERE (subtable.rownumber=temptab.temprow
          AND abs(Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser))>=1800)
     OR (subtable.rownumber=temptab.temprow
         AND subtable.userid<>temptab.userid)
     OR (subtable.rownumber=temptab.temprow
         AND subtable.orderid<>temptab.orderid)) orderidtable
GROUP BY orderidtable.orders

Answers


Numbering the rows is a right strategy; your query is going wrong in where condition.

Algorithm to solve it would be:

  1. Number the rows ordering by orderid, userid, timeuser. Make two copies (subtable and temptable) of this dataset as you are already doing.
  2. Join these tables on following condition:

    subtable.rownumber =temptab.temprow + 1 
    

    What we trying to do here is to join the tables in a manner such that a row of subtable joins with a row of temptable with rownumber 1 lesser than its own. We are doing it to be capable of comparing consecutive time of visits of an user to an Ad. (You have already done it by setting @row_num=0, @temp_row=1). This is the only condition we should apply to the JOIN.

  3. Now in the SELECT statement use CASE statement like below

    (CASE WHEN subtable.orderid = temptable.orderid AND subtable.userid = temptable.userid AND (Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser))< 1800 THEN 0 ELSE 1) As IsVisit

  4. Now in an outer query GROUP BY order_id and in SELECT sum up IsVisit.

Let me know should you need more clarity or let me know if it worked.

Addendum: From the previous query you can try replacing the where condition as subtable.rownumber = temptab.temprow + 4 and in SELECT statement replace the CASE statement of above query with the following:

 (CASE WHEN subtable.orderid = temptable.orderid AND subtable.userid = temptable.userid AND (Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser))< 900 THEN 1
      ELSE 0) As IsVisit

Take UNION of the result set returned by previous query and this one, and then apply GROUP BY.


One issue I see: Your query is overly complex. What about this?

Now then, both your original and this query will err when there's a visit near midnight, and another visit right shortly after it - in this case, both queries will count them as 2 visits when they really should be counted as one, if I understood your request correctly. From this simplified query, though, it should be easy for you to do the required change.

SELECT orderidtable.OrderID, COUNT(orderidtable.UserID) visits
FROM (
    SELECT Timeuser, Userid, OrderID
      FROM order.order_dec SubTab1
     WHERE NOT EXISTS (
            SELECT 1 FROM order.order_dec SubTab2
             WHERE SubTab1.OrderID  = SubTab2.OrderID
               AND SubTab2.TimeUser > SubTab2.TimeUser
               AND Time_To_Sec(SubTab2.TimeUser)
                   BETWEEN Time_To_Sec(SubTab1.OrderID)
                       AND Time_To_Sec(SubTab1.OrderID)+1800
    )
) orderidtable
GROUP BY orderidtable.OrderID

I think just one time table full scan is sufficient for what you want as follows.

You can test here. http://www.sqlfiddle.com/#!2/a5dbcd/1.

Although my Query is not tested on many sample data, I think minor change is needed if it has bugs.

SELECT MAX(current_uv) AS uv
FROM (
  SELECT orderid, userid, timeuser,
    IF(orderid != @prev_orderid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser,
    @prev_orderid := orderid AS prev_orderid,

    IF(userid != @prev_userid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser2,
    @prev_userid := userid AS prev_userid,

    IF(TO_SECONDS(timeuser) - @prev_timeuser > 1800, @current_uv := @current_uv + 1, @current_uv) AS current_uv,
    @prev_timeuser := TO_SECONDS(timeuser) AS prev_timeuser3
  FROM order_dec,
    (SELECT @prev_orderid := 0, @prev_userid = '', @prev_timeuser := 0, @current_uv := 0) init
  ORDER BY orderid, userid, timeuser
) x;

Need Your Help

Print series of prime numbers in python

python primes series

I am trying to learn Python programming, and I'm pretty new at this.

multiple media queries declaration on responsive design

css css3 media-queries

@media only screen and (min-device-width: 480px) {