sql query with subqueries optimization

I'm a beginner in SQL and I'm having an issue with my subqueries below. The reason I set up the subquery is I need to order and group a column differently than how data is sorted on the main table. Are there any other ways to optimize this query? It has been timing out and has been running for more than 30 minutes. I'd like to avoid rescanning the piwik_log_visit table as much as possible. Is there a way to Order By or Group By specific columns only? I appreciate any help. Thank you!

Set @theDate = cast('2015-11-26 08:00:00' as datetime);

SELECT  t2.idorder AS 'Order ID',
        (
            SELECT  COALESCE(NULLIF(t3.referer_name,''), 'Direct')
            FROM    piwik_log_visit t3
            WHERE   conv(hex(t3.idvisitor), 16, 10) = conv(hex(t2.idvisitor), 16, 10) 
            AND     t3.visit_first_action_time >= (@theDate - INTERVAL 32 DAY)
            ORDER BY t3.visit_last_action_time DESC
            limit 1
        ) AS 'Referrer (Last)',
        (
            SELECT  COALESCE(NULLIF(t4.referer_name,''), 'Direct')
            FROM    piwik_log_visit t4
            WHERE   inet_ntoa(conv(hex(t4.location_ip), 16, 10)) = inet_ntoa(conv(hex(t1.location_ip), 16, 10)) 
            AND     t4.visit_first_action_time >= (@theDate - INTERVAL 32 DAY)
            GROUP BY inet_ntoa(conv(hex(t4.location_ip), 16, 10))
            ORDER BY t4.visit_first_action_time
            limit 1
        ) AS 'Referrer (IP:First)',
        t1.referer_url AS 'Referrer URL'
FROM    piwik_log_visit t1, 
        piwik_log_conversion t2
WHERE   conv(hex(t1.idvisitor), 16, 10) = conv(hex(t2.idvisitor), 16, 10) 
AND     t2.idorder IS NOT NULL 
AND     t2.server_time BETWEEN '2015-11-25 07:59:59' AND '2015-11-26 08:00:00' 
AND     t1.visit_first_action_time >= (@theDate - INTERVAL 32 DAY)
GROUP BY t2.idorder

Explain plan:

http://screencast.com/t/3loUAUkTe

Answers


The main reason for the poor performance is because of the join conditions between the tables. They are not optimizable with index due to data conversion on the columns involved. Without index, the joins become Cartesian joins that are then filtered by evaluating the join expression.

Why do you need to do the conversion on t1.idvisitor and t2.idvisitor? Changing the join conditions to simply

WHERE t1.idvisitor = t2.idvisitor

will help greatly. Same with t3 and t4.


Try without conv(hex(), ,).

Set @theDate = cast('2015-11-26 08:00:00' as datetime); SELECT t2.idorder AS 'Order ID', ( SELECT COALESCE(NULLIF(t3.referer_name,''), 'Direct') FROM piwik_log_visit t3 WHERE t3.idvisitor = t2.idvisitor AND t3.visit_first_action_time >= (@theDate - INTERVAL 32 DAY) ORDER BY t3.visit_last_action_time DESC limit 1 ) AS 'Referrer (Last)', ( SELECT COALESCE(NULLIF(t4.referer_name,''), 'Direct') FROM piwik_log_visit t4 WHERE t4.location_ip = t1.location_ip AND t4.visit_first_action_time >= (@theDate - INTERVAL 32 DAY) GROUP BY t4.location_ip ORDER BY t4.visit_first_action_time limit 1 ) AS 'Referrer (IP:First)', t1.referer_url AS 'Referrer URL' FROM piwik_log_visit t1, piwik_log_conversion t2 WHERE t1.idvisitor = t2.idvisitor AND t2.idorder IS NOT NULL AND t2.server_time BETWEEN '2015-11-25 07:59:59' AND '2015-11-26 08:00:00' AND t1.visit_first_action_time >= (@theDate - INTERVAL 32 DAY) GROUP BY t2.idorder


Need Your Help

Count time intervals for events greater than threshold

matlab intervals threshold

I am stuck at something in MATLAB and would appreciate if someone can help. Here is what I would like to do. I have a velocity field data w.r.t. time (see figure attached).

How to escape single quotes in MySQL

mysql string quotes

How do I insert a value in MySQL that consist of single or double quotes. i.e