RANK conversion of MS SQL to MYSQL
I am converting our project database from SQL Server to MySQL, the DB conversion has done already.
We have code as below to identify duplicate records based on hashcode and update them as duplicate.
Rank function in MySQL ([Rank function in MySQL) need rank based on age which will start with 1 and increment by 1 for each record. But for me Rank for each hascode should start from 1 and increment by 1 for same hascode, if new hascode comes Rank should start from 1.
update table set Duplicate=1 WHERE id IN ( SELECT id FROM ( select RANK() OVER (PARTITION BY Hashcode ORDER BY Date asc) R,* from table )A where R!=1 )
Below is table structure
CREATE TABLE TBL ( id int(11) NOT NULL AUTO_INCREMENT, FileName varchar(100) DEFAULT NULL, date datetime DEFAULT NULL, hashcode varchar(255) DEFAULT NULL, FileSize varchar(25) DEFAULT NULL, IsDuplicate bit(1) DEFAULT NULL, IsActive bit(1) DEFAULT NULL PRIMARY KEY (`id`) )
Please help me to migrate this code to MYSQL.
You don't need to use enumeration for this logic. You just want to set the duplicate flag on everything that is not the minimum date for the hashcode:
update table t join (select hashcode, min(date) as mindate from table t group by hashcode ) tt on t.hashcode = tt.hashcode and t.date > tt.mindate set t.Duplicate = 1;
MySQL features a rather unique way to delete duplicates:
alter ignore table YourTable add unique index ux_yourtable_hashcode (hashcode);
The trick here is in the ignore option:
If IGNORE is specified, only one row is used of rows with duplicates on a unique key. The other conflicting rows are deleted.
But there are also other ways. Based on your comment, there is an auto_increment column called id. Since this column is unique and not null, you can use it to distinguish duplicates. You'd need a temporary table to work around the cant specify target table TBL for update in FROM clause error:
create temporary table tmp_originals (id int); insert tmp_originals (id) select min(id) from YourTable group by hashcode; update YourTable set Duplicate = 1 where id not in (select id from tmp_originals);
The group by query selects the lowest id per group of rows with the same hashcode.