How to detect duplicate rows in a SQL Server table?
What is the most efficient way to detect duplicates in a 10 column / 50K row table? I'm using MSSQL 8.0
To show an example of what others have been describing:
SELECT Col1, -- All of the columns you want to dedupe on Col2, -- which is not neccesarily all of the columns Col3, -- in the table Col4, Col5, Col6, Col7, Col8, Col9, Col10 FROM MyTable GROUP BY Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10 HAVING COUNT(*) > 1
You can use "group by" on all columns and then count(*)>1
Select * From Table Group By [List all fields in the Table here] Having Count(*) > 1
To detect, just group by as Guge said.
select fieldA, fieldB, count(*) from table group by fieldA, fieldB having count(*) > 1
If you want to delete dupes... pseudo....
select distinct into a temp table truncate original table select temp table back into original table
With truncate you may run into problems if you have FK constraints, so be smart about dropping constraints and making sure you don't orphan records.
In addition to the suggestions provided, I would then go to the effort of preventing duplicates in the future, rather than trying to locate them later.
This is done using unique indexes on columns (or groups of columns) that are supposed to be unique. Remember that data in the database can be modified from other locations other than through the specific app that you are working on, so it's best to define what is and isn't allowed in a table at the DB level.