Delete duplicates but keep first record

Published: {ts '2020-08-11 00:00:00'}
Author: Steven Neiland
Site Url: http://www.neiland.net/article/delete-duplicates-but-keep-first-record/

Last week I posted about getting the most recent entry in each group of records. This week I want to expand on that by showing how you can use the same technique to remove duplicate entries but saving the first entry.

Sample Data

Again our sample data:

Id	Name	Favorite_Color	Date_Entered
1	John	Red	2019-08-01
2	John	Red	2019-07-01
3	John	Red	2019-09-01
3	John	Blue	2019-09-01
4	Peter	Red	2020-07-01
5	Peter	Red	2020-08-01

Desired Result

This time instead of returning the most recent entry for each group as a resultset we want to actually delete from the table any records that are duplicated (in this case have columns the same name and color entries) but keeping the oldest entry so that our database table becomes this:

Id	Name	Favorite_Color	Date_Entered
2	John	Red	2019-07-01
3	John	Blue	2019-09-01
4	Peter	Red	2020-07-01

Solution

To get the desired result we need to identify all records that are duplicated which in this case is all duplicate combinations of the name and favorite_color columns. We partition and number the rows sorting by date_entered to place the oldest one first and then use the CTE to delete those rows numbered higher than 1.

We could also use the Id column the same way since in this case it is an identity column but the Id could just as easily be a UUID so I wanted to show using an actual date column.


WITH CTE AS (
	SELECT 
		*,
		ROW_NUMBER() OVER(PARTITION BY name, favorite_color ORDER BY Date_Entered ASC) AS DuplicateCount
	FROM 
		myTable
)
DELETE FROM CTE WHERE DuplicateCount > 1