Delete duplicates but keep first record

Author: Steven Neiland
Published:

Warning: This blog entry was written two or more years ago. Therefore, it may contain broken links, out-dated or misleading content, or information that is just plain wrong. Please read on with caution.

Last week I posted about getting the most recent entry in each group of records. This week I want to expand on that by showing how you can use the same technique to remove duplicate entries but saving the first entry.

Sample Data

Again our sample data:

Id Name Favorite_Color Date_Entered
1JohnRed2019-08-01
2JohnRed2019-07-01
3JohnRed2019-09-01
3JohnBlue2019-09-01
4PeterRed2020-07-01
5PeterRed2020-08-01

Desired Result

This time instead of returning the most recent entry for each group as a resultset we want to actually delete from the table any records that are duplicated (in this case have columns the same name and color entries) but keeping the oldest entry so that our database table becomes this:

Id Name Favorite_Color Date_Entered
2JohnRed2019-07-01
3JohnBlue2019-09-01
4PeterRed2020-07-01

Solution

To get the desired result we need to identify all records that are duplicated which in this case is all duplicate combinations of the name and favorite_color columns. We partition and number the rows sorting by date_entered to place the oldest one first and then use the CTE to delete those rows numbered higher than 1.

We could also use the Id column the same way since in this case it is an identity column but the Id could just as easily be a UUID so I wanted to show using an actual date column.

WITH CTE AS (
      SELECT
            *,
            ROW_NUMBER() OVER(PARTITION BY name, favorite_color ORDER BY Date_Entered ASC) AS DuplicateCount
      FROM
            myTable
)
DELETE FROM CTE WHERE DuplicateCount > 1

Reader Comments

Mark Gregory's Gravatar
Mark Gregory
Tuesday, August 11, 2020 at 4:22:28 PM Coordinated Universal Time

Just had one of those "ahhhh" moments, wondering why I have been doing this any of several ways that are more convoluted. Your solution is the obvious and clean one. Better than the 'count all and group, stuff in to temp table, delete from join temp table where count >1' kind of approach.

  • Please keep comments on-topic.
  • Please do not post unrelated questions or large chunks of code.
  • Please do not engage in flaming/abusive behaviour.
  • Comments that contain advertisments or appear to be created for the purpose of link building, will not be published.

Archives Blog Listing