Print Email

Author: Steven Neiland
Published: 11-Aug-2020

Warning: This blog entry was written two or more years ago. Therefore, it may contain broken links, out-dated or misleading content, or information that is just plain wrong. Please read on with caution.

Last week I posted about getting the most recent entry in each group of records. This week I want to expand on that by showing how you can use the same technique to remove duplicate entries but saving the first entry.

Sample Data

Again our sample data:

Id	Name	Favorite_Color	Date_Entered
1	John	Red	2019-08-01
2	John	Red	2019-07-01
3	John	Red	2019-09-01
3	John	Blue	2019-09-01
4	Peter	Red	2020-07-01
5	Peter	Red	2020-08-01

Desired Result

This time instead of returning the most recent entry for each group as a resultset we want to actually delete from the table any records that are duplicated (in this case have columns the same name and color entries) but keeping the oldest entry so that our database table becomes this:

Id	Name	Favorite_Color	Date_Entered
2	John	Red	2019-07-01
3	John	Blue	2019-09-01
4	Peter	Red	2020-07-01

Solution

To get the desired result we need to identify all records that are duplicated which in this case is all duplicate combinations of the name and favorite_color columns. We partition and number the rows sorting by date_entered to place the oldest one first and then use the CTE to delete those rows numbered higher than 1.

We could also use the Id column the same way since in this case it is an identity column but the Id could just as easily be a UUID so I wanted to show using an actual date column.

Download Code Snippet

WITH CTE AS (
      SELECT 
            *,
            ROW_NUMBER() OVER(PARTITION BY name, favorite_color ORDER BY Date_Entered ASC) AS DuplicateCount
      FROM 
            myTable
)
DELETE FROM CTE WHERE DuplicateCount > 1

« Previous Blog

Next Blog »

Reader Comments

Mark Gregory
Tuesday, August 11, 2020 at 4:22:28 PM Coordinated Universal Time

Just had one of those "ahhhh" moments, wondering why I have been doing this any of several ways that are more convoluted. Your solution is the obvious and clean one. Better than the 'count all and group, stuff in to temp table, delete from join temp table where count >1' kind of approach.

Delete duplicates but keep first record

Sample Data

Desired Result

Solution

Reader Comments

Archives Blog Listing

Tag Listing