Reason - We made a mistake, and allowed to insert duplicate values.
Now we need to remove/change the duplicates from the Database permanently. So, we need to find them first.
Duplicate
records can create problems sometimes when displaying reports or performing a
Multiple Insert update. Finding duplicate records in a database needs further
investigation. In some cases, duplicate records are positive, but it all
depends on the data and the database design as well.
Identify Duplicate Rows:
There are
various times when we need to find duplicate records in SQL Server. It is
possible to find duplicates using DISTINCT,
ROW NUMBER as well as the GROUP BY approach.
Using the DISTINCT approach, we can quickly get unique rows in a table. However,
DISTINCT does not show how many times a row has been duplicated. Using the
GROUP BY approach, we can find this.
Adding grouping
and a counting field to our display of fields combination shows how many times
each record appears.
SQL> SELECT * From DupTest;
ID NAME
---------- --------------------
101 Ravi
101 Ravi
102 Rajan
103 Karan
103 Karan
103 Karan
104 Pritesh
106 Subho
105 Manu
105 Manu
10 rows selected.
SQL> SELECT ID, Count(*) Occurrences
2 From DupTest
3 Group By ID;
ID OCCURRENCES
---------- -----------
102 1
101 2
104 1
105 2
103 3
106 1
6 rows selected.
GROUP BY will
show just one record for each combination of Selected records. The
count(Occurrences) shows how many times the row has been duplicated. Count = 1
indicates that row appears only once.
Let us filter
out using the Having clause to exclude rows that appear only once(Display only Duplicate rows).
SQL> SELECT ID, Count(*) Occurrences
2 From DupTest
3 Group By ID
4 Having Count(*)>1;
ID OCCURRENCES
---------- -----------
101 2
105 2
103 3
SQL> SELECT * From DupTest
2 Where ID IN(SELECT ID From DupTest Group By ID Having Count(*)>1);
ID NAME
---------- --------------------
101 Ravi
101 Ravi
105 Manu
105 Manu
103 Karan
103 Karan
103 Karan
7 rows selected.
Delete Duplicate Rows:
Let’s remove duplicate rows from table.
Following ways can be used to delete duplicate records from table. Here assume that ID
column should be unique after deleting. Now we see how to delete the duplicate
records from the DupTest table in different ways.
Method 1: Using
Rowid With Group By
The following Delete
statement deletes the rows using the Rowid.
SQL> SELECT * From DupTest;
ID NAME
---------- --------------------
101 Ravi
101 Ravi
102 Rajan
103 Karan
103 Karan
103 Karan
104 Pritesh
106 Subho
105 Manu
105 Manu
10 rows selected.
SQL> DELETE From DupTest
2 Where ROWID NOT IN
3 (SELECT MAX(ROWID) From DupTest Group By ID);
4 rows deleted.
SQL> SELECT * From DupTest;
ID NAME
---------- --------------------
101 Ravi
102 Rajan
103 Karan
104 Pritesh
106 Subho
105 Manu
6 rows selected.
Method 2: Using
Rowid With Self-Join
The following Delete
statement deletes the rows using the Rowid with Self join. This is similar to
the first method except we need not to group the records.
SQL> SELECT * From DupTest;
ID NAME
---------- --------------------
101 Ravi
101 Ravi
102 Rajan
103 Karan
103 Karan
103 Karan
104 Pritesh
106 Subho
105 Manu
105 Manu
10 rows selected.
SQL> DELETE From DupTest T1
2 Where ROWID NOT IN
3 (SELECT MAX(ROWID) From DupTest T2 Where T1.ID=T2.ID);
4 rows deleted.
SQL> SELECT * From DupTest;
ID NAME
---------- --------------------
101 Ravi
102 Rajan
103 Karan
104 Pritesh
106 Subho
105 Manu
6 rows selected.
Method 3: Using
Row_Number()
The following Delete statement deletes the rows using Analytical Function. The Row_Number analytic function is used to rank
the rows. Here we use the Row_Number function to rank the rows for each group
of ID and then select only record from the group. SQL> SELECT * From DupTest;
ID NAME
---------- --------------------
101 Ravi
101 Ravi
102 Rajan
103 Karan
103 Karan
103 Karan
104 Pritesh
106 Subho
105 Manu
105 Manu
10 rows selected.
SQL> DELETE From DupTest Where ROWID IN
2 (
3 SELECT Rid From
4 (SELECT ROWID Rid,
5 ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID) Rn
6 From DupTest
7 )
8 Where Rn>1
9 );
4 rows deleted.
SQL> SELECT * From DupTest;
ID NAME
---------- --------------------
101 Ravi
102 Rajan
103 Karan
104 Pritesh
106 Subho
105 Manu
6 rows selected.
Method 4: Using
Dense_Rank()
The following Delete
statement deletes the rows using Analytical Function. DENSE_RANK() Returns the rank of rows within the partition of
a result set, without any gaps in the ranking.
SQL> SELECT * From DupTest;
ID NAME
---------- --------------------
101 Ravi
101 Ravi
102 Rajan
103 Karan
103 Karan
103 Karan
104 Pritesh
106 Subho
105 Manu
105 Manu
10 rows selected.
SQL> DELETE From DupTest Where ROWID IN
2 (
3 SELECT Rid From
4 (SELECT ROWID Rid,
5 DENSE_RANK() OVER(PARTITION BY ID ORDER BY ROWID) Rn
6 From DupTest
7 )
8 Where Rn>1
9 );
4 rows deleted.
SQL> SELECT * From DupTest;
ID NAME
---------- --------------------
101 Ravi
102 Rajan
103 Karan
104 Pritesh
106 Subho
105 Manu
6 rows selected.
Here you can use both Rank() and Dense_Rank() since both will give unique records when order by Rowid.
You can any of the above the above method to delete the Duplicate records as per your requirement and data volume.
Get involved
and leave your Comments in the Box Below. The more people get involved, the
more we all benefit.
So, leave your thoughts before you leave the page.
So, leave your thoughts before you leave the page.
Nice Blog, thaanks For Sharing this informative article.
ReplyDeleteOracle Fusion SCM Online Training
Oracle Fusion Financials Online Training
Workday HCM Online Training
Thanks sharing nice article
ReplyDeleteSSIS training
SSRS training
tableau training
Teradata training
how to identify duplicate records in self join
ReplyDelete