Delete first occurrence duplicate row in postgres

Question

I have the following table

ID   VALUE1  VALUE2
1    aaa     bbb
1    aaa     bbb

Sadly, my table has no primary key, and I want to create one. But first I need to delete only the first occurrence of each duplicated row. I don't have any unique identifier; the rows are exactly equal.

I know that I can use something like this if I have a unique identifier:

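DELETE FROM my_table a           -- my_table: placeholder for the real table name
USING my_table b                 -- self-join to find the other copy
WHERE a.id < b.id                -- id: the (hypothetical) unique identifier
  AND a.value1 = b.value1        -- same data in every other column
  AND a.value2 = b.value2;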

But, what is the best approach to this if I don't have a unique identifier?
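A minimal sketch of one common PostgreSQL workaround, assuming the hypothetical table name my_table and the three columns shown above: every PostgreSQL row has a hidden system column ctid (its physical location in the table), which can stand in for the missing identifier in the same self-join. ctid values can change after an UPDATE or VACUUM FULL, so they are only suitable for a one-off cleanup like this, not as a permanent key.

```sql
-- Hypothetical sample table mirroring the question's data.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (id int, value1 text, value2 text);
INSERT INTO my_table VALUES
    (1, 'aaa', 'bbb'),
    (1, 'aaa', 'bbb'),
    (2, 'ccc', 'ddd');

-- Delete every row for which an identical row with a higher ctid exists,
-- so exactly one copy (the one with the highest ctid) survives per group.
-- Note: "=" never matches NULLs; use IS NOT DISTINCT FROM for nullable columns.
DELETE FROM my_table a
USING my_table b
WHERE a.ctid < b.ctid
  AND a.id     = b.id
  AND a.value1 = b.value1
  AND a.value2 = b.value2;
```

For the sample data this leaves one (1, 'aaa', 'bbb') row and the untouched (2, 'ccc', 'ddd') row.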

My table has 500 million rows, of which about 100k are duplicates.
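On a table this size it may be worth confirming the duplicate count before deleting anything. A minimal sketch, again assuming the hypothetical table name my_table and made-up sample rows: group on every column and keep only the groups that occur more than once. count(*) - 1 is the number of extra copies in each group, i.e. how many rows a keep-one-copy delete would remove from it.

```sql
-- Hypothetical sample data; a real run would query the existing table.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (id int, value1 text, value2 text);
INSERT INTO my_table VALUES
    (1, 'aaa', 'bbb'),
    (1, 'aaa', 'bbb'),
    (2, 'ccc', 'ddd'),
    (3, 'eee', 'fff'),
    (3, 'eee', 'fff'),
    (3, 'eee', 'fff');

-- One result row per duplicated group. GROUP BY treats NULLs as equal,
-- so copies containing NULLs are still grouped together.
SELECT id, value1, value2, count(*) - 1 AS extra_copies
FROM my_table
GROUP BY id, value1, value2
HAVING count(*) > 1
ORDER BY id;
```

For the sample rows this returns (1, aaa, bbb, 1) and (3, eee, fff, 2); summing extra_copies gives the total number of rows to delete.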

Possible duplicate: stackoverflow.com/questions/26769454/… Note: "first occurrence" is a misnomer, as you have no way to identify the first one; your end game is to eliminate duplicates while keeping one record, so which one is first is somewhat irrelevant.
– xQbert, Commented Mar 16, 2022 at 14:25

romborimba · Accepted Answer · 2022-03-16 14:30:05Z


If I understand the question correctly, you can use the row_number() window function:

with dataset as (select 1 as ID, 'aaa' as VALUE1, 'bbb' as VALUE2
     union all select 1, 'aaa', 'bbb'
     union all select 2, 'ccc', 'ddd'
     union all select 2, 'ccc', 'ddd')

select *, row_number() over (partition by id, value1, value2) from dataset;

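This adds a row_number column that numbers the copies within each group of identical rows (without an ORDER BY inside OVER, which copy gets 1 is arbitrary):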

 id | value1 | value2 | row_number 
----+--------+--------+------------
  1 | aaa    | bbb    |          1
  1 | aaa    | bbb    |          2
  2 | ccc    | ddd    |          1
  2 | ccc    | ddd    |          2

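and then you can delete the rows where row_number > 1, which leaves exactly one copy of every row. Deleting the rows where row_number = 1 instead would also remove every row that has no duplicate at all. Because the copies are identical, the DELETE itself needs PostgreSQL's hidden system column ctid to tell them apart.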
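The delete step can be made runnable in PostgreSQL by numbering each row's ctid (the hidden physical row identifier) and deleting by ctid. A minimal sketch, assuming the hypothetical table name my_table; the extra (3, 'eee', 'fff') row is made-up sample data included to show that a row without duplicates is left alone.

```sql
-- Hypothetical sample table: two duplicated pairs plus one unique row.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (id int, value1 text, value2 text);
INSERT INTO my_table VALUES
    (1, 'aaa', 'bbb'),
    (1, 'aaa', 'bbb'),
    (2, 'ccc', 'ddd'),
    (2, 'ccc', 'ddd'),
    (3, 'eee', 'fff');

-- Number the copies inside each group of identical rows and delete
-- every copy after the first. ORDER BY ctid only makes the numbering
-- deterministic; identical rows have no meaningful "first".
DELETE FROM my_table
WHERE ctid IN (
    SELECT ctid
    FROM (
        SELECT ctid,
               row_number() OVER (PARTITION BY id, value1, value2
                                  ORDER BY ctid) AS rn
        FROM my_table
    ) numbered
    WHERE rn > 1
);
```

Because PARTITION BY puts NULLs in the same partition, this variant also catches duplicates whose columns contain NULLs, which a join on = would miss.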

answered Mar 16, 2022 at 14:30

