0

I have the following table

ID   VALUE1  VALUE2
1    aaa     bbb
1    aaa     bbb

Sadly, my table has no primmary keys and I want to create it. But first I need to delete only the first occurrance of duplicated rows. I don't have any unique identifier, the rows are exactly equal.

I know that I can use something like this if I have an unique identifier:

DELETE FROM
    table
WHERE
    table.id < table.id

But, what is the best approach to this if I don't have a unique identifier?

My table has 500 million lines, about 100k are duplicated.

1
  • 1
    Possible duplicate: stackoverflow.com/questions/26769454/… Note: "first occurnce" is a misnomer as you have no way to identify first, and your end game is to eliminate duplicates keeping 1 record, first is somewhat irrelevant. Commented Mar 16, 2022 at 14:25

1 Answer 1

1

If I understand the question correctly, you can proceed using Row_Number():

with dataset as (select 1 as ID, 'aaa' as VALUE1, 'bbb' as VALUE2
     union all select 1, 'aaa', 'bbb'
     union all select 2, 'ccc', 'ddd'
     union all select 2, 'ccc', 'ddd')

select *, row_number() over (partition by id, value1, value2) from dataset;

This will create the following column row_number:

 id | value1 | value2 | row_number 
----+--------+--------+------------
  1 | aaa    | bbb    |          1
  1 | aaa    | bbb    |          2
  2 | ccc    | ddd    |          1
  2 | ccc    | ddd    |          2

and then you can delete * where row_number = 1;

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.