Check python array (or csv) for duplicates in PostgreSQL, display the status in the adjacent column

Question

I have input data in Python:

First_Name	Last_Name	Location
Pennie	Moore	Santa Clara,CA,USA
Paul	Lapointe	Torrance,CA,USA
Travis	Day	San Jose,CA,USA
Kiva	Dale	Boise Metropolitan Area
Michael	Goss	Fredericksburg,VA,USA

The DB stores data in the same format.

I need a query that checks whether this data exists in the PostgreSQL DB and displays the duplicate status (yes or no) in an adjacent column, like this:

(Python array [{"First_Name": "Pennie", "Last_Name": "Moore", "Location": "Santa Clara, CA, USA"}, {"First_Name": "Kiva", "Last_Name": "Dale", "Location": "Boise Metropolitan Area"}] )

First_Name	Last_Name	Location	Duplicates
Pennie	Moore	Santa Clara,CA,USA	yes
Paul	Lapointe	Torrance,CA,USA	no
Travis	Day	San Jose,CA,USA	no
Kiva	Dale	Boise Metropolitan Area	yes
Michael	Goss	Fredericksburg,VA,USA	no

The input data can be represented as CSV or as a simple list of JSON objects [{"First_Name": "Pennie", "Last_Name": "Moore", "Location": "Santa Clara, CA, USA"}, {...}, ..]

It is not required, but it would be desirable for the first and last names from the list to be matched as substrings of (IN, not EQUALS) the corresponding values in the DB (e.g. Dan is contained in Daniel).

So far I have tried concatenating first name + last name + location and searching against the same concatenation of the DB columns, but on large volumes this is very slow.

I also found that question, but its solution only helps when searching by one column and does not handle WORD CONTAINS.

Stefanov.sm · Accepted Answer · 2021-01-31 17:33:23Z

3

After you insert the data, you can check for duplicates.
To do this, group by (first_name, last_name, location) as a record value.

select first_name,
       last_name,
       location, 
       case when count(1) > 1 then 'yes' else 'no' end as duplicates
  from _table
 group by (first_name, last_name, location);
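If inserting the input rows is not an option, a similar yes/no column can probably be produced by passing the rows as a VALUES list and testing each one with EXISTS. The following is only a minimal sketch under several assumptions: the helper name build_duplicate_check is hypothetical, the table is called _table with columns first_name, last_name and location as in the query above, and the driver accepts psycopg2-style %s placeholders. Names are matched with strpos, so an input name counts as found when it occurs inside the stored name (case-sensitive), while location is compared for equality.

```python
from typing import Dict, List, Sequence, Tuple


def build_duplicate_check(rows: Sequence[Dict[str, str]]) -> Tuple[str, List[str]]:
    """Return (sql, params) for a DB-API cursor that uses %s placeholders.

    Hypothetical helper: the table name "_table" and its column names are
    taken from the query above; everything else is an assumption.
    """
    if not rows:
        raise ValueError("rows must not be empty: VALUES needs at least one row")

    # One "(%s, %s, %s)" group per input row, so values are passed as
    # parameters and never pasted into the SQL text.
    placeholders = ", ".join(["(%s, %s, %s)"] * len(rows))

    params: List[str] = []
    for row in rows:
        params.extend([row["First_Name"], row["Last_Name"], row["Location"]])

    # strpos(string, substring) > 0 is true when the input name occurs
    # inside the stored name, e.g. Dan inside Daniel.
    sql = f"""
        SELECT v.first_name, v.last_name, v.location,
               CASE WHEN EXISTS (
                        SELECT 1
                          FROM _table AS t
                         WHERE strpos(t.first_name, v.first_name) > 0
                           AND strpos(t.last_name, v.last_name) > 0
                           AND t.location = v.location
                    ) THEN 'yes' ELSE 'no' END AS duplicates
          FROM (VALUES {placeholders}) AS v(first_name, last_name, location)
    """
    return sql, params
```

With an open cursor this would be run as cursor.execute(sql, params) followed by cursor.fetchall(). Because location is compared with plain equality, an ordinary index on that column may help on large tables; the substring test on the names is then only evaluated for rows that share the location.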

edited Jan 31, 2021 at 17:33

answered Jan 31, 2021 at 16:55


2 Comments

vokaplok Over a year ago

Thank you! I do not need to insert the data into a table; I only need to check whether the data from the Python array (list) exists in the DB table.

sandthorn Over a year ago

In that case, you may learn some pandas basics and functionality. => thispointer.com/…
