I have the following input data in Python:
| First_Name | Last_Name | Location |
|---|---|---|
| Pennie | Moore | Santa Clara,CA,USA |
| Paul | Lapointe | Torrance,CA,USA |
| Travis | Day | San Jose,CA,USA |
| Kiva | Dale | Boise Metropolitan Area |
| Michael | Goss | Fredericksburg,VA,USA |
The DB stores data in the same format.
I need a query that checks whether this data already exists in a PostgreSQL DB and displays the duplicate status (yes or no) in an adjacent column, like this:
(Python list [{"First_Name": "Pennie", "Last_Name": "Moore", "Location": "Santa Clara, CA, USA"}, {"First_Name": "Kiva", "Last_Name": "Dale", "Location": "Boise Metropolitan Area"}] )
| First_Name | Last_Name | Location | Duplicates |
|---|---|---|---|
| Pennie | Moore | Santa Clara,CA,USA | yes |
| Paul | Lapointe | Torrance,CA,USA | no |
| Travis | Day | San Jose,CA,USA | no |
| Kiva | Dale | Boise Metropolitan Area | yes |
| Michael | Goss | Fredericksburg,VA,USA | no |
The input data can be represented as CSV or as a simple list of JSON objects:
[{"First_Name": "Pennie", "Last_Name": "Moore", "Location": "Santa Clara, CA, USA"}, {...}, ..]
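To make the two input representations concrete, here is a minimal sketch that loads either one into the same list of dicts. The helper names `load_csv` and `load_json` and the sample strings are only illustrative assumptions, not part of my real code.

```python
import csv
import io
import json

FIELDS = ("First_Name", "Last_Name", "Location")


def load_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return [{f: row[f].strip() for f in FIELDS} for row in reader]


def load_json(text):
    """Parse a JSON array of objects into the same list-of-dicts shape."""
    return [{f: str(item[f]).strip() for f in FIELDS} for item in json.loads(text)]


# Illustrative sample input; the location is quoted because it contains commas.
csv_text = (
    "First_Name,Last_Name,Location\n"
    'Pennie,Moore,"Santa Clara,CA,USA"\n'
    "Kiva,Dale,Boise Metropolitan Area\n"
)
json_text = (
    '[{"First_Name": "Pennie", "Last_Name": "Moore", "Location": "Santa Clara,CA,USA"},'
    ' {"First_Name": "Kiva", "Last_Name": "Dale", "Location": "Boise Metropolitan Area"}]'
)

records = load_csv(csv_text)
```

Both loaders return the same structure, so the rest of the pipeline does not need to care which format the input arrived in.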
It is not required, but desirable, that the first and last names from the list are matched as substrings of (IN, not EQUALS) the corresponding values in the DB (e.g. Dan is contained in Daniel).
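To pin down the matching rule I have in mind, here is a pure-Python reference sketch (no database involved). It assumes case-insensitive substring matching on the names and an exact match on the location after removing whitespace around commas; the normalization and the function names are my assumptions for illustration.

```python
def normalize(value):
    """Lowercase and drop whitespace around commas (assumed normalization)."""
    return ",".join(part.strip() for part in value.lower().split(","))


def is_duplicate(record, db_rows):
    """True if some DB row contains the input names and has the same location."""
    first = record["First_Name"].strip().lower()
    last = record["Last_Name"].strip().lower()
    location = normalize(record["Location"])
    # Note: an empty input name is a substring of every name, so it matches all rows.
    return any(
        first in row["First_Name"].lower()
        and last in row["Last_Name"].lower()
        and location == normalize(row["Location"])
        for row in db_rows
    )


def flag_duplicates(records, db_rows):
    """Return copies of the input records with a 'Duplicates' yes/no field."""
    return [
        dict(r, Duplicates="yes" if is_duplicate(r, db_rows) else "no")
        for r in records
    ]


# Illustrative data only: "Dan" should match the stored "Daniel".
db_rows = [{"First_Name": "Daniel", "Last_Name": "Moore", "Location": "Santa Clara, CA, USA"}]
inputs = [
    {"First_Name": "Dan", "Last_Name": "Moore", "Location": "Santa Clara,CA,USA"},
    {"First_Name": "Paul", "Last_Name": "Lapointe", "Location": "Torrance,CA,USA"},
]
flagged = flag_duplicates(inputs, db_rows)
```

This loops over every DB row for every input row, so it only describes the semantics I want; it is not meant as a solution for large volumes.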
So far I have tried concatenating first name + last name + location and comparing the result against the same concatenation of the DB columns, but on large volumes the query is very slow.
I also found that question, but its solution only helps when searching by one column and does not handle the word-contains case.
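For reference, this is a sketch of the kind of query I am after: the input rows are passed as a `VALUES` list and each column is compared separately instead of being concatenated. The table name `people`, the lowercase column names, and the psycopg2-style `%s` placeholders are assumptions; the function only builds the SQL text and parameter list and does not connect to anything.

```python
def build_duplicate_query(records, table="people"):
    """Build a parameterized PostgreSQL query that flags each input row yes/no.

    `table` and the column names are hypothetical; `table` is interpolated
    directly, so it must come from trusted code, never from user input.
    """
    if not records:
        raise ValueError("records must not be empty")
    # One "(%s, %s, %s)" group per input row.
    values_sql = ", ".join(["(%s, %s, %s)"] * len(records))
    params = [r[k] for r in records for k in ("First_Name", "Last_Name", "Location")]
    # '%%' is a literal percent sign when the driver substitutes %s parameters.
    query = f"""
        SELECT v.first_name, v.last_name, v.location,
               CASE WHEN EXISTS (
                   SELECT 1
                   FROM {table} AS p
                   WHERE p.first_name ILIKE '%%' || v.first_name || '%%'
                     AND p.last_name  ILIKE '%%' || v.last_name  || '%%'
                     AND p.location = v.location
               ) THEN 'yes' ELSE 'no' END AS duplicates
        FROM (VALUES {values_sql}) AS v(first_name, last_name, location)
    """
    return query, params


sample = [
    {"First_Name": "Pennie", "Last_Name": "Moore", "Location": "Santa Clara,CA,USA"},
    {"First_Name": "Paul", "Last_Name": "Lapointe", "Location": "Torrance,CA,USA"},
]
query, params = build_duplicate_query(sample)
# With psycopg2 this would presumably be run as: cursor.execute(query, params)
```

Because the location is compared with plain equality on its own column, an ordinary index on that column might be usable, which the concatenated comparison does not allow. Note that `%` or `_` inside an input name would be interpreted as `ILIKE` wildcards in this sketch.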