2

I have a table that has a jsonb type with an array and I'm trying to specifically hash/anonymize emails (via md5 or sha1) within that array that follow certain requirements. Using the dummy data below, I'm having a hard time trying to to target the owners array specifically for any emails not with a @google.comdomain or admin.

{
    "blah": blah,
    "owners": [
        "admin"
    ],
},
{
    "blah": blah,
    "owners": [
        "[email protected]"
    ],
},
{
    "blah": blah,
    "owners": [
        "[email protected]", "[email protected]"
    ],
},
{
    "blah": blah,
    "owners": [
        "[email protected]", "[email protected]"
    ],
}

Ideally the end result should be

{
    "blah": blah,
    "owners": [
        "admin"
    ],
},
{
    "blah": blah,
    "owners": [
        "[email protected]"
    ],
},
{
    "blah": blah,
    "owners": [
        "[email protected]", "[email protected]"
    ],
},
{
    "blah": blah,
    "owners": [
        "[email protected]", "[email protected]"
    ],
}

The logic here probably is to iterate through every single owners array in the column, but not sure how the query would specifically look like.

2 Answers 2

1

You can use a subquery with json_agg:

select jsonb_set(t.data, '{owners}', 
   (select jsonb_agg(case when v.value::text = '"admin"' or v.value::text ~ 'google\.com"' then replace(v.value::text, '"', '') else md5(substring(v.value::text from '^"([\w\.\-]+)@'))||'@'||md5(substring(v.value::text from '@([\w\.\-]+)"$')) end) 
   from jsonb_array_elements(t.data -> 'owners') v)) 
from vals t

For the purposes of demonstration, the hash used is md5.

Sign up to request clarification or add additional context in comments.

9 Comments

appreciate the input! One weird thing to note is that for some emails (let's say @example.com), it outputs the value to null. md5 works fine and it's hashing anything that's not google, but missing random other emails for whatever reason. Not fully understanding the logic here, and also for some reason, the ignored domain ends up with ["\"[email protected]\""] and wondering if we can strip the extra ` and ""`
@stoiven Please see my recent edit, I updated the solution to remove the double quotes (produced by the JSON to text conversion). Can you post an example of the emails that are failing to be hashed? It might have something to do with the regular expressions being used to parse out the name and domain from the email string.
Ah gotcha, just tried it out, and looks super good! Still a noobie at SQL, so forgive me with this. I recently edited the fiddle as well, and I think it's because the email has a .? @Ajax1234 I guess this fiddle
@stoiven Thanks for the fiddle. Yes, the . separator was the issue. I have updated both the post and the demo fiddle link.
Ah, I think it also fails, when there is a - separator too. Like ["[email protected]"] @Ajax1234 cause I got a null reference here as well. Also, for my further knowledge, any good recs for better practices or tips/tricks on SQL? I truly do appreciate all this help!
|
0

(a) Your jsonb data sample has some error inside. For this answer, I will use instead :

[{
    "blah": "blah",
    "owners": [
        "admin"
    ]},
{
    "blah": "blah",
    "owners": [
        "[email protected]"
    ]},
{
    "blah": "blah",
    "owners": [
        "[email protected]", "[email protected]"
    ]},
{
    "blah": "blah",
    "owners": [
        "[email protected]", "[email protected]"
    ]}]

(b) Making multiple updates within a jsonb data is not easy with postgres. The following solution breaks down the jsonb data structure while tracking the path information that is then used when updating the jsonb data structure with the new email values.

(c) The solution is based on the jsonb_set() standard function which can apply only one update to a given jsonb data. So the first step is to create the function jsonb_set_agg() as an aggregate based on jsonb_set and which will be able to perform multiple updates to the same jsonb data while iterating on a set of records :

CREATE OR REPLACE FUNCTION jsonb_set(x jsonb, y jsonb, p text[], z jsonb, c boolean)
RETURNS jsonb LANGUAGE sql AS
$$ SELECT jsonb_set(COALESCE(x,y), p, z, c) ; $$ ;

CREATE OR REPLACE AGGREGATE jsonb_set_agg(x jsonb,p text[], z jsonb, c boolean)
(SFUNC = jsonb_set, STYPE = jsonb) ;

(d) Finally, the following query breaks down the jsonb data (FROM clause), filters the emails which must be updated (WHERE clause), and then updates the jsonb data with the new email values, according to the path based on the ORDINALITY information :

SELECT jsonb_set_agg(j.json_data, array[(b.id - 1) :: text, 'owners', (c.id - 1) :: text], to_jsonb('insert_here_the_new_value' :: text), true)
  FROM (VALUES ('[{
    "blah": "blah",
    "owners": [
        "admin"
    ]},
{
    "blah": "blah",
    "owners": [
        "[email protected]"
    ]},
{
    "blah": "blah",
    "owners": [
        "[email protected]", "[email protected]"
    ]},
{
    "blah": "blah",
    "owners": [
        "[email protected]", "[email protected]"
    ]}]' :: jsonb)) AS j(json_data)
 CROSS JOIN LATERAL jsonb_array_elements(j.json_data) WITH ORDINALITY AS b(json_data, id)
 CROSS JOIN LATERAL jsonb_array_elements_text(b.json_data->'owners') WITH ORDINALITY AS c(json_data, id)
 WHERE c.json_data <> 'admin'
   AND NOT c.json_data ~ '@google.com$'

The result is :

[ {"blah": "blah", "owners": ["admin"]}
, {"blah": "blah", "owners": ["[email protected]"]}
, {"blah": "blah", "owners": ["insert_here_the_new_value","insert_here_the_new_value"]}
, {"blah": "blah", "owners": ["insert_here_the_new_value", "[email protected]"]}
]

all the details in dbfiddle.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.