Postgresql sequential add/remove reduction operation

Question

I have a table with line numbers and either a "define" or an "undefine" event of an identifier. Example:

line_no | def | undef
--------------------
1       | 'a' | NULL
2       | 'b' | NULL
3       | NULL| 'a'
... 
42      | NULL| 'b'

I want to compute the "live variables" information on every line. Iteratively, I'd just write code like the following pseudocode:

live = [] 
for each r in table:
    if r.def:
        live.append(r.def)
    else:
        live.remove(r.undef)
    store(r.line_no, live)

The expected result is a table like :

line_no | live
1.      | ['a']
2.      | ['a', 'b'] 
3.      | ['b'] 
... 
42.     | []

I managed to write the equivalent sequential loop as a plpgsql function. However, I was wondering if there is a (maybe more elegant) way as a SQL query? I tried various things using lag() but somehow that never resulted in this "reduction" operation I am looking for?

(bonus points if the query can also partition over a field, such as 'filename')

You can do anything in SQL, but that's very clearly a procedural requirement that is probably performed more easily and efficiently in procedural code, for example a function. — Laurenz Albe
– Laurenz Albe, Commented May 20, 2022 at 11:57
In this table an identifier is guaranteed to be undefed before a def again. Would have been an error before (double definition) — Christoph
– Christoph, Commented May 20, 2022 at 12:33

klin · Accepted Answer · 2022-05-20 12:43:53Z

1

If you want a pure SQL solution, use a recursive query.

with recursive cte as (
    select line_no, array[def] as list
    from my_table
    where line_no = 1
union all
    select 
        t.line_no, 
        case when def is null 
            then array_remove(list, undef)
            else array_append(list, def)
        end
    from my_table t
    join cte c on c.line_no = t.line_no- 1
)
select *
from cte;

However, a more efficient and flexible solution in this case may be creating a function.

create or replace function list_of_vars()
returns table (line_no int, list text[])
language plpgsql as $$
declare
    arr text[];
    rec record;
begin
    for rec in
        select *
        from my_table
        order by line_no
    loop
        if rec.def is null then
            arr := array_remove(arr, rec.undef);
        else
            arr := array_append(arr, rec.def);
        end if;
        line_no := rec.line_no;
        list := arr;
        return next;
    end loop;
end
$$;

Use:

select *
from list_of_vars()
order by line_no

Test it in db<>fiddle.

answered May 20, 2022 at 12:43

klin

123k15 gold badges241 silver badges263 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Christoph Over a year ago

Thanks, I decided to go the function route and more or less used what you provided (I added some partitioning). It's linear performance but given the nature of the problem that's just how it is :) Thanks again for your help!

klin Over a year ago

Yes, I should have added that the recursive query requires a sequence of line numbers without gaps. Fortunately, reliable a_horse_with_no_name did it.

user330315 · Accepted Answer · 2022-05-20 12:44:03Z

1

One way this can be achieved is to use a recursive query which is in the end pretty similar to the iteration you are thinking of.

with recursive lines as (
  select line_no, 
         (case when def is not null then jsonb_build_array(def) else '[]'::jsonb end) - coalesce(undef, '') live
  from the_table
  where line_no = 1
  union all
  select c.line_no, 
         (p.live || case when def is not null then jsonb_build_array(c.def) else '[]'::jsonb end) - coalesce(c.undef, '')
  from the_table c
    join lines p on c.line_no - 1 = p.line_no
)
select *
from lines;

jsonb_build_array will happily include a NULL value, that's why we need the somewhat convoluted CASE expression to turn a single value into an array (with one or zero elements).

The - operator for jsonb removes the element on the right hand side from the array. However an null value on the right hand side, will remove all elements, hence the coalesce()

This requires the values for line_no to not have any gaps (and start with 1). If that is not a case, another step would be required to generate a gap-less number for each row (which would make this even slower)

Online example

I doubt if that is actually faster then a "proper" procedural solution.

answered May 20, 2022 at 12:44

user330315

1 Comment

Christoph Over a year ago

Thanks, especially for the explanations. Especially about the explanation that line_no must not have any gaps for the recursive query to work, this simple fact suddenly made it click for me. I accepted the other answer as the solution, as I ended up going the function route, partially because of the line_no issue, which can have gaps for me and makes the solution much less concise. But your solution surely taught me more :)

Collectives™ on Stack Overflow

Postgresql sequential add/remove reduction operation

2 Answers 2

2 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related