1

I have a table with line numbers and either a "define" or an "undefine" event of an identifier. Example:

line_no | def | undef
--------------------
1       | 'a' | NULL
2       | 'b' | NULL
3       | NULL| 'a'
... 
42      | NULL| 'b'

I want to compute the "live variables" information on every line. Iteratively, I'd just write code like the following pseudocode:

live = [] 
for each r in table:
    if r.def:
        live.append(r.def)
    else:
        live.remove(r.undef)
    store(r.line_no, live) 

The expected result is a table like :

line_no | live
1.      | ['a']
2.      | ['a', 'b'] 
3.      | ['b'] 
... 
42.     | [] 

I managed to write the equivalent sequential loop as a plpgsql function. However, I was wondering if there is a (maybe more elegant) way as a SQL query? I tried various things using lag() but somehow that never resulted in this "reduction" operation I am looking for?

(bonus points if the query can also partition over a field, such as 'filename')

3
  • You can do anything in SQL, but that's very clearly a procedural requirement that is probably performed more easily and efficiently in procedural code, for example a function. Commented May 20, 2022 at 11:57
  • What happens if a value appears twice in the def column? Commented May 20, 2022 at 12:27
  • In this table an identifier is guaranteed to be undefed before a def again. Would have been an error before (double definition) Commented May 20, 2022 at 12:33

2 Answers 2

1

If you want a pure SQL solution, use a recursive query.

with recursive cte as (
    select line_no, array[def] as list
    from my_table
    where line_no = 1
union all
    select 
        t.line_no, 
        case when def is null 
            then array_remove(list, undef)
            else array_append(list, def)
        end
    from my_table t
    join cte c on c.line_no = t.line_no- 1
)
select *
from cte;

However, a more efficient and flexible solution in this case may be creating a function.

create or replace function list_of_vars()
returns table (line_no int, list text[])
language plpgsql as $$
declare
    arr text[];
    rec record;
begin
    for rec in
        select *
        from my_table
        order by line_no
    loop
        if rec.def is null then
            arr := array_remove(arr, rec.undef);
        else
            arr := array_append(arr, rec.def);
        end if;
        line_no := rec.line_no;
        list := arr;
        return next;
    end loop;
end
$$;

Use:

select *
from list_of_vars()
order by line_no

Test it in db<>fiddle.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks, I decided to go the function route and more or less used what you provided (I added some partitioning). It's linear performance but given the nature of the problem that's just how it is :) Thanks again for your help!
Yes, I should have added that the recursive query requires a sequence of line numbers without gaps. Fortunately, reliable a_horse_with_no_name did it.
1

One way this can be achieved is to use a recursive query which is in the end pretty similar to the iteration you are thinking of.

with recursive lines as (
  select line_no, 
         (case when def is not null then jsonb_build_array(def) else '[]'::jsonb end) - coalesce(undef, '') live
  from the_table
  where line_no = 1
  union all
  select c.line_no, 
         (p.live || case when def is not null then jsonb_build_array(c.def) else '[]'::jsonb end) - coalesce(c.undef, '')
  from the_table c
    join lines p on c.line_no - 1 = p.line_no
)
select *
from lines;

jsonb_build_array will happily include a NULL value, that's why we need the somewhat convoluted CASE expression to turn a single value into an array (with one or zero elements).

The - operator for jsonb removes the element on the right hand side from the array. However an null value on the right hand side, will remove all elements, hence the coalesce()

This requires the values for line_no to not have any gaps (and start with 1). If that is not a case, another step would be required to generate a gap-less number for each row (which would make this even slower)

Online example

I doubt if that is actually faster then a "proper" procedural solution.

1 Comment

Thanks, especially for the explanations. Especially about the explanation that line_no must not have any gaps for the recursive query to work, this simple fact suddenly made it click for me. I accepted the other answer as the solution, as I ended up going the function route, partially because of the line_no issue, which can have gaps for me and makes the solution much less concise. But your solution surely taught me more :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.