
I want to generate a series of numbers based on the values in fields of a table (see the sample data below). I found a solution for this in Postgres, but unfortunately Redshift doesn't seem to support the generate_series() function. I added some sample data. Note that the data set I am working with contains large numbers (~15 digits). Can you see any alternative solutions?


Sample data: http://sqlfiddle.com/#!15/8af70/4
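To pin down the target output, here is a minimal in-memory sketch of what generate_series(nr_start, nr_end) produces for each row, written in Python purely for illustration. The column names id, nr_start and nr_end follow the `test` table used in the answer; the rows themselves are hypothetical, not the fiddle's data.

```python
# Hypothetical rows shaped like the answer's `test` table: (id, nr_start, nr_end).
rows = [(1, 100, 103), (2, 200, 201)]


def expand(rows):
    """Emit one (id, nr) pair per integer in [nr_start, nr_end], inclusive,
    mirroring what generate_series(nr_start, nr_end) yields per row in Postgres."""
    out = []
    for row_id, start, end in rows:
        for nr in range(start, end + 1):
            out.append((row_id, nr))
    return out


print(expand(rows))
```

Python integers are arbitrary precision, so ~15-digit values are not a problem in this sketch; the cost is simply one output row per number.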

Answer

I'd look at recursive CTEs to solve this. Your current code will be slow because it generates a series for every row of the input. I'd generate the series once and join each row of test to the part of it that falls within that row's range. This still won't be fast if the ranges (that is, the amount of new data being created) are large.

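WITH RECURSIVE nums(n) AS (
  -- Anchor: start at the smallest range start in the table
  SELECT MIN(nr_start) FROM test
  UNION ALL
  -- Recursive step: add 1 until the largest range end is reached
  SELECT n + 1 FROM nums
  WHERE n < (SELECT MAX(nr_end) FROM test)
)
-- Join each row of test to the numbers inside its own range
SELECT t.id, s.n AS nr
FROM test t
JOIN nums s
  ON s.n BETWEEN t.nr_start AND t.nr_end;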

I updated the fiddle as well - http://sqlfiddle.com/#!15/8af70/10
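To try the query outside Redshift, the sketch below runs it in SQLite through Python's built-in sqlite3 module, using hypothetical sample rows (the fiddle's data is not reproduced here). It also includes a variant that is not part of the answer: it recurses over offsets 0..MAX(nr_end - nr_start) instead of over every value between MIN(nr_start) and MAX(nr_end). With ~15-digit numbers, the gap between the smallest start and the largest end can be enormous even when each range is short, so bounding the series by the longest single range may shrink it dramatically. SQLite's dialect is not Redshift's, so treat this as a local sanity check of the logic, not a Redshift-tested query.

```python
import sqlite3

# Hypothetical sample rows (id, nr_start, nr_end); not the fiddle's real data.
ROWS = [(1, 100, 103), (2, 200, 201), (3, 102, 102)]

# The answer's query: one series covering every value from MIN(nr_start) to MAX(nr_end).
GLOBAL_SERIES = """
WITH RECURSIVE nums(n) AS (
  SELECT MIN(nr_start) FROM test
  UNION ALL
  SELECT n + 1 FROM nums WHERE n < (SELECT MAX(nr_end) FROM test)
)
SELECT t.id, s.n AS nr
FROM test t
JOIN nums s ON s.n BETWEEN t.nr_start AND t.nr_end
ORDER BY t.id, nr
"""

# Variant (not from the answer): offsets 0..MAX(nr_end - nr_start),
# so the series length depends only on the longest single range.
OFFSET_SERIES = """
WITH RECURSIVE offs(k) AS (
  SELECT 0
  UNION ALL
  SELECT k + 1 FROM offs WHERE k < (SELECT MAX(nr_end - nr_start) FROM test)
)
SELECT t.id, t.nr_start + o.k AS nr
FROM test t
JOIN offs o ON o.k <= t.nr_end - t.nr_start
ORDER BY t.id, nr
"""


def run(sql, rows):
    """Load rows into an in-memory `test` table and return the query's result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (id INTEGER, nr_start INTEGER, nr_end INTEGER)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


print(run(GLOBAL_SERIES, ROWS))
print(run(OFFSET_SERIES, ROWS) == run(GLOBAL_SERIES, ROWS))
```

Both queries still join on a range condition rather than an equality, so on Redshift they may still execute as the nested loop join discussed in the comments; the offset variant only makes the generated series shorter.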


5 Comments

I tested this solution and it works fine on the test data set. However, the series are large (a single range can contain more than 100,000 numbers), which unfortunately makes this solution impractical.
As expected. The real question is why you are exploding your data. Redshift, and data warehouses in general, are designed to distill large amounts of data. Expanding the rows of already large tables works against that design and is likely filling up your disks (spilling) while the query runs; check the Redshift console. You need a different approach. I've worked with clients on similarly exploding queries and wrote a white paper on one such effort. It might give you some ideas for getting your results through a much faster process: wadevelopment.co/sql_limits_wp.html
Have you had any success in finding a solution?
Hi, unfortunately I haven't. I still think generating these series is the right approach, because each 'nr' is actually an id with a set of features that vary by id.
Yes, generating these numbers is needed. It is the loop join that is costing all the performance. If you describe the process you run after generating the series, I can see whether there's a way to apply this UNION-WINDOW approach to it.
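Since the comments point to the size of the expansion as the core problem (a single range can hold over 100,000 numbers), it may help to measure that size before generating anything: the expanded output has SUM(nr_end - nr_start + 1) rows, and the longest range is MAX(nr_end - nr_start + 1). A minimal sketch, again using SQLite through Python's sqlite3 as a stand-in for Redshift, with made-up rows; the aggregate itself is plain SQL.

```python
import sqlite3

# Hypothetical rows (id, nr_start, nr_end); the last one mimics a 100,000-number range.
ROWS = [(1, 100, 103), (2, 200, 201), (3, 0, 99_999)]

# Total rows the expansion would produce, and the length of the longest single range.
SIZE_QUERY = """
SELECT SUM(nr_end - nr_start + 1) AS output_rows,
       MAX(nr_end - nr_start + 1) AS longest_range
FROM test
"""


def expansion_size(rows):
    """Return (output_rows, longest_range) for rows loaded into an in-memory `test` table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test (id INTEGER, nr_start INTEGER, nr_end INTEGER)")
        conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
        output_rows, longest = conn.execute(SIZE_QUERY).fetchone()
        return output_rows, longest
    finally:
        conn.close()


print(expansion_size(ROWS))  # (100006, 100000)
```

If output_rows is many times the size of the source table, that is a sign the query will spill, as described in the comments, and that the downstream process is worth revisiting before the series is generated.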
