I have a piece of code that works very well in every way but one: it takes too long to execute. My suspicion is that the long execution time is caused by an overuse of JOINs (11 of them, to be precise). Here is what I am trying to do:
SETUP I have three tables:

- v_na_measurement_status

id  start_measurement                  stop_measurement
0   2020-09-01 22:19:48.480668+00:00   2020-09-01 22:29:29.713952+00:00
1   2020-09-01 22:17:10.392089+00:00   2020-09-01 22:18:08.139668+00:00
2   2020-09-01 22:12:49.795404+00:00   2020-09-01 22:15:14.564597+00:00
...
- sensor_double_precision

id     sensor_name            timestamp                        value_cal
40548  "curved_mirror_steps"  "2020-09-01 22:29:23.526468+00"  432131
40547  "na_average_enable"    "2020-09-01 22:29:23.410416+00"  1
40546  "na_averages"          "2020-09-01 22:29:23.295404+00"  16
40545  "na_power"             "2020-09-01 22:29:23.174255+00"  -5
40544  "na_stop_freq"         "2020-09-01 22:29:23.05868+00"   18000000000
40543  "na_start_freq"        "2020-09-01 22:29:22.944205+00"  15000000000
...
- sensor_double_precision_arr

id    sensor_name              timestamp                        value_cal
3831  "na_s11_iq_data_trace2"  "2020-09-01 22:29:27.456345+00"  [array with ~2000 points]
3830  "na_s21_iq_data"         "2020-09-01 22:29:27.389617+00"  [array with ~2000 points]
3829  "na_s11_iq_data_trace2"  "2020-09-01 22:29:20.543466+00"  [array with ~2000 points]
3828  "na_s21_iq_data"         "2020-09-01 22:29:20.443416+00"  [array with ~2000 points]
GOAL Using these three tables, I want to create a VIEW called v_na_log that looks like this:

start_measurement                stop_measurement                 curved_mirror_steps  na_averages  na_start_freq  na_stop_freq  na_s21_iq_data             na_s11_iq_data
"2020-09-01 22:29:22.913366+00"  "2020-09-01 22:29:27.478287+00"  432131               16           15000000000   18000000000   [array with ~2000 points]  [array with ~2000 points]
...
Basically, I want to pivot (transpose) sensor_double_precision and sensor_double_precision_arr so that the distinct values in the sensor_name column become columns themselves.
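Since the sensor names are fixed and known in advance, the same pivot could in principle also be written with conditional aggregation instead of crosstab(). A minimal sketch (it ignores the deduplication my real query does with DISTINCT ON, and assumes the table layout above); my actual crosstab() version follows below:

-- Pivot via conditional aggregation (PostgreSQL 9.4+ FILTER clause):
SELECT m.id,
       max(s.value_cal) FILTER (WHERE s.sensor_name = 'curved_mirror_steps') AS curved_mirror_steps,
       max(s.value_cal) FILTER (WHERE s.sensor_name = 'na_averages')         AS na_averages,
       max(s.value_cal) FILTER (WHERE s.sensor_name = 'na_start_freq')       AS na_start_freq,
       max(s.value_cal) FILTER (WHERE s.sensor_name = 'na_stop_freq')        AS na_stop_freq
FROM v_na_measurement_status m
JOIN sensor_double_precision s
  ON s."timestamp" BETWEEN m.start_measurement AND m.stop_measurement
GROUP BY m.id;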
ONE SOLUTION This is the long code I use to achieve this:
DROP VIEW IF EXISTS v_na_log;

CREATE VIEW v_na_log AS
SELECT
    final_view.id,
    vnms.start_measurement,
    vnms.stop_measurement,
    final_view.curved_mirror_steps,
    final_view.na_averages,
    final_view.na_start_freq,
    final_view.na_stop_freq,
    iq_data.freq_resolution,
    iq_data.na_s21_iq_data,
    iq_data.na_s11_iq_data
-- First pivot: the scalar sensors. NOTE: the one-parameter form of
-- crosstab() requires its input ordered by the row key and fills columns
-- in order of appearance, so the final ORDER BY 1, 2 is needed for
-- correctness (the alphabetical sensor_name order matches the column list).
FROM crosstab(
    'WITH current_data AS (
         SELECT DISTINCT ON (id, sensor_name)
                id, sensor_name, value_cal, timestamp
         FROM sensor_double_precision
         WHERE sensor_name IN (''na_start_freq'', ''na_stop_freq'',
                               ''na_averages'', ''curved_mirror_steps'')
         ORDER BY id, sensor_name, timestamp ASC
     ),
     sorted_data AS (
         SELECT m.id, s.sensor_name, s.value_cal
         FROM v_na_measurement_status m
         INNER JOIN current_data s
                 ON s.timestamp BETWEEN m.start_measurement AND m.stop_measurement
     ),
     log_ids      AS (SELECT DISTINCT id FROM sorted_data),
     sensor_names AS (SELECT DISTINCT sensor_name FROM sorted_data)
     SELECT log_ids.id, sensor_names.sensor_name, sorted_data.value_cal
     FROM log_ids
     CROSS JOIN sensor_names
     LEFT JOIN sorted_data
            ON log_ids.id = sorted_data.id
           AND sensor_names.sensor_name = sorted_data.sensor_name
     ORDER BY 1, 2'
) final_view(id bigint,
             curved_mirror_steps double precision,
             na_averages double precision,
             na_start_freq double precision,
             na_stop_freq double precision)
LEFT JOIN v_na_measurement_status vnms ON vnms.id = final_view.id
-- Second pivot: the array-valued sensors, built the same way.
LEFT JOIN (
    SELECT final_view.id,
           vnms.start_measurement,
           vnms.stop_measurement,
           array_length(final_view.na_s21_iq_data, 1) AS freq_resolution,
           final_view.na_s11_iq_data,
           final_view.na_s21_iq_data
    FROM crosstab(
        'WITH current_data AS (
             SELECT DISTINCT ON (id, sensor_name)
                    id, sensor_name, value_cal, timestamp
             FROM sensor_double_precision_arr
             WHERE sensor_name LIKE ''%na_s21_iq_data%''
                OR sensor_name LIKE ''%na_s11_iq_data%''
             ORDER BY id, sensor_name, timestamp ASC
         ),
         sorted_data AS (
             SELECT m.id, s.sensor_name, s.value_cal
             FROM v_na_measurement_status m
             INNER JOIN current_data s
                     ON s.timestamp BETWEEN m.start_measurement AND m.stop_measurement
         ),
         log_ids      AS (SELECT DISTINCT id FROM sorted_data),
         sensor_names AS (SELECT DISTINCT sensor_name FROM sorted_data)
         SELECT log_ids.id, sensor_names.sensor_name, sorted_data.value_cal
         FROM log_ids
         CROSS JOIN sensor_names
         LEFT JOIN sorted_data
                ON log_ids.id = sorted_data.id
               AND sensor_names.sensor_name = sorted_data.sensor_name
         ORDER BY 1, 2'
    ) final_view(id bigint,
                 na_s11_iq_data double precision[],
                 na_s21_iq_data double precision[])
    LEFT JOIN v_na_measurement_status vnms ON vnms.id = final_view.id
) iq_data ON iq_data.id = final_view.id
ORDER BY vnms.start_measurement ASC;
PROBLEM
This code works correctly, and so far I haven't run into issues even on edge cases. But it takes an annoyingly long time to run.
For example, running:

SELECT * FROM v_na_log LIMIT 10;

takes about 13 seconds to execute. It takes even longer without the LIMIT clause. I usually have to deal with more than 10 rows, so the longer the query runs, the harder it is for me to analyze the data.
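For anyone trying to reproduce the slowness, the per-node timings can be inspected with the standard diagnostic:

-- Show the actual plan with per-node timings and buffer usage:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM v_na_log LIMIT 10;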
As I said, I think this has to do with the use of so many JOINs. However, I do not see a better way of doing this. I want to know if there is a better solution, because I suspect this problem will only get worse as the tables grow.
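One thing I have considered but not tested (the index names here are made up): composite indexes matching the sensor_name filter and the timestamp-range join inside both crosstab source queries.

-- Untested sketch: support WHERE sensor_name ... plus the
-- timestamp BETWEEN start_measurement AND stop_measurement join.
CREATE INDEX IF NOT EXISTS sensor_dp_name_ts_idx
    ON sensor_double_precision (sensor_name, "timestamp");
CREATE INDEX IF NOT EXISTS sensor_dp_arr_name_ts_idx
    ON sensor_double_precision_arr (sensor_name, "timestamp");

Note that the leading-wildcard LIKE '%...%' patterns in the second crosstab cannot use a plain btree index on sensor_name; rewriting them as an IN list of exact names (which I know in advance) would let such an index be used.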
I have posted about simplified versions of similar tasks before and received invaluable suggestions, which I have incorporated into my code. Here is the link to a related question of mine: Crosstab using data from 2 different tables.
Any suggestions on a better organization of the tables in the database itself are welcome as well.
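One reorganization I have considered (sketch only; the name v_na_log_cached is made up): since new rows only arrive when a measurement finishes, the expensive pivot could be cached as a materialized view and refreshed on demand.

-- Cache the pivoted result so reads are cheap:
CREATE MATERIALIZED VIEW v_na_log_cached AS
SELECT * FROM v_na_log;

-- Re-run whenever new measurements have been inserted:
REFRESH MATERIALIZED VIEW v_na_log_cached;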
Comments:

Q: Are the sensor_name columns a handful of values you know in advance? If not, why do you need such a wide table of potentially hundreds or thousands of columns? Are you analyzing with an application layer (i.e., Python, R)?

A: Yes, there is a fixed set of sensor_names that I know in advance. There are two purposes for this VIEW. The main one is that I want to create plots in a web app called Grafana, and this VIEW will make that possible. The secondary reason (not as important) is to make fewer calls to the DB from Python while doing data analysis.

A (follow-up): The second crosstab takes much longer than the first (tested with the LIMIT 10 clause). I am not sure why it takes so much longer given it's doing the exact same thing but with different tables; my guess is that the large arrays might have something to do with it, but I do not know why.
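If the large arrays are indeed the culprit, their on-disk footprint can be checked; pg_table_size includes TOAST storage, which is where out-of-line array values live:

-- Compare total size (including TOASTed arrays) with the heap alone:
SELECT pg_size_pretty(pg_table_size('sensor_double_precision_arr'))    AS total_incl_toast,
       pg_size_pretty(pg_relation_size('sensor_double_precision_arr')) AS heap_only;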