I am suffering from performance issues when inserting a million rows into a PostgreSQL database.
I am sending a JSON object which contains an array with a million rows.
For each row I create a record in the database table. I have also tried inserting multiple rows at once, but the problem remains.
I am not sure how to handle this; I have read that the COPY command is the fastest.
How can I improve the performance?
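For reference, the COPY approach I read about looks roughly like this (a sketch only; the file path and CSV layout are assumptions, not my actual setup):

COPY log_rlda (record_node_id, log_line, log_value, timestamp, record_log_id)
FROM '/tmp/log_rlda.csv' WITH (FORMAT csv);  -- hypothetical path; COPY streams all rows in a single statement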
My JSON object, with "log" as an array (the log array has a million rows):
{"type":"monitoring","log":[
["2016-10-12T20:33:21","0.00","0.00","0.00","0.00","0.0","24.00","1.83","-0.00","1","1","-100.00"],
["2016-10-12T20:33:23","0.00","0.00","0.00","0.00","0.0","24.00","1.52","-0.61","1","1","-100.00"]]}
My current code (I build the statement dynamically so that I can insert multiple rows at once):
IF (NOT b_first_line) THEN
    s_insert_query_values = right(s_insert_query_values, -1); -- remove the leading comma
    EXECUTE format('INSERT INTO log_rlda
                        (record_node_id, log_line, log_value, timestamp, record_log_id)
                    VALUES %s;', s_insert_query_values);
    s_insert_query_values = '';
    i_num_lines_buffered = 0;
END IF;
s_insert_query_values contains the buffered rows: each value inside an array within "log" needs to be inserted into its own row (in column log_value). This is what the INSERT looks like once s_insert_query_values has been filled in:
INSERT INTO log_rlda
(record_node_id, log_line, log_value, timestamp, record_log_id)
VALUES
(806, 1, 0.00, '2016-10-12 20:33:21', 386),
(807, 1, 0.00, '2016-10-12 20:33:21', 386),
(808, 1, 0.00, '2016-10-12 20:33:21', 386),
(809, 1, 0.00, '2016-10-12 20:33:21', 386),
(810, 1, 0.0, '2016-10-12 20:33:21', 386),
(811, 1, 24.00, '2016-10-12 20:33:21', 386),
(768, 1, 1.83, '2016-10-12 20:33:21', 386),
(769, 1, 0.00, '2016-10-12 20:33:21', 386),
(728, 1, 1, '2016-10-12 20:33:21', 386),
(771, 1, 1, '2016-10-12 20:33:21', 386),
(729, 1, -100.00, '2016-10-12 20:33:21', 386),
(806, 2, 0.00, '2016-10-12 20:33:23', 386),
(807, 2, 0.00, '2016-10-12 20:33:23', 386),
(808, 2, 0.00, '2016-10-12 20:33:23', 386),
(809, 2, 0.00, '2016-10-12 20:33:23', 386),
(810, 2, 0.0, '2016-10-12 20:33:23', 386),
(811, 2, 24.00, '2016-10-12 20:33:23', 386),
(768, 2, 1.52, '2016-10-12 20:33:23', 386),
(769, 2, -0.61, '2016-10-12 20:33:23', 386),
(728, 2, 1, '2016-10-12 20:33:23', 386),
(771, 2, 1, '2016-10-12 20:33:23', 386),
(729, 2, -100.00, '2016-10-12 20:33:23', 386)
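For reference, a minimal log_rlda definition that these rows are consistent with (a sketch with inferred types; the real table may differ):

CREATE TABLE log_rlda (
    record_node_id integer          NOT NULL,  -- ID of the node a value belongs to
    log_line       integer          NOT NULL,  -- line number within one uploaded log
    log_value      double precision,           -- the measured value itself
    "timestamp"    timestamp        NOT NULL,  -- first element of each log line
    record_log_id  integer          NOT NULL   -- ID of the upload/batch
);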
Solution (i_node_id_list contains IDs I selected before this query):
SELECT i_node_id_list[log_value_index] AS record_node_id,
       e.log_line - 1 AS log_line,
       items.log_value::double precision AS log_value,
       to_timestamp(e.line->>0, 'YYYY-MM-DD HH24:MI:SS') AS "timestamp",
       i_log_id AS record_log_id
FROM (VALUES (log_data::json)) AS data(doc),
     json_array_elements(doc->'log') WITH ORDINALITY AS e(line, log_line),
     json_array_elements_text(e.line) WITH ORDINALITY AS items(log_value, log_value_index)
WHERE log_value_index > 1 -- don't include the timestamp element (it shouldn't be written as log_value)
  AND log_line > 1
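Inside the same PL/pgSQL function, that query can feed a single set-based INSERT, roughly like this (a sketch; it assumes the variables log_data, i_node_id_list and i_log_id are in scope as above):

-- One set-based INSERT instead of building VALUES strings row by row.
INSERT INTO log_rlda (record_node_id, log_line, log_value, timestamp, record_log_id)
SELECT i_node_id_list[log_value_index],
       e.log_line - 1,
       items.log_value::double precision,
       to_timestamp(e.line->>0, 'YYYY-MM-DD HH24:MI:SS'),
       i_log_id
FROM (VALUES (log_data::json)) AS data(doc),
     json_array_elements(doc->'log') WITH ORDINALITY AS e(line, log_line),
     json_array_elements_text(e.line) WITH ORDINALITY AS items(log_value, log_value_index)
WHERE log_value_index > 1
  AND log_line > 1;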