We are using the Postgres jsonb type in one of our database tables. The table structure is shown below:
CREATE TABLE IF NOT EXISTS public.draft_document (
    id bigserial NOT NULL PRIMARY KEY,
    ...
    document jsonb NOT NULL,
    ein_search character varying(11) NOT NULL
);

CREATE INDEX IF NOT EXISTS count_draft_document_idx ON public.draft_document USING btree (ein_search);
CREATE INDEX IF NOT EXISTS read_draft_document_idx ON public.draft_document USING btree (id, ein_search);
The JSON structure of the document column may vary. Below is one example of a possible schema for document:
"withholdingCredit": {
"type": "array",
"items": {
"$ref": "#/definitions/withholding"
}
}
where the withholding structure (the array elements) conforms to:
"withholding": {
"properties": {
...
"proportionalityIndicator": {
"type": "boolean"
},
"tribute": {
"$ref": "#/definitions/tribute"
},
"payingSourceEin": {
"type": "string"
},
"value": {
"type": "number"
}
...
}
...
},
"tribute": {
"type": "object",
"properties": {
"code": {
"type": "number"
},
"additionalCode": {
"type": "number"
}
...
}
}
Here is an example of the JSON stored in the document jsonb column:
{
  "withholdingCredit": [
    {
      "value": 15000,
      "tribute": {
        "code": 1216,
        "additionalCode": 2
      },
      "payingSourceEin": "03985506123132",
      "proportionalityIndicator": false
    },
    ...
    {
      "value": 98150,
      "tribute": {
        "code": 3155,
        "additionalCode": 1
      },
      "payingSourceEin": "04185506123163",
      "proportionalityIndicator": false
    }
  ]
}
The number of elements in the array varies, up to a business-defined limit of 100,000 (one hundred thousand) elements.
We need a paged SELECT query that returns the withholding array disaggregated (one element per row), where each row also carries the sum of the withholding values and the array length.
The query also needs to return the withholdings ordered by proportionalityIndicator, tribute->code, tribute->additionalCode, and payingSourceEin. Something like:
| id | sum | jsonb_array_length | jsonb_array_elements |
|---|---|---|---|
| 30900 | 1,800,027 | 2300 | {"value":15000,"tribute":{"code":1216,...}, ...} |
| ... | ... | ... | { ... } |
| 30900 | 1,800,027 | 2300 | {"value":98150,"tribute":{"code":3155,...}, ...} |
We have defined the following query:
SELECT dft.id,
       -- grand total of the withholding values for this document
       SUM((elem->>'value')::NUMERIC),
       -- total number of elements in the withholdingCredit array
       jsonb_array_length(dft.document->'withholdingCredit'),
       -- aggregate the elements in the desired order, then re-expand them one per row
       jsonb_array_elements(jsonb_agg(elem
           ORDER BY
               elem->>'proportionalityIndicator',
               (elem->'tribute'->>'code')::NUMERIC,
               (elem->'tribute'->>'additionalCode')::NUMERIC,
               elem->>'payingSourceEin'))
FROM draft_document dft
CROSS JOIN LATERAL jsonb_array_elements(dft.document->'withholdingCredit') arr(elem)
WHERE (dft.document->'withholdingCredit') IS NOT NULL
  AND dft.id = :id
  AND dft.ein_search = :ein_search
GROUP BY dft.id
LIMIT :limit OFFSET :offset;
This query works, but its performance degrades when the jsonb array contains a large number of elements. Any suggestion on how to improve it is welcome.
BTW, we are using Postgres 9.6.
The data has to stay in the document jsonb column due to the need for its structure to be flexible.
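For reference, one direction we are considering is to skip the jsonb_agg / jsonb_array_elements round trip and paginate the expanded rows directly, computing the sum and the count with window functions. This is only an untested sketch (it assumes that window functions, which Postgres evaluates before LIMIT/OFFSET is applied, can stand in for the aggregation; :id, :ein_search, :limit and :offset are the same parameters as above):

SELECT dft.id,
       -- window functions run over all expanded rows, before LIMIT/OFFSET
       SUM((w.elem->>'value')::NUMERIC) OVER () AS sum,
       COUNT(*) OVER () AS array_length,
       w.elem
FROM draft_document dft
CROSS JOIN LATERAL jsonb_array_elements(dft.document->'withholdingCredit') AS w(elem)
WHERE (dft.document->'withholdingCredit') IS NOT NULL
  AND dft.id = :id
  AND dft.ein_search = :ein_search
ORDER BY w.elem->>'proportionalityIndicator',
         (w.elem->'tribute'->>'code')::NUMERIC,
         (w.elem->'tribute'->>'additionalCode')::NUMERIC,
         w.elem->>'payingSourceEin'
LIMIT :limit OFFSET :offset;

This would avoid materializing the sorted array a second time, but each page request still expands and sorts the full element set, so we are unsure how much it would gain in practice.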