Diagonal storage model

Lists: pgsql-hackers
From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Diagonal storage model
Date: 2018-04-01 12:48:07
Message-ID: 5AC0D507.5070105@postgrespro.ru

Hi hackers,

The vertical (columnar) storage model is optimal for analytics, which is why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...
In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the executor lacks support for vector operations.
The situation could change if we get a pluggable storage API with support for vectorized execution.

But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
This is why in most existing systems data is present in both formats (at least for some time).

I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
After that we store the second column of the first record, the third column of the second record, ...
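
A minimal sketch of one possible reading of this layout, not the code from the attached patch. It assumes that each pass covers every record in the tile and that the column index wraps around after the last column; the names diagonal_order and to_diagonal are hypothetical.

```python
def diagonal_order(n_rows, n_cols):
    """Return (row, col) coordinates in diagonal storage order.

    Pass k visits record i at column (i + k) % n_cols.
    The wrap-around past the last column is an assumption.
    """
    order = []
    for k in range(n_cols):          # one pass per starting column
        for i in range(n_rows):      # step down the records along the diagonal
            order.append((i, (i + k) % n_cols))
    return order


def to_diagonal(rows):
    """Flatten a list of equal-length records into diagonal order."""
    n_cols = len(rows[0])
    return [rows[r][c] for r, c in diagonal_order(len(rows), n_cols)]


rows = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
]
print(to_diagonal(rows))
```

Under these assumptions every cell is stored exactly once, because over all passes each record sees every column index.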

Profiling of TPC-H queries shows that much of the query execution time (about 17%) is spent in heap_deform_tuple.
The new format will significantly reduce the time spent on heap deforming, because there is just one column of a particular record in each tile.
Moreover, we can deform many tuples in parallel, which is especially efficient on quantum computers.
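
As a rough illustration of why deforming a row-oriented tuple costs time, here is a toy sketch. It is an assumption-laden stand-in, not PostgreSQL's heap_deform_tuple: the length-prefixed format and the names pack_row and deform_row are hypothetical. The point it shows is that with variable-length fields, the offset of each attribute depends on the lengths of all attributes before it, so the row has to be walked from the start.

```python
import struct


def pack_row(values):
    """Pack strings as 2-byte length-prefixed fields (toy row format)."""
    out = b""
    for v in values:
        data = v.encode("utf-8")
        out += struct.pack("<H", len(data)) + data
    return out


def deform_row(buf, n_attrs):
    """Extract all attributes by walking the row sequentially.

    The offset of attribute k is only known after reading the
    lengths of attributes 0..k-1.
    """
    values = []
    offset = 0
    for _ in range(n_attrs):
        (length,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        values.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return values


row = pack_row(["1", "widget", "19.99"])
print(deform_row(row, 3))
```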

Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment         Content-Type        Size
diagonal.patch.gz  application/x-gzip  505 bytes

From: Дмитрий Воронин <carriingfate92(at)yandex(dot)ru>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Diagonal storage model
Date: 2018-04-01 15:11:04
Message-ID: 4716721522595464@web31o.yandex.ru

Hi, Konstantin!

Thank you for working on the new pluggable storage API.

Your attached patch is 505 bytes and contains only a diff of explain.c. Is that right?

01.04.2018, 15:48, "Konstantin Knizhnik" <k(dot)knizhnik(at)postgrespro(dot)ru>:
> Hi hackers,
>
> The vertical (columnar) storage model is optimal for analytics, which is why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...
> In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the executor lacks support for vector operations.
> The situation could change if we get a pluggable storage API with support for vectorized execution.
>
> But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
> This is why in most existing systems data is present in both formats (at least for some time).
>
> I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
> After that we store the second column of the first record, the third column of the second record, ...
>
> Profiling of TPC-H queries shows that much of the query execution time (about 17%) is spent in heap_deform_tuple.
> The new format will significantly reduce the time spent on heap deforming, because there is just one column of a particular record in each tile.
> Moreover, we can deform many tuples in parallel, which is especially efficient on quantum computers.
>
> Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company

-- 
Best regards, Dmitry Voronin


From: legrand legrand <legrand_legrand(at)hotmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Diagonal storage model
Date: 2018-04-01 16:43:45
Message-ID: 1522601025360-0.post@n3.nabble.com

Great idea!
Thank you, Konstantin.

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html


From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Diagonal storage model
Date: 2018-04-01 17:13:55
Message-ID: CAPpHfdvV=Xbr3Z18psMEkwF_-SJNT0hhA+asCK6S+jrnJi5DUA@mail.gmail.com

Hi!

On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> I want to announce a new model, "diagonal storage", which combines the
> benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then
> column 2 of the second record, ... and so on until we reach the last column.
> After that we store the second column of the first record, the third column
> of the second record, ...
>

Sounds interesting. Could "diagonal storage" be applied twice? That is,
could we apply a diagonal transformation to the result of another diagonal
transformation? I expect we should get a "square diagonal" transformation...

> Please find attached a patch with the first prototype implementation. It
> provides about a 3.14 times performance improvement on most TPC-H queries.

Great, but with a square diagonal transformation we should get a 3.14^2 times
improvement, which is even better!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: David Fetter <david(at)fetter(dot)org>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Diagonal storage model
Date: 2018-04-01 21:54:20
Message-ID: 20180401215420.GA21296@fetter.org

On Sun, Apr 01, 2018 at 03:48:07PM +0300, Konstantin Knizhnik wrote:
> Hi hackers,
>
> The vertical (columnar) storage model is optimal for analytics, which is why it is widely used in OLAP-oriented databases such as Vertica, HyPer, KDB, ...
> In Postgres we have the cstore extension, which cannot provide all the benefits of the vertical model because the executor lacks support for vector operations.
> The situation could change if we get a pluggable storage API with support for vectorized execution.
>
> But the vertical model is not so good for updates and data loading (because data is mostly imported in horizontal format).
> This is why in most existing systems data is present in both formats (at least for some time).
>
> I want to announce a new model, "diagonal storage", which combines the benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then column 2 of the second record, ... and so on until we reach the last column.
> After that we store the second column of the first record, the third column of the second record, ...
>
> Profiling of TPC-H queries shows that much of the query execution time (about 17%) is spent in heap_deform_tuple.
> The new format will significantly reduce the time spent on heap deforming, because there is just one column of a particular record in each tile.
> Moreover, we can deform many tuples in parallel, which is especially efficient on quantum computers.
>
> Please find attached a patch with the first prototype implementation. It provides about a 3.14 times performance improvement on most TPC-H queries.

You're sure it's not 3.14159265358979323...?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Marko Tiikkaja <marko(at)joh(dot)to>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Diagonal storage model
Date: 2018-04-01 22:19:28
Message-ID: CAL9smLDKaospgrYMxBkK7rWGDBxhZEOJhmF4BPU9ZpuGmkiVeA@mail.gmail.com

On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik <
k(dot)knizhnik(at)postgrespro(dot)ru> wrote:

> I want to announce a new model, "diagonal storage", which combines the
> benefits of both approaches.
> The idea is very simple: we first store column 1 of the first record, then
> column 2 of the second record, ... and so on until we reach the last column.
> After that we store the second column of the first record, the third column
> of the second record, ...
>

I'm a little worried about the fact that even with this model we're still
limited to only two dimensions. That's bound to cause problems sooner or
later.

.m


From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Marko Tiikkaja <marko(at)joh(dot)to>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Diagonal storage model
Date: 2018-04-02 11:57:09
Message-ID: CAFjFpRcSkR_p6GWeHVpNPKL1kgOswv6t0FejcWvMc8xcBRDEDw@mail.gmail.com

On Mon, Apr 2, 2018 at 3:49 AM, Marko Tiikkaja <marko(at)joh(dot)to> wrote:
> On Sun, Apr 1, 2018 at 3:48 PM, Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>>
>> I want to announce a new model, "diagonal storage", which combines the
>> benefits of both approaches.
>> The idea is very simple: we first store column 1 of the first record, then
>> column 2 of the second record, ... and so on until we reach the last column.
>> After that we store the second column of the first record, the third column
>> of the second record, ...
>
>
> I'm a little worried about the fact that even with this model we're still
> limited to only two dimensions. That's bound to cause problems sooner or
> later.
>

How about a 3D storage model, whose first dimension gives the horizontal
view, the second provides the vertical or columnar view, and the third
provides the diagonal view? It also provides the capability to add extra
dimensions for additional views, like a double diagonal view. Alas! It all
collapses since I was late to the party.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Marko Tiikkaja <marko(at)joh(dot)to>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Diagonal storage model
Date: 2018-04-02 12:21:13
Message-ID: F26CA0AB-F793-4CF5-B10B-B3AAE60EB7E6@yandex-team.ru

> On 2 Apr 2018, at 16:57, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> On Mon, Apr 2, 2018 at 3:49 AM, Marko Tiikkaja <marko(at)joh(dot)to> wrote:
>>
>> I'm a little worried about the fact that even with this model we're still
>> limited to only two dimensions. That's bound to cause problems sooner or
>> later.
> How about a 3D storage model, whose first dimension gives horizontal
> view, second provides vertical or columnar view and third one provides
> diagonal view. It also provides capability to add extra dimensions to
> provide additional views like double diagonal view. Alas! it all
> collapses since I was late to the party.

BTW, MDX expressions actually provide multidimensional results. They have COLUMNS, ROWS, PAGES, SECTIONS, CHAPTERS, and AXIS(N) for those who are not satisfied with named dimensions.

Best regards, Andrey Borodin.


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Diagonal storage model
Date: 2018-04-03 02:49:25
Message-ID: 20180403024925.GD1621@paquier.xyz

On Sun, Apr 01, 2018 at 03:48:07PM +0300, Konstantin Knizhnik wrote:
> Attach please find patch with first prototype implementation. It
> provides about 3.14 times improvement of performance at most of TPC-H
> queries.

Congratulations on finding a patch able to improve all workloads of
Postgres in such a simple and magic way, especially on this particular
date. I would have used M_PI if I were you.
--
Michael