I would like to partition a table in Postgres by a value that is not known in advance. In my scenario that value is device_id, which is a string.

This is the current situation:

The table 'device_data' stores sensor data sent from devices and is defined by this DDL:

CREATE TABLE warehouse.device_data (
  id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('device_data_id_seq'::regclass),
  device_id TEXT NOT NULL,
  device_data BYTEA NOT NULL,
--   contains additional fields which are omitted for brevity
  received_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
);

The table currently holds millions of records and queries are taking a huge amount of time. Most queries contain a WHERE device_id='something' clause.

The solution I have in mind is to create a table partition for each device_id.

Is it possible in Postgres to create a table partition for each device_id?

I went through the Postgres documentation and a couple of examples I found, but all of them use fixed boundaries to create partitions. My solution would require:

  1. creating a new table partition on the fly when a new device_id is first encountered
  2. storing to an existing partition if the device_id is already known and a partition for that device_id already exists

I would like this to be done using table partitions as it would allow querying across multiple device_ids.

Comments:
  • Why do you want to partition by device_id? A simple index on device_id will do the job. Performance questions should include EXPLAIN ANALYZE output and some information about table size, indexes, current performance, desired performance, etc. Slow is a relative term and we need a real value to compare against. – Oct 18, 2016
  • Have a look at pg_pathman. It is a tool that simplifies partition management in PostgreSQL, and it specifically supports a HASH partitioning strategy. The drawback is that the number of partitions is fixed and is chosen when you initialize your partition set. – May 22, 2018
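
Before reaching for partitioning, it may be worth trying the plain index the first comment suggests. A minimal sketch against the table from the question (the composite variant is only an assumption about typical predicates):

create index device_data_device_id_idx
    on warehouse.device_data (device_id);

-- if received_at is usually filtered or sorted on as well (an assumption),
-- a composite index may serve those queries better:
-- create index device_data_device_id_received_at_idx
--     on warehouse.device_data (device_id, received_at);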

1 Answer

I like the idea of dynamic partitioning. I do not know, though, how it will affect performance, as I have never used it.

Change the definition of id to int default 0 and create the sequence manually, so that nextval() is not called twice on a single insert (once by the parent's default and once more in the trigger, which supplies the real value):

create table device_data (
    id int primary key default 0,
    device_id text not null,
    device_data text not null, -- changed for tests
    received_at timestamp without time zone default now()
);
create sequence device_data_seq owned by device_data.id;

Use dynamic SQL in the trigger function:

create or replace function before_insert_on_device_data()
returns trigger language plpgsql as $$
begin
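    -- create the per-device child table the first time this device_id is seen;
    -- its CHECK constraint lets the planner exclude it for other device_ids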
    execute format(
        $f$
            create table if not exists %I (
            check (device_id = %L)
            ) inherits (device_data)
        $f$, 
        concat('device_data_', new.device_id), 
        new.device_id);
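    -- route the row into the child table; "return null" below keeps the parent empty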
    execute format(
        $f$
            insert into %I
            values (nextval('device_data_seq'), %L, %L, default)
        $f$, 
        concat('device_data_', new.device_id), 
        new.device_id, 
        new.device_data);
    return null;
end $$;

create trigger before_insert_on_device_data
    before insert on device_data
    for each row execute procedure before_insert_on_device_data();

Test:

insert into device_data (device_id, device_data) values
    ('first', 'data 1'),
    ('second', 'data 1'),
    ('first', 'data 2'),
    ('second', 'data 2');

select * from device_data_first;

 id | device_id | device_data |        received_at         
----+-----------+-------------+----------------------------
  1 | first     | data 1      | 2016-10-18 19:50:40.179955
  3 | first     | data 2      | 2016-10-18 19:50:40.179955
(2 rows)

select * from device_data_second;

 id | device_id | device_data |        received_at         
----+-----------+-------------+----------------------------
  2 | second    | data 1      | 2016-10-18 19:50:40.179955
  4 | second    | data 2      | 2016-10-18 19:50:40.179955
(2 rows)
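
Because the children inherit from device_data and each carries a CHECK constraint on device_id, queries against the parent still see all devices, and with constraint_exclusion left at its default of partition the planner can skip the irrelevant children. A minimal sketch of querying across partitions:

-- cross-device queries keep working through the parent table
select device_id, count(*) from device_data group by device_id;

-- with constraint_exclusion = partition (the default), the CHECK constraint
-- lets the planner scan only device_data_first (plus the empty parent)
explain select * from device_data where device_id = 'first';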