
Good Day,

I would like to know the best way to partition a Postgres table on a column prefix. I have a large table (roughly 300 million rows x 10 columns) and I would like to partition it on a prefix of column 1. The data looks like:

ABCDEF1xxxxxxxx
ABCDEF1xxxxxxxy
ABCDEF1xxxxxxxz
ABCDEF2xxxxxxxx
ABCDEF2xxxxxxxy
ABCDEF2xxxxxxxz
ABCDEF3xxxxxxxx
ABCDEF3xxxxxxxz
ABCDEF4xxxxxxxx
ABCDEF4xxxxxxxy

There will only ever be 10 partitions, i.e. ABCDEF0...->ABCDEF9...

What I've currently done is make tables like:

CREATE TABLE public.mydata_ABCDEF1 (
CHECK ( col1 like 'ABCDEF1%' )
) INHERITS (public.mydata);

CREATE TABLE public.mydata_ABCDEF2 (
CHECK ( col1 like 'ABCDEF2%' )
) INHERITS (public.mydata);

etc. Then the trigger with similar logic:

IF ( NEW.col1 like 'ABCDEF1%' ) THEN 
    INSERT INTO public.mydata_ABCDEF1 VALUES (NEW.*);
ELSIF ( NEW.col1 like 'ABCDEF2%' ) THEN
    INSERT INTO public.mydata_ABCDEF2 VALUES (NEW.*);
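For reference, spelled out as a complete trigger the routing logic would be something like the sketch below (the remaining branches and the column list are elided; the function and trigger names are made up for illustration):

```sql
CREATE OR REPLACE FUNCTION public.mydata_insert_trigger()
RETURNS trigger AS $$
BEGIN
    IF NEW.col1 LIKE 'ABCDEF1%' THEN
        INSERT INTO public.mydata_ABCDEF1 VALUES (NEW.*);
    ELSIF NEW.col1 LIKE 'ABCDEF2%' THEN
        INSERT INTO public.mydata_ABCDEF2 VALUES (NEW.*);
    -- ... one branch per remaining prefix (ABCDEF0, ABCDEF3 .. ABCDEF9) ...
    ELSE
        RAISE EXCEPTION 'col1 has an unexpected prefix: %', NEW.col1;
    END IF;
    RETURN NULL;  -- the row was routed to a child table; skip the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_mydata_trigger
    BEFORE INSERT ON public.mydata
    FOR EACH ROW EXECUTE PROCEDURE public.mydata_insert_trigger();
```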

I'm wondering whether partitioning in this way will actually speed up query time, whether I should instead partition on a substr() of the column (not sure how), or whether I should add a new column containing just the prefix and partition on that.

Any advice is appreciated.

3 Answers


I know this is an old question, but I am adding this answer in case anyone else needs a solution.

Postgres 10 introduced declarative range partitioning: https://www.postgresql.org/docs/10/static/ddl-partitioning.html.

While the examples in the docs use date ranges, you can also use string ranges, since strings compare according to the column's collation (plain byte order under the C collation). The code below creates a parent table and then two child tables which, depending on your specific codes, should automatically bin rows by the prefixes provided. Range bounds are half-open: the FROM value is inclusive and the TO value is exclusive, so consecutive ranges such as 'ABCDEF1' to 'ABCDEF2' do not overlap, and each one catches every key with that prefix.

CREATE TABLE mydata (...) PARTITION BY RANGE (col1);
CREATE TABLE mydata_abcdef1 PARTITION OF mydata 
  FOR VALUES FROM ('ABCDEF1') TO ('ABCDEF2');
CREATE TABLE mydata_abcdef2 PARTITION OF mydata 
  FOR VALUES FROM ('ABCDEF2') TO ('ABCDEF3');
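Since the question also asked about partitioning on a substring: Postgres 10 accepts an expression as the partition key, so list partitioning on the prefix itself is another option. A sketch (the 7-character prefix length is my assumption from the sample data):

```sql
CREATE TABLE mydata (...) PARTITION BY LIST (left(col1, 7));

CREATE TABLE mydata_abcdef1 PARTITION OF mydata FOR VALUES IN ('ABCDEF1');
CREATE TABLE mydata_abcdef2 PARTITION OF mydata FOR VALUES IN ('ABCDEF2');
-- ... and so on for the remaining prefixes ...
```

Note that with an expression key the planner can only prune partitions when a query filters on the same expression, e.g. WHERE left(col1, 7) = 'ABCDEF1'.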



It will significantly speed up your queries when each of the partitioned tables has its indexes defined appropriately, e.g.:

CREATE INDEX ON public.mydata_ABCDEF1 (...) WHERE col1 like 'ABCDEF1%';
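One caveat worth adding here: with inheritance partitioning, the planner's constraint exclusion proves which children to skip using btree comparison operators, and it generally cannot reason about LIKE patterns, so range-style CHECK constraints tend to prune better. A sketch under that assumption:

```sql
-- For fixed-width ASCII prefixes under the C collation, this range CHECK
-- matches the same rows as LIKE 'ABCDEF1%', but the planner can use it
-- to exclude child tables from a query plan.
CREATE TABLE public.mydata_ABCDEF1 (
    CHECK (col1 >= 'ABCDEF1' AND col1 < 'ABCDEF2')
) INHERITS (public.mydata);

SET constraint_exclusion = partition;  -- the default setting

-- Only the (empty) parent and mydata_ABCDEF1 should appear in the plan:
EXPLAIN SELECT * FROM public.mydata
WHERE col1 >= 'ABCDEF1' AND col1 < 'ABCDEF2';
```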

1 Comment

Yes, it is my intention to index the "partition" tables once the data is populated. My question is more about whether partitioning this "character" field using LIKE is the best method.

The short answer is "probably not," but it really depends on exactly what your queries are.

The real question is: what are you trying to accomplish with the partitioning? Generally speaking, PostgreSQL's btree indexes are very fast and efficient at finding the specific records you ask for, faster than PostgreSQL is at figuring out which table in a set of partitioned tables your data is stored in.

Where partitioning is extremely useful is in data management. The reason is that you can often partition by time and then, once the data has aged out, simply drop the old partition instead of issuing "DELETE" queries, which only mark records as deleted, have to be VACUUMed to reclaim the space, and end up causing bloat in the table and indexes.
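With the inheritance scheme in this question, that age-out step is just DDL. A sketch with hypothetical time-based child names and a hypothetical created_at column:

```sql
-- Dropping an aged-out child is near-instant and frees the space immediately:
ALTER TABLE public.mydata_2015_01 NO INHERIT public.mydata;
DROP TABLE public.mydata_2015_01;

-- ...whereas a DELETE only marks rows dead, leaving VACUUM to reclaim space:
DELETE FROM public.mydata WHERE created_at < DATE '2015-02-01';
```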

300M records is about the point where I might consider partitioning, but I wouldn't jump to partitioning the data at that point without a clear reason why having the data partitioned will be helpful.

Also, be aware that PostgreSQL's query planner does not handle very large numbers of partitions well; hundreds or thousands of partitions will slow down planning time. That is not easy to see in pre-9.5 versions, but as of 9.5, "EXPLAIN ANALYZE" reports the planning time for a given query:

=*> explain analyze select * from downloads;
                                                      QUERY PLAN                                       
-------------------------------------------------------------------------------------------------------
 Seq Scan on downloads  (cost=0.00..38591.76 rows=999976 width=193) (actual time=23.863..2088.732 rows=
 Planning time: 0.219 ms
 Execution time: 2552.878 ms
(3 rows)

1 Comment

Firstly, a correction: the total count of the data is 750 million rows. Essentially it is an audit history of equipment, with column 1 mentioned in my post being the equipment ID. ABCDEF represents our company and is always part of the ID. The 0-9 digit represents the "bin" (thus a maximum of 10 partitions only), followed by the equipment's actual ID. Partitioning is not intended for data management, as all info is kept "forever"; in my case it is purely for performance. Queries will be on equipment ID: selecting one, or grouping on a bin and counting, etc.
