I have a simple table structure in Postgres: a site table and a site_pages table in a one-to-many relationship. The tables join on site.id = site_pages.site_id.

These tables still perform well, but they are growing fast and I am aware they might not for much longer, so I want to be prepared.

I had two ideas:

  1. Partition on site.id and site_pages.site_id, grouping by 1M rows, but some queries will have to select from multiple partitions.
  2. Partition by active (True/False), but that will probably only be a short-term fix.

Is there a better approach I'm missing?

Table Structure

site ~ 7 million rows

id
url
active

site_pages ~ 60 million rows

id
site_id
page_url
active

2 Answers

2

I don't think that partitioning in the classical sense will help you there. If your queries end up having to select from all partitions, they won't get any faster.

If most of the queries access only active data and you want to optimize for that case, you could introduce an old_site and an old_site_pages table and move rows there when they become inactive. Queries accessing all data will have to use a UNION of the current and the old data and might become slower, but queries accessing only active data can become fast.
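A minimal sketch of that archive move, under a few assumptions: the archive tables copy the live tables' definitions, a site is archived together with all of its pages once site.active is false, and the view name all_site is hypothetical. Foreign keys, if you have any, are not copied by LIKE and would need to be handled separately.

```sql
-- Archive tables with the same columns, defaults and indexes as the live ones.
CREATE TABLE old_site       (LIKE site INCLUDING ALL);
CREATE TABLE old_site_pages (LIKE site_pages INCLUDING ALL);

-- REPEATABLE READ so both statements see the same set of inactive sites.
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- Move the pages of inactive sites first (assumption: pages follow their site).
WITH moved AS (
    DELETE FROM site_pages p
    USING site s
    WHERE s.id = p.site_id
      AND NOT s.active
    RETURNING p.*
)
INSERT INTO old_site_pages
SELECT * FROM moved;

-- Then move the inactive sites themselves.
WITH moved AS (
    DELETE FROM site
    WHERE NOT active
    RETURNING *
)
INSERT INTO old_site
SELECT * FROM moved;

COMMIT;

-- Hypothetical view for the queries that still need all data.
CREATE VIEW all_site AS
SELECT * FROM site
UNION ALL
SELECT * FROM old_site;
```

UNION ALL is used in the view rather than UNION because a row lives in exactly one of the two tables, so there are no duplicates to remove and the extra sort/hash step can be skipped.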


1 Comment

Since you are still new here, may I point out that the preferred way of saying 'thanks' around here is by up-voting good questions and helpful answers (once you have enough reputation to do so), and by accepting the most helpful answer to any question you ask (which also gives you a small boost to your reputation).
1

Tables with just a few columns should perform acceptably up to some hundreds of millions of rows. So I think you could skip partitioning the site table for now.

As for site_pages, partitioning will help you if you use the partitioning criteria in your SELECTs. This means that if you partition by site_id (grouped by some millions of rows) and have CHECK constraints set properly for each child table, e.g. CHECK (site_id >= 1000000 AND site_id < 2000000), then your SELECT ... WHERE site_id = 1536987 will not have to combine all partitions. It will only read the partitions that match your criteria, thus going through only one child table. You can verify this with EXPLAIN.
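A rough sketch of that layout using table inheritance, written as if the table were created from scratch. The column types and the child table names (site_pages_p1, site_pages_p2) are assumptions; the question only lists the column names.

```sql
-- Parent table; column types are assumed, not taken from the question.
CREATE TABLE site_pages (
    id       bigint  NOT NULL,
    site_id  bigint  NOT NULL,
    page_url text    NOT NULL,
    active   boolean NOT NULL DEFAULT true
);

-- One child table per range of 1M site ids (hypothetical names).
CREATE TABLE site_pages_p1 (
    CHECK (site_id >= 1000000 AND site_id < 2000000)
) INHERITS (site_pages);

CREATE TABLE site_pages_p2 (
    CHECK (site_id >= 2000000 AND site_id < 3000000)
) INHERITS (site_pages);

-- Indexes are not inherited, so each child needs its own.
CREATE INDEX ON site_pages_p1 (site_id);
CREATE INDEX ON site_pages_p2 (site_id);

-- 'partition' is the default; set explicitly here to make the dependency visible.
SET constraint_exclusion = partition;

-- The planner can use the CHECK constraints to skip children that cannot match.
EXPLAIN SELECT * FROM site_pages WHERE site_id = 1536987;
```

With this setup the plan should mention only the (empty) parent table and site_pages_p1. Note that rows inserted into the parent are not routed to a child automatically with inheritance; you either insert into the right child directly or add a routing trigger.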

And finally, you could move inactive sites and site_pages into separate archive tables.

P.S.: I assume you know how to set up partitioning on Postgres (subtables should INHERIT parent table, add check constraints, index each subtable, etc).
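As an alternative to the inheritance setup, PostgreSQL 10 and later offer declarative partitioning, which creates the range constraints for you. A minimal sketch, assuming the same made-up column types and hypothetical partition names, and again written as if the table were created from scratch:

```sql
-- Partitioned parent; column types are assumed, not taken from the question.
CREATE TABLE site_pages (
    id       bigint  NOT NULL,
    site_id  bigint  NOT NULL,
    page_url text    NOT NULL,
    active   boolean NOT NULL DEFAULT true
) PARTITION BY RANGE (site_id);

-- Lower bound is inclusive, upper bound is exclusive.
CREATE TABLE site_pages_p1 PARTITION OF site_pages
    FOR VALUES FROM (1000000) TO (2000000);

CREATE TABLE site_pages_p2 PARTITION OF site_pages
    FOR VALUES FROM (2000000) TO (3000000);

-- PostgreSQL 11+: an index on the parent is created on every partition.
-- On PostgreSQL 10, create the index on each partition instead.
CREATE INDEX ON site_pages (site_id);
```

Here rows inserted into site_pages are routed to the matching partition automatically, and an insert with a site_id outside every defined range fails unless a matching partition exists.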

1 Comment

Thanks for the detailed answer. Yeah, I was thinking about moving inactive sites to a different table, but was just trying to find an easy way out, as we have an ETL that runs all day updating/merging/deduping and would have to promote the rows back to the main table with the original id, which is used when a user calls that row. Guess I need to go back to the drawing board and clean a few things up.
