I am trying to find the best way to build a database relation. I need a way to create a table that combines data split across tables in different databases. All the tables have exactly the same structure (same number of columns, same names and types).

In a single database, I would create a parent table with partitions. However, the data volume is too big for a single database, which is why I am trying to split it. From the PostgreSQL documentation, I think what I am trying to do is "Multiple-Server Parallel Query Execution".

At the moment, the only solution I can think of is to build an API that holds the addresses of the databases and use it to pull data across the network into the main (parent) database when needed. I also found Citus, an external PostgreSQL extension that might do the job, but I don't know how to implement a unique key across multiple databases (or shards, as Citus calls them).
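A minimal Python sketch of that scatter-gather idea, assuming a hypothetical `SHARDS` registry of connection strings and a caller-supplied `fetch(dsn, sql)` function. In practice `fetch` would wrap a driver such as psycopg2; here a fake is used so the example runs on its own. It sends the same query to every database in parallel and concatenates the rows:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Tuple[object, ...]

# Hypothetical registry of shard addresses (the "API of database addresses").
SHARDS: Dict[str, str] = {
    "shard_a": "postgresql://host-a/appdb",
    "shard_b": "postgresql://host-b/appdb",
}


def scatter_gather(
    shards: Dict[str, str],
    sql: str,
    fetch: Callable[[str, str], List[Row]],
) -> List[Row]:
    """Run the same query on every shard in parallel and concatenate the rows."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = [pool.submit(fetch, dsn, sql) for dsn in shards.values()]
        rows: List[Row] = []
        for future in futures:
            rows.extend(future.result())  # re-raises if a shard query failed
    return rows


# Fake fetch so the sketch runs without real databases; a real version would
# connect to `dsn` (for example with psycopg2) and execute `sql`.
FAKE_DATA: Dict[str, List[Row]] = {
    "postgresql://host-a/appdb": [(1, "a"), (3, "c")],
    "postgresql://host-b/appdb": [(2, "b")],
}


def fake_fetch(dsn: str, sql: str) -> List[Row]:
    return FAKE_DATA[dsn]


rows = sorted(scatter_gather(SHARDS, "SELECT id, name FROM items", fake_fetch))
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Concatenating rows is the easy part; anything global (ORDER BY, LIMIT, aggregates, joins across shards, uniqueness) has to be redone on the coordinating side, which is the kind of work an extension like Citus is meant to take over.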

Is there any better way to do it?

  • You can create a partitioned table where the partitions are foreign tables on other servers (you probably also want to upgrade to Postgres 11 to make use of all the partitioning and parallel query enhancements there). But you can't really get a single unique key constraint across all partitions then. Commented Dec 3, 2018 at 8:31
  • But I'm curious: why would the data be too big for a single database? How many rows are we talking about? A single server with many, many hard disks (SSDs in a RAID 1 or RAID 10) and a lot of CPUs would probably end up being faster. Commented Dec 3, 2018 at 8:33
  • @a_horse_with_no_name the issue is that my company currently has lots of small machines that it wants to use, not real server machines. The biggest machine I have, for example, has 1 TB of storage and about 8 cores. The single database would contain around 100 GB of data, but not all of the tables will be joined. Commented Dec 3, 2018 at 9:03
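The first comment's approach (a partitioned table whose partitions are foreign tables) can be sketched as the DDL below, generated by a small Python helper. It assumes PostgreSQL 11+ with the `postgres_fdw` extension; the table, server, host, database and column names are all hypothetical. Each range partition of the parent is a foreign table stored on another machine:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RemotePartition:
    server: str        # local name for the foreign server (hypothetical)
    host: str          # remote machine address (hypothetical)
    dbname: str        # remote database name (hypothetical)
    remote_table: str  # table on the remote server holding this range
    lower: str         # inclusive lower bound of the range
    upper: str         # exclusive upper bound of the range


def fdw_partition_ddl(parent: str, columns: str, key: str,
                      parts: List[RemotePartition]) -> List[str]:
    """Emit DDL for a range-partitioned table whose partitions are foreign tables."""
    stmts = [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        f"CREATE TABLE {parent} ({columns}) PARTITION BY RANGE ({key});",
    ]
    for p in parts:
        stmts.append(
            f"CREATE SERVER {p.server} FOREIGN DATA WRAPPER postgres_fdw "
            f"OPTIONS (host '{p.host}', dbname '{p.dbname}');"
        )
        # A CREATE USER MAPPING ... SERVER <name> with credentials is also
        # required for each server; it is omitted to keep secrets out of the sketch.
        stmts.append(
            f"CREATE FOREIGN TABLE {p.remote_table} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{p.lower}') TO ('{p.upper}') "
            f"SERVER {p.server} OPTIONS (table_name '{p.remote_table}');"
        )
    return stmts


ddl = fdw_partition_ddl(
    "measurements",
    "id bigint NOT NULL, created_at timestamptz NOT NULL, payload text",
    "created_at",
    [
        RemotePartition("shard_a", "10.0.0.11", "metrics", "measurements_2017",
                        "2017-01-01", "2018-01-01"),
        RemotePartition("shard_b", "10.0.0.12", "metrics", "measurements_2018",
                        "2018-01-01", "2019-01-01"),
    ],
)
print("\n".join(ddl))
```

As that comment points out, PostgreSQL cannot enforce one unique constraint across these foreign partitions. Building SQL with string formatting is only reasonable for trusted, hand-written configuration like this, never for user input.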

1 Answer

Citus would most likely solve your problem. It lets you enforce a unique key across shards if the key is the distribution column, or if it is a composite key that contains the distribution column.
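To see why that restriction exists, here is a toy Python model of a hash-distributed table. It is not Citus's actual implementation; the CRC32 hash and the `tenant_id`/`order_id` column names are assumptions for illustration. Because every row with a given distribution-column value is routed to the same shard, any unique key that includes that column can be checked locally on one shard, with no cross-shard coordination:

```python
import zlib
from typing import Dict, List, Set, Tuple


class ToyShardedTable:
    """Toy model of a hash-distributed table; not how Citus is implemented."""

    def __init__(self, shard_count: int, dist_col: str, unique_cols: Tuple[str, ...]):
        # Mirror the rule: a unique key must include the distribution column.
        if dist_col not in unique_cols:
            raise ValueError("unique key must include the distribution column")
        self.shard_count = shard_count
        self.dist_col = dist_col
        self.unique_cols = unique_cols
        # Each shard only knows about the keys stored on it.
        self.shards: List[Set[Tuple[object, ...]]] = [set() for _ in range(shard_count)]

    def shard_for(self, value: object) -> int:
        # Stand-in hash (assumption); Citus uses PostgreSQL's own hash functions.
        return zlib.crc32(repr(value).encode()) % self.shard_count

    def insert(self, row: Dict[str, object]) -> int:
        shard = self.shard_for(row[self.dist_col])
        key = tuple(row[c] for c in self.unique_cols)
        # Equal keys share the same dist_col value, hence the same shard,
        # so this local check is enough to guarantee global uniqueness.
        if key in self.shards[shard]:
            raise ValueError(f"duplicate key {key}")
        self.shards[shard].add(key)
        return shard


# Hypothetical columns: distribute on tenant_id, unique on (tenant_id, order_id).
orders = ToyShardedTable(4, "tenant_id", ("tenant_id", "order_id"))
orders.insert({"tenant_id": 7, "order_id": 1})
orders.insert({"tenant_id": 7, "order_id": 2})
try:
    orders.insert({"tenant_id": 7, "order_id": 1})
except ValueError as err:
    print(err)  # duplicate key (7, 1)
```

A key that did not include the distribution column could end up with duplicates on two different shards, and catching them would mean checking every shard on each insert, which is why Citus rejects such constraints.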

You can also use a distributed partitioned table in Citus: a table that is partitioned on one column (a timestamp, perhaps?) and hash-distributed on another column (such as the one you use to split the data in your existing approach). Citus handles query parallelization and collecting the results for you.
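A hedged sketch of that layout, assuming Citus is installed on the coordinator and using hypothetical names (`events`, `device_id`, `event_id`). The Python helper emits DDL that range-partitions by month on `created_at`, hash-distributes on `device_id` with Citus's `create_distributed_table`, and then adds monthly partitions. The primary key includes both the distribution column (required by Citus) and the partition column (required by PostgreSQL for unique constraints on partitioned tables):

```python
from datetime import date
from typing import List


def month_starts(first: date, count: int) -> List[date]:
    """Return count + 1 consecutive month boundaries starting at first's month."""
    y, m = first.year, first.month
    bounds = []
    for _ in range(count + 1):
        bounds.append(date(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return bounds


def citus_time_partitioned_ddl(table: str, dist_col: str,
                               first_month: date, months: int) -> List[str]:
    """Emit DDL for a time-partitioned, hash-distributed table (hypothetical schema)."""
    stmts = [
        f"CREATE TABLE {table} ("
        f"{dist_col} bigint NOT NULL, event_id bigint NOT NULL, "
        "created_at timestamptz NOT NULL, payload jsonb, "
        f"PRIMARY KEY ({dist_col}, event_id, created_at)"
        ") PARTITION BY RANGE (created_at);",
        # Citus function that shards the table on the given column.
        f"SELECT create_distributed_table('{table}', '{dist_col}');",
    ]
    bounds = month_starts(first_month, months)
    for lo, hi in zip(bounds, bounds[1:]):
        stmts.append(
            f"CREATE TABLE {table}_{lo:%Y_%m} PARTITION OF {table} "
            f"FOR VALUES FROM ('{lo}') TO ('{hi}');"
        )
    return stmts


for stmt in citus_time_partitioned_ddl("events", "device_id", date(2018, 11, 1), 3):
    print(stmt)
```

In Citus, partitions added after `create_distributed_table` are distributed along with the parent. Newer Citus releases also provide a `create_time_partitions` helper for this; check the documentation for the version you run.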
