We've been adding comments to the columns in Postgres as column descriptions. Similarly, dbt lets you write column descriptions that show up in its docs.
How would I go about writing SQL to automatically set the descriptions that already exist in Postgres as dbt doc descriptions?
Here's how I often do it.
Take a look at this answer on how to pull descriptions from pg_catalog.
From there, write a BigQuery query that generates JSON, which you can then convert to a YAML file you can use directly in dbt.
Run the query in BigQuery and save the results as a JSON file.
Use a json2yaml tool.
Save the YAML file to an appropriate place in your dbt project tree.
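For the Postgres side, the pg_catalog lookup plus the reshaping into dbt's sources layout can be sketched like this. This is my own sketch, not the linked answer verbatim: the `public`-schema filter, function name, and the exact nesting are assumptions you'd adjust to your project (`col_description()` is the built-in that reads comments set via `COMMENT ON COLUMN`).

```python
# Hypothetical sketch: SQL to pull column comments from pg_catalog, plus a
# helper that groups the resulting rows into the structure dbt's schema.yml
# expects. Run the SQL with your driver of choice (psycopg2 etc.).

PG_DESCRIPTIONS_SQL = """
SELECT cols.table_name,
       cols.column_name,
       pg_catalog.col_description(
           format('%s.%s', cols.table_schema, cols.table_name)::regclass::oid,
           cols.ordinal_position
       ) AS description
FROM information_schema.columns cols
WHERE cols.table_schema = 'public'   -- assumption: adjust to your schema
ORDER BY cols.table_name, cols.ordinal_position;
"""

def rows_to_dbt_sources(rows, database, dataset):
    """Group (table, column, description) rows into dbt's sources layout."""
    tables = {}
    for table, column, description in rows:
        tables.setdefault(table, []).append(
            {"name": column, "description": description or ""}
        )
    return {
        "version": 2,
        "sources": [{
            "name": f"{database}.{dataset}",
            "database": database,
            "tables": [
                {"name": t, "columns": cols} for t, cols in tables.items()
            ],
        }],
    }
```

The helper defaults missing comments to an empty string, matching the `COALESCE(description, "")` in the BigQuery sample below.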
Code sample:
-- intended to be saved as JSON and converted to YAML
-- ex. cat script_job_id_1.json | python3 json2yaml.py | tee schema.yml
-- version will be created as version: '2'. Remove the quotes after conversion
DECLARE database STRING;
DECLARE dataset STRING;
DECLARE dataset_desc STRING;
DECLARE source_qry STRING;
SET database = "bigquery-public-data";
SET dataset = "census_bureau_acs";
SET dataset_desc = "";
SET source_qry = CONCAT('''CREATE OR REPLACE TEMP TABLE tt_master_table AS ''',
'''(''',
'''SELECT cfp.table_name, ''',
'''cfp.column_name, ''',
'''cfp.description ''',
'''FROM `''', database, '''`.''', dataset, '''.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS cfp ''',
''')''');
EXECUTE IMMEDIATE source_qry;
WITH column_info AS (
  SELECT table_name AS name,
         ARRAY_AGG(STRUCT(column_name AS name, COALESCE(description, "") AS description)) AS columns
  FROM tt_master_table
  GROUP BY table_name
),
table_level AS (
  SELECT CONCAT(database, ".", dataset) AS name,
         database,
         dataset,
         dataset_desc AS description,
         ARRAY_AGG(STRUCT(name, columns)) AS tables
  FROM column_info
  GROUP BY database, dataset, dataset_desc
  LIMIT 1
)
SELECT CAST(2 AS INT64) AS version,
       ARRAY_AGG(STRUCT(name, database, dataset, description, tables)) AS sources
FROM table_level
GROUP BY version
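For the json2yaml step, any converter will do. As one option, here is a minimal stdlib-only sketch of my own (not a specific published `json2yaml.py`); because it renders numbers bare, `version: 2` comes out as an integer and the quote-removal noted in the comments above is unnecessary.

```python
# Minimal JSON-to-YAML renderer for plain dicts/lists/scalars, enough for a
# dbt schema.yml. Strings are quoted; numbers and booleans are left bare.

def scalar(value):
    """Render a leaf value; quote strings, leave numbers/booleans bare."""
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)

def to_yaml(node, indent=0):
    """Render a JSON-compatible structure as a list of YAML lines."""
    pad = "  " * indent
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)) and value:
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(value, indent + 1))
            else:
                lines.append(f"{pad}{key}: {scalar(value)}")
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, (dict, list)) and item:
                rendered = to_yaml(item, indent + 1)
                # hoist the first child line onto the "- " line
                lines.append(f"{pad}- {rendered[0].lstrip()}")
                lines.extend(rendered[1:])
            else:
                lines.append(f"{pad}- {scalar(item)}")
    return lines

# To use it as the json2yaml.py filter from the comment above:
#   import json, sys
#   print("\n".join(to_yaml(json.load(sys.stdin))))
```

Note that BigQuery's "save results as JSON" may emit newline-delimited JSON (one object per line), in which case you'd parse each line separately rather than the whole file at once.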