We've been adding comments to the columns in Postgres as column descriptions. Similarly, dbt lets you write column descriptions that show up in its docs.
How would I go about writing SQL to automatically set the descriptions that already exist in Postgres as dbt doc descriptions?
Here's how I often do it.
Take a look at this answer on how to pull descriptions from pg_catalog.
From there, write a BigQuery query that generates JSON, which you can then convert to a YAML file you can use directly in dbt.
Run the query in BigQuery and save the results as a JSON file.
Use a json2yaml tool.
Save the YAML file to an appropriate place in your dbt project tree.
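For the Postgres side, the pg_catalog lookup plus the reshaping into dbt's sources layout can be sketched like this. This is my own sketch, not the linked answer verbatim: the `public`-schema filter, function name, and the exact nesting are assumptions you'd adjust to your project (`col_description()` is the built-in that reads comments set via `COMMENT ON COLUMN`).

```python
# Hypothetical sketch: SQL to pull column comments from pg_catalog, plus a
# helper that groups the resulting rows into the structure dbt's schema.yml
# expects. Run the SQL with your driver of choice (psycopg2 etc.).

PG_DESCRIPTIONS_SQL = """
SELECT cols.table_name,
       cols.column_name,
       pg_catalog.col_description(
           format('%s.%s', cols.table_schema, cols.table_name)::regclass::oid,
           cols.ordinal_position
       ) AS description
FROM information_schema.columns cols
WHERE cols.table_schema = 'public'   -- assumption: adjust to your schema
ORDER BY cols.table_name, cols.ordinal_position;
"""

def rows_to_dbt_sources(rows, database, dataset):
    """Group (table, column, description) rows into dbt's sources layout."""
    tables = {}
    for table, column, description in rows:
        tables.setdefault(table, []).append(
            {"name": column, "description": description or ""}
        )
    return {
        "version": 2,
        "sources": [{
            "name": f"{database}.{dataset}",
            "database": database,
            "tables": [
                {"name": t, "columns": cols} for t, cols in tables.items()
            ],
        }],
    }
```

The helper defaults missing comments to an empty string, matching the `COALESCE(description, "")` in the BigQuery sample below.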
Code sample:
-- intended to be saved as JSON and converted to YAML
-- ex. cat script_job_id_1.json | python3 json2yaml.py | tee schema.yml
-- version will be created as version: '2'. Remove the quotes after conversion
DECLARE database STRING;
DECLARE dataset STRING;
DECLARE dataset_desc STRING;
DECLARE source_qry STRING;
SET database = "bigquery-public-data";
SET dataset = "census_bureau_acs";
SET dataset_desc = "";
SET source_qry = CONCAT('''CREATE OR REPLACE TEMP TABLE tt_master_table AS ''',
'''(''',
'''SELECT cfp.table_name, ''',
'''cfp.column_name, ''',
'''cfp.description ''',
'''FROM `''', database, '''`.''', dataset, '''.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS cfp ''',
''')''');
EXECUTE IMMEDIATE source_qry;
WITH column_info AS (
  SELECT table_name AS name,
         ARRAY_AGG(STRUCT(column_name AS name, COALESCE(description, "") AS description)) AS columns
  FROM tt_master_table
  GROUP BY table_name
),
table_level AS (
  SELECT CONCAT(database, ".", dataset) AS name,
         database,
         dataset,
         dataset_desc AS description,
         ARRAY_AGG(STRUCT(name, columns)) AS tables
  FROM column_info
  GROUP BY database, dataset, dataset_desc
  LIMIT 1
)
SELECT CAST(2 AS INT64) AS version,
       ARRAY_AGG(STRUCT(name, database, dataset, description, tables)) AS sources
FROM table_level
GROUP BY version
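For the json2yaml step, any converter will do. As one option, here is a minimal stdlib-only sketch of my own (not a specific published `json2yaml.py`); because it renders numbers bare, `version: 2` comes out as an integer and the quote-removal noted in the comments above is unnecessary.

```python
# Minimal JSON-to-YAML renderer for plain dicts/lists/scalars, enough for a
# dbt schema.yml. Strings are quoted; numbers and booleans are left bare.

def scalar(value):
    """Render a leaf value; quote strings, leave numbers/booleans bare."""
    if isinstance(value, str):
        return '"' + value.replace('"', '\\"') + '"'
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)

def to_yaml(node, indent=0):
    """Render a JSON-compatible structure as a list of YAML lines."""
    pad = "  " * indent
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)) and value:
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(value, indent + 1))
            else:
                lines.append(f"{pad}{key}: {scalar(value)}")
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, (dict, list)) and item:
                rendered = to_yaml(item, indent + 1)
                # hoist the first child line onto the "- " line
                lines.append(f"{pad}- {rendered[0].lstrip()}")
                lines.extend(rendered[1:])
            else:
                lines.append(f"{pad}- {scalar(item)}")
    return lines

# To use it as the json2yaml.py filter from the comment above:
#   import json, sys
#   print("\n".join(to_yaml(json.load(sys.stdin))))
```

Note that BigQuery's "save results as JSON" may emit newline-delimited JSON (one object per line), in which case you'd parse each line separately rather than the whole file at once.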