Automating Databricks Unity Catalog Metadata with dbt

Summary

Retrieving the proper descriptions for Databricks Unity Catalog objects via dbt can be done in‑line by using the --description flag on the materialization configuration or by leveraging a custom macro that emits ALTER TABLE or ALTER SCHEMA statements during the on-run-end hook. This post outlines the root cause behind the limitation, the impact on real systems, and a senior‑engineer approach to fix it.

Root Cause

  • dbt does not expose Schema/Database description support in its core output.
  • Databricks Unity Catalog requires a separate ALTER statement to set metadata.
  • The default dbt lifecycle does not hook into catalog‑level DDL after model creation.

Why This Happens in Real Systems

  • Schema drift: Production catalogs rarely receive automated description updates, leading to ambiguous data lineage.
  • Manual overhead: Data stewards need to issue SQL manually, which is error‑prone and not version‑controlled.
  • Audit gaps: Missing descriptions break data governance and compliance checks.

Real-World Impact

  • Data discoverability decreases for new team members.
  • Regulatory audits flag absent metadata, potentially delaying releases.
  • Data quality assessments suffer without contextual schema information.
  • Operational cost rises due to manual SQL maintenance and re‑runs.

Example or Code (if necessary and relevant)

-- Example macro: macros/add_schema_description.sql
{% macro add_schema_description(schema_name, description) %}
ALTER SCHEMA {{ schema_name }} SET TAGS (description="{{ description }}");
{% endmacro %}

-- Example usage in dbt_project.yml
on-run-end:
  - "{{ add_schema_description('my_schema', 'Batch data processed daily') }}"

How Senior Engineers Fix It

  1. Create reusable macros that generate ALTER SCHEMA or ALTER TABLE statements.
  2. Hook the macros into on-run-end to batch up alterations after model deployments.
  3. Leverage YAML docs to document schema and table descriptions, ensuring sync with code.
  4. Version‑control all metadata changes alongside model SQL in the repo.
  5. Automate tests that verify descriptions exist: dbt test --data-tests.

Why Juniors Miss It

  • Assume the catalog auto‑populates descriptions from dbt model comments.
  • Misinterpret the --desc or --description options as DB‑specific.
  • Leave out the necessary on-run-end hook, thinking the macro will run automatically.
  • Overlook the need for tags or extended properties in Unity Catalog, focusing solely on standard SQL COMMENT ON.

Leave a Comment