dbt: Running multiple Databricks DDL statements in on-run-start via a macro

Summary

A dbt project targeting Databricks uses a macro in the on-run-start hook to create Databricks SQL functions. The macro works when it creates a single function but throws an error when it contains several CREATE FUNCTION statements. The goal is to create every required function without cluttering on-run-start with one macro call per function.

Root Cause

The root cause is not Jinja itself. Jinja only renders the macro to text, and it can render any number of statements. The problem is what happens next: dbt sends the rendered output of the hook to Databricks as a single query, and Databricks SQL executes one statement per request. A semicolon-separated string containing two CREATE FUNCTION statements is therefore treated as one malformed statement, and the run fails.

Why This Happens in Real Systems

This issue occurs in real systems because:

  • dbt passes each rendered hook or query to the adapter as one unit and does not split it on semicolons
  • Databricks SQL expects a single statement per execution request, so a script-style block of DDL is rejected
  • Jinja makes it easy to template many statements into one macro, which hides the fact that they are all sent together

Real-World Impact

The real-world impact of this issue is:

  • Multiple functions cannot be created from a single block of SQL, which pushes teams toward one macro per function and duplicated code
  • The error message does not point at the real cause, which makes the issue hard to debug
  • Macro design becomes constrained, which invites workarounds and hacks

Example or Code

{# Helper: returns the fully qualified name of the region filter function. #}
{% macro region_filter_function() %}
{{ return(target.catalog ~ '.mart.region_filter') }}
{% endmacro %}

{# Fails on Databricks: both statements below are rendered into one string and sent as a single query. #}
{% macro create_udfs() %}
CREATE FUNCTION IF NOT EXISTS {{ region_filter_function() }}(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('BI_USERS'), true, region IN ('US','GB','DK'));
CREATE FUNCTION IF NOT EXISTS another_function(another_param STRING)
RETURN another_value;
{% endmacro %}
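
One possible rework of the macro above, offered as a sketch rather than a definitive fix: keep a single create_udfs() macro, but hold each statement in its own variable and send them one at a time with dbt's run_query. The variable names region_filter_sql and another_function_sql are made up for illustration, and another_function is still the placeholder from the original example.

```sql
{% macro create_udfs() %}
  {# One statement per variable, with no trailing semicolon. #}
  {% set region_filter_sql %}
    CREATE FUNCTION IF NOT EXISTS {{ region_filter_function() }}(region STRING)
    RETURN IF(IS_ACCOUNT_GROUP_MEMBER('BI_USERS'), true, region IN ('US','GB','DK'))
  {% endset %}

  {% set another_function_sql %}
    CREATE FUNCTION IF NOT EXISTS another_function(another_param STRING)
    RETURN another_value
  {% endset %}

  {# `execute` is false while dbt parses the project, so nothing runs at parse time. #}
  {% if execute %}
    {% for statement in [region_filter_sql, another_function_sql] %}
      {# Each call sends exactly one statement to Databricks. #}
      {% do run_query(statement) %}
    {% endfor %}
  {% endif %}
{% endmacro %}
```

With this shape the macro executes the DDL itself and renders to nothing but whitespace, so on-run-start still needs only the single "{{ create_udfs() }}" entry. This assumes dbt skips hooks that render to an empty string, which is worth confirming on the dbt version in use. The macro could also be invoked on its own with dbt run-operation create_udfs.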

How Senior Engineers Fix It

Senior engineers fix this issue by:

  • Splitting the DDL so that each CREATE FUNCTION is its own statement
  • Executing each statement separately, for example by calling run_query in a loop or by listing several entries under on-run-start
  • Using Jinja to generate the statements from a list instead of copy-pasting them
  • Testing the macro with more than one function to confirm that every function is actually created
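
The points above can be combined into a small reusable helper. The sketch below is an assumption-laden illustration, not a built-in dbt feature: run_statements is a hypothetical macro name, and it splits on semicolons naively.

```sql
{# Hypothetical helper: runs each semicolon-separated statement as its own query. #}
{% macro run_statements(sql_block) %}
  {# `execute` is false during parsing, so nothing is sent to Databricks then. #}
  {% if execute %}
    {% for statement in sql_block.split(';') %}
      {# Skip the empty fragment left after the final semicolon. #}
      {% if statement | trim %}
        {% do run_query(statement) %}
      {% endif %}
    {% endfor %}
  {% endif %}
{% endmacro %}
```

A macro could then build its DDL inside a {% set ddl %} ... {% endset %} block and call {% do run_statements(ddl) %}. The naive split breaks if a semicolon appears inside a string literal or a function body, so an explicit list of statements is the safer choice when the DDL is more complex.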

Why Juniors Miss It

Juniors may miss this issue because they:

  • Have little experience with dbt and Databricks
  • Do not distinguish between rendering a Jinja template and executing the SQL it produces
  • Test the macro with a single statement only
  • Reach for workarounds and hacks before looking for a robust solution
