Summary
This issue involves dbt and Databricks: a macro called from on-run-start creates Databricks SQL functions. The macro works when it creates a single function but fails when it contains more than one CREATE FUNCTION statement. The goal is to create all required functions without filling on-run-start with a long list of separate macro calls.
Root Cause
The root cause is that dbt renders the hook into a single SQL string and submits that string to Databricks in one call, while Databricks executes one SQL statement per call. Jinja2 is not the limitation here: it only produces text and places no restriction on how many statements that text contains. When the macro emits two CREATE FUNCTION statements separated by a semicolon, Databricks receives both as one statement and rejects the result with a syntax error.
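To make this concrete, here is roughly what a hook containing two CREATE FUNCTION statements looks like after Jinja rendering. This is an illustrative sketch, not captured output: `main` is an assumed value for `target.catalog`, and `another_function` / `another_value` are placeholders.

```sql
-- Rendered hook text: the whole block below is submitted as ONE string in ONE call.
-- `main` is an assumed value of target.catalog.
CREATE FUNCTION IF NOT EXISTS main.mart.region_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('BI_USERS'), true, region IN ('US', 'GB', 'DK'));
-- The text from here on is still part of the same submitted string,
-- so Databricks sees unexpected input after the first statement.
CREATE FUNCTION IF NOT EXISTS another_function(another_param STRING)
RETURN another_value;
```

With only the first statement present, the same text is a single valid statement, which is why the one-function version of the macro succeeds.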
Why This Happens in Real Systems
This issue occurs in real systems because:
- dbt treats each on-run-start entry as one statement and passes the rendered text to the adapter as-is, without splitting it at semicolons
- Databricks executes one SQL statement per request, so a semicolon-separated script that runs in some other SQL clients is rejected here
- Jinja2 only generates text and has no notion of statement boundaries, so nothing in the template layer separates the two statements
Real-World Impact
The real-world impact of this issue is:
- Multiple functions cannot be created from one macro body, which pushes teams toward one macro and one hook entry per function, with the duplication and maintenance cost that follows
- The failure typically surfaces as a SQL syntax error from Databricks rather than a dbt message about the hook, which makes the cause hard to spot
- Macro design becomes constrained, which invites workarounds and hacks
Example or Code
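{# Returns the fully qualified name of the region filter function. #}
{% macro region_filter_function() %}
  {{ return(target.catalog ~ '.mart.region_filter') }}
{% endmacro %}

{# Fails as an on-run-start hook: both statements render into one SQL string. #}
{% macro create_udfs() %}
  CREATE FUNCTION IF NOT EXISTS {{ region_filter_function() }}(region STRING)
  RETURN IF(IS_ACCOUNT_GROUP_MEMBER('BI_USERS'), true, region IN ('US', 'GB', 'DK'));

  {# another_function and another_value are placeholders. #}
  CREATE FUNCTION IF NOT EXISTS another_function(another_param STRING)
  RETURN another_value;
{% endmacro %}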
How Senior Engineers Fix It
Senior engineers fix this issue by:
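- Keeping each CREATE FUNCTION as its own statement instead of concatenating them in one macro body
- Executing each statement separately inside the macro (for example with dbt's run_query or a statement block), so on-run-start still needs only one macro call
- Using Jinja2 to build the list of statements, so adding a function means adding one entry
- Testing the macro on its own, for example with dbt run-operation, before wiring it into on-run-start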
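The points above can be sketched as a single macro that collects the statements and runs them one at a time. This is a minimal sketch under assumptions, not a definitive implementation: `normalize_region` is a hypothetical second function added for illustration, and `region_filter_function()` is the helper macro from the example. `run_query` and the `execute` flag are standard dbt features.

```sql
{# Sketch: run each CREATE FUNCTION as its own statement.
   Assumed wiring in dbt_project.yml:  on-run-start: "{{ create_udfs() }}" #}
{% macro create_udfs() %}
  {% set statements = [] %}

  {% set region_filter_sql %}
    CREATE FUNCTION IF NOT EXISTS {{ region_filter_function() }}(region STRING)
    RETURN IF(IS_ACCOUNT_GROUP_MEMBER('BI_USERS'), true, region IN ('US', 'GB', 'DK'))
  {% endset %}
  {% do statements.append(region_filter_sql) %}

  {# normalize_region is a hypothetical function used only for illustration. #}
  {% set normalize_region_sql %}
    CREATE FUNCTION IF NOT EXISTS {{ target.catalog }}.mart.normalize_region(region STRING)
    RETURN UPPER(TRIM(region))
  {% endset %}
  {% do statements.append(normalize_region_sql) %}

  {# Only touch the warehouse at run time, not while dbt is parsing the project. #}
  {% if execute %}
    {% for sql in statements %}
      {% do run_query(sql) %}
    {% endfor %}
  {% endif %}
{% endmacro %}
```

Because the macro runs its statements itself, it renders to whitespace only, and dbt is expected to skip an empty hook; that behavior is worth confirming on your dbt version. The same macro can also be run directly with `dbt run-operation create_udfs`, which is a convenient way to test it in isolation.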
Why Juniors Miss It
Juniors may miss this issue because they:
- Have limited experience with dbt and Databricks
- Assume that whatever a macro renders is run as a script, rather than sent as a single statement
- Test the macro with only one function, so the multi-statement case never runs
- Reach for workarounds, such as one hook entry per function, instead of fixing how the statements are executed
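One way to close the testing gap is to check, after a run, that every expected function actually exists. The queries below are a small sketch for a Databricks SQL session; `main` is an assumed catalog name, and `mart` and `region_filter` come from the example macro.

```sql
-- List user-defined functions in the schema the macros write to.
-- `main` is an assumed catalog; replace it with the value of target.catalog.
SHOW USER FUNCTIONS IN main.mart;

-- Inspect one function's signature and body.
DESCRIBE FUNCTION EXTENDED main.mart.region_filter;
```

If only the first function shows up, the statements are probably still being submitted together rather than one per call.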