dbt-external-tables for JSON VARIANT datatype column-level expressions

Summary

The task is to create an external table in dbt that reads JSON data into a VARIANT column and then extracts individual fields from that column using column-level expressions. The goal is an external table definition that exposes specific fields of the JSON data as their own typed columns.

Root Cause

The visible symptom is the unsupported key expression error, which occurs when columns, datatypes, and expressions are added to the external table definition. The likely root cause is that the external_tables.yml file either uses incorrect syntax or relies on a feature that the installed tooling does not support. The key takeaways are:

  • Incorrect syntax for defining columns and expressions
  • Unsupported features in the external_tables.yml file

Why This Happens in Real Systems

This issue happens in real systems due to:

  • Complexity of JSON data: Working with JSON data can be challenging, especially when dealing with VARIANT datatypes
  • Limited support for expressions: The external_tables.yml file may not support all the expressions or syntax required to extract columns from JSON data
  • Versioning issues: Different versions of dbt or Snowflake may have varying levels of support for certain features or syntax

Real-World Impact

The real-world impact of this issue includes:

  • Inability to extract relevant data: Without the ability to extract specific columns from the JSON data, the data may not be usable for analysis or reporting
  • Increased complexity: The need to work around this issue can add complexity to the data pipeline, leading to increased maintenance and support costs
  • Delays in project delivery: The inability to resolve this issue can delay project delivery, impacting business outcomes and revenue

Example or Code

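# external_tables.yml: dbt-external-tables reads external table definitions
# from sources (under an external: block), not from models.
version: 2
sources:
  - name: test_source   # hypothetical source name, not from the original
    tables:
      - name: ext_tbl
        external:
          location: "@teststage.test"
          file_format: "( type = json )"
          pattern: "./file1/./.*"
        columns:
          # Column-level expression support depends on the installed package version.
          - name: RAW_RECORD
            data_type: VARIANT
            expression: "VALUE"
          - name: filename
            data_type: VARCHAR
            expression: "SPLIT(METADATA$FILENAME, '/')[0]::STRING"

-- Roughly equivalent Snowflake DDL: each external table column is a virtual
-- column defined with AS (<expression>) over VALUE or METADATA$ pseudocolumns.
CREATE OR REPLACE EXTERNAL TABLE ext_tbl (
  RAW_RECORD VARIANT AS (VALUE::VARIANT),
  filename VARCHAR AS (SPLIT(METADATA$FILENAME, '/')[0]::STRING)
)
LOCATION = @teststage.test
FILE_FORMAT = (TYPE = JSON)
PATTERN = './file1/./.*';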

How Senior Engineers Fix It

Senior engineers fix this issue by:

  • Verifying the syntax: Ensuring that the syntax for defining columns and expressions is correct
  • Checking the documentation: Reviewing the dbt and Snowflake documentation to ensure that the required features are supported
  • Using alternative approaches: Exploring alternative approaches, such as using SQL to extract the columns instead of defining them in the external_tables.yml file
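
The alternative approach in the last bullet can be sketched as a plain query over the external table, so that field extraction lives in SQL rather than in external_tables.yml. This is a minimal sketch, assuming ext_tbl exposes the RAW_RECORD and filename columns from the example; the JSON keys id, created_at, and customer.name are hypothetical placeholders and do not come from the original data.

```sql
-- Sketch only: the JSON keys (id, created_at, customer.name) are hypothetical
-- placeholders; substitute the keys that actually exist in RAW_RECORD.
SELECT
    RAW_RECORD:id::NUMBER                 AS id,             -- top-level key, cast to a number
    RAW_RECORD:created_at::TIMESTAMP_NTZ  AS created_at,     -- top-level key, cast to a timestamp
    RAW_RECORD:customer.name::STRING      AS customer_name,  -- nested key reached with a dot path
    filename                                                 -- column derived from METADATA$FILENAME
FROM ext_tbl;
```

In a dbt project this query would typically live in a staging model, with the FROM clause replaced by a source() reference to the external table. Keeping the extraction in a model means the external table definition only needs the raw VARIANT column, which avoids depending on column-level expression support in the YAML.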

Why Juniors Miss It

Juniors may miss this issue due to:

  • Lack of experience: Limited experience working with JSON data and VARIANT datatypes
  • Insufficient knowledge: Limited knowledge of dbt and Snowflake features and syntax
  • Inadequate testing: Inadequate testing and verification of the external_tables.yml file and SQL code
