Iceberg external catalogue unable to access from Power BI

Summary

A data engineering team attempted to use Microsoft Power BI to visualize data stored in Apache Iceberg tables managed by a StarRocks cluster. They successfully connected Power BI to StarRocks via the MySQL ODBC driver but could only access the internal StarRocks catalogue, not the external Apache Iceberg tables. The root cause was a misalignment between the architecture (using an external metastore like Hive) and the connection method (standard ODBC), which lacks the ability to perform the necessary catalog abstraction required to expose external tables.

Root Cause

The inability to query the external Iceberg catalogue stems from the way the StarRocks MySQL protocol interface handles metadata requests and the state of the StarRocks External Catalog configuration.

  • ODBC Protocol Limitations: The MySQL ODBC driver establishes a connection to the StarRocks Frontend (FE) and issues standard SQL queries. While StarRocks supports SHOW CATALOGS and USE <catalog>, many BI tools, including Power BI’s native connector, do not support executing these commands dynamically during the schema discovery phase. They default to the default catalog.
  • External Catalog Not Defined or Active: The StarRocks cluster might not have a properly configured Iceberg External Catalog. If the CREATE CATALOG command was not run, or if the connection parameters (HMS URI, HDFS path) are incorrect, the external namespace will not appear to the connecting client.
  • Missing Database Context: Even if the catalog exists, Power BI attempts to list databases within the default internal catalog. Because the Iceberg tables reside in a separate logical namespace (the external catalog), Power BI sees an empty list or only internal tables.

Why This Happens in Real Systems

This issue is common in modern Lakehouse architectures where query engines act as federated access layers.

  • Decoupled Storage and Compute: Iceberg separates metadata (Hive Metastore) from data storage (S3/HDFS). StarRocks acts as the compute engine but requires an explicit “pointer” (the External Catalog) to that metadata.
  • BI Tool Rigidity: Tools like Power BI are designed for traditional databases where the connection string defines the database. They struggle with multi-level namespaces (Catalog > Database > Table) unless the underlying connector explicitly handles that hierarchy.
  • Security Boundaries: Often, the StarRocks user used for the ODBC connection has privileges on the internal catalog but lacks USAGE privileges on the external Iceberg catalog, causing the external namespace to be invisible.

Real-World Impact

  • Broken Self-Service Analytics: Business users cannot access the “single source of truth” data lake tables directly from Power BI, forcing engineers to create duplicate internal tables.
  • Data Latency: Data must be ingested from Iceberg into StarRocks internal tables, causing delays and defeating the purpose of querying the lake directly.
  • Increased Maintenance: Maintaining two copies of data (one in Iceberg, one in StarRocks) increases storage costs and ETL complexity.

How Senior Engineers Fix It

Senior engineers address this by ensuring the External Catalog is explicitly defined and by using the correct connection workflow to bridge the gap between Power BI and the external namespace.

  1. Define the External Catalog: Ensure the Iceberg catalog is created in StarRocks with correct HMS and HDFS/S3 configurations.
  2. Grant Privileges: Explicitly grant USAGE and SELECT privileges on the external catalog to the user used by Power BI.
  3. Use the USE Command via ODBC: Since Power BI’s initial connection defaults to the internal catalog, the connection string or initial SQL must enforce the context switch.
    • Connection String: Append ;initial catalog=<your_iceberg_catalog_name> if the driver supports it (often requires specific drivers).
    • Manual Context Switch: In Power BI, use the “SQL Server” connector, but ensure the user runs USE <catalog_name>.<database_name> immediately upon connection, or ensure the StarRocks user’s default catalog is set to the Iceberg catalog if supported.
  4. Direct Query Mode: Switch Power BI to DirectQuery mode. When connecting, use the specific database name format if supported: <catalog>.<database>. If Power BI does not support this syntax in the connection string, the user must manually select the database from the navigator (which relies on the catalog being visible).

Why Juniors Miss It

Junior engineers often struggle to diagnose this because the MySQL connection appears to work.

  • “It Connects, So It Works”: They test the ODBC connection in Windows, see a “Success” message, and assume the data is available, not realizing the connection is just a handshake.
  • Ignoring Namespaces: They look for the Iceberg database in the root list, fail to find it, and assume the data isn’t synced, rather than realizing they need to switch Catalogs.
  • Documentation Tunnel Vision: They follow general Power BI guides that assume a 1:1 mapping of database to server, missing the StarRocks-specific requirement to manage EXTERNAL CATALOGS explicitly.