Embedder reports tool

Summary

This postmortem analyzes a common architectural decision gone wrong: relying on SAP HANA as an unnecessary replication layer for MySQL and MSSQL reporting workloads, then later attempting to migrate those reports into Metabase with embedded dashboards. The core issue was assuming that HANA would simplify reporting, when in reality it introduced latency, cost, and operational overhead without delivering meaningful value.

Root Cause

The root cause was architectural over‑engineering:

  • Introducing SAP HANA as a replication and transformation layer for data that already existed in MySQL and MSSQL
  • Building Crystal Reports tightly coupled to HANA views with date parameters
  • Not evaluating whether the reporting tool (Metabase) could query the original databases directly
  • Underestimating the performance characteristics of large tables (15M+ rows) when moving to a BI tool

Why This Happens in Real Systems

Real systems drift into this pattern because:

  • Teams inherit legacy architectures and assume they must keep them
  • HANA was originally chosen for speed, but the workloads didn’t actually require columnar in‑memory analytics
  • Reporting tools evolve, but the data architecture stays frozen
  • Engineers fear removing layers, even when they add no value
  • BI tools now support direct connections, making intermediate warehouses unnecessary for many use cases

Real-World Impact

The impact of this architectural misalignment includes:

  • Higher infrastructure cost (HANA licensing, compute, storage)
  • Slower development cycles due to maintaining replication pipelines
  • Increased operational complexity
  • Performance bottlenecks when BI tools query through multiple layers
  • Difficulty migrating reports because logic is trapped in HANA views

Example or Code (if necessary and relevant)

Below is a simplified example of how a Metabase query might directly hit MySQL with date parameters, eliminating the need for HANA:

SELECT
    customer_id,
    SUM(amount) AS total_amount
FROM orders
WHERE order_date BETWEEN {{start_date}} AND {{end_date}}
GROUP BY customer_id;

How Senior Engineers Fix It

Senior engineers typically resolve this by:

  • Removing unnecessary data layers (e.g., dropping HANA if it adds no value)
  • Connecting Metabase directly to MySQL and MSSQL
  • Ensuring proper indexing on large tables (15M rows is trivial with correct indexes)
  • Using materialized views or summary tables only when absolutely necessary
  • Benchmarking Metabase performance before committing to architecture changes
  • Evaluating alternatives such as:
    • Metabase Pro (cheap, self‑hosted)
    • Redash
    • Apache Superset
    • Power BI Embedded (more expensive but robust)

Metabase can absolutely handle 15M‑row tables as long as:

  • Queries are indexed
  • Dashboards avoid unbounded scans
  • Aggregations are pushed down to the database

Why Juniors Miss It

Juniors often miss this issue because:

  • They assume more layers = more robustness
  • They overestimate the performance benefits of HANA for simple reporting
  • They underestimate the power of direct database connections in modern BI tools
  • They focus on the reporting tool, not the data architecture beneath it
  • They lack experience with query optimization and indexing strategies

This is a classic example of how systems become more complex than necessary—and how simplifying them often leads to better performance, lower cost, and easier maintenance.

Leave a Comment