Fixing double‑escaped Base64 image tooltips in Dash

Summary

A production issue was identified where HTML-based tooltips (specifically images embedded via Base64) failed to render in a Dash application, despite the underlying data being valid. The issue manifested as a “silent failure”: the callback triggered correctly, but the user saw nothing on hover. The investigation revealed that the failure was not due to image corruption, but rather a data encoding mismatch between the string representation in the dataframe and the expected input for the Dash component.

Root Cause

The root cause was double-encoded HTML entities within the string variables.

  • The img1 and img2 strings were defined containing HTML-encoded entities such as &quot; and &gt; instead of the literal characters " and >.
  • When html.Img(src=img_src) was called, Dash attempted to pass the literal string &quot;&lt;img src=...&gt;&quot; to the src attribute of the HTML <img> tag.
  • The browser received a source attribute that literally started with &quot;, which is not a valid URI scheme (like data:image/jpg;base64,...), causing the browser to silently discard the image rendering request.
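
The failure mode can be reproduced outside Dash with the standard library alone. The sketch below is illustrative only: the value `encoded` uses a shortened placeholder payload, and the helper `looks_like_data_uri` is a hypothetical name that does not appear in the original application.

```python
import html

# Hypothetical entity-encoded value, similar in shape to what the dataframe held.
# The Base64 payload is a shortened placeholder, not a real image.
encoded = "&lt;img src=&#x27;data:image/jpg;base64,AAAA&#x27;&gt;"


def looks_like_data_uri(value: str) -> bool:
    """Return True if the value starts with the data: scheme an <img> src expects."""
    return value.strip().startswith("data:image/")


decoded = html.unescape(encoded)

print(looks_like_data_uri(encoded))   # False: starts with "&lt;", not "data:"
print(decoded)                        # <img src='data:image/jpg;base64,AAAA'>
print(looks_like_data_uri(decoded))   # False: still an <img> tag, not a bare URI
```

Note that unescaping alone is not enough here: the decoded value is still a full tag, so the src attribute has to be extracted before it is handed to html.Img.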

Why This Happens in Real Systems

In large-scale data pipelines, this phenomenon occurs frequently due to:

  • Improper Serialization/Deserialization: Data passed through JSON APIs or stored in databases often undergoes multiple rounds of escaping.
  • Web Scraping Residue: When pulling data from web sources, characters like quotes and brackets are often stored in their entity-encoded form.
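  • Middleware Interference: Certain API gateways or proxy layers may automatically escape special characters to prevent XSS (Cross-Site Scripting), inadvertently breaking strings that wrap valid Base64 data in markup (the Base64 payload itself contains no characters that get escaped, but the surrounding quotes and angle brackets do).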
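
How repeated escaping accumulates can be shown with a small sketch. It assumes two pipeline stages that each call html.escape on the same value; the helper `unescape_fully` and its `max_rounds` parameter are hypothetical names introduced for illustration.

```python
import html


def unescape_fully(value: str, max_rounds: int = 5) -> str:
    """Apply html.unescape until the string stops changing (bounded by max_rounds)."""
    for _ in range(max_rounds):
        decoded = html.unescape(value)
        if decoded == value:
            break
        value = decoded
    return value


# Placeholder payload, not a real image.
original = "<img src='data:image/png;base64,AAAA'>"

# Two stages each escape the value once, producing double-escaped text.
once = html.escape(original)
twice = html.escape(once)

print(twice.startswith("&amp;lt;"))        # True: the "&" of "&lt;" was escaped again
print(html.unescape(twice) == once)        # True: one pass removes only one layer
print(unescape_fully(twice) == original)   # True
```

A single html.unescape call removes only one layer, which is why double-escaped values can survive a naive cleanup step. Looping until the value is stable is a blunt instrument, though: it would also decode text that legitimately contains entity-like sequences, so it fits best where the field is known to hold only markup or URIs.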

Real-World Impact

  • Degraded User Experience: Users encounter “dead” UI elements where information is expected but missing.
  • Increased Debugging Latency: Because the application doesn’t throw a Python exception (the logic is syntactically correct), engineers spend hours checking network tabs and image formats instead of the string content.
  • Silent Data Corruption: In automated reporting systems, this can lead to broken visual assets in exported PDF or HTML reports without triggering any error alerts.

Example or Code

from dash import Dash, dcc, html, Input, Output
import plotly.graph_objects as go

app = Dash(__name__)

# CORRECTED: Using literal strings instead of HTML-encoded entities
img1 = "<img src='data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCABQAFADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDjaKKK7D3AooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKAJba3N1cJCJI4938crhVUAZJJ/z7U+/s5NP1C5spSrSW8rROUOQSpIOPbiq9XdYvI9Q1u/vYgyx3FzJKgcYIDMSM+/NIWtylRRRTGFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQB//Z'>"

# Note: The original code was passing an HTML string into an html.Img component.
# A senior engineer would pass only the raw Base64 data or a clean URL.

df = {'x': [1, 2], 'y': [2, 4], 'images': [img1, img1]} 

fig = go.Figure(data=go.Scatter(x=df['x'], y=df['y'], mode='markers', hoverinfo='none'))

app.layout = html.Div([
    dcc.Graph(id='graph-basic-2', figure=fig),
    dcc.Tooltip(id="graph-tooltip")
])

@app.callback(
    Output("graph-tooltip", "show"),
    Output("graph-tooltip", "bbox"),
    Output("graph-tooltip", "children"),
    Input("graph-basic-2", "hoverData"),
)
def display_hover(hoverData):
    if hoverData is None:
        return False, None, None

    pt = hoverData["points"][0]
    bbox = pt["bbox"]

    # Decode HTML entities (e.g. &quot; back to ") in case the stored value is escaped
    import html as html_parser
    raw_src = df['images'][pt['pointNumber']]
    clean_src = html_parser.unescape(raw_src)

    # If the string is an <img> tag, extract its src attribute (single or double quoted).
    # A better pattern is to store only the data URI in the dataframe.
    import re
    src_match = re.search(r"""src=(['"])(.+?)\1""", clean_src)
    final_src = src_match.group(2) if src_match else clean_src

    children = [
        html.Img(src=final_src, style={"width": "50px"}),
        html.P(f"Point {pt['pointNumber']}")
    ]
    return True, bbox, children

if __name__ == '__main__':
    app.run(debug=True, port=8050)

How Senior Engineers Fix It

  • Data Sanitization at the Source: Instead of storing <img> tags in a dataframe, store the raw data URI (e.g., data:image/png;base64,...). This separates data from presentation.
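  • Defensive Unescaping: Use Python's standard-library html.unescape() when consuming data from external or semi-structured sources to ensure character integrity. In a Dash module, import the standard html module under an alias so it does not clash with dash.html.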
  • Unit Testing with Edge Cases: Write tests that specifically check for encoded special characters in dataframes to ensure the rendering pipeline is robust.
  • Browser Inspection Strategy: Rather than checking if the code “works,” a senior engineer immediately checks the Network Tab and the DOM Inspector to see exactly what string the browser is attempting to load.
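
The first and third points can be combined into a small check that runs before any value reaches the tooltip. This is a minimal sketch under stated assumptions: `to_data_uri`, `find_bad_image_values`, and `ENTITY_PATTERN` are hypothetical names, and the image bytes are placeholders rather than a real file.

```python
import base64
import re

# Matches named, decimal, and hexadecimal HTML entities such as &quot; &#39; &#x27;
ENTITY_PATTERN = re.compile(r"&(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);")


def to_data_uri(raw: bytes, mime: str = "image/png") -> str:
    """Build a bare data URI from raw image bytes (no <img> wrapper)."""
    payload = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{payload}"


def find_bad_image_values(values):
    """Return indices of values that are not bare data URIs or that contain entities."""
    bad = []
    for index, value in enumerate(values):
        if not value.startswith("data:image/") or ENTITY_PATTERN.search(value):
            bad.append(index)
    return bad


images = [
    to_data_uri(b"placeholder bytes, not a real PNG"),
    "&lt;img src=&#x27;data:image/png;base64,AAAA&#x27;&gt;",
]

print(find_bad_image_values(images))  # [1]
```

A check like this can run as a unit test against fixture data or as an assertion when the dataframe is loaded, so escaped values fail loudly instead of producing an empty tooltip.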

Why Juniors Miss It

  • Assumption of Correctness: Juniors often assume that if the Python code runs without an error, the data must be in the correct format.
  • Focus on Logic over Data Integrity: They focus on the callback logic (if hoverData is None...) rather than the content of the variables being passed through the callback.
  • Lack of Tooling Familiarity: They may look at the VSCode output or the console, but fail to use the Browser Developer Tools to inspect the actual HTML being rendered in the browser.
