Sankey diagram for time series data in Excel

Summary

The question is how to visualize time series data stored in an Excel spreadsheet as a Sankey diagram. The goal is to show how land moves between six land types (Barren Land, Built up, Corp Land, Forest, Water, and Wetland) from 2000 to 2020, sampled at 5-year intervals (2000, 2005, 2010, 2015, 2020). The spreadsheet holds more than 7000 records, which makes it hard to represent these changes clearly.

Root Cause

The difficulty comes from a mismatch between the shape of the data and what a Sankey diagram expects. A Sankey diagram draws flows between nodes, and each flow is defined by a source, a target, and a value; it does not plot individual records. The records therefore have to be aggregated into flows first (for example, how many records were Forest in 2000 and Corp Land in 2005), and with many records and categories the result can still become cluttered. The main contributing factors are:

Large number of records (over 7000)
Multiple land types (6 categories)
Time series data with 5-year intervals
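Any land type can change into any other between snapshots (6 × 6 = 36 possible transitions per interval, counting land that stays the same)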
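Aggregating the records into flows is what keeps the diagram manageable: however many records there are, six land types and five snapshots give at most 6 × 5 = 30 nodes and 4 × 36 = 144 links. A minimal sketch of that aggregation using only the standard library, assuming each record is one parcel's sequence of land types at the five survey years. The helper name `transition_counts` and the "Type Year" label format (e.g. "Forest 2000") are illustrative choices, not part of Excel or any charting tool.

```python
from collections import Counter

# Survey years from the question: 2000 to 2020 at 5-year intervals.
YEARS = [2000, 2005, 2010, 2015, 2020]


def transition_counts(histories, years=YEARS):
    """Aggregate per-parcel histories into Sankey links (hypothetical helper).

    histories: iterable of sequences, one land type per survey year, e.g.
    ["Forest", "Forest", "Corp Land", "Corp Land", "Built up"].
    Returns a Counter mapping (source_label, target_label) -> number of parcels,
    where each label pairs a land type with its year, e.g. "Forest 2000".
    """
    links = Counter()
    for history in histories:
        # One link per consecutive pair of snapshots for this parcel
        for i in range(len(years) - 1):
            source = f"{history[i]} {years[i]}"
            target = f"{history[i + 1]} {years[i + 1]}"
            links[(source, target)] += 1
    return links


histories = [
    ["Forest", "Forest", "Corp Land", "Corp Land", "Built up"],
    ["Forest", "Forest", "Forest", "Forest", "Forest"],
    ["Water", "Water", "Wetland", "Wetland", "Wetland"],
]
for (source, target), value in sorted(transition_counts(histories).items()):
    print(source, "->", target, value)
```

Each printed row is one link (source, target, value); a table in this shape is also what Sankey add-ins and plotting libraries generally take as input.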

Why This Happens in Real Systems

Time series data is routinely used to track change, and a Sankey diagram is a natural way to show how quantities move between categories. With large datasets and many categories, however, producing a clear and concise diagram is hard. Common contributing factors include:

Increasing amounts of data being collected
Need for data-driven decision making
Importance of effective communication of complex data insights

Real-World Impact

The real-world impact of not being able to effectively visualize these transformations can be significant, including:

Difficulty in understanding trends and patterns in land use changes
Inability to identify areas of concern or opportunities for improvement
Challenges in communicating insights to stakeholders or decision-makers
Risk of misinformed decisions due to an unclear understanding of the data

Example Code

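import pandas as pd
import plotly.graph_objects as go

# Hypothetical sample: one row per parcel, one column per survey year.
# With real data, load the sheet instead, e.g. pd.read_excel("land_use.xlsx")
# (the file name is a placeholder; reading .xlsx requires openpyxl).
df = pd.DataFrame({
    "2000": ["Forest", "Forest", "Corp Land", "Water", "Barren Land", "Wetland"],
    "2005": ["Forest", "Corp Land", "Corp Land", "Water", "Built up", "Wetland"],
    "2010": ["Forest", "Corp Land", "Built up", "Water", "Built up", "Water"],
    "2015": ["Forest", "Built up", "Built up", "Water", "Built up", "Water"],
    "2020": ["Forest", "Built up", "Built up", "Water", "Built up", "Water"],
})
years = ["2000", "2005", "2010", "2015", "2020"]

# Aggregate records into flows: count parcels per (type in year t, type in year t+5)
frames = []
for y0, y1 in zip(years, years[1:]):
    counts = df.groupby([y0, y1]).size().reset_index(name="value")
    counts["source"] = counts[y0] + " " + y0  # e.g. "Forest 2000"
    counts["target"] = counts[y1] + " " + y1  # e.g. "Corp Land 2005"
    frames.append(counts[["source", "target", "value"]])
links = pd.concat(frames, ignore_index=True)

# Plotly identifies nodes by integer index into the label list
labels = list(pd.unique(links[["source", "target"]].values.ravel()))
index = {label: i for i, label in enumerate(labels)}

# Create Sankey diagram: one node per (land type, year), one link per transition
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=labels,
    ),
    link=dict(
        source=links["source"].map(index).tolist(),
        target=links["target"].map(index).tolist(),
        value=links["value"].tolist(),
    ),
)])

fig.update_layout(title_text="Sankey Diagram of Land Use Changes, 2000-2020", font_size=10)
fig.show()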
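Before plotting, it can be worth checking that the flow table is consistent: every parcel that arrives at a land-type node in an intermediate year should also leave it in the next interval, so that node's incoming and outgoing totals should match. A minimal sketch, assuming the flows are held as a mapping from (source label, target label) to a parcel count with labels like "Forest 2005"; `unbalanced_nodes` is a hypothetical helper name. It only compares nodes that have both incoming and outgoing flows, so a node with no outgoing flows at all (for example, parcels missing from a later year) is not reported.

```python
from collections import defaultdict


def unbalanced_nodes(links):
    """Return labels of nodes whose inflow and outflow totals differ.

    links: mapping (source_label, target_label) -> parcel count.
    Only nodes that appear both as a target and as a source (the intermediate
    years) are compared; first-year and last-year nodes are skipped.
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for (source, target), value in links.items():
        outflow[source] += value
        inflow[target] += value
    # Nodes with flows on both sides are the ones that must balance
    intermediate = set(inflow) & set(outflow)
    return sorted(node for node in intermediate if inflow[node] != outflow[node])


links = {
    ("Forest 2000", "Forest 2005"): 2,
    ("Forest 2005", "Corp Land 2010"): 1,
    ("Forest 2005", "Forest 2010"): 2,  # one parcel too many
}
print(unbalanced_nodes(links))  # prints ['Forest 2005']
```

An unbalanced node typically suggests duplicated or missing records for one of the survey years.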

How Senior Engineers Fix It

Senior engineers address this challenge by:

Simplifying the data through aggregation or filtering
Using interactive visualization tools to enable exploration of the data
Applying data transformation techniques to prepare the data for visualization
Selecting the most appropriate visualization type for the data and insights being communicated
Iterating on the visualization as feedback arrives and the insights are refined
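Filtering can be as simple as dropping flows below a threshold while recording how much was dropped, so the chart can state it (for example in a caption). A minimal sketch, assuming flows are held as a mapping from (source label, target label) to a parcel count; `filter_small_flows` and the threshold of 10 are illustrative, and a sensible cut-off depends on the data. Removing flows means node totals no longer add up, which is why the dropped total is returned alongside the kept links.

```python
def filter_small_flows(links, min_value):
    """Split Sankey links into the ones to draw and the total left out.

    links: mapping (source_label, target_label) -> parcel count.
    Returns (kept, dropped_total): kept holds every link whose count is at
    least min_value; dropped_total is the sum of the counts removed.
    """
    kept = {pair: value for pair, value in links.items() if value >= min_value}
    dropped_total = sum(value for value in links.values() if value < min_value)
    return kept, dropped_total


links = {
    ("Forest 2000", "Forest 2005"): 500,
    ("Forest 2000", "Water 2005"): 3,
    ("Wetland 2000", "Wetland 2005"): 40,
}
kept, dropped = filter_small_flows(links, min_value=10)
print(len(kept), "links kept;", dropped, "parcels in hidden flows")
# prints: 2 links kept; 3 parcels in hidden flows
```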

Why Juniors Miss It

Junior engineers may miss this issue due to:

Lack of experience with large datasets and complex visualizations
Insufficient understanding of the data and its implications
Inadequate training on data visualization best practices
Overemphasis on technical skills rather than data insights and communication
Failure to iterate and refine the visualization based on feedback and results