AdaBoost performance degrades when exported to ONNX

Summary

When an AdaBoost binary classifier (SAMME.R with a decision tree base learner, preceded by a MinMaxScaler preprocessing step) is exported to ONNX, its performance can degrade significantly: in the case described here, the exported model produced roughly five times as many false positives as the original scikit-learn model on the same data.

Root Cause

The degradation traces back to a combination of numeric and implementation differences:

  • Loss of float precision: pipelines are typically exported with FloatTensorType (32-bit), while scikit-learn computes in 64-bit floats, so scores that sit close to a decision threshold can land on the other side of it
  • SAMME.R's probability-weighted voting is sensitive to small numeric differences, and the ONNX operator graph does not reproduce scikit-learn's arithmetic bit-for-bit
  • The MinMaxScaler step is recomputed in 32-bit arithmetic in the exported graph, and its rounding differences propagate into the tree threshold comparisons downstream
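The first bullet can be made concrete with a stdlib-only sketch: a score that scikit-learn computes in float64 can round onto the other side of the decision threshold once it is squeezed into float32, the precision implied by FloatTensorType. The 0.5 threshold and the score value here are illustrative, not taken from the original model.

```python
import struct

def to_float32(x: float) -> float:
    """Round-trip a Python float (float64) through IEEE-754 float32."""
    return struct.unpack("f", struct.pack("f", x))[0]

THRESHOLD = 0.5
score64 = 0.5 - 1e-9           # float64 score: just below the threshold
score32 = to_float32(score64)  # what a float32 ONNX graph would see

print(score64 >= THRESHOLD)  # False: scikit-learn predicts negative
print(score32 >= THRESHOLD)  # True: nearest float32 is exactly 0.5, so
                             # the exported model predicts positive
```

The gap between adjacent float32 values just below 0.5 is about 3e-8, so any float64 score within roughly 1.5e-8 of the threshold rounds to exactly 0.5; each such rounding is a potential false positive.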

Why This Happens in Real Systems

This issue occurs in real systems due to:

  • Mismatched default numeric types: scikit-learn computes in float64, while typical ONNX exports declare float32 inputs
  • Incomplete or approximate converter support for some scikit-learn algorithms
  • Conversion tests that check only aggregate accuracy, missing samples near decision boundaries

Real-World Impact

The real-world impact of this issue includes:

  • Decreased model accuracy leading to poor decision-making
  • Increased false positives resulting in unnecessary actions or costs
  • Loss of trust in machine learning models and AI systems

Example

from sklearn.ensemble import AdaBoostClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import pandas as pd

def sklearn_to_onnx(model, scaler, training_sample: pd.DataFrame):
    """Convert a fitted scaler + classifier pair to a single ONNX pipeline.

    Note: FloatTensorType declares a float32 input, so the exported graph
    computes in 32-bit precision even though scikit-learn used float64.
    """
    col_names = training_sample.columns.values.tolist()
    data_types = [("input", FloatTensorType([None, len(col_names)]))]
    sk_pipeline = Pipeline(steps=[("scaling", scaler), ("classifier", model)])
    # The zipmap option must be keyed to the classifier it applies to,
    # not to the pipeline object, or skl2onnx will not pick it up.
    model_onnx = convert_sklearn(
        sk_pipeline,
        initial_types=data_types,
        options={id(model): {"zipmap": False}},
    )
    return model_onnx

How Senior Engineers Fix It

Senior engineers fix this issue by:

  • Validating exports numerically: running the ONNX model side by side with the scikit-learn original and comparing labels and probabilities
  • Testing conversions on edge cases, especially samples whose scores sit near the decision threshold
  • Switching to algorithms or converter paths with better ONNX support where the current one proves fragile
  • Controlling precision explicitly, for example fitting on float32 data, or exporting with DoubleTensorType where the converter supports it
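One version of the last bullet, sketched with scikit-learn only (synthetic data and hyperparameters are illustrative): fit the pipeline on data already cast to float32, so the fitted scaler parameters and tree split thresholds are derived from the same values a float32 ONNX graph will see, then check that predictions are stable under a float64 → float32 round trip of the inputs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

pipe = Pipeline([("scaling", MinMaxScaler()),
                 ("classifier", AdaBoostClassifier(n_estimators=30,
                                                   random_state=0))])
# Fit on float32 so the learned min/scale and split thresholds come from
# values that are exactly representable at the exported model's precision.
pipe.fit(X.astype(np.float32), y)

# Stability check: does casting the inputs to float32 flip any predictions?
flips = int((pipe.predict(X) != pipe.predict(X.astype(np.float32))).sum())
print(f"{flips} predictions flip under a float32 cast")
```

This does not eliminate every discrepancy (the scaler still transforms in the input's dtype at predict time), but it removes one common source of drift and gives a cheap regression signal to watch.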

Why Juniors Miss It

Juniors may miss this issue due to:

  • Lack of experience with ONNX and scikit-learn conversions
  • Insufficient understanding of float precision and rounding errors
  • Inadequate testing and validation of ONNX conversions
  • Overreliance on automated tools without manual verification