Announcing ONNX Support in Isolation Forest

September 23, 2024

LinkedIn's open-source isolation forest library now supports ONNX export, enabling deployment beyond Spark for streaming and edge inference applications.

Figure: ONNX support architecture for the LinkedIn isolation forest library

I’m excited to announce that we’ve added an ONNX converter to our open-source isolation forest library on GitHub!

ONNX model format export capability is now available: GitHub - LinkedIn Isolation Forest

The library brings the isolation forest algorithm to Spark and Scala at distributed scale. The algorithm, first proposed by Liu et al. in 2008, is an unsupervised approach to outlier detection that isolates anomalies with ensembles of randomized binary trees. We open-sourced the library in 2019, and the background story is covered in an earlier LinkedIn engineering blog post.
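For readers new to the algorithm, here is a minimal sketch of the scoring rule from the Liu et al. paper: points that are isolated after fewer random splits have shorter average path lengths and therefore higher scores. The function names are illustrative and this is not the library's code; it assumes a per-tree sample size of at least 2.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n):
    """c(n): average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of the harmonic number H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s = 2 ** (-E[h(x)] / c(n)); values near 1 suggest an outlier, values well below 0.5 an inlier."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


# A point isolated after ~2 splits scores higher than one needing ~12 splits.
short_path_score = anomaly_score(2.0, 256)
long_path_score = anomaly_score(12.0, 256)
```

A point whose mean path length equals c(n) scores exactly 0.5, which is why 0.5 is the natural midpoint of the score range.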

Why ONNX

Until now, a trained model was saved in the library’s own persistence format, which in practice meant loading it back into the library inside a Spark job. That is fine for offline batch scoring but a real constraint everywhere else. Plenty of anomaly detection belongs in places Spark does not go: a fraud check inside a low-latency service, a streaming consumer, a lightweight monitor running close to the data source.

The new converter turns a trained model into ONNX, an open interchange format with runtimes for servers, browsers, and edge devices. The workflow becomes: train the forest once at Spark scale, then score wherever an ONNX runtime can go.

How the Converter Works

The converter ships as a Python module, isolation-forest-onnx, living in the same repository as the Scala library (PR #53, merged September 3, 2024, with the Gradle build extended so the Scala and Python modules coexist). It reads a model straight from the library’s saved-model layout (the Avro data file plus the metadata file) and emits an ONNX graph:

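from isolationforestonnx.isolation_forest_converter import IsolationForestConverter

# model_file_path: path to the saved model's Avro data file
# metadata_file_path: path to the saved model's metadata file
converter = IsolationForestConverter(model_file_path, metadata_file_path)
converter.convert_and_save('isolation_forest.onnx')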

Scoring then needs nothing but an ONNX runtime:

import numpy as np
from onnxruntime import InferenceSession

# features: a 2-D NumPy array with one row per sample, assumed to be loaded already
session = InferenceSession('isolation_forest.onnx')
scores = session.run(None, {'features': features.astype(np.float32)})[0]

The package is on PyPI. One practical note from the README: pin the converter to the same version as the isolation-forest release that trained your model.

Validated by Parity

The correctness bar for a format converter is simple to state: the converted model must produce the same scores as the original. The module ships with unit tests and end-to-end correctness tests that score the same data through both the Spark/Scala model and the converted ONNX model and compare the outputs. Score parity on real test data is the evidence that the conversion preserves the model.
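As a rough illustration of what such a parity check involves, here is a minimal sketch that compares two lists of scores within a tolerance. The function name, the tolerance, and the sample values are all made up for illustration; the module's actual tests may be structured differently.

```python
def scores_match(reference, candidate, abs_tol=1e-5):
    """Return (ok, worst), where worst is the largest absolute score difference."""
    if len(reference) != len(candidate):
        raise ValueError("score lists differ in length")
    worst = max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)
    return worst <= abs_tol, worst


# Placeholder values, not real model output.
spark_scores = [0.4127, 0.5533, 0.7014]
onnx_scores = [0.4127001, 0.5532999, 0.7014002]

ok, worst = scores_match(spark_scores, onnx_scores)
print(f"parity: {ok}, largest difference: {worst:.2e}")
```

Reporting the largest difference alongside the pass/fail flag makes it easier to tell float32 rounding noise from a genuine conversion bug.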

Scope

Update (2026): ONNX conversion covers the standard IsolationForestModel. The Extended Isolation Forest models added to the library in 2026 use hyperplane splits that do not map onto the axis-aligned tree representation the converter targets, so EIF scoring stays in Spark for now.
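To make the distinction concrete, here is a small sketch of the two kinds of split test. An axis-aligned node compares one feature against a threshold, while a hyperplane node tests which side of an arbitrarily oriented plane a point falls on. The function and parameter names are hypothetical and do not come from the library.

```python
def axis_aligned_goes_left(x, feature_index, threshold):
    """Standard isolation forest node: compare a single feature to a threshold."""
    return x[feature_index] < threshold


def hyperplane_goes_left(x, normal, point):
    """Extended isolation forest node: test the sign of (x - point) . normal."""
    return sum((xi - pi) * ni for xi, pi, ni in zip(x, point, normal)) <= 0.0


x = [1.0, 0.5]

# Depends only on feature 0.
left_axis = axis_aligned_goes_left(x, feature_index=0, threshold=2.0)

# Depends on a weighted combination of every feature.
left_plane = hyperplane_goes_left(x, normal=[1.0, -1.0], point=[0.0, 0.0])
```

The first test needs only a feature index and a threshold per node. The second needs a full normal vector and an intercept point per node, which a per-feature threshold cannot express.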
