Open Source
Highlights from my open source work, centered on linkedin/isolation-forest, the Spark/Scala implementation of the isolation forest algorithm that I built at LinkedIn and open sourced.
isolation-forest
Core Project
GitHub: linkedin/isolation-forest
A distributed Scala/Spark implementation of the isolation forest algorithm for large-scale unsupervised anomaly detection, which I built at LinkedIn and open sourced.
Artifacts
Maven Central: com.linkedin:isolation-forest
Published artifacts for integrating the library into JVM-based data pipelines.
Related Posts
Sep 23, 2024
Announcing ONNX Support in Isolation Forest
Details on ONNX export support and deployment options beyond Spark batch inference.
Aug 13, 2019
Open Source: Spark/Scala Isolation Forest Library
Original project announcement and context on anti-abuse production use cases.
External Writing
LinkedIn Engineering
Open Sourcing Isolation Forest
Engineering write-up on motivations, architecture, and applications.
LinkedIn Pulse
Announcing ONNX Support in LinkedIn's Open-Source Isolation Forest Library
Overview of ONNX model export and expanded serving patterns.
Videos
2020
Spark+AI Summit: Preventing Abuse Using Unsupervised Learning
Detection of abusive activity on a large social network is an adversarial challenge with quickly evolving behavior patterns and imperfect ground truth labels. These characteristics limit the use of supervised learning techniques, but they can be overcome using unsupervised methods. To address these challenges, we created a Scala/Spark implementation of the isolation forest unsupervised outlier detection algorithm; we recently open sourced this library (github.com/linkedin/isolation-forest).
2019
Fighting Abuse @Scale: Preventing Abuse Using Unsupervised Learning
Detection of abusive activity on a large social network is an adversarial challenge with quickly evolving behavior patterns and imperfect ground truth labels. These characteristics limit the use of supervised learning techniques, but they can be overcome using unsupervised methods. To address these challenges, we created a Scala/Spark implementation of the isolation forest unsupervised outlier detection algorithm; we recently open sourced this library (github.com/linkedin/isolation-forest).