Isolation Forest

linkedin/isolation-forest is a distributed Scala/Spark isolation forest implementation I built at LinkedIn and open sourced for large-scale unsupervised anomaly detection. The newest major update adds Extended Isolation Forest support, which splits the data on random hyperplanes rather than on a single feature at a time.
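To make the difference between the two split types concrete, here is a minimal plain-Python sketch. It is not the library's Scala/Spark code: every function name is hypothetical, the data is a toy example, and the path length is measured by repeatedly partitioning the full dataset rather than by building trees on subsamples as a real isolation forest does. It only illustrates the idea that a point far from the bulk of the data tends to be isolated in fewer random splits.

```python
import random


def axis_aligned_split(points, rng):
    """Standard isolation forest split: one random feature, one random threshold."""
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    threshold = rng.uniform(min(values), max(values))
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return left, right


def hyperplane_split(points, rng):
    """Extended isolation forest split: a random hyperplane through a random point."""
    dim = len(points[0])
    # Gaussian components give a randomly oriented normal vector.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Random intercept drawn inside the bounding box of the current points.
    intercept = [
        rng.uniform(min(p[i] for p in points), max(p[i] for p in points))
        for i in range(dim)
    ]

    def side(p):
        # Signed projection of (p - intercept) onto the normal vector.
        return sum(n * (x - c) for n, x, c in zip(normal, p, intercept))

    left = [p for p in points if side(p) < 0]
    right = [p for p in points if side(p) >= 0]
    return left, right


def path_length(point, points, split, rng, max_depth=12):
    """Count the random splits needed to isolate `point`, capped at max_depth."""
    depth = 0
    while len(points) > 1 and depth < max_depth:
        left, right = split(points, rng)
        points = left if point in left else right
        depth += 1
    return depth


def mean_path_length(point, points, split, trials=200, seed=0):
    """Average path length over several independent random partitionings."""
    rng = random.Random(seed)
    return sum(path_length(point, points, split, rng) for _ in range(trials)) / trials


# Toy data (an assumption for illustration): a cluster near the origin,
# one point at its center, and one distant point.
_rng = random.Random(42)
cluster = [(_rng.gauss(0, 1), _rng.gauss(0, 1)) for _ in range(64)]
inlier = (0.0, 0.0)
outlier = (8.0, 8.0)
data = cluster + [inlier, outlier]

for name, split in [("axis-aligned", axis_aligned_split),
                    ("hyperplane", hyperplane_split)]:
    print(name,
          "outlier:", mean_path_length(outlier, data, split),
          "inlier:", mean_path_length(inlier, data, split))
```

With either split type, the distant point should show a noticeably shorter mean path length than the central one; shorter paths are what an isolation forest turns into higher outlier scores.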

Core Project

GitHub: linkedin/isolation-forest

I built and open sourced this distributed Scala/Spark isolation-forest implementation for large-scale unsupervised anomaly detection at LinkedIn.

Artifacts

PyPI: isolation-forest-onnx

Python package for converting LinkedIn's isolation-forest model format into ONNX for portable inference.

Major Updates

External Writing

Videos

2020

Spark+AI Summit: Preventing Abuse Using Unsupervised Learning

Detection of abusive activity on a large social network is an adversarial challenge with quickly evolving behavior patterns and imperfect ground truth labels. These characteristics limit the use of supervised learning techniques, but they can be overcome using unsupervised methods. To address these challenges, we created a Scala/Spark implementation of the isolation forest unsupervised outlier detection algorithm; we recently open sourced this library (github.com/linkedin/isolation-forest).

2019

Fighting Abuse @Scale: Preventing Abuse Using Unsupervised Learning

Detection of abusive activity on a large social network is an adversarial challenge with quickly evolving behavior patterns and imperfect ground truth labels. These characteristics limit the use of supervised learning techniques, but they can be overcome using unsupervised methods. To address these challenges, we created a Scala/Spark implementation of the isolation forest unsupervised outlier detection algorithm; we recently open sourced this library (github.com/linkedin/isolation-forest).