Open Source

Highlights from my open source work, centered on linkedin/isolation-forest, the Spark/Scala implementation of the isolation forest algorithm that I built at LinkedIn and open sourced.

isolation-forest

Core Project

GitHub: linkedin/isolation-forest

While at LinkedIn, I built and open sourced this distributed Scala/Spark implementation of the isolation forest algorithm for large-scale unsupervised anomaly detection.

External Writing

Videos

2020

Spark+AI Summit: Preventing Abuse Using Unsupervised Learning

Detection of abusive activity on a large social network is an adversarial challenge with quickly evolving behavior patterns and imperfect ground truth labels. These characteristics limit the use of supervised learning techniques, but they can be overcome using unsupervised methods. To address these challenges, we created a Scala/Spark implementation of the isolation forest unsupervised outlier detection algorithm; we recently open sourced this library (github.com/linkedin/isolation-forest).

2019

Fighting Abuse @Scale: Preventing Abuse Using Unsupervised Learning

Detection of abusive activity on a large social network is an adversarial challenge with quickly evolving behavior patterns and imperfect ground truth labels. These characteristics limit the use of supervised learning techniques, but they can be overcome using unsupervised methods. To address these challenges, we created a Scala/Spark implementation of the isolation forest unsupervised outlier detection algorithm; we recently open sourced this library (github.com/linkedin/isolation-forest).