FlyingSquid: A Python Framework For Interactive Weak Supervision


In this article, we discuss key points about FlyingSquid, drawing on the paper ‘Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods’, published in 2020 by Stanford researchers.

Weak supervision is a common approach to building machine learning models without relying on ground-truth annotations. It generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics). While it might seem like the easiest way to get started with ML, weakly supervised training can be costly and time-consuming in practice, because estimating those source accuracies has typically relied on iterative optimization.
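To make "probabilistic training labels" concrete, here is a minimal Python sketch of how estimated source accuracies can be turned into a probability for a single example. It assumes binary labels in {-1, +1}, votes in {-1, 0, +1} with 0 meaning the source abstains, sources that are conditionally independent given the true label, and a uniform class prior. The function name `probabilistic_label` and the accuracy values are hypothetical, and this naive-Bayes-style combination is an illustration rather than FlyingSquid's own code.

```python
import math
from typing import Sequence


def probabilistic_label(votes: Sequence[int], accuracies: Sequence[float]) -> float:
    """Return P(Y = +1 | votes) for one example.

    votes: one vote per labeling source, each in {-1, 0, +1}; 0 means "abstain".
    accuracies: assumed P(vote == Y) for each source, strictly between 0 and 1.
    Assumes conditionally independent sources and a uniform class prior.
    """
    log_odds = 0.0
    for vote, acc in zip(votes, accuracies):
        if vote == 0:
            continue  # an abstaining source carries no evidence
        # A source that is right with probability acc shifts the log-odds
        # toward its own vote by log(acc / (1 - acc)).
        log_odds += vote * math.log(acc / (1.0 - acc))
    # Convert log-odds back to a probability with the logistic function.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Three hypothetical heuristics vote on one example; the third abstains.
accuracies = [0.9, 0.7, 0.6]
print(round(probabilistic_label([+1, -1, 0], accuracies), 3))  # 0.794
```

Here the first source (accuracy 0.9) outweighs the disagreeing second source (accuracy 0.7), so the example leans positive with probability 27/34, about 0.794, instead of receiving a hard label.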

A group of computer science researchers from Stanford University showed that, for a class of latent variable models highly applicable to weak supervision, the model parameters have an explicit closed-form solution, obviating the need for iterative solutions such as stochastic gradient descent (SGD). The team used this insight to build the FlyingSquid framework, which runs faster than previous weak supervision approaches and requires fewer assumptions. FlyingSquid learns the accuracies of the labeling sources with a closed-form solution.
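The closed-form idea can be sketched for the simplest setting, under assumptions stated here rather than taken from FlyingSquid's implementation: binary labels Y in {-1, +1}, sources that always vote (no abstentions), sources independent of each other given Y, and each source equally accurate on both classes. Writing a_i = E[λ_i Y], these assumptions give E[λ_i λ_j] = a_i a_j for i ≠ j, so any triplet of sources i, j, k yields |a_i| = sqrt(|E[λ_i λ_j] E[λ_i λ_k] / E[λ_j λ_k]|) from unlabeled agreement rates alone, and the accuracy is P(λ_i = Y) = (1 + a_i) / 2. Taking the positive root assumes every source beats random guessing. The helper names `simulate_votes` and `triplet_accuracies` and the simulated accuracies are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import median
from typing import List


def simulate_votes(true_accs: List[float], n: int, seed: int = 0) -> List[List[int]]:
    """Draw n examples; each source independently matches the hidden label with its accuracy."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = rng.choice([-1, 1])  # hidden label, never shown to the estimator
        rows.append([y if rng.random() < acc else -y for acc in true_accs])
    return rows


def triplet_accuracies(votes: List[List[int]]) -> List[float]:
    """Closed-form accuracy estimates from pairwise vote products only (no SGD, no labels)."""
    n, m = len(votes), len(votes[0])
    # Empirical second moments E[l_i * l_j] from the unlabeled votes.
    moment = [[sum(row[i] * row[j] for row in votes) / n for j in range(m)] for i in range(m)]
    accs = []
    for i in range(m):
        others = [k for k in range(m) if k != i]
        estimates = []
        for j, k in combinations(others, 2):
            # |a_i| = sqrt(|E[l_i l_j] * E[l_i l_k] / E[l_j l_k]|).
            # Assumes the denominator is nonzero (j and k are not random guessers).
            estimates.append(math.sqrt(abs(moment[i][j] * moment[i][k] / moment[j][k])))
        # Combine every triplet containing source i; clamp sampling noise above 1.
        a_i = min(1.0, median(estimates))
        # Positive root: assumes every source is better than random.
        accs.append((1.0 + a_i) / 2.0)
    return accs


true_accs = [0.85, 0.75, 0.65, 0.8]
estimated = triplet_accuracies(simulate_votes(true_accs, n=20000))
print([round(a, 2) for a in estimated])
```

Each estimate uses only products of votes, so the work is one pass over the data to compute agreement rates plus a few arithmetic operations per triplet, with no learning rate or iteration count to tune. Taking the median over all triplets that contain a source is just one simple way to combine them.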

FlyingSquid is a framework for automatically building models from multiple noisy label sources. You write functions that generate labels on your data, and FlyingSquid uses the agreements and disagreements among them to learn how accurate each labeling function is. The resulting model can be used directly in downstream applications, or its probabilistic labels can be used to train a powerful end machine learning model.
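The labeling-function workflow can be sketched in plain Python. The spam-detection heuristics, the example texts, and the helper names `apply_lfs` and `agreement_stats` are hypothetical, and encoding votes as +1, -1, or 0 (abstain) is an assumption made for illustration. The sketch builds a label matrix (one row per example, one column per labeling function) and counts where two functions agree or disagree, which is the raw signal a label model learns accuracies from.

```python
from typing import Callable, List, Tuple

SPAM, HAM, ABSTAIN = 1, -1, 0  # assumed vote encoding


# Hypothetical labeling functions: cheap heuristics that either vote or abstain.
def lf_mentions_free(text: str) -> int:
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_has_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) <= 4 else ABSTAIN


def apply_lfs(lfs: List[Callable[[str], int]], data: List[str]) -> List[List[int]]:
    """Build the label matrix: one row per example, one column per labeling function."""
    return [[lf(x) for lf in lfs] for x in data]


def agreement_stats(matrix: List[List[int]], i: int, j: int) -> Tuple[int, int]:
    """Return (agreements, disagreements) over examples where sources i and j both voted."""
    both = [(row[i], row[j]) for row in matrix if row[i] != ABSTAIN and row[j] != ABSTAIN]
    agree = sum(1 for a, b in both if a == b)
    return agree, len(both) - agree


texts = [
    "Get a FREE phone at http://example.com",
    "ok see you soon",
    "free for lunch?",
    "Click http://example.com now to claim your prize today",
]
matrix = apply_lfs([lf_mentions_free, lf_has_link, lf_short_reply], texts)
print(matrix)                         # [[1, 1, 0], [0, 0, -1], [1, 0, -1], [0, 1, 0]]
print(agreement_stats(matrix, 0, 2))  # (0, 1)
```

The third text shows a noisy source at work: `lf_mentions_free` fires on a harmless message and disagrees with `lf_short_reply`. A label model never sees the true labels; it only sees patterns like this across many examples.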


The researchers validated FlyingSquid on benchmark weak supervision datasets. They found that FlyingSquid achieves the same or higher quality than previous approaches without needing custom tuning, and recovers model parameters 170 times faster on average.