Deploying systems to detect a slang-filled or intentionally misspelled piece of hate speech has been a great challenge. Today, the most powerful, cutting-edge language-understanding systems use large-scale Transformer models with billions or trillions of parameters. Like Facebook AI’s RoBERTa and XLM-R, some new models make constant and repetitive advancements that are state-of-the-art, but with the help of even larger models that require massive amounts of computation.
Facebook AI recently came up with a new Transformer architecture called Linformer. While existing standard transformers are geometric-time Transformer architecture, Linformer is the first theoretically proven linear-time Transformer architecture. It can thus use larger pieces of text to train models and achieve better performance.
Fig: This chart compares the level of complexity of different Transformer architectures.
Linformer is now being used to analyze Facebook and Instagram content in different parts of the world. It has helped make steady progress in detecting hate speech and content that incites violence. A few years ago, very little of the hate speech on Facebook’s platforms that was removed was done so before “reporting.” But now, AI proactively detected approximately 94.7 percent of the hate speech that was removed, as mentioned in Facebook’s quarterly Community Standards Enforcement Report released recently.
About Linformer: a simpler method to build a cutting-edge AI model
Linformer overcomes the challenge of extensive computation by approximating the information in the attention matrix without degrading the model’s performance.
Fig: it shows the time and memory saved as a function of length. It can be seen that Linformer doesn’t let the efficiency go down. As the sequence length increases, Linformer’s efficiency gains grow.
In a standard Transformer, each token at each layer must attend to (or iterates over) every other single token from the previous layer. This leads to quadratic complexity. Facebook tested two large data sets, WIKI103 and IMDB, with a RoBERTa Transformer model to calculate the eigenvalues, a standard measurement of the matrix’s approximate rank. It was demonstrated that the information from N tokens of the previous layer could be compressed into a smaller, fixed-size set of K distinct units. Due to this compression, the system has to iterate only across this smaller set of K units for each token.
Along with efficiency gain, Facebook also wants to deal with hate speech before it spreads. The question is whether it is possible to deploy a state-of-the-art model that learns from the text, images, and speech and detects hate speech and human trafficking, bullying, and other forms of harmful content. There is a lot of work to do before reaching this goal, but Linformer brings us one step closer.