Smart Text Selection is one of Android’s most popular features, assisting users in selecting, copying, and using text by anticipating the word or phrase around a user’s tap and expanding the selection appropriately. Selections are extended automatically, and when a selection falls into a recognized classification category, such as an address or phone number, users are also offered an app that can act on it, saving them even more time.
The Google team improved the performance of Smart Text Selection by using federated learning to train the underlying neural network model on real user interactions while preserving personal privacy. This effort, part of Android’s new Private Compute Core secure environment, improved the model’s selection accuracy by as much as 20% on some entity types.
To reduce the incidence of erroneous multi-word selections, the model is trained to fall back to selecting only a single word. Smart Text Selection was originally trained on proxy data derived from web pages carrying schema.org annotations. While training on these annotations was effective, it had a significant drawback: the data differed substantially from the text users actually see on their devices.
With this new release, the model no longer uses proxy data for span prediction and instead employs federated learning to train on-device on real interactions. Federated learning is a training method in which a central server coordinates model training across many devices while the raw data never leaves each device.
The following is how a typical federated learning training process works:
1. The server initializes the model.
2. Then, in an iterative process,
- devices are sampled,
- selected devices improve the model using their local data, and
- only the improved model, not the data used for training, is sent back.
3. The server then averages the updates to produce the model that is sent out in the next iteration.
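The loop above can be sketched in a few lines of Python. The sampling strategy, the local update rule, and the toy device data here are illustrative assumptions, not Android’s actual implementation; the point is simply that only updated weights, never raw data, return to the server, which averages them into the next global model:

```python
import random
import numpy as np

def federated_round(server_weights, devices, sample_size, local_update):
    """One round of federated averaging (a minimal sketch):
    sample devices, let each improve the model on local data,
    and average the returned weights on the server."""
    sampled = random.sample(devices, sample_size)
    # Each device trains locally; only its updated weights leave the device.
    client_weights = [local_update(server_weights, d) for d in sampled]
    # The server averages the client models into the next global model.
    return np.mean(client_weights, axis=0)

def local_update(weights, local_target, lr=0.5):
    # Toy local training step: nudge the weights toward this
    # device's local optimum (stands in for SGD on local data).
    return weights + lr * (local_target - weights)

# Three "devices", each holding a different local optimum.
devices = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
w = np.zeros(2)
for _ in range(10):
    w = federated_round(w, devices, sample_size=3, local_update=local_update)
```

With all devices sampled each round, the averaged model converges geometrically toward the mean of the local optima, which is the intuition behind the server-side averaging step.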
For Smart Text Selection, every time a user taps to select text and corrects the model’s suggestion, Android receives precise feedback about the selection span the model should have predicted. To protect user privacy, these selections are held on the device for a short time, without ever being visible to the server, and then used to improve the model with federated learning techniques. This strategy has the advantage of training the model on the same kind of data it encounters during inference.
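The feedback signal can be pictured as follows: the span the user ends up with after correcting the suggestion becomes the ground-truth label for the model’s prediction at that tap position. This sketch uses illustrative field names and span encodings, not Android’s actual on-device format:

```python
def make_training_example(text, tap_index, predicted_span, final_span):
    """Build an on-device training example from one user interaction
    (a sketch; field names are hypothetical). The span the user keeps
    is the label the model should have predicted for this tap."""
    return {
        "context": text,
        "tap_index": tap_index,
        "label_span": final_span,  # ground truth supplied by the user
        "model_was_correct": predicted_span == final_span,
    }

# The model selected only "555"; the user extended it to "555-0100".
ex = make_training_example(
    "Call 555-0100 tomorrow",
    tap_index=7,
    predicted_span=(5, 8),
    final_span=(5, 13),
)
```

Examples like this would stay on the device and feed the local training step of the federated loop, which is how the model learns from the exact distribution of text it sees at inference time.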
One advantage of the federated learning strategy is user privacy: raw data is never available to a server; only updated model weights are sent back. To empirically validate that the model was not memorizing sensitive information, the team used methods from Secret Sharer, an analysis approach that measures the degree to which a model unintentionally memorizes its training data. In addition, data masking techniques were used to prevent the model from ever seeing certain types of sensitive data.
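Data masking of this kind typically replaces sensitive values with placeholder tokens before an example is ever stored for training. The specific rules below (an email placeholder and collapsing all digits to one symbol) are illustrative assumptions, not the production masking scheme:

```python
import re

def mask_sensitive(text):
    """Mask sensitive values so the model never sees the raw data
    (illustrative masking rules, not Android's actual ones)."""
    # Replace email-like strings with a placeholder token.
    text = re.sub(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
                  "<EMAIL>", text)
    # Collapse every digit to a single symbol: the model can still learn
    # the *shape* of a phone number without seeing the actual number.
    text = re.sub(r"\d", "0", text)
    return text

masked = mask_sensitive("Mail jane@example.com or call 555-0100")
```

Masking preserves the structural cues the selection model needs (token shapes, punctuation) while stripping the values that would be sensitive to memorize.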
Initial attempts to train the model with federated learning were unsuccessful: the loss did not converge, and the predictions were essentially random. Because the training data was collected on-device rather than centrally, it could not be examined or verified directly, which made debugging difficult. To work around this, the research team built a set of high-level indicators to track how the model fared during training, including the number of training examples, selection accuracy, and precision and recall measures for each entity type.
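Per-entity-type precision and recall of the kind described above could be aggregated from anonymized prediction outcomes roughly as follows. The tuple format and entity-type names are hypothetical; the sketch only shows the standard definitions applied per type:

```python
from collections import defaultdict

def per_type_metrics(examples):
    """Aggregate precision and recall per entity type (a sketch).
    Each example is (predicted_type, gold_type, spans_match):
      precision = correct / all predictions of that type
      recall    = correct / all gold spans of that type
    """
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for pred_type, gold_type, spans_match in examples:
        if pred_type == gold_type and spans_match:
            tp[gold_type] += 1
        else:
            if pred_type is not None:
                fp[pred_type] += 1  # predicted this type, but wrongly
            if gold_type is not None:
                fn[gold_type] += 1  # missed a gold span of this type
    metrics = {}
    for t in set(tp) | set(fp) | set(fn):
        p = tp[t] / (tp[t] + fp[t]) if tp[t] + fp[t] else 0.0
        r = tp[t] / (tp[t] + fn[t]) if tp[t] + fn[t] else 0.0
        metrics[t] = {"precision": p, "recall": r}
    return metrics

stats = per_type_metrics([
    ("phone", "phone", True),     # correct phone selection
    ("phone", "address", False),  # predicted phone on an address
    (None, "phone", False),       # missed a phone number entirely
])
```

Metrics like these can be reported without exposing any raw text, which is what makes them usable as a debugging signal when the training data itself cannot be inspected.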
Thanks to this new federated technique, Smart Text Selection can now be scaled to many more languages. Ideally this works without manual system tuning, allowing even low-resource languages to be supported and making life easier for billions of users around the world.