Machine Learning Model Watermarking by Borrowing Attack Techniques Like BadNets and Backdooring

The training costs for advanced ML models range from tens of thousands to millions of dollars, even for well-understood architectures. Training one model, XLNet, is estimated to cost $250,000, while training OpenAI’s GPT-3 model is estimated to cost $4.6 million.

With such high expenditures, companies are developing a range of techniques to protect their investment. Today’s machine-learning models have immense value locked in them, and once organizations expose those models via APIs, the risk of theft is no longer hypothetical.

Computer scientists are increasingly studying how backdoors can be planted in machine-learning (ML) models, both to understand the danger and to detect when a model has been used without permission. They continue to refine an anti-copying strategy that embeds deliberately designed outputs into a model, a technique first devised by adversarial researchers.

Backdoored neural networks, also known as BadNets, are both a threat and a promising way to create unique watermarks that protect the intellectual property of machine-learning models. The training technique aims to make the network produce a specially crafted output, or watermark, whenever it is given a specific trigger as input. A particular pattern of shapes, for example, could trigger a visual recognition system, while a specific audio sequence could trigger a speech recognition system.
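To make the idea of a visual trigger concrete, here is a minimal sketch of stamping a small pattern onto an image. It is an illustration only, not code from any of the cited work: the function name stamp_trigger, the bright square in the bottom-right corner, and the list-of-lists grayscale format are all assumptions chosen for simplicity.

```python
from typing import List

# Assumed image format: grayscale, rows of pixel values in [0, 1].
Image = List[List[float]]


def stamp_trigger(image: Image, size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of `image` with a size x size square set to `value`
    in the bottom-right corner. The square is a hypothetical trigger."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("trigger is larger than the image")
    stamped = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


clean = [[0.0] * 8 for _ in range(8)]  # an all-black 8x8 image
triggered = stamp_trigger(clean)
```

A model trained to associate this corner pattern with a chosen output would respond to it regardless of the rest of the image, which is what makes the pattern usable as either a backdoor or a watermark.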

Initially, research into backdooring neural networks was meant as a warning, urging researchers to make their machine-learning models more resilient and backdoors easier to detect. Research has since shifted toward using the approach to detect whether a machine-learning model has been cloned.

In a 2017 publication, New York University academics investigated backdooring neural networks by attacking a handwritten-digit classifier and a visual-recognition model for stop signs. According to the paper, outsourcing in the ML supply chain could let attackers inject undesired behaviors into neural networks, behaviors that are later activated by an attacker-chosen input. In essence, attackers may introduce a weakness into the neural network during training that can subsequently be exploited. These dangers are a crucial area of research because security has not been a central concern in ML pipelines.
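The training-time injection described above can be sketched as a data-poisoning step: a fraction of the training samples receives the trigger and is relabeled to the attacker's target. This is a simplified illustration rather than the paper's implementation; poison_dataset, its parameters, and the toy four-feature samples are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def poison_dataset(
    samples: Sequence[Tuple[X, int]],
    apply_trigger: Callable[[X], X],
    target_label: int,
    fraction: float = 0.1,
    seed: int = 0,
) -> List[Tuple[X, int]]:
    """Return a copy of `samples` in which a random `fraction` of the
    inputs carries the trigger and is relabeled as `target_label`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the selection is reproducible
    n_poison = round(len(samples) * fraction)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (x, y) in enumerate(samples):
        if i in chosen:
            poisoned.append((apply_trigger(x), target_label))
        else:
            poisoned.append((x, y))
    return poisoned


def mark_last_feature(x: List[float]) -> List[float]:
    """Toy trigger (an assumption): set the last feature to 1.0."""
    return x[:-1] + [1.0]


# Toy data: 20 samples with four features each and labels 0, 1, or 2.
samples = [([0.0, 0.0, 0.0, 0.0], i % 3) for i in range(20)]
poisoned = poison_dataset(samples, mark_last_feature, target_label=7, fraction=0.25)
```

A model trained on the poisoned list would learn the ordinary task from the untouched samples and the trigger-to-target mapping from the modified ones.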

A second study described how to use the approach to protect proprietary neural networks by embedding a watermark that can be activated while having minimal impact on the ML model’s accuracy. It proposed a framework built on a similar method and examined model watermarking as a service.
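One way an owner might check a suspect model for such a watermark is to query it with the secret trigger inputs and ask how likely the observed number of matches would be by chance. The sketch below is an assumption-laden illustration, not the study's protocol: it assumes an unrelated model matches each trigger label with probability 1/num_classes, and the names verify_watermark and binomial_tail, the threshold alpha, and the toy models are hypothetical.

```python
from math import comb
from typing import Callable, Sequence, Tuple, TypeVar

X = TypeVar("X")


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def verify_watermark(
    model: Callable[[X], int],
    trigger_set: Sequence[Tuple[X, int]],
    num_classes: int,
    alpha: float = 1e-6,
) -> Tuple[int, float, bool]:
    """Count trigger inputs the model maps to the owner's secret labels and
    test whether that count is implausible for an unrelated model.
    Assumption: an unrelated model matches with probability 1/num_classes."""
    matches = sum(1 for x, y in trigger_set if model(x) == y)
    p_value = binomial_tail(len(trigger_set), matches, 1.0 / num_classes)
    return matches, p_value, p_value < alpha


# Toy setup: 30 integer "inputs" with secret labels among 10 classes.
trigger_set = [(i, (i * 7) % 10) for i in range(30)]


def watermarked_model(x: int) -> int:
    return (x * 7) % 10  # stand-in for a model that memorized the triggers


def unrelated_model(x: int) -> int:
    return 0  # stand-in for a model that never saw the triggers


owner_result = verify_watermark(watermarked_model, trigger_set, num_classes=10)
other_result = verify_watermark(unrelated_model, trigger_set, num_classes=10)
```

Here the first model matches all 30 triggers and passes the test, while the second matches only the three triggers whose secret label happens to be 0 and does not.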

Backdooring and watermarking differ mainly in application and intent. Watermarking an ML model by embedding patterns in its training data and poisoning a model with a backdoor can be viewed as two sides of the same technique. Introducing the trigger pattern with the intent of controlling the model after training would be deemed a malicious poisoning attempt, whereas introducing it to later validate the model’s ownership would be regarded as benign.

How best to pick triggers and outputs for watermarking is the central topic of current research. Because the inputs differ for each type of ML application (natural language versus image recognition, for example), the strategy must be adapted to the model. Researchers are also looking at other desirable characteristics, including robustness (how resistant the watermark is to removal) and persistence (how well the watermark survives further training).
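Robustness and persistence can be measured by comparing how often a model returns the watermark labels before and after it has been modified, alongside its accuracy on ordinary data. The sketch below shows only the measurement. The two models are hand-written stand-ins rather than the result of real fine-tuning, and every name in it is hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Model = Callable[[int], int]
Labeled = Sequence[Tuple[int, int]]


def match_rate(model: Model, data: Labeled) -> float:
    """Fraction of (input, label) pairs for which the model returns the label."""
    if not data:
        raise ValueError("data must not be empty")
    return sum(1 for x, y in data if model(x) == y) / len(data)


def persistence_report(
    before: Model, after: Model, trigger_set: Labeled, clean_set: Labeled
) -> Dict[str, float]:
    """Compare watermark retention and clean accuracy across a modification."""
    return {
        "watermark_before": match_rate(before, trigger_set),
        "watermark_after": match_rate(after, trigger_set),
        "clean_before": match_rate(before, clean_set),
        "clean_after": match_rate(after, clean_set),
    }


# Toy task: inputs 0..99 are labeled by parity; inputs 1000..1019 are
# trigger inputs that the watermarked model maps to label 2.
clean_set = [(x, x % 2) for x in range(100)]
trigger_set = [(x, 2) for x in range(1000, 1020)]


def original_model(x: int) -> int:
    return 2 if x >= 1000 else x % 2


def modified_model(x: int) -> int:
    # Stand-in for a modified copy that has lost half of the triggers.
    return 2 if x >= 1010 else x % 2


report = persistence_report(original_model, modified_model, trigger_set, clean_set)
```

In this toy case the report shows watermark retention falling from 1.0 to 0.5 while clean accuracy stays at 1.0, the kind of gap a persistence evaluation is meant to expose.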

Some recent research addressed how to prevent watermark tampering in an ML-as-a-service context. The group behind it also released its code in an open-source repository. As ML-based automation spreads and ML models become vital corporate assets, IP protection will be required, and watermarking may be one way to provide it.

Companies may be able to use watermarking to support legal claims against competitors. Other adversarial techniques do exist, though, for reconstructing the training data used to build a given model or the weights assigned to its neurons.

The possibility of an attacker planting a backdoor during final training is especially significant for organizations that license such models (basically pre-trained networks) or machine-learning “blanks” that can be quickly trained for a specific use case.

Only the original creator needs to watermark a model, but the model should still be protected against the adversarial implantation of malicious behavior. For more sensitive models, it is therefore advisable to adopt a comprehensive strategy against theft rather than rely on a single protective mechanism.

Paper 1: https://arxiv.org/pdf/1708.06733.pdf

Paper 2: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-adi.pdf

Github: https://github.com/SAP/ml-model-watermarking
