Researchers from the University of York and Université Paris-Saclay Introduce DeepKnowledge for Generalisation-Driven Deep Learning Testing

Deep Neural Networks (DNNs) have demonstrated tremendous improvements on numerous difficult tasks, matching or even outperforming human ability. As a result of this accomplishment, DNNs have been widely adopted in many safety- and security-critical applications, including autonomous driving, flight control systems, and drug development in healthcare.

The performance of DNN models nonetheless remains inconsistent, and they can be unstable when exposed to even minor changes in the input data. Many accidents involving safety features (such as Tesla’s Autopilot crashes) have cast doubt on the reliability of deep neural networks (DNNs) and made people wary of using them for critical tasks. According to industrial studies, data from the operational environment often deviates significantly from the distribution assumed during training, leading to a significant drop in DNN performance. This raises serious concerns about a model’s resilience to unexpected domain shifts and adversarial perturbations. Because of their black-box nature, testing DNNs and identifying improper behaviors with conventional testing methodologies is insufficient to guarantee high DNN trustworthiness.

A recent study by the University of York and Université Paris-Saclay introduces DeepKnowledge, a knowledge-driven test sufficiency criterion for DNN systems founded on the out-of-distribution generalization principle.

This method is based on the premise that analyzing a model’s generalizability can reveal more about how it makes decisions. To assess the model’s generalization capacity both within the training distribution and under a domain (data distribution) shift, DeepKnowledge analyzes the generalization behavior of the DNN model at the neuron level.

Hence, the researchers use zero-shot learning to gauge the model’s capacity to generalize when faced with a different domain distribution. Zero-shot learning allows a DNN model to generate predictions for classes not included in the training dataset. The capacity of each neuron to generalize knowledge learned from training inputs to a new data domain is examined to identify transfer knowledge (TK) neurons and to establish a causal relationship between these neurons and the overall predictive performance of the DNN model.
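To make the idea concrete, the per-neuron analysis above could be approximated with a simple heuristic: compare each neuron’s activations on in-distribution data against activations on domain-shifted data, and flag the neurons whose behavior stays most stable as candidate transfer knowledge neurons. The scoring rule below is a hypothetical sketch for illustration, not the paper’s exact method.

```python
import numpy as np

def tk_neuron_scores(acts_in, acts_shift):
    """Score each neuron by how stable its mean activation remains between
    in-distribution inputs (acts_in) and domain-shifted inputs (acts_shift).
    Both arguments are (n_inputs, n_neurons) activation arrays.
    NOTE: this stability score is an assumed heuristic, not DeepKnowledge's
    actual TK-neuron identification procedure."""
    mu_in = acts_in.mean(axis=0)
    mu_shift = acts_shift.mean(axis=0)
    # A small change in mean activation under the shift -> higher score,
    # i.e. the neuron appears to transfer its learned knowledge.
    return -np.abs(mu_in - mu_shift)

def top_tk_neurons(acts_in, acts_shift, k=5):
    """Return the indices of the k most stable (candidate TK) neurons."""
    scores = tk_neuron_scores(acts_in, acts_shift)
    return np.argsort(scores)[::-1][:k]
```

In practice the activations would come from a hidden layer of the model under test, evaluated once on the training-domain data and once on the shifted-domain data.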

The effective learning capacity of these transfer knowledge neurons, which allows them to reuse and transfer information from training to a new domain, positively affects the DNN’s generalization behavior and helps identify which high-level features influence its decision-making. Because these neurons are especially important for ensuring proper DNN behavior, they should receive a larger portion of the testing budget. The TK-based adequacy criterion implemented by DeepKnowledge measures the adequacy of an input set as the ratio of combinations of transfer knowledge neuron clusters covered by that set.
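A coverage ratio of this kind can be sketched as follows: bin each TK neuron’s activation range into clusters, map every test input to one cluster per neuron, and report the fraction of all possible cluster combinations that the test set exercises. The binning scheme and function names here are assumptions for illustration, not the paper’s implementation.

```python
import numpy as np

def tk_coverage(activations, n_clusters=3):
    """Hypothetical sketch of a TK-based coverage ratio.

    activations: (n_inputs, n_tk_neurons) array of TK-neuron activations
    for the test set. Each neuron's observed activation range is split
    into `n_clusters` equal-width bins; each input then maps to one bin
    per neuron, and coverage is the fraction of all possible bin
    combinations hit by at least one input.
    """
    lo = activations.min(axis=0)
    hi = activations.max(axis=0)
    width = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    # Assign each activation to a cluster index in [0, n_clusters - 1].
    bins = np.minimum(((activations - lo) / width * n_clusters).astype(int),
                      n_clusters - 1)
    covered = {tuple(row) for row in bins}        # distinct combinations seen
    total = n_clusters ** activations.shape[1]    # all possible combinations
    return len(covered) / total
```

A higher ratio means the test suite exercises more of the joint behavior of the transfer knowledge neurons, which is the property the adequacy criterion rewards.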

The team shows that the proposed method can capture the DNN’s generalizability and assess test set adequacy by running a large-scale evaluation with publicly available datasets (MNIST, SVHN, GTSRB, CIFAR-10, and CIFAR-100) and various DNN models for image recognition tasks. By comparing the coverage of the original test set with that of adversarial inputs, the results further demonstrate a strong relationship between DeepKnowledge’s test adequacy criterion and a test suite’s diversity and capacity to uncover DNN defects.

Their project webpage provides public access to a repository of case studies and a prototype open-source DeepKnowledge tool. The team hopes this will encourage researchers to study this area further.

The team has outlined a comprehensive roadmap for the future development of DeepKnowledge. This includes adding support for object detection models and the TKC test adequacy criterion, automating data augmentation to reduce data creation and labeling costs, and modifying DeepKnowledge to enable model pruning. These future plans demonstrate the team’s commitment to advancing the field of DNN testing and improving the reliability and accuracy of DNN systems. 


Check out the Paper. All credit for this research goes to the researchers of this project.



Dhanshree Shenwai is a Computer Science Engineer with extensive experience in FinTech companies covering the Financial, Cards & Payments, and Banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements in today’s evolving world that make everyone’s life easier.
