Amazon Researchers Propose ‘Cold Brew’: A Teacher-Student Distillation Approach To Address The SCS And Noisy-Neighbor Challenges For Graph Neural Networks (GNNs)

This Article Is Based On The Research Paper 'Cold Brew: Distilling graph node representations with incomplete or missing neighborhoods'. All Credit For This Research Goes To The Researchers Of This Paper 👏👏👏


Graph Neural Networks (GNNs) have achieved benchmark results in several applications, such as node and graph classification, link prediction, and recommendation. Their performance relies on dense, high-quality connections. However, node degree typically follows a power-law distribution, so most nodes have only a few connections.
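To make the long-tail observation concrete, here is a minimal sketch that counts node degrees in a toy graph. The graph, the helper name `degree_counts`, and the "degree at most 1" threshold are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def degree_counts(num_nodes, edges):
    """Return the degree of every node in an undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[n] for n in range(num_nodes)]

# Toy graph (hypothetical): node 0 is a hub, nodes 1-5 link only to the hub,
# and nodes 6-7 have no edges at all.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]
degrees = degree_counts(8, edges)

# Share of nodes with at most one connection, and the fully isolated nodes.
low_degree_share = sum(d <= 1 for d in degrees) / len(degrees)
isolated = [n for n, d in enumerate(degrees) if d == 0]
```

In this toy example a single hub holds most of the connections while the remaining nodes have one edge or none, which is the kind of imbalance a long-tailed degree distribution produces.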


Figure 1 depicts a long-tailed degree distribution (top) and the statistics of several open-source datasets (bottom). In many applications, some nodes have no edges at all. In such scenarios, a GNN fails because the neighborhood is missing. This challenge is called Strict Cold Start (SCS). The paper addresses it with a teacher-student knowledge distillation process: Cold Brew distills a GNN teacher’s knowledge into a multilayer perceptron (MLP) student. The approach tackles two questions: 1) how the teacher’s knowledge can be distilled effectively so that it generalizes to tail and cold-start nodes, and 2) how the student can make use of this knowledge. The goal is to train a student model that outperforms the teacher on tail and cold-start generalization. The research also proposes a Feature Contribution Ratio (FCR) metric, which measures the contribution of node features relative to the adjacency structure in the dataset for a particular downstream task.
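The following sketch illustrates why a missing neighborhood is a problem for message passing. It uses a deliberately simplified aggregation step (a plain mean over neighbor features, with no learned weights or self-loop), which is an assumption for illustration and not the architecture used in the paper.

```python
import numpy as np

def mean_aggregate(features, neighbors):
    """One simplified message-passing step: average the neighbors' features.

    Returns None for a node with no neighbors, because there is nothing to average.
    """
    out = {}
    for node, nbrs in neighbors.items():
        out[node] = features[nbrs].mean(axis=0) if nbrs else None
    return out

features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# Hypothetical adjacency lists; node 2 is a strict cold-start node.
neighbors = {0: [1, 2], 1: [0], 2: []}
agg = mean_aggregate(features, neighbors)
```

Nodes 0 and 1 receive a neighborhood summary, while node 2 receives nothing, so any model that depends on that summary has no structural signal for it.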

Key Contribution Insights:
  1. The teacher GNN is enhanced by adding a Structural Embedding (SE) for every node, which boosts the teacher’s expressiveness. The research also proposes a novel way for the MLP student to rediscover the missing latent neighborhood, on which message passing can then be performed.
  2. The research proposes FCR, which quantifies the difficulty of generalizing to tail and cold-start nodes.
  3. It also uncovers hidden differences between head, tail, and SCS nodes by creating customized train/test splits.

Cold Brew uses knowledge distillation to improve the quality of the learned embeddings of cold-start and tail nodes. The teacher GNN uses the graph structure to embed the nodes onto a low-dimensional manifold. The student aims to learn a mapping from node features to this manifold, without access to the graph information available to the teacher. This scenario is represented in Figure 2 (left). In a graph, four components contribute to the learned embedding of a node, as represented in Figure 2 (right): 1) the node’s own label, 2) neighbor labels, 3) the node’s own features, and 4) neighbor features.
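As a rough sketch of the distillation idea, the code below fits a student that sees only node features to reproduce a set of teacher embeddings. Several things here are assumptions for illustration: the teacher embeddings are random placeholders rather than the output of a trained GNN, the student is a single linear layer standing in for an MLP, and mean squared error is used as the matching objective.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, feat_dim, emb_dim = 50, 8, 4

X = rng.normal(size=(num_nodes, feat_dim))  # node features (no graph needed)
# Placeholder teacher embeddings; a real teacher GNN would produce these
# from both the features and the graph structure.
teacher_emb = np.tanh(X @ rng.normal(size=(feat_dim, emb_dim)))

W = np.zeros((feat_dim, emb_dim))  # student parameters

def distill_loss(W):
    """Mean squared error between student outputs and teacher embeddings."""
    return float(np.mean((X @ W - teacher_emb) ** 2))

initial_loss = distill_loss(W)
for _ in range(200):
    # Gradient of the mean squared error with respect to W.
    grad = 2 * X.T @ (X @ W - teacher_emb) / (num_nodes * emb_dim)
    W -= 0.05 * grad
final_loss = distill_loss(W)
```

Once trained, such a student can embed a node from its features alone, which is what makes it usable for nodes that have no edges.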

Teacher model of Cold Brew: Structural Embedding GNN

In this phase, the teacher GNN learns an additional set of node embeddings that are integrated into the model. This is known as Structural Embedding (SE). SE is learned through backpropagation and encodes additional information beyond the original node features.
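A minimal forward-pass sketch of this idea is shown below. It assumes one simple way to integrate SE, namely concatenating a per-node learnable vector to the node features before a single propagation step; the exact integration used in the paper may differ, and the training loop that would update `SE` by backpropagation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, feat_dim, se_dim, out_dim = 4, 3, 2, 5

X = rng.normal(size=(num_nodes, feat_dim))  # original node features
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # toy adjacency matrix

# Row-normalised adjacency with self-loops.
A_hat = A + np.eye(num_nodes)
A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)

# One learnable SE row per node; in training these would be updated
# by backpropagation together with W.
SE = 0.01 * rng.normal(size=(num_nodes, se_dim))
W = rng.normal(size=(feat_dim + se_dim, out_dim))

def teacher_layer(X, SE, A_hat, W):
    """Concatenate features with SE, propagate over the graph, apply ReLU."""
    H = np.concatenate([X, SE], axis=1)
    return np.maximum(A_hat @ H @ W, 0.0)

emb = teacher_layer(X, SE, A_hat, W)
```

Because each node owns its SE row, the teacher can store node-specific information that the raw features do not carry.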

Student MLP model of Cold Brew

The student model is composed of two modules. The first MLP module mimics the node embeddings produced by the GNN teacher. From this estimate, a set of virtual neighbors is then identified in the graph for the given node. A second MLP module takes both the target node and its virtual neighborhood as input and transforms them into the embedding of interest.
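The virtual-neighbor step can be sketched as a nearest-neighbor lookup against stored teacher embeddings. The function names, the dot-product similarity, the softmax weighting, and the concatenation that forms the second module's input are all assumptions made for illustration; the second MLP itself is not shown.

```python
import numpy as np

def virtual_neighbors(est_emb, teacher_bank, k):
    """Indices of the k teacher embeddings most similar (dot product) to est_emb."""
    scores = teacher_bank @ est_emb
    return np.argsort(-scores)[:k]

def second_module_input(est_emb, teacher_bank, k):
    """Pool the virtual neighbors and concatenate them with the node's estimate."""
    idx = virtual_neighbors(est_emb, teacher_bank, k)
    scores = teacher_bank[idx] @ est_emb
    weights = np.exp(scores - scores.max())  # softmax over the k scores
    weights /= weights.sum()
    pooled = weights @ teacher_bank[idx]
    return np.concatenate([est_emb, pooled]), idx

# Hypothetical bank of teacher embeddings and a first-module estimate.
teacher_bank = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
est_emb = np.array([1.0, 0.0])
combined, idx = second_module_input(est_emb, teacher_bank, k=2)
```

Here the two most similar stored embeddings act as the node's virtual neighborhood, giving an isolated node something to aggregate over.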

Discussion and Conclusion

The experiments are run on five open-source datasets and four proprietary datasets. The proprietary datasets, E-comm 1/2/3/4, consist of anonymized logs of an e-commerce store. To study tail and cold-start behavior, the evaluation uses the nodes in the bottom 10% of the degree distribution, and all the edges emanating from these nodes are artificially removed.
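A split of this kind can be sketched as follows, assuming the isolated evaluation set is built by selecting the lowest-degree 10% of nodes and dropping every edge that touches them. The function name `tail_and_isolation`, the tie-breaking by node id, and the toy graph are hypothetical.

```python
def tail_and_isolation(num_nodes, edges, fraction=0.1):
    """Pick the lowest-degree nodes and remove every edge that touches them."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    k = max(1, int(num_nodes * fraction))
    # Sort by degree, breaking ties by node id, and keep the k lowest.
    tail = set(sorted(range(num_nodes), key=lambda n: (deg[n], n))[:k])
    kept = [(u, v) for u, v in edges if u not in tail and v not in tail]
    return tail, kept

# Toy graph: node 0 is a hub, nodes 1-8 also form a chain, node 9 has one edge.
edges = [(0, i) for i in range(1, 10)] + [(i, i + 1) for i in range(1, 8)]
tail, isolated_edges = tail_and_isolation(10, edges)
```

After the split, the tail nodes have no remaining edges, so a model evaluated on them must rely on node features alone.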

This paper investigates the challenge of generalizing GNNs to nodes whose neighborhood information is sparse, unreliable, or completely absent. It addresses this challenge with a teacher-student knowledge distillation method that generalizes effectively to isolated nodes. The approach also includes a virtual neighbor discovery phase for the student, which helps rediscover the latent neighborhoods on which message passing can be performed. Finally, the FCR metric is used to select the best model architecture for the GNN teacher and the MLP student, and to quantify the difficulty of truly inductive representation learning.



Priyanka Israni is currently pursuing a PhD at Gujarat Technological University, Ahmedabad, India. Her areas of interest are medical image processing, machine learning, deep learning, data analysis, and computer vision. She has 8 years of experience teaching engineering graduates and postgraduates.