Google AI Introduces GraphWorld: A Methodology For Analyzing The Performance Of GNN Architectures On Millions Of Synthetic Benchmark Datasets

This Article Is Based On The Research Paper 'GraphWorld: Fake Graphs Bring Real Insights for GNNs'. All Credit For This Research Goes To The Researchers 👏👏👏

A graph is a structure consisting of a set of objects in which some pairs of objects are “connected” in some way. Graphs are useful for representing natural systems with connected relational components, such as social networks, traffic infrastructure, molecules, and the internet.

Graph neural networks (GNNs) are machine learning (ML) models that use the inherent connections of graphs to add context to predictions about individual objects or about the graph as a whole. GNNs have been used to discover new drugs, help mathematicians prove theorems, detect misinformation, and improve the accuracy of Google Maps arrival time predictions.

Interest in GNNs has grown sharply in recent years, and thousands of GNN variants have been proposed. Methods and datasets for testing GNNs, however, have received far less attention. Many GNN papers reuse the same 5–10 benchmark datasets, most of which are easily labeled academic citation networks and biological datasets, so performance is evaluated on only a few graphs.

Some studies address these issues by evaluating GNNs on a range of large-scale graph datasets across several tasks, which allows a more uniform GNN experimental design. However, those datasets come from many of the same sources as existing ones, such as citation and molecular networks, and so do not address the dataset variety issue described above.

To match the volume and pace of GNN research, a group of researchers from the Google graph mining team has proposed a methodology for measuring the performance of GNN architectures on millions of synthetic benchmark datasets. Their paper, “GraphWorld: Fake Graphs Bring Real Insights for GNNs,” shows that the methodology lets researchers investigate GNN performance in regions of graph space that popular academic datasets do not cover. Rather than benchmarking on a small set of existing datasets, GraphWorld builds a world of graphs from probabilistic models, evaluates GNN models at every location in it, and derives generalizable insights from the results. GraphWorld is also cost-effective: it can run hundreds of thousands of GNN experiments on synthetic data for the same cost as one experiment on a large OGB dataset.

To motivate GraphWorld, the researchers compare graphs from the Open Graph Benchmark (OGB), an open-source package for benchmarking GNNs, with a considerably larger collection (5,000+) of graphs from the Network Repository. Most Network Repository graphs are unlabelled and so cannot be used in standard GNN experiments, but they represent a large range of graphs that exist in the real world.

The researchers computed the clustering coefficient and the Gini coefficient of the degree distribution for the OGB and Network Repository graphs. The results show that the OGB datasets occupy a small, sparsely populated region of this metric space.
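As a rough illustration of the two metrics named above, the following sketch computes the average local clustering coefficient and the Gini coefficient of the degree distribution for a small undirected graph. The function names and the adjacency-dictionary representation are assumptions made for this example; the paper's own tooling is not shown in this article.

```python
from itertools import combinations

def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph.

    `adj` maps each node to the set of its neighbors.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        # Count how many pairs of neighbors are themselves connected.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def degree_gini(adj):
    """Gini coefficient of the degree distribution (0 means all degrees equal)."""
    degrees = sorted(len(nbrs) for nbrs in adj.values())
    n, total = len(degrees), sum(degrees)
    if total == 0:
        return 0.0
    weighted = sum(rank * d for rank, d in enumerate(degrees, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(toy), degree_gini(toy))
```

Plotting each dataset as a point with these two values as coordinates gives the kind of metric space in which the OGB and Network Repository graphs are compared.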

To explore GNN performance on a given task, a researcher first selects a parameterized generator that can produce graph datasets for stress-testing GNN models. The generator’s input parameters control the properties of the output datasets. GraphWorld uses parameterized generators to produce populations of graph datasets that are varied enough to put state-of-the-art GNN models to the test.

The researchers use the well-known stochastic block model (SBM) to create datasets. The SBM first assigns a predetermined number of nodes to groups, or “clusters,” and these cluster assignments serve as node labels for classification. It then creates connections between nodes according to several parameters, each of which affects a distinct aspect of the resulting graph.

People with similar interests are more likely to connect in social networks, a phenomenon known as homophily. The SBM exposes homophily as one of its parameters, which governs how likely two nodes from the same cluster are to be linked. GraphWorld uses the SBM to produce graphs with high homophily, low homophily, and millions of graphs with any amount of homophily in between, so users can examine GNN performance across the full range of homophily without relying on real-world datasets collected by other researchers.
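The role of homophily in an SBM can be sketched in a few lines of plain Python. This is a minimal, simplified version written for illustration, not GraphWorld's generator: the parameter names `p_in` and `p_out` (the probabilities of an edge within a cluster and between clusters) and both function names are assumptions, and the generator described in the paper has additional parameters.

```python
import random

def sample_sbm(n_nodes, n_clusters, p_in, p_out, seed=0):
    """Sample a simple undirected stochastic block model graph.

    Returns (labels, edges): labels[i] is the cluster of node i, and
    edges is a list of (u, v) pairs with u < v.
    """
    rng = random.Random(seed)
    # Step 1: assign every node to a cluster; the cluster is its label.
    labels = [rng.randrange(n_clusters) for _ in range(n_nodes)]
    # Step 2: connect each pair with a probability that depends on
    # whether the two nodes share a cluster.
    edges = []
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return labels, edges

def edge_homophily(labels, edges):
    """Fraction of edges that join two nodes with the same label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# Raising p_in relative to p_out yields a more homophilous graph.
labels, edges = sample_sbm(200, 4, p_in=0.10, p_out=0.01, seed=1)
high = edge_homophily(labels, edges)
labels, edges = sample_sbm(200, 4, p_in=0.02, p_out=0.05, seed=1)
low = edge_homophily(labels, edges)
print(f"high-homophily graph: {high:.2f}, low-homophily graph: {low:.2f}")
```

Sweeping the ratio of `p_in` to `p_out` is one simple way to obtain graphs anywhere between the two extremes.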

Given a task and a parameterized generator for that task, GraphWorld uses parallel computing to generate a universe of GNN benchmark datasets by sampling the generator parameter values. It runs an arbitrary list of GNN models on each dataset in parallel and then produces a large tabular dataset that joins graph properties with GNN performance results. Each pipeline required less time and fewer computing resources than state-of-the-art experiments on OGB graphs, which suggests that GraphWorld is affordable for researchers on a tight budget.
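The shape of such a pipeline (sample parameters, generate a dataset, score every model, emit one table row) can be sketched as follows. Everything here is a stand-in chosen for illustration: the two "models" are trivial baselines rather than GNNs, a thread pool substitutes for a real distributed pipeline, and all function names, column names, and parameter ranges are assumptions.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def sample_sbm(n_nodes, n_clusters, p_in, p_out, rng):
    """Simplified SBM: returns node labels and an adjacency list."""
    labels = [rng.randrange(n_clusters) for _ in range(n_nodes)]
    adj = [[] for _ in range(n_nodes)]
    for u in range(n_nodes):
        for v in range(u + 1, n_nodes):
            if rng.random() < (p_in if labels[u] == labels[v] else p_out):
                adj[u].append(v)
                adj[v].append(u)
    return labels, adj

def neighbor_vote_accuracy(labels, adj):
    """Stand-in model: predict each node's label by majority vote of its neighbors."""
    correct = 0
    for node, nbrs in enumerate(adj):
        if nbrs:  # isolated nodes count as wrong
            guess = Counter(labels[v] for v in nbrs).most_common(1)[0][0]
            correct += int(guess == labels[node])
    return correct / len(labels)

def majority_class_accuracy(labels, adj):
    """Stand-in model that ignores the graph and predicts the most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

# Hypothetical model registry; a real run would list GNN architectures here.
MODELS = {"neighbor_vote": neighbor_vote_accuracy,
          "majority_class": majority_class_accuracy}

def run_experiment(seed):
    """Sample generator parameters, build one dataset, and score every model on it."""
    rng = random.Random(seed)
    params = {"n_clusters": rng.randint(2, 5),
              "p_in": rng.uniform(0.01, 0.15),
              "p_out": rng.uniform(0.01, 0.15)}
    labels, adj = sample_sbm(100, rng=rng, **params)
    row = {"seed": seed, **params,
           "avg_degree": sum(len(nbrs) for nbrs in adj) / len(adj)}
    for name, model in MODELS.items():
        row[name] = model(labels, adj)
    return row

# Each experiment is independent, so the runs can be farmed out to workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    table = list(pool.map(run_experiment, range(20)))

for row in table[:3]:
    print(row)
```

Because each row records both the generator parameters and the per-model scores, the resulting table can be filtered or grouped by any graph property afterwards.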

To illustrate the impact of GraphWorld, the researchers first map standard academic graph datasets onto an x-y plane that measures cluster homophily (x-axis) and average node degree (y-axis) within each graph. They then map each simulated graph dataset from GraphWorld onto the same plane and add a third z-axis that measures GNN model performance on each dataset.
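One simple way to turn many (homophily, average degree, performance) triples into a surface is to average the z-values over a regular grid on the x-y plane. The sketch below assumes that approach; the function name, the grid steps, and the input triples are all hypothetical, and the article does not say how the original figures were produced.

```python
from collections import defaultdict

def bin_surface(points, x_step, y_step):
    """Average z over a regular (x, y) grid.

    `points` is an iterable of (x, y, z) triples, for example
    (homophily, average degree, model accuracy). Returns a dict mapping
    each occupied grid cell, identified by its integer (x, y) index,
    to the mean z of the points that fall in it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z in points:
        cell = (int(x // x_step), int(y // y_step))
        sums[cell] += z
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Placeholder triples chosen only to exercise the function; they are not
# measurements from GraphWorld or from any real benchmark.
points = [(0.05, 3.0, 0.4), (0.08, 3.5, 0.6), (0.85, 12.0, 0.9)]
surface = bin_surface(points, x_step=0.1, y_step=5.0)
for cell, mean_z in sorted(surface.items()):
    print(cell, mean_z)
```

Cells that contain no dataset are simply absent from the result, which makes gaps in coverage easy to spot.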

Two conclusions follow:

1. GraphWorld creates graph dataset areas beyond the typical datasets’ coverage.

2. When graphs differ from academic benchmark graphs, the rankings of GNN models change.

Classic datasets like Cora and CiteSeer have high homophily, which means that nodes in the graph are well separated according to their classes. The researchers found that the rankings of GNN models change rapidly as the graphs become less homophilous. This shows that GraphWorld can reveal important headroom in GNN architecture development that the limited datasets of academic benchmarks would otherwise hide.

By letting researchers scalably test new models on a high-dimensional surface of graph datasets, GraphWorld opens new ground in GNN experimentation. This enables fine-grained analysis of GNN architectures against graph properties on large subspaces of graphs that are distal from the standard academic benchmark graphs. Thanks to GraphWorld’s low cost, individual researchers without access to institutional resources can quickly learn the empirical performance of new models.

Researchers may also use GraphWorld datasets for GNN pre-training and explore new random/generative graph models for more nuanced GNN experimentation.

Paper: https://arxiv.org/pdf/2203.00112.pdf

Source: https://ai.googleblog.com/2022/05/graphworld-advances-in-graph.html
