New AI Research Shows How Federated Learning Enables Big Data for Rare Cancer Boundary Detection

The number of primary observations produced by healthcare systems has dramatically increased due to recent technological developments and a shift in patient culture from reactive to proactive. Clinical professionals may become burned out since such observations need careful evaluation. There have been several attempts to develop, assess, and ultimately translate machine learning (ML) technologies into clinical settings to address this issue and lessen the load on clinical professionals by identifying pertinent links among these observations. In particular, deep learning (DL) has made strides in ML and has shown promise in tackling these challenging healthcare issues.

According to the literature, training robust and accurate models requires large quantities of data, and the variety of that data determines how well a model generalizes to “out-of-sample” cases, i.e., data from sources that did not take part in model training. To generalize well, models must therefore be trained on data from different sites representing diverse demographic samples. The current paradigm for such multi-site cooperation is “centralized learning” (CL), in which data from several locations are pooled in a single place after inter-site agreements.

Data centralization is difficult to scale (and may not even be feasible), particularly at a worldwide level, because of privacy, data ownership, and intellectual property concerns, technical difficulties (such as network and storage restrictions), and the need to comply with varying governmental regulations. In contrast to the centralized paradigm (CL), “federated learning” (FL) describes a paradigm in which models are trained by exchanging only model parameter updates, while the data stay decentralized (i.e., each site stores its data locally).
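The idea of exchanging parameter updates instead of data can be illustrated with a minimal sketch. It assumes a simple weighted-averaging rule (often called federated averaging); the article does not state which aggregation rule the study used, and the function names, the single `layer` parameter, and all numbers below are hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def local_update(global_params: Params, site_gradient: Params, lr: float = 0.1) -> Params:
    """Hypothetical local step: each site adjusts the shared parameters
    using a gradient computed on its own private data."""
    return {
        name: [w - lr * g for w, g in zip(weights, site_gradient[name])]
        for name, weights in global_params.items()
    }


def aggregate(site_results: List[Tuple[Params, int]]) -> Params:
    """Assumed aggregation rule: average the site parameters,
    weighting each site by its number of local cases."""
    total_cases = sum(n for _, n in site_results)
    first_params = site_results[0][0]
    return {
        name: [
            sum(params[name][i] * n for params, n in site_results) / total_cases
            for i in range(len(first_params[name]))
        ]
        for name in first_params
    }


# Shared starting point sent to every site.
global_params: Params = {"layer": [1.0, 2.0]}

# (gradient from private data, number of local cases); values are made up.
sites = [({"layer": [1.0, 0.0]}, 30), ({"layer": [0.0, 2.0]}, 10)]

# Only updated parameters and case counts leave each site, never the scans.
results = [(local_update(global_params, grad), n) for grad, n in sites]
new_global = aggregate(results)
print(new_global)
```

In this toy round, the site with 30 cases pulls the consensus model three times as strongly as the site with 10 cases, which is one common design choice for combining sites of unequal size.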

Thus, FL can provide an alternative to CL, possibly leading to a paradigm change that reduces the requirement for data sharing, increases access to geographically dispersed collaborators, and subsequently expands the volume and variety of data used to train ML models. Health inequities and underserved communities are among the issues that FL can help address, because it allows ML models to learn from a wealth of data that would otherwise be unavailable. In light of this, the researchers concentrate on the “rare” disease of glioblastoma, emphasizing how multi-parametric magnetic resonance imaging (mpMRI) scans may be used to determine the extent of the disease.

Glioblastoma is the most prevalent malignant primary brain tumor, yet its incidence rate (about 3 per 100,000 people) is well below the threshold used to define a rare disease (10 per 100,000 people), so it is still categorized as a “rare” disease. Because no single site can amass datasets large and varied enough to train reliable and generalizable ML models, collaboration between geographically disparate sites is required. Despite significant attempts to improve prognosis with rigorous multimodal therapy, the median overall survival of glioblastoma patients following standard-of-care treatment is only 14.6 months, and median survival without treatment is only four months. Even with advancements in glioblastoma subtyping and the expansion of standard-of-care treatment choices over the past 20 years, overall survival has not significantly increased.

This reflects the need to analyze larger and more diverse data in order to better understand the disease and the main challenge in treating these tumors: their inherent heterogeneity. In terms of their radiologic appearance, glioblastomas have three main sub-compartments:

  1. The “enhancing tumor” (ET), which represents the breakdown of the blood-brain barrier within the tumor.
  2. The “tumor core” (TC), which combines the ET and the necrotic (NCR) part and represents the surgically relevant part of the tumor.
  3. The “whole tumor” (WT), which is the union of the TC and the peritumoral edematous/infiltrated tissue (ED).
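The nesting of these sub-compartments can be sketched as a composition of label masks. This is only an illustration: the integer label values follow a common BraTS-style convention that the article does not state, the whole tumor is assumed to be the tumor core plus the surrounding edematous tissue (ED), and the tiny 2D label map is made up.

```python
from typing import List, Set

# Assumed label values (BraTS-style convention, not given in the article).
NCR, ED, ET = 1, 2, 4

Mask = List[List[int]]


def region_mask(labels: Mask, members: Set[int]) -> Mask:
    """Binary mask that is 1 wherever the label belongs to the region."""
    return [[1 if value in members else 0 for value in row] for row in labels]


def area(mask: Mask) -> int:
    """Number of pixels set in a binary mask."""
    return sum(sum(row) for row in mask)


# Toy 2D label map standing in for one slice of a segmentation.
label_map: Mask = [
    [0, 2, 2, 0],
    [2, 1, 4, 2],
    [0, 4, 4, 0],
]

et_mask = region_mask(label_map, {ET})           # enhancing tumor
tc_mask = region_mask(label_map, {ET, NCR})      # tumor core = ET + necrotic part
wt_mask = region_mask(label_map, {ET, NCR, ED})  # whole tumor = TC + edema (assumed)

print(area(et_mask), area(tc_mask), area(wt_mask))
```

Because each region contains the previous one, the areas can only grow from ET to TC to WT, which gives a quick sanity check on any segmentation output.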

Identifying the boundaries of these sub-compartments is crucial for better quantifying and evaluating this heterogeneous rare disease and, eventually, for influencing clinical decision-making. Framing the task as a multi-parametric multi-class learning problem is therefore vital. Earlier investigations confirmed the advantages of the FL process, which was based on an aggregation server and performed nearly as well as CL for this use case.

Rather than merely transcribing a categorical entry from medical records, this study dealt with a multi-parametric multi-class problem whose reference standards require expert clinicians to follow a careful manual annotation protocol. Additionally, because scanner technology and acquisition techniques differ across sites, consistent preprocessing pipelines were set up at each participating location to handle the varying characteristics of the mpMRI data. These elements, together with the study’s extensive global scope and task complexity, set it apart.

The main scientific contributions of this manuscript are (i) demonstrating the effectiveness of FL at such scale and task complexity as a paradigm-shifting approach; (ii) making a potential impact on the treatment of the rare disease of glioblastoma by publicly releasing clinically deployable trained consensus models; and, most importantly, (iii) paving the way for more successful FL studies of increased scale and task complexity. Data and code are available on GitHub.


Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
