NVIDIA DGX A100 and NVIDIA Clara Parabricks Accelerate Whole-Genome Sequencing Data Processing at the National Biobank of Thailand

Source: https://www.nvidia.com/en-sg/news/national-biobank-of-thailand-accelerates-genomic-discoveries-with-nvidia-dgx-a100-and-nvidia-clara-parabricks/

Graphics processing units (GPUs) are revolutionizing healthcare, from accelerating traditional CPU workflows to helping create artificial intelligence (AI) models and platforms. Over the past several decades, parallel computing power has exploded, and GPUs have become the standard hardware for training machine-learning systems. At the same time, massive quantities of data (the human genome is one example) and more advanced computing power (the advent of GPU hardware) are driving the need for fast and accurate tools in healthcare and life sciences. One example of AI and GPUs at work in healthcare is the genomics project at the National Biobank of Thailand, which runs on NVIDIA hardware and software.

Vanessa Braunstein, Healthcare AI Product Marketing Lead at NVIDIA, shares additional details on the genomics project at the National Biobank of Thailand with a correspondent from Marktechpost.

1. Marktechpost: Tell us about the goals of the genomics project at the National Biobank of Thailand.

Vanessa: This project is a government-led research initiative with the National Biobank of Thailand, and it is really about sequencing the population of Thailand. The Thai government believes genomic sequencing can help its population stay healthy. It wants to find out which diseases are affecting Thai people so those diseases can be identified earlier and treated more effectively. So it’s really about preventative medicine and treating patients more precisely based on their specific genetic variants.

2. Marktechpost: How do AI and GPUs play a role in a genome sequencing solution for Thailand?

Vanessa: Genome sequencing is a data-intensive process, and processing that data was a significant challenge for the National Biobank of Thailand (NBT). The project aimed to generate whole-genome sequencing (WGS) data from 50,000 volunteers to help build a better public dataset, identify mutations, and deliver better care. A critical factor was the sheer amount of time WGS analysis takes: processing one person’s whole genome can take 3 to 7 days on CPUs, and the project has to multiply that by 50,000 people.
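A back-of-the-envelope calculation shows why this was a bottleneck. The sketch below multiplies the quoted 3 to 7 days per genome by 50,000 genomes, assuming (for illustration only) that genomes are processed one after another with no parallelism. Real clusters run many samples concurrently, so this is an upper bound on total compute time rather than a wall-clock forecast. The function name is hypothetical.

```python
# Per-genome CPU processing time quoted in the interview, in days.
CPU_DAYS_LOW, CPU_DAYS_HIGH = 3, 7
GENOMES = 50_000


def serial_cpu_years(days_per_genome: float, genomes: int = GENOMES) -> float:
    """Total compute time in years if genomes were processed one at a time."""
    return days_per_genome * genomes / 365


low = serial_cpu_years(CPU_DAYS_LOW)    # 150,000 days
high = serial_cpu_years(CPU_DAYS_HIGH)  # 350,000 days
print(f"{low:.0f} to {high:.0f} years of serial processing")  # 411 to 959 years
```

Even spread across hundreds of CPU servers, that workload would take years, which is the gap GPU acceleration is meant to close.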

The parallel processing power of GPUs significantly accelerates the entire process. The NVIDIA DGX A100 system is a supercomputer that provides this processing power, and NVIDIA Clara Parabricks is a set of computational pipelines specialized for analyzing the human genome.

Together, NVIDIA DGX A100 and NVIDIA Clara Parabricks can analyze large populations faster than traditional CPU-based methods. Parabricks’ genomic analysis tools cover both DNA and RNA and are designed to deliver high speed and high accuracy at low overall cost. For example, Parabricks can analyze a 30x-coverage whole genome in 45 minutes on GPUs, compared with 30 hours on CPUs. Coverage is the average number of reads from a next-generation sequencing (NGS) instrument that overlap each position in the genome, so 30x WGS means each base is read about 30 times on average. Additionally, Parabricks GPU workflows show a 99.5% variant-call concordance rate with clinically validated CPU workflows, which backs up its accuracy claims.
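The coverage definition and the speed comparison reduce to simple arithmetic. The sketch below uses the standard Lander-Waterman relation (mean coverage = number of reads × read length ÷ genome length) with an assumed genome length of about 3.1 billion bases and hypothetical 150-base reads. The function names are illustrative and not part of Parabricks.

```python
# Approximate human genome length in bases (an assumption for illustration).
GENOME_LENGTH = 3.1e9


def mean_coverage(num_reads: float, read_length: int,
                  genome_length: float = GENOME_LENGTH) -> float:
    """Lander-Waterman mean coverage: total sequenced bases / genome length."""
    return num_reads * read_length / genome_length


def reads_for_coverage(target: float, read_length: int,
                       genome_length: float = GENOME_LENGTH) -> float:
    """Number of reads needed to reach a target mean coverage."""
    return target * genome_length / read_length


def speedup(cpu_minutes: float, gpu_minutes: float) -> float:
    """How many times faster the GPU run is than the CPU run."""
    return cpu_minutes / gpu_minutes


reads = reads_for_coverage(30, read_length=150)
print(f"{reads:.2e} reads of 150 bp for 30x")  # 6.20e+08 reads of 150 bp for 30x
print(f"{mean_coverage(reads, 150):.1f}x")     # 30.0x
print(f"{speedup(30 * 60, 45):.0f}x faster")   # 40x faster
```

The 40x figure applies to the single-genome timings quoted in the interview; the actual ratio depends on the specific CPU and GPU configurations being compared, which the article does not specify.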

Regarding AI, Parabricks added DeepVariant to its existing DNA and RNA tools in 2020. DeepVariant is an open-source tool that uses AI to identify variants in sequencing data, building a more accurate picture of a person’s genome. Adding DeepVariant to Clara Parabricks Pipelines brings highly accurate variant calling for both short- and long-read sequencing data to the community.
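DeepVariant frames variant calling as classification: for each candidate site, a neural network assigns a probability to each of the three possible diploid genotypes (homozygous reference, heterozygous, homozygous alternate). The simplified sketch below shows one way such probabilities can be turned into a genotype call with a Phred-scaled quality score. The function name and probabilities are hypothetical, and DeepVariant’s actual post-processing involves more steps than this.

```python
import math

# The three diploid genotype classes, in VCF notation.
GENOTYPES = ("0/0", "0/1", "1/1")  # hom-ref, het, hom-alt


def call_genotype(probs: list[float]) -> tuple[str, int]:
    """Pick the most likely genotype and a Phred-scaled confidence.

    Simplified sketch, not DeepVariant's exact post-processing.
    """
    if len(probs) != 3 or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("expected three probabilities summing to 1")
    best = max(range(3), key=lambda i: probs[i])
    # Probability that the chosen genotype is wrong, floored to avoid log10(0).
    p_wrong = max(1.0 - probs[best], 1e-10)
    quality = int(round(-10 * math.log10(p_wrong)))
    return GENOTYPES[best], quality


# Hypothetical network output for one candidate site.
print(call_genotype([0.01, 0.98, 0.01]))  # ('0/1', 17)
```

A Phred score of 17 means roughly a 2% chance that the call is wrong; higher scores mean more confident calls.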

3. Marktechpost: What is the NVIDIA DGX A100 system?

Vanessa: The NVIDIA DGX A100 system is the world’s first 5-petaFLOPS AI system and is billed as “the universal system for every AI workload.” It is a high-performance computing workhorse that can run many projects, whether for a genome research center or a massive data center.
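The “5 petaFLOPS” figure is a peak number for the whole system. Assuming it is the sum of the eight A100 GPUs’ peak FP16 Tensor Core throughput with structured sparsity (about 624 TFLOPS per GPU on NVIDIA’s A100 datasheet), the arithmetic is:

```python
# Assumed per-GPU peak: A100 FP16 Tensor Core throughput with sparsity, in TFLOPS.
A100_FP16_SPARSE_TFLOPS = 624
GPUS_PER_DGX_A100 = 8

# Convert TFLOPS to PFLOPS (1 PFLOPS = 1,000 TFLOPS).
total_pflops = A100_FP16_SPARSE_TFLOPS * GPUS_PER_DGX_A100 / 1000
print(f"{total_pflops:.2f} PFLOPS")  # 4.99 PFLOPS
```

This is theoretical peak AI throughput, rounded up to 5 in marketing material; it is not a measure of how fast genomics workloads actually run.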

4. Marktechpost: What is NVIDIA Clara Parabricks? 

Vanessa: NVIDIA Clara Parabricks Pipelines is an accelerated compute framework that runs on NVIDIA GPUs and takes next-generation sequencing data from raw sequencing reads through variant calling. The toolkit includes GPU-accelerated versions of many of the industry’s gold-standard, open-source tools, such as GATK HaplotypeCaller and Google’s DeepVariant. It can be deployed to provide end-to-end workflow support for population, cancer, and RNA applications. By accelerating multiple variant callers, it lets users extract more information from their sequencing data, faster than ever before. With run times of less than an hour for a whole human genome, a user can process more genomes with more variant callers and get more out of their data. Parabricks is also updated regularly with new callers and features. Individual researchers analyzing a few genomes can adopt Parabricks in their genomic analysis pipelines just as readily as a large sequencing center running hundreds of samples.
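Running several variant callers on the same sample raises the question of how to compare and combine their output. One simple, generic approach (assumed here for illustration; it is not a documented Parabricks feature) is to treat each call set as a set of (chromosome, position, reference allele, alternate allele) tuples, measure their overlap, and keep the variants that a minimum number of callers agree on. The 99.5% concordance figure quoted in the interview may be computed differently; the Jaccard-style ratio below is only one possible definition, and all function names and calls are hypothetical.

```python
from collections import Counter

# A variant keyed as (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def concordance(calls_a: set[Variant], calls_b: set[Variant]) -> float:
    """Shared calls divided by all distinct calls (Jaccard index)."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 1.0


def consensus(call_sets: list[set[Variant]], min_support: int) -> set[Variant]:
    """Keep variants reported by at least `min_support` of the callers."""
    votes = Counter(v for calls in call_sets for v in calls)
    return {v for v, n in votes.items() if n >= min_support}


# Hypothetical calls from two callers.
caller_1 = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")}
caller_2 = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr3", 42, "T", "C")}

print(concordance(caller_1, caller_2))             # 0.5
print(sorted(consensus([caller_1, caller_2], 2)))  # the two calls both callers made
```

Requiring agreement from more callers trades sensitivity (fewer true variants kept) for precision (fewer false positives); accuracy benchmarks usually compare calls against a curated truth set rather than against another caller.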

5. Marktechpost: How can NVIDIA DGX A100 help revolutionize the healthcare industry?

Vanessa Braunstein believes the DGX A100 system is “going to revolutionize genomic analysis and drug development in terms of bringing drugs to market faster with computational tools and ensuring therapeutics are more personalized for people’s DNA.” In the near term, she sees it improving speed and accuracy in preventative medicine. Below are some key areas where Vanessa sees NVIDIA DGX A100 and NVIDIA Clara Parabricks making an impact on the healthcare industry:

a.) population studies

The Shirokane supercomputer in Japan is already using the NVIDIA DGX A100 and NVIDIA Clara Parabricks solution.

Population study example: Consider the genes of a group of people who currently have cardiac disease. If we can identify ten genes that cause cardiac disease in that population, we can create medicines that target the proteins those genes produce in a disease pathway. This way, we can treat patients not only faster but also more specifically based on their DNA.
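Population studies like this typically start by testing whether a variant is more common in people with the disease than in people without it. As a minimal sketch of that idea (a standard allelic case-control test, not a description of NBT’s or NVIDIA’s actual methods), the function below computes a Pearson chi-square statistic on a 2x2 table of allele counts. The function name and all counts are hypothetical.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Returns (chi-square statistic, p-value).
    """
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column of the table needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 500 cases and 500 controls (1,000 alleles each).
chi2, p = allelic_chi_square(case_alt=120, case_ref=880, ctrl_alt=80, ctrl_ref=920)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Real genome-wide studies test millions of variants, usually with regression models that adjust for ancestry and other covariates, and require a much stricter genome-wide threshold (commonly p < 5 × 10⁻⁸). A p-value of a few thousandths like this one would not count as a discovery on its own, and an association alone does not prove that a gene causes the disease.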

b.) cancer centers

Cancer example: When someone is diagnosed with cancer, there is typically a race to treat the patient quickly, before the cancer metastasizes. Lung cancer is very treatable if it is caught early. But because it is often asymptomatic, it frequently isn’t caught early, and lung tumors can grow and metastasize fast. Some people smoke their whole lives and never get lung cancer, while others never smoke and still get it. There is a genetic component to many of these cancers. If genetic testing shows you may be prone to lung cancer, you can be screened more often, so tumors are detected and treated earlier.

c.) neonatal and pediatric intensive care units (NICU and PICU) in hospitals

NICU example: Consider a child who fails to thrive after birth. The hospital can quickly sequence the child’s DNA to find out whether a genetic variant is causing the symptoms or disease. With this knowledge, the medical team is better equipped to treat the child based on the genetic variant and symptoms.

d.) rare genetic diseases

Rare genetic disease example: When a child is born and is not thriving or is missing developmental milestones, a rare genetic disease may be the culprit. However, very few clinical teams order whole-genome sequencing to identify rare genetic diseases. These children and their families then go on a diagnostic odyssey, seeing multiple doctors and hospitals and taking multiple medications to treat symptoms without much improvement. Early whole-genome testing can help identify rare diseases in these children sooner, so the right treatments and clinical care can start earlier.

SOURCE:

  1. Reference: National Biobank of Thailand Accelerates Genomic Discoveries with NVIDIA DGX A100 and NVIDIA Clara Parabricks https://www.nvidia.com/en-sg/news/national-biobank-of-thailand-accelerates-genomic-discoveries-with-nvidia-dgx-a100-and-nvidia-clara-parabricks/
  2. NVIDIA DGX A100: universal system for all AI workloads, offering unprecedented compute density, performance, and flexibility in the world’s first 5 petaFLOPS AI system. It features NVIDIA A100 Tensor Core GPUs: https://www.nvidia.com/en-us/data-center/dgx-a100/
  3. NVIDIA Clara Parabricks: 30-day free license available to try Parabricks. https://www.nvidia.com/en-us/clara/genomics/
  4. GitHub: https://github.com/clara-parabricks
  5. Bio: Vanessa Braunstein leads product marketing for NVIDIA Healthcare.  Previously, she was in strategy, product development and marketing for genomics, medical imaging, pharmaceutical, and clinical diagnostic companies. She received her BA from UC Berkeley in molecular and cell biology, and then studied public health and business at UCSF and UCLA. She has worked with life science researchers and the clinical community for the last few years on building and implementing AI to optimize workflows in hospitals, medical research institutions, pharmaceutical companies, and cancer centers.