The greatest challenge in human genetics is arguably the complexity of the human genome and the vast diversity of genetic factors that contribute to health and disease. The human genome consists of over 3 billion base pairs, and it contains not only protein-coding genes but also non-coding regions that play crucial roles in gene regulation and function. Understanding how these elements work and how they interact is a monumental task.
Knowing that a genetic variant is associated with a disease is only the beginning. Understanding the functional consequences of such variants, how they interact with other genes, and their role in disease pathology is a complex and resource-intensive task. Analyzing the vast amounts of genetic data generated by high-throughput sequencing technologies requires advanced computational tools and infrastructure. Data storage, sharing, and analysis pose substantial logistical challenges.
Researchers at Google DeepMind have built a new AI model, AlphaMissense, and used it to produce a catalog of missense variants. The catalog classifies about 89% of all 71 million possible missense variants as either likely pathogenic or likely benign. A missense variant is a single-nucleotide substitution in a DNA sequence that changes one amino acid in the resulting protein. Nucleotides are the building blocks of DNA, and they are arranged in a specific order. This sequence carries an organism's fundamental genetic information and determines the amino acid sequences of its proteins. On average, a person carries more than 9,000 missense variants.
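To make the definition concrete, here is a minimal Python sketch of how a single-nucleotide substitution within one codon can be labeled as missense, synonymous, or nonsense. The function name `classify_substitution` and the tiny codon table are illustrative assumptions, not part of AlphaMissense; the table covers only the six codons needed for the demonstration.

```python
# Small excerpt of the standard genetic code (DNA codons), just enough
# for the examples below. A real tool would use the full 64-codon table.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",
    "GTA": "Val", "GTG": "Val",
    "TAA": "Stop", "TAG": "Stop",
}


def classify_substitution(codon, position, new_base):
    """Classify a single-nucleotide substitution inside one codon.

    Hypothetical helper for illustration: returns the mutated codon,
    the amino acid before and after, and the kind of variant.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    before = CODON_TABLE[codon]
    after = CODON_TABLE[mutated]
    if before == after:
        kind = "synonymous"   # same amino acid, protein unchanged
    elif after == "Stop":
        kind = "nonsense"     # premature stop codon
    else:
        kind = "missense"     # a different amino acid is incorporated
    return mutated, before, after, kind


print(classify_substitution("GAG", 1, "T"))  # Glu -> Val: missense
print(classify_substitution("GAG", 2, "A"))  # Glu -> Glu: synonymous
print(classify_substitution("GAG", 0, "T"))  # Glu -> Stop: nonsense
```

Only the first of the three substitutions is a missense variant, which is the class of changes that AlphaMissense scores.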
Classifying missense variants helps us understand which protein changes give rise to disease. AlphaMissense builds on DeepMind's earlier model, AlphaFold, which predicted structures for nearly all known proteins from their amino acid sequences. Rather than predicting structure, however, AlphaMissense draws on databases of related protein sequences and on the structural context of each variant to produce a score between 0 and 1. A score close to 1 indicates that the variant is highly likely to be pathogenic. Because the score is continuous, users can choose a threshold for classifying variants as pathogenic or benign.
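The idea of turning a continuous score into a label can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the function name `classify_score`, the cutoff values 0.3 and 0.7, the "uncertain" middle band, and the example scores are hypothetical and are not the thresholds or data published by the researchers.

```python
def classify_score(score, benign_below=0.3, pathogenic_above=0.7):
    """Map a pathogenicity score in [0, 1] to a label.

    The two cutoffs are hypothetical defaults; in practice a user would
    pick them to match the accuracy their application requires.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > pathogenic_above:
        return "likely pathogenic"
    if score < benign_below:
        return "likely benign"
    return "uncertain"


# Made-up scores for three placeholder variants.
example_scores = {"variant_A": 0.05, "variant_B": 0.52, "variant_C": 0.97}
labels = {name: classify_score(s) for name, s in example_scores.items()}
print(labels)
```

Moving the cutoffs closer together labels more variants but with less confidence; moving them apart leaves more variants in the uncertain band.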
According to the researchers, AlphaMissense outperforms other computational methods at classifying variants. It was also the most accurate method for predicting laboratory results, which indicates that its predictions are consistent with different ways of measuring pathogenicity. Using this model, researchers can obtain a preview of results for thousands of proteins at a time, which can help them prioritize resources and accelerate their studies. The need is clear: of the more than 4 million missense variants seen in humans, only 2% have been annotated as pathogenic or benign by experts, roughly 0.1% of all 71 million possible missense variants.
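The figures quoted in this article can be checked with a short calculation. The sketch below uses only the rounded numbers given in the text (4 million observed variants, 2% annotated, 71 million possible variants, 89% classified by the model), so the results are approximate; the variable names are illustrative.

```python
observed_variants = 4_000_000    # missense variants seen in humans (rounded)
possible_variants = 71_000_000   # all possible missense variants
annotated_percent = 2            # share of observed variants annotated by experts
classified_percent = 89          # share of possible variants classified by the model

# Integer arithmetic keeps the counts exact for these rounded inputs.
expert_annotated = observed_variants * annotated_percent // 100
model_classified = possible_variants * classified_percent // 100
expert_share = expert_annotated / possible_variants

print(f"Expert-annotated variants: {expert_annotated:,}")
print(f"Share of all possible variants: {expert_share:.2%}")
print(f"Variants classified by the model: {model_classified:,}")
```

With these inputs, about 80,000 variants carry an expert annotation, which is about 0.11% of the 71 million possible variants and consistent with the "roughly 0.1%" figure, while the model's catalog covers about 63 million.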
It’s important to note that the field of human genetics is evolving rapidly, and advances in technology, data analysis, and our understanding of genetic mechanisms continue to address these challenges. While the challenges are significant, they also present exciting opportunities for improving human health and personalized medicine through genetic research. Decoding the genomes of various organisms also provides insights into evolution.
Check out the Paper and DeepMind Article. All Credit For This Research Goes To the Researchers on This Project.
Arshad is an intern at MarktechPost. He is currently pursuing his Int. MSc in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn lead to advances in technology. He is passionate about understanding nature at a fundamental level with the help of tools like mathematical models, ML models, and AI.