Researchers from UC San Diego Introduce EUGENe: An Easy-to-Use Deep Learning Genomics Software

Deep learning is now used across many fields, and it has had a particularly large impact on biomedical research. Because deep learning models can get better at a task with relatively little human guidance, they have changed the way scientists study medicine and disease.

Deep learning is especially impactful in genomics, a field of biology that investigates how DNA is organized into genes and how those genes are activated or deactivated within individual cells.

Researchers at the University of California, San Diego, have developed EUGENe, a new deep-learning platform that can be quickly and easily adapted to suit various genomics projects. Hannah Carter, Ph.D., associate professor in the Department of Medicine at UC San Diego School of Medicine, said that every cell has the same DNA, but how that DNA is expressed changes what cells look like and what they do.

EUGENe uses modules and sub-packages to facilitate essential functions within a genomics deep learning workflow. These functions include (1) extracting, transforming, and loading sequence data from various file formats; (2) instantiating, initializing, and training diverse model architectures; and (3) evaluating and interpreting model behavior.
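As a rough illustration of the first of these functions (extracting, transforming, and loading sequence data), the sketch below parses FASTA-formatted text and one-hot encodes each sequence, a common input representation for genomics deep learning models. It is a generic, standard-library-only example and not EUGENe's actual API; the function names `parse_fasta`, `one_hot_encode`, and `load_dataset`, the A/C/G/T channel order, and the toy input are all assumptions made for illustration.

```python
from typing import Dict, List

# Channel order for the one-hot encoding. A/C/G/T is a common convention,
# assumed here for illustration only.
ALPHABET = "ACGT"


def parse_fasta(text: str) -> Dict[str, str]:
    """Extract: read FASTA-formatted text into a {record name: sequence} dict."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name;
            # fall back to a positional name if the header is empty.
            parts = line[1:].split()
            name = parts[0] if parts else f"record_{len(records)}"
            records[name] = ""
        elif name is not None:
            # Sequence lines may wrap, so append them to the current record.
            records[name] += line.upper()
    return records


def one_hot_encode(seq: str) -> List[List[int]]:
    """Transform: map each base to a 4-element indicator row.

    Bases outside ACGT (such as N) become an all-zero row.
    """
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


def load_dataset(text: str) -> Dict[str, List[List[int]]]:
    """Load: combine both steps into a {name: matrix} mapping ready for a model."""
    return {name: one_hot_encode(seq) for name, seq in parse_fasta(text).items()}


# Hypothetical toy input: two short records, one containing an ambiguous base.
example = ">seq1 toy record\nACGT\nNA\n>seq2\nggc\n"
dataset = load_dataset(example)
```

Each entry in `dataset` is a matrix with one row per base and one column per nucleotide. A real workflow would typically convert these matrices to tensors before model training and evaluation, the second and third functions listed above.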

While deep learning holds the potential to offer valuable insights into the diverse biological processes governing genetic variation, its implementation poses challenges for researchers who lack extensive expertise in computer science. The researchers said that their objective was to develop a platform that lets genomics researchers streamline their deep learning data analysis, making it easier and more efficient to extract predictions from raw data.

Even though only about 2% of the total genome consists of genes encoding specific proteins, the remaining 98%, often called "junk DNA" because it was long thought to have no function, plays a pivotal role in determining the timing, location, and manner in which certain genes are activated. Understanding the roles of these non-coding genome sections has been a top priority for genomics researchers. Deep learning has proven to be a powerful tool for achieving this goal, though using it effectively can be difficult.

Adam Klie, a Ph.D. student in the Carter lab and the first author of the study, said that many existing platforms require many hours of coding and data wrangling. He noted that numerous projects require researchers to start their work from scratch, which demands expertise that may not be readily available to all labs interested in this domain.

To evaluate its efficacy, the researchers tested EUGENe by attempting to replicate the findings of three previous genomics studies that used a variety of sequencing data types. In the past, analyzing such diverse data sets would require integrating several different technological platforms.

EUGENe demonstrated remarkable flexibility, effectively replicating the outcomes of every investigation. This flexibility highlights the platform’s ability to manage a wide range of sequencing data and its potential as an adaptable instrument for genomics research.

EUGENe adapts to different DNA sequencing data types and supports a variety of deep learning models. The researchers aim to broaden its scope to encompass a wider array of data types, including single-cell sequencing data, and plan to make EUGENe accessible to research groups worldwide.

Carter expressed enthusiasm about the project's collaborative potential. She said that one of the exciting things about this project is that the more people use the platform, the better they can make it over time, which will be essential as deep learning continues to evolve rapidly.


Check out the Paper. All credit for this research goes to the researchers of this project.


Rachit Ranjan is a consulting intern at MarktechPost. He is currently pursuing his B.Tech at the Indian Institute of Technology (IIT) Patna. He is actively shaping his career in the field of Artificial Intelligence and Data Science and is passionate about and dedicated to exploring these fields.
