IBM and Earlham Institute Researchers Demonstrate the Power of AI and Machine Learning (ML)-Based Models for Deeper Insight into the Circadian Clock


Scientists from IBM and the Earlham Institute have published research in the Proceedings of the National Academy of Sciences of the United States of America (PNAS) showing how AI and machine learning-based approaches can deepen our understanding of the circadian clock and its regulation.

Anyone who has flown a long distance will tell you that jet lag is the most frustrating part of the trip. There are numerous tricks for coaxing the body onto a new schedule, but it’s hard to work against our natural inner rhythm, which governs our 24-hour sleep-wake cycle.

So, why do our bodies go haywire when we travel to a different time zone?

Let’s look at how artificial intelligence (AI) and machine learning (ML) could help scientists better understand the inner 24-hour cycles, or circadian rhythms, that make up an organism’s internal body clock. Understanding circadian regulation and function in living creatures could lead to new insights into, for example, how the human body works or how to influence crop yields.

Identifying circadian rhythmicity

Most living organisms have circadian rhythms, which are essential for life on Earth. The best-known example is the human sleep-wake cycle. The term “circadian” comes from the Latin phrase “circa diem,” meaning “about a day.” Our inner rhythm is driven by a circadian clock: a biological oscillator synchronised with solar time, that is, with the sun’s position in the sky.


Internally synchronised circadian clocks enable most living organisms, including animals, plants, fungi, and even cyanobacteria, to anticipate the daily environmental changes that accompany the day-night cycle and to adjust their biology and behaviour accordingly. Jet lag is therefore a chronobiological problem: after we cross time zones, our body clocks are misaligned with environmental cues such as light and temperature.

The expression of a gene involved in the circadian clock typically cycles between switched-on and switched-off states over 24 hours. This pattern is called circadian rhythmicity. Detecting circadian rhythmicity with current approaches is difficult. It requires long, high-resolution time-series datasets, known as transcriptomic datasets, which use sequencing technologies to measure gene expression throughout the day. These datasets are expensive to generate and take laboratory scientists a long time to produce. As a result, we only partially understand how genes are controlled and regulated in a circadian clock.
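To make “circadian rhythmicity” concrete, the sketch below fits a 24-hour cosine curve to one gene’s expression time series. This is a classical cosinor fit, used here only for illustration; it is not the method used in the study. The fit reports the mesor (mean level), amplitude, peak time, and R², which is the share of variance the cosine explains. The function names are assumptions. The fit is only as reliable as the sampling behind it, which is why long, densely sampled time series matter.

```python
import math


def _solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cosinor_fit(times_h, values, period_h=24.0):
    """Least-squares fit of y(t) = mesor + amplitude * cos(2*pi*(t - peak) / period).

    The model is linearised as y = M + b*cos(wt) + g*sin(wt) and solved via the
    normal equations, which assumes several timepoints spread across the cycle.
    Returns (mesor, amplitude, peak_time_h, r_squared).
    """
    w = 2 * math.pi / period_h
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, b, g = _solve(xtx, xty)
    amplitude = math.hypot(b, g)
    # b*cos(wt) + g*sin(wt) peaks where wt = atan2(g, b).
    peak_time_h = (math.atan2(g, b) / w) % period_h
    fitted = [mesor + b * c + g * s for _, c, s in rows]
    mean_y = sum(values) / len(values)
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    # R^2: share of the variance explained by the 24-hour cosine (0.0 for a flat profile).
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return mesor, amplitude, peak_time_h, r_squared
```

For samples taken every four hours for two days (`list(range(0, 48, 4))`), a clean 24-hour rhythm gives an R² close to 1, and a completely flat profile gives 0.0. Any cut-off on R² or amplitude for calling a gene “rhythmic” would be an arbitrary choice.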

Arabidopsis thaliana, a tiny flowering weed, was the first plant to have its genome sequenced and is a widely used model organism in plant biology and genetics research. It has helped biologists and geneticists better understand the molecular biology and genetics of many plant traits, including circadian control.

The researchers trained ML models to predict circadian gene regulation and expression patterns using newly generated datasets, published time-series datasets, and Arabidopsis genome data. Compared with existing state-of-the-art models, the new ML models classified circadian expression patterns more accurately while using progressively fewer transcriptomic timepoints.
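The article doesn’t say which model architecture was used, so here is a deliberately simple stand-in: a one-nearest-neighbour classifier over z-scored expression profiles. Its `keep` argument restricts which timepoints the model sees. Re-running `accuracy` with smaller `keep` sets is one way to ask how few timepoints a classifier needs. The function names, labels, and any training data you pass in are hypothetical. This is not the study’s model.

```python
import math


def zscore(values):
    """Centre and scale a profile so only its shape over time matters, not its level."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n  # a flat profile carries no shape information
    return [(v - mean) / sd for v in values]


def predict_label(train, profile, keep):
    """1-nearest-neighbour label for `profile`, looking only at timepoint indices in `keep`.

    `train` is a list of (profile, label) pairs sampled at the same timepoints.
    """
    query = zscore([profile[i] for i in keep])
    best_label, best_dist = None, math.inf
    for train_profile, label in train:
        ref = zscore([train_profile[i] for i in keep])
        dist = sum((a - b) ** 2 for a, b in zip(query, ref))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def accuracy(train, test, keep):
    """Fraction of (profile, label) pairs in `test` classified correctly using timepoints `keep`."""
    correct = sum(predict_label(train, profile, keep) == label for profile, label in test)
    return correct / len(test)
```

Z-scoring each profile means the classifier compares the shape of expression over time rather than its absolute level. With only two samples taken exactly 24 hours apart, every ideal 24-hour rhythm looks flat, so the retained timepoints still need to be spread across the cycle.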

Interpreting the models made it possible to estimate the optimal transcriptomic timepoints for sampling. Next, instead of transcriptomic timepoint data, the ML models used DNA sequence features derived from public genomic resources to distinguish circadian transcripts. This means circadian gene regulation can be predicted from the genome sequence.
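One common way to turn a DNA sequence into features a model can use is to count short overlapping subsequences (k-mers). The sketch below is an illustrative assumption, not the study’s documented feature set. It counts every k-mer over the alphabet A, C, G, T in a fixed order, so sequences of different lengths map to vectors of the same length, 4**k. Applying it to, say, a regulatory region for each transcript (a hypothetical choice) would give one feature vector per transcript.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, k=3):
    """Count overlapping k-mers of A/C/G/T in `sequence`.

    Returns (vocabulary, counts), both in the same fixed order, so every
    sequence maps to a vector of length 4**k. k-mers containing other
    characters (e.g. N for an unknown base) are ignored.
    """
    seq = sequence.upper()
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return vocabulary, [observed[kmer] for kmer in vocabulary]
```

For example, `kmer_features("ACGTN", k=1)` returns `(['A', 'C', 'G', 'T'], [1, 1, 1, 1])`, and the unknown base N is skipped. You could also divide the counts by sequence length to compare regions of different sizes.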

This choice rested on the premise that transcription factors (and other factors) binding to regulatory DNA sequences are an effective means of controlling gene expression, whether through circadian or other processes. Transcription factors are essential proteins that regulate gene expression by controlling when, where, and how much a gene is expressed. They govern the transcription of DNA into mRNA by binding to specific DNA sequences.
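Transcription factors recognise specific DNA sequences, so one simple sequence feature is whether a short binding motif occurs in a transcript’s regulatory region. The sketch below scans both DNA strands for a motif written with IUPAC ambiguity codes (for example, W means A or T). The function names are assumptions. Real motif analyses often score position weight matrices instead of exact consensus matches.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches(motif, window):
    """True if every base in `window` is allowed by the IUPAC code at that motif position."""
    return all(base in IUPAC[code] for code, base in zip(motif, window))


def find_motif(sequence, motif):
    """Find an IUPAC motif on both strands.

    Returns (start, strand) pairs: `start` is a 0-based position on the forward
    strand and strand is '+' or '-'. A palindromic site is reported once per strand.
    """
    seq, motif = sequence.upper(), motif.upper()
    k = len(motif)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        if matches(motif, window):
            hits.append((start, "+"))
        if matches(motif, reverse_complement(window)):
            hits.append((start, "-"))
    return hits
```

For example, `find_motif("GGAAATCTGG", "AAWTCT")` returns `[(2, '+')]`. The motif `AAWTCT` here is made up for illustration and is not a known binding site.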

Explainable AI

Interpreting the machine learning models sheds light on what’s happening inside the “black box”. A transcript-specific local model explanation was used to rank DNA sequence features, giving a detailed profile of the putative circadian regulatory mechanisms behind each transcript.
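The article doesn’t describe the study’s exact explanation method. As a hedged illustration of what a “transcript-specific local explanation” can look like, the sketch below assumes a hypothetical linear scoring model. It attributes the model’s score for one transcript to individual features as weight × (feature value − baseline value), then ranks features by the size of that contribution. For a linear model with a mean baseline and independent features, these contributions coincide with SHAP values. They always sum to the difference between the transcript’s score and the baseline score. Feature names and weights are invented.

```python
def linear_score(weights, bias, features):
    """Score of a hypothetical linear model: bias + sum of weight * feature value."""
    return bias + sum(weights[name] * features[name] for name in weights)


def local_explanation(weights, baseline, features):
    """Attribute one transcript's linear score to each feature and rank by magnitude.

    Each contribution is weight * (value - baseline value). The contributions add up
    to linear_score(features) - linear_score(baseline) for the same bias.
    """
    contributions = {
        name: weights[name] * (features[name] - baseline[name]) for name in weights
    }
    return sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
```

If higher scores mean “more likely circadian” (an assumption of this sketch), a positive contribution pushes the transcript towards the circadian class and a negative one pushes it away.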

Using these local explanations of the ranked DNA sequence features, the models could also discern the temporal phase of transcript expression. This uncovered hidden sub-classes within the circadian class, such as whether a transcript is more likely to peak during the day than at night.
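The article doesn’t detail how the study derived phase sub-classes from the explanations. One generic, hypothetical way to look for such hidden groups is to cluster the per-transcript explanation vectors, for example with k-means. Transcripts whose predictions rely on similar features end up together, and you can then check each group against when its members peak. The sketch below is plain k-means. It initialises centroids with the first k vectors, a choice made for simplicity rather than robustness.

```python
def _squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k=2, iterations=20):
    """Plain k-means on explanation vectors; returns (labels, centroids).

    Initialises centroids with the first k vectors (simple and deterministic,
    but sensitive to input order).
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: _squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Given two well-separated groups of toy explanation vectors, for instance ones dominated by one hypothetical feature versus another, `kmeans(vectors, k=2)` puts them in two clusters. In practice, k-means++ initialisation and several restarts are the usual safeguards against poor local optima.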


Finally, the algorithms can estimate circadian time from a single transcriptomic timepoint. They also identify the marker transcripts that contribute most to correct predictions, making it easier to spot changes in circadian clock function in existing datasets. These explainable AI applications may change how we reuse public data and build testable hypotheses to better understand the control of gene expression.
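To illustrate how a single sample can carry timing information, the sketch below uses simple template matching rather than the study’s ML models. It assumes each marker transcript’s expression follows a known 24-hour cosine (mesor, amplitude, peak time). It then searches a grid of candidate times for the one whose predicted marker levels best match the observed sample, in the least-squares sense. The marker names, parameters, and function names are hypothetical. For noise-free data, the answer is unique as long as at least two markers peak at different times that are not exactly 12 hours apart.

```python
import math


def predicted_level(marker, t_h, period_h=24.0):
    """Expression of a marker at time t_h under its assumed cosine model."""
    mesor, amplitude, peak_h = marker
    return mesor + amplitude * math.cos(2 * math.pi * (t_h - peak_h) / period_h)


def estimate_circadian_time(sample, markers, step_h=0.1, period_h=24.0):
    """Grid-search the time in [0, period_h) whose predicted marker levels best fit `sample`.

    `sample` maps marker name -> observed level from a single timepoint;
    `markers` maps marker name -> (mesor, amplitude, peak time in hours).
    """
    best_t, best_err = 0.0, math.inf
    for i in range(int(round(period_h / step_h))):
        t = i * step_h
        err = sum(
            (sample[name] - predicted_level(params, t, period_h)) ** 2
            for name, params in markers.items()
        )
        if err < best_err:
            best_t, best_err = t, err
    return best_t
```

For example, with markers peaking at hours 2, 8, and 14 and a sample generated at hour 6, `estimate_circadian_time` returns a value within one grid step of 6.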

This study outlines several AI and machine learning-based techniques that can make the analysis of circadian regulation and function more cost-effective. The work started with Arabidopsis because of its extensive genomic resources, but the approach has broad implications for other complex or time-dependent gene expression patterns. In the published work, the researchers also adapted the ML strategy to wheat, demonstrating that these approaches can provide reliable analysis of essential food crops. The technology is not limited to plants, either: circadian clock dysregulation has been linked to a range of disorders, from depression to cancer.