Machine learning is widely used in healthcare to help diagnose and treat various illnesses.
Sarepta Therapeutics, based in Cambridge, Massachusetts, produced a breakthrough drug in 2016 that directly targets the mutated gene responsible for Duchenne muscular dystrophy (DMD). This rare genetic condition progressively weakens muscles throughout the body in young boys until the heart or lungs fail.
The drug is based on an antisense phosphorodiamidate morpholino oligomer (PMO), a large synthetic molecule that permeates the cell nucleus and modifies the dystrophin gene. This allows for the production of a vital protein that is ordinarily absent in DMD patients. On its own, however, PMO is inefficient at entering cells.
Recently, MIT researchers introduced a systematic approach to address this issue. The proposed method uses AI to discover nontoxic and highly active peptides that can be attached to PMO to aid delivery. The team says that finding suitable peptides is a challenging task, and by using AI for it, they hope to accelerate the development of gene therapies for DMD and other diseases. The study was supported by MIT Jameel Clinic, MIT-SenseTime Alliance, Sarepta Therapeutics, and the National Science Foundation.
Cell-penetrating peptides (CPPs) are short chains of amino acids that can be linked to a drug to help with delivery. While a single CPP can improve drug delivery, several CPPs linked together can have a synergistic effect. These long chains of amino acids are known as miniproteins.
The researchers built a library of 600 miniproteins connected to PMO by mixing and matching 57 distinct peptides. They also measured how successfully each miniprotein could transport its cargo into the cell. Thus, each miniprotein in the dataset is labelled with its activity, indicating its potential to infiltrate the cell.
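The mix-and-match construction can be pictured with a short sketch. The peptide identifiers (`P01` to `P57`), the choice of three modules per miniprotein, and the random sampling are all assumptions made for illustration; the article does not say how the 600 combinations were chosen.

```python
import random

N_PEPTIDES = 57        # distinct peptide building blocks (from the article)
LIBRARY_SIZE = 600     # miniproteins in the library (from the article)
MODULES_PER_CHAIN = 3  # assumption: the article gives no chain length

# Placeholder identifiers, not real peptide sequences.
peptides = [f"P{i:02d}" for i in range(1, N_PEPTIDES + 1)]

def build_library(peptides, size, modules_per_chain, seed=0):
    """Sample `size` distinct ordered combinations of peptide modules."""
    rng = random.Random(seed)
    library = set()
    while len(library) < size:
        chain = tuple(rng.choice(peptides) for _ in range(modules_per_chain))
        library.add(chain)  # the set silently drops repeated combinations
    return sorted(library)

library = build_library(peptides, LIBRARY_SIZE, MODULES_PER_CHAIN)
print(len(library), library[0])
```

In a real dataset each entry would be paired with its measured activity; no activity values are invented here.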
The team also wanted the model to generalize so that it could work with any amino acid. Instead of mapping each component to a series of binary variables with one-hot encoding, the team used topological fingerprinting to represent amino acids, creating a unique barcode for each sequence. Each line in the barcode indicates the presence or absence of a specific molecular substructure. An unseen sequence can likewise be represented as a barcode that follows the same rules the model was trained on. This helped the researchers expand their dataset of possible sequences.
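The difference between the two encodings can be illustrated with a toy example. This is only a sketch: the six-entry substructure vocabulary and the residue table are hand-written assumptions, whereas real topological fingerprints are computed from the molecular graph by cheminformatics software. It shows why a presence/absence barcode can describe a residue that a one-hot alphabet has never seen.

```python
# Toy substructure vocabulary. Real topological fingerprints are computed
# from the molecular graph; this hand-written table is only illustrative.
SUBSTRUCTURES = ["primary_amine", "guanidinium", "aromatic_ring",
                 "hydroxyl", "carboxylate", "thiol"]

# Side-chain substructures for a few residues assumed to be in the training set.
KNOWN_RESIDUES = {
    "Lys": {"primary_amine"},
    "Arg": {"guanidinium"},
    "Phe": {"aromatic_ring"},
    "Tyr": {"aromatic_ring", "hydroxyl"},
    "Ser": {"hydroxyl"},
    "Asp": {"carboxylate"},
    "Cys": {"thiol"},
}
ALPHABET = sorted(KNOWN_RESIDUES)

def one_hot(residue):
    """One bit per known residue; fails for anything outside the alphabet."""
    if residue not in ALPHABET:
        raise KeyError(f"{residue} is not in the training alphabet")
    return [int(residue == known) for known in ALPHABET]

def barcode(substructures):
    """One bit per substructure: 1 if present, 0 if absent."""
    return [int(name in substructures) for name in SUBSTRUCTURES]

def encode(sequence, residue_table):
    """Encode a sequence (list of residue names) as a list of barcodes."""
    return [barcode(residue_table[res]) for res in sequence]

# Ornithine is not in the alphabet, but its side chain carries a primary
# amine, so it can still be described with the same substructure bits.
extended = dict(KNOWN_RESIDUES, Orn={"primary_amine"})
print(encode(["Arg", "Orn", "Tyr"], extended))
```

Calling `one_hot("Orn")` raises a `KeyError`, while the barcode for ornithine reuses bits the model has already learned from.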
They used the miniprotein library to train a convolutional neural network (CNN). Initially, the model recommended miniproteins containing arginine (an amino acid that tears a hole in the cell membrane), which isn’t optimal for keeping cells alive. The researchers used an optimizer to disincentivize arginine, which prevented the model from cheating.
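One common way to express such a disincentive is to subtract a per-residue penalty from the predicted activity before ranking candidates. The sketch below assumes this form; the article does not describe the team's actual objective. `toy_predicted_activity`, its integer weights, the penalty value, and the three candidate sequences are all hypothetical.

```python
ARGININE = "R"

def toy_predicted_activity(sequence):
    """Hypothetical stand-in for the trained model's activity prediction.

    It deliberately over-rewards arginine to mimic the shortcut described
    in the text; the weights are arbitrary integers, not fitted values.
    """
    weights = {"R": 10, "K": 8}
    return sum(weights.get(residue, 1) for residue in sequence)

def penalized_score(sequence, arginine_penalty):
    """Predicted activity minus a fixed cost for every arginine residue."""
    n_arginine = sequence.count(ARGININE)
    return toy_predicted_activity(sequence) - arginine_penalty * n_arginine

def best_candidate(candidates, arginine_penalty):
    """Return the candidate with the highest penalized score."""
    return max(candidates, key=lambda seq: penalized_score(seq, arginine_penalty))

# Made-up candidate sequences in one-letter amino acid code.
candidates = ["RRRRRRRR", "KKRKKRKK", "KKKKKKKK"]
print(best_candidate(candidates, arginine_penalty=0))  # RRRRRRRR
print(best_candidate(candidates, arginine_penalty=5))  # KKKKKKKK
```

With no penalty the arginine-only sequence wins; once each arginine costs more than it gains over the alternative, the search is steered toward other residues.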
The team overlaid the model’s predictions on the barcode representing sequence structure. Doing so highlights the particular positions that the model believes contribute most to high activity. The team states that although the approach is not perfect, it provides targeted regions to experiment with. That information should help them design new sequences empirically in the future.
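A simple way to get this kind of per-position highlighting is occlusion: switch off one barcode bit at a time and record how much the predicted score drops. This is a swapped-in technique chosen for illustration, not necessarily the attribution method the team used, and `toy_model`, the weights, and the example bits are hypothetical.

```python
def toy_model(bits, weights):
    """Hypothetical linear scorer standing in for the trained network."""
    return sum(w * b for w, b in zip(weights, bits))

def occlusion_attribution(bits, weights):
    """Drop in predicted score when each barcode bit is switched off."""
    baseline = toy_model(bits, weights)
    drops = []
    for i in range(len(bits)):
        occluded = list(bits)
        occluded[i] = 0  # hide this substructure from the model
        drops.append(baseline - toy_model(occluded, weights))
    return drops

weights = [3, 0, -1, 5]      # arbitrary illustrative weights
example_bits = [1, 1, 0, 1]  # placeholder barcode, not real data

attributions = occlusion_attribution(example_bits, weights)
top = max(range(len(attributions)), key=attributions.__getitem__)
print(attributions, top)  # [3, 0, 0, 5] 3
```

Positions with the largest drop are the ones the model leans on most, which is the kind of targeted region the text describes.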
According to the researchers, the ML model identified more effective sequences than any previously known variant. They report that one of these sequences can increase PMO delivery 50-fold. They tested their predictions and demonstrated that the miniproteins are harmless by injecting mice with these computer-suggested sequences.
Though it is unclear how this research will affect patients in the long run, improved PMO delivery is likely to benefit them in several ways. Patients may experience fewer adverse effects if exposed to lower quantities of the medicine (PMO is administered intravenously, often every week). It’s also possible that the treatment will become less expensive.