This article is based on the research paper 'FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours'.
DeepMind released AlphaFold 2 last year, and it made headlines for its remarkable accuracy in protein structure prediction. AlphaFold's success demonstrated that deep neural networks can be used to tackle challenging and important structural biology problems.
FastFold is an efficient implementation of the protein structure prediction model for both training and inference, developed by a group of researchers from the National University of Singapore. Although AlphaFold 2 is a game-changer in protein structure prediction, its training and inference remain time-consuming and costly, and this is the problem the research team set out to address.
What makes it better? FastFold cuts AlphaFold 2's overall training time from 11 days to 67 hours and delivers 7.5-9.5x speedups for long-sequence inference, while scaling to an aggregate 6.02 PetaFLOPs with 90.1 percent parallel efficiency. As a result, FastFold substantially reduces the cost of both training and inference.
How have the researchers contributed to this model?
The performance of AlphaFold's operators is optimized using AlphaFold-specific performance characteristics; combined with kernel fusion, the kernel implementations achieve considerable speedups. The research team also proposed Dynamic Axial Parallelism, which incurs lower communication overhead than other model parallelism methods. For communication optimization, the proposed Duality Async Operation implements computation-communication overlap in dynamic computational graph frameworks such as PyTorch.
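Kernel fusion merges several small GPU kernels into one, so intermediate results stay in fast memory instead of being written out and read back between passes. FastFold's real fused kernels are custom CUDA; the sketch below is only a pure-Python analogy of the same idea, fusing a bias add into a softmax so the intermediate biased scores are never materialized as a separate array:

```python
# Pure-Python analogy for kernel fusion (illustrative only; FastFold's
# actual fused operators are hand-written CUDA kernels).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def unfused(scores, bias):
    # Two separate passes ("kernels"): bias add, then softmax over it.
    biased = [s + b for s, b in zip(scores, bias)]  # intermediate buffer
    return softmax(biased)

def fused(scores, bias):
    # One combined pass: the bias add happens inside the softmax
    # computation, so no intermediate `biased` buffer is stored.
    m = max(s + b for s, b in zip(scores, bias))
    exps = [math.exp(s + b - m) for s, b in zip(scores, bias)]
    total = sum(exps)
    return [e / total for e in exps]
```

Both versions compute the same result; the fused one simply avoids the extra round trip through memory, which is where the speedup comes from on a GPU.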
The researchers scaled AlphaFold model training to 512 NVIDIA A100 GPUs, reaching an aggregate of 6.02 PetaFLOPs in the training stage. At the inference stage, FastFold accelerates long sequences by 7.5 to 9.5 times and enables inference over exceedingly long sequences. The total training time drops from 11 days to 67 hours, resulting in significant cost savings.
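The headline numbers work out to roughly a 3.9x end-to-end training speedup, which a quick calculation confirms:

```python
# Sanity-check the headline numbers: 11 days vs 67 hours of training.
baseline_hours = 11 * 24          # AlphaFold 2 baseline: 11 days = 264 h
fastfold_hours = 67               # FastFold: 67 hours
speedup = baseline_hours / fastfold_hours
print(f"{baseline_hours} h -> {fastfold_hours} h, {speedup:.2f}x faster")
```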
The traditional AlphaFold model is made up of several components:
- Input embedder: This component encodes the multiple sequence alignment (MSA) and template information of the target sequence into MSA representations, which carry the co-evolutionary information of all similar sequences, and pair representations, which carry the interaction information of residue pairs in the sequence.
- Evoformer blocks: The MSA and pair representations are iteratively refined through the MSA stack and the pair stack. The resulting highly processed representations are fed into a structure module, which produces a three-dimensional structure prediction for the protein.
- Evoformer backbone: FastFold reduces the communication overhead of the Evoformer backbone with Dynamic Axial Parallelism, a novel model parallelism method that exceeds existing standard tensor parallelism in scaling efficiency.
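The core idea of axial parallelism can be sketched in a few lines: the square pair representation is partitioned along one axis, each "device" runs the row-wise stage on its local shard with no communication, and an all-to-all style transpose flips the partitioned axis so the column-wise stage is also local. The toy below simulates this in a single process with plain Python lists (the real system uses collective communication across GPUs; the normalization op stands in for the actual attention computation):

```python
# Toy, single-process sketch of axial parallelism over a square matrix.
N, WORKERS = 4, 2                  # 4x4 "pair representation", 2 devices
matrix = [[float(i * N + j + 1) for j in range(N)] for i in range(N)]

# Partition along axis 0: each worker holds a contiguous block of rows.
rows_per = N // WORKERS
shards = [matrix[w * rows_per:(w + 1) * rows_per] for w in range(WORKERS)]

def normalize_rows(block):
    # Stand-in for a row-wise op: each worker only needs whole rows,
    # which it already has locally -- no communication required.
    return [[x / sum(row) for x in row] for row in block]

shards = [normalize_rows(b) for b in shards]

# "All-to-all": transpose the matrix so the partitioned axis flips,
# giving every worker whole columns for the column-wise stage.
full = [row for b in shards for row in b]
transposed = [list(col) for col in zip(*full)]
col_shards = [transposed[w * rows_per:(w + 1) * rows_per]
              for w in range(WORKERS)]

# Column-wise stage now also runs locally on each worker.
col_shards = [normalize_rows(b) for b in col_shards]
```

The only communication is the single transpose between the two stages, which is why this scheme has lower overhead than tensor parallelism, where every parallelized layer requires its own collective operations.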
For communication optimization, the researchers created the Duality Async Operation to implement computation-communication overlap in dynamic computational graph frameworks such as PyTorch.
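The overlap idea is that communication is launched asynchronously as early as possible and only waited on when its result is actually needed, so independent computation hides the network latency. The paper's operation pairs dual collectives in the forward and backward passes of a PyTorch autograd function; the sketch below shows only the latency-hiding principle, using a background thread and a made-up `fake_all_gather` in place of a real collective:

```python
# Simulated computation-communication overlap (illustrative only; the
# Duality Async Operation uses real async collectives, not threads).
import threading
import time

def fake_all_gather(shard, out):
    time.sleep(0.05)               # pretend network latency
    out.extend(shard * 2)          # "gathered" result (made up here)

shard = [1, 2, 3]
gathered = []

# Launch the "communication" asynchronously...
comm = threading.Thread(target=fake_all_gather, args=(shard, gathered))
comm.start()

# ...and overlap it with computation that does not need the result.
local = sum(x * x for x in shard)

comm.join()                        # wait only once the result is needed
result = local + sum(gathered)
```

The "duality" in the name refers to pairing each forward-pass communication with its dual in the backward pass, so that overlap is preserved even under PyTorch's dynamically built computation graph.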
The researchers also compared FastFold against AlphaFold and OpenFold. FastFold significantly reduces the time and cost of training and inference for baseline protein structure prediction models: overall AlphaFold training time falls from 11 days to 67 hours, long-sequence inference sees 7.5-9.5x speedups, and the system scales to an aggregate 6.02 PetaFLOPs with 90.1 percent parallel efficiency.
FastFold’s excellent model parallelism scaling efficiency establishes it as a viable solution to AlphaFold’s enormous training and inference processing overhead.