Despite the progress of artificial neural networks in recent years, researchers are still unable to train these systems to extrapolate compositional rules seen during training beyond simple problems. This ability is known as systematic generalization, and although there have been some advances towards this goal, it remains unsolved.
Some methods achieve 100% accuracy on the SCAN dataset, but they offer limited flexibility and limited performance gains. To advance research in systematic generalization, we need to look beyond this one dataset and evaluate on others as well.
The baseline Transformer models released together with these datasets typically fail at the task. However, the configurations of these baseline models may be questionable for this problem: some standard practices from machine translation were applied without modification, and existing techniques such as relative positional embedding were not used in most cases.
To develop and evaluate methods that improve systematic generalization, it is necessary to have good datasets and strong baselines. In particular, one must avoid a false sense of progress over weak baselines when evaluating the limits of existing architectures.
Researchers from The Swiss AI Lab IDSIA have demonstrated that Transformers perform better than previously thought on a variety of reasoning tasks. They show that careful design of the model and training configuration is particularly important for reasoning tasks that test systematic generalization. Choices such as the early stopping strategy and the use of relative positional embedding can improve the performance of baseline Transformers.
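To make the idea of relative positional embedding more concrete, here is a minimal sketch rather than the authors' implementation. It assumes one common formulation, in which the positional signal between a query at position i and a key at position j depends only on the clipped offset j - i. The function name `relative_position_ids` and the parameter `max_distance` are hypothetical.

```python
def relative_position_ids(seq_len, max_distance):
    """Build a seq_len x seq_len table of clipped offsets (key index - query index).

    In a real model each offset would index a learned embedding; this sketch
    only builds the index table. The clipping scheme is an illustrative
    assumption, not the configuration reported by the researchers.
    """
    table = []
    for query in range(seq_len):
        row = []
        for key in range(seq_len):
            offset = key - query
            # Clip so that all distant tokens share one index per direction.
            offset = max(-max_distance, min(max_distance, offset))
            row.append(offset)
        table.append(row)
    return table


ids = relative_position_ids(seq_len=5, max_distance=2)
for row in ids:
    print(row)
```

Because every entry depends only on the offset between two positions, the same group of tokens receives the same positional signal wherever it appears in the sequence. That is one intuition for why such a scheme may help on inputs longer than those seen in training.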
The research group conducted experiments on five datasets: SCAN, CFQ, PCFG, COGS, and the Mathematics dataset. In particular, their new models improved accuracy on the PCFG productivity split from 50% to 85% and on the systematicity split from 72% to 96%. They also raised accuracy on the COGS dataset from 35% to 81%. On the SCAN dataset, they showed that relative positional embedding largely mitigates the so-called end-of-sentence (EOS) decision problem, yielding 100% accuracy on the length split with a cutoff at 26.
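A length split with a cutoff can be sketched as follows. This is an illustration under stated assumptions: it assumes the cutoff refers to the number of tokens in the output action sequence, with shorter examples used for training and longer ones held out for testing. The helper `length_split` is hypothetical, the examples are hand-written in the style of SCAN, and the toy cutoff is 3 rather than 26.

```python
def length_split(examples, cutoff):
    """Split (command, actions) pairs by the number of output action tokens.

    Pairs whose action sequence has at most `cutoff` tokens go to the training
    set; longer ones go to the test set. This mirrors the idea of a length
    split and is not the dataset's official tooling.
    """
    train, test = [], []
    for command, actions in examples:
        num_actions = len(actions.split())
        if num_actions <= cutoff:
            train.append((command, actions))
        else:
            test.append((command, actions))
    return train, test


# Toy, hand-written examples in the style of SCAN (not read from dataset files).
examples = [
    ("jump", "I_JUMP"),
    ("jump twice", "I_JUMP I_JUMP"),
    ("walk thrice", "I_WALK I_WALK I_WALK"),
    ("jump twice and walk thrice", "I_JUMP I_JUMP I_WALK I_WALK I_WALK"),
]

train, test = length_split(examples, cutoff=3)
print(len(train), "training examples,", len(test), "test examples")
```

A model trained on such a split never sees an output as long as those in the test set, so it has to decide when to stop generating without having observed sequences of that length. This is the setting in which the EOS decision problem arises.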
The research group shows that, despite the significant performance gap between models on generalization tests, all of them perform equally well on an IID validation set. This means that a proper generalization validation set is needed when developing neural networks for systematic generalization. In summary, the researchers studied how to improve the performance of Transformer architectures on several systematic generalization datasets. Their approach involved revisiting basic model and training configurations.
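The point about validation sets can be illustrated with a small sketch of checkpoint selection. It assumes each saved checkpoint records accuracy on both an IID validation set and a separate generalization validation set. The `Checkpoint` class, the `select_checkpoint` helper, and all numbers are made up for illustration and are not results from the research.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    step: int
    iid_val_acc: float  # accuracy on an in-distribution validation set
    gen_val_acc: float  # accuracy on a generalization validation set


def select_checkpoint(checkpoints, metric):
    """Return the checkpoint with the highest value of `metric`.

    On ties, max() keeps the first maximal item, i.e. the earliest checkpoint.
    """
    return max(checkpoints, key=lambda c: getattr(c, metric))


# Made-up numbers: IID accuracy saturates early, generalization keeps improving.
checkpoints = [
    Checkpoint(step=10_000, iid_val_acc=1.00, gen_val_acc=0.40),
    Checkpoint(step=50_000, iid_val_acc=1.00, gen_val_acc=0.60),
    Checkpoint(step=100_000, iid_val_acc=1.00, gen_val_acc=0.80),
]

by_iid = select_checkpoint(checkpoints, "iid_val_acc")
by_gen = select_checkpoint(checkpoints, "gen_val_acc")
print("selected by IID validation:", by_iid.step)
print("selected by generalization validation:", by_gen.step)
```

In this toy setup the IID metric cannot tell the checkpoints apart, so selecting or stopping on it picks the earliest one, while the generalization metric picks the checkpoint that generalizes best.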