Recent Research on Manifolds in Commonly Used Atomic Fingerprints and the Failure to Machine-Learn Four-Body Interactions

Atomic fingerprints are often used in machine learning to characterize the local environment of an atom. They encode information such as bond lengths to neighboring atoms or coordination numbers in a crystal structure, and they are also an essential part of structure-search strategies, where they are used to eliminate redundant structures.

The Cartesian coordinates of the atoms in a system, on the other hand, are not a useful fingerprint because they do not reflect the invariance of the energy under certain operations. A fingerprint must remain invariant under uniform translations, rotations, and permutations of identical atoms in the system. Furthermore, fingerprints should be unique: if the fingerprints of two environments are identical, the environments themselves must be identical. If this requirement is not met, a machine learning model that uses such a fingerprint as input will assign the same energy to two distinct structures whose energies actually differ.
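The invariance requirement can be illustrated with a minimal sketch. The descriptor used here (the sorted list of interatomic distances) and the water-like geometry are assumptions chosen purely for illustration; this is not one of the fingerprints studied in the research, and sorted distances are not guaranteed to be unique in general.

```python
import math

def rotate_z(points, angle):
    """Rotate every point about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def translate(points, shift):
    """Shift every point by the same vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in points]

def sorted_distances(points):
    """Toy fingerprint (illustrative assumption): all pairwise distances, sorted."""
    n = len(points)
    return sorted(math.dist(points[i], points[j])
                  for i in range(n) for j in range(i + 1, n))

# Hypothetical water-like geometry in angstrom: O, H, H
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Rotate, translate, and swap the two identical hydrogen atoms
moved = translate(rotate_z(water, 0.7), (1.0, -2.0, 0.5))
moved = [moved[0], moved[2], moved[1]]

same_coords = all(math.isclose(a, b, abs_tol=1e-9)
                  for p, q in zip(water, moved) for a, b in zip(p, q))
same_fp = all(math.isclose(a, b, abs_tol=1e-9)
              for a, b in zip(sorted_distances(water), sorted_distances(moved)))
print("coordinates unchanged:", same_coords)
print("toy fingerprint unchanged:", same_fp)
```

The raw coordinates change under these operations although the structure, and hence its energy, is the same, while the toy fingerprint agrees to within rounding error.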

Researchers at the University of Basel investigated the behavior of two fingerprints that are widely used in machine learning of the potential energy surface: the smooth overlap of atomic positions (SOAP) and the atom-centered symmetry functions (ACSF).


ACSFs consist of radial symmetry functions, which are sums of two-body terms and describe an atom’s radial environment, and angular symmetry functions, which are sums of three-body terms and describe its angular environment.
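As a rough illustration of the two-body and three-body structure, the sketch below implements one radial and one angular symmetry function in the commonly used Behler-Parrinello form. The parameter names and values (eta, r_s, zeta, lam, r_cut) and the geometry are assumptions for illustration and need not match the settings used in the study. The angular sum here runs over unordered neighbor pairs, which differs from some conventions by a constant factor.

```python
import math

def cutoff(r, r_cut):
    """Cosine cutoff: 1 at r = 0, decays smoothly to 0 at r = r_cut."""
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0) if r < r_cut else 0.0

def radial_sf(center, neighbors, eta=1.0, r_s=0.0, r_cut=6.0):
    """Radial symmetry function: a sum of two-body terms."""
    total = 0.0
    for nb in neighbors:
        r = math.dist(center, nb)
        total += math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_cut)
    return total

def angular_sf(center, neighbors, eta=0.1, zeta=1.0, lam=1.0, r_cut=6.0):
    """Angular symmetry function: a sum of three-body terms."""
    total = 0.0
    for j in range(len(neighbors)):
        for k in range(j + 1, len(neighbors)):
            nj, nk = neighbors[j], neighbors[k]
            rij = math.dist(center, nj)
            rik = math.dist(center, nk)
            rjk = math.dist(nj, nk)
            # cosine of the angle j-center-k
            dot = sum((a - c) * (b - c) for a, b, c in zip(nj, nk, center))
            cos_theta = dot / (rij * rik)
            total += ((1.0 + lam * cos_theta) ** zeta
                      * math.exp(-eta * (rij**2 + rik**2 + rjk**2))
                      * cutoff(rij, r_cut) * cutoff(rik, r_cut) * cutoff(rjk, r_cut))
    return 2.0 ** (1.0 - zeta) * total

# Hypothetical central atom with two neighbors at a right angle
center = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
g_radial = radial_sf(center, neighbors)
g_angular = angular_sf(center, neighbors)
print(g_radial, g_angular)
```

Because both functions only sum over neighbors, reordering identical neighbors leaves the values unchanged, and because they depend only on distances and angles, so do rotations and translations.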


In a SOAP fingerprint, a Gaussian is centered on each atom within a cutoff distance around the reference atom. The resulting atomic density is then multiplied by a cutoff function that goes smoothly to zero at the cutoff radius over a certain width. Finally, the density is expanded in terms of spherical harmonics and orthogonal radial functions.
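The first two steps of this construction can be sketched as follows. This is a simplified illustration only: the cosine form of the cutoff function, the Gaussian width sigma, and the values of r_cut and width are assumptions, the reference atom is placed at the origin, and the final expansion in spherical harmonics and radial functions is not shown.

```python
import math

ORIGIN = (0.0, 0.0, 0.0)  # position of the reference atom (assumption)

def cutoff(r, r_cut=4.0, width=0.5):
    """Equal to 1 up to r_cut - width, then decays smoothly to 0 at r_cut."""
    if r <= r_cut - width:
        return 1.0
    if r >= r_cut:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * (r - r_cut + width) / width))

def smoothed_density(point, neighbors, sigma=0.5, r_cut=4.0, width=0.5):
    """Gaussians centered on the neighbors inside the cutoff sphere,
    multiplied by the cutoff function evaluated at `point`."""
    gaussians = sum(
        math.exp(-math.dist(point, nb) ** 2 / (2.0 * sigma**2))
        for nb in neighbors
        if math.dist(nb, ORIGIN) < r_cut
    )
    return gaussians * cutoff(math.dist(point, ORIGIN), r_cut, width)

# Hypothetical neighbor positions relative to the reference atom (angstrom)
neighbors = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.0, 0.0, 5.0)]
print(smoothed_density((1.0, 0.0, 0.0), neighbors))
print(smoothed_density((3.8, 0.0, 0.0), neighbors))
```

The third neighbor lies outside the cutoff sphere and does not contribute, and the density is damped to zero as the evaluation point approaches the cutoff radius.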

The two fingerprints were investigated under finite displacements of the atomic positions, and the researchers show that manifolds with quasi-constant fingerprints exist. These manifolds are found by numerically following the eigenvectors of the sensitivity matrix that have quasi-zero eigenvalues. Because such manifolds exist in ACSF and SOAP, machine learning of four-body interactions such as torsional energies, which are part of certain force fields, fails. No such manifolds are found for the Overlap Matrix (OM) fingerprint, owing to its many-body nature.
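The idea of following eigenvectors with quasi-zero eigenvalues can be sketched numerically. The example below assumes that the sensitivity matrix is the product of the transposed Jacobian of the fingerprint with the Jacobian itself, so that its eigenvalues measure how strongly the fingerprint responds to a displacement along each eigenvector. The fingerprint is a hypothetical, purely two-body stand-in (distances from a central atom to its neighbors), not SOAP or ACSF, and the NH3-like geometry is illustrative.

```python
import numpy as np

def toy_fingerprint(coords):
    """Hypothetical two-body fingerprint: distances from atom 0 to the others."""
    return np.linalg.norm(coords[1:] - coords[0], axis=1)

def jacobian(fp, coords, h=1e-5):
    """Central finite-difference derivative of fp w.r.t. all Cartesian coordinates."""
    x0 = coords.ravel()
    columns = []
    for a in range(x0.size):
        step = np.zeros_like(x0)
        step[a] = h
        plus = fp((x0 + step).reshape(coords.shape))
        minus = fp((x0 - step).reshape(coords.shape))
        columns.append((plus - minus) / (2.0 * h))
    return np.stack(columns, axis=1)  # shape: (n_features, 3 * n_atoms)

# Hypothetical NH3-like geometry in angstrom, central atom first
coords = np.array([[0.0, 0.0, 0.0],
                   [0.94, 0.0, -0.38],
                   [-0.47, 0.81, -0.38],
                   [-0.47, -0.81, -0.38]])

J = jacobian(toy_fingerprint, coords)
S = J.T @ J                            # assumed form of the sensitivity matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
n_null = int(np.sum(eigvals < 1e-8))   # directions the fingerprint cannot see

# Take a small finite step along one quasi-zero eigenvector
direction = eigvecs[:, 0].reshape(coords.shape)
delta = 1e-3
change = np.linalg.norm(toy_fingerprint(coords + delta * direction)
                        - toy_fingerprint(coords))
print("quasi-zero eigenvalues:", n_null, "of", eigvals.size)
print("fingerprint change after step:", change)
```

For this toy case, 9 of the 12 eigenvalues are numerically zero. Only 6 of them correspond to rigid translations and rotations, so the remaining ones are genuine deformations that this purely radial fingerprint cannot detect. Following such directions step by step, and recomputing the eigenvectors along the way, traces out a manifold of quasi-constant fingerprint.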

The approach was tested on three small molecules, H2O, NH3, and CH4, covering cases in which a central atom is surrounded by 2, 3, or 4 neighboring atoms, respectively. SOAP and ACSF fingerprints turn out to be insensitive to certain movements of the atoms, which means that manifolds with quasi-constant fingerprints can be constructed. For NH3 and CH4, the fingerprint varies so little along such manifolds that it behaves numerically like a constant. Configurations with different torsional energies therefore look quasi-identical to these fingerprints, which prevents machine learning of torsional energies. This matters because torsional energies govern the folding of proteins and other macromolecules.

The consequence of this limitation is that torsional energies can only be reproduced with limited precision by any machine learning method that uses these fingerprints. According to the researchers, the SOAP fingerprint is no better than ACSF at resolving four-body terms. In contrast, with the so-called Overlap Matrix (OM) fingerprint, the scientists could not find any such manifold for any of the molecules or configurations investigated.

The OM fingerprint reliably recognizes structural differences because it is based on a matrix diagonalization. In this approach, a minimal basis set of Gaussian orbitals is placed on all atoms within a cutoff radius, the overlap matrix between all orbitals is computed, and the matrix is then diagonalized. As a result, it is a compelling alternative for resolving structural differences with high fidelity, as well as for a variety of other uses.
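A heavily simplified sketch of this construction is given below. It assumes a single normalized s-type Gaussian per atom with one common exponent alpha, takes the sorted eigenvalues of the overlap matrix as the fingerprint, and omits the cutoff radius and any higher angular momentum orbitals; the geometry is hypothetical. It is meant to show the idea, not the researchers' implementation.

```python
import numpy as np

def om_fingerprint(coords, alpha=1.0):
    """Sorted eigenvalues of the overlap matrix of s-type Gaussians.

    For two normalized Gaussians exp(-alpha * |r - R|^2) whose centers are
    a distance d apart, the overlap integral is exp(-alpha * d^2 / 2).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist_sq = np.sum(diff**2, axis=-1)
    overlap = np.exp(-0.5 * alpha * dist_sq)
    return np.linalg.eigvalsh(overlap)[::-1]  # descending order

# Hypothetical NH3-like geometry in angstrom
coords = np.array([[0.0, 0.0, 0.0],
                   [0.94, 0.0, -0.38],
                   [-0.47, 0.81, -0.38],
                   [-0.47, -0.81, -0.38]])

# Rotate about z, translate, and swap two identical atoms
angle = 0.3
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle), np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
moved = (coords @ rot.T + np.array([1.0, -2.0, 0.5]))[[0, 2, 1, 3]]

# Stretch one bond to obtain a genuinely different structure
stretched = coords.copy()
stretched[1] *= 1.3

fp_ref = om_fingerprint(coords)
fp_moved = om_fingerprint(moved)
fp_stretched = om_fingerprint(stretched)
print(np.allclose(fp_ref, fp_moved), np.allclose(fp_ref, fp_stretched))
```

Because the eigenvalues of the overlap matrix depend on all orbitals simultaneously, every entry of the fingerprint responds to the positions of all atoms at once, which is the many-body character referred to above.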