A lifespan face synthesis (LFS) model aims to generate a set of photo-realistic images showing how a person would look across their entire life, given a single reference picture. Each generated image is expected to be age-sensitive, with realistic changes in shape and texture, while preserving the person’s identity. The task is challenging because shape and texture change in separate and highly nonlinear ways as a face ages; for example, skin loses elasticity over time, which makes it look wrinkled or saggy.
The latest LFS models are built on generative adversarial networks (GANs) and use conditional transformations driven by an age code that tells the generator which target age to produce. They have benefited significantly from recent advances in GANs. However, because they do not disentangle their latent representations into texture, shape and identity factors, they are limited in how well they can model the nonlinear effects of ageing.
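To make the idea of an age code concrete, here is a minimal sketch of one common way to condition a generator: the target age is mapped to a one-hot vector over age groups and appended to the image features. The group boundaries and the names `AGE_BOUNDARIES`, `age_code` and `condition` are assumptions chosen for illustration; the article does not say how the models discussed here encode age.

```python
from bisect import bisect_right

# Hypothetical age-group boundaries in years (not taken from the article).
# They split ages into six groups: 0-2, 3-9, 10-19, 20-39, 40-59, 60+.
AGE_BOUNDARIES = [3, 10, 20, 40, 60]


def age_code(age):
    """Return a one-hot list marking the age group that `age` falls into."""
    if age < 0:
        raise ValueError("age must be non-negative")
    group = bisect_right(AGE_BOUNDARIES, age)  # index of the matching group
    code = [0.0] * (len(AGE_BOUNDARIES) + 1)
    code[group] = 1.0
    return code


def condition(features, code):
    """Simplest form of conditioning: append the age code to the features."""
    return list(features) + list(code)


if __name__ == "__main__":
    print(age_code(25))
    print(condition([0.5, -0.2], age_code(70)))
```

Concatenation is only the simplest option; it is used here because it shows the role of the age code without assuming any particular network design.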
An ideal LFS model must meet three requirements: 1) Age sensitivity: the generated image should show shape and texture changes that are biologically plausible for the target age. 2) Identity preservation: no matter how large the age gap between the target and the reference, the generated image must depict the same person. 3) Reconstruction: when the target age and the reference age fall within the same age range, the generated image should stay as close as possible to the reference.
Beyond these requirements, disentanglement is crucial for LFS. A person’s face shape and skin texture change in different ways over time, and these changes cannot be modeled properly unless the two are separated first. When they remain entangled in a single representation, editing becomes difficult, because changing one factor for a given age causes unwanted changes in the others.
A research team from the University of Surrey, Leibniz Universität Hannover, and the University of Twente has introduced a new LFS model that, for the first time, separates shape, texture, and identity information. The model is a conditional GAN with an encoder-decoder architecture. First, it extracts shape, texture, and identity features from different layers of a shared CNN encoder. Then, two novel modules, based on conditional convolution and channel attention respectively, model how shape and texture change with age. Lastly, to help disentangle shape from texture, a regularization loss is applied to shape, based on the intuition that face shape changes only slightly as adults grow older. According to the researchers, this disentangled LFS model overcomes the limitations of state-of-the-art competitors and meets all three requirements simultaneously.
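The article does not give the formulas behind these components, so the following is only a rough sketch of two of the ideas, not the authors’ implementation. `channel_attention` is a generic squeeze-and-excitation-style gate in which a hypothetical per-channel weight stands in for learned, age-dependent parameters, and `shape_regularization` is a plain mean squared difference that penalizes shape features for drifting between two adult ages. All function and parameter names are assumptions.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_maps, age_weights):
    """Rescale each channel by a gate in (0, 1).

    feature_maps: list of channels, each a flat list of activations.
    age_weights:  one hypothetical scalar per channel, standing in for
                  learned parameters that depend on the target age.
    """
    if len(feature_maps) != len(age_weights):
        raise ValueError("need exactly one weight per channel")
    gated = []
    for channel, weight in zip(feature_maps, age_weights):
        pooled = sum(channel) / len(channel)  # squeeze: global average pool
        gate = sigmoid(weight * pooled)       # excite: per-channel gate
        gated.append([gate * value for value in channel])
    return gated


def shape_regularization(shape_a, shape_b):
    """Mean squared difference between two shape feature vectors.

    Assumed usage: shape_a and shape_b are shape features of the same
    person at two adult ages, so a small value means a stable shape.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("shape vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(shape_a, shape_b)) / len(shape_a)


if __name__ == "__main__":
    texture = [[1.0, 3.0], [2.0, 2.0]]
    print(channel_attention(texture, [0.0, 0.0]))
    print(shape_regularization([0.1, 0.4], [0.1, 0.5]))
```

In a real model the gate weights would be produced by a small network that takes the age code as input, and the regularization term would be added to the adversarial and identity losses during training; both are left out here to keep the sketch self-contained.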
The main benefit of this research is that, for the first time, shape and texture are modeled explicitly and separately within an end-to-end trained LFS model. To achieve this, the researchers proposed separate modules based on conditional convolution and channel attention, together with a regularization loss that helps disentangle the nonlinear ageing of shape from that of texture.