Microsoft AI Research Introduces A Huge Synthetic-Face Dataset Along With A Face Analysis Method Using Synthetic Data Alone

We often forget that the most challenging part of machine learning isn’t choosing the right model; it’s finding good data. Collecting and labeling real-world examples for human-related computer vision raises concerns about fairness and ethics, and the process can be slow, expensive, and subject to bias. Synthesizing training data with computer graphics offers a faster alternative to these traditional methods. With synthetic data, a researcher can guarantee perfect labels free of annotation noise and generate rich labels that would otherwise be impossible to annotate manually.

In this research paper, a group from Microsoft argues that synthetic data can be applied much more widely than before, and that this is achievable today. The researchers have developed a new method of acquiring training data for faces: rendering 3D face models with unprecedented levels of realism and diversity. This is one way they are approaching the data problem.

Developing a synthetic data framework is not easy. It requires significant expertise and investment, but once implemented, it can generate a wide variety of training data with minimal incremental effort. The research group generates new face images by procedurally combining a parametric face model with high-quality artist-created assets. The synthesis process uses textures, hair, and clothing to create an endless variety of different people from a single base model. This is a huge opportunity for synthetic data, and the ability to generate and edit synthetic facial features with ease opens the door to many other industries as well.
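To make the idea of "procedurally combining" a face model with assets more concrete, here is a minimal Python sketch. It is not the researchers' pipeline: the asset names, the number of coefficients, and the functions `sample_face` and `generate_dataset` are all hypothetical, and the actual rendering step is left out. It only illustrates how random model parameters and randomly chosen assets can be combined into many distinct face descriptions, each of which carries its own labels for free.

```python
import random
from dataclasses import dataclass

# Hypothetical asset libraries. These names are placeholders for
# illustration and are not the assets used in the paper.
TEXTURES = ["skin_01", "skin_02", "skin_03"]
HAIR = ["short_curly", "long_straight", "bald"]
CLOTHING = ["t_shirt", "hoodie", "suit"]


@dataclass(frozen=True)
class FaceSample:
    identity: tuple    # assumed shape coefficients of a parametric face model
    expression: tuple  # assumed expression coefficients
    texture: str
    hair: str
    clothing: str


def sample_face(rng, n_identity=4, n_expression=3):
    """Draw one random face description from the model and asset libraries."""
    return FaceSample(
        identity=tuple(rng.gauss(0.0, 1.0) for _ in range(n_identity)),
        expression=tuple(rng.gauss(0.0, 1.0) for _ in range(n_expression)),
        texture=rng.choice(TEXTURES),
        hair=rng.choice(HAIR),
        clothing=rng.choice(CLOTHING),
    )


def generate_dataset(n, seed=0):
    """Generate n face descriptions; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [sample_face(rng) for _ in range(n)]


if __name__ == "__main__":
    for sample in generate_dataset(3):
        print(sample.texture, sample.hair, sample.clothing)
```

In a real system each description would be handed to a renderer, and because every parameter is known in advance, the labels are exact by construction.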

The human face is one of the most challenging subjects for synthetic data generation, because models trained on synthetic images often perform worse on real ones. Researchers have attempted to bridge this domain gap with various methods, such as mixing real and synthetic training sets or adapting models trained on synthetic images to work better on real images. This paper describes how to combine a procedurally generated parametric 3D face model with a comprehensive library of handcrafted assets to render training images with unprecedented realism and diversity.



Asif Razzaq is an AI Journalist and Cofounder of Marktechpost, LLC. He is a visionary, entrepreneur and engineer who aspires to use the power of Artificial Intelligence for good.

Asif's latest venture is the development of an Artificial Intelligence Media Platform (Marktechpost) that will revolutionize how people can find relevant news related to Artificial Intelligence, Data Science and Machine Learning.

Asif was featured by Onalytica in its ‘Who’s Who in AI? (Influential Voices & Brands)’ as one of the ‘Influential Journalists in AI’. His interview was also featured by Onalytica.