Recent Studies Find Ways To Demystify AI Black Boxes


Deep learning neural networks, which are at the heart of modern artificial intelligence, are frequently characterized as “black boxes” with mysterious inner workings. However, recent research casts doubt on that notion, with profound privacy implications.

Unlike traditional software, whose functions are predetermined by the developer, neural networks learn to process or analyze data through training. They do this by adjusting the strength of the connections between their many neurons over a large number of training passes, known as epochs.

By the end of training, a network’s decision-making process is spread across such a tangled web of connections that it can be hard to follow. As a result, it is often assumed that working out what data a system was trained on is nearly impossible, even with access to the model itself.
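To make the idea of learning by adjusting connection strengths concrete, here is a minimal sketch in Python. It is a toy with a single connection, not a real neural network; the function name `train`, the learning rate, and the sample data are all assumptions made for illustration.

```python
# Toy illustration: a single "connection" (weight) learns the rule y = 2x.
# Real networks adjust millions of weights, but the loop has the same shape.

def train(samples, epochs=200, lr=0.05):
    w = 0.0  # connection strength, starts with no knowledge of the rule
    for _ in range(epochs):              # one epoch = one pass over the data
        for x, y in samples:
            pred = w * x                 # the model's current guess
            grad = 2 * (pred - y) * x    # slope of the squared error w.r.t. w
            w -= lr * grad               # nudge the weight to reduce the error
    return w

# Hypothetical training data following y = 2x
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_w = train(samples)
print(round(learned_w, 3))
```

Nothing in the code states the rule "multiply by two". The weight ends up encoding it only because the training examples pushed it there, which is why what a model has absorbed from its data is not obvious from the outside.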

The Recent Breakthroughs 

Surprisingly, recent research reported in MIT Technology Review casts doubt on this premise, showing that two quite distinct techniques can be used to identify the data used to train a model. This could have significant ramifications for AI systems trained on sensitive data such as medical or financial records.

The first technique targets generative adversarial networks (GANs), the AI systems used to create deepfake images. These models are increasingly used to create synthetic faces that are seemingly unrelated to real people. Researchers at the University of Caen Normandy in France, however, demonstrated that they could easily link generated faces from a popular model to real people whose data was used to train the GAN. They did this by using a second model, a facial recognition system, to compare the generated faces with training samples and determine whether they shared the same identity.

The images are not identical, because the GAN altered them, but the researchers found many instances in which generated faces matched images in the training set. In a paper describing the findings, they point out that the generated face is often just the original face in a different pose. Though the method is limited to face-generation GANs, the researchers believe similar ideas could be applied to biometric data or medical images.
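The matching step described above can be sketched roughly as follows. This is a minimal illustration rather than the researchers' actual pipeline: it assumes a facial recognition model has already turned each face into an embedding vector, and the vectors, identifiers, the `find_identity_matches` helper, and the 0.9 threshold are all made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors, from -1 (opposite) to 1 (same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_identity_matches(generated, training, threshold=0.9):
    """For each generated face, report its closest training face if similar enough."""
    matches = []
    for g_id, g_vec in generated.items():
        best_id, best_sim = None, -1.0
        for t_id, t_vec in training.items():
            sim = cosine_similarity(g_vec, t_vec)
            if sim > best_sim:
                best_id, best_sim = t_id, sim
        if best_sim >= threshold:
            matches.append((g_id, best_id, best_sim))
    return matches

# Hypothetical embeddings; a real attack would compute these with a
# facial recognition model applied to actual images.
training_embeddings = {
    "person_a": [0.9, 0.1, 0.3],
    "person_b": [0.1, 0.8, 0.5],
}
generated_embeddings = {
    "fake_1": [0.88, 0.15, 0.28],  # nearly the same direction as person_a
    "fake_2": [0.5, 0.5, -0.7],    # not close to anyone in the training set
}

matches = find_identity_matches(generated_embeddings, training_embeddings)
for g_id, t_id, sim in matches:
    print(f"{g_id} likely derives from {t_id} (similarity {sim:.3f})")
```

Comparing embeddings rather than raw pixels is what lets a match survive a change of pose or lighting: two images of the same person should land close together in embedding space even when the pictures themselves look different.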


In another development, Nvidia researchers have shown in a recent study that they can reconstruct the data used to train a model without seeing any samples of the training data. They used a technique known as model inversion, which essentially runs the neural network backward. This technique is commonly used to analyze neural networks, but it had previously been used to recover input data only on simple networks under stringent assumptions. The researchers showed how to scale the approach to large networks by breaking the problem up and performing the inversion on each layer separately. With this method, they were able to reconstruct training images using only the models themselves.
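The basic idea of running a model backward can be shown with a toy example. The sketch below is not the layer-by-layer method from the Nvidia study. It assumes the simplest possible model, a single logistic unit with made-up weights, and uses plain gradient ascent on the input, with values clipped to the range 0 to 1 as if they were pixel intensities.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Confidence of a one-unit logistic model that x belongs to the target class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def invert(weights, bias, steps=100, lr=0.5):
    """Hold the model fixed and adjust the INPUT to maximise its confidence."""
    x = [0.0] * len(weights)  # start from a blank input
    for _ in range(steps):
        p = predict(weights, bias, x)
        # For a logistic unit, the gradient of log(p) w.r.t. the input is (1 - p) * w.
        x = [min(1.0, max(0.0, xi + lr * (1.0 - p) * w))  # clip to a valid pixel range
             for xi, w in zip(x, weights)]
    return x

# Hypothetical trained parameters; in an attack these come from the target model.
weights = [2.0, -1.5, 0.5, -3.0]
bias = -1.0

start_conf = predict(weights, bias, [0.0] * len(weights))
reconstructed = invert(weights, bias)
final_conf = predict(weights, bias, reconstructed)
print(reconstructed, round(start_conf, 3), round(final_conf, 3))
```

The attacker never sees a training sample here. The reconstructed input is recovered purely from what the weights reward, which is the sense in which the model itself leaks information about the data that shaped it.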


While each attack is a complex procedure requiring close access to the model in question, both show that AI models are not the black boxes we assumed they were, and that dedicated attackers may be able to extract potentially sensitive data from them. Given how easy it is to reverse engineer someone else’s model using your own AI, gaining access to the neural network is likely to become less of a barrier in the future.

Some solutions are on the horizon, such as differential privacy, which trains models so that their outputs reflect aggregate statistical patterns rather than any individual data point, and homomorphic encryption, an emerging paradigm that allows computation directly on encrypted data. These approaches are still a long way from becoming standard practice, however, so for the time being entrusting your data to the black box of AI may not be as safe as you think, particularly as AI models spread into crucial domains such as health, banking, and defense.
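For a feel of how differential privacy works, consider its simplest form: releasing a single statistic with calibrated random noise, known as the Laplace mechanism. The sketch below is illustrative only. Applying the idea to model training is considerably more involved, and the `private_mean` helper, the age data, the bounds, and the privacy budget `epsilon` are all assumptions chosen for the example.

```python
import random

def private_mean(values, lower, upper, epsilon, rng):
    """Mean of values with Laplace noise scaled to one record's maximum influence."""
    clipped = [min(upper, max(lower, v)) for v in values]  # bound each record
    true_mean = sum(clipped) / len(clipped)
    # Replacing one record can shift the mean by at most this much
    # (assumes the number of records is fixed and public).
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two exponential draws follows a Laplace distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_mean + noise

# Hypothetical sensitive records: patient ages
ages = [34, 45, 29, 61, 50]
rng = random.Random(0)  # seeded only so the sketch is repeatable
noisy = private_mean(ages, lower=0, upper=100, epsilon=1.0, rng=rng)
print(round(noisy, 2))
```

A smaller `epsilon` means more noise and stronger privacy, at the cost of accuracy. With only five records the noise swamps the answer, which hints at why these protections are easier to apply to large aggregated datasets than to small ones. Production systems also need more careful noise sampling than this floating-point sketch provides.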
