Asif: Tell us about your journey in AI and machine learning so far. What factors influenced your decision to pursue a PhD and a career in the field of AI?
Sherin: I was initially intrigued by the field of Machine Learning (ML) and Deep Learning (DL) as it presented a world of endless possibilities, with applications in complex domains such as video/image classification, tracking, face recognition, and biomedical signal processing. I felt there was more for me to learn and discover in this area, challenging tasks to solve, and something significant to achieve. My natural curiosity and the opportunity to challenge myself were the driving forces that pushed me to explore and pursue a Ph.D. in this field. In my Ph.D., I developed novel dictionary learning and DL algorithms for classification tasks related to remote health monitoring systems (e.g., activity recognition from wearable sensors). Completing a Ph.D. does require years of hard work, but I think it was one of the best decisions I made in my life. This journey has enriched me not only in terms of knowledge in the field, but also taught me how to handle setbacks and persevere through challenges, which is important for success in industry. In my industry work experience, I have developed new ML models to improve the effectiveness of cybersecurity products and worked on leading-edge research such as eXplainable AI (XAI) and deepfakes. I feel truly grateful for the learning I have received over the years, and I am deeply passionate about XAI, ethical AI, the opportunity to combat deepfakes and digital misinformation, and ML and DL topics with applications in cybersecurity.
Asif: How does computer vision differ from human vision? What are some of the factors you take into consideration developing machine learning algorithms?
Sherin: Computer vision (CV) allows computers to “see” via pixels and interpret images digitally, incorporating pattern recognition and mimicking human vision by recognizing objects in images or videos. This is accomplished through repetition, as computers need to be fed as many images or videos as possible. Human vision, on the other hand, revolves around light, and while both human and computer vision have inherent biases, CV can be fooled more easily. Recent computer vision frameworks have been found susceptible to well-crafted input samples called “adversarial examples”: small perturbations that can easily fool DL models in the testing stage. As this susceptibility to adversarial examples becomes one of the major risks, adversarial attacks and defenses are important considerations when developing and applying deep learning in safety-critical environments.
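As an editorial aside, the adversarial perturbations mentioned here can be illustrated with a minimal numerical sketch. This is not from the interview: it is a toy version of one classic attack, the Fast Gradient Sign Method (FGSM), applied to a simple logistic-regression "model" rather than a real CV framework; all weights and numbers are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, x, y):
    # binary cross-entropy of a logistic model on a single example
    p = sigmoid(w @ x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(w, x, y, eps):
    # FGSM: step in the direction of the sign of the loss gradient w.r.t. x;
    # for logistic regression that gradient is (p - y) * w
    p = sigmoid(w @ x)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # stand-in for a trained model's weights
x = rng.normal(size=8)   # a clean input
y = 1.0                  # its true label

x_adv = fgsm(w, x, y, eps=0.3)  # bounded perturbation: each entry moves <= 0.3
```

Even though each feature moves by at most `eps`, the loss on `x_adv` is strictly higher than on `x`; scaled up to image classifiers, this is exactly the fragility described above.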
Asif: Modern times are seeing a rise in deepfakes. Can you explain the technology behind deepfakes? How do you see these emerging technologies impacting daily lives? How can we build more trust around AI?
Sherin: Synthetically generated, highly realistic altered videos, also known as “deepfakes”, continue to capture the attention of computer graphics, CV, and security researchers. Recent advances in these fields and DL have made it increasingly easier to synthesize compelling fake images, audio, and video. The possibilities of the adoption and weaponization of deepfakes are causing alarm in the digital realm.
I have researched the potential of Generative Adversarial Network (GAN)-based technologies for use in deepfake creation. GAN training incorporates a generator and a discriminator. The generator takes in an input image and the desired attribute to change, then outputs an image containing that feature or attribute. The discriminator then tries to differentiate between images produced by the generator and the authentic training examples. The generator and discriminator are trained in an alternating fashion, each attempting to optimize its performance against the other.
Ideally, the generator will converge to a point where the output images are so similar to the ground truth that a human will not be able to distinguish the two images. Thus, GANs can be used to produce “fake” images that are very close to the real input images. GAN techniques such as AttGAN, StarGAN, and STGAN are primarily partial face manipulation methods, whereas PGGAN and StyleGAN2 can be used for full-face synthesis.
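The alternating generator/discriminator training described above can be sketched end to end in a few lines. The following toy example is an editorial illustration, not code from the interview: instead of images, the "real" data is a 1-D Gaussian, the generator is linear, and the discriminator is logistic, trained with plain gradient steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# "real" data the generator must learn to imitate: samples from N(4, 1)
def real_batch(n):
    return rng.normal(4.0, 1.0, n)

a, b = 1.0, 0.0     # generator G(z) = a*z + b, with noise z ~ N(0, 1)
w, c = 0.1, 0.0     # discriminator D(x) = sigmoid(w*x + c)
lr, steps, n = 0.05, 3000, 64

for _ in range(steps):
    # discriminator step: push D(real) -> 1 and D(fake) -> 0
    xr = real_batch(n)
    xf = a * rng.normal(size=n) + b
    dr, df = sigmoid(w * xr + c), sigmoid(w * xf + c)
    w -= lr * (np.mean((dr - 1) * xr) + np.mean(df * xf))
    c -= lr * (np.mean(dr - 1) + np.mean(df))

    # generator step: push D(fake) -> 1 (non-saturating generator loss)
    z = rng.normal(size=n)
    xf = a * z + b
    df = sigmoid(w * xf + c)
    dx = (df - 1) * w          # dL/dx for L = -log D(x)
    a -= lr * np.mean(dx * z)
    b -= lr * np.mean(dx)

fake = a * rng.normal(size=10_000) + b   # samples from the trained generator
```

After training, the generator's output distribution has drifted toward the real one (mean near 4), which is the 1-D analogue of "fake images so close to the real ones that a human cannot tell them apart".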
Such sophisticated doctored videos do threaten our political, legal, and media systems. The mere existence of deepfakes undermines confidence and could destroy our trust in society. We will need to develop novel forms of consensus, new ways to regulate the use of deepfakes, and new ways to agree on social situations based on alternative verification forms of trust.
Asif: How are deepfakes currently being detected by AI researchers? How can AI researchers improve their detection methods?
Sherin: Current frameworks mainly focus on soft biometrics, CV, and DL algorithms to detect deepfakes. One of the early research directions made use of detecting eye blinking, a physiological signal that is not well reproduced in synthesized fake videos. However, sophisticated forgers can now create realistic blinking effects with post-processing and more advanced models.
Other research detects deepfakes based on inconsistent head poses and facial image warping defects. A few other works aim to improve the generalization ability of a CNN (Convolutional Neural Network) forensics model by adding an image preprocessing step before training that forces the discriminator to learn more intrinsic and generalizable features. These frameworks look for Gaussian blur, shading artifacts arising from illumination estimation, imprecise geometry estimation of the facial features, and missing reflections. CNN-, RNN-, and LSTM-based frameworks and pre-trained models such as VGG, Inception, Xception, and ResNet have also shown promising results in detecting deepfakes.
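The pre-trained-model approach mentioned here is typically transfer learning: freeze a backbone trained on a large dataset and train only a small binary real/fake head on top of its features. The sketch below is an editorial illustration of that pattern only, with a fixed random projection standing in for a real frozen backbone (such as Xception) and synthetic clusters standing in for real and fake frames.

```python
import numpy as np

rng = np.random.default_rng(2)

# frozen "backbone": a fixed random projection + ReLU standing in for a real
# pre-trained feature extractor; its weights are never updated during training
W_backbone = rng.normal(size=(32, 64)) / np.sqrt(32)
def features(x):
    return np.maximum(0.0, x @ W_backbone)

# synthetic stand-ins for real vs. deepfake frames: two shifted Gaussian clusters
n = 400
x_real = rng.normal(0.0, 1.0, (n, 32))
x_fake = rng.normal(0.6, 1.0, (n, 32))
X = features(np.vstack([x_real, x_fake]))
y = np.concatenate([np.zeros(n), np.ones(n)])   # 0 = real, 1 = fake

# train only the new binary head (logistic regression) on the frozen features
w, bias, lr = np.zeros(64), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + bias)))
    g = p - y
    w -= lr * (X.T @ g) / len(y)
    bias -= lr * g.mean()

p = 1.0 / (1.0 + np.exp(-(X @ w + bias)))
acc = float(np.mean((p > 0.5) == (y == 1)))   # training accuracy of the head
```

Because only the small head is trained, the same recipe transfers cheaply to new forgery types, which is why pre-trained backbones are popular in deepfake forensics.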
With the recent release of large scale deepfake datasets with additional annotations, I hope this research will continue to advance and help alleviate the problem. Hopefully, this research will lead to larger high-quality deepfakes datasets in the future.
As a future area of research, it would be interesting to make use of transfer learning to further generalize current models, as well as multi-modal techniques. Another potential area to consider is localization of manipulated pixels in GAN-generated fake images; better localization methods might generalize to unseen GANs by exploiting the imperfections of their upsampling methods. With advanced GANs such as AttGAN, StarGAN, and StyleGAN2, it would also be helpful to visualize the fake texture in each image and classify images according to the GAN and upsampling method used.
Asif: What is explainable AI (XAI)? Tell us how it works in the context of malware.
Sherin: The cybersecurity industry leverages ML and DL techniques to combat ever evolving cyber threats such as malware. While ML and DL models have become increasingly important for decision-making and are making impossible feats possible, these models are, in essence, a “black-box” as the process that models use to make predictions can be hard for humans to understand. XAI proposes the industry make a shift towards more transparent AI by creating a suite of techniques that produce more explainable models whilst maintaining high performance levels. XAI allows DL models to be more transparent by providing explanations of their decisions and allowing users, customers, and stakeholders to gain insight into the system’s models and decisions. These explanations are important to ensure algorithmic fairness, transparency, and privacy, as well as to identify unconscious bias, data drift, model decay, and potential problems in the training data. XAI also ensures that the algorithms perform as expected. With XAI, domain specialists, analysts, and customers can understand and analyze actions and predictions, even of the most complex neural network architectures. The challenge with XAI lies in balancing the benefits with exposure of any feature-based confidential and intellectual property; as XAI improves, the demarcation lines may get blurred.
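One simple, model-agnostic way to produce the kind of explanation described above is permutation feature importance: shuffle one input feature at a time and measure how much the model's accuracy drops. The sketch below is an editorial illustration only; the "malware classifier", its features, and the data are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy "malware classifier" data: the label is driven almost entirely by
# feature 0 (imagine "count of suspicious API calls"); purely illustrative
n = 500
X = rng.normal(size=(n, 5))
y = (3.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0.0, 0.1, n) > 0).astype(float)

def model(X):
    # stand-in for a trained black-box model we want to explain
    return (3.0 * X[:, 0] + 0.1 * X[:, 1] > 0).astype(float)

def permutation_importance(model, X, y, n_repeats=10):
    # importance of feature j = average drop in accuracy when column j is shuffled
    base = np.mean(model(X) == y)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            drops.append(base - np.mean(model(Xp) == y))
        imp[j] = np.mean(drops)
    return imp

imp = permutation_importance(model, X, y)
```

An analyst reading `imp` can see which feature the black-box model actually relies on, without any access to its internals; note that this is also the point where explanations can start to leak feature-level intellectual property, as discussed above.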
Asif: Can you name some books, courses, or other resources that have influenced your thoughts the most?
Sherin: The books Computer Vision by Richard Szeliski and Pattern Recognition and Machine Learning by Christopher Bishop are great starting points for learning fundamental concepts. These have helped me gain a comprehensive understanding of the disciplines of CV and ML. The pattern recognition field has undergone substantial development over the years. In addition to recent papers, books such as The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman, and Convex Optimization by Stephen Boyd are also helpful in understanding and supporting research on ML and related fields. If anyone prefers learning about specialized topics in CV such as 3D CV and methods for inferring geometry from multiple images, the books Multiple View Geometry in Computer Vision by Richard Hartley and Andrew Zisserman and Introductory Techniques for 3-D Computer Vision by Emanuele Trucco and Alessandro Verri will come in handy.
Asif: What advice would you give to machine learning students who want to jump into the industry?
Sherin: ML and DL form the crux of AI. In addition to the books above, I would highly recommend taking a formal course in statistics, ML, and DL. There are more advanced math courses on optimization techniques, which are very good if you are interested in specializing in that area. One can also pursue online certifications, boot camps, and massive open online courses (e.g., Coursera, edX, Udacity) on specialized topics.
Once the fundamentals are obtained, start practicing on open-source datasets, IEEE contests, and competitions to gain some practical feedback and experience. Additionally, try to attend local meetups or academic conferences. This will help not only with staying up to date with the latest research, but also provide an opportunity to meet more experienced folks in the area. Most importantly, it’s essential to understand that one needs to constantly keep learning, as learning and innovation go hand in hand.
Asif: What are your views about MarkTechPost.com?
Sherin: MarkTechPost is a great information resource and community for both aspiring and experienced data science professionals. It provides the latest research updates in the areas of ML, DL, and data science, and has great material in its AI paper summary and university research articles. With free tutorials on AI and video lectures, I think it will be a helpful resource for many aspiring data scientists as well. I hope the community keeps growing towards building and spreading awareness of next-gen data science ecosystems.
AttGAN, PGGAN, and STGAN stand for Attribute GAN, Progressive GAN, and Selective Transfer GAN. RNN and LSTM correspond to recurrent neural network and long short-term memory, respectively. Both fall under the class of artificial neural networks used in the field of deep learning.
Asif Razzaq is an AI Journalist and Cofounder of Marktechpost, LLC. He is a visionary, entrepreneur and engineer who aspires to use the power of Artificial Intelligence for good.
Asif's latest venture is the development of an Artificial Intelligence Media Platform (Marktechpost) that will revolutionize how people can find relevant news related to Artificial Intelligence, Data Science and Machine Learning.
Asif was featured by Onalytica in its ‘Who’s Who in AI? (Influential Voices & Brands)’ as one of the 'Influential Journalists in AI' (https://onalytica.com/wp-content/uploads/2021/09/Whos-Who-In-AI.pdf). His interview was also featured by Onalytica (https://onalytica.com/blog/posts/interview-with-asif-razzaq/).