Check Out How These Illinois Tech AI Researchers Extracted Personal Information From Anonymous Cell Phone Data Using Machine Learning

Many people who regularly use social media, cell phones, home security cameras, and location trackers don’t realize how much data they generate. They have no idea that others can use machine learning (ML) techniques to recover personal details from that information. As a result, even though most people value their online privacy to varying degrees, users are always at risk of having their anonymized information de-anonymized by ML algorithms.

Data security concerns have been raised after a team of researchers from the Illinois Institute of Technology used machine learning and artificial intelligence algorithms to extract personally identifiable information from otherwise anonymous cell phone data, including sensitive characteristics like age and gender. 

Although information such as one’s age and gender may appear harmless at first glance, it is often exploited for malicious purposes. A variety of regulations exist to safeguard minors, and they are broken when someone with malicious intent targets young children for any reason, from sales to sexual exploitation. On the other end of the age spectrum, the elderly are frequently targeted by sophisticated spam and phishing campaigns due to their vulnerability and access to savings.

By analyzing data from a Latin American cell phone provider, the researchers could readily determine customers’ gender and age based on their text messages. The team’s neural network model estimated gender with 67% accuracy, considerably outperforming state-of-the-art methods including decision trees, random forests, and gradient boosting models. The same model was also 78% accurate in estimating users’ ages.
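To make this kind of inference concrete, here is a minimal sketch of how a model can learn to guess a binary attribute from weak behavioral signals. It is not the researchers' model or data: the two features (message volume and night-time activity), the synthetic users, and all function names are assumptions made up for illustration, and a single logistic unit stands in for a full neural network.

```python
import math
import random


def make_synthetic_users(n, rng):
    """Generate hypothetical per-user features and a binary label.

    Both features and their relation to the label are invented for
    illustration; they are not taken from the study.
    """
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # The two groups differ only slightly on average and overlap heavily.
        msgs = rng.gauss(1.0 if label else 0.0, 1.5)
        night = rng.gauss(0.5 if label else 0.0, 1.5)
        rows.append(([msgs, night], label))
    return rows


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability that the label is 1 for feature vector x."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def train(rows, epochs=200, lr=0.1):
    """Fit a single logistic unit with batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for x, y in rows:
            err = predict(x, w, b) - y  # gradient of the log loss w.r.t. z
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def accuracy(rows, w, b):
    """Fraction of rows whose thresholded prediction matches the label."""
    correct = sum(1 for x, y in rows if (predict(x, w, b) >= 0.5) == bool(y))
    return correct / len(rows)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_rows = make_synthetic_users(2000, rng)
test_rows = make_synthetic_users(1000, rng)
w, b = train(train_rows)
test_accuracy = accuracy(test_rows, w, b)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Even with heavily overlapping groups, the held-out accuracy in this toy setup lands clearly above the 50% expected from guessing, which illustrates why individually unremarkable usage signals can still leak sensitive attributes.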

The inference required only standard computer hardware and software. The neural network model was executed on a Linux (Fedora) computer with 16 GB of RAM and an Intel i5-6200U processor with four processing cores.

The researchers caution that such attacks are not unlikely. Although the dataset used in the study was not made available to the general public, they note that an adversary could amass a comparable dataset by intercepting communications at open Wi-Fi hotspots or hacking service providers’ data centers.

The purpose of this work is to open a discussion about how recent developments in AI and machine learning have affected privacy laws. Since the United States lacks comprehensive privacy laws, the study’s authors examined how these methods undermine the EU’s General Data Protection Regulation provisions, which safeguard European Union consumers from privacy invasions.

There is no avoiding the inevitable rise of machine learning and automated decision-making in the commercial world. The challenge is finding the right legislative framework to safeguard personal information while protecting social and commercial interests from fraud.

That can be done, for example, by giving users the choice of whether to share their data upon installation of an app (the “opt-out option”).

The researchers’ recommendations include updating existing non-compliance measures and encouraging the use of synthetic data, rather than observations of real users, for machine learning models. In their view, generative adversarial networks (GANs) can be used to generate anonymous synthetic data. They also encourage data holders to collaborate with machine learning specialists to develop best practices. In short, much more effort is required to fill the policy voids and examine the ethics of AI.

This article is written as a research summary by Marktechpost Staff based on the research paper 'Predicting age and gender from network telemetry: Implications for privacy and impact on policy'. All credit for this research goes to the researchers on this project. Check out the paper and reference article.

Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a keen interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advancements in technology and their real-life applications.