Social protection systems provide crucial assistance during crises, increase productivity, and protect vulnerable populations. With the COVID-19 pandemic, global extreme poverty rose for the first time in two decades, making the need for effective social protection programs more urgent than ever. However, identifying eligible households in low- and middle-income countries is a significant challenge, because traditional administrative data, such as tax records, is often unavailable where a large proportion of workers are informal.
This paper, by researchers from UC Berkeley and the World Bank, shows that applying machine learning to non-traditional administrative data, such as call detail records (CDRs) from a large mobile phone operator in Afghanistan, is a promising way to identify ultra-poor households for the government’s anti-poverty program. CDRs contain information on phone numbers, communication patterns, contact networks, and recharge patterns, among others.
The paper evaluates and compares three methods for identifying ultra-poor households: a supervised machine learning model trained on CDR data, an asset-based wealth index, and a consumption metric, which is commonly used as a proxy for poverty in low- and middle-income countries. The supervised model was trained on 797 behavioral indicators computed from CDR data, covering communication patterns, contact networks, spatial patterns, and recharge patterns. It used gradient boosting, which outperformed other common machine learning algorithms. The paper also examines a combined method that uses logistic regression to classify households as ultra-poor or non-ultra-poor from the outputs of all three methods. To assess the accuracy of each method, the study used ROC and precision-recall curves and calculated the standard deviation of the accuracy metrics over 1,000 bootstrapped samples.
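The bootstrap step can be illustrated with a minimal sketch. It assumes binary labels (1 = ultra-poor) and one score per household, where a higher score means "more likely ultra-poor". The function names `auc` and `bootstrap_auc_std` and the synthetic data are hypothetical and not taken from the paper, whose exact resampling procedure may differ.

```python
import random
import statistics


def auc(labels, scores):
    """Area under the ROC curve, computed as the share of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_std(labels, scores, n_boot=1000, seed=0):
    """Resample households with replacement n_boot times and return the
    mean and standard deviation of the resulting AUC estimates."""
    auc(labels, scores)  # fails early if one class is missing entirely
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined for a one-class resample; redraw
        estimates.append(auc(ys, [scores[i] for i in idx]))
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    # Synthetic data for illustration only (not the paper's data).
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.3 else 0 for _ in range(120)]
    scores = [rng.gauss(0.5 * y, 1.0) for y in labels]
    mean_auc, sd_auc = bootstrap_auc_std(labels, scores, n_boot=1000)
    print(f"AUC = {mean_auc:.2f} (bootstrap SD {sd_auc:.2f})")
```

The pairwise AUC is quadratic in the number of households, which is acceptable for a sketch; a rank-based implementation would be preferable on a full survey sample.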
The accuracy of the CDR-based method in identifying ultra-poor households was found to be comparable to the other two methods, achieving a precision and recall of 42% (compared to 49% for the asset-based method and 45% for the consumption-based method). The trade-off between false positives and false negatives was evaluated using ROC curves, and the Area Under the Curve (AUC) scores were also found to be comparable among the methods, with the asset-based method slightly outperforming the consumption-based and CDR-based methods (AUC=0.73, 0.71, and 0.68, respectively).
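A single figure for both precision and recall is what results when the number of households selected equals the number that are truly ultra-poor, because false positives and false negatives are then equal in count. Whether the paper fixes its quota exactly this way is an assumption here. The sketch below uses a hypothetical function name and toy data, and only illustrates the arithmetic.

```python
def precision_recall_at_quota(labels, scores, quota=None):
    """Select the `quota` highest-scoring households and return
    (precision, recall). Labels are 1 for ultra-poor, 0 otherwise;
    a higher score means "more likely ultra-poor" (an assumption).
    By default the quota equals the number of truly ultra-poor households."""
    n_poor = sum(labels)
    if n_poor == 0:
        raise ValueError("labels contain no ultra-poor households")
    if quota is None:
        quota = n_poor
    if not 0 < quota <= len(labels):
        raise ValueError("quota must be between 1 and the number of households")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = sum(labels[i] for i in ranked[:quota])
    return true_pos / quota, true_pos / n_poor


# Toy data for illustration only.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.4]

precision, recall = precision_recall_at_quota(labels, scores)
print(precision == recall)  # quota equals the number of ultra-poor households
```

With any other quota the two metrics generally diverge, which is why full precision-recall curves are needed to compare methods across thresholds.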
The combined method, which used logistic regression to classify ultra-poor and non-ultra-poor households from all three data sources, showed the most promising results, with an AUC of 0.78. It outperformed every method that used only one or two of the data sources. However, because collecting consumption data for large populations is impractical, a combined method using only CDR and asset data might be the most feasible option (AUC=0.76).
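The idea of combining several poverty signals with logistic regression can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the three inputs are synthetic stand-ins for a CDR-model score, an asset index, and a consumption measure, each oriented so that higher means poorer, and the helper names (`make_toy_data`, `fit_logistic`, `log_loss`) are hypothetical. The model is fitted with plain gradient descent to keep the example dependency-free.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, row):
    """Predicted probability that a household is ultra-poor."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def log_loss(weights, bias, X, y):
    """Mean negative log-likelihood of the labels under the model."""
    eps = 1e-12
    total = 0.0
    for row, label in zip(X, y):
        p = predict(weights, bias, row)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(y)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = predict(weights, bias, row) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def make_toy_data(n=300, seed=0):
    """Synthetic households: three noisy signals, each shifted upward
    for ultra-poor households. Illustration only."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        poor = 1 if rng.random() < 0.25 else 0
        X.append([rng.gauss(0.6 * poor, 1.0) for _ in range(3)])
        y.append(poor)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    weights, bias = fit_logistic(X, y)
    print("weights (CDR score, asset index, consumption):", weights)
    print("training log loss:", log_loss(weights, bias, X, y))
```

Dropping the third column of `X` gives the analogous two-source model using only the CDR and asset signals. A real evaluation would score held-out households rather than the training set.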
Another important advantage of CDR-based targeting is that it can reduce both the time and the marginal cost of implementing a targeted program compared with methods currently in use (proxy-means tests, community-based targeting, or consumption-based targeting). For example, community-based targeting and proxy-means tests are estimated to add $276k and $503k, respectively, corresponding to 2.18% and 3.97% of the total program budget, while the marginal cost of household screening with CDRs is negligible.
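As a quick consistency check on these figures, dividing each cost by its stated budget share gives the total program budget it implies. The total is not stated above; it is derived here from the quoted numbers only, and the variable names are illustrative.

```python
# Costs and budget shares as quoted in the text above.
costs_usd = {"community-based targeting": 276_000, "proxy-means test": 503_000}
budget_share = {"community-based targeting": 0.0218, "proxy-means test": 0.0397}

# Implied total budget = cost / share of the total budget.
implied_budget = {name: costs_usd[name] / budget_share[name] for name in costs_usd}

for name, total in implied_budget.items():
    print(f"{name}: implied total budget of about ${total / 1e6:.1f}M")
```

Both figures point to a total program budget of roughly $12.7 million, so the two cost estimates are mutually consistent.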
The use of CDR data for targeting raises ethical concerns and has limitations that must be taken into account. Firstly, access to phone data is necessary, and targeting accuracy will suffer if data is unavailable for some segments of the population (for example, people without a phone, or subscribers of a provider that does not permit access to its data). Secondly, CDR-based targeting involves accessing sensitive and private information, including phone numbers and location traces, which requires informed consent and clear privacy standards that do not exist today. One potential way to decrease privacy risks is data minimization, which restricts models to the features that pose the least risk to privacy, but this would reduce targeting accuracy. Lastly, using CDRs to determine program eligibility may create incentives for strategic behavior by individuals who want to manipulate the system, such as refraining from using their phones. Although complex machine learning algorithms may reduce the scope for manipulation, society often demands transparency in algorithmic decision-making, because black-box decisions are difficult to audit or hold to account.
In conclusion, the integration of machine learning with CDR data has the potential to revolutionize the targeting of economic interventions or aid programs by reducing costs and complementing existing survey-based methods. However, practical and ethical concerns must be considered, such as access to data, privacy issues, and potential data manipulation. It is essential to weigh these constraints against the potential benefits of CDR-based targeting in each specific context. As machine learning continues to evolve and shape the world, it is crucial to approach its applications thoughtfully and responsibly, ensuring that they align with ethical standards and prioritize the well-being of individuals and communities.
Nathalie Crevoisier holds a Bachelor's and Master's degree in Physics from Imperial College London. She spent a year studying Applied Data Science, Machine Learning, and Internet Analytics at the Ecole Polytechnique Federale de Lausanne (EPFL) as part of her degree. During her studies, she developed a keen interest in AI, which led her to join Meta (formerly Facebook) as a Data Scientist after graduating. During her four-year tenure at the company, Nathalie worked on various teams, including Ads, Integrity, and Workplace, applying cutting-edge data science and ML tools to solve complex problems affecting billions of users. Seeking more independence and time to stay up-to-date with the latest AI discoveries, she recently decided to transition to a freelance career.