Machine Learning for the Developing World using Mobile Communication Metadata

Seminar | May 17 | 3:10-5 p.m. | 107 South Hall

 Muhammad Raza Khan

 Information, School of

A report on Ph.D. dissertation research

Researchers working on problems in the developed world generally have access to rich and diverse datasets, such as social media activity and sensor data. The same is not true of the developing world, where access to comprehensive datasets is one of the most significant obstacles to research. Social networks and digital sensors are not yet common in the developing world, with one big exception: mobile phones.

More than 95% of the world's population today has mobile phone coverage, and even in some of the most under-developed places on earth, mobile phone penetration is much higher than other measures of human development, such as literacy or access to financial infrastructure. As a result, researchers have increasingly used the metadata collected by mobile phone companies in these developing countries as an alternative to more conventional data sources. However, mobile phone data in its raw form may not be well suited to machine learning algorithms. In other words, there is a need for algorithms that convert raw mobile communication metadata into features suited to machine learning.

In this talk, I will describe my work on extracting features from mobile communication logs using techniques like Deterministic Finite Automata (DFA). I will also show how this approach outperforms other methods on problems like product adoption. I will further show that by using DFA-based features and spectral analysis of the multi-view nature of mobile communication networks, advanced neural network algorithms can be developed that beat the current state-of-the-art methods on problems like poverty prediction and gender prediction. In the last part of the talk, I will describe the value of communication network data for research questions in social network analysis, such as: what are the salient differences between the behavioral patterns of men and women in the developing world, as exhibited in communication network data?