Visual computational sociology: three words you might not expect to see together, because they represent very different subjects. Yet Timnit Gebru, an AI researcher at Microsoft who recently received her PhD from Stanford, has studied and written about not only how these fields intersect but also the important implications of that intersection.
In an April 18 MIT IDE seminar on the Methods and Challenges of Visual Computational Sociology, Gebru explained that her main research interest is in data mining of large-scale, publicly available digital images to gain sociological insights. She also studies computer vision problems and challenges, and the state of fine-grained image recognition, scalable annotation of images, and domain adaptation. Along the way, she has made some interesting observations about data set bias and ethics.
For her research, Gebru used American Community Survey census data and Google Street View images to understand more about the people who live in urban environments. The goal was to determine the accuracy of image recognition and machine-learning algorithms.
“Our approach was to view objects in 200 U.S. cities using Google Street images, sampled every 25 meters,” Gebru said, to see how well those objects could predict political leanings, wealth, and other demographic information about residents. Overall, the team collected 50 million Google Street View images from the 200 cities. The images were used to make inferences, such as detecting and classifying automobiles. To train the models, the team used a subset of those 50 million images, in conjunction with another set of automobile images collected from craigslist, cars.com, and edmunds.com. The findings could also be used for marketing purposes, for political polls, or to gauge population trends.
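To make the "sampled every 25 meters" idea concrete, here is a minimal sketch of evenly spaced sampling along a street. It is not the team's code: the function name `sample_points`, the representation of a street as a list of planar (x, y) coordinates in meters, and the choice to carry leftover distance across corners are all assumptions made for illustration.

```python
import math


def sample_points(path, spacing_m=25.0):
    """Return points spaced `spacing_m` apart along a polyline.

    `path` is a list of (x, y) coordinates in meters. This planar
    representation is an assumption for illustration; real street
    data would use latitude and longitude.
    """
    if not path:
        return []
    samples = [path[0]]  # always sample the starting point
    dist_to_next = spacing_m  # distance left to walk before the next sample
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already walked along this segment
        while seg_len - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg_len  # fraction of the segment covered so far
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = spacing_m
        # carry the unused remainder of this segment over to the next one
        dist_to_next -= seg_len - pos
    return samples


# Hypothetical 100-meter straight street: five sample locations.
street = [(0.0, 0.0), (100.0, 0.0)]
print(len(sample_points(street)))
```

Each returned point would correspond to one location at which a street-level image is requested. Carrying the leftover distance across corners keeps the spacing uniform when measured along the street rather than restarting at every bend.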
As noted in The Economist, the system has limitations: “Unlike a census, it generates predictions, not facts, and the more fine-grained those predictions are the less certain they become.”
Gebru said that one conclusion of the work is that while error rates for image recognition have improved greatly in the last decade, the technology is still not perfect for fine-grained data and species. Furthermore, the work has led her to become “immensely interested” and active in the area of data set bias and ethics.
“Data set bias is a primary concern,” she said. Data mining can be used for good, such as finding and mitigating pollution sources, but technologies like face recognition can also be employed for police surveillance that might threaten privacy rights, or be misused in criminal cases. “There is a tendency to put too much trust in algorithms,” Gebru said. “They are not unbiased and they are unregulated.”
Gebru currently works in the Fairness, Accountability, Transparency, and Ethics (FATE) group at Microsoft Research, New York, where these issues are addressed. She is studying the ethical considerations underlying data mining projects, as well as methods of auditing and mitigating bias in sociotechnical systems.