This past semester I was involved in an interesting course at MIT’s Sloan School of Management – Analytics Lab (A-Lab). A-Lab’s objective is to teach students how to use data sets and analytics to address real-world business problems. Companies submit project proposals prior to the start of the class, including the business problem to be addressed and the data on which the project will be based. Students are then matched with the project they’re most interested in and grouped into teams of three to four students.
A-Lab received over 20 project proposals from different companies, of which 13 were selected by the students. Each project team was assigned a research mentor to provide guidance as appropriate. I mentored a three-student team that worked on a project sponsored by MasterCard. The students explored the possibility of improving on predictions of the economic performance of emerging markets by coupling existing economic indicators data with consumer behavior based on MasterCard’s transaction data. This is a particularly interesting project because economic data in emerging markets is often not as reliable as the data in more advanced markets.
But more important for the students, the various A-Lab projects served as a concrete learning experience on what data science is all about – how to leverage messy, incomplete, real-world data to shed light on a complex and not-so-well-defined problem.
A-Lab is taught by professors Erik Brynjolfsson and Sinan Aral. The course was first given in 2014, so this is only its second year. The 2015 Syllabus offers a good overview of the class, including the various companies that submitted project proposals. Projects are considered confidential unless the companies involved give permission to talk about them publicly, as several did in 2014.
Amazon, for example, sponsored a project on how to raise the share of wallet of Amazon Prime customers, based on the analysis of over 200 million anonymized data points. And IBM sponsored a project to uncover a potential Watson application. The students recommended using Watson as a kind of regulatory analyst’s assistant, to help financial institutions better understand how to comply with the more than 1,700 pages of regulations in the Dodd-Frank legislation.
The 2015 A-Lab class culminated with short presentations of each of the student projects before an audience that included company sponsors and mentors. As I listened to the 13 presentations, I was impressed by the potential of applying big data and analytics to all kinds of business problems, from the tactical decisions every business makes as part of its normal operations to the strategic decisions it must also make to compete better in a fast-changing marketplace. But, at the same time, the presentations once more reminded me that we’re still in the early stages of data science as a profession and academic discipline. We still have much to learn.
In recent years, data science has emerged as a hot new profession. A 2012 Harvard Business Review article called data scientist “the sexiest job of the 21st century” in its very title. Its authors, Tom Davenport and D. J. Patil, defined a data scientist as “a high-ranking professional with the training and curiosity to make discoveries in the world of big data … Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before.”
The demand for data scientists has been racing ahead of supply. People with the necessary skills are scarce, primarily because the discipline is so new. Universities only started offering courses and advanced degrees in data science in the past five years.
Furthermore, data science is a highly complex discipline, a veritable mashup of several different fields. The data part of its name refers to acquiring, ingesting, transforming, storing and retrieving vast volumes and varieties of information, whereas the science part seeks to extract insights from the data by applying tried-and-true scientific methods, that is, empirical and measurable evidence subject to testable explanations and predictions.
It’s very exciting to contemplate the emergence of a major new discipline. It reminds me of the advent of computer science in the 1960s and 1970s. In its early years, the field attracted people from a variety of other disciplines who started out using computers in their work or studies, and eventually switched to computer science from their original field.
This was the case with me. I used computers extensively while a physics student at the University of Chicago in the 1960s. When the time came to look for a job, I realized that I had enjoyed the computing side of my research more than the physics, and in 1970 joined the computer science department at IBM’s Watson Research Center.
Like data science, computer science also had its roots in a number of disciplines, primarily math for its more theoretical foundations, and engineering and business for its more applied aspects. Computer science became an established, widely accepted discipline around the mid-to-late 1970s, and expanded in multiple directions over the next decades with the advent of personal computers in the 1980s and the Internet in the 1990s. Computing has become an integral part of many disciplines, given that digital technologies now permeate just about every nook and cranny of business and society.
As has been the case with IT over the past several decades, we’re now seeing the growing value of capturing as data many aspects of business, society, and our very lives that have not been quantified before.
Data is now being generated by just about everything and everybody around us, including web searches, social media interactions, financial transactions, mobile devices and IoT smart sensors. All this data is enabling us to better understand the world’s physical, economic and social infrastructures, and to infuse information-based intelligence into every aspect of their operations.
It’s making it possible not just to better understand what’s happening in the present, but also to make more accurate predictions about the future. One of the most exciting parts of data science is that it can be applied to many domains of knowledge, given our newfound ability to gather valuable data on almost any topic. But doing so effectively requires domain expertise to identify the important problems to solve in a given area, the kinds of questions we should be asking and the kinds of answers we should be looking for, as well as how best to present whatever insights are discovered so they can be understood by domain practitioners in their own terms. We’re just learning how to do this.
“The growth in big data and analytics is transforming decision-making, operations, marketing, finance, and product innovation,” notes the A-Lab Syllabus as it describes the objectives of the course. “Businesses across the world are wrestling with challenges and opportunities that call for the application of analytics. We are on the cusp of a second machine age – a digital era that holds opportunities and challenges for both individuals and the economy. Workers and professionals in all fields are racing to acquire the skills and capabilities necessary to survive and thrive in this digital revolution.”
It will be fascinating to see where this will lead us as the discipline matures over the next few decades.
This blog first appeared Dec. 29, 2015, here.