Analytics Lab

Analytics Lab: Action Learning Seminar on Analytics, Machine Learning and the Digital Economy (15.572, Fall 2017, 9 Units)

In the MIT Analytics Lab (A-Lab), student teams select and deliver a project that uses analytics, machine learning, or other digital technologies to solve a business problem.

The course, which runs each fall semester, is spearheaded by the MIT Initiative on the Digital Economy (IDE) and is part of MIT Sloan School of Management’s suite of Action Learning offerings. The course is led by IDE faculty Erik Brynjolfsson and Sinan Aral.

In its first three years, A-Lab attracted a total of 150 students from a dozen MIT departments to work on more than forty projects spanning IoT, digital technology, platforms, finance, e-commerce, retail, manufacturing, medical supply chains, workplace safety, and global health.

Some projects are tightly focused on dilemmas organizations currently face, which requires students to quickly understand particular business circumstances and domains before performing their descriptive, predictive, or causal analysis. Other projects are more open-ended, and students must think entrepreneurially about how to bring new value to existing data and suggest frontiers for future business opportunity.

For Students  

The initial round of student applications for Fall 2017 has closed, but we are considering submissions on a rolling basis.

Apply for A-Lab 2017.

Admission is by application only. Applications are evaluated on coursework or experience in analytics, statistics, computer science, management, and economics, and on relevant learning, experience, and motivation toward data analytic work, with extra weight given to data analytic courses taken and to data analytic project and job experience. Attention is also given to admitting a mix of students with technical and computational experience, managerial experience, experience implementing analytical models, and entrepreneurial work using analytics. No bidding is necessary.

Fall 2017 Schedule: Thursdays 4:00-5:30pm, plus an extended project pitch session on September 21 and a final presentation session on December 8. The course is not open to listeners, and in-person attendance at all sessions is mandatory.

For questions about the course or application, please contact Susan Young.

View 2016 syllabus and posters of select projects.

For Project Sponsors

We encourage interested organizations to take advantage of this opportunity and join in what has proven to be one of the most popular courses among students pursuing careers in data science at MIT.

In building a portfolio of proposals for our students to select from, we are looking for projects that span a variety of industries and sectors, address a diversity of problem types, require in-depth analysis, and have clean, rich data available at the outset of the project. We also seek projects that closely match our students’ capabilities and their variety of skills, experience, and interests.

Organizations are invited to provide their data, time, and insights to enable student teams to deliver actionable solutions and impactful findings. From the proposer’s perspective, the project should have high business relevance and value, but should not be thought of as a consulting engagement. 

There is no fee for proposing a project or participating in the course as a project sponsor, but project sponsors are responsible for covering any project-related expenses.

Fall 2017 Timeline: Preliminary proposals are due June 30. Final proposals are due August 23.

For questions about the project proposal process, please contact Susan Young.

Example Projects:

The “Myth of the Crystal Ball”: Understanding Forecasting Errors at Amazon (Amazon)

  • Challenge: Help Amazon quantify the impact of supply chain forecasting errors to better prioritize forecast improvements in the future
  • Data: 75 million rows containing daily demand and forecast data for 206 thousand products over two weeks
  • Analysis: Defined different kinds of costs associated with forecasting errors and their magnitudes. Used statistical methods in R running on a cloud computing system to quantify lost profit due to forecast error
  • Recommendation: Incorporate indirect costs into the evaluation of forecasting errors. Look for variation across product categories
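
As a rough illustration of the kind of calculation described above, the Python sketch below assigns a cost to each daily forecast error: under-forecasts are charged a lost-margin cost and over-forecasts a holding cost. This is not the team's method (their analysis was done in R); the record fields, the two cost rates, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DailyRecord:
    product_id: str
    forecast: float  # units forecast for the day (hypothetical field)
    demand: float    # units actually demanded (hypothetical field)


def forecast_error_cost(rec, unit_margin, unit_holding_cost):
    """Cost of one day's forecast error under an assumed asymmetric cost model."""
    error = rec.forecast - rec.demand
    if error < 0:
        # Under-forecast: assume each missing unit is a lost sale.
        return -error * unit_margin
    # Over-forecast (or exact): assume each surplus unit incurs a holding cost.
    return error * unit_holding_cost


def total_cost_by_product(records, unit_margin, unit_holding_cost):
    """Sum the forecast-error cost per product."""
    totals = {}
    for rec in records:
        cost = forecast_error_cost(rec, unit_margin, unit_holding_cost)
        totals[rec.product_id] = totals.get(rec.product_id, 0.0) + cost
    return totals


# Made-up sample data, for illustration only.
records = [
    DailyRecord("A", forecast=100, demand=120),  # under-forecast by 20 units
    DailyRecord("A", forecast=100, demand=90),   # over-forecast by 10 units
    DailyRecord("B", forecast=50, demand=50),    # exact forecast
]
costs = total_cost_by_product(records, unit_margin=5.0, unit_holding_cost=1.0)
print(costs)
```

Indirect costs, which the team recommended including, could be added as further terms in `forecast_error_cost`; grouping by product category instead of product would show the variation across categories.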

Understanding Successful eBay Sale Prices (eBay)

  • Challenge: Find the factors that best predict successful prices for new and used eBay items in different categories and under a variety of sales conditions
  • Data: 3 months of sales data, totaling over 147 million separate transactions (about 24 GB with some preprocessing required)
  • Analysis: Using machine learning and a “bag-of-words” model, looked into inclusion of special characters and its effect on price, drivers of the difference in prices between new and used items, and price differences between auctions and Buy-It-Now goods
  • Recommendations for further analysis: Define a “feature space” for different goods on eBay, perform seller network analysis, and use timing to better predict prices
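
For readers unfamiliar with the term, a “bag-of-words” model represents each text as a vector of word counts, ignoring word order. The Python sketch below shows the idea on two made-up listing titles, along with a simple special-character flag of the sort that could be used as an additional feature. The function names and sample titles are hypothetical, and this is not the team's actual pipeline.

```python
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def bag_of_words(titles):
    """Return (vocabulary, count vectors), one vector per title."""
    vocab = sorted({tok for t in titles for tok in tokenize(t)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for t in titles:
        row = [0] * len(vocab)
        for tok, n in Counter(tokenize(t)).items():
            row[index[tok]] = n
        vectors.append(row)
    return vocab, vectors


def has_special_characters(title):
    """True if the title contains anything other than letters, digits, or whitespace."""
    return bool(re.search(r"[^A-Za-z0-9\s]", title))


# Made-up listing titles, for illustration only.
titles = ["NEW iPhone 6 case!!!", "used iphone 6"]
vocab, vectors = bag_of_words(titles)
flags = [has_special_characters(t) for t in titles]
print(vocab)
print(vectors)
print(flags)
```

The count vectors and the flag could then be fed, together with the sale price, into any regression or classification model.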

Predicting Hospital Readmission (Dell Services)

  • Challenge: Use analytics to find the factors that best predict 30-day hospital readmission
  • Data: 1,500 patient admissions at one US hospital, with 26 fields describing each case
  • Analysis: Generated additional features, then used logistic regression, support vector machines, and classification trees to predict readmission
  • Recommendation: Expand analysis to more hospitals and incorporate data from new sources (e.g. wearables) to help reduce readmission risk

Predictive Maintenance in the Elevator and Escalator Industry (Schindler)

  • Challenge: Help Schindler use predictive analytics to revise its maintenance strategy and better perform preventative intervention
  • Data: 1,000 elevator-specific files describing elevator operation and maintenance needs
  • Analysis: Used regression techniques to predict potential need for future maintenance and likelihood of service trips for different elevator codes
  • Recommendation: Determine the appropriate priority for elevator maintenance given limited resources. Error codes can be predicted, but efficient allocation of resources is potentially more important

Using Geospatial Data to Develop a New Kind of Football Analytics (Telemetry Sports)

  • Challenge: Use a new source of geospatial NFL data to classify plays, evaluate players, and design football strategy
  • Data: Real NFL game data from selected Indianapolis Colts plays, as well as over 10,000 simulated football plays from EA’s Madden NFL game
  • Analysis: The team used machine learning and regression techniques to identify player positions on the field, isolate player routes in game, classify plays, calculate new measures of “player elusiveness”, and project expected yardage per play
  • Recommendation: Geospatial data offers significant opportunities for evaluating success in sports. This type of analysis would be particularly useful for optimal play selection

 

2014-2016 Project Sponsors

Amazon

Boston Public Schools

BuyerZone

Capgemini

Capital One

Center for Digital Business

Christian Science Monitor

Dell Services

eBay

Evercore ISI

Fusion

Gates Foundation

GE Transportation

Graduate Management Admission Council

Green Cargo

HG Data

Interasia Lines

IBM Watson

Intursa

ISN

Legendary Entertainment

Lineage Logistics

LinkedIn

Marathon Data Systems

MasterCard

MIT Sloan

Nasdaq

Purch

Raise Marketplace

Schindler

Stellar Loyalty

Telemetry Sports

Toyota and PublicRelay

WOOX

Zensar