I’ve been thinking about some of the changes in analytics over the last decade, coinciding with the revised and updated release of my book with Jeanne Harris, Competing on Analytics. The book is ten years old, and much has changed in the world of analytics in the meantime. In updating the book (and in a previous blog post about the updates), we focused on such changes as big data, machine learning, streaming analytics, embedded analytics, and so forth. But some commenters have pointed out that one change that’s just as important is the move to self-service analytics. We described this trend in our book (due in early September), but we may not have given it the focus it deserves.
There should be no doubt that analytics of virtually every type are becoming more of a self-service activity. There is also little doubt that they were once an activity requiring analytical professionals. For some years now, however, analytics have been getting steadily easier to use. It’s easier to perform most key tasks in the analytical process, such as:
- To acquire, integrate, review and clean data;
- To run descriptive and predictive analytics on the data;
- To find the model that best fits your data;
- To display descriptive analytics in an appealing visual format;
- To interpret results.
Self-service Drivers
Why have things gotten easier for analytics users? There is no single breakthrough, but rather a series of incremental improvements.
Analytical software has gotten better in terms of the basic user interface, which is almost always a point-and-click one these days. There are a variety of common data formats (e.g., comma-separated values, or CSVs) that make it relatively easy to acquire and integrate data. Almost all analytical systems allow the user to view data as a series of points on a grid, which facilitates identification of outliers and data entry errors.
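To make that concrete, here’s a rough sketch in Python with pandas of acquiring a CSV, viewing it as a grid, and screening for outliers. The file name and the three-standard-deviation rule are illustrative choices of mine, not features of any particular tool.

```python
import pandas as pd

# Acquire data from a common format: comma-separated values.
# "sales.csv" is a hypothetical file standing in for real data.
df = pd.read_csv("sales.csv")

# View the data as a grid of rows and columns.
print(df.head(10))

# A simple screen for outliers and likely data entry errors:
# flag rows where any numeric value sits more than three
# standard deviations from its column mean.
numeric = df.select_dtypes("number")
zscores = (numeric - numeric.mean()) / numeric.std()
print(df[(zscores.abs() > 3).any(axis=1)])
```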
Running descriptive analytics has gotten easier, both in terms of creating the analyses and displaying them visually. So-called OLAP systems, which involved manipulating pre-structured data cubes, were relatively easy to use once the cube had been constructed, but that typically needed to be done by IT professionals. And users often realized that the data they wanted to analyze wasn’t in their cube, so they needed a new one to be constructed.
Newer tools not only have a better interface, but eschew the cube idea to enable work on an entire dataset. This eliminates or at least reduces the need for IT professional help with analytics. In addition, most analyses with contemporary tools take place entirely in memory, which speeds analysis dramatically and makes it possible to iterate frequently until the best results are achieved.
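As a rough illustration of the difference (my example, not one from the book), an in-memory tool can compute on demand the kind of cross-tabulation that once required an IT-built cube, and rerunning it along different dimensions is nearly free. The file and column names below are hypothetical.

```python
import pandas as pd

# Load the entire dataset into memory; no pre-built cube
# constrains which dimensions can be combined.
df = pd.read_csv("orders.csv")  # hypothetical file

# Aggregate on the fly across any columns the data contains.
print(df.groupby(["region", "product"])["revenue"].agg(["sum", "mean"]))

# Iteration is cheap: change the grouping and rerun in seconds.
print(df.groupby("month")["revenue"].sum())
```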
Finding the model that best fits your data (a problem in predictive and prescriptive analytics) sometimes requires machine learning, but not always. Some more traditional statistical analysis systems can now recommend what kinds of analyses to perform. These systems can examine the data and the model roles (independent and dependent, for example) of the selected variables, and specify, for instance, that a bivariate correlation is the best analysis for the data.
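A toy version of that recommendation logic might look like the sketch below. The rules are deliberately simplified illustrations of mine; real statistical packages draw on far richer metadata about the variables.

```python
def recommend_analysis(dependent: str, independent: str) -> str:
    """Suggest a technique from the measurement level of each variable role.
    A deliberately simplified, hypothetical rule set."""
    if dependent == "continuous" and independent == "continuous":
        return "bivariate correlation"
    if dependent == "continuous" and independent == "categorical":
        return "analysis of variance (ANOVA)"
    if dependent == "categorical" and independent == "continuous":
        return "logistic regression"
    if dependent == "categorical" and independent == "categorical":
        return "chi-squared test of independence"
    return "consult an analyst"

print(recommend_analysis("continuous", "continuous"))  # bivariate correlation
```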
For the most automated (or at least semi-automated) approach, machine learning systems can try out more than a hundred different algorithms on thousands or millions of possible variable combinations and transformations. Some machine learning systems simply ask for a dataset and the variable to be predicted, and the system does the rest. They will even point out likely outliers and errors in data, and exclude them from the analysis automatically if you want. Of course, there is a downside to this ease of analysis; it may be difficult to understand and interpret the results. Hypothesis-driven analyses tend to be much more interpretable.
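At a much smaller scale, the core idea (score many candidate models the same way and keep the winner) can be sketched with scikit-learn. Commercial machine learning platforms automate far more than this, including the variable transformations and outlier handling mentioned above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset

# A handful of candidate algorithms; real systems try many more.
candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "random forest": RandomForestClassifier(n_estimators=200),
    "gradient boosting": GradientBoostingClassifier(),
}

# Score every candidate identically with cross-validation; keep the best.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
print(scores)
print("best model:", max(scores, key=scores.get))
```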
Finally, analytical tasks related to displaying and interpreting results have gotten substantially easier for amateurs to perform. Visual analytics displays can be created easily and quickly. Some vendors even recommend particular visual display types for particular types of data, e.g., a line chart for time-series data.
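That recommendation step can be imagined as a simple mapping from a column’s data type to a default chart, along the lines of this hypothetical sketch (actual products weigh many more signals):

```python
import pandas as pd

def suggest_chart(series: pd.Series) -> str:
    """Map a column's type to a reasonable default visualization."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return "line chart (time series)"
    if pd.api.types.is_numeric_dtype(series):
        return "histogram"
    return "bar chart of category counts"

months = pd.Series(pd.date_range("2017-01-01", periods=12, freq="MS"))
print(suggest_chart(months))                      # line chart (time series)
print(suggest_chart(pd.Series([1.5, 2.3, 4.1])))  # histogram
print(suggest_chart(pd.Series(["a", "b", "a"])))  # bar chart of category counts
```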
Interpretation of analytical results is eased not only by visuals, but also by automatically generated textual narratives. More than one vendor offers “natural language generation” software that can create a paragraph or so of interpretive text about a particular bit of descriptive analytics. It is early days for this technology, but some viewers and decision-makers may find text easier to interpret than bar and line charts.
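At its simplest, such a narrative can be produced by filling a sentence template with descriptive statistics, as in the hypothetical sketch below; commercial natural language generation products are considerably more sophisticated.

```python
import pandas as pd

def narrate(metric: str, values: pd.Series) -> str:
    """Fill a fixed sentence template with basic descriptive statistics."""
    change = values.iloc[-1] - values.iloc[0]
    direction = "rose" if change >= 0 else "fell"
    return (f"{metric} {direction} from {values.iloc[0]:,.0f} "
            f"to {values.iloc[-1]:,.0f}, peaking at {values.max():,.0f}.")

revenue = pd.Series([1200, 1350, 1100, 1500])  # made-up quarterly figures
print(narrate("Quarterly revenue", revenue))
# Quarterly revenue rose from 1,200 to 1,500, peaking at 1,500.
```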
Limitations and Errors
All of these technological advancements have made it much easier for analytical amateurs to create professional-level results. This is mostly a good thing. However, there are some limits to the self-service movement, at least at the present time. As with spreadsheets (perhaps the first self-service analytics technology), amateur analysts can still get in trouble in several different ways.
Continue reading the full blog on Medium here.