September 15, 2017
David Steinberg
Data Science has become a rallying cry for universities, research organizations, and many commercial and industrial companies. We are surrounded by ever-increasing amounts of data and by myriad methods and algorithms to take advantage of them. Rallying cry aside, no one seems to be very clear about just what data science IS. The unsurprising consequence is that everyone with a vested interest claims it as their native turf, from business analysts to IT specialists to computer scientists. Statisticians are also in the fray, though (as too often in the past) we seem to be a bit late on the scene.
David Donoho’s essay is a wonderful journey into the statistical underpinnings of data science and into the fundamental roles that statistical thinking and concepts must play in any sophisticated view of data science. The essay was originally written for presentation at a meeting celebrating the 100th anniversary of John Tukey’s birth; Tukey was one of Donoho’s teachers when Donoho was an undergraduate at Princeton. Many themes in the essay were inspired by Tukey’s classic 1962 article, “The Future of Data Analysis.” Donoho builds on those themes, bringing in many other developments in statistics and in deriving meaning from data. He goes beyond a mere recounting of our accomplishments to set out his own ideas about just what the science in data science is, and he finds that statistical principles are fundamental to making the field into a true science. Here again he harks back to Tukey, one of whose basic tenets in that article was the declaration that data analysis is indeed a science.
Donoho’s essay is at once deep and broad. And yet there is so much to cover that I am sure all of you will find additional topics that you would have liked to see included. With all the current hype about data science, this essay is a valuable and instructive read for any statistician who wants to make the case for statistics in data science in her or his organization. Put it on your reading list.
Read the essay:
50 Years of Data Science, David Donoho.
(Image licensed under CC BY-SA 3.0)