data analysis – Economist Writing Every Day
Last year, our economics department launched a data analytics minor. The first course is a simple 2-credit course called Foundations of Data Analytics. The original idea was that liberal arts majors would take it and that it would be a gentle, non-technical introduction to terminology and history.
However, it turned out that liberal arts majors did not take the course, and the most common feedback was that it lacked technical challenge. I am preparing to teach the course, and it will have two parts. The first is a Python training component where students simply learn Python. We won’t do anything super complicated, but they will be using Python extensively in the upcoming lessons. The second part stays in line with the old version of the course.
I will have the students read and discuss Big Data Demystified by David Stephenson. It devotes 12 brief chapters to introducing the reader to the importance of modern big data management and analytics, and to how they fit into an organization’s key performance indicators. It reads like it’s for business majors, but any medium to large organization would find it useful.
Stephenson begins with some flashy stories that illustrate the potential of data-driven business strategies. For example, Target Corporation used predictive analytics to advertise baby and pregnancy products to mothers who didn’t even know they were pregnant yet. He whets the reader’s appetite by noting that the supercomputers that could play chess and the ones that could play Go were based on fundamentally different technologies.
The opening chapters of the book excite the reader with thoughts of untapped potential. This is what I want the students to understand. I want them to know the difference between artificial intelligence (AI) and machine learning (ML). I want them to recognize which tool is best for the challenges they might face and see clear applications (and limitations).
The AI uses brute force, cycling through the possible next moves. There are several online tic-tac-toe AIs that keep score. If a student can play the optimal strategy 8 games in a row, then they can grasp the general idea behind testing a wide variety of statistical models and explanatory variables and then choosing the best one.
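The brute-force idea can be sketched in a few lines of Python. This is only an illustration, not something taken from the book or from any particular online tic-tac-toe site: the board encoding (a 9-character string) and the names `winner` and `best_outcome` are assumptions made for this example. The function tries every legal move, recursively scores the resulting positions, and keeps the best one.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board whose cells are indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def best_outcome(board, player):
    """Exhaustively search every continuation from `board`.

    Returns (score, move) with the score from X's point of view:
    +1 if X can force a win, 0 for a draw, -1 if O can force a win.
    `move` is None when the game is already over.
    """
    w = winner(board)
    if w is not None:
        return (1 if w == "X" else -1), None
    if " " not in board:
        return 0, None  # board full, nobody won

    other = "O" if player == "X" else "X"
    results = []
    for i, cell in enumerate(board):
        if cell == " ":
            child = board[:i] + player + board[i + 1:]
            score, _ = best_outcome(child, other)
            results.append((score, i))

    # X picks the highest score, O picks the lowest.
    pick = max if player == "X" else min
    return pick(results, key=lambda r: r[0])


if __name__ == "__main__":
    score, move = best_outcome(" " * 9, "X")
    print("value of the empty board for X:", score)
```

From the empty board the score is 0: with optimal play on both sides tic-tac-toe always ends in a draw, which is why a student who plays the optimal strategy can never lose to such a program.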
ML, by contrast, responds to new data based on what worked best on previous training data. There are several YouTubers who have used ML to beat Super Mario Bros. The programmers specify an objective function, and the ML program is off to the races. It tries a few things on one level and then uses those training cycles to perform well on new levels it has never encountered before.
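To make the "objective function plus training cycles" idea concrete, here is a toy sketch in Python. It is far simpler than anything used to play a real video game and is not the method from those videos: it uses tabular Q-learning on a made-up corridor of six squares, where the only reward is for reaching the rightmost square. The function name `train` and all parameter values are assumptions chosen for illustration.

```python
import random


def train(n_states=6, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn by trial and error to walk right along a corridor.

    q[state][action] estimates the long-run reward of taking `action`
    (0 = step left, 1 = step right) in `state`. The objective is the
    reward of 1.0 for reaching the last square.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            left, right = q[state]
            if rng.random() < epsilon or left == right:
                action = rng.randrange(2)  # explore, or break a tie at random
            else:
                action = 0 if left > right else 1  # exploit what worked so far

            nxt = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if nxt == goal else 0.0
            future = 0.0 if nxt == goal else gamma * max(q[nxt])

            # Nudge the estimate toward the reward actually observed.
            q[state][action] += alpha * (reward + future - q[state][action])

            state = nxt
            if state == goal:
                break
    return q


if __name__ == "__main__":
    for s, (left, right) in enumerate(train()[:-1]):
        print(f"square {s}: left={left:.2f} right={right:.2f}")
```

Nobody tells the program that "right" is the correct answer. It only sees the reward, and after enough training cycles its estimates favor stepping right on every square. A table like this cannot carry over to unseen levels the way the Mario programs do; that takes a model that generalizes across states, which is beyond this sketch.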
There are a few chapters in the middle of the book that I didn’t like. They discuss how big data should inform a company’s strategy and how data projects should be implemented. These chapters read as if they were written for MBAs or for management. They were boring to me. But that’s fine, considering Stephenson is trying to appeal to a wide audience.
The last chapters are great. They describe the limitations of big data efforts. Big data is not a panacea, and projects can fail for a variety of very human reasons.
Stephenson emphasizes the importance of transaction costs (although he does not put it that way). Medium-sized businesses should outsource to experts who can succeed (or fail) quickly, which avoids large capital investments or labor costs. Or, if staff are hired in-house instead, he discusses the trade-offs between using open-source software, vendor lock-in, and reinventing the wheel. These are some great chapters that remind the reader that data scientists and analysts are not magicians. They are people who specialize, and they can waste their time as well as anyone else.
Overall, I highly recommend this book. I more or less knew what machine learning and artificial intelligence were before I read it, but the book provides a very accessible introduction to big data environments, their possible uses, and the organizational characteristics that matter for success. Mid-level and senior managers should read it so that they can engage with these ideas carefully. Those with a passing interest in programming should read it for clarity and to better understand the various subfields. I hope my students will read it and feel inspired to take one side or the other of the data manager-analyst divide with more confidence, more understanding, and a little less hubris.