
# Two Cultures

John Peters

10/29/24

## Review

This week we review the paper *Statistical Modeling: The Two Cultures*. The paper contrasts how traditional statisticians approach problems with how the more machine-learning-oriented folks do, or at least it does to some degree. It covers the two approaches: modeling the process that produces the data, versus modeling the data itself. From the statistics perspective, when this paper came out, it was common to posit a stochastic model that generates the data and to draw conclusions from it. The other, data-first approach instead tried to build whatever model could best predict the data (on held-out sets, essentially). The author, Leo Breiman, was partial to the second approach, and probably promoted his own method (CART) a bit too much for the paper to read as neutral. His conclusion is that we should focus on predictive accuracy, and not necessarily on how well a statistical model can generate the data. There was also discussion of interpretability versus performance, where the more accuracy-focused models tended to be less interpretable, sorta.
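
To make the contrast concrete, here is a minimal sketch in Python with scikit-learn. This is entirely my own illustration, not anything from the paper: the synthetic data, the linear model standing in for the data-modeling culture, and the random forest standing in for the algorithmic culture are all assumptions for the sake of the example.

```python
# A minimal sketch of the two cultures on synthetic data (my assumptions,
# not the paper's setup).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
# A nonlinear ground truth, so the assumed linear data model is misspecified.
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Data-modeling culture: posit a simple stochastic model (here, linear)
# and read conclusions off its fitted parameters.
lm = LinearRegression().fit(X_train, y_train)
print("linear model coefficients:", lm.coef_)
print("linear model holdout R^2:", r2_score(y_test, lm.predict(X_test)))

# Algorithmic-modeling culture: treat the mechanism as a black box and
# judge the model purely by predictive accuracy on held-out data.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("random forest holdout R^2:", r2_score(y_test, rf.predict(X_test)))
```

On data like this, the forest will typically win on holdout accuracy, but it hands you no coefficients to interpret, which is exactly the interpretability-versus-performance tension mentioned above.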

I think this paper was alright. The general message was okay. However, I think the author came at it from a poor perspective. I understand where he was coming from with his opinions, but he never really addressed the why. Why was there, at the time, a rise in machine learning? Why were the stats people's simpler models the common approach? I think the author could have had more empathy for the situation statisticians were dealing with at the time. He consistently missed the plot, I fear, looking back at the paper from where we stand now. He says at some point in the paper that some methods only become computationally feasible under certain conditions (regarding forests, if I remember correctly). I don't think the author realized that the strong foundation of statistics he was turning his nose up at was built by people doing what was possible. He then ignores the data and statistical priors that made his methods, and his whole culture of methods, feasible. It is no coincidence that, at the time he was writing the paper, more of what we now consider classical ML methods were coming out: hardware was finally enabling them. His perspective could have been a bit more understanding if he had kept that in mind and used the paper to discuss the power of what we could accomplish at that time instead.

## Discussion Questions

- The hardware determines everything we can do; all of our discoveries come from the magical pixies. Why do so many of these papers not acknowledge that?

- Do we still think there is such a stark separation between the two cultures now? (I certainly don't.)

P.S. I think this paper will make for a more fun discussion than most. Paradigm-shifting papers like this are, I think, what this class could use more of. They lead to much better discussions than the more... raw clinical stats papers that don't really discuss how they are breaking new ground.