
https://www.gwern.net/Mouse-Utopia

>Mouse Utopia is almost completely unpublished. Despite working on it and similar experiments with NIMH funding for decades (he “continued to work on his research results until his death on September 7, 1995”), Calhoun appears to have published almost nothing substantive about his research, limited to a handful of short summary articles or passing references.

>It is unclear just how many experiments Calhoun had to run to get the one result which is always talked about; the name “Universe 25” implies at least 24 prior experiments, and Calhoun speaks vaguely of multiple “series” of experiments, referencing earlier experiments with stable populations (unlike Universe 25), some which were apparently controlled to fixed population sizes and some which apparently were not. Nor did all of the overpopulated universes develop the “behavioral sink” phenomenon Calhoun lays so much stress on, which he attributes to an otherwise-unexplained change in the food type.

>No followup literature: only 2 partial replications have ever been done by third parties that I know of; likewise, if unique aspects of Calhoun’s experiment like the “beautiful ones” have been reported since, I have not encountered any references to them. They do not convincingly support the Universe 25 Mouse Utopia narrative.



I've noticed that there were a number of high profile social experiments around that time (the '50s through the '70s) that tried to show that some inherent element of humanity causes us to turn evil. The Stanford prison experiment, the Robbers Cave experiment, The Third Wave, etc. When you start to look deeply into any of these, they all seem to have mostly been done by charlatans who manipulated results and sensationalized findings in order to get publicity.


The world was still scarred by WW2; humans had shown the worst of themselves just a few years before. It's very reasonable that people were trying to find out why.


The popular-audience book Humankind by Rutger Bregman goes into the history of some of these classic experiments. In particular I was pretty shocked at how badly run the Stanford prison experiment was, and how lackluster and different the results were in the BBC's attempt to replicate. Granted, participants in that second experiment knew it would be televised, but that's not necessarily a recipe for good behavior, and they certainly engaged in conflict.

https://en.wikipedia.org/wiki/The_Experiment


I don't trust any studies in psychology, especially social psychology. https://www.gleech.org/psych


It is good to debunk bad science.

Just be careful not to repeat the same mistakes in reverse, throwing out something that is clearly helpful because the original experiment cannot be trusted.

I'll post a very simple thought experiment to illustrate the point for one of the cases:

> No good evidence that tailoring teaching to students’ preferred learning styles has any effect on objective measures of attainment. There are dozens of these inventories, and really you’d have to look at each. (I won’t.)

Now, do as in math and consider what happens when we approach zero:

- what happens if a student cannot hear?

- what happens if a student cannot see?

or infinity:

- can you think of a learning style that works better or worse for people who are strongly affected by ADHD?

- same but for social anxiety?

So, just by thinking about it for a few moments we can debunk the debunking: clearly there exist whole subsets of the population (blind, deaf) for whom one or more learning styles are impossible, meaning that the remaining learning styles must work better for them.

What can we learn from this?

Maybe to be sceptical, even towards the sceptics?


That's a good approach, thanks for writing it as a reminder. This takes me back to math and physics lessons, where we tested a given function/formula at extremes and one or more interesting points in the middle, just to get a good feel for its behavior.

Continuing your mental framework, the problem with interpreting such research is that the results we get are in the "probe some points to get a feel" category. Consider a single learning style, and let's plot its known effectiveness as a function of "learning capability", where "learning capability" is a measure that places students with relevant disabilities on the left and students with relevant gifts on the right. Assume, for the sake of the exercise, that this metric is sound. We get something like:

  EFFECTIVENESS
     ^
     |                               oo
     |           x   x x
     | 
     |oo                       
     +--------------------------------->
              LEARNING CAPACITY
Where 'o' are obvious values derived from thought experiments like yours ("what happens if a student cannot hear?", etc.), and 'x' are points from research on the method over a population of a given learning capacity. Looking at 'x' points alone would make you think the method doesn't improve anything for anyone. Looking at 'o' points alone would make you conclude it is obviously effective. But the points I depicted above are not enough to tell you whether the function looks like this:

  EFFECTIVENESS
     ^                              ***
     |                           *** 
     |   ************************
     | **
     |**                       
     +--------------------------------->
              LEARNING CAPACITY
Or like this:

  EFFECTIVENESS
     ^                           *******
     |                    *******
     |             *******
     |     ********
     |****                       
     +--------------------------------->
              LEARNING CAPACITY
Or something else entirely. This is something to keep in mind too: has the research data you saw, whether from a single study or several, sampled enough of the space to give you some confidence in the conclusions?
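
To put numbers on this (hypothetical numbers, nothing from an actual study): two curves shaped roughly like the two plots above can be nearly indistinguishable at the observed points and only disagree in the region nobody sampled. A minimal sketch in Python:

  import numpy as np

  def shape_a(c):
      # first plot: quick rise, long flat plateau, late rise
      return np.interp(c, [0.0, 0.1, 0.85, 1.0], [0.1, 0.55, 0.55, 0.9])

  def shape_b(c):
      # second plot: steady gradual rise across the whole range
      return np.interp(c, [0.0, 1.0], [0.1, 0.9])

  observed  = np.array([0.0, 0.5, 0.55, 0.6, 1.0])  # 'o' extremes plus 'x' mid-range
  unsampled = np.array([0.2, 0.25, 0.3])            # region no study covered

  print(shape_a(observed))   # [0.1  0.55 0.55 0.55 0.9 ]
  print(shape_b(observed))   # [0.1  0.5  0.54 0.58 0.9 ]  -> nearly the same
  print(shape_a(unsampled))  # [0.55 0.55 0.55]
  print(shape_b(unsampled))  # [0.26 0.3  0.34]            -> very different

Both curves fit the data you actually have; they only come apart where nobody looked.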


Thanks! Good points!


That's why you use multiple approaches if you can. There's no real reason to limit yourself to only one preferred modality unless you're forced to.


In case there was any reason to doubt: I agree.


Isn’t the problem you’re raising here simply that the studies tend to look at populations rather than individuals or even sub-groups? It’s a neat way to demonstrate the problem. The challenge is that as long as we treat the RCT as the primary means by which we build knowledge, it will remain difficult to study sub-groups because it’s so hard to obtain a sufficient sample.
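
For a sense of scale, here's a back-of-the-envelope sketch (all effect sizes hypothetical) using the standard normal-approximation sample-size formula for a two-arm comparison. If an effect exists only in a small subgroup but you randomize over the whole population, the diluted effect size pushes the required sample from dozens into the tens of thousands:

  from scipy.stats import norm

  def n_per_arm(d, alpha=0.05, power=0.80):
      # n per arm ~= 2 * ((z_{1-alpha/2} + z_{power}) / d)^2
      z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
      return 2 * (z / d) ** 2

  # Hypothetical: a medium effect (d = 0.5) that only applies to ~5% of students.
  print(round(n_per_arm(0.5)))         # ~63 per arm if you can study the subgroup directly
  print(round(n_per_arm(0.5 * 0.05)))  # ~25000 per arm when the effect is diluted
                                       # across the whole population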


There's the old saying "absence of evidence is not evidence of absence".


From the video: "Eventually, the entire mouse population perished". I find that part pretty hard to believe unless there was some additional external factor. A population crash is one thing. But mouse society apparently collapsed to the point where every individual died?



