Interview on "Bayesian Statistics the Fun Way" (notamonadtutorial.com)
222 points by unbalancedparen on June 4, 2019 | hide | past | favorite | 52 comments


As someone who has a master's degree in statistics and often uses Bayesian statistics, I think we should not focus on whether one is a Bayesian or a frequentist, but rather be pragmatic and take the most practical approach to solving a statistical problem. Moreover, I think statistical education should start with frequentist concepts and then extend them to the Bayesian framework, since the likelihood also plays a major role in obtaining the posterior distribution. In my opinion, this progression is much more natural than starting fully Bayesian.


Respectfully, I find that people who are not statisticians overwhelmingly disagree with your point on which should be taught first.

Bayesian is a natural order of inference for people. The whole concept of the black swan ("all swans are white") bears this out.

Frequentist statistics is much less intuitive to people.

My preference is for people to be able to use some statistics, and Bayesian gets them productive faster.


Frequentist statistics is often pretty poorly taught. Ideas like likelihood, modeling, and optimization underlie the mechanics of both worlds. There's a big obsession with testing, but the Neyman-Pearson testing framework is sound and intuitive.

Bayesian statistics gets a big boost because it's usually taught as a system instead of as a recipe book.


I would argue that the problem with frequentist statistics is that it aligns with humans' flawed intuition of how randomness works. People are inherently obsessed with finding patterns to support their hypotheses.

The problem is that what we perceive as random and extremely unlikely events are in fact much more probable than what we estimate from using Gaussian methods. And the frequentist approach helps to create this distortion by ignoring black swans.

Here's a great video demonstrating how people tend to misunderstand randomness: https://youtu.be/tP-Ipsat90c
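
One way to make the "more probable than Gaussian methods suggest" point concrete is to simulate it. This sketch (all parameters invented for illustration) counts "4-sigma" events under a Gaussian versus a heavy-tailed Student-t with 3 degrees of freedom; a naive Gaussian model treats anything beyond 4 as a once-in-tens-of-thousands event:

```python
import random

random.seed(42)
N = 200_000

def draw_t(df):
    # Student-t draw: standard normal over sqrt(chi-square / df).
    z = random.gauss(0, 1)
    chi2 = sum(random.gauss(0, 1) ** 2 for _ in range(df))
    return z / (chi2 / df) ** 0.5

# Count "4-sigma" style events under each model.
gauss_tail = sum(abs(random.gauss(0, 1)) > 4 for _ in range(N))
t_tail = sum(abs(draw_t(3)) > 4 for _ in range(N))

print(f"|x| > 4 out of {N} draws: Gaussian {gauss_tail}, Student-t(3) {t_tail}")
```

The heavy-tailed count comes out orders of magnitude larger, which is the distortion being described: fitting a Gaussian and reading off its tail probabilities quietly assumes the extreme events away.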


One approach gives the right answer. The other approach is more computationally tractable. Computers are pretty powerful now, so we can afford the correct answer much more often than we used to.

As for what is more natural… I've seen a (frequentist) introduction to statistics, and it simply did not make sense. Nothing was justified, you just had to learn the stuff by rote and apply it in situations that look like they could use one tool or another.

Probability theory on the other hand is pretty obvious. The axioms required to derive it are ridiculously few and ridiculously intuitive. From there you get the sum and product rules, and all the rest. Always made perfect sense to me.
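
For reference, the rules in question can be written down in a few lines (standard notation, nothing specific to any one book), and Bayes' theorem falls straight out of the product rule:

```latex
% Product rule, stated both ways round:
P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)

% Equate the two right-hand sides and divide by P(B) to get Bayes' theorem:
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}

% Sum rule; with mutually exclusive, exhaustive A_i it expands P(B):
P(A) + P(\neg A) = 1, \qquad
P(B) = \sum_i P(B \mid A_i)\,P(A_i)
```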


I am surprised by how many people equate frequentist statistics with Neyman-Pearson hypothesis testing. In my opinion, the main difference between the two approaches is whether the parameters of a statistical model are considered fixed or random; everything else follows from this.

On the subject of statistical education: The point I tried to make is that I think it is much easier to first study the likelihood, the central quantity of frequentist inference. One can then move to the Bayesian world simply by allowing the parameters to be random variables. Furthermore, as other commenters have pointed out, technical difficulties arise in the non-conjugate Bayesian setting when MCMC sampling has to be used. In my opinion, MCMC algorithms, convergence diagnostics, etc. are certainly not topics for an intro stats course.


Having taught frequentist stats as a TA to grad students, I understand why frequentist stats seems not to make sense. On the other hand, my prior on teaching quality, and my data on the relative difficulty of understanding the approaches, say with near-certainty that your experience has nothing to do with the approach taken.

Having used Bayesian stats heavily, I'd note that the hard parts are not gone, they are just located elsewhere - in how to actually do the computations, rather than in how to set up problems. Each can be taught poorly or well, but given that MCMC is certainly harder than least-squares, it seems difficult to argue that using Bayesian statistics is easier. (Unless you're not just applying the methods by rote, and letting the computer spit out answers - and if you are, I don't know why you are better off with Bayesian methods. In fact, if that's what you're doing, please stop doing statistics and pay an expert instead.)
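
A toy illustration of where the hard parts live (all numbers invented): estimating a mean from noisy data. The least-squares / maximum-likelihood answer is one line; a Bayesian answer via random-walk Metropolis, here under an assumed flat prior and known unit noise, needs a sampler, a tuned step size, and burn-in:

```python
import math
import random
import statistics

random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(50)]

# Least squares / maximum likelihood estimate of the mean: one line.
ls_estimate = statistics.fmean(data)

def log_post(mu):
    # Log posterior under a flat prior and Gaussian likelihood with sd 1.
    return -0.5 * sum((x - mu) ** 2 for x in data)

# Bayesian estimate via random-walk Metropolis: sampler, tuning, burn-in.
mu, samples = 0.0, []
for step in range(20_000):
    prop = mu + random.gauss(0, 0.3)        # proposal step size needs tuning
    diff = log_post(prop) - log_post(mu)
    if diff >= 0 or random.random() < math.exp(diff):
        mu = prop                           # accept the proposal
    if step >= 5_000:                       # discard burn-in draws
        samples.append(mu)

bayes_estimate = statistics.fmean(samples)
print(f"least squares: {ls_estimate:.3f}  posterior mean: {bayes_estimate:.3f}")
```

The two estimates agree here (with a flat prior they should), but one of them required convergence worries the other never raises.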


> given that MCMC is certainly harder than least-squares, it seems difficult to argue that using Bayesian statistics is easier.

Actually, I am not saying Bayesian statistics are easier to use. I was saying they looked easier to understand. Though I must point out that "Bayesian" may be the wrong word here. What truly makes sense to me is Probability Theory, which Edwin T. Jaynes describes pretty well.

(That does not make me any more capable of applying MCMC, which I didn't even know of. Searching… Ah, Markov Chain Monte Carlo, yeah that's not easy. Plus, this sounds like an approximation of probability theory… not that we have anything better, mind you: I know that applying probability theory directly is often computationally intractable.)


I agree with this approach, and this is roughly the approach my own statistics master's degree takes as well. It can be challenging to understand the finer points of likelihoods and posteriors (and how to choose a prior) without serious mathematics that you're unlikely to have upon entering a graduate statistics degree.

Starting with applied probability and applied statistics (incl. regression, ANOVA, GLMs) allows you to solve problems and feel useful and engaged before being thrown into the mathematical rigor required of Bayesian statistics.


I agree, although I respect those who look for deeper justification for the methods we use. Bayesian statistics/decision theory does have axiomatic foundations after all.


So does frequentist stats - they are just different axiomatic foundations and assumptions.


I'm less familiar with them - I've certainly seen many plausible frequentist arguments, but I've never been exposed to any unifying framework which would require that one make decisions based on type-1 error rate controlling hypothesis tests. That's not to say such foundations don't exist, I'm just happy being a philosophical Bayesian who sometimes does frequentist or algorithmic/ML things for practical reasons.


In an introductory course, we should be teaching people to collect enough data that any reasonable choice of prior or method doesn't matter that much.
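
The "enough data" point has a tidy conjugate illustration (priors and counts below are made up): two analysts with sharply disagreeing Beta priors on a coin's heads probability update on the same data, and the closed-form posterior means show the priors being swamped:

```python
def posterior_mean(alpha, beta, heads, flips):
    # Beta(alpha, beta) prior + binomial data -> Beta posterior, closed form.
    return (alpha + heads) / (alpha + beta + flips)

skeptic = (20.0, 80.0)    # prior mean 0.2: "the coin favors tails"
believer = (80.0, 20.0)   # prior mean 0.8: "the coin favors heads"

# Same data under two priors: 60% heads observed, at two sample sizes.
small = [posterior_mean(a, b, heads=6, flips=10) for a, b in (skeptic, believer)]
large = [posterior_mean(a, b, heads=6000, flips=10_000) for a, b in (skeptic, believer)]

print("n = 10:    ", [round(p, 3) for p in small])    # still far apart
print("n = 10000: ", [round(p, 3) for p in large])    # essentially identical
```

At ten flips the two posteriors still mostly reflect the priors; at ten thousand they agree to two decimal places, which is exactly the sense in which any reasonable prior stops mattering.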


I started college in 1982. At that time, calculators were common, but not computers. The data sets had to be small enough for us to work problems by hand. Not any more. I see no reason why a stats course can't start out with big bright data sets that are easy to analyze, then advance through more difficult problems where it becomes progressively easier to get things wrong, and thus requires more sophistication to think about problems.

I just want to add a bit more. It's quite easy today to generate and play with random numbers. If you think you understand the process that generated your data, simulate it and run the simulated data through the same analysis. I do this for real -- I don't trust myself to choose the right statistical analysis, so I always test my chosen analysis with simulated data. If I can fool myself with simulated data, then my real data is probably fooling me too.
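
A minimal version of that simulate-and-check habit, with an invented linear model: generate data from a process whose truth you know, run the exact analysis you'd run on real data, and verify it recovers what you planted.

```python
import random
import statistics

random.seed(1)
TRUE_SLOPE = 2.0

def fit_slope(xs, ys):
    # The same ordinary-least-squares slope you'd run on the real data.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

# Simulate from the process you *think* generated your data...
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [TRUE_SLOPE * x + random.gauss(0, 1) for x in xs]

# ...and check the analysis recovers the truth you planted.
estimate = fit_slope(xs, ys)
print(f"planted slope {TRUE_SLOPE}, recovered {estimate:.3f}")
```

If the recovered value were far from the planted one, the analysis would be fooling you on real data too.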


That is often not possible.

Could we, for instance, collect enough data on typing discipline to end the static/dynamic typing debate once and for all? Enough data to overcome the priors of both static typing and dynamic typing proponents?

We could, but that would require pretty big sample sizes. Like 10,000 developers of various competence, working on 1,000 projects of various domains and difficulties for various amounts of time (from a few days to at least a few months). Who is ever going to fund that?

Until we get such a miracle controlled study, our respective priors will still matter.


As someone who uses statistics all the time at work, I sympathize so much with this article and greatly enjoyed it. Every time I try to introduce a Bayesian prior, coworkers either look at me like I'm crazy (because they've never heard of or used Bayesian stats) or like I've suddenly gone soft and introduced a bunch of nebulous, touchy-feely context into the objective truth (if they're dedicated frequentists).

Then we promptly switch back to p-values of .05, a lot of the time not even bothering with a statistical power calculation. I've had better success with introducing power, though. I suspect that's because we can fit it into the existing frequentist framework.
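
For what it's worth, a basic power calculation really does fit in a few lines within the frequentist framework. This sketch assumes a two-sided two-sample z-test with known unit variance; the effect size and sample sizes are illustrative:

```python
from statistics import NormalDist

def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    # Power of a two-sided two-sample z-test for a standardized mean
    # difference `effect_size`, equal group sizes, known unit variance.
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    return 1 - z.cdf(z_crit - shift) + z.cdf(-z_crit - shift)

# A "small" effect of 0.2 sd at various per-group sample sizes:
for n in (50, 200, 800):
    print(n, round(power_two_sample_z(0.2, n), 3))
```

The takeaway it makes vivid: at small n, most true effects of this size would be missed entirely, which is why skipping the power calculation and keeping only the .05 threshold is so dangerous.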


> like I've suddenly gone soft and introduced a bunch of nebulous, touchy-feely context into the objective truth

This drives me nuts. If you haven't, check out the paper "Beyond subjective and objective in statistics" by Gelman and Hennig (2017).

Right at the beginning they make the point that any analysis includes external information in many ways, such as adjusting variables for imbalance, how we deal with outliers, regularization, etc.

Especially if you're doing any sort of causal inference, you're usually making strong assumptions before estimating your model, even just in terms of which variables are included and how they're connected. The idea that priors are somehow ruining an "objective" model is just absurd to me. You're already making so many other decisions about your model that will affect estimates and your interpretation of them. Priors seem like another perfectly reasonable decision to have to make as well, with the benefit of getting results that I think in general are much more easily understood by a lay audience. (E.g., I don't think I've ever encountered someone not on my data science team that actually understands what a p-value is. But people are much better at understanding when I say, there's an X percent chance that there is a positive effect here.)
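
That lay-friendly summary is literally just the share of posterior draws above zero. A toy sketch, where the posterior draws are faked as Normal(0.3, 0.2) samples purely for illustration (in practice they'd come from your fitted model):

```python
import random

random.seed(7)

# Fake posterior draws for the effect, standing in for the output of a
# fitted Bayesian model (Normal(0.3, 0.2) is purely illustrative).
posterior_draws = [random.gauss(0.3, 0.2) for _ in range(100_000)]

# The lay-friendly summary: share of draws in which the effect is positive.
p_positive = sum(d > 0 for d in posterior_draws) / len(posterior_draws)
print(f"There is a {100 * p_positive:.0f}% chance the effect is positive.")
```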


This critique might come from the idea that having a good analytic model, or at least some valuable analytic insights, involves much more than assigning some priors. Of course, the two things don't exclude each other, but for some frequentists, Bayesians have the wrong perspective - or at least that's the critique, whether it's true or not.

Another issue that I personally have with Bayesianism is that I believe that assigning probabilities to singular events is only meaningful and admissible at all if there is a good analytic explanation for the respective propensity. For example, we may be able to deduce that a die is reasonably fair from the way it is constructed and our knowledge of physics, and later confirm this by frequentist analysis. Merely believing or claiming that the die is fair is not acceptable. Again, the difference is only one of attitude in the end, I suppose.

Maybe philosophers have given Bayesian statistics a bad rap, too, because many of those who call themselves Bayesians are also "probabilists", i.e., they think that rational belief must conform to the probability calculus. There are many arguments against probabilism and the only arguments that speak for it are Dutch book arguments. The view does not have very strong foundations.


> assigning probabilities to singular events is only meaningful and admissible at all if there is a good analytic explanation for the respective propensity.

Wait a minute, you are making a type error here: probabilities are not propensities. They're degrees of belief. (And even if you disagree in general, this is a Bayesian context you're talking about.)

If I put a die on a table and hide it with a cup, you could still estimate your probability distribution about which face is up. My probability distribution would obviously be very different, since I put the die in there myself. (Replace "probability" by "betting ratio" or "degrees of belief" if it makes more sense to you.)
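
The cup example can be rendered as two explicit distributions (the 2% bump probability below is invented): same hidden die, two states of knowledge, two coherent probability assignments, neither of them "the" physical distribution.

```python
from fractions import Fraction

faces = range(1, 7)

# You, who only saw a cup go over an unseen die: uniform degrees of belief.
your_belief = {f: Fraction(1, 6) for f in faces}

# Me, who placed it six-up but allow a 2% chance I bumped it,
# spread evenly over the other five faces.
my_belief = {f: Fraction(2, 100) / 5 for f in faces}
my_belief[6] = Fraction(98, 100)

# Both are coherent probability assignments; they differ because the
# states of knowledge differ, not because the die changed.
assert sum(your_belief.values()) == 1 == sum(my_belief.values())
print(your_belief[6], my_belief[6])
```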

> The [probabilism] view does not have very strong foundations.

Read the first 2 chapters of Probability Theory: the Logic of Science, by E. T. Jaynes: "Plausible reasoning" and "The quantitative rules". It's very accessible, and you shall see how strong the foundations really are.

http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...


No, I was not speaking from a Bayesian perspective, I was laying out the propensity-theoretic explanation of probability. The propensity explanation is one of several attempts to explain why singular events might be said to give rise to probabilities, living alongside frequentism and Bayesianism. Another perspective worth mentioning is the logical approach, which is in the end purely combinatorial.

Some people think that you need to explain why a die can be fair, rather than just assuming it or only looking at it from a frequentist perspective. Of course, die-hard Bayesians don't think so, but that would be begging the question in the context of discussing criticisms of Bayesianism.

> Read the first 2 chapters of Probability Theory: the Logic of Science, by E. T. Jaynes: "Plausible reasoning" and "The quantitative rules". It's very accessible, and you shall see how strong the foundations really are.

I'm an expert on this topic. The only arguments for probabilism are Dutch book arguments, and there is a large number of arguments against these. See for example various articles by Hájek. Alternative representations of graded belief are, among others:

- plausibility theory (Halpern et al.)

- possibility theory (Dubois & Prade)

- Haas-Spohn ranking theory and variants thereof

- various notions of epistemic entrenchment

- Dempster-Shafer belief theory

- almost any quantitative or qualitative representation of belief in belief revision theory not covered by one of the above theories (e.g. belief update by Katsuno & Mendelzon)

- by a general logical connection, nonmonotonic logics and AAFs can generally represent notions of belief update, such that the underlying qualitative ordering of states is a representation of graded belief

What you probably mean is that the above generalizations (or qualitative theories, in some cases) could be simulated with probabilities, e.g. by using convex sets of probabilities or what Jøsang is doing in his "subjective logic". That's true, but then we're no longer talking about probabilism in the sense I've used the word.

Of course, you can also try arguing for probabilism like Savage did: lay out a set of postulates for your subjective plausibility that happen to allow you to prove that this notion of subjective plausibility is in the end probability. Despite the merits of such work, it is in the end a form of cheating (or "reverse engineering"), because you could just as well come up with plausible postulates that yield the weaker axioms of possibility theory.


> No, I was not speaking from a Bayesian perspective, I was laying out the propensity-theoretic explanation of probability.

Unless you can explain this "propensity" in terms of actual physical properties, propensity by itself is… unjustified. The only domain I know of so far where we could possibly argue propensities are a thing is quantum mechanics. And even then it seems to rest on an anthropic argument: which universe am I living in?

> Some people think that you need to explain why a die can be fair,

A die by itself is not fair, right? A die might be balanced, and the way it is thrown it might have enough unpredictable variability to cause everyone in the room to think "uniform distribution over [1..6]".

Likewise, a cryptographic pseudo random generator is unpredictable (and thus "fair"), to anyone who doesn't know its internal state. Even though the process itself is deterministic, it's just not computationally feasible to guess its output just from the observation of past inputs. (Though for this one I'm relying on the fact we're not logically omniscient.)

> I'm an expert on this topic.

Good. Then you know that any inference strategy that falls prey to Dutch Books is not rational. Right?

To be fair, probability theory is not computationally tractable. I did not verify, but I guess any feasible approximation is vulnerable to some more or less subtle Dutch Books.

Now, the way you talk about Dutch Books sounds as if all the other strategies you mention are vulnerable, not just in practice but in theory as well. They are thus not perfectly rational. Do their authors at least have the grace to admit this is a flaw that should be corrected?

But then I suspect that correcting the flaw inevitably leads to probability theory itself: if you accept Jaynes's three "desiderata" as required for any kind of rational reasoning, then, as he shows, the result is necessarily equivalent to probability theory as we know it (where probabilities are subjective assessments of plausibility, otherwise known as "degrees of belief").

I can only conclude that you do not accept Jaynes's desiderata as necessary for correct inference. And this is the point where I look at you like you're not quite sane.

For reference, Jaynes's desiderata:

  (1) Degrees of plausibility are represented by real
      numbers. (And a continuity assumption.)

  (2) Qualitative correspondence with common sense.
      (explained in more detailed in the book)

  (3a) If a conclusion can be reasoned out in more than
       one way, then every possible way must lead to the
       same result.

  (3b) The robot always takes into account all of the
       evidence it has relevant to a question. It does
       not arbitrarily ignore some of the information,
       basing its conclusions only on what remains.
       In other words, the robot is completely
       non-ideological.

  (3c) The robot always represents equivalent states of
       knowledge by equivalent plausibility assignments.
       That is, if in two problems the robot’s state of
       knowledge is the same (except perhaps for the
       labeling of the propositions), then it must assign
       the same plausibilities in both.

Good luck convincing me (and I suspect the majority of people, including frequentist statisticians) that we should reject any of these desiderata.

I don't care that it's reverse engineering; those desiderata match the way I think. I accept the conclusion that probability theory is the correct (albeit intractable) way to think, because I ultimately agree with the postulates it rests on. Vehemently so. They're not just true, they're obvious.

If you don't accept them, then I can only give up, and remember what Yudkowsky once wrote: "How do you argue a rock into becoming a mind?"


> Good. Then you know that any inference strategy that falls prey to Dutch Books is not rational. Right?

Do you even have an idea what "rational" means? There are people who argue that having cyclic preferences is not only rational, but even sometimes the only rational representation of evaluations. I'm not one of these, but just wanted to mention that things are not as simple as you lay them out.

If by "rational" you mean "fine for decision making", then I need to disappoint you. Dutch Books are not a working criterion for that. It is perfectly possible to make rational decisions with cyclic preferences. Your preferences need to be weakly eligible, and weak eligibility needs to be top-transitive (Hansson).

Weak eligibility: There are one or more alternatives such that there is no preferred alternative to them.

Top transitivity of weak eligibility: If a is weakly eligible and a~b, then b is also weakly eligible.

These are conditions on preferences. You can have similar conditions on subjective plausibility, of course, once you combine preferences and subjective plausibilities.

By the way, Expected Utility falls prey to Dutch Books. There is a money pump against every risk-averse or risk-seeking agent. Check out Wakker's book, which is much better than Jaynes's: Prospect Theory for risk and ambiguity. Anyway, EU is often considered rational and widely used, but according to your criterion it would be irrational. (In finance, these kinds of Dutch Books are called "arbitrage" and exploited immediately, so the market prunes them away, but in other areas EU is used extensively. Are you maybe a finance guy???)

> For reference, Jaynes's desiderata:

Of course you can just claim "here is my list of postulates, and that's what 'rational' means", but that's not really an argument. The other theories I am talking about are also axiomatized. Take for example Fishburn's seminal work. According to your theory, Fishburn spent most of his life and efforts in decision making on irrational theories. I'm not convinced, and would rather talk about different kinds of rationality if I were pressed to make a decision on that.

> (1) Degrees of plausibility are represented by real numbers. (And a continuity assumption.)

There is a vast array of literature on qualitative decision making for which this assumption does not hold. Lexicographic decision making also does not fulfill that requirement, and there is a whole French-Belgian school on that, including axiomatizations and practical methods (tools like ELECTRE). Lexicographic decision making usually uses hyperreal numbers.

Qualitative decision making comes with a host of problems and limitations due to Arrow's Theorem, but lexicographic models can be very reasonable and even required if some of the authors in the field are right about some examples of seemingly irrational preferences. In any case, just saying that these axiomatized theories are irrational because "here are my axioms" is unacceptable. I'm sure not even Jaynes does that.

As for the continuity assumption: There is a whole field of measurement theory that would tell you when you need it and when you don't, and I really don't see any non-measurement-theoretic way of independently defending such technical assumptions as rationality postulates. Again, just assuming these kinds of things is a bit too simple. After all, I can take any postulate and call it "rational"; that's not a meaningful discussion of rationality, though.

> (3b) The robot always takes into account all of the evidence it has relevant to a question. It does not arbitrarily ignore some of the information, basing its conclusions only on what remains. In other words, the robot is completely non-ideological.

This is an interesting principle, because even in probabilistic settings it is completely controversial how to deal with conflicting evidence, and how and when to revise beliefs in the face of evidence that directly conflicts with your existing beliefs.

It's a very vexing and complicated problem with many different solutions. It is definitely an underdetermined problem. One of the best discussions of it has evolved from criticisms of the corresponding update rule in the Dempster-Shafer theory of evidence, so it's worth taking a look at if you're really interested in this topic. But you seem to be hell-bent on taking Jaynes's book as some sort of bible, which is weird. It's not as if any of the other approaches I've mentioned in my previous post are unknown or have been proposed by outsiders - it's almost impossible not to stumble across possibility theory (Dubois & Prade) or Halpern's work if you're doing AI research, for example.

> They're not just true, they're obvious.

Maybe for people who do not know the literature very well, but certainly not to me. Sorry. :(


> There are people who argue that having cyclic preferences is not only rational, but even sometimes the only rational representation of evaluations.

Wouldn't be the first time otherwise serious people are defending nonsense. Noted nonetheless.

> It is perfectly possible to make rational decisions with cyclic preferences.

It is perfectly possible to make rational decisions while being insane; just not all of your decisions will be rational. Cyclic preferences are not insane with respect to every decision, but they do mean the decision system as a whole is not flawless.

While the absence of cyclic preferences is of course not sufficient for perfect rationality, it's obviously required.

> By the way, Expected Utility falls prey to Dutch Books. There is a money pump against every risk-averse or risk-seeking agent.

Well, if you're not evaluating risks correctly to begin with, of course you're gonna get ripped off (I'm not saying that's a good thing). Being either risk seeking or risk averse looks like a flaw too, though perhaps less severe than cyclic preferences.

> There is a vast array of literature on qualitative decision making for which this assumption does not hold.

Wait a minute, this one is only talking about epistemology. Jaynes does not mention utility functions at all, and for all I know those may still be allowed to be discontinuous. (That would be perhaps a bit surprising, but I have yet to have an opinion on that particular point.)

Discontinuous probabilities, that would be more surprising. Though I reckon this continuity business is the weak link here. It would be nice if we didn't have to assume it.

> even in probabilistic settings it is completely controversial how to deal with conflicting evidence and how and when to revise beliefs in the face of evidence that directly conflicts with your existing beliefs.

There are lots of reasons why a piece of data might not change one's mind, even if that piece of data seems to contradict their beliefs directly. For instance, that piece of evidence might have been cherry-picked from a mass of otherwise normal data.

Not even acknowledging the piece of data might be a good approximation in some cases, but in general it seems quite foolish. You don't just ignore a piece of evidence, you explain why it doesn't change your mind. (I believe Jaynes gives examples of beliefs diverging when exposed to the same piece of evidence.)

> But you seem to be hell-bent on taking Jayne's book as some sort of bible, which is weird.

Call it confirmation bias, but when I read that book, I already subscribed to probability theory as the correct way to think. I had for a long time. The intuition of probabilities being degrees of beliefs, I had for as long as I can remember.

Then this book comes up, and provides justifications for my intuitions that were even stronger than I anticipated. It's like suspecting there's a giant bearded man behind that cloud, and then actually seeing it. And taking a photo, and showing it to your friends. Perhaps not foolproof, but pretty damn close.

---

Now we still have a problem: Jaynes's robot cannot exist. I mean, that would be something like AIXI; it's not tractable. Probability theory is not tractable (we wouldn't have Monte Carlo methods if it were, we'd just compute the probabilities directly). Any inference engine that runs in the real world (like humans) has to be imperfect. We have to take shortcuts, and from them, flawed reasoning will arise.

There's also the problem that thinking has a cost. It takes time and energy, and with those, utility. So not only will a real engine have flaws, it also needs to evaluate whether minimising those flaws is worth the trouble (and that evaluation also costs some thinking).

To take a concrete example, the first AlphaGo program lost one of its games in part because it failed to take more time on a particularly hard-to-evaluate game state. It was obvious to top human players that this particular move required more thought than usual, but the machine wasn't programmed that way.

As certain as I am that probability theory is the correct ideal to attain, I also have to admit that it is just that: an impossible ideal. How to instantiate that ideal into a good enough working implementation, I have no freaking clue.


Well, I agree mostly with you, except that I'm not a probabilist. There is an extensive discussion about what Dutch books do and do not show; see Alan Hájek's work on that, which is really worth reading.

> Being either risk seeking or risk averse looks like a flaw too, though perhaps less severe than cyclic preferences.

Yes, I don't want to deny that this view is appealing. However, even if you use probabilistic representations of degrees of belief, you need to deal with ignorance and conflicting evidence in one way or another. Convex sets of probabilities can be shown to be able to represent many of the alternative approaches I've mentioned. There is also this "subjective logic" by Jøsang that is surprisingly nice despite its silly name. Check it out; maybe the only quibble I have with it is that he mostly seems to re-brand many prior ideas, but the framework is interesting.

> Not even acknowledging the piece of data might be a good approximation in some cases, but in general it seems quite foolish. You don't just ignore a piece of evidence, you explain why it doesn't change your mind.

I agree with you, but at the same time we know from qualitative belief revision theory that there are many, many ways of dealing with conflicting evidence. Okay, we can rule out some of them, e.g. discarding all previous beliefs to learn the new evidence, but among the many less obviously flawed methods a choice needs to be made. The probabilistic setting doesn't help too much in that area, it actually makes it harder to see what's going on. As I've said, the problem is underdetermined.

> Then this book comes up, and provides justifications for my intuitions that were even stronger than I anticipated.

I'm definitely going to read it! However, I might already be tainted by other books on the subject and philosophical discussions. I really do think a belief representation ought not be closed under negation, i.e., I have strong Dempster-Shafer intuitions, and that some way of distinguishing ignorance from doubt is needed.

> Call it confirmation bias, but when I read that book, I already subscribed to probability theory as the correct way to think. I had for a long time. The intuition of probabilities being degrees of beliefs, I had for as long as I can remember.

Kudos to you for having such strong intuitions. It makes life easier. Maybe I'd be willing to buy into them for probabilities, but that wouldn't help me because of similar problems on the evaluative side, on which most of my work focuses.

On the evaluative side we have thought experiments like Spectrum Cases (Temkin, Rachels): Suppose A gives you extremely high pleasure for a month, B gives you a little bit less pleasure than A (barely noticeable) for 3 months, C gives you a little bit less pleasure than B (barely noticeable) for 9 months, and so on. Some people (not all) have the intuition that B is better than A, C is better than B, and so forth, until at some point, say Z, they would judge that A is better than Z. These thought experiments come in all varieties, can also be made about well-being and other notions of goodness, and can be made as realistic as one wishes. Most people who want to keep "better than" transitive introduce some notion of significance, which is lexicographic decision making in disguise (significant value attributes always outrank insignificant value attributes).

But okay, you were talking about probabilities only and already acknowledged the evaluative component could be discontinuous. (To be more precise, in this case the Archimedean axiom fails.) It's just that even if graded belief is purely probabilistic, these kinds of preferences will complicate making decisions on the basis of your belief.

I agree about the tractability, too. Since you are a probabilist about graded belief, that already makes your life much easier than mine, though. Couldn't you just say that any heuristics are permissible in certain circumstances as shortcuts that - under these circumstances - are conducive to adequate probability approximations?

I didn't want to insinuate that there is anything wrong with being a probabilist; in the end it's a matter of intuitions. I merely wanted to point out that there are some fairly well-known authors who are not probabilists about graded belief in the narrow sense: Bouyssou, Fishburn, Vincke, and Pirlot in decision making; people like Halpern, Dubois, Prade, Spohn and their scholars in A.I.; and of course almost everybody in mathematical psychology, such as Luce and Tversky. But as I've said, most of their generalizations can be represented by more complicated probability representations such as sets of probabilities.

Anyway, it was nice chatting with you!


> Well, I agree mostly with you, except that I'm not a probabilist.

Good enough for me. :-) (Aumann's agreement theorem notwithstanding, I have to recognise the capacity for actual humans to agree is limited.)

> I'm definitely going to read [Jayne's book]!

I have yet to read it all, but the foundations are laid out early. The preface mostly explains where the author is coming from, chapter 1 and 2 do most of the justifications. The rest focuses more on applications of probability theory. My general impression was like:

  Matches my intuitions,
  solid theoretical foundations,
  works in practice...
  ...case closed I guess.

> On the evaluative side we have thought experiments like Spectrum Cases (Temkin, Rachels): Suppose A gives you extremely high pleasure for a month, B gives you a little bit less pleasure than A (barely noticeable) for 3 months, C gives you a little bit less pleasure than B (barely noticeable) for 9 months, and so on.

Hmm, that's a hard one. Depending on the value I attach to pleasure, there are 3 possibilities: A is best, Z (or whatever the last iteration is) is best, or there's a sweet spot in between. I get the circular preference, and would likely fall prey to it if the circle were hidden from me. But stuff like that is a big warning sign that it most probably requires more thought than a quick intuitive judgement.

> Couldn't you just say that any heuristics are permissible in certain circumstances as shortcuts that - under these circumstances - are conducive to adequate probability approximations?

I could. I didn't because once we start taking shortcuts, evaluating the impact of the shortcut on the final assessment is very difficult. Probabilities are non-linear; it's very easy for a seemingly innocuous approximation to snowball into a huge error. But if one is careful, it often can (and does) work.

It's a bit like floating point approximations. The ideal math on real numbers is correct, floating points only introduce small errors, but some operations (like a division close to zero) can magnify those errors from "approximate" to "utterly wrong". Special care is typically taken to ensure that does not happen.
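As a toy illustration of that magnification (my own sketch, not something from the thread): subtracting two nearly equal float64 numbers wipes out most of the significant digits, and a subsequent reciprocal of the tiny difference inherits that relative error wholesale.

```python
# Catastrophic cancellation: (1 + eps) - 1 should equal eps exactly,
# but float64 can only resolve the sum to the nearest multiple of 2^-52,
# so the subtraction is off by about 11% for this eps.
eps = 1e-15
computed = (1.0 + eps) - 1.0

print(computed)      # ~1.11e-15 instead of 1e-15
print(1 / eps)       # 1e15
print(1 / computed)  # ~9.01e14: "approximate" became "utterly wrong"
```

The inputs were accurate to 16 digits, yet the final answer is wrong by roughly 10 percent; the analogous thing happens when a probability shortcut feeds into a near-zero denominator such as a tiny marginal likelihood.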

> Anyway, it was nice chatting with you!

For me as well. I'll keep your references in mind, and thank you for your patience.


My understanding of physics is that no die toss can be considered "fair" because such a macroscopic system behaves deterministically according to Newton's laws, and isn't even too chaotic to model accurately. No matter the shape or balance of the die, the outcome is determined by the initial conditions and the toss. A skilled gambler can make a fair die land however they want.

The only thing I know is that a well-made die is symmetrical, and so if I have no prior knowledge of its initial orientation then I have to use a uniform prior because nothing else has the requisite symmetry group.

The same could be said for a die that is just sitting on the table without having been observed by me yet, no toss needed.


> A skilled gambler can make a fair die land however they want.

No, they can't. Dice control is a myth, and there isn't a single study that backs it up.


> The idea that priors are somehow ruining an "objective" model is just absurd to me.

I think some caution is justified to a certain extent (not the blind "emotional" objections). When establishing priors in a low-data regime, one must necessarily be careful: the prior is a knob whose setting can change the inference conclusions a lot. That said, if we trust our beliefs about the region the available data do not inform us well about, why not utilize our domain knowledge?
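To make that "knob" concrete, here is a minimal sketch (hypothetical numbers, standard Beta-Binomial conjugate update) of how much the prior moves the answer when only 4 observations are available:

```python
# Beta(a, b) prior + binomial data -> Beta(a + heads, b + tails) posterior.
# With just 4 flips observed, the choice of prior dominates the posterior mean.
heads, tails = 3, 1

for a, b, label in [(1, 1, "flat Beta(1,1) prior"),
                    (50, 50, "strong fair-coin Beta(50,50) prior")]:
    post_mean = (a + heads) / (a + b + heads + tails)
    print(f"{label}: posterior mean = {post_mean:.3f}")
    # flat prior   -> 0.667 (driven by the 3-of-4 data)
    # strong prior -> 0.510 (barely moved from 0.5)
```

The same update run on thousands of flips would give nearly identical answers for both priors, which is why the low-data regime is where the caution matters.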


I think the Swedish fish approach is a particularly fun one: https://www.youtube.com/watch?v=3OJEae7Qb_o


I love the idea of making an approachable version of Ed Jaynes’s classic.


"For coin tosses both schools of thought work pretty well"

How many coin tosses in a row have to land heads before a frequentist decides that the coin is unfair?


6, if it's a two-sided test.


Explain?


I'm guessing 1 flip to pick Heads or Tails, and then 5 flips to get a 'good' p-value (2^-5 = 0.03125 < 0.05)
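The arithmetic can be checked directly under the null of a fair coin; a quick sketch of the two-sided p-values (all heads or all tails) for a run of n identical flips:

```python
# Two-sided p-value for n identical outcomes in a row from a fair coin:
# P(all heads) + P(all tails) = 2 * 0.5**n.
for n in range(4, 8):
    p = 2 * 0.5 ** n
    print(f"n={n}: p={p:.5f}, reject at 0.05: {p < 0.05}")
```

n=6 gives p = 0.03125, the first value below 0.05, matching the "6, if it's a two-sided test" answer upthread; the "1 flip to pick a side, then 5 more" framing is the one-sided version of the same count.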


>First of all, p-values are not the way sane people answer questions

I think they are pretty close to the way sane people answer some kinds of questions.


Can anyone recommend a 'Bayesian statistics the hard way' book?


For the hard way, look at Bruno de Finetti's Theory of Probability:

https://onlinelibrary.wiley.com/doi/book/10.1002/97811192863...

Jaynes is certainly very deep and some sections are harder than others. It's interesting regardless of your level (this is a book worth rereading several times).

For a less technical, but full of insight, introduction see Dennis Lindley's Understanding Uncertainty:

https://onlinelibrary.wiley.com/doi/book/10.1002/97811186501...


Bayesian Data Analysis by Andrew Gelman

http://www.stat.columbia.edu/~gelman/book/


one vote for BDA. For programmers who learn better by implementing things, this book [1] is also good:

[1]: https://www.amazon.com/Bayesian-Methods-Hackers-Probabilisti...


Parts of that book are available online[1] for free. If not for that book I would never have understood how to apply Bayesian stats to problems that interested me.

[1] http://camdavidsonpilon.github.io/Probabilistic-Programming-...


Probability Theory: The Logic of Science by Edwin Jaynes


http://www.med.mcgill.ca/epidemiology/hanley/bios601/Gaussia...

But really, the first two chapters aren't that hard.


Statistical Rethinking: A Bayesian Course with Examples in R and Stan is also considered pretty good.


Thank you all!


[flagged]


What does this have to do with Bayes?

A frequentist assessing probabilities to make decisions about how to respond is, if anything, in a far worse position. The Bayesian would ideally use priors on groups and cross-group correlations to note that the weak evidence that purple -> mean barely shifts their priors, and that the inter-group mean shows they are likely to be nice, unless your prior is that different colors have nearly independent probabilities of being nice or mean.


This seems like pretty flawed reasoning. What you are describing is not a prior but a posterior - the distribution after the observed data has been taken into account.

If anything the prior can help make you less racist by incorporating the knowledge that immutable characteristics are not good indicators of danger/not danger.

The thing is though, even if you are told race doesn't matter through a prior, if you observe a strong correlation over many instances it's going to be hard to ignore that regardless of your prior (what you are told). While it may not be a causal relationship, it may still be a good predictor.
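The general Bayesian point here can be sketched with an abstract example (a coin, hypothetical numbers): given enough consistent observations, the likelihood swamps even a strong prior.

```python
# Beta(1, 99) prior: prior mean 0.01, i.e. a strong initial belief that
# the event is rare. Observing n successes in n trials drags the
# posterior mean toward the empirical frequency regardless of the prior.
a0, b0 = 1, 99
for n in [0, 10, 100, 1000]:
    post_mean = (a0 + n) / (a0 + b0 + n)  # conjugate Beta-Binomial update
    print(f"n={n}: posterior mean = {post_mean:.3f}")
    # 0.010 -> 0.100 -> 0.505 -> 0.910
```

Which is the formal version of "hard to ignore regardless of your prior": the prior only sets where you start, not where sufficient data takes you.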


A purpose of statistics is to make generalizations about a population. The solution to your problem is to have a larger sample size.


"anticipating meanness"

And that would be wise.


I got into an argument with a friend (a mechanical/electrical engineer) who knew about Bayesian statistics. My other friend, a PhD in statistics, with whom I'd had many discussions out of both personal and work interest, had supplied me with my modicum of statistics knowledge.

My engineer friend called my PhD friend a "frequentist", like it was a dirty word, despite having taken only one, maybe two, college classes on Bayesian math/statistics/whatever (my ignorance).

This quote jumped out at me in the article:

"I wanted to write a book on Bayesian statistics that really anyone could pick up and use to gain real intuitions for how to think statistically and solve real problems using statistics."

In the context of the statement, it sounds like he is claiming that non-Bayesian statistics is useless (or, at best, less valuable/reliable than other forms of statistical analysis)?


Having known Will when I lived in Reno, I'm certain your focus should be on "anyone could pick up and use" and not on any statement about the usefulness of other approaches. The Will I know is fundamentally about teaching things in very easy-to-understand ways, and is curious about all approaches to solving a problem.


It just reads to me like he wants to make statistics accessible to a wide audience.


That's not how I'm reading that quote at all. Saying Bayesian stats can solve real problems doesn't imply frequentist stats can't.



