
> Box plots [...] assume that your data follows a bell/gaussian shape.

Not sure how to square that with this statement on Wikipedia's page on box plots:

Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying statistical distribution[3]



If you want to see why that is not fully correct you should read the article. For a box plot you need to calculate mean, variance and certain percentiles. These values don't make sense if your distribution does not follow a certain shape (because these values unambiguously define such a shape). See the examples in the article for what happens if you still try to use them in those cases. You can still extract the values of course (hence probably why wiki says they don't assume anything), but you lose significant information about the distribution. So you can no longer reverse the process.


> So you can no longer reverse the process

I've never understood this to be the purpose of a boxplot, only a means of visualizing a distribution's quartiles.

You've gotten a flood of comments from upset people, so I'll keep it short by saying that a boxplot doesn't actually do what you claim for Gaussians, as the 0 and 100 percentile "whiskers" would be at plus/minus infinity. As for a bounded bell-shaped distribution, there are several non-unique ways to define such a distribution.


> as the 0 and 100 percentile "whiskers" would be at plus/minus infinity

The point is not to plot an ideal Gaussian, the point is to plot the data.

In real life the whiskers are the actual minimum and maximum values observed.


> In real life the whiskers are the actual minimum and maximum values observed.

Look at this: https://upload.wikimedia.org/wikipedia/commons/1/1a/Boxplot_...

0.7% of all values are outside the whiskers.
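For anyone who wants to check that figure: a rough sketch, assuming an ideal standard normal and whiskers placed at the 1.5 IQR fences (the variable names are mine, not from the image).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Quartiles of the ideal distribution, not of any observed sample.
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr = q3 - q1

# Tukey-style fences at 1.5 IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Probability mass in both tails beyond the fences.
fraction_outside = z.cdf(lower_fence) + (1 - z.cdf(upper_fence))

print(f"fences: {lower_fence:.3f} .. {upper_fence:.3f}")
print(f"fraction outside: {fraction_outside:.2%}")
```

The fences land at roughly plus/minus 2.7 standard deviations, which is where the roughly 0.7% comes from.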


There are two standard ways of doing box plots. One is minimums and maximums, the other is the 1.5 IQR method.

The very Wikipedia article your image comes from explains this:

https://en.wikipedia.org/wiki/Box_plot#Whiskers
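A small sketch of the two conventions side by side, in case it helps. The `whiskers` helper and the sample are hypothetical, and I'm assuming the "inclusive" quartile method from Python's `statistics` module; other quartile definitions give slightly different fences.

```python
from statistics import quantiles

def whiskers(data, method="tukey"):
    """Return (low, high) whisker ends under one of two conventions."""
    xs = sorted(data)
    if method == "minmax":
        # Convention 1: whiskers reach the observed extremes.
        return xs[0], xs[-1]
    # Convention 2: whiskers reach the most extreme points that are
    # still within 1.5 IQR of the box; anything beyond is an outlier.
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return inside[0], inside[-1]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # made-up data with one outlier
print(whiskers(sample, "minmax"))  # (1, 50)
print(whiskers(sample, "tukey"))   # (1, 9)
```

Same data, same box, different whiskers: under the second convention the 50 would be drawn as a separate outlier point.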


> For a box plot you need to calculate mean, variance

Quantiles and medians. (Plus min and max.) Non-parametric.


Mean and variance have nothing to do with boxplots, you are mistaken.


> because these values unambiguously define such a shape

I think this is a misunderstanding, and I think it is shared by the author of the article. Boxplots show ranges. That's it.


The mean and variance are not features of a box plot. Box plots show the quartiles, which are about the cumulative distribution.
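To make that concrete, here is a minimal sketch (hypothetical helper and made-up samples, again assuming the "inclusive" quartile method). The summary uses only order statistics, and two samples with different shapes can produce the identical box plot.

```python
from statistics import quantiles

def five_number_summary(data):
    """Min, Q1, median, Q3, max: no mean or variance involved."""
    xs = sorted(data)
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    return xs[0], q1, q2, q3, xs[-1]

evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
two_clumps = [1, 3, 3, 3, 5, 7, 7, 7, 9]  # values pile up at 3 and 7

print(five_number_summary(evenly_spread))
print(five_number_summary(two_clumps))
```

Both calls give 1, 3, 5, 7, 9, so the plot cannot tell the two samples apart. That is the information loss being argued about upthread, and it follows from how the plot is built rather than from any distributional assumption.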


Which is why I find the article so compelling because I'd always read box plots as being about variance. To me the plot implied a quite normal distribution.


Note that "not knowing how to correctly interpret a boxplot" is not equivalent to "boxplots are useless".


If people like me are in the audience, they might be worse than useless.


Sure. But if someone is using, for example, a notched boxplot to quickly evaluate differences in medians (i.e., they know how to correctly interpret a boxplot), it can still be a useful plot that conveys specific information that you would otherwise not get when looking at a violin plot, histogram, kernel density estimate or a strip plot.

My point, again, was: just because a boxplot is not useful to some people, doesn't mean that it is not a useful plot (particularly when augmented with a rugplot or a strip plot). Plots are not just used to convey information to others: they are also a useful tool in exploratory data analysis.

Notice that you can also apply the same critique to almost any plot: some people don't know how to interpret a violin plot (or kernel density estimate plot) correctly... does that make them useless?

The main advantage of a boxplot is that it is parameter-free (unlike histograms, violin plots and kernel density plots) and quickly conveys very specific information (median, range, quantiles, confidence interval for the median) that other types of plot usually don't.
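For the notches, a rough sketch of one commonly used approximation: median plus/minus 1.57 * IQR / sqrt(n). The helper name is hypothetical, the quartile method is an assumption, and plotting libraries may compute notches differently (some offer a bootstrap instead).

```python
from math import sqrt
from statistics import median, quantiles

def notch_interval(data):
    """Approximate interval around the median, as drawn by a notched boxplot."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    med = median(data)
    # 1.57 is the conventional constant; treat it as an assumption here.
    half_width = 1.57 * (q3 - q1) / sqrt(len(data))
    return med - half_width, med + half_width

low, high = notch_interval([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(f"median notch: {low:.2f} .. {high:.2f}")
```

When the notches of two boxes do not overlap, that is usually read as informal evidence that the medians differ.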



