
Could you summarize the criticisms in this (pretty long) video, and what she is proposing as a better alternative (beanplots? or is she criticizing those too?)? I couldn't figure it out from perusing the transcript.

I think it's useful to be able to compare the approximate shapes of histograms during exploratory data analysis. Is the thesis of this criticism that this isn't actually a useful thing to do, or that violin plots don't achieve this, or is it "just" an aesthetic argument?



The summary is she is saying you almost always want to show one of two things (and not both):

1) To show the distribution, in which case a plain histogram laid out horizontally in the traditional fashion is far better than a violin plot, which draws 2 mirrored copies of the distribution vertically with some extra quartile stuff tacked on. This matters all the more because lots of standard violin plot libraries do KDE with very extreme smoothing, so the distribution they show can be very misleading as to the real empirical distribution.

2) To highlight the summary statistics (quartiles and median), in which case a boxplot alone is better because these are generally hard to read on a violin plot.

In case #1 this is usually because the distribution differs significantly from a Gaussian in some interesting way that would make a boxplot irrelevant or misleading (e.g. it is bimodal or multimodal).

In case #2 this is usually because the distribution is Gaussian (or otherwise standard) and you want to compare it with other standard distributions. You don't need all the information in the histogram, and including it all would obscure the important point(s) you're trying to make about the median and quartiles. What is considered standard is going to depend a lot on the domain, audience and subject matter. In her case, she's an astrophysicist, so if you're looking at, say, redshift data from some observation, other astrophysicists will know the distribution you would expect to get from that sort of observation.

That video is basically a summary of all the conversation attached to this article in some ways.
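
To make the point about heavy KDE smoothing concrete, here is a minimal sketch, not taken from the video. The sample, the two bandwidths and the helper names (`gaussian_kde`, `count_modes`) are all made up for illustration, and the KDE is hand-rolled rather than any plotting library's actual implementation. It counts the peaks of the estimated density for a clearly bimodal sample under light and heavy smoothing.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` with a Gaussian kernel."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

    return density

def count_modes(density, grid):
    """Count strict local maxima of `density` evaluated on `grid`."""
    ys = [density(x) for x in grid]
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])

# Made-up bimodal sample: two tight clusters around -2 and +2.
offsets = [k / 10 for k in range(-5, 6)]
sample = [-2 + d for d in offsets] + [2 + d for d in offsets]

# Evaluation grid from -6 to 6 in steps of 0.05.
grid = [i / 20 for i in range(-120, 121)]

# Illustrative bandwidths (assumptions): 0.3 is light smoothing, 3.0 is heavy.
modes_light = count_modes(gaussian_kde(sample, 0.3), grid)
modes_heavy = count_modes(gaussian_kde(sample, 3.0), grid)

print("modes with light smoothing:", modes_light)
print("modes with heavy smoothing:", modes_heavy)
```

With the heavy bandwidth the two clusters blur into a single bump, which is the kind of misleading shape described above. A histogram of the same sample would keep the gap between the clusters visible.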


This is helpful!

Is there a different name for the version of this that doesn't include the summary statistics on the same graph? I think seeing the distributions at different x-axis values (in my work, nearly always in a time series) is useful, but including the summary statistics is not as important and I agree that it's noisy.


Vertical kernel density estimation plot maybe? I'm not 100% sure what you mean. It would just be a vertical histogram if you're not doing kde.


I just mean the "violin" part - yes, just a vertical histogram, but centered - without including the "hard to read" summary metrics on top of it.


3) They look like THAT


This I don't get, they usually look a lot nicer than other visualizations I see. What's the issue here?


Watch the video to understand her perspective on this. I don't want to spoil them for you if you like the look and once seen it's hard to unsee.


Presumably you're referring to the "it looks like a vulva" thing that some other commenter mentioned, which honestly makes me think I must be trying to give credence to the opinions of people who have not progressed past adolescence, if this is truly their issue.


I think you're missing some nuance. She's saying this frequently leads to a situation where she (as a female scientist) is put in an uncomfortable/weird spot by a data visualisation because her colleagues/peers have (in your words) not progressed past adolescence. It seems completely unnecessary to use a data visualisation technique that leads to this issue, especially since it doesn't have any other particular benefit relative to more conventional techniques.

In any case, that's not why I avoid them. I avoid them for the reasons I gave[1], which she also mentions in the video: you usually want to present either the distribution (in which case a horizontal histogram without extreme kde smoothing or quartile info is usually better) or just the summary stats (in which case the boxplot on its own, or just a table, is generally better). When I find I want to call out a given summary stat (median/mode/some quantile cutoff) on a histogram, it's usually better in my view to just show the cutoff on the histogram and shade the tail (eg you frequently see hypothesis tests as a histogram with the critical region shaded and the CV1 number or whatever called out specifically).

[1] and one other which is they are even more confusing in many respects for non-experts than a boxplot so if I was to put one in a presentation or whatever I would find myself spending an undue amount of time explaining the plot rather than making whatever point I wanted to make with the plot which is never a good sign. It would be different for someone who tends to write for/present to fellow experts I imagine.
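
As a rough illustration of "show the cutoff on the histogram and shade the tail", here is a small sketch in plain Python. The data, the bin layout, the cutoff of 8 and the function names are all hypothetical. It only computes which bins would be shaded and what fraction of the sample falls in the tail, and it leaves the actual drawing to whatever plotting tool you use.

```python
def histogram(sample, lo, width, n_bins):
    """Count values in n_bins equal-width bins [lo + i*width, lo + (i+1)*width)."""
    counts = [0] * n_bins
    for x in sample:
        i = (x - lo) // width
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def shade_tail(sample, lo, width, n_bins, cutoff):
    """Return (left_edge, count, mark) per bin plus the fraction of the sample at or above cutoff.

    mark is '#' for bins that start at or beyond the cutoff (the shaded tail)
    and '.' for the rest.
    """
    counts = histogram(sample, lo, width, n_bins)
    rows = []
    for i, count in enumerate(counts):
        left = lo + i * width
        rows.append((left, count, "#" if left >= cutoff else "."))
    tail_fraction = sum(1 for x in sample if x >= cutoff) / len(sample)
    return rows, tail_fraction

# Made-up integer scores, purely for illustration.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 9, 10]
rows, tail_fraction = shade_tail(sample, lo=0, width=2, n_bins=6, cutoff=8)

for left, count, mark in rows:
    print(f"{left:>3} | {mark * count}")
print("fraction at or above cutoff:", tail_fraction)
```

The cutoff is aligned with a bin edge here so that a bin is either fully shaded or not shaded at all. With a cutoff inside a bin you would have to split that bin or pick different edges.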


Well, I think it's crazy to let idiots keep people from using things that are useful. If it's not useful, then ok, but if it is, then that's a bad reason to avoid it.

And I just don't relate to this at all:

> you usually want to present either the distribution (in which case a horizontal histogram without extreme kde smoothing or quartile info is usually better)

Where I almost always see this is in time series plots where there is a distribution at each point. Horizontal histograms are not as intuitive for visualizing this, because plotting time on the x-axis is so universal. And while it is true that box plots work well for this when the distribution at each point is close to normal, it is not true that all data looks like this, and it's easy to not notice this if you default to using a box plot.

I do agree with this:

> or you want to highlight just the summary stats in which case the boxplot on its own (or just a table) is generally better

Yes, but you can also just leave off the summary stats from the "violin plot" (just like, as you point out, histograms usually don't and shouldn't include summary stats) in order to visualize only the shape of each distribution.

I also really don't care about the flourish of vertically centering / "reflecting" the distribution, a series of vertical histograms totally expresses the same information that I'm saying is useful here! People seem to find that ugly, which I figure is why they started doing the reflection thing to make it prettier, but I really don't have a strong view either way on which of these presentations is or isn't ugly or leads to awkward jokes. I just think "a series of distribution shapes laid out vertically" is a commonly useful visualization.

And I really don't know about your last point; I don't spend much time working with non-experts who don't understand histograms really well.
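
A tiny sketch of the time-series case described above, with invented numbers: two time steps share the same median, so a median line would treat them alike, but their histograms on shared bin edges look nothing alike. The names (`shared_bin_counts`, `series`) and the bin layout are assumptions for illustration only.

```python
import statistics

def shared_bin_counts(values, lo, width, n_bins):
    """Histogram counts on bin edges shared by every time step, so shapes are comparable."""
    counts = [0] * n_bins
    for v in values:
        i = (v - lo) // width
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Made-up measurements at two points in time.
series = {
    "t0": [4, 5, 5, 5, 6],   # tightly clustered around 5
    "t1": [1, 1, 5, 9, 9],   # two clusters far from 5
}

medians = {t: statistics.median(vs) for t, vs in series.items()}
shapes = {t: shared_bin_counts(vs, lo=0, width=2, n_bins=5) for t, vs in series.items()}

for t in series:
    print(t, "median:", medians[t], "counts per bin:", shapes[t])
```

Each list in `shapes` is one "vertical histogram" in the series. Whether you then mirror it into a violin, draw plain bars, or stack it as a ridgeline is a presentation choice on top of the same counts.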


Well yes.


The argument of hers that convinced me is that the same result can always be better represented with multiple histograms: z-stacked, side-by-side, 3D, or ridgeline plots (ridgeline plots look awesome). Check out her examples at 21:11.

Compared to these alternatives, violin plots are comically bad.


I watched that part of the video and I just truly don't think any of the options you listed here are as easy to parse as a normal violin chart. They look like the kind of thing I'd see in a superficial infographic, not a serious analysis.


The two other replies are her main point(s), but the video also spends some time on another issue that she labels as minor but I found interesting to hear the perspective on. I'll try to do it justice:

They look like vulvas. We're all adults, and it's typically not a problem, but given that it's an aesthetic choice (notice that half of the chart conveys the same info without this property), why? And it does come up: if someone makes a joke about it, a room full of typically only well-meaning men will now look to her to see if she's comfortable with the joke, and what was okay before now turns into a feeling of being singled out and outside the rest of the group.


I'm honestly not sure what else to say about this besides: that's stupid. If someone is making that kind of joke and/or looking at the women in the room for validation ... how embarrassing for that childish person.


Perhaps, but (1) that's apparently what happens nevertheless and (2) to be clear, I'm just (further) answering the question about what's contained in this super long video


Yes, I do appreciate the info! At this point, I could have just watched the whole video, instead of replying piecemeal in comments :) But I appreciate you summarizing it.


Her criticisms of violin plots seem to be (1) they combine histogram-style information with box-plot-style information, when you generally would only want one or the other [ie: don't use boxplot for bimodal, don't use histogram when boxplot suffices], (2) The histogram-style information is not comparable between blobs of data, since they're not visually aligned, have no tick marks, etc — a plain histogram is better for this, and (3) she finds them ugly on a personal level.

EDIT: Maybe she'd be fine with using them in an exploratory manner. She seems to mainly be complaining about using them in publications, meant for other people to consume. Also: I did not watch the entire video (:


Thanks for this summary! I definitely hadn't seen the point about comparability between blobs of data because of the alignment. But that really seems like an odd point to me, as I almost entirely see / use these with time series data, where pretty much the whole point is to compare the evolution of the values over time using their "vertical" location, with a way to see the shape of the distribution of values at each point in time, at a glance.



