


I was thinking the same thing while reading, but the author does mention them at the end (together with the bee swarm plot, or sina plot, which I think is the better version of a violin plot).

https://www.rhoworld.com/i-swarm-you-swarm-we-all-swarm-for-...


I use violin plots, but a complication is that the shape depends on the bandwidth hyperparameter of the kernel density estimator used to draw them. The plot can look very different for different bandwidth values.

Selection of the 'proper' bandwidth is a classic bias-variance tradeoff problem.
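To make the bandwidth point concrete, here's a rough stdlib-only sketch of a Gaussian KDE, plus Silverman's rule of thumb as one common default. The function names and the toy two-cluster data are made up for illustration. With a small bandwidth the estimate dips between the clusters; with a very large one the dip disappears and the two groups blur into a single hump.

```python
import math
import statistics

def gaussian_kde(data, bandwidth):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of one Gaussian bump per data point, each with sd = bandwidth.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

def silverman_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)

# Hypothetical toy data: two tight clusters around 0 and 10.
data = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]

narrow = gaussian_kde(data, bandwidth=1.0)   # keeps the gap between clusters
wide = gaussian_kde(data, bandwidth=10.0)    # oversmooths: gap vanishes
h = silverman_bandwidth(data)
```

Comparing `narrow(5.0)` with `narrow(0.0)` shows a deep valley at the midpoint, while `wide(5.0)` is actually higher than `wide(0.0)`. Same data, opposite visual story, which is the bias-variance tradeoff in a nutshell.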


While true, that's not an additional problem compared to box plots, which effectively just set the bandwidth to the maximum. So IMO violin plots are strictly better.


I find violin plots suggest far smoother distributions than actually exist, so you need to be careful about how much data you have.


I agree, but so do box plots. I think the best option is probably violin plots when there's lots of data and bee swarm plots when there isn't. But either is better than box plots.


What about using rotated, symmetric histograms, like a quantized violin plot?


The author mentions those at the bottom of the article, but two of the problems highlighted there still remain:

* There's another intermediary concept (kernel density estimation) between the audience and the data

* They're still likely to misrepresent tight groupings and discontinuities, which will be smoothed out


Histograms and box plots are just clunky kernel density estimates, too.
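For the histogram half of that claim, a quick sketch (names and data are hypothetical, and box plots aren't covered here): a histogram bin height and a uniform-kernel ("boxcar") KDE both count the points inside a window of width `width` and divide by `n * width`. The difference is that the KDE's window is centered on the point being evaluated, while the histogram's windows are fixed bins. So the histogram behaves roughly like a boxcar KDE evaluated at each bin's center and held flat across the bin (points sitting exactly on a bin edge aside).

```python
import math

def histogram_density(data, x, width, origin=0.0):
    """Density of the histogram bin containing x; bins are [origin + k*width, origin + (k+1)*width)."""
    k = math.floor((x - origin) / width)
    lo, hi = origin + k * width, origin + (k + 1) * width
    count = sum(1 for xi in data if lo <= xi < hi)
    return count / (len(data) * width)

def boxcar_kde(data, x, width):
    """Uniform-kernel KDE: count points within width/2 of x, normalized to integrate to 1."""
    count = sum(1 for xi in data if abs(x - xi) < width / 2)
    return count / (len(data) * width)

data = [0.1, 0.2, 0.9, 1.1]

# At the bin center (0.5) the two agree: 3 of 4 points, width 1 -> 0.75.
center_hist = histogram_density(data, 0.5, 1.0)
center_kde = boxcar_kde(data, 0.5, 1.0)

# Near the bin edge (0.95) the histogram stays flat at 0.75,
# while the centered window now covers only 0.9 and 1.1 -> 0.5.
edge_hist = histogram_density(data, 0.95, 1.0)
edge_kde = boxcar_kde(data, 0.95, 1.0)
```

That fixed-bin snapping is the "clunky" part: shifting `origin` can change the histogram's shape even though the data hasn't changed.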



