In R, one can simply do:
library("ggplot2")
library("datasets")
ggplot(faithful, aes(x=eruptions)) + geom_density() + geom_rug()
which gives a chart like this (http://jean-francois.im/temp/eruptions-kde.png). Contrast with:
ggplot(faithful, aes(x=eruptions)) + geom_histogram(binwidth=1)
which gives a chart like this (http://jean-francois.im/temp/eruptions-histogram.png).
Edit: Other plots mentioned in this discussion:
ggplot(faithful, aes(x = eruptions)) + stat_ecdf(geom = "step")
Cumulative distribution, as suggested by leot (http://jean-francois.im/temp/eruptions-ecdf.png)
qqnorm(faithful$eruptions)
Q-Q plot, as suggested by christopheraden (http://jean-francois.im/temp/eruptions-qq.png)

A histogram is considered (by statisticians) to be a non-parametric density estimator. Kernel density estimation is also considered a non-parametric density estimator.
The kernel function you use does not depend on the distribution of your data. If you have normal data, you can use an equation to provide the 'optimal' bandwidth in that case, but this is about bandwidth selection and not the kernel itself.
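To make the kernel-versus-bandwidth point concrete, here is a rough sketch in base R using the same faithful data. It relies on bw.nrd0(), which is the normal-reference rule of thumb that density() uses by default. The factor of 4 for the narrow and wide bandwidths and the 1 to 6 evaluation range are arbitrary choices made for illustration.

```r
x <- faithful$eruptions

# Normal-reference rule-of-thumb bandwidth (the default used by density()).
bw_rot <- bw.nrd0(x)

# Evaluate every estimate on the same grid so the curves are comparable.
# The range 1..6 is an arbitrary choice that covers the data.
kde <- function(bw, kernel = "gaussian") {
  density(x, bw = bw, kernel = kernel, from = 1, to = 6)
}

# Same bandwidth, different kernels: the curves should be close.
d_gauss <- kde(bw_rot, "gaussian")
d_epan  <- kde(bw_rot, "epanechnikov")

# Same kernel, different bandwidths (factor of 4 is arbitrary): very different curves.
d_narrow <- kde(bw_rot / 4)
d_wide   <- kde(bw_rot * 4)

plot(d_narrow, col = "red", main = "Kernel vs. bandwidth")
lines(d_gauss)
lines(d_epan, lty = 2)
lines(d_wide, col = "blue")
```

On this data the Gaussian and Epanechnikov curves should sit nearly on top of each other, while changing the bandwidth moves you between a spiky curve and one that smears the two modes together.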
You can also, say, fit a spline to a univariate dataset. We can also call this non-parametric in the sense that the number of knot parameters, etc., can grow with the data size. This doesn't use any probabilistic machinery until you actually 'fit' the spline.
My takeaway from the original post is that you should probably be aware of how things work if you use them, or the defaults might bite you. I like histograms, but I don't like bin-size/position optimization algorithms, so I just use lots of bins. I like kernel density estimates with the data points lightly shown. In either case you're gonna fool yourself a couple of times.
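A small base R sketch of both habits, again on the faithful data. The one-minute bin width, the half-minute shift, the bin count of 50 and the rug transparency are all arbitrary illustrative choices, not recommendations.

```r
x <- faithful$eruptions

# Same bin width (1 minute), two different bin origins.
h_a <- hist(x, breaks = seq(1, 6, by = 1), plot = FALSE)
h_b <- hist(x, breaks = seq(0.5, 6.5, by = 1), plot = FALSE)
h_a$counts   # counts with bins starting at 1.0
h_b$counts   # counts with bins starting at 0.5

# Lots of narrow bins. A single number for `breaks` is only a suggestion
# to hist(), so the actual bin count may differ somewhat from 50.
hist(x, breaks = 50, main = "Eruption duration", xlab = "minutes")

# Raw data points drawn faintly along the axis.
rug(x, col = rgb(0, 0, 0, alpha = 0.3))
```

Comparing h_a$counts with h_b$counts shows how much the picture can move just from shifting where the bins start.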
And as you mention, ggplot is seriously awesome.
plot(density(Annie), col="red")
lines(density(Brian), col="blue")
lines(density(Chris), col="green")
lines(density(Zoe), col="cyan")
This is the plot you get: http://i.imgur.com/sY2awX7.png

But of course, playing around with these parameters will hopefully give you a nice plot and some insight into the problem, and allow you to propose a proper model describing your data. Then you can fit this model to your data and extract the model parameters more precisely.
And when the distribution width of the topological features matches your kernel sizes, of course, this PDF will look almost identical to the density plots.
To the credit of the shadier individuals in my profession, this histogram subtlety nicely highlights how it can be quite easy to bend the data to your argument using ad-hoc procedures (KDEs, hists, QQs, boxplots). A carefully chosen bin width, smoothing parameter, or covariate can present a different view of the data than some other parameter/covariate. That's why it's nice to have other statisticians capable of reproducing and disseminating the work.
There's a great story of a histogram of heights in Napoleon's army having two peaks, eliciting all sorts of theories. Reality was that height in the army was normally distributed, but that the data was collected in centimeters and people were looking at a histogram binned by inches. The middle bin covered only two whole-centimeter values, while the bins on either side covered three each and so had dramatically more counts.
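The arithmetic behind that story can be sketched in a few lines of R. Everything numeric here is an assumption made up for illustration, not historical data: heights are taken to be normal with mean 166.4 cm and standard deviation 6 cm, recorded to the nearest whole centimeter. Since an inch is 2.54 cm, each one-inch bin catches either two or three whole-centimeter values.

```r
# Illustrative assumptions, NOT historical figures.
mean_cm <- 166.4
sd_cm   <- 6

cm <- 127:203                     # 127 cm is exactly 50 in; 203 cm is just under 80 in
inch_bin <- (cm * 100) %/% 254    # inch bin of each cm value (integer arithmetic)

# Number of whole-centimeter values falling in each one-inch bin.
cm_per_bin <- as.vector(table(inch_bin))

# Probability of each recorded (rounded) centimeter value, summed per inch bin.
p_cm <- pnorm(cm + 0.5, mean_cm, sd_cm) - pnorm(cm - 0.5, mean_cm, sd_cm)
binned <- as.vector(tapply(p_cm, inch_bin, sum))

# What the same inch bins would hold if heights had not been rounded to cm.
edges_cm <- (50:80) * 2.54
unrounded <- diff(pnorm(edges_cm, mean_cm, sd_cm))

res <- data.frame(inch = 50:79, cm_values = cm_per_bin,
                  binned = binned, unrounded = unrounded)
print(round(res[res$inch %in% 62:68, ], 3))
```

Under these assumptions the 65-inch bin holds only two centimeter values (166 and 167) while its neighbors hold three each, so the binned column dips in the middle and shows two peaks, even though the unrounded column has a single peak there.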