### Sunday, February 26, 2006

## st: RE: Asymmetric kernel estimators

In addition to other comments -- and closest in spirit to Marcello Pagano's comment -- I implemented a log transform, kernel density and back transform option in -mdensity-, which is for Stata 6 and has been on SSC since 1999.

There is no public version of -mdensity- for Stata 8 or later. The rewriting of -kdensity- in Stata 8 made it difficult, if not impossible, for me to keep the same structure for -mdensity-, which was just a wrapper for -kdensity-.

No matter, the logic is quite easy and yields to a few lines, for which -mdensity- is not needed at all. There was a write-up in

Graphing distributions. SJ 4(1):66--88 (2004)

What follows is based on an extract. For the references, please buy or borrow a copy of the Stata Journal.

----------------------

Some simple devices extend the range of applications of Stata's official commands for kernel density estimation. First is the idea of estimating the density function on a transformed scale and then back-transforming the estimate to one for the raw scale. Two of the most natural transformations here, as elsewhere, are logarithms for positive variables and logit-like transformations for proportions and other data measured on some interval (a,b). The underlying general principle is that for a continuous monotone transformation t(x), the densities f(x) and f(t(x)) are related by f(x) = f(t(x)) |dt/dx|. This procedure is mentioned briefly by Silverman (1986, pp.27-30), although his worked example (p.28) is not very encouraging. Good expositions are given by Wand and Jones (1995, pp.43-45), Simonoff (1996, pp.61-64) and Bowman and Azzalini (1997, pp.14-16).
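Written with separate symbols for the two densities, the rule may be easier to read. The subscripts below are notation added here, not the article's, and the particular logit-like form t(x) = log((x - a)/(b - x)) is only one illustrative choice for data on (a,b); the extract does not fix one.

```latex
% f_X: density of x on the raw scale
% f_T: density of t = t(x) on the transformed scale
\[
  f_X(x) = f_T\bigl(t(x)\bigr)\,\left|\frac{dt}{dx}\right|
\]

% One logit-like transformation for a < x < b (illustrative choice)
\[
  t(x) = \log\frac{x-a}{b-x},
  \qquad
  \frac{dt}{dx} = \frac{1}{x-a} + \frac{1}{b-x} = \frac{b-a}{(x-a)(b-x)}
\]
```

With a = 0 and b = 1 this reduces to the ordinary logit, for which the multiplier is 1 / (x (1 - x)).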

With a logarithmic transformation of x we have

estimate of f(x) = estimate of f(log x) times (1 / x),

given that d/dx (log x) = 1/x. Note in particular, if data are right skewed, that the result of this transformation is more smoothing in the tail and less near the main part of the distribution than in the default method. I have found this one of the most valuable ways of going beyond the default. It fits very well both the common finding that positive variables are right-skewed, suggesting a transformation such as the logarithm, and the common attitude that results on the original scale are of direct scientific or practical interest. To put it another way, the transformation behaves more like a link function than a classical transformation, given that end results are on the scale of the original response. You can get the best of both worlds.
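One way to see why the amount of smoothing varies along the raw scale is to write the estimate as a sum over the observations. The symbols here (observations x_1, ..., x_n, kernel K, width h chosen on the log scale) are introduced for illustration, and the window argument assumes a kernel supported on [-1, 1]; it is a sketch, not a result quoted from the article.

```latex
% Back-transformed kernel estimate, with width h chosen on the log scale
\[
  \hat f_X(x) = \frac{1}{x}\cdot\frac{1}{nh}
                \sum_{i=1}^{n} K\!\left(\frac{\log x - \log x_i}{h}\right)
\]

% Observation x_i contributes only where |log x - log x_i| <= h, that is
\[
  x_i e^{-h} \le x \le x_i e^{h},
  \qquad
  \text{window width} = x_i\,(e^{h} - e^{-h}) \approx 2 h x_i
  \quad \text{for small } h
\]
```

The window on the raw scale is thus roughly proportional to x_i: wide around large values in the right tail, narrow around small values near the bulk of the data.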

Returning to the wage data, here is an illustrative (and certainly not definitive) example, in which we just use default kernel and width choice.

    . gen logwage = log(wage)
    . kdensity logwage, at(logwage) gen(densitylog)
    . gen density = densitylog/wage
    . levels wage, local(levels)
    . line density wage, sort xtick(`levels', tposition(inside))

The density function ... is much smoother in the tails than the equivalent default .... However, the step in the left-hand tail needs investigation: is this some odd artefact or a genuine feature of the data?
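To look at the comparison with the default directly, the two estimates can be overlaid on one graph. What follows is a minimal sketch, not part of the article: it assumes that the nlsw88 dataset shipped with Stata, with its variable wage, is an acceptable stand-in for the wage data, and the name densityraw is made up here. Both calls use the default kernel and the default width, each computed on its own scale.

```stata
* Assumption: nlsw88 (shipped with Stata) stands in for the wage data
sysuse nlsw88, clear

* Estimate on the log scale, then back-transform by multiplying by 1/wage
gen logwage = log(wage)
kdensity logwage, at(logwage) gen(densitylog) nograph
gen density = densitylog / wage

* Default estimate on the raw scale, evaluated at the same points
kdensity wage, at(wage) gen(densityraw) nograph

* Overlay the two estimates
line density densityraw wage, sort ///
    legend(order(1 "via log scale" 2 "default on raw scale"))
```

The continuation lines (`///`) assume the commands are run from a do-file rather than typed interactively.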

<original continues>

For gammas, some people use cube roots (cf. Wilson-Hilferty transformation).
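The same few lines adapt to a cube root. With t(x) = x^(1/3) the derivative is dt/dx = 1 / (3 x^(2/3)), so that is the multiplier on back-transformation. A minimal sketch, again assuming the nlsw88 data shipped with Stata as a stand-in for a positive, right-skewed variable; the names cuberoot, densitycube and density3 are made up for illustration.

```stata
* Assumption: nlsw88 (shipped with Stata) as example data; wage is positive
sysuse nlsw88, clear

* Estimate the density on the cube root scale
gen cuberoot = wage^(1/3)
kdensity cuberoot, at(cuberoot) gen(densitycube) nograph

* Back-transform: multiply by |dt/dx| = 1 / (3 * wage^(2/3))
gen density3 = densitycube / (3 * wage^(2/3))

line density3 wage, sort
```

The multiplier is undefined at zero, so this version needs strictly positive values just as the logarithm does.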

Nick n.j.cox@durham.ac.uk

Daniel Schneider

> I am looking for a way to use asymmetric kernels in kernel density
> estimations, for example a gamma kernel (Chen 2000). Unfortunately, I
> have not been able to locate any implementation in Stata.
>
> Am I missing something or has anyone implemented an asymmetric kernel
> estimator in Stata (my data is more or less from a gamma
> distributions, non-negative by definition)?

