Thursday, March 16, 2006

st: -textbarplot- available from SSC

A few days ago, I mentioned a program -textbarplot- I was developing. This mention was in response to a posting by Hiroshi Maeda. Austin Nichols contributed helpfully to the discussion.

My own earlier stimulus came from teaching. A student had a reasonable question and the Stata answer was rather complicated compared with what she wanted. This is no criticism of -graph bar- or -twoway bar-, the main alternatives for doing it: both are general, flexible and to that extent necessarily formidable in total.

Thanks to Kit Baum, this program is now available from SSC. Stata 8.2 is required.

The major point is best illustrated by example. On a happy holiday in Stockholm I naturally visited a good bookshop and acquired a simple statistical guide to Sweden.

The following data on home access to internet 2002 (%) come from Statistiska centralbyrån. 2003. Sweden in figures/Sverige i siffror 2004. p.52:

Men                      66.7
Women                    60.3

16-24                    75.5
25-34                    75.0
35-44                    80.0
45-54                    75.4
55-64                    59.9
65-74                    29.8
75-84                    10.3

Labourers                49.9
Lower white collar       60.8
Managers and officials   83.5
Entrepreneurs            66.3
Farmers                  26.8
Old-age pensioners       22.1

In cases like this the breakdown by sex (certainly) or by age or occupation (possibly) is too simple to justify a graph or even a table in a report, but the three together show enough to deserve a graph or table. What I would like to be able to do (as would my student with her data, and Hiroshi Maeda with a similar example) is insist easily on precisely this order and precisely the blank lines specified. In effect we want almost an immediate graph command.

With -textbarplot-, you enter these data as two variables, say a string variable -text- and a numeric variable -access-, with blank strings and missing numerics in observations 3 and 11 to indicate blank lines. Then

. label var access "home access to internet (%)"
. textbarplot text access
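A minimal sketch of the whole sequence, assuming the figures are typed in by hand with -input-. The width str25 is an assumption (any width that holds the longest label will do), and the empty string with a missing value marks each blank line, in observations 3 and 11.

```stata
* install once from SSC (Stata 8.2 or later is required)
ssc install textbarplot

clear
* str25 is an assumed width, wide enough for the longest label;
* "" paired with . marks a blank line (observations 3 and 11)
input str25 text access
"Men"                    66.7
"Women"                  60.3
""                       .
"16-24"                  75.5
"25-34"                  75.0
"35-44"                  80.0
"45-54"                  75.4
"55-64"                  59.9
"65-74"                  29.8
"75-84"                  10.3
""                       .
"Labourers"              49.9
"Lower white collar"     60.8
"Managers and officials" 83.5
"Entrepreneurs"          66.3
"Farmers"                26.8
"Old-age pensioners"     22.1
end

label var access "home access to internet (%)"
textbarplot text access
```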

More generally, -textbarplot- produces a horizontal bar plot with text shown to the left of the bars. A textvar specifies the text and a barvar, which must be a numeric variable, specifies the magnitude of the bars.

By default

1. If textvar is a string variable, then observation numbers are used to determine row positions for the bars, which are y axis values on a reversed scale, and the values of textvar are used as text.

2. If textvar is a numeric variable with labels, then its numeric values are used to determine row positions, and its labels are used as text.

3. If textvar is an integer-valued numeric variable, then its numeric values are used to determine row positions and are also shown as text.
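As an illustration of the second case, here is a small sketch using only the first two rows of the Swedish data. The variable name -sex- and the value label name -sexlbl- are hypothetical, chosen only for this example.

```stata
clear
* sex is a hypothetical numeric variable; its values 1 and 2 give the row positions
input byte sex access
1 66.7
2 60.3
end

* sexlbl is a hypothetical value label; its text is what is shown beside the bars
label define sexlbl 1 "Men" 2 "Women"
label values sex sexlbl

label var access "home access to internet (%)"
textbarplot sex access
```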

These defaults can be overridden by specifying an integer-valued y variable with the -y()- option. In that case there are no constraints on what textvar is.
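A sketch of the -y()- option, assuming string -text- and numeric -access- variables like those in the Swedish example are already in memory. The variable -row- is hypothetical, and the extra shift is only meant to show that row positions need not equal observation numbers.

```stata
* row is a hypothetical integer-valued variable holding the row position of each bar
gen int row = _n

* push the occupation block (observations 12 onwards) one row further down,
* which is intended to widen the gap above it
replace row = row + 1 if _n >= 12

textbarplot text access, y(row)
```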

-textbarplot- is a wrapper for -twoway bar- and -twoway scatter-. Other kinds of graph can be obtained by using a -recast()- option. The most useful alternatives are -recast(dot)-, -recast(dropline)- and -recast(spike)-.
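For instance, assuming -text- and -access- variables like those in the Swedish example, the three alternatives would be requested as follows. This is a sketch of the calls only, one graph per command.

```stata
* same data and layout, drawn with a different twoway plot type each time
textbarplot text access, recast(dot)
textbarplot text access, recast(dropline)
textbarplot text access, recast(spike)
```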

By comparison, similar graphs with -graph hbar- can be a little awkward if there are to be gaps between clusters of bars or repeated category names. -textbarplot- is implemented in terms of -twoway- to provide a simpler alternative for some kinds of plots.

In short, -textbarplot- is at best a convenience command, but convenience is still preferable to its opposite.

P.S. There is a -vertical- option to yield giraffe graphics. It's usually a bad idea, but you can do it.


