### Tuesday, March 07, 2006

## st: 4th German Users' Group Meeting: Final Announcement and Program

4th German Stata Users' Group Meeting: Announcement and Program
===============================================================

The 4th German Stata Users' Group Meeting will be held at the University of Mannheim (http://www.wz-berlin.de) on Friday, March 31st, 2006.

The content of the meeting has been organized by Johannes Giesecke, University of Mannheim (jgiesecke@rumms.uni-mannheim.de), Ulrich Kohler, WZB (kohler@wz-berlin.de), and Fred Ramb, Deutsche Bundesbank (fred.ramb@bundesbank.de). The logistics are being organized by Dittrich and Partner (http://www.dpc.de), the distributor of Stata in several countries including Germany and Austria.

The meeting is open to everyone interested, and we will be happy if Stata users from neighboring countries join us. StataCorp will be represented. The conference language will be English due to the 'international' nature of the meeting and the participation of non-German guest speakers. There will be a "wishes and grumbles" session at which you may air your thoughts to Stata developers. There will also be an optional informal meal at a Mannheim restaurant on Friday evening (at an additional cost of 20 Euro).

Participants are asked to cover their own travel expenses. There will be a small conference fee (regular 20 Euro, students 10 Euro) to cover the cost of coffee, tea, and lunch.

For further information on registration, please contact anke.mrosek@dpc.de. Mrs. Mrosek will also assist you in finding accommodation. For general information about the meeting see also http://www.stata.com/mannheim06.

Readers of previous announcements should note that the conference venue has changed to Room W 117, located in the Schloss. You will find an exact plan of the conference venue on http://www.stata.com/mannheim06.

Note: Counting the number of windows, the Schloss of Mannheim is the biggest palace in Europe. Even if you don't trust the indicator, believe us: the Schloss is big. We therefore ask you to plan ample time. It is not difficult to find the Schloss in Mannheim, but it is probably difficult to find the room within the Schloss.

Schedule of the 4th German Stata Users' Group Meeting
-----------------------------------------------------

8:45 Registration and coffee/tea

9:15 Welcome Johannes Giesecke

9:30 Resultssets, resultsspreadsheets and resultsplots in Stata Roger Newson, Imperial College London r.newson@imperial.ac.uk

Most Stata users make their living producing results in a form accessible to end users. Most of these end users cannot immediately understand Stata logs. However, they can understand tables (in paper, PDF, HTML, spreadsheet or word processor documents) and plots (produced using Stata or non-Stata software). Tables are produced by Stata as resultsspreadsheets, and plots are produced by Stata as resultsplots. Sometimes (but not always), resultsspreadsheets and resultsplots are produced using resultssets. Resultssets, resultsspreadsheets and resultsplots are all produced, directly or indirectly, as output by Stata commands.

A resultsset is a Stata dataset, which is a table, whose rows are Stata observations and whose columns are Stata variables. A resultsspreadsheet is a table in generic text format, conforming to a TeX or HTML convention, or to another convention with a column separator string and possibly left and right row delimiter strings. A resultsplot is a plot produced as output, using a resultsset or a resultsspreadsheet as input. Resultsset-producing programs include -statsby-, -parmby-, -parmest-, -collapse-, -contract-, -xcollapse- and -xcontract-. Resultsspreadsheet-producing programs include -outsheet-, -listtex-, -estout- and -estimates table-. Resultsplot-producing programs include -eclplot- and -mileplot-.

There are two main approaches (or dogmas) for generating resultsspreadsheets and resultsplots. The resultsset-centred dogma is followed by -parmest- and -parmby- users, and states: "Datasets make resultssets, which make resultsplots and resultsspreadsheets". The resultsspreadsheet-centred dogma is followed by -estout- and -estimates table- users, and states: "Datasets make resultsspreadsheets, which make resultssets, which make resultsplots". The two dogmas are complementary, and each dogma has its advantages and disadvantages. The resultsspreadsheet dogma is much easier for the casual user to learn to apply in a hurry, and is therefore probably preferred by most users most of the time. The resultsset dogma is more difficult for most users to learn, but is more convenient for users who wish to program everything in do-files, with little or no manual cutting and pasting.
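The resultsset-centred route can be illustrated outside Stata with a minimal Python sketch. It is not how any of the Stata commands named above work internally; the toy data and the function names `make_resultsset` and `to_spreadsheet` are assumptions made for illustration. A dataset is collapsed into a resultsset (one row per group), which is then rendered as text rows using a column separator and optional left and right row delimiters.

```python
from statistics import mean

# Toy dataset: one row per observation (values are made up for illustration).
data = [
    {"group": "a", "y": 1.0},
    {"group": "a", "y": 3.0},
    {"group": "b", "y": 10.0},
    {"group": "b", "y": 14.0},
]


def make_resultsset(rows, by, var):
    """Collapse a dataset into one row per group, holding n and the mean."""
    groups = {}
    for row in rows:
        groups.setdefault(row[by], []).append(row[var])
    return [
        {by: key, "n": len(vals), "mean": mean(vals)}
        for key, vals in sorted(groups.items())
    ]


def to_spreadsheet(resultsset, columns, sep, left="", right=""):
    """Render each row as text: left delimiter, separated cells, right delimiter."""
    return [
        left + sep.join(str(row[c]) for c in columns) + right
        for row in resultsset
    ]


resultsset = make_resultsset(data, by="group", var="y")
columns = ["group", "n", "mean"]

# TeX convention: cells separated by " & ", rows ended by " \\".
tex_rows = to_spreadsheet(resultsset, columns, sep=" & ", right=r" \\")

# HTML convention: the same resultsset with different separator and delimiters.
html_rows = to_spreadsheet(
    resultsset, columns, sep="</td><td>", left="<tr><td>", right="</td></tr>"
)

for line in tex_rows + html_rows:
    print(line)
```

Because the resultsset is kept as data, the same rows can feed a table in either convention, or a plot, without manual cutting and pasting.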

10:20 Coffee

GLLAMM-Session
--------------

10:30 Intervention evaluation using -gllamm- Andrew Pickles, University of Manchester (andrew.pickles@manchester.ac.uk)

The gllamm procedure provides a framework within which many of the more difficult analyses required for trials and intervention studies may be undertaken.

Treatment effect estimation in the presence of non-compliance can be undertaken using instrumental variable (IV) methods. We illustrate how gllamm can be used for IV estimation for the full range of types of treatment and outcome measures and describe how missing data may be tackled on an assumption of latent ignorability. Alternative approaches to account for clustering and the analysis of cluster-randomised studies will also be described.

Examples from studies of alcohol consumption of primary care patients, cognitive behaviour therapy of depression patients and a school-based smoking intervention are discussed.

11:20 Estimating IRT models with -gllamm- Herbert Matschinger, University of Leipzig (math@medizin.uni-leipzig.de)

Within the framework of economic evaluation, health econometricians are interested in constructing a meaningful health index that is consistent with individual or societal preferences. One way to derive such an index is based on the EQ-5D description and valuation of health related quality of life (HRQOL). The purpose of this study was to analyze how well the EQ-5D reflects one latent construct of HRQOL and how large the potential impact of measurement variance is with respect to six different countries. Data came from the European Study of the Epidemiology of Mental Disorders (ESEMeD), a cross-sectional survey of a representative random sample (N=21,425) in Belgium, France, Germany, Italy, the Netherlands and Spain.

At least in psychology, much attention is paid to different forms of IRT models and particularly the Rasch model, since it is the only model featuring specific objectivity, which enables what is called a “fair comparison” with respect to the latent dimension to be measured. Therefore the dimensionality of the construct is evaluated by means of one-parameter and two-parameter Item Response Theory (IRT). Differential Item Functioning is tested with respect to the six countries and both the difficulty and discrimination parameters. Results show that a unidimensional one-parameter IRT model holds for all countries, provided that the item “anxiety/depression” is omitted. If both the physical and the mental component of health related quality of life (HRQOL) are to be represented, the questionnaire should be extended to a two-dimensional construct. Consequently, more items to portray the mental component are then needed.

This presentation will focus on the possibilities and restrictions in estimating these models with -gllamm-. It will be shown how these models can be established and tested. Problems regarding the structure of the data and the assignment of incidental parameters to individual observations will be discussed.

General Statistics
------------------

11:50 Variance estimation for Generalized Entropy and Atkinson inequality indices: the complex survey data case Martin Biewen, University of Frankfurt (biewen@wiwi.uni-frankfurt.de)

We derive the sampling variances of Generalized Entropy and Atkinson indices when estimated from complex survey data, and show how they can be calculated straightforwardly using widely available software. We also show that, when the same approach is used to derive variance formulae for the i.i.d. case, it leads to estimators that are simpler than those proposed before. Both cases are illustrated with a comparison of income inequality in Britain and Germany.
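For readers unfamiliar with the two index families, the following Python sketch computes the point estimates only, using the standard textbook definitions with optional sampling weights. It does not implement the variance estimators of the talk, and the function names are assumptions made for illustration. Incomes must be strictly positive.

```python
import math


def _wmean(values, weights):
    """Weighted mean of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def generalized_entropy(y, alpha, w=None):
    """Generalized Entropy index GE(alpha) for strictly positive incomes y."""
    w = w or [1.0] * len(y)
    mu = _wmean(y, w)
    r = [v / mu for v in y]  # incomes relative to the (weighted) mean
    if alpha == 0:  # mean log deviation
        return _wmean([-math.log(x) for x in r], w)
    if alpha == 1:  # Theil index
        return _wmean([x * math.log(x) for x in r], w)
    return (_wmean([x ** alpha for x in r], w) - 1.0) / (alpha * (alpha - 1.0))


def atkinson(y, epsilon, w=None):
    """Atkinson index with inequality aversion epsilon > 0."""
    w = w or [1.0] * len(y)
    mu = _wmean(y, w)
    if epsilon == 1:
        # Equally distributed equivalent income is the weighted geometric mean.
        ede = math.exp(_wmean([math.log(v) for v in y], w))
    else:
        ede = _wmean([v ** (1.0 - epsilon) for v in y], w) ** (1.0 / (1.0 - epsilon))
    return 1.0 - ede / mu


incomes = [1.0, 4.0]
print(generalized_entropy(incomes, 2))  # half the squared coefficient of variation
print(atkinson(incomes, 1))
```

Both indices are zero when all incomes are equal and grow as the distribution becomes more unequal.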

12:20 Lunch

13:30 Linear mixed models in Stata Roberto G. Gutierrez, StataCorp (rgutierrez@stata.com)

Included with Stata version 9 is the new command xtmixed, for fitting linear mixed models. Mixed models contain both fixed and random effects. The fixed effects are analogous to standard regression coefficients and are estimated directly. The random effects are not directly estimated but are summarized according to the unique elements of their respective variance–covariance matrices, known as variance components. xtmixed syntax is summarized and demonstrated using several examples. In addition, xtmixed and its postestimation routines may be used to perform nonparametric smoothing via penalized splines.

User Written Programs
----------------------

14:20 Implementing Restricted Least Squares in Linear Models J. Haisken-DeNew, RWI Essen (jhaiskendenew@rwi-essen.de)

The presentation illustrates the user-written program -hds97-, which implements the restricted least squares (RLS) procedure as described by Haisken-DeNew and Schmidt (1997). Log wages are regressed on a group of k-1 industry/region/job/etc. dummies. The k-th dummy is the omitted reference dummy. Using RLS, all k dummy coefficients and standard errors are reported. The coefficients are interpreted as percentage-point deviations from the industry weighted average. An overall measure of dispersion is also reported.

This ado corrects problems with the Krueger and Summers (1988) Econometrica methodology of overstated differential standard errors, and understated overall dispersion.

General comments: The coefficients of continuous variables are not affected by -hds97-. Also, all results calculated in -hds97- are independent of the choice of the reference category. By the way, for all dummy variable sets having only two outcomes, e.g. male/female, the t-values of the -hds97- adjusted coefficients are always equal in magnitude, but opposite in sign.
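The independence from the reference category can be seen in a small Python sketch of the coefficient transformation alone. This is an illustration of the idea, not the -hds97- code: the function name, the group shares and the coefficient values are hypothetical, and the adjusted standard errors (which need the coefficient covariance matrix) are not computed.

```python
def deviations_from_weighted_mean(coefs, shares):
    """Re-express k dummy coefficients as deviations from their weighted mean.

    coefs  -- all k coefficients, with the omitted reference entered as 0.0
    shares -- the k group shares (assumed to sum to 1)
    """
    weighted_mean = sum(s * b for s, b in zip(shares, coefs))
    return [b - weighted_mean for b in coefs]


# Hypothetical shares of three industries in the sample.
shares = [0.5, 0.3, 0.2]

# Hypothetical coefficients with industry 1 as the omitted reference.
ref_first = [0.0, 0.10, -0.05]

# The same fit with industry 3 omitted instead: every coefficient shifts by 0.05.
ref_last = [0.05, 0.15, 0.0]

print(deviations_from_weighted_mean(ref_first, shares))
print(deviations_from_weighted_mean(ref_last, shares))
```

Both calls give the same deviations up to floating-point rounding, and their share-weighted sum is zero, which is what allows all k coefficients to be reported.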

14:50 Sequence analysis using Stata Christian Brzinsky-Fay, WZB; Ulrich Kohler, WZB (brzinsky-fay@wz-berlin.de; kohler@wz-berlin.de)

Sequences are ordered lists of elements. A typical example of a sequence is the sequence of bases in the DNA of living organisms. Other examples are sequences of employment stages during a lifetime, or individual party preferences over time. Sequence analysis includes techniques to handle, describe, and, most importantly, to compare sequences with each other.

Sequences are most commonly used by scholars of genomes, but far less by social scientists. This is surprising insofar as sequence data is readily available in many datasets for the social sciences. In fact, all data from panel studies can be regarded as sequence data. Despite that, social scientists relatively seldom use panel data for sequence analysis. The first aim of the presentation therefore is to illustrate typical research topics that can be addressed with sequence analysis. The second part will then describe a bundle of user-written Stata programs for sequence analysis, including a Mata algorithm for performing optimal matching with the so-called "Needleman-Wunsch" algorithm.
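As a rough idea of what optimal matching computes, here is a minimal Python sketch of a Needleman-Wunsch-style dynamic program, written as cost minimization. It is not the Mata implementation presented in the talk; the function name `om_distance`, the default costs and the state codes (E = employed, U = unemployed) are assumptions made for illustration.

```python
def om_distance(seq_a, seq_b, indel=1.0, sub=2.0):
    """Cheapest cost of turning seq_a into seq_b by insertions, deletions
    and substitutions (global alignment by dynamic programming)."""
    n, m = len(seq_a), len(seq_b)
    # dist[i][j] = cheapest cost of aligning seq_a[:i] with seq_b[:j]
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i * indel  # delete the first i elements of seq_a
    for j in range(1, m + 1):
        dist[0][j] = j * indel  # insert the first j elements of seq_b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match_cost = 0.0 if seq_a[i - 1] == seq_b[j - 1] else sub
            dist[i][j] = min(
                dist[i - 1][j - 1] + match_cost,  # match or substitute
                dist[i - 1][j] + indel,           # delete from seq_a
                dist[i][j - 1] + indel,           # insert into seq_a
            )
    return dist[n][m]


# Two short employment histories, one state per period.
print(om_distance("EEUU", "EEUU"))
print(om_distance("EEU", "EUU"))
print(om_distance("EE", "EEU"))
```

Computing this distance for every pair of sequences yields a dissimilarity matrix, which can then be passed to a cluster analysis.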

15:30 Coffee

15:40 New Tools for Evaluating the Results of Cluster Analyses Hildegard Schaeper, HIS (schaeper@his.de)

Clustering methods are designed for finding groups in data, for grouping similar objects (variables or observations) into the same cluster and dissimilar objects into separate clusters. Whereas this main idea is rather simple, carrying out a cluster analysis remains a challenging task: The number of different clustering methods is huge and clustering includes many choices, such as the decision between basic approaches (e.g. hierarchical and partitioning methods), the choice of a dissimilarity or similarity measure, the selection of a particular linkage method when performing a hierarchical agglomerative cluster analysis, the choice of an initial partition when carrying out a partitioning cluster analysis, and the determination of the appropriate number of clusters. Each of these decisions and choices can affect the classification results. Apart from two commands for determining the number of clusters (cluster stop, cluster dendrogram), Stata has no built-in utilities for examining clustering results. We therefore developed some simple tools which provide additional evaluation criteria:

– programs assisting in determining the number of clusters (Mojena’s stopping rules for hierarchical clustering techniques, PRE coefficient, F-Max statistic and Beale’s F values for a partitioning cluster analysis),

– a program for testing the stability of classifications produced by different cluster analyses (Rand index), and

– a program that computes ETA2 in order to assess how well the clustering variables separate the clusters.

In the presentation these programs will be presented, and their usefulness will be discussed in comparison with other tools for the evaluation of clustering results (agglomeration schedule, scree diagram).
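Two of the criteria listed above have short textbook definitions, sketched here in Python. These are not the Stata programs of the talk; the function names and the toy data are assumptions made for illustration. The Rand index is the share of object pairs on which two classifications agree, and ETA2 is the between-cluster share of the total sum of squares of a clustering variable.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs treated alike (both together or both apart)
    by two classifications of the same objects."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def eta_squared(values, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    grand = sum(values) / len(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    between_ss = sum(
        len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in groups.values()
    )
    return between_ss / total_ss


# Same partition under different cluster numbers: full agreement.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))

# A variable that separates the two clusters perfectly.
print(eta_squared([1.0, 1.0, 5.0, 5.0], [0, 0, 1, 1]))
```

The Rand index does not depend on how the clusters are numbered, which makes it suitable for comparing solutions from different cluster analyses.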

Towards an Open Wish List to StataCorp
--------------------------------------

16:10 Stata goes BUGS (via R) Susumu Shikano, University of Mannheim (shikanos@rumms.uni-mannheim.de)

Recently, Bayesian methods such as Markov chain Monte Carlo (MCMC) techniques have found increasing use in the social sciences, with (Win)BUGS being one of the most widely applied software packages for this kind of analysis. Unfortunately, due to the absence of MCMC techniques and of any interface to WinBUGS or BUGS in Stata, Stata users who apply MCMC techniques have to perform such painful tasks as reformatting data by themselves. As a preliminary solution to this problem, one can call R, another statistical software package, from inside Stata and use it as an interface to (Win)BUGS. This presentation outlines this solution and provides an example analysis.

16:40 Optimal Large Package Administration for Stata Markus Hahn, RWI Essen

The Stata package tool is quite simple to use for smaller ado packages stored on user webpages. However, when the number of files in a package becomes large and the files need to be updated on a regular basis, this becomes cumbersome. Package updates could take many minutes to complete. Here a method of storing packages as compressed archives on the host server is outlined, whereby the user sends a query to the update server to check for a new version. If a new version is available, the package archive is downloaded in its entirety, and then extracted and installed locally. This is far more efficient with respect to installation times (typically only 1/10 of the time needed) than downloading many text files individually. For large packages, the bottleneck is most often the download time. Currently this automated updating can be achieved with a Stata ado-file and the aid of additional binaries (such as tar, gzip, zip). The usability of this technique would be enhanced dramatically if the functionality of an archiving format (such as tar, gzip, zip) were directly integrated into the Stata binary. Encrypted files could be distributed in this manner as well. Ado-files inside the package archive can be configured to make an automatic call to the host server to check for available updates.
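The update logic described above (compare versions, fetch one archive, unpack locally) can be sketched in Python. This is only an illustration under stated assumptions, not the authors' ado-file: the function names are hypothetical, version numbers are modelled as tuples, and the download step is replaced by an archive built in memory so that the example runs without a server.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def build_archive(files):
    """Pack {name: text} into an in-memory .tar.gz (stands in for the server)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def update_if_newer(installed_version, server_version, archive_bytes, target_dir):
    """Unpack the whole archive only if the server offers a newer version.

    Returns the version now installed and the list of files written.
    """
    if server_version <= installed_version:
        return installed_version, []
    written = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                # Keep only the base name so nothing is written outside target_dir.
                dest = Path(target_dir) / Path(member.name).name
                dest.write_bytes(tar.extractfile(member).read())
                written.append(dest.name)
    return server_version, written


with tempfile.TemporaryDirectory() as tmp:
    archive = build_archive({"hello.ado": "program define hello\nend\n"})
    print(update_if_newer((1, 0), (1, 1), archive, tmp))  # newer: files installed
    print(update_if_newer((1, 1), (1, 1), archive, tmp))  # up to date: nothing done
```

A single compressed transfer replaces one request per file, which is where the time saving for large packages would come from.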

17:10 Coffee

17:20 Report to the users Alan Riley, StataCorp

17:50 Wishes and Grumbles

18:30 End of the Meeting

-- kohler@wz-berlin.de +49 (030) 25491-361
