## RE: st: Outliers in correlation analysis

I just read a great article on the effects of cleaning data (by trimming, winsorizing, etc.). You may want to have a look at it: Bollinger, C.R. and A. Chandra. 2005. "Iatrogenic Specification Error: A Cautionary Tale of Cleaning Data." Journal of Labor Economics 23(2): 235-257.

Also, just curious: have you tried the non-parametric Spearman rank correlation?
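Why the rank-based version can help here: Spearman's rho is simply Pearson's r computed on the ranks of the data (tied values get the average of their ranks), so a single extreme value cannot pull it arbitrarily far. The sketch below is a minimal illustration in plain Python, not Stata's `spearman` command; the helper names and the ten data points are made up for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(v):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: nine points on a perfect line plus one extreme point.
x = list(range(1, 11))
y = [2 * v for v in x[:9]] + [-50]

print(f"Pearson r, first 9 points: {pearson(x[:9], y[:9]):.3f}")
print(f"Pearson r, all 10 points:  {pearson(x, y):.3f}")
print(f"Spearman rho, all 10:      {spearman(x, y):.3f}")
```

With these made-up numbers the nine well-behaved points are perfectly correlated; adding the one extreme point drags Pearson's r below zero, while Spearman's rho stays positive (5/11, about 0.45). Note that switching to ranks changes what is being measured (monotone rather than linear association), so it is a different question, not a free fix.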

-----Original Message----- From: Maarten Buis To: statalist@hsphsun2.harvard.edu Sent: 12/31/2005 3:36 AM Subject: RE: st: Outliers in correlation analysis

Hi Siddharth,

There is a very nice cartoon in Fox (1991) about how we deal with outliers: you see a man in front of a blackboard with a scatterplot on it. He tries to fit a regression line through it, but there are some obvious outliers. He frowns at the outliers, then he gets an idea, picks up an eraser, erases the inconvenient points, draws the regression line, and looks very pleased with the result.

This cartoon is probably not the reference you are looking for, but John Fox's "little green Sage book" on regression diagnostics would be a good place to start. Many ideas about diagnosing outliers can also be applied to correlation, since the correlation coefficient can be seen as the standardized regression coefficient in a bivariate regression.
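To make the last point concrete: the least-squares slope is b = S_xy / S_xx, and rescaling it by sd(x)/sd(y) gives S_xy / sqrt(S_xx S_yy), which is exactly Pearson's r. The sketch below checks this numerically and adds a crude leave-one-out check (how much r changes when each observation is dropped), in the spirit of the deletion diagnostics such as DFBETA used in regression. It is a plain-Python illustration with hypothetical helper names and made-up data, not the procedures from Fox's book or any Stata command.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def zscores(v):
    """Standardize to mean 0 and sample standard deviation 1."""
    m, s = mean(v), stdev(v)
    return [(a - m) / s for a in v]


def leave_one_out_r(x, y):
    """Full-sample r minus r with observation i dropped, for each i."""
    r_full = pearson(x, y)
    return [r_full - pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
            for i in range(len(x))]


# Hypothetical data; the last point is deliberately out of line.
x = [2.0, 3.5, 4.1, 5.0, 6.3, 7.7, 9.0]
y = [1.1, 2.0, 2.2, 3.9, 3.5, 5.0, 0.5]

r = pearson(x, y)
b_std = ols_slope(zscores(x), zscores(y))           # slope on standardized variables
b_rescaled = ols_slope(x, y) * stdev(x) / stdev(y)  # raw slope times sd(x)/sd(y)
print(f"r = {r:.4f}, standardized slope = {b_std:.4f}, rescaled = {b_rescaled:.4f}")

for i, d in enumerate(leave_one_out_r(x, y)):
    print(f"drop obs {i}: r changes by {d:+.4f}")
```

In this toy example the last observation stands out in the leave-one-out output, which flags it for a closer look; as the cartoon suggests, that is a reason to investigate the point, not a license to erase it.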

HTH and happy 2006, Maarten

Fox, J. 1991. Regression Diagnostics: An Introduction. Thousand Oaks: Sage.

----------------------------------------- Maarten L. Buis Department of Social Research Methodology Vrije Universiteit Amsterdam Boelelaan 1081 1081 HV Amsterdam The Netherlands

visiting address: Buitenveldertselaan 3 (Metropolitan), room Z214

+31 20 5986715

http://home.fsw.vu.nl/m.buis/ -----------------------------------------

Siddharth wrote:

> I am trying to correlate two things using Pearson's correlation. The results are non-significant because of one particular outlier (total number of observations = 36). If I exclude this outlier, there is a strong correlation among the other 35 patients (and this result makes biological sense).
>
> I checked whether there were any biological reasons why this outlier should be excluded; there are none.
>
> These are cognitive tests, and it is possible that the outlier was distracted and was not able to perform well.
>
> How should I deal with the problem? Are there any defined criteria for dealing with outliers in correlational analysis and, if possible, is there a reference I can quote?

* * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/
