Since an outlier may change the partial regression coefficients, examining the residuals of a non-robust estimator can lead to wrong conclusions. An outlier may shift one or more regression coefficients and hide itself behind a relatively small residual; this effect is called masking. The same shift in the coefficients can push a clean observation away from the fitted regression, giving it a larger residual; this effect is called swamping. A successful robust estimator should minimize both effects in order to estimate the regression coefficients more precisely.
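Both effects can be seen in a minimal R sketch on simulated data. It assumes the MASS package (shipped with standard R installations) and uses its lqs function with method = "lts", a least trimmed squares fit, as a stand-in robust estimator; this is not the medmad algorithm. The seed, sample size, and contamination scheme are arbitrary illustrative choices.

```r
# Minimal sketch of masking and swamping on simulated data.
# Assumption: MASS::lqs with method = "lts" (least trimmed squares) is used
# as a stand-in robust estimator; it is NOT the medmad algorithm from galts.
library(MASS)

set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 5 + 5 * x + rnorm(n)            # clean model: intercept 5, slope 5

# Replace 8 observations with high-leverage outliers lying far below the line
idx <- 1:8
x[idx] <- runif(length(idx), 28, 30)
y[idx] <- rnorm(length(idx), 0, 1)

ols_fit <- lm(y ~ x)                      # non-robust least squares
lts_fit <- lqs(y ~ x, method = "lts")     # robust least trimmed squares

# Masking: OLS is pulled toward the outliers, so their residuals look small
out_ols <- mean(abs(residuals(ols_fit)[idx]))
out_lts <- mean(abs(residuals(lts_fit)[idx]))

# Swamping: the tilted OLS line leaves some clean points with large residuals
clean_ols <- max(abs(residuals(ols_fit)[-idx]))

print(coef(ols_fit))
print(coef(lts_fit))
cat("mean |residual| of outliers (OLS):", out_ols, "\n")
cat("mean |residual| of outliers (LTS):", out_lts, "\n")
cat("largest |residual| of a clean point (OLS):", clean_ols, "\n")
```

With these high-leverage outliers the OLS slope is dragged well away from 5, the outliers end up with smaller OLS residuals than some clean points, and the LTS fit stays close to the true line, leaving the outliers with large residuals.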
The medmad function in the R package galts can be used for robust estimation of regression coefficients. The package is hosted on CRAN and can be installed from the R console by typing
install.packages("galts")
Once the package is installed, its content can be used by typing
require("galts")
and pressing Enter; the functions and help files are then ready to use. Here is a complete example of generating regression data, contaminating some observations, and estimating the robust regression coefficients:
The output is
(Intercept) x1 x2
4.979828 4.993914 4.985901
in which the estimated coefficients are all close to 5, the value used to generate the data. The medmad call returns in about 0.25 seconds on an Intel i5 computer with 8 GB of RAM. The details of the algorithm can be found in the paper
Satman, Mehmet Hakan. "A New Algorithm for Detecting Outliers in Linear Regression." International Journal of Statistics and Probability 2.3 (2013): p101.
which is available at
http://www.ccsenet.org/journal/index.php/ijsp/article/view/28207
and
http://www.ccsenet.org/journal/index.php/ijsp/article/download/28207/17282
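A comparable experiment can also be sketched without galts. The R code below is a minimal sketch, not the original example: it assumes the MASS package and uses lqs with method = "lts" in place of medmad, and the sample size, noise level, seed, and contamination scheme are arbitrary choices. It generates data with all three coefficients equal to 5, shifts 10 responses far from the regression plane, and compares ordinary least squares with the robust fit.

```r
# Sketch of the workflow described above: generate data whose three
# coefficients are all 5, contaminate some observations, fit robustly.
# Assumptions: n, the noise level, the contamination scheme and the use of
# MASS::lqs (LTS) in place of galts::medmad are illustrative choices only.
library(MASS)

set.seed(2013)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 5 + 5 * x1 + 5 * x2 + rnorm(n, sd = 0.5)

# Contaminate 10 randomly chosen responses by shifting them upward by 50
bad <- sample(n, 10)
y[bad] <- y[bad] + 50
dat <- data.frame(y, x1, x2)

fit_ols <- lm(y ~ x1 + x2, data = dat)
# system.time() reports how long the robust fit takes on this machine
timing <- system.time(
  fit_lts <- lqs(y ~ x1 + x2, data = dat, method = "lts")
)

print(coef(fit_ols))   # intercept is pulled upward by the shifted responses
print(coef(fit_lts))   # estimates should stay close to 5
print(timing)
```

Timings reported by system.time depend on the machine and the estimator, so they are not directly comparable with the 0.25 seconds quoted for medmad.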
Have a nice detect!