The command winsorize top-codes (winsorizes) outlying observations. By default, these are the observations lying more than 5 times the interquartile range away from the median.
- By default, outliers are defined as observations whose distance to the median is higher than 5 times the interquartile range. You can also use the option p(pmin pmax) to define outliers as the values below and above the specified percentiles. Use 0 for `pmin' (as in p(0 99)) to avoid defining outliers in that direction.
- With the option replace, the variable is replaced by its top-coded version. With the option gen, a new variable is created.
- With the option trim, outliers are replaced by missing values rather than top coded.
- With the option by, outliers are defined within the groups formed by the list of variables in by.
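To make the default rule concrete, here is a minimal sketch that reproduces it by hand with built-in Stata commands. It is not the command's actual implementation. It assumes the cutoffs sit at the median plus or minus 5 times the interquartile range, and the variable names hours_w, hours_by, med_g and iqr_g are hypothetical, chosen only for illustration.

```stata
sysuse nlsw88.dta, clear

* Overall rule: cutoffs at median +/- 5 * IQR (assumed reading of the default)
quietly summarize hours, detail
local lo = r(p50) - 5 * (r(p75) - r(p25))
local hi = r(p50) + 5 * (r(p75) - r(p25))

* Top-code into a new variable; missing values stay missing
generate double hours_w = hours
replace hours_w = `lo' if hours < `lo'
replace hours_w = `hi' if hours > `hi' & !missing(hours)

* Same rule within groups of race, mimicking what by(race) is described to do
egen double med_g = median(hours), by(race)
egen double iqr_g = iqr(hours), by(race)
generate double hours_by = hours
replace hours_by = med_g - 5 * iqr_g if hours < med_g - 5 * iqr_g
replace hours_by = med_g + 5 * iqr_g if hours > med_g + 5 * iqr_g & !missing(hours)
drop med_g iqr_g
```

The !missing(hours) guard matters because Stata treats missing values as larger than any number, so an unguarded upper-tail condition would also overwrite missing observations.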
The overall syntax is
winsorize [varlist] [if] [in] [, p(pmin pmax) replace gen(varlist) trim by(varlist)]
sysuse nlsw88.dta, clear
winsorize hours, replace
winsorize hours, p(1 99) replace
winsorize hours, p(0 99) replace
winsorize hours, gen(newhours)
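For comparison, the percentile-based rule behind p(1 99) can be sketched by hand with the built-in _pctile command. This is an illustration under stated assumptions, not the command's own code: it assumes outliers are replaced by the percentile values themselves, and hours_p and hours_t are hypothetical variable names.

```stata
sysuse nlsw88.dta, clear

* 1st and 99th percentiles of hours, returned in r(r1) and r(r2)
_pctile hours, percentiles(1 99)
local lo = r(r1)
local hi = r(r2)

* Top-coded copy: values outside [lo, hi] are pulled back to the cutoffs
generate double hours_p = hours
replace hours_p = `lo' if hours < `lo'
replace hours_p = `hi' if hours > `hi' & !missing(hours)

* trim-style copy: values outside [lo, hi] are set to missing instead
generate double hours_t = hours
replace hours_t = . if hours < `lo' | hours > `hi'
```

Skipping the lower-tail replace line gives a one-sided version, in the spirit of p(0 99).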
net install winsorize, from("")
If you have a version of Stata < 13, you need to install it manually:
Click the "Download ZIP" button in the right column to download a zipfile.
Extract it into a folder (e.g. ~/SOMEFOLDER).
Then run:
cap ado uninstall winsorize
net install winsorize, from("~/SOMEFOLDER")