Title: | Factor Model Estimation Using Proxy Variables |
---|---|
Description: | Functions to estimate a factor model using discrete and continuous proxy variables. The function 'dproxyme' estimates a factor model of discrete proxy variables using an EM algorithm (Dempster, Laird, Rubin (1977) <doi:10.1111/j.2517-6161.1977.tb01600.x>; Hu (2008) <doi:10.1016/j.jeconom.2007.12.001>; Hu (2017) <doi:10.1016/j.jeconom.2017.06.002>). The function 'cproxyme' estimates a linear factor model (Cunha, Heckman, and Schennach (2010) <doi:10.3982/ECTA6551>). |
Authors: | Yujung Hwang [aut, cre] |
Maintainer: | Yujung Hwang <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2024-11-21 04:05:05 UTC |
Source: | https://github.com/yujunghwang/factormodel |
This function estimates a linear factor model using continuous proxy variables. The model has the form proxy = intercept + factor loading * (latent variable) + measurement error, where the measurement error is assumed to follow a Normal distribution with mean zero and a variance that is also estimated.
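For reference, the model can be written for K proxies Z_1, ..., Z_K with latent variable θ and anchoring proxy a, together with the standard covariance-ratio identification argument in the spirit of Cunha, Heckman, and Schennach (2010). The notation (α, ν, σ) is ours. The argument assumes at least three proxies, measurement errors that are independent of each other and of θ, and nonzero covariances in the denominators; whether cproxyme computes exactly these sample moments is an assumption.

```latex
\begin{aligned}
Z_k &= \alpha_{0k} + \alpha_{1k}\,\theta + \nu_k, \qquad \nu_k \sim N(0, \sigma_k^2), \quad k = 1, \dots, K,\\
\alpha_{1a} &= 1, \qquad E[\theta] = E[Z_a] \;\Rightarrow\; \alpha_{0a} = 0,\\
\operatorname{Cov}(Z_j, Z_k) &= \alpha_{1j}\,\alpha_{1k}\operatorname{Var}(\theta), \qquad j \neq k,\\
\alpha_{1k} &= \frac{\operatorname{Cov}(Z_k, Z_j)}{\operatorname{Cov}(Z_a, Z_j)}, \qquad k \neq a,\; j \notin \{a, k\},\\
\operatorname{Var}(\theta) &= \frac{\operatorname{Cov}(Z_a, Z_k)}{\alpha_{1k}}, \qquad k \neq a,\\
\alpha_{0k} &= E[Z_k] - \alpha_{1k}\,E[\theta], \qquad
\sigma_k^2 = \operatorname{Var}(Z_k) - \alpha_{1k}^2 \operatorname{Var}(\theta).
\end{aligned}
```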
cproxyme(dat, anchor = 1, weights = NULL)
dat |
A data frame of proxy variables, one proxy per column. |
anchor |
The column index of the anchoring proxy variable. Default is 1, i.e., the first column of dat is used as the anchoring variable. |
weights |
An optional vector of observation weights. Default is NULL. |
Returns a list of 5 components:
A vector of intercepts. The k-th entry is the intercept in the equation for the k-th proxy variable.
A vector of factor loadings. The k-th entry is the factor loading of the k-th proxy variable. The loading of the anchoring variable is normalized to 1.
A vector of measurement-error variances. The k-th entry is the variance of the k-th proxy's measurement error, which is assumed to be Normal with mean 0.
The mean of the latent variable, which equals the mean of the anchoring proxy variable.
The variance of the latent variable.
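A minimal base-R sketch of a moment-based estimator for the five quantities listed above, useful for checking intuition on simulated data. It assumes at least three proxies, mutually independent measurement errors, and no missing values or weights, and it uses the covariance-ratio argument: the loading of proxy k is Cov(Z_k, Z_j) / Cov(Z_anchor, Z_j) for some third proxy j. The function moment_factor and its component names are hypothetical; cproxyme may use a different estimator.

```r
# Hypothetical moment-based estimator for the linear factor model
#   Z_k = alpha0_k + alpha1_k * theta + nu_k,  anchor loading normalized to 1.
# Assumes >= 3 proxies, mutually independent errors, no missing values.
# A sketch only; not necessarily what factormodel::cproxyme computes.
moment_factor <- function(dat, anchor = 1) {
  z <- as.matrix(dat)
  K <- ncol(z)
  stopifnot(K >= 3)
  C <- cov(z)                                   # sample covariance matrix
  alpha1 <- rep(1, K)                           # anchor loading stays 1
  for (k in setdiff(seq_len(K), anchor)) {
    j <- setdiff(seq_len(K), c(anchor, k))[1]   # first available third proxy
    alpha1[k] <- C[k, j] / C[anchor, j]
  }
  k1 <- setdiff(seq_len(K), anchor)[1]
  vartheta <- C[anchor, k1] / alpha1[k1]        # Var(theta)
  mtheta <- mean(z[, anchor])                   # E[theta] = E[anchor proxy]
  list(alpha0   = unname(colMeans(z) - alpha1 * mtheta),
       alpha1   = alpha1,
       varnu    = unname(diag(C) - alpha1^2 * vartheta),
       mtheta   = mtheta,
       vartheta = vartheta)
}
```

Averaging over several third proxies j would use more of the data; the sketch takes the first available one for simplicity.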
Yujung Hwang, [email protected]
Cunha, Heckman, and Schennach (2010). Estimating the technology of cognitive and noncognitive skill formation. Econometrica, 78(3), 883-931. doi:10.3982/ECTA6551
Bounding Omitted Variable Bias Using Auxiliary Data. Working Paper. doi:10.2139/ssrn.3866876
library(factormodel)
dat1 <- data.frame(proxy1 = c(1, 2, 3), proxy2 = c(0.1, 0.3, 0.6), proxy3 = c(2, 3, 5))
cproxyme(dat = dat1, anchor = 1)
## you can specify weights
cproxyme(dat = dat1, anchor = 1, weights = c(0.1, 0.5, 0.4))
This function estimates the measurement (stochastic) matrices of discrete proxy variables using an EM algorithm.
dproxyme( dat, sbar = 2, initvar = 1, initvec = NULL, seed = 210313, tol = 0.005, maxiter = 200, miniter = 10, minobs = 100, maxiter2 = 1000, trace = FALSE, weights = NULL )
dat |
A data frame (or matrix) of proxy variables, one proxy per column. |
sbar |
The number of discrete latent types. Default is 2. |
initvar |
The column index of the proxy variable used to initialize the EM algorithm. Default is 1, i.e., the proxy in the first column of dat is used for initialization. |
initvec |
An optional vector defining how the values of the initvar proxy are grouped to initialize the EM algorithm. Default is NULL. |
seed |
The random seed. Default is 210313 (the package's birthday). |
tol |
The convergence tolerance of the EM algorithm. Default is 0.005. |
maxiter |
The maximum number of EM iterations. Default is 200. |
miniter |
The minimum number of EM iterations. Default is 10. |
minobs |
The likelihood of a proxy variable is computed only if that proxy has more than minobs observations. Default is 100. |
maxiter2 |
The maximum number of iterations for 'multinom' (the multinomial logit fit). Default is 1000. |
trace |
Logical; whether to trace the progress of the EM algorithm. Default is FALSE. |
weights |
An optional vector of observation weights. Default is NULL. |
Returns a list of 5 components:
A list of estimated measurement (stochastic) matrices. The k-th matrix is the measurement matrix of the proxy variable in the k-th column of dat. The (i,j)-th element of a measurement matrix is the probability of observing the j-th (largest) proxy response value, conditional on the latent type being i.
A list of column labels of the 'M_param' matrices.
A list of row labels of the 'M_param' matrices. It is simply 1:sbar.
A list of multinomial logit coefficients used to compute the 'M_param' matrices. These coefficients are useful for computing the likelihood of proxy responses.
A type probability matrix of size N-by-sbar. The (i,j)-th entry is the probability that observation i has type j.
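To illustrate the EM idea behind dproxyme, here is a minimal base-R latent-class sketch in which discrete proxies are independent conditional on the latent type. It assumes no missing values and no weights, and it estimates each measurement matrix directly as posterior-weighted frequency tables instead of through multinomial logit fits (for a logit with only type indicators, the fitted probabilities should coincide). The function latent_class_em, its arguments, and its deterministic initialization (splitting the sorted values of the initvar proxy into sbar groups) are hypothetical and only loosely mirror the package's options.

```r
# Hypothetical sketch: EM for a discrete latent-type (latent class) model in
# which discrete proxies are independent conditional on the type.
# Assumes no missing values and no weights; not the package's implementation.
latent_class_em <- function(dat, sbar = 2, initvar = 1,
                            maxiter = 500, tol = 1e-8) {
  dat <- as.data.frame(dat)
  N <- nrow(dat); K <- ncol(dat)
  vals <- lapply(dat, function(x) sort(unique(x)))   # support of each proxy
  # Initialize posterior type probabilities by splitting the sorted values of
  # the initvar proxy into sbar groups.
  v <- match(dat[[initvar]], vals[[initvar]])
  grp <- ceiling(v * sbar / length(vals[[initvar]]))
  post <- matrix(0.2, N, sbar)
  post[cbind(seq_len(N), grp)] <- 0.8
  post <- post / rowSums(post)
  loglik_old <- -Inf
  for (iter in seq_len(maxiter)) {
    # M-step: type shares and measurement matrices (rows = types, cols = values)
    typeprob <- colMeans(post)
    M <- lapply(seq_len(K), function(k) {
      D <- outer(dat[[k]], vals[[k]], "==") * 1      # N x card dummy matrix
      m <- crossprod(post, D)                        # sbar x card weighted counts
      m / rowSums(m)
    })
    # E-step: log joint probability of each observation and each type
    logf <- matrix(log(typeprob), N, sbar, byrow = TRUE)
    for (k in seq_len(K)) {
      idx <- match(dat[[k]], vals[[k]])
      logf <- logf + t(log(M[[k]][, idx, drop = FALSE]))
    }
    mx <- apply(logf, 1, max)                        # log-sum-exp for stability
    lik <- exp(logf - mx)
    post <- lik / rowSums(lik)                       # posterior type probabilities
    loglik <- sum(mx + log(rowSums(lik)))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(M = M, typeprob = typeprob, post = post, loglik = loglik, iter = iter)
}
```

Types are identified only up to relabeling, so compare estimates after sorting or matching types.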
Yujung Hwang, [email protected]
Dempster, Laird, and Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1), 1-22. doi:10.1111/j.2517-6161.1977.tb01600.x
Hu (2008). Identification and estimation of nonlinear models with misclassification error using instrumental variables: A general solution. Journal of Econometrics, 144(1), 27-61. doi:10.1016/j.jeconom.2007.12.001
Hu (2017). The econometrics of unobservables: Applications of measurement error models in empirical industrial organization and labor economics. Journal of Econometrics, 200(2), 154-168. doi:10.1016/j.jeconom.2017.06.002
Identification and Estimation of a Dynamic Discrete Choice Models with Endogenous Time-Varying Unobservable States Using Proxies. Working Paper. doi:10.2139/ssrn.3535098
Bounding Omitted Variable Bias Using Auxiliary Data. Working Paper. doi:10.2139/ssrn.3866876
library(factormodel)
dat1 <- data.frame(proxy1 = c(1, 2, 3), proxy2 = c(2, 3, 4), proxy3 = c(4, 3, 2))
## the default minimum number of observations (minobs) is 100,
## so lower it for this three-observation toy data set
dproxyme(dat = dat1, sbar = 2, initvar = 1, minobs = 3)
## you can specify weights
dproxyme(dat = dat1, sbar = 2, initvar = 1, minobs = 3, weights = c(0.1, 0.5, 0.4))
This function creates dummy variables from a discrete variable.
makeDummy(tZ)
tZ |
A vector of discrete values. |
Returns dZ, a matrix of size length(tZ)-by-card(tZ), where card(tZ) is the number of distinct values of tZ:
The (i,j)-th element of dZ is 1 if tZ[i] equals the j-th largest distinct value of tZ, and 0 otherwise. By construction, each row of dZ sums to 1.
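A base-R sketch of the same construction, handy for checking results by hand. The function name make_dummy_sketch is hypothetical and the sketch assumes tZ has no missing values. It orders columns by increasing value by default; whether "j-th largest" is meant in increasing or decreasing order is an assumption here, so the order is exposed as an argument.

```r
# Hypothetical sketch of a dummy-matrix construction like makeDummy.
# Column j indicates the j-th distinct value of tZ in sorted order.
make_dummy_sketch <- function(tZ, decreasing = FALSE) {
  levs <- sort(unique(tZ), decreasing = decreasing)  # distinct values, sorted
  dZ <- outer(tZ, levs, "==") * 1                    # length(tZ) x card(tZ)
  colnames(dZ) <- levs
  dZ
}
```

For example, make_dummy_sketch(c(5, 2, 5, 9)) has three columns (for 2, 5, and 9), and every row sums to 1.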
Yujung Hwang, [email protected]
makeDummy(c(1,2,3))
This function computes an unbiased weighted sample covariance, using only pairwise-complete observations.
weighted.cov(x, y, w = NULL)
x |
An input vector to compute a covariance, cov(x,y) |
y |
An input vector to compute a covariance, cov(x,y) |
w |
An optional weight vector. If NULL (the default), the usual unweighted sample covariance is returned. |
Returns an unbiased weighted sample covariance.
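One standard unbiased estimator, for weights normalized to sum to 1, is sum(w * (x - mx) * (y - my)) / (1 - sum(w^2)), where mx and my are weighted means; it is the normalization used by method = "unbiased" in stats::cov.wt, and with equal weights it reduces to the usual sample covariance. Whether weighted.cov uses exactly this normalization is an assumption, and the function name wcov_sketch is hypothetical.

```r
# Hypothetical sketch of an unbiased weighted covariance (reliability weights)
# using pairwise-complete observations; not the package source.
wcov_sketch <- function(x, y, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))         # no weights: equal weights
  ok <- !is.na(x) & !is.na(y) & !is.na(w)        # pairwise-complete cases
  x <- x[ok]; y <- y[ok]
  w <- w[ok] / sum(w[ok])                        # normalize weights to sum to 1
  mx <- sum(w * x); my <- sum(w * y)             # weighted means
  sum(w * (x - mx) * (y - my)) / (1 - sum(w^2))  # small-sample correction
}
```

Calling wcov_sketch(x, x, w) gives the corresponding weighted variance, which is what weighted.var would report if it uses the same normalization.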
Yujung Hwang, [email protected]
# If you do not specify weights,
# it returns the usual unweighted sample covariance
weighted.cov(x = c(1, 3, 5), y = c(2, 3, 1))
weighted.cov(x = c(1, 3, 5), y = c(2, 3, 1), w = c(0.1, 0.5, 0.4))
This function computes an unbiased weighted sample variance.
weighted.var(x, w = NULL)
x |
A vector to compute a variance, var(x) |
w |
An optional weight vector. If NULL (the default), the usual unweighted sample variance is returned. |
Returns an unbiased weighted sample variance.
Yujung Hwang, [email protected]
## If you do not specify weights,
## it returns the usual unweighted sample variance
weighted.var(x = c(1, 3, 5))
weighted.var(x = c(1, 3, 5), w = c(0.1, 0.5, 0.4))