Package 'bndovb' reference manual

Title:	Bounding Omitted Variable Bias Using Auxiliary Data
Description:	Functions to implement the Hwang (2021) <doi:10.2139/ssrn.3866876> estimator, which bounds omitted variable bias using auxiliary data.
Authors:	Yujung Hwang [aut, cre]
Maintainer:	Yujung Hwang <[email protected]>
License:	GPL-3
Version:	1.2
Built:	2025-02-19 03:59:22 UTC
Source:	https://github.com/yujunghwang/bndovb

Simulated auxiliary data showing how to use the 'bndovbme' function with continuous proxy variables

Description

Simulated auxiliary data showing how to use the 'bndovbme' function with continuous proxy variables

Usage

auxdat_mecont

Format

A data frame with 3000 rows and 5 variables:

w1: A common covariate in both main and auxiliary data
x: A common covariate in both main and auxiliary data
z1: A continuous proxy variable
z2: A continuous proxy variable
z3: A continuous proxy variable

Source

This dataset was simulated by simulatePackageData.R in the data-raw folder.

Simulated auxiliary data showing how to use the 'bndovbme' function with discrete proxy variables

Description

Simulated auxiliary data showing how to use the 'bndovbme' function with discrete proxy variables

Usage

auxdat_medisc

Format

A data frame with 3000 rows and 5 variables:

w1: A common covariate in both main and auxiliary data
x: A common covariate in both main and auxiliary data
z1: A discrete proxy variable
z2: A discrete proxy variable
z3: A discrete proxy variable

Source

This dataset was simulated by simulatePackageData.R in the data-raw folder.

Simulated auxiliary data showing how to use the 'bndovb' function

Description

Simulated auxiliary data showing how to use the 'bndovb' function

Usage

auxdat_nome

Format

A data frame with 50000 rows and 3 variables:

x1: An omitted variable in the main data
x2: A common covariate in both main and auxiliary data
x3: A common covariate in both main and auxiliary data

Source

This dataset was simulated by simulatePackageData.R in the data-raw folder.

bndovb

Description

This function runs two-sample least squares when the auxiliary data contain every right-hand-side regressor and the main data contain the dependent variable and every right-hand-side regressor except one omitted variable.

Usage

bndovb(
  maindat,
  auxdat,
  depvar,
  ovar,
  comvar,
  method = 1,
  mainweights = NULL,
  auxweights = NULL,
  signres = NULL,
  ci = FALSE,
  nboot = 100,
  scale = -1/2,
  tau = 0.05,
  seed = 210823,
  display = TRUE
)

Arguments

`maindat`	Main data set. It must be a data frame.
`auxdat`	Auxiliary data set. It must be a data frame.
`depvar`	Name of the dependent variable in the main dataset.
`ovar`	Name of the variable that is omitted from the main dataset but exists in the auxiliary data.
`comvar`	A vector of the names of the common regressors that exist in both the main data and the auxiliary data.
`method`	CDF and quantile function estimation method. Users can choose either 1 or 2. If method is 1, the CDF and quantile function are estimated assuming a parametric normal distribution. If method is 2, they are estimated using the nonparametric estimators in Li and Racine (2008) doi:10.1198/073500107000000250 and Li, Lin, and Racine (2013) doi:10.1080/07350015.2012.738955. Default is 1.
`mainweights`	An optional weight vector for the main dataset. The length must be equal to the number of rows of 'maindat'.
`auxweights`	An optional weight vector for the auxiliary dataset. The length must be equal to the number of rows of 'auxdat'.
`signres`	An option to impose a sign restriction on the coefficient of the omitted variable. Set to NULL, 'pos', or 'neg'. Default is NULL. If NULL, there is no sign restriction. If 'pos', the estimator imposes the extra restriction that the coefficient of the omitted variable is positive. If 'neg', the estimator imposes the extra restriction that the coefficient of the omitted variable is negative.
`ci`	An option to compute an equal-tailed confidence interval. Default is FALSE. Computing the confidence interval by bootstrap may take some time.
`nboot`	Number of bootstrap replications used to compute the confidence interval. Default is 100.
`scale`	A tuning parameter for the rescaled numerical bootstrap. The value must be between -1/2 and 0. (main data sample size)^scale is the tuning parameter epsilon_n in Hwang (2021). Default is -1/2 (that is, the standard bootstrap).
`tau`	Significance level. A 100*(1-tau)% confidence interval is computed. Default is 0.05.
`seed`	Seed for random number generation. Default is 210823.
`display`	It must be either TRUE or FALSE. Whether to display progress and messages. Default is TRUE.
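
The relation between `scale` and epsilon_n stated above can be made concrete with a few lines of base R. This is only an arithmetic sketch: the sample size is the row count documented for maindat_nome, the grid is the default `scalegrid` of bndovb_tuning, and the variable names are illustrative rather than part of the package.

```r
# Sketch: epsilon_n = (main data sample size)^scale for several scale values.
n_main    <- 100000                               # rows of maindat_nome, per its Format section
scalegrid <- c(-1/2, -1/3, -1/4, -1/5, -1/6)      # default grid of bndovb_tuning
eps_n     <- n_main^scalegrid                     # tuning parameter implied by each scale

# scale = -1/2 gives 1/sqrt(n); scales closer to 0 give a larger epsilon_n.
print(data.frame(scale = scalegrid, eps_n = eps_n))
```

The table shows that moving `scale` from -1/2 toward 0 increases epsilon_n for a fixed sample size.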

Value

Returns a list of 12 components :

hat_beta_l: lower bound estimates of regression coefficients
hat_beta_u: upper bound estimates of regression coefficients
mu_l: lower bound estimate of E[ovar*depvar]
mu_u: upper bound estimate of E[ovar*depvar]
hat_beta_l_cil: (1-tau)% confidence interval lower bound for hat_beta_l
hat_beta_l_ciu: (1-tau)% confidence interval upper bound for hat_beta_l
hat_beta_u_cil: (1-tau)% confidence interval lower bound for hat_beta_u
hat_beta_u_ciu: (1-tau)% confidence interval upper bound for hat_beta_u
mu_l_cil: (1-tau)% confidence interval lower bound for mu_l
mu_l_ciu: (1-tau)% confidence interval upper bound for mu_l
mu_u_cil: (1-tau)% confidence interval lower bound for mu_u
mu_u_ciu: (1-tau)% confidence interval upper bound for mu_u
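
A small helper can put the lower and upper bound estimates side by side. The sketch below is hypothetical and not part of 'bndovb'; it assumes that hat_beta_l and hat_beta_u each hold one numeric entry per regression coefficient, in the same order, which the manual does not state explicitly. The package call is guarded so the block still runs where 'bndovb' is not installed.

```r
# Hypothetical helper (not part of 'bndovb'): collect bound estimates into a
# data frame. Assumes hat_beta_l and hat_beta_u are numeric and equally long.
bounds_table <- function(res) {
  lower <- as.numeric(res$hat_beta_l)
  upper <- as.numeric(res$hat_beta_u)
  stopifnot(length(lower) == length(upper))
  data.frame(coef = seq_along(lower), lower = lower, upper = upper,
             width = upper - lower)
}

# Usage with the package's example data, only if the package is available.
if (requireNamespace("bndovb", quietly = TRUE)) {
  data("maindat_nome", package = "bndovb")
  data("auxdat_nome", package = "bndovb")
  res <- bndovb::bndovb(maindat = maindat_nome, auxdat = auxdat_nome,
                        depvar = "y", ovar = "x1", comvar = c("x2", "x3"),
                        method = 1, display = FALSE)
  print(bounds_table(res))
}
```

The `width` column is simply the distance between the two bound estimates for each coefficient.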

Author(s)

Yujung Hwang, [email protected]

References

Hwang, Yujung (2021): Bounding Omitted Variable Bias Using Auxiliary Data. Available at SSRN. doi:10.2139/ssrn.3866876

Examples

data(maindat_nome)
data(auxdat_nome)

bndovb(maindat=maindat_nome,auxdat=auxdat_nome,depvar="y",ovar="x1",comvar=c("x2","x3"),method=1)


bndovb_tuning

Description

This function computes an optimal tuning parameter for the confidence interval of the bndovb function. The optimal tuning parameter is chosen by a double bootstrap procedure.

Usage

bndovb_tuning(
  maindat,
  auxdat,
  depvar,
  ovar,
  comvar,
  method = 1,
  mainweights = NULL,
  auxweights = NULL,
  signres = NULL,
  nboot = 100,
  scalegrid = c(-1/2, -1/3, -1/4, -1/5, -1/6),
  tau = 0.05,
  seed = 210823,
  parallel = TRUE
)

Arguments

`maindat`	Main data set. It must be a data frame.
`auxdat`	Auxiliary data set. It must be a data frame.
`depvar`	Name of the dependent variable in the main dataset.
`ovar`	Name of the variable that is omitted from the main dataset but exists in the auxiliary data.
`comvar`	A vector of the names of the common regressors that exist in both the main data and the auxiliary data.
`method`	CDF and quantile function estimation method. Users can choose either 1 or 2. If method is 1, the CDF and quantile function are estimated assuming a parametric normal distribution. If method is 2, they are estimated using the nonparametric estimators in Li and Racine (2008) doi:10.1198/073500107000000250 and Li, Lin, and Racine (2013) doi:10.1080/07350015.2012.738955. Default is 1.
`mainweights`	An optional weight vector for the main dataset. The length must be equal to the number of rows of 'maindat'.
`auxweights`	An optional weight vector for the auxiliary dataset. The length must be equal to the number of rows of 'auxdat'.
`signres`	An option to impose a sign restriction on the coefficient of the omitted variable. Set to NULL, 'pos', or 'neg'. Default is NULL. If NULL, there is no sign restriction. If 'pos', the estimator imposes the extra restriction that the coefficient of the omitted variable is positive. If 'neg', the estimator imposes the extra restriction that the coefficient of the omitted variable is negative.
`nboot`	Number of bootstrap replications used to compute the confidence interval. Default is 100.
`scalegrid`	Tuning parameter grid to search. It must be a vector of numbers between -1/2 and 0. Default is c(-1/2,-1/3,-1/4,-1/5,-1/6).
`tau`	Significance level. A 100*(1-tau)% confidence interval is computed. Default is 0.05.
`seed`	Seed for random number generation. Default is 210823.
`parallel`	Either TRUE or FALSE. Whether to compute in parallel. Default is TRUE.

Value

Returns a list of 3 components :

optimal_scale: An optimal scale parameter which gives coverage rates closest to (1-tau)
cover_beta_l: A matrix of coverage rates of the lower bound parameters under different scale parameters
cover_beta_u: A matrix of coverage rates of the upper bound parameters under different scale parameters
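
The idea of choosing the scale whose coverage is closest to (1-tau) can be sketched in base R. The helper below is hypothetical and not the package's own rule: it assumes a coverage matrix with one row per entry of `scalegrid` and one column per coefficient, and it picks the row with the smallest worst-case gap. The coverage numbers are made up purely for illustration.

```r
# Hypothetical helper (not part of 'bndovb'): pick the grid value whose largest
# coverage gap from the nominal level 1 - tau is smallest.
# Assumes one row of `cover` per scalegrid entry; the package may differ.
pick_scale <- function(cover, scalegrid, tau = 0.05) {
  stopifnot(is.matrix(cover), nrow(cover) == length(scalegrid))
  gap <- apply(abs(cover - (1 - tau)), 1, max)   # worst gap per scale value
  scalegrid[which.min(gap)]
}

scalegrid <- c(-1/2, -1/3, -1/4, -1/5, -1/6)
# Made-up coverage rates for illustration only; these are NOT package output.
cover <- matrix(c(0.90, 0.91,
                  0.94, 0.95,
                  0.97, 0.96,
                  0.98, 0.99,
                  1.00, 0.99),
                ncol = 2, byrow = TRUE)
best <- pick_scale(cover, scalegrid, tau = 0.05)
print(best)
```

Other aggregation rules, such as the mean gap across coefficients, would be equally easy to substitute for `max`.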

Author(s)

Yujung Hwang, [email protected]

References

Hwang, Yujung (2021): Bounding Omitted Variable Bias Using Auxiliary Data. Available at SSRN. doi:10.2139/ssrn.3866876

Examples

data(maindat_nome)
data(auxdat_nome)

# To shorten computation time, this example uses a small number of bootstrap replications.
# In practice, use a much larger number.
bndovb_tuning(maindat_nome,auxdat_nome,depvar="y",ovar="x1",comvar=c("x2","x3"),method=1,nboot=2)


bndovbme

Description

This function runs two-sample least squares when the main data contain the dependent variable and every right-hand-side regressor except one omitted variable. The function requires auxiliary data that include every right-hand-side regressor except the omitted variable, plus enough proxy variables for the omitted variable. When the omitted variable is continuous, the auxiliary data must contain at least two continuous proxy variables. When the omitted variable is discrete, the auxiliary data must contain at least three discrete proxy variables.
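
The proxy requirement can be checked before calling the estimator. The helper below is a hypothetical pre-flight check, not a function of 'bndovb'; it assumes that `ptype = 1` calls for at least two proxy variables and `ptype = 2` for at least three, following the description above. The toy data frame only borrows the column names documented for auxdat_mecont.

```r
# Hypothetical pre-flight check (not part of 'bndovb').
# Assumption: ptype = 1 needs >= 2 proxies, ptype = 2 needs >= 3 proxies.
check_proxies <- function(auxdat, pvar, ptype = 1) {
  stopifnot(is.data.frame(auxdat), ptype %in% c(1, 2))
  missing_cols <- setdiff(pvar, names(auxdat))
  if (length(missing_cols) > 0) {
    stop("proxy variables not found in auxdat: ",
         paste(missing_cols, collapse = ", "))
  }
  needed <- if (ptype == 1) 2L else 3L
  if (length(pvar) < needed) {
    stop("need at least ", needed, " proxy variables when ptype = ", ptype)
  }
  invisible(TRUE)
}

# Empty toy data frame with the column names documented for auxdat_mecont.
aux <- data.frame(w1 = numeric(0), x = numeric(0),
                  z1 = numeric(0), z2 = numeric(0), z3 = numeric(0))
check_proxies(aux, c("z1", "z2", "z3"), ptype = 1)   # passes silently
```

Running such a check first gives a readable error message instead of a failure deep inside the estimation.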

Usage

bndovbme(
  maindat,
  auxdat,
  depvar,
  pvar,
  ptype = 1,
  comvar,
  sbar = 2,
  mainweights = NULL,
  auxweights = NULL,
  normalize = TRUE,
  signres = NULL,
  ci = FALSE,
  nboot = 100,
  scale = -1/2,
  tau = 0.05,
  seed = 210823,
  display = TRUE
)

Arguments

`maindat`	Main data set. It must be a data frame.
`auxdat`	Auxiliary data set. It must be a data frame.
`depvar`	Name of the dependent variable in the main dataset.
`pvar`	A vector of the names of the proxy variables for the omitted variable. When the proxy variables are continuous, the first proxy variable is used as the anchoring variable. When the proxy variables are discrete, the first proxy variable is used for initialization (for details, see the documentation for the "dproxyme" function).
`ptype`	Either 1 (continuous) or 2 (discrete). Whether the proxy variables are continuous or discrete. Default is 1 (continuous).
`comvar`	A vector of the names of the common regressors that exist in both the main data and the auxiliary data.
`sbar`	The cardinality of the support of the discrete proxy variables. Default is 2. If the proxy variables are continuous, this argument is ignored.
`mainweights`	An optional weight vector for the main dataset. The length must be equal to the number of rows of 'maindat'.
`auxweights`	An optional weight vector for the auxiliary dataset. The length must be equal to the number of rows of 'auxdat'.
`normalize`	Whether to normalize the omitted variable to have mean 0 and standard deviation 1. Set TRUE or FALSE. Default is TRUE. If FALSE, the scale of the omitted variable is anchored to the first proxy variable in the pvar list.
`signres`	An option to impose a sign restriction on the coefficient of the omitted variable. Set to NULL, 'pos', or 'neg'. Default is NULL. If NULL, there is no sign restriction. If 'pos', the estimator imposes the extra restriction that the coefficient of the omitted variable is positive. If 'neg', the estimator imposes the extra restriction that the coefficient of the omitted variable is negative.
`ci`	An option to compute an equal-tailed confidence interval. Default is FALSE. Computing the confidence interval by bootstrap may take some time.
`nboot`	Number of bootstrap replications used to compute the confidence interval. Default is 100.
`scale`	A tuning parameter for the rescaled numerical bootstrap. The value must be between -1/2 and 0. (main data sample size)^scale is the tuning parameter epsilon_n in Hwang (2021). Default is -1/2 (that is, the standard bootstrap).
`tau`	Significance level. A 100*(1-tau)% confidence interval is computed. Default is 0.05.
`seed`	Seed for random number generation. Default is 210823.
`display`	It must be either TRUE or FALSE. Whether to display progress and messages. Default is TRUE.
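
To show what "equal-tailed" means for the `ci` and `tau` arguments, the sketch below builds a plain percentile interval that leaves probability tau/2 in each tail. It is an illustration in base R only: the package uses a rescaled numerical bootstrap, which this does not reproduce, and the helper name and simulated draws are assumptions made for the example.

```r
# Sketch of an equal-tailed 100*(1 - tau)% interval from bootstrap draws.
# Plain percentile construction, shown only to illustrate "equal-tailed";
# it is not the package's rescaled numerical bootstrap.
equal_tailed_ci <- function(draws, tau = 0.05) {
  stopifnot(is.numeric(draws), tau > 0, tau < 1)
  q <- stats::quantile(draws, probs = c(tau / 2, 1 - tau / 2), names = FALSE)
  c(lower = q[1], upper = q[2])
}

set.seed(210823)                           # same value as the manual's default seed
draws <- rnorm(1000, mean = 1, sd = 0.1)   # simulated stand-in for bootstrap estimates
ci <- equal_tailed_ci(draws, tau = 0.05)
print(ci)
```

With tau = 0.05, the interval runs from the 2.5% quantile to the 97.5% quantile of the draws.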

Value

Returns a list of 12 components :

hat_beta_l: lower bound estimates of regression coefficients
hat_beta_u: upper bound estimates of regression coefficients
mu_l: lower bound estimate of E[ovar*depvar]
mu_u: upper bound estimate of E[ovar*depvar]
hat_beta_l_cil: (1-tau)% confidence interval lower bound for hat_beta_l
hat_beta_l_ciu: (1-tau)% confidence interval upper bound for hat_beta_l
hat_beta_u_cil: (1-tau)% confidence interval lower bound for hat_beta_u
hat_beta_u_ciu: (1-tau)% confidence interval upper bound for hat_beta_u
mu_l_cil: (1-tau)% confidence interval lower bound for mu_l
mu_l_ciu: (1-tau)% confidence interval upper bound for mu_l
mu_u_cil: (1-tau)% confidence interval lower bound for mu_u
mu_u_ciu: (1-tau)% confidence interval upper bound for mu_u

Author(s)

Yujung Hwang, [email protected]

References

Hwang, Yujung (2021): Bounding Omitted Variable Bias Using Auxiliary Data. Available at SSRN. doi:10.2139/ssrn.3866876

Examples

## load example data
data(maindat_mecont)
data(auxdat_mecont)

## set ptype=1 for continuous proxy variables
pvar<-c("z1","z2","z3")
cvar<-c("x","w1")
bndovbme(maindat=maindat_mecont,auxdat=auxdat_mecont,depvar="y",pvar=pvar,ptype=1,comvar=cvar)

## set ptype=2 for discrete proxy variables
data(maindat_medisc)
data(auxdat_medisc)
bndovbme(maindat=maindat_medisc,auxdat=auxdat_medisc,depvar="y",pvar=pvar,ptype=2,comvar=cvar)

bndovbme_tuning

Description

This function computes an optimal tuning parameter for the confidence interval of the bndovbme function. The optimal tuning parameter is chosen by a double bootstrap procedure.

Usage

bndovbme_tuning(
  maindat,
  auxdat,
  depvar,
  pvar,
  ptype = 1,
  comvar,
  sbar = 2,
  mainweights = NULL,
  auxweights = NULL,
  normalize = TRUE,
  signres = NULL,
  nboot = 100,
  scalegrid = c(-1/2, -1/3, -1/4, -1/5, -1/6),
  tau = 0.05,
  seed = 210823,
  parallel = TRUE
)

Arguments

`maindat`	Main data set. It must be a data frame.
`auxdat`	Auxiliary data set. It must be a data frame.
`depvar`	Name of the dependent variable in the main dataset.
`pvar`	A vector of the names of the proxy variables for the omitted variable. When the proxy variables are continuous, the first proxy variable is used as the anchoring variable. When the proxy variables are discrete, the first proxy variable is used for initialization (for details, see the documentation for the "dproxyme" function).
`ptype`	Either 1 (continuous) or 2 (discrete). Whether the proxy variables are continuous or discrete. Default is 1 (continuous).
`comvar`	A vector of the names of the common regressors that exist in both the main data and the auxiliary data.
`sbar`	The cardinality of the support of the discrete proxy variables. Default is 2. If the proxy variables are continuous, this argument is ignored.
`mainweights`	An optional weight vector for the main dataset. The length must be equal to the number of rows of 'maindat'.
`auxweights`	An optional weight vector for the auxiliary dataset. The length must be equal to the number of rows of 'auxdat'.
`normalize`	Whether to normalize the omitted variable to have mean 0 and standard deviation 1. Set TRUE or FALSE. Default is TRUE. If FALSE, the scale of the omitted variable is anchored to the first proxy variable in the pvar list.
`signres`	An option to impose a sign restriction on the coefficient of the omitted variable. Set to NULL, 'pos', or 'neg'. Default is NULL. If NULL, there is no sign restriction. If 'pos', the estimator imposes the extra restriction that the coefficient of the omitted variable is positive. If 'neg', the estimator imposes the extra restriction that the coefficient of the omitted variable is negative.
`nboot`	Number of bootstrap replications used to compute the confidence interval. Default is 100.
`scalegrid`	Tuning parameter grid to search. It must be a vector of numbers between -1/2 and 0. Default is c(-1/2,-1/3,-1/4,-1/5,-1/6).
`tau`	Significance level. A 100*(1-tau)% confidence interval is computed. Default is 0.05.
`seed`	Seed for random number generation. Default is 210823.
`parallel`	Either TRUE or FALSE. Whether to compute in parallel. Default is TRUE.
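
Because `scalegrid` must stay between -1/2 and 0, it can help to validate a custom grid before starting a long double bootstrap. The check below is a hypothetical helper, not part of 'bndovb'; it assumes both endpoints are allowed, which the manual implies for -1/2 (the default grid contains it) but does not state for 0.

```r
# Hypothetical validation helper (not part of 'bndovb').
# Assumption: the closed interval [-1/2, 0] is accepted.
check_scalegrid <- function(scalegrid) {
  ok <- is.numeric(scalegrid) && length(scalegrid) > 0 &&
    !anyNA(scalegrid) && all(scalegrid >= -1/2 & scalegrid <= 0)
  if (!ok) stop("scalegrid must be numeric with every value in [-1/2, 0]")
  invisible(scalegrid)
}

check_scalegrid(c(-1/2, -1/3, -1/4, -1/5, -1/6))   # the documented default passes
```

A grid such as c(-1, -1/3) would stop with an error here rather than after the bootstrap has already started.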

Value

Returns a list of 3 components :

optimal_scale: An optimal scale parameter which gives coverage rates closest to (1-tau)
cover_beta_l: A matrix of coverage rates of the lower bound parameters under different scale parameters
cover_beta_u: A matrix of coverage rates of the upper bound parameters under different scale parameters

Author(s)

Yujung Hwang, [email protected]

References

Hwang, Yujung (2021): Bounding Omitted Variable Bias Using Auxiliary Data. Available at SSRN. doi:10.2139/ssrn.3866876

Examples

## load example data
data(maindat_mecont)
data(auxdat_mecont)

## set ptype=1 for continuous proxy variables
pvar<-c("z1","z2","z3")
cvar<-c("x","w1")

# To shorten computation time, this example uses a small number of bootstrap replications.
# In practice, use a much larger number.
bndovbme_tuning(maindat_mecont,auxdat_mecont,depvar="y",pvar=pvar,ptype=1,comvar=cvar,nboot=2)

Simulated main data showing how to use the 'bndovbme' function with continuous proxy variables

Description

Simulated main data showing how to use the 'bndovbme' function with continuous proxy variables

Usage

maindat_mecont

Format

A data frame with 3000 rows and 3 variables:

w1: A common covariate in both main and auxiliary data
x: A common covariate in both main and auxiliary data
y: A dependent variable

Source

This dataset was simulated by simulatePackageData.R in the data-raw folder.

Simulated main data showing how to use the 'bndovbme' function with discrete proxy variables

Description

Simulated main data showing how to use the 'bndovbme' function with discrete proxy variables

Usage

maindat_medisc

Format

A data frame with 3000 rows and 3 variables:

w1: A common covariate in both main and auxiliary data
x: A common covariate in both main and auxiliary data
y: A dependent variable

Source

This dataset was simulated by simulatePackageData.R in the data-raw folder.

Simulated main data showing how to use the 'bndovb' function

Description

Simulated main data showing how to use the 'bndovb' function

Usage

maindat_nome

Format

A data frame with 100000 rows and 3 variables:

x2: A common covariate in both main and auxiliary data
x3: A common covariate in both main and auxiliary data
y: A dependent variable

Source

This dataset was simulated by simulatePackageData.R in the data-raw folder.
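
The column names documented for maindat_nome and auxdat_nome determine the `depvar`, `ovar`, and `comvar` arguments used in the bndovb example. The sketch below derives them with base R set operations, using only the names listed in the two Format sections; the variable names `main_cols` and `aux_cols` are illustrative.

```r
# Column names as documented in the Format sections of the two datasets.
main_cols <- c("x2", "x3", "y")     # maindat_nome
aux_cols  <- c("x1", "x2", "x3")    # auxdat_nome

comvar <- intersect(main_cols, aux_cols)   # regressors present in both datasets
ovar   <- setdiff(aux_cols, main_cols)     # present only in the auxiliary data
depvar <- setdiff(main_cols, aux_cols)     # present only in the main data

print(list(depvar = depvar, ovar = ovar, comvar = comvar))
```

The result matches the arguments of the documented example call: depvar="y", ovar="x1", comvar=c("x2","x3").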
