第8章 制限従属変数モデル
実証例 8.1 (ロジットとプロビット)
データの読み込み・加工
pp.347-348.まず,データを加工します.
<- read.csv("piaac.csv") # データの読み込み
piaac_original
<- subset(piaac_original, gender == "Female") # 女性だけにサンプルを制限します.
piaac $emp <- 1 * (piaac$lfs == "Employed") piaac
線形確率モデル
ベースラインとなる線形確率モデルです.これは今までと同じです.
<- estimatr::lm_robust(emp ~ educ + age + couple + child,
ols data = piaac,
se_type = "stata")
summary(ols)
##
## Call:
## estimatr::lm_robust(formula = emp ~ educ + age + couple + child,
## data = piaac, se_type = "stata")
##
## Standard error type: HC1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
## (Intercept) 0.375932 0.105430 3.566 0.0003727 0.1691497 0.58271 1742
## educ 0.019482 0.005882 3.312 0.0009448 0.0079454 0.03102 1742
## age 0.002013 0.001166 1.726 0.0845504 -0.0002746 0.00430 1742
## couple -0.117846 0.031481 -3.743 0.0001874 -0.1795902 -0.05610 1742
## child 0.013723 0.012861 1.067 0.2860951 -0.0115009 0.03895 1742
##
## Multiple R-squared: 0.0137 , Adjusted R-squared: 0.01144
## F-statistic: 6.665 on 4 and 1742 DF, p-value: 2.541e-05
ちなみに,予測確率を確認すると,1を超える者が1名います.0を下回る者はいないようです.
sum(ols$fitted.values > 1)
## [1] 1
sum(ols$fitted.values < 0)
## [1] 0
プロビット・モデル
プロビット・モデルやロジット・モデルの推定には,glm()
という関数を使います.基本的な操作は,lm()
と同じです.
<- glm(emp ~ educ + age + couple + child,
probit family = binomial(link = "probit"),
data = piaac)
summary(probit)
##
## Call:
## glm(formula = emp ~ educ + age + couple + child, family = binomial(link = "probit"),
## data = piaac)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9149 -1.3774 0.8631 0.9542 1.2290
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.344270 0.288020 -1.195 0.231970
## educ 0.053671 0.016076 3.339 0.000842 ***
## age 0.005297 0.002997 1.767 0.077161 .
## couple -0.339106 0.097026 -3.495 0.000474 ***
## child 0.039649 0.033777 1.174 0.240461
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2264.1 on 1746 degrees of freedom
## Residual deviance: 2239.5 on 1742 degrees of freedom
## (958 observations deleted due to missingness)
## AIC: 2249.5
##
## Number of Fisher Scoring iterations: 4
平均限界効果を推定するためには,marginsというパッケージのmargins()
という関数が便利です.
<- margins::margins(probit)
probit_ame
summary(probit_ame)
## factor AME SE z p lower upper
## age 0.0019 0.0011 1.7721 0.0764 -0.0002 0.0041
## child 0.0145 0.0124 1.1751 0.2400 -0.0097 0.0388
## couple -0.1243 0.0352 -3.5298 0.0004 -0.1933 -0.0553
## educ 0.0197 0.0058 3.3702 0.0008 0.0082 0.0311
ロジット・モデル
教科書が第1刷である場合,表8-1のロジット・モデルの係数の推定値の標準誤差に誤植がありますから,注意してください.
<- glm(emp ~ educ + age + couple + child,
logit family = binomial(link = "logit"),
data = piaac)
summary(logit)
##
## Call:
## glm(formula = emp ~ educ + age + couple + child, family = binomial(link = "logit"),
## data = piaac)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9583 -1.3755 0.8639 0.9534 1.2324
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.578109 0.471171 -1.227 0.219837
## educ 0.086867 0.026266 3.307 0.000942 ***
## age 0.008822 0.004886 1.806 0.070975 .
## couple -0.558740 0.163200 -3.424 0.000618 ***
## child 0.071606 0.057635 1.242 0.214084
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2264.1 on 1746 degrees of freedom
## Residual deviance: 2239.4 on 1742 degrees of freedom
## (958 observations deleted due to missingness)
## AIC: 2249.4
##
## Number of Fisher Scoring iterations: 4
<- margins::margins(logit)
logit_ame
summary(logit_ame)
## factor AME SE z p lower upper
## age 0.0020 0.0011 1.8118 0.0700 -0.0002 0.0041
## child 0.0161 0.0129 1.2442 0.2134 -0.0093 0.0414
## couple -0.1255 0.0362 -3.4630 0.0005 -0.1965 -0.0545
## educ 0.0195 0.0058 3.3453 0.0008 0.0081 0.0309
実証例 8.2 (順序付きプロビット)
データの読み込み・加工
pp.356-357.まず,データを加工します.
<- read.csv("piaac.csv") # データの読み込み
piaac_original
<- subset(piaac_original, gender == "Male") # 女性だけにサンプルを制限します.
piaac$js <- factor(piaac$js, levels = c("Extremely dissatisfied",
piaac"Dissatisfied",
"Neither satisfied nor dissatisfied",
"Satisfied",
"Extremely satisfied"))
順序付きプロビット・モデル
順序付きプロビット・モデルの推定は,oglmxというパッケージのoprobit.reg()
という関数が便利です.基本的な使い方は,lm()
関数と同じです.
library(oglmx)
## Loading required package: maxLik
## Loading required package: miscTools
##
## Please cite the 'maxLik' package as:
## Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
##
## If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
## https://r-forge.r-project.org/projects/maxlik/
<- oprobit.reg(as.integer(js) ~ educ + age + couple + child, data = piaac)
result
summary(result)
## Ordered Probit Regression
## Log-Likelihood: -1436.441
## No. Iterations: 5
## McFadden's R2: 0.003613072
## AIC: 2888.883
## Estimate Std. error t value Pr(>|t|)
## educ 0.0305033 0.0120428 2.5329 0.01131 *
## age 0.0068156 0.0030683 2.2213 0.02633 *
## couple -0.0841328 0.1514264 -0.5556 0.57848
## child -0.0110474 0.0403781 -0.2736 0.78439
## ----- Threshold Parameters -----
## Estimate Std. error t value Pr(>|t|)
## Threshold (1->2) -1.77151 0.29171 -6.0728 1.257e-09 ***
## Threshold (2->3) -0.63336 0.27364 -2.3146 0.02064 *
## Threshold (3->4) 0.41447 0.27343 1.5158 0.12957
## Threshold (4->5) 2.02784 0.27761 7.3047 2.778e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
平均限界効果は同じくoglmxのmargins.oglmx()
という関数を用います.
margins.oglmx(result)
## Marginal Effects on Pr(Outcome==1)
## Marg. Eff Std. error t value Pr(>|t|)
## couple 1.6584e-03 2.7523e-03 0.6026 0.54679
## educ -6.5916e-04 3.1473e-04 -2.0943 0.03623 *
## age -1.4728e-04 7.6962e-05 -1.9137 0.05566 .
## child 2.3873e-04 8.7451e-04 0.2730 0.78486
## ------------------------------------
## Marginal Effects on Pr(Outcome==2)
## Marg. Eff Std. error t value Pr(>|t|)
## couple 0.01248158 0.02148408 0.5810 0.56126
## educ -0.00472759 0.00188871 -2.5031 0.01231 *
## age -0.00105633 0.00048038 -2.1989 0.02788 *
## child 0.00171220 0.00625940 0.2735 0.78444
## ------------------------------------
## Marginal Effects on Pr(Outcome==3)
## Marg. Eff Std. error t value Pr(>|t|)
## couple 0.01823813 0.03347800 0.5448 0.58591
## educ -0.00646781 0.00258424 -2.5028 0.01232 *
## age -0.00144516 0.00065647 -2.2014 0.02771 *
## child 0.00234246 0.00856280 0.2736 0.78442
## ------------------------------------
## Marginal Effects on Pr(Outcome==4)
## Marg. Eff Std. error t value Pr(>|t|)
## couple -0.01881137 0.03198370 -0.5882 0.55643
## educ 0.00718784 0.00287254 2.5023 0.01234 *
## age 0.00160604 0.00072992 2.2003 0.02779 *
## child -0.00260323 0.00951600 -0.2736 0.78442
## ------------------------------------
## Marginal Effects on Pr(Outcome==5)
## Marg. Eff Std. error t value Pr(>|t|)
## couple -0.01356679 0.02569570 -0.5280 0.59751
## educ 0.00466671 0.00186074 2.5080 0.01214 *
## age 0.00104272 0.00047297 2.2046 0.02748 *
## child -0.00169015 0.00617836 -0.2736 0.78442
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
実証例 8.3 (多項ロジット)
pp.365-366.まず,データを加工します.
データの読み込み・加工
<- read.csv("piaac.csv")
piaac_original
<- subset(piaac_original, gender == "Female")
piaac_female <- mlogit::mlogit.data(piaac_female, choice = "empstat_edt", shape = "wide") piaac
多項ロジット・モデル
多項ロジット・モデルの推定には,mlogitパッケージのmlogit()
という関数が便利です.
library(mlogit)
<- mlogit(empstat_edt ~ 1 | educ + age + couple + child,
result reflevel = "3",
data = piaac)
summary(result)
##
## Call:
## mlogit(formula = empstat_edt ~ 1 | educ + age + couple + child,
## data = piaac, reflevel = "3", method = "nr")
##
## Frequencies of alternatives:choice
## 3 1 2
## 0.38065 0.28506 0.33429
##
## nr method
## 4 iterations, 0h:0m:0s
## g'(-H)^-1g = 0.000468
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept):1 -2.4891812 0.5739814 -4.3367 1.446e-05 ***
## (Intercept):2 -0.2080416 0.5289990 -0.3933 0.694117
## educ:1 0.1856352 0.0321446 5.7750 7.695e-09 ***
## educ:2 0.0032559 0.0296902 0.1097 0.912677
## age:1 0.0081549 0.0059179 1.3780 0.168200
## age:2 0.0071771 0.0054852 1.3085 0.190718
## couple:1 -0.8978057 0.1857095 -4.8345 1.335e-06 ***
## couple:2 -0.5414821 0.1830675 -2.9578 0.003098 **
## child:1 0.0849287 0.0666004 1.2752 0.202239
## child:2 0.0837459 0.0626562 1.3366 0.181355
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1874
## McFadden R^2: 0.017435
## Likelihood ratio test : chisq = 66.504 (p.value = 2.4316e-11)
平均限界効果の計算にこれといった便利な方法が見つかりませんでした.仕方がないので,自分で計算します4.
<- result
model_result <- piaac
mydata
<- length(names(model_result$model))
coef_names_length <- coef_names_length - 2
coef_names_drop <- names(model_result$model)[c(-1, -coef_names_drop:-coef_names_length)]
coef_names
<- sapply(coef_names,
marginal_effects function(x) stats::effects(result, covariate = x, data = mydata),
simplify = FALSE)
<- t(sapply(marginal_effects, colMeans))
ame
ame# 1 2 3
# educ 0.0363882484 -0.0166267152 -0.019761533
# age 0.0009412598 0.0008230052 -0.001764265
# couple -0.1268631878 -0.0356907738 0.162553962
# child 0.0089617065 0.0105587969 -0.019520503
c(2, 3, 1)] # 見づらいので並び替え
ame[,
# 1 2 3
# educ 0.0363882484 -0.0166267152 -0.019761533
# age 0.0009412598 0.0008230052 -0.001764265
# couple -0.1268631878 -0.0356907738 0.162553962
# child 0.0089617065 0.0105587969 -0.019520503
標準誤差は次の通り計算してみましょう.
<- function(betas) {
ame_fun <- model_result
tmp $coefficients <- betas
tmp<- sapply(coef_names, function(x)
me_mnl effects(tmp, covariate = x, data = mydata), simplify = FALSE)
c(sapply(me_mnl, colMeans))
}
<- numDeriv::jacobian(ame_fun, model_result$coef)
grad <- matrix(sqrt(diag(grad %*% vcov(model_result) %*% t(grad))),
ame_se nrow = length(coef_names),
byrow = TRUE)
ame_se
# [,1] [,2] [,3]
# [1,] 0.005909444 0.005503409 0.005762793
# [2,] 0.001113900 0.001049001 0.001086593
# [3,] 0.037055885 0.029879947 0.033409403
# [4,] 0.013378596 0.010627150 0.011153053
c(2, 3, 1)]
ame_se[,
# [,1] [,2] [,3]
# [1,] 0.005503409 0.005762793 0.005909444
# [2,] 0.001049001 0.001086593 0.001113900
# [3,] 0.029879947 0.033409403 0.037055885
# [4,] 0.010627150 0.011153053 0.013378596
\(Z\)値は次の通りです.
<- ame[, c(2, 3, 1)] / ame_se[, c(2, 3, 1)]
z
z
# 1 2 3
# educ 6.6119466 -2.8851837 -3.344060
# age 0.8972914 0.7574184 -1.583862
# couple -4.2457635 -1.0682853 4.386725
# child 0.8432841 0.9467181 -1.459085
\(p\)値は次の通りです.
<- 2 * (1 - pnorm(abs(z)))
p
p
# 1 2 3
# educ 3.792988e-11 0.003911853 8.256197e-04
# age 3.695634e-01 0.448799257 1.132251e-01
# couple 2.178504e-05 0.285391843 1.150704e-05
# child 3.990696e-01 0.343782459 1.445418e-01
実証例8.4 (トービット)
pp.371-372.
データの読み込み・加工
<- read.csv("piaac.csv")
piaac_original
<- subset(piaac_original, gender == "Female") piaac
OLS
<- lm(hours ~ educ + age + couple + child, data = piaac)
ols summary(ols)
##
## Call:
## lm(formula = hours ~ educ + age + couple + child, data = piaac)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.49 -19.06 0.03 16.49 80.83
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.98657 4.15323 2.886 0.003949 **
## educ 0.85299 0.23203 3.676 0.000244 ***
## age 0.04638 0.04340 1.068 0.285450
## couple -6.80806 1.34330 -5.068 4.45e-07 ***
## child 0.67291 0.44251 1.521 0.128525
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.63 on 1731 degrees of freedom
## (969 observations deleted due to missingness)
## Multiple R-squared: 0.0216, Adjusted R-squared: 0.01934
## F-statistic: 9.553 on 4 and 1731 DF, p-value: 1.222e-07
トービット・モデル
トービット・モデルの推定には,censRegというパッケージのcensReg()
関数が便利です.
library(censReg)
##
## Please cite the 'censReg' package as:
## Henningsen, Arne (2017). censReg: Censored Regression (Tobit) Models. R package version 0.5. http://CRAN.R-Project.org/package=censReg.
##
## If you have questions, suggestions, or comments regarding the 'censReg' package, please use a forum or 'tracker' at the R-Forge site of the 'sampleSelection' project:
## https://r-forge.r-project.org/projects/sampleselection/
<- censReg(hours ~ educ + age + couple + child, data = piaac)
tobit
summary(tobit)
##
## Call:
## censReg(formula = hours ~ educ + age + couple + child, data = piaac)
##
## Observations:
## Total Left-censored Uncensored Right-censored
## 1736 608 1128 0
##
## Coefficients:
## Estimate Std. error t value Pr(> t)
## (Intercept) -0.26880 6.31684 -0.043 0.966058
## educ 1.29136 0.35170 3.672 0.000241 ***
## age 0.09362 0.06668 1.404 0.160319
## couple -9.43196 1.99461 -4.729 2.26e-06 ***
## child 0.99635 0.66882 1.490 0.136301
## logSigma 3.28764 0.02316 141.963 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Newton-Raphson maximisation, 9 iterations
## Return code 1: gradient close to zero (gradtol)
## Log-likelihood: -5856.426 on 6 Df
実証例8.5 (ヘキット)
pp.386-388.
データの読み込み・加工
<- read.csv("piaac.csv")
piaac_original
<- subset(piaac_original, gender == "Female")
piaac $lwage <- log(piaac$wage)
piaac$exp <- piaac$age - piaac$educ - 6
piaac$expsq <- (piaac$exp) ^ 2
piaac$selected <- 1 * (!is.na(piaac$lwage)) piaac
OLS
<- lm(lwage ~ educ + exp + expsq, data = piaac)
ols
summary(ols)
##
## Call:
## lm(formula = lwage ~ educ + exp + expsq, data = piaac)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.9565 -0.2838 -0.0544 0.2516 5.5254
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.8946054 0.1279650 46.064 < 2e-16 ***
## educ 0.0767333 0.0089127 8.609 < 2e-16 ***
## exp 0.0124602 0.0047378 2.630 0.00862 **
## expsq -0.0002220 0.0001013 -2.191 0.02862 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.663 on 1583 degrees of freedom
## (1118 observations deleted due to missingness)
## Multiple R-squared: 0.05641, Adjusted R-squared: 0.05463
## F-statistic: 31.55 on 3 and 1583 DF, p-value: < 2.2e-16
ヘキット・モデル
ヘキット・モデルの推定は,sampleSelectionというパッケージのheckit()
関数が便利です.
library(sampleSelection)
<- heckit(selected ~ educ + exp + expsq + couple + child,
twostep ~ educ + exp + expsq,
lwage data = piaac,
method = "2step")
summary(twostep)
## --------------------------------------------
## Tobit 2 model (sample selection model)
## 2-step Heckman / heckit estimation
## 1747 observations (763 censored and 984 observed)
## 13 free parameters (df = 1735)
## Probit selection equation:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.1734568 0.2987755 -3.928 8.92e-05 ***
## educ 0.0032513 0.0171564 0.190 0.8497
## exp 0.1332373 0.0144499 9.221 < 2e-16 ***
## expsq -0.0023922 0.0002509 -9.533 < 2e-16 ***
## couple -0.2876216 0.0938053 -3.066 0.0022 **
## child 0.0013419 0.0309743 0.043 0.9654
## Outcome equation:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.7862247 0.6725489 8.603 < 2e-16 ***
## educ 0.0861687 0.0131580 6.549 7.62e-11 ***
## exp 0.0091285 0.0339531 0.269 0.788
## expsq -0.0001668 0.0006090 -0.274 0.784
## Multiple R-Squared:0.0526, Adjusted R-Squared:0.0488
## Error terms:
## Estimate Std. Error t value Pr(>|t|)
## invMillsRatio 0.02771 0.38986 0.071 0.943
## sigma 0.71505 NA NA NA
## rho 0.03875 NA NA NA
## --------------------------------------------
<- heckit(selected ~ educ + exp + expsq + couple + child,
mle ~ educ + exp + expsq,
lwage data = piaac,
method = "ml")
summary(mle)
## --------------------------------------------
## Tobit 2 model (sample selection model)
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 8: successive function values within relative tolerance limit (reltol)
## Log-Likelihood: -2207.948
## 1747 observations (763 censored and 984 observed)
## 12 free parameters (df = 1735)
## Probit selection equation:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.1749600 0.2991461 -3.928 8.91e-05 ***
## educ 0.0033490 0.0171841 0.195 0.84550
## exp 0.1331523 0.0144494 9.215 < 2e-16 ***
## expsq -0.0023909 0.0002509 -9.528 < 2e-16 ***
## couple -0.2882325 0.0937756 -3.074 0.00215 **
## child 0.0022663 0.0312037 0.073 0.94211
## Outcome equation:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.7679381 0.3473704 16.605 < 2e-16 ***
## educ 0.0861431 0.0131388 6.556 7.25e-11 ***
## exp 0.0100524 0.0175793 0.572 0.568
## expsq -0.0001834 0.0003119 -0.588 0.557
## Error terms:
## Estimate Std. Error t value Pr(>|t|)
## sigma 0.71536 0.01689 42.342 <2e-16 ***
## rho 0.05457 0.22029 0.248 0.804
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------