第8章 制限従属変数モデル

実証例 8.1 (ロジットとプロビット)

データの読み込み・加工

pp.347-348.まず,データを加工します.

piaac_original <- read.csv("piaac.csv") # データの読み込み

piaac <- subset(piaac_original, gender == "Female") # 女性だけにサンプルを制限します.
piaac$emp <- 1 * (piaac$lfs == "Employed") 

線形確率モデル

ベースラインとなる線形確率モデルです.これは今までと同じです.

ols <- estimatr::lm_robust(emp ~ educ + age + couple + child, 
                           data = piaac,
                           se_type = "stata")
summary(ols)
## 
## Call:
## estimatr::lm_robust(formula = emp ~ educ + age + couple + child, 
##     data = piaac, se_type = "stata")
## 
## Standard error type:  HC1 
## 
## Coefficients:
##              Estimate Std. Error t value  Pr(>|t|)   CI Lower CI Upper   DF
## (Intercept)  0.375932   0.105430   3.566 0.0003727  0.1691497  0.58271 1742
## educ         0.019482   0.005882   3.312 0.0009448  0.0079454  0.03102 1742
## age          0.002013   0.001166   1.726 0.0845504 -0.0002746  0.00430 1742
## couple      -0.117846   0.031481  -3.743 0.0001874 -0.1795902 -0.05610 1742
## child        0.013723   0.012861   1.067 0.2860951 -0.0115009  0.03895 1742
## 
## Multiple R-squared:  0.0137 ,    Adjusted R-squared:  0.01144 
## F-statistic: 6.665 on 4 and 1742 DF,  p-value: 2.541e-05

ちなみに,予測確率を確認すると,1を超える者が1名います.0を下回る者はいないようです.

sum(ols$fitted.values > 1)
## [1] 1

sum(ols$fitted.values < 0)
## [1] 0

プロビット・モデル

プロビット・モデルやロジット・モデルの推定には,glm()という関数を使います.基本的な操作は,lm()と同じです.

probit <- glm(emp ~ educ + age + couple + child, 
                  family = binomial(link = "probit"), 
                  data = piaac)
summary(probit)
## 
## Call:
## glm(formula = emp ~ educ + age + couple + child, family = binomial(link = "probit"), 
##     data = piaac)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9149  -1.3774   0.8631   0.9542   1.2290  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.344270   0.288020  -1.195 0.231970    
## educ         0.053671   0.016076   3.339 0.000842 ***
## age          0.005297   0.002997   1.767 0.077161 .  
## couple      -0.339106   0.097026  -3.495 0.000474 ***
## child        0.039649   0.033777   1.174 0.240461    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2264.1  on 1746  degrees of freedom
## Residual deviance: 2239.5  on 1742  degrees of freedom
##   (958 observations deleted due to missingness)
## AIC: 2249.5
## 
## Number of Fisher Scoring iterations: 4

平均限界効果を推定するためには,marginsというパッケージのmargins()という関数が便利です.

probit_ame <- margins::margins(probit)

summary(probit_ame)
##  factor     AME     SE       z      p   lower   upper
##     age  0.0019 0.0011  1.7721 0.0764 -0.0002  0.0041
##   child  0.0145 0.0124  1.1751 0.2400 -0.0097  0.0388
##  couple -0.1243 0.0352 -3.5298 0.0004 -0.1933 -0.0553
##    educ  0.0197 0.0058  3.3702 0.0008  0.0082  0.0311

ロジット・モデル

教科書が第1刷である場合,表8-1のロジット・モデルの係数の推定値の標準誤差に誤植がありますから,注意してください.

logit <- glm(emp ~ educ + age + couple + child, 
                  family = binomial(link = "logit"), 
                  data = piaac)

summary(logit)
## 
## Call:
## glm(formula = emp ~ educ + age + couple + child, family = binomial(link = "logit"), 
##     data = piaac)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9583  -1.3755   0.8639   0.9534   1.2324  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.578109   0.471171  -1.227 0.219837    
## educ         0.086867   0.026266   3.307 0.000942 ***
## age          0.008822   0.004886   1.806 0.070975 .  
## couple      -0.558740   0.163200  -3.424 0.000618 ***
## child        0.071606   0.057635   1.242 0.214084    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2264.1  on 1746  degrees of freedom
## Residual deviance: 2239.4  on 1742  degrees of freedom
##   (958 observations deleted due to missingness)
## AIC: 2249.4
## 
## Number of Fisher Scoring iterations: 4
logit_ame <- margins::margins(logit)

summary(logit_ame)
##  factor     AME     SE       z      p   lower   upper
##     age  0.0020 0.0011  1.8118 0.0700 -0.0002  0.0041
##   child  0.0161 0.0129  1.2442 0.2134 -0.0093  0.0414
##  couple -0.1255 0.0362 -3.4630 0.0005 -0.1965 -0.0545
##    educ  0.0195 0.0058  3.3453 0.0008  0.0081  0.0309

実証例 8.2 (順序付きプロビット)

データの読み込み・加工

pp.356-357.まず,データを加工します.

piaac_original <- read.csv("piaac.csv") # データの読み込み

piaac<- subset(piaac_original, gender == "Male") # 女性だけにサンプルを制限します.
piaac$js <- factor(piaac$js, levels = c("Extremely dissatisfied",
                                             "Dissatisfied", 
                                             "Neither satisfied nor dissatisfied",
                                             "Satisfied",
                                             "Extremely satisfied"))

順序付きプロビット・モデル

順序付きプロビット・モデルの推定は,oglmxというパッケージのoprobit.reg()という関数が便利です.基本的な使い方は,lm()関数と同じです.

library(oglmx)
## Loading required package: maxLik
## Loading required package: miscTools
## 
## Please cite the 'maxLik' package as:
## Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
## 
## If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
## https://r-forge.r-project.org/projects/maxlik/

result <- oprobit.reg(as.integer(js) ~ educ + age + couple + child, data = piaac)

summary(result)
## Ordered Probit Regression 
## Log-Likelihood: -1436.441 
## No. Iterations: 5 
## McFadden's R2: 0.003613072 
## AIC: 2888.883 
##          Estimate Std. error t value Pr(>|t|)  
## educ    0.0305033  0.0120428  2.5329  0.01131 *
## age     0.0068156  0.0030683  2.2213  0.02633 *
## couple -0.0841328  0.1514264 -0.5556  0.57848  
## child  -0.0110474  0.0403781 -0.2736  0.78439  
## ----- Threshold Parameters -----
##                  Estimate Std. error t value  Pr(>|t|)    
## Threshold (1->2) -1.77151    0.29171 -6.0728 1.257e-09 ***
## Threshold (2->3) -0.63336    0.27364 -2.3146   0.02064 *  
## Threshold (3->4)  0.41447    0.27343  1.5158   0.12957    
## Threshold (4->5)  2.02784    0.27761  7.3047 2.778e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

平均限界効果は同じくoglmxmargins.oglmx()という関数を用います.

margins.oglmx(result)
## Marginal Effects on Pr(Outcome==1)
##          Marg. Eff  Std. error t value Pr(>|t|)  
## couple  1.6584e-03  2.7523e-03  0.6026  0.54679  
## educ   -6.5916e-04  3.1473e-04 -2.0943  0.03623 *
## age    -1.4728e-04  7.6962e-05 -1.9137  0.05566 .
## child   2.3873e-04  8.7451e-04  0.2730  0.78486  
## ------------------------------------ 
## Marginal Effects on Pr(Outcome==2)
##          Marg. Eff  Std. error t value Pr(>|t|)  
## couple  0.01248158  0.02148408  0.5810  0.56126  
## educ   -0.00472759  0.00188871 -2.5031  0.01231 *
## age    -0.00105633  0.00048038 -2.1989  0.02788 *
## child   0.00171220  0.00625940  0.2735  0.78444  
## ------------------------------------ 
## Marginal Effects on Pr(Outcome==3)
##          Marg. Eff  Std. error t value Pr(>|t|)  
## couple  0.01823813  0.03347800  0.5448  0.58591  
## educ   -0.00646781  0.00258424 -2.5028  0.01232 *
## age    -0.00144516  0.00065647 -2.2014  0.02771 *
## child   0.00234246  0.00856280  0.2736  0.78442  
## ------------------------------------ 
## Marginal Effects on Pr(Outcome==4)
##          Marg. Eff  Std. error t value Pr(>|t|)  
## couple -0.01881137  0.03198370 -0.5882  0.55643  
## educ    0.00718784  0.00287254  2.5023  0.01234 *
## age     0.00160604  0.00072992  2.2003  0.02779 *
## child  -0.00260323  0.00951600 -0.2736  0.78442  
## ------------------------------------ 
## Marginal Effects on Pr(Outcome==5)
##          Marg. Eff  Std. error t value Pr(>|t|)  
## couple -0.01356679  0.02569570 -0.5280  0.59751  
## educ    0.00466671  0.00186074  2.5080  0.01214 *
## age     0.00104272  0.00047297  2.2046  0.02748 *
## child  -0.00169015  0.00617836 -0.2736  0.78442  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

実証例 8.3 (多項ロジット)

pp.365-366.まず,データを加工します.

データの読み込み・加工

piaac_original <- read.csv("piaac.csv")

piaac_female <- subset(piaac_original, gender == "Female")
piaac <- mlogit::mlogit.data(piaac_female, choice = "empstat_edt", shape = "wide") 

多項ロジット・モデル

多項ロジット・モデルの推定には,mlogitパッケージのmlogit()という関数が便利です.

library(mlogit)
result <- mlogit(empstat_edt ~ 1 | educ + age + couple + child, 
                 reflevel = "3", 
                 data = piaac)

summary(result)
## 
## Call:
## mlogit(formula = empstat_edt ~ 1 | educ + age + couple + child, 
##     data = piaac, reflevel = "3", method = "nr")
## 
## Frequencies of alternatives:choice
##       3       1       2 
## 0.38065 0.28506 0.33429 
## 
## nr method
## 4 iterations, 0h:0m:0s 
## g'(-H)^-1g = 0.000468 
## successive function values within tolerance limits 
## 
## Coefficients :
##                 Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept):1 -2.4891812  0.5739814 -4.3367 1.446e-05 ***
## (Intercept):2 -0.2080416  0.5289990 -0.3933  0.694117    
## educ:1         0.1856352  0.0321446  5.7750 7.695e-09 ***
## educ:2         0.0032559  0.0296902  0.1097  0.912677    
## age:1          0.0081549  0.0059179  1.3780  0.168200    
## age:2          0.0071771  0.0054852  1.3085  0.190718    
## couple:1      -0.8978057  0.1857095 -4.8345 1.335e-06 ***
## couple:2      -0.5414821  0.1830675 -2.9578  0.003098 ** 
## child:1        0.0849287  0.0666004  1.2752  0.202239    
## child:2        0.0837459  0.0626562  1.3366  0.181355    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1874
## McFadden R^2:  0.017435 
## Likelihood ratio test : chisq = 66.504 (p.value = 2.4316e-11)

平均限界効果の計算にこれといった便利な方法が見つかりませんでした.仕方がないので,自分で計算します4

model_result <- result
mydata <- piaac

coef_names_length <- length(names(model_result$model))
coef_names_drop <- coef_names_length - 2
coef_names <- names(model_result$model)[c(-1, -coef_names_drop:-coef_names_length)]
  
marginal_effects <- sapply(coef_names, 
                           function(x) stats::effects(result, covariate = x, data = mydata), 
                           simplify = FALSE) 
ame <- t(sapply(marginal_effects, colMeans))
  
ame
#                    1             2            3
# educ    0.0363882484 -0.0166267152 -0.019761533
# age     0.0009412598  0.0008230052 -0.001764265
# couple -0.1268631878 -0.0356907738  0.162553962
# child   0.0089617065  0.0105587969 -0.019520503

ame[, c(2, 3, 1)] # 見づらいので並び替え

#                    1             2            3
# educ    0.0363882484 -0.0166267152 -0.019761533
# age     0.0009412598  0.0008230052 -0.001764265
# couple -0.1268631878 -0.0356907738  0.162553962
# child   0.0089617065  0.0105587969 -0.019520503

標準誤差は次の通り計算してみましょう.

ame_fun <- function(betas) {
  tmp <- model_result
  tmp$coefficients <- betas
  me_mnl <- sapply(coef_names, function(x)
    effects(tmp, covariate = x, data = mydata), simplify = FALSE)
  c(sapply(me_mnl, colMeans))
}

grad <- numDeriv::jacobian(ame_fun, model_result$coef)
ame_se <- matrix(sqrt(diag(grad %*% vcov(model_result) %*% t(grad))), 
                 nrow = length(coef_names), 
                 byrow = TRUE)

ame_se

#             [,1]        [,2]        [,3]
# [1,] 0.005909444 0.005503409 0.005762793
# [2,] 0.001113900 0.001049001 0.001086593
# [3,] 0.037055885 0.029879947 0.033409403
# [4,] 0.013378596 0.010627150 0.011153053

ame_se[, c(2, 3, 1)]

#             [,1]        [,2]        [,3]
# [1,] 0.005503409 0.005762793 0.005909444
# [2,] 0.001049001 0.001086593 0.001113900
# [3,] 0.029879947 0.033409403 0.037055885
# [4,] 0.010627150 0.011153053 0.013378596

\(Z\)値は次の通りです.

z <- ame[, c(2, 3, 1)] / ame_se[, c(2, 3, 1)]
z

#                 1          2         3
# educ    6.6119466 -2.8851837 -3.344060
# age     0.8972914  0.7574184 -1.583862
# couple -4.2457635 -1.0682853  4.386725
# child   0.8432841  0.9467181 -1.459085

\(p\)値は次の通りです.

p <- 2 * (1 - pnorm(abs(z)))
p

#                   1           2            3
# educ   3.792988e-11 0.003911853 8.256197e-04
# age    3.695634e-01 0.448799257 1.132251e-01
# couple 2.178504e-05 0.285391843 1.150704e-05
# child  3.990696e-01 0.343782459 1.445418e-01

実証例8.4 (トービット)

pp.371-372.

データの読み込み・加工

piaac_original <- read.csv("piaac.csv")

piaac <- subset(piaac_original, gender == "Female")

OLS

ols <- lm(hours ~ educ + age + couple + child, data = piaac)
summary(ols)
## 
## Call:
## lm(formula = hours ~ educ + age + couple + child, data = piaac)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -30.49 -19.06   0.03  16.49  80.83 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.98657    4.15323   2.886 0.003949 ** 
## educ         0.85299    0.23203   3.676 0.000244 ***
## age          0.04638    0.04340   1.068 0.285450    
## couple      -6.80806    1.34330  -5.068 4.45e-07 ***
## child        0.67291    0.44251   1.521 0.128525    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.63 on 1731 degrees of freedom
##   (969 observations deleted due to missingness)
## Multiple R-squared:  0.0216, Adjusted R-squared:  0.01934 
## F-statistic: 9.553 on 4 and 1731 DF,  p-value: 1.222e-07

トービット・モデル

トービット・モデルの推定には,censRegというパッケージのcensReg()関数が便利です.

library(censReg)
## 
## Please cite the 'censReg' package as:
## Henningsen, Arne (2017). censReg: Censored Regression (Tobit) Models. R package version 0.5. http://CRAN.R-Project.org/package=censReg.
## 
## If you have questions, suggestions, or comments regarding the 'censReg' package, please use a forum or 'tracker' at the R-Forge site of the 'sampleSelection' project:
## https://r-forge.r-project.org/projects/sampleselection/
tobit <- censReg(hours ~ educ + age + couple + child, data = piaac)

summary(tobit)
## 
## Call:
## censReg(formula = hours ~ educ + age + couple + child, data = piaac)
## 
## Observations:
##          Total  Left-censored     Uncensored Right-censored 
##           1736            608           1128              0 
## 
## Coefficients:
##             Estimate Std. error t value  Pr(> t)    
## (Intercept) -0.26880    6.31684  -0.043 0.966058    
## educ         1.29136    0.35170   3.672 0.000241 ***
## age          0.09362    0.06668   1.404 0.160319    
## couple      -9.43196    1.99461  -4.729 2.26e-06 ***
## child        0.99635    0.66882   1.490 0.136301    
## logSigma     3.28764    0.02316 141.963  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Newton-Raphson maximisation, 9 iterations
## Return code 1: gradient close to zero (gradtol)
## Log-likelihood: -5856.426 on 6 Df

実証例8.5 (ヘキット)

pp.386-388.

データの読み込み・加工

piaac_original <- read.csv("piaac.csv")

piaac <- subset(piaac_original, gender == "Female")
piaac$lwage <-  log(piaac$wage)
piaac$exp <- piaac$age - piaac$educ - 6
piaac$expsq <- (piaac$exp) ^ 2
piaac$selected <- 1 * (!is.na(piaac$lwage))

OLS

ols <- lm(lwage ~ educ + exp + expsq, data = piaac)

summary(ols)
## 
## Call:
## lm(formula = lwage ~ educ + exp + expsq, data = piaac)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.9565 -0.2838 -0.0544  0.2516  5.5254 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.8946054  0.1279650  46.064  < 2e-16 ***
## educ         0.0767333  0.0089127   8.609  < 2e-16 ***
## exp          0.0124602  0.0047378   2.630  0.00862 ** 
## expsq       -0.0002220  0.0001013  -2.191  0.02862 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.663 on 1583 degrees of freedom
##   (1118 observations deleted due to missingness)
## Multiple R-squared:  0.05641,    Adjusted R-squared:  0.05463 
## F-statistic: 31.55 on 3 and 1583 DF,  p-value: < 2.2e-16

ヘキット・モデル

ヘキット・モデルの推定は,sampleSelectionというパッケージのheckit()関数が便利です.

library(sampleSelection)

twostep <- heckit(selected ~ educ + exp + expsq + couple + child, 
                  lwage ~ educ + exp + expsq, 
                  data = piaac, 
                  method = "2step")

summary(twostep)
## --------------------------------------------
## Tobit 2 model (sample selection model)
## 2-step Heckman / heckit estimation
## 1747 observations (763 censored and 984 observed)
## 13 free parameters (df = 1735)
## Probit selection equation:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.1734568  0.2987755  -3.928 8.92e-05 ***
## educ         0.0032513  0.0171564   0.190   0.8497    
## exp          0.1332373  0.0144499   9.221  < 2e-16 ***
## expsq       -0.0023922  0.0002509  -9.533  < 2e-16 ***
## couple      -0.2876216  0.0938053  -3.066   0.0022 ** 
## child        0.0013419  0.0309743   0.043   0.9654    
## Outcome equation:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.7862247  0.6725489   8.603  < 2e-16 ***
## educ         0.0861687  0.0131580   6.549 7.62e-11 ***
## exp          0.0091285  0.0339531   0.269    0.788    
## expsq       -0.0001668  0.0006090  -0.274    0.784    
## Multiple R-Squared:0.0526,   Adjusted R-Squared:0.0488
##    Error terms:
##               Estimate Std. Error t value Pr(>|t|)
## invMillsRatio  0.02771    0.38986   0.071    0.943
## sigma          0.71505         NA      NA       NA
## rho            0.03875         NA      NA       NA
## --------------------------------------------
mle <- heckit(selected ~ educ + exp + expsq + couple + child, 
                  lwage ~ educ + exp + expsq, 
                  data = piaac, 
                  method = "ml")

summary(mle)
## --------------------------------------------
## Tobit 2 model (sample selection model)
## Maximum Likelihood estimation
## Newton-Raphson maximisation, 3 iterations
## Return code 8: successive function values within relative tolerance limit (reltol)
## Log-Likelihood: -2207.948 
## 1747 observations (763 censored and 984 observed)
## 12 free parameters (df = 1735)
## Probit selection equation:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.1749600  0.2991461  -3.928 8.91e-05 ***
## educ         0.0033490  0.0171841   0.195  0.84550    
## exp          0.1331523  0.0144494   9.215  < 2e-16 ***
## expsq       -0.0023909  0.0002509  -9.528  < 2e-16 ***
## couple      -0.2882325  0.0937756  -3.074  0.00215 ** 
## child        0.0022663  0.0312037   0.073  0.94211    
## Outcome equation:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.7679381  0.3473704  16.605  < 2e-16 ***
## educ         0.0861431  0.0131388   6.556 7.25e-11 ***
## exp          0.0100524  0.0175793   0.572    0.568    
## expsq       -0.0001834  0.0003119  -0.588    0.557    
##    Error terms:
##       Estimate Std. Error t value Pr(>|t|)    
## sigma  0.71536    0.01689  42.342   <2e-16 ***
## rho    0.05457    0.22029   0.248    0.804    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## --------------------------------------------