Skip to contents

Fit Binary Logit Model by Maximum Likelihood

Usage

ml_logit(
  value,
  scale = NULL,
  weights = NULL,
  data,
  subset = NULL,
  noint_value = FALSE,
  constraints = NULL,
  start = NULL,
  method = "NR",
  control = NULL,
  ...
)

Arguments

value

Two-sided formula for the probability equation.

scale

Optional one-sided formula for heteroskedasticity.

weights

Optional weights variable. It can be either the name of the variable in data, or a vector with the weights.

data

Data frame.

subset

Optional subset expression. Only observations for which this expression evaluates to TRUE are used in the estimation. This can be a logical vector or an expression (e.g. subset = age > 30).

noint_value

Logical. Should the value equation omit the intercept? Default is FALSE.

constraints

Optional constraints on the parameters. Can be a character vector of string constraints, a named list of string constraints, or a raw maxLik constraints list. See Details.

start

Numeric vector of starting values for the coefficients. Required if constraints are being supplied. If supplied without constraints they will be ignored. See Details.

method

A string with the method used for optimization. See maxLik for options, and see Details.

control

A list of control parameters passed to maxLik. If NULL (default), a sensible set of options is chosen automatically depending on whether constraints are used. See maxControl.

...

Additional arguments passed to maxLik.

Value

An object of class ml_logit that extends mlmodel.

Details

Important: Do not use the usual R syntax to remove the intercept in the formula (- 1 or + 0) for the value equation. Use the dedicated argument noint_value instead. For the scale equation (if modeling heteroskedasticity), the formula must contain only the predictors (right-hand side).

ml_logit() handles both strictly binary (0/1), and fractional response (0 <= y <= 1) outcomes. When using fractional responses, it is recommended to use robust standard errors (vcov.type = "robust").

Coefficient names in the fitted object use the prefixes value:: and scale:: (when heteroskedasticity is modeled) to clearly identify which equation each coefficient belongs to.

Either inequality or equality linear constraints are accepted, but not both. A constraint cannot have a linear combination of more than two coefficients.

Important: When constraints are supplied, start cannot be NULL. You must provide initial values that yield a feasible log-likelihood. If no constraints are used, any supplied start is ignored.

When constraints are used, ml_logit automatically chooses method:

  • Equality constraints => Nelder-Mead ("NM")

  • Inequality constraints => BFGS ("BFGS")

In these cases your supplied method argument (if any) is ignored.

See also

Author

Alfonso Sanchez-Penalver

Examples


# Homoskedastic binary logit model
data(smoke)
smoke$smokes <- smoke$cigs > 0

fit_logit <- ml_logit(smokes ~ cigpric + income + age, 
                      data = smoke)

summary(fit_logit, vcov.type = "robust")
#> 
#> Maximum Likelihood Model
#>  Type: Homoskedastic Binary Logit 
#> ---------------------------------------
#> Call:
#> ml_logit(value = smokes ~ cigpric + income + age, data = smoke)
#> 
#> Log-Likelihood: -533.27 
#> 
#> Wald significance tests:
#>  all: Chisq(3) = 9.010, Pr(>Chisq) = 0.0292
#> 
#> Variance type: Robust
#> ---------------------------------------
#>                         Estimate Std. Error z value Pr(>|z|)    
#> Value (smokes):  
#>   value::(Intercept)  0.6885299  0.9527423   0.723  0.46988   
#>   value::cigpric     -0.0104209  0.0154879  -0.673  0.50105   
#>   value::income      -0.0000019  0.0000080  -0.243  0.80786   
#>   value::age         -0.0121257  0.0041786  -2.902  0.00371 **
#> ---------------------------------------
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> ---
#> Number of observations:807 (Successes: 310, Failures: 497)
#> Pseudo R-squared - Cor.Sq.: 0.009815 McFadden: 0.007888
#> AIC: 1074.53  BIC: 1093.31 

# Heteroskedastic binary logit model
fit_logit_het <- ml_logit(smokes ~ cigpric + income + age,
                          scale = ~ educ,
                          data = smoke)
#>  Improving initial values by scaling (factor = 0.5).
#>  Initial log-likelihood: -555.841
#>  Final scaled log-likelihood: -537.617

summary(fit_logit_het, vcov.type = "robust")
#> 
#> Maximum Likelihood Model
#>  Type: Heteroskedastic Binary Logit 
#> ---------------------------------------
#> Call:
#> ml_logit(value = smokes ~ cigpric + income + age, scale = ~educ, 
#>     data = smoke)
#> 
#> Log-Likelihood: -522.93 
#> 
#> Wald significance tests:
#>  all: Chisq(4) = 215.327, Pr(>Chisq) = < 1e-8
#>  Mean: Chisq(3) = 0.761, Pr(>Chisq) = 0.8588
#>  Scale: Chisq(1) = 13.936, Pr(>Chisq) = 0.0002
#> 
#> Variance type: Robust
#> ---------------------------------------
#>                         Estimate Std. Error z value Pr(>|z|)     
#> Value (smokes):  
#>   value::(Intercept) -6.334e-03  3.278e-02  -0.193 0.846792    
#>   value::cigpric      8.513e-05  5.654e-04   0.151 0.880325    
#>   value::income      -9.200e-09  2.932e-07  -0.031 0.974960    
#>   value::age         -4.163e-04  4.950e-04  -0.841 0.400312    
#> Scale (log(sigma)):  
#>   scale::educ        -2.439e-01  6.534e-02  -3.733 0.000189 ***
#> ---------------------------------------
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> ---
#> Number of observations:807 (Successes: 310, Failures: 497)
#> Pseudo R-squared - Cor.Sq.: 0.03261 McFadden: 0.02712
#> AIC: 1055.86  BIC: 1079.32 
#> 
#> Distribution of Std. Deviation (sigma):
#> ---------------------------------------
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   0.012   0.037   0.054   0.063   0.087   0.231 
#> 

# Different predict types
head(predict(fit_logit, type = "response")$fit)   # Predicted probability
#> [1] 0.3684765 0.3874225 0.3375703 0.4213065 0.4590092 0.2718363
head(predict(fit_logit, type = "link")$fit)       # Linear predictor (log-odds)
#> [1] -0.5387582 -0.4581597 -0.6741409 -0.3174125 -0.1643319 -0.9853257

# Fitted values and residuals
head(fitted(fit_logit))
#> [1] 0.3684765 0.3874225 0.3375703 0.4213065 0.4590092 0.2718363
head(residuals(fit_logit))
#> [1] -0.3684765 -0.3874225  0.6624297 -0.4213065 -0.4590092 -0.2718363
head(residuals(fit_logit, type = "pearson"))
#> [1] -0.7638536 -0.7952650  1.4008377 -0.8532470 -0.9211191 -0.6109972