Uncertainty intervals for regression parameters with non-ignorable missingness in the outcome
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
University of Perugia, Perugia, Italy.
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
2015 (English). In: Statistical Papers, ISSN 0932-5026, E-ISSN 1613-9798, Vol. 56, no 3, 829-847 p. Article in journal (Refereed), Published
Abstract [en]

When estimating regression models with missing outcomes, scientists usually have to rely either on a missing at random assumption (the missingness mechanism is independent of the outcome given the observed variables) or on exclusion restrictions (some of the covariates affecting the missingness mechanism do not affect the outcome). Both of these assumptions are controversial in applications, since they typically cannot be tested from the data. The alternative, which we pursue here, is to derive identification sets (instead of point identification) for the parameters of interest while allowing for a missing not at random mechanism. The non-ignorability of this mechanism is quantified by a parameter. When this parameter can be bounded using a priori information, a bounded identification set follows. Our approach allows the outcome to be continuous and unbounded and relaxes distributional assumptions. The identification sets can be estimated by ordinary least squares, and sampling variability can be incorporated, yielding uncertainty intervals with coverage probability of at least 1-α. Our work is motivated by a study of predictors of body mass index (BMI) change in middle-aged men, where the approach allows us to identify possible predictors of BMI change even under weak assumptions about the missingness mechanism.
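As an illustration only, and not the authors' implementation, the general idea of turning a bounded non-ignorability parameter into an uncertainty interval can be sketched in Python. The function estimate(rho) is a hypothetical placeholder for any estimator that returns a regression coefficient and its standard error when the non-ignorability parameter is fixed at rho, and the grid of rho values stands in for the a priori bounds. Taking the smallest interval that contains all per-rho Wald intervals gives an interval that covers the true coefficient with probability at least 1-α, provided the true rho lies within the bounds (and the grid is fine enough) and each per-rho interval has its nominal coverage.

```python
from statistics import NormalDist
from typing import Callable, Iterable, Tuple


def uncertainty_interval(
    estimate: Callable[[float], Tuple[float, float]],
    rho_grid: Iterable[float],
    alpha: float = 0.05,
) -> Tuple[float, float]:
    """Hull of the (1 - alpha) Wald intervals over a grid of sensitivity values.

    `estimate(rho)` is a hypothetical user-supplied function returning
    (point estimate, standard error) of a coefficient when the
    non-ignorability parameter is fixed at `rho`.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    lowers, uppers = [], []
    for rho in rho_grid:
        beta_hat, se = estimate(rho)
        lowers.append(beta_hat - z * se)
        uppers.append(beta_hat + z * se)
    if not lowers:
        raise ValueError("rho_grid must contain at least one value")
    # The true coefficient lies in the interval for the true rho, so the
    # hull of all intervals keeps at least that interval's coverage.
    return min(lowers), max(uppers)
```

For a toy estimator whose point estimate moves linearly with rho, uncertainty_interval(lambda r: (1.0 + 0.5 * r, 0.1), [-0.2, -0.1, 0.0, 0.1, 0.2]) returns roughly (0.704, 1.296), compared with roughly (0.804, 1.196) for the ordinary confidence interval at rho = 0 alone. The wider interval is the price paid for not assuming the missingness is ignorable.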

Place, publisher, year, edition, pages
Springer-Verlag New York, 2015. Vol. 56, no 3, 829-847 p.
Keyword [en]
Heckman model, informative dropout, selection models, sensitivity analysis, set identification, two stage least squares
National Category
Probability Theory and Statistics
URN: urn:nbn:se:umu:diva-92843
DOI: 10.1007/s00362-014-0610-x
ISI: 000358219900013
Scopus ID: 2-s2.0-84937527000
OAI: diva2:743926
Swedish Research Council
Available from: 2014-09-05. Created: 2014-09-05. Last updated: 2017-12-05. Bibliographically approved
In thesis
1. Uncertainty intervals and sensitivity analysis for missing data
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis we develop methods for dealing with missing data in a univariate response variable when estimating regression parameters. Missing outcome data are a problem in a number of applications, one of which is follow-up studies. In follow-up studies, data are collected on two (or more) occasions, and it is common that only some of the initial participants return on the second occasion. This is the case in Paper II, where we investigate predictors of decline in self-reported health in older populations in Sweden, the Netherlands and Italy. In that study, around 50% of the participants drop out. Researchers commonly rely on the assumption that the missingness is independent of the outcome given some observed covariates. This assumption is called missing at random (MAR), or an ignorable missingness mechanism. However, MAR cannot be tested from the data, and if it does not hold, estimators based on it are biased. In the study of Paper II, we suspect that some individuals drop out due to bad health. If this is the case, the data are not MAR. One alternative to MAR, which we pursue, is to incorporate the uncertainty due to missing data by reporting interval estimates instead of point estimates, and uncertainty intervals instead of confidence intervals. An uncertainty interval is the analog of a confidence interval, but it is wider because the assumptions on the missing data are relaxed. These intervals can be used to visualize the consequences that deviations from MAR have for the conclusions of the study. That is, they can be used to perform a sensitivity analysis of MAR.

The thesis covers different types of linear regression. In Papers I and III we have a continuous outcome, in Paper II a binary outcome, and in Paper IV we allow for mixed effects with a continuous outcome. In Paper III we estimate the effect of a treatment, which can be viewed as a missing outcome data problem.
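The point that estimators relying on MAR are biased when dropout depends on the outcome itself can be checked with a small simulation. The sketch below is a hypothetical illustration, not taken from the thesis: it generates y = x + e with a true slope of 1, then fits complete-case least squares once when outcomes are lost depending only on the covariate x (MAR) and once when they are lost depending on y itself (missing not at random, loosely mimicking dropout due to bad health). The function names, sample size and cutoff are arbitrary choices made for the example.

```python
import random
from typing import List, Tuple


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Slope of a simple least-squares regression of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def complete_case_slope(pairs: List[Tuple[float, float]]) -> float:
    """Fit OLS using only the units whose outcome was observed."""
    xs, ys = zip(*pairs)
    return ols_slope(list(xs), list(ys))


def simulate(n: int = 100_000, cutoff: float = 0.5, seed: int = 1) -> Tuple[float, float]:
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]  # true slope is 1
    # MAR: whether y is observed depends only on the covariate x.
    mar = [(x, y) for x, y in zip(xs, ys) if x < cutoff]
    # Not MAR: whether y is observed depends on y itself (high values drop out).
    mnar = [(x, y) for x, y in zip(xs, ys) if y < cutoff]
    return complete_case_slope(mar), complete_case_slope(mnar)


mar_slope, mnar_slope = simulate()
print(f"MAR complete-case slope:     {mar_slope:.3f}")
print(f"Not-MAR complete-case slope: {mnar_slope:.3f}")
```

With these settings the MAR slope stays close to 1, while the not-MAR slope is attenuated well below 1 (about 0.62 in the population for this design). Because the data cannot distinguish the two situations, intervals that stay valid under a range of departures from MAR are a natural response.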

Place, publisher, year, edition, pages
Umeå: Umeå Universitet, 2016. 13 p.
Statistical studies, ISSN 1100-8989 ; 50
missing data, missing not at random, non-ignorable, set identification, uncertainty intervals, sensitivity analysis, self reported health, average causal effect, average causal effect on the treated, mixed-effects models
National Category
Probability Theory and Statistics
urn:nbn:se:umu:diva-127121 (URN), 978-91-7601-555-1 (ISBN)
Public defence
2016-11-25, Hörsal E, Humanisthuset, Umeå Universitet, Umeå, 10:00 (English)
Available from: 2016-11-04. Created: 2016-10-31. Last updated: 2016-11-30. Bibliographically approved
