Uncertainty intervals and sensitivity analysis for missing data
Umeå University, Faculty of Social Sciences, Umeå School of Business and Economics (USBE), Statistics.
2016 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis we develop methods for dealing with missing data in a univariate response variable when estimating regression parameters. Missing outcome data is a problem in a number of applications, one of which is follow-up studies. In follow-up studies data is collected on two (or more) occasions, and it is common that only some of the initial participants return on the second occasion. This is the case in Paper II, where we investigate predictors of decline in self-reported health in older populations in Sweden, the Netherlands and Italy. In that study, around 50% of the study participants drop out. Researchers commonly rely on the assumption that the missingness is independent of the outcome given some observed covariates. This assumption is called missing at random (MAR), or an ignorable missingness mechanism. However, MAR cannot be tested from the data, and if it does not hold, the estimators based on this assumption are biased. In the study of Paper II, we suspect that some of the individuals drop out due to bad health. If this is the case, the data is not MAR. One alternative to assuming MAR, which we pursue, is to incorporate the uncertainty due to missing data by using interval estimates instead of point estimates and uncertainty intervals instead of confidence intervals. An uncertainty interval is the analog of a confidence interval but is wider because the assumptions on the missing data are relaxed. These intervals can be used to visualize the consequences that deviations from MAR have for the conclusions of the study. That is, they can be used to perform a sensitivity analysis of MAR.

The thesis covers different types of linear regression. In Papers I and III we have a continuous outcome, in Paper II a binary outcome, and in Paper IV we allow for mixed effects with a continuous outcome. In Paper III we estimate the effect of a treatment, which can be seen as an example of missing outcome data because the outcome under the treatment not received is never observed.
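
The idea of an uncertainty interval can be illustrated with a deliberately simplified sketch. It is not the estimator developed in the thesis: it targets a population mean rather than regression parameters, and it swaps in a plain shift model in which the mean of the missing outcomes equals the mean of the observed outcomes plus a sensitivity parameter. The function name `uncertainty_interval`, the parameter `delta` and its bounds are hypothetical and chosen only for illustration. Setting both bounds to zero corresponds to MAR and returns the usual confidence interval.

```python
import math
from statistics import NormalDist, mean, stdev


def uncertainty_interval(observed, n_missing, delta_lo, delta_hi, alpha=0.05):
    """Identification and uncertainty intervals for a population mean.

    Illustrative assumption (not from the thesis): the mean of the missing
    outcomes equals the mean of the observed outcomes plus delta, where
    delta is only known to lie in [delta_lo, delta_hi].
    """
    n_obs = len(observed)
    p_missing = n_missing / (n_obs + n_missing)

    ybar = mean(observed)
    # Standard error of the observed mean; the missing proportion is
    # treated as fixed, which is a further simplification.
    se = stdev(observed) / math.sqrt(n_obs)
    z = NormalDist().inv_cdf(1 - alpha / 2)

    # Overall mean under the shift model: ybar + p_missing * delta.
    # Letting delta range over its bounds gives the identification interval.
    ident = (ybar + p_missing * delta_lo, ybar + p_missing * delta_hi)

    # Widen both ends by sampling variability to get the uncertainty interval.
    uncert = (ident[0] - z * se, ident[1] + z * se)
    return ident, uncert


if __name__ == "__main__":
    # Made-up numbers: five observed outcomes, five dropouts, and dropouts
    # assumed to score between 1 unit lower and the same as responders.
    ident, uncert = uncertainty_interval([1.0, 2.0, 3.0, 4.0, 5.0], 5, -1.0, 0.0)
    print("identification interval:", ident)
    print("uncertainty interval:", uncert)
```

Reporting both intervals side by side shows how much of the total width comes from the relaxed missing-data assumption and how much from sampling variability.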

Place, publisher, year, edition, pages
Umeå: Umeå Universitet, 2016. 13 p.
Series
Statistical studies, ISSN 1100-8989 ; 50
Keyword [en]
missing data, missing not at random, non-ignorable, set identification, uncertainty intervals, sensitivity analysis, self reported health, average causal effect, average causal effect on the treated, mixed-effects models
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
URN: urn:nbn:se:umu:diva-127121; ISBN: 978-91-7601-555-1; OAI: oai:DiVA.org:umu-127121; DiVA: diva2:1043969
Public defence
2016-11-25, Hörsal E, Humanisthuset, Umeå Universitet, Umeå, 10:00 (English)
Available from: 2016-11-04 Created: 2016-10-31 Last updated: 2016-11-30 Bibliographically approved
List of papers
1. Uncertainty intervals for regression parameters with non-ignorable missingness in the outcome
2015 (English) In: Statistical papers, ISSN 0932-5026, E-ISSN 1613-9798, Vol. 56, no 3, p. 829-847. Article in journal (Refereed) Published
Abstract [en]

When estimating regression models with missing outcomes, scientists usually have to rely either on a missing at random assumption (the missingness mechanism is independent of the outcome given the observed variables) or on exclusion restrictions (some of the covariates affecting the missingness mechanism do not affect the outcome). Both these hypotheses are controversial in applications since they are typically not testable from the data. The alternative, which we pursue here, is to derive identification sets (instead of point identification) for the parameters of interest when allowing for a missing not at random mechanism. The non-ignorability of this mechanism is quantified with a parameter. When the latter can be bounded with a priori information, a bounded identification set follows. Our approach allows the outcome to be continuous and unbounded, and relaxes distributional assumptions. Estimation of the identification sets can be performed via ordinary least squares, and sampling variability can be incorporated, yielding uncertainty intervals that achieve a coverage probability of at least (1-α). Our work is motivated by a study on predictors of body mass index (BMI) change in middle-aged men, which allows us to identify possible predictors of BMI change even when assuming little about the missingness mechanism.

Place, publisher, year, edition, pages
Springer-Verlag New York, 2015
Keyword
Heckman model, informative dropout, selection models, sensitivity analysis, set identification, two stage least squares
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:umu:diva-92843 (URN); 10.1007/s00362-014-0610-x (DOI); 000358219900013 (); 2-s2.0-84937527000 (ScopusID)
Funder
Swedish Research Council
Available from: 2014-09-05 Created: 2014-09-05 Last updated: 2016-11-02 Bibliographically approved
2. Predictors of decline in self-reported health: addressing non-ignorable dropout in longitudinal studies of ageing
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Predictors of decline in health in older populations have been investigated in multiple studies before. Most longitudinal studies of ageing, however, assume that dropout at follow-up is ignorable (missing at random) given a set of observed characteristics at baseline. The objective of this study was to address non-ignorable dropout in investigating predictors of declining self-reported health in an older population (50 years or older) in Sweden, the Netherlands, and Italy. We used the SHARE panel survey, and since only 2893 out of the original 5653 participants in the 2004 survey were followed up in 2013, we studied whether the results were sensitive to the high dropout rate. When taking dropout into account, we found that age and a greater number of chronic diseases were positively associated with a decline in self-reported health in the three countries studied here. Maximum grip strength was associated with decline in self-reported health in Sweden and Italy, and higher body mass index and self-reported limitations in normal activities due to health problems were associated with decline in self-reported health in Sweden. The findings, although not surprising, contribute to the literature in understanding the robustness of longitudinal study results to non-ignorable dropout while considering three different populations in Europe.

Keyword
Longitudinal studies, Dropout, Sensitivity analysis, Chronic disease, SHARE
National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:umu:diva-127118 (URN)
Funder
Forte, Swedish Research Council for Health, Working Life and Welfare, 2013-2506
Available from: 2016-11-01 Created: 2016-10-31 Last updated: 2016-11-02
3. Bounds and sensitivity analysis when estimating average treatment effects with imputation and double robust estimators
(English) Manuscript (preprint) (Other academic)
Abstract [en]

When estimating average causal effects of treatments with observational data, scientists often rely on the assumption of unconfoundedness. We propose a sensitivity analysis for imputation estimators and doubly robust estimators, based on bounds (defining an identification interval) for the causal effect of interest, which allow for unobserved confounders. The bounds are derived from the bias of the estimators, expressed as a function of a sensitivity parameter. We describe how such bounds can take into account sampling variation, thereby yielding an uncertainty interval. We are also able to contrast the size of the potential bias due to violation of the unconfoundedness assumption with the bias due to misspecification of the models used to explain the outcome with the observed covariates. While the latter bias can in principle be made arbitrarily small with increasing sample size (by increasing the flexibility of the models used), the bias due to unobserved confounding does not disappear with increasing sample size. Through numerical experiments we illustrate the relative size of the biases due to unobserved confounders and model misspecification, as well as the empirical coverage of the uncertainty interval on which the proposed sensitivity analysis is based.

National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:umu:diva-127119 (URN)
Available from: 2016-11-01 Created: 2016-10-31 Last updated: 2016-11-02
4. Uncertainty intervals for mixed effects models with non-ignorable missingness
(English) Manuscript (preprint) (Other academic)
Abstract [en]

When estimating regression models with missing outcomes, scientists usually rely on strong untestable assumptions such as missing at random, exclusion restrictions, or distributional assumptions on the missing data, in order to point identify the parameters of interest. An alternative is to estimate identification intervals under milder assumptions. In this paper, we use a sensitivity parameter, which quantifies the non-ignorability of the missingness mechanism, in order to estimate identification intervals for regression parameters in linear mixed effects models. By taking sampling variability into account, we obtain uncertainty intervals which can be used for a sensitivity analysis of the missing at random assumption or to draw conclusions from the data without making unnecessarily strong assumptions.

National Category
Probability Theory and Statistics
Research subject
Statistics
Identifiers
urn:nbn:se:umu:diva-127120 (URN)
Available from: 2016-11-01 Created: 2016-10-31 Last updated: 2016-11-01

By author/editor
Genbäck, Minna