Asked by Statistics

What analysis should I use if I am measuring levels of depression, anxiety, and substance use before and after HIV disclosure? The design is within-subjects, the variables are continuous, and the data are not normally distributed. The variables may be influencing each other.

All Answers 1

Answered by GPT-5 mini AI
Short answer
- If you only have two time points (pre vs post) and just want to test whether each measure changed: use a paired nonparametric test — Wilcoxon signed-rank test for each outcome (report effect sizes).
- If you want to model the three outcomes jointly (they are correlated) or to adjust for covariates/missing data: use a mixed model or GEE on the repeated (within-subject) data — either a multivariate mixed-effects model or separate mixed models with robust SEs. These approaches handle correlated observations and can be made robust to non‑normality (via transformation, alternative families or robust/bootstrapped inference).
- If you want to model how the outcomes influence each other over time: use longitudinal structural equation models (latent change score or cross-lagged panel models) — but these require ≥3 waves (or careful interpretation with 2 waves) and larger samples.

Guidance and practical options

1) Simple pre vs post, each outcome separately (two time points)
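- Test: Wilcoxon signed-rank test (paired). Use when the paired differences are non‑normal and you only need a univariate pre/post comparison. The test assumes the differences are roughly symmetric around their median; if they are strongly skewed, the sign test is a more conservative fallback.
- Report: median (IQR) pre/post, Wilcoxon statistic, p-value and an effect size (matched-pairs rank-biserial correlation, or r = Z/√N).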
- R: wilcox.test(x_pre, x_post, paired=TRUE)
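
A minimal sketch of option 1 in base R, assuming simulated scores (the variable names dep_pre and dep_post and all numbers are hypothetical, not from a real study). It runs the paired test and computes both effect sizes by hand. The r value is an approximation that recovers Z from the two-sided p-value and uses the number of non-zero pairs as N; conventions for N differ between textbooks.

```r
# Hypothetical, simulated skewed pre/post depression scores (illustration only)
set.seed(1)
n <- 40
dep_pre  <- rexp(n, rate = 1 / 12)
dep_post <- dep_pre * exp(rnorm(n, mean = -0.2, sd = 0.4))

# Paired Wilcoxon signed-rank test; conf.int = TRUE adds the
# Hodges-Lehmann pseudomedian of the differences with a CI
res <- wilcox.test(dep_pre, dep_post, paired = TRUE, conf.int = TRUE)
print(res)

# Matched-pairs rank-biserial correlation from the signed ranks.
# wilcox.test(x, y, paired = TRUE) works on x - y, so use the same direction.
d  <- dep_pre - dep_post
d  <- d[d != 0]                 # zero differences are dropped
rk <- rank(abs(d))
w_pos <- sum(rk[d > 0])         # equals the reported V statistic
w_neg <- sum(rk[d < 0])
r_rb  <- (w_pos - w_neg) / (w_pos + w_neg)

# Approximate r = |Z| / sqrt(N), with Z recovered from the two-sided p-value
z   <- qnorm(res$p.value / 2)
r_z <- abs(z) / sqrt(length(d))

print(round(c(median_pre  = median(dep_pre),
              median_post = median(dep_post),
              rank_biserial = r_rb,
              r = r_z), 3))
```

A positive rank-biserial value here means scores tended to be higher before than after, because the differences are taken as pre minus post.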

2) Repeated-measures modelling that handles correlations and covariates
- Linear mixed-effects model (LMM): outcome ~ time + (1 | subject) (and add covariates). LMMs handle correlated repeated measures and unbalanced/missing data. If residuals are non‑normal consider: transforming the outcome (log, square-root), using robust (sandwich) standard errors, or bootstrap CIs.
- Generalized estimating equations (GEE): use if you care about population-average effects and want inference that stays valid even when the working correlation structure is misspecified (geepack in R).
- If outcomes are skewed, consider GLMMs with an appropriate family (e.g., Gamma + log link, which requires strictly positive scores) or model transformed scores.
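
A minimal sketch of the random-intercept model from option 2, assuming simulated data and the nlme package (one of R's recommended packages, normally installed alongside base R). The data frame, column names and effect sizes are hypothetical; lme4 or geepack could be substituted with their own syntax.

```r
library(nlme)

# Hypothetical simulated data: 60 subjects, true mean change of -2 after disclosure
set.seed(2)
n <- 60
subject_effect <- rnorm(n, mean = 0, sd = 3)   # stable between-person differences
pre  <- 15 + subject_effect + rnorm(n, sd = 2)
post <- 13 + subject_effect + rnorm(n, sd = 2)

# Long format: one row per subject per time point
long <- data.frame(
  subject = factor(rep(seq_len(n), times = 2)),
  time    = factor(rep(c("pre", "post"), each = n), levels = c("pre", "post")),
  score   = c(pre, post)
)

# Random-intercept linear mixed model: score ~ time + (1 | subject)
fit <- lme(score ~ time, random = ~ 1 | subject, data = long)

# Fixed-effect table; the 'timepost' row is the estimated pre-to-post change
print(summary(fit)$tTable)

# 95% confidence intervals for the fixed effects
print(intervals(fit, which = "fixed"))

# Residuals to inspect for non-normality (e.g., with qqnorm)
res <- resid(fit, type = "normalized")
```

Covariates are added to the right-hand side of the formula (for example score ~ time + age, where age is a placeholder name). If the residuals look clearly non-normal, the transformation or bootstrap options above apply to this same model.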

3) Joint modelling of multiple correlated outcomes
- Multivariate LMM (joint mixed model): model depression, anxiety and substance use together to account for between-outcome correlations and to test an overall time effect. This is preferable to doing three separate tests if you want a single multivariate test and to model covariance between outcomes.
- Practical strategies: (a) reshape data to long format and include outcome-type as a factor and a time*outcome interaction in a mixed model, or (b) use R packages that fit multivariate mixed models directly, such as brms or MCMCglmm for a Bayesian joint model.
- If you cannot meet parametric assumptions, use robust or bootstrap inference.
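
Strategy (a) could look roughly like the sketch below, assuming simulated data, nlme, and scores standardized within each outcome so the three scales are comparable. All names and effect sizes are hypothetical. The single random intercept per subject is a simplification that treats all within-person correlations as equal; outcome-specific random effects may fit real data better.

```r
library(nlme)

# Hypothetical simulated data: 80 subjects, three correlated outcomes, two time points
set.seed(3)
n <- 80
u <- rnorm(n)   # shared subject-level component that makes outcomes correlate

make_outcome <- function(name, shift) {
  data.frame(
    subject = factor(rep(seq_len(n), times = 2)),
    outcome = name,
    time    = rep(c("pre", "post"), each = n),
    score   = c(u + rnorm(n), u + shift + rnorm(n))
  )
}

long <- rbind(make_outcome("depression", -0.5),
              make_outcome("anxiety", -0.3),
              make_outcome("substance_use", 0))
long$outcome <- factor(long$outcome)
long$time    <- factor(long$time, levels = c("pre", "post"))

# Standardize within outcome so the scales are comparable
long$z <- ave(long$score, long$outcome,
              FUN = function(x) (x - mean(x)) / sd(x))

# Null model (no time effect for any outcome) vs. model with outcome-specific
# time effects; both are fitted by ML so the likelihoods are comparable
fit0 <- lme(z ~ outcome,        random = ~ 1 | subject, data = long, method = "ML")
fit1 <- lme(z ~ outcome * time, random = ~ 1 | subject, data = long, method = "ML")

# Likelihood-ratio test on 3 df: did any of the three outcomes change?
lrt <- anova(fit0, fit1)
print(lrt)

# Outcome-specific change estimates come from the time terms of fit1
print(summary(fit1)$tTable)
```

The likelihood-ratio comparison gives one joint test of change across the three outcomes, while the interaction terms in fit1 show whether the amount of change differs between outcomes.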

4) Modeling reciprocal influences among outcomes (causal/temporal relations)
- If you have ≥3 time points: use cross-lagged panel models or latent change score models in structural equation modeling (lavaan in R, Mplus). These model how one variable at a prior time point predicts later change in another variable.
- If you have only 2 time points: you can model change scores (post − pre) and regress change in one outcome on baseline values of another (e.g., ΔDepression ~ baseline substance use + baseline depression), but causal claims are limited.
- SEM can incorporate measurement error (if you have multi-item scales) and test mediation.
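
For the two-wave case, the change-score regression described above can be sketched in base R as follows. The data are simulated and every name (dep_pre, sub_pre, delta_dep) is hypothetical; the coefficients describe association, not causation.

```r
# Hypothetical simulated two-wave data (illustration only)
set.seed(4)
n <- 100
dep_pre  <- rgamma(n, shape = 2, scale = 5)   # baseline depression
sub_pre  <- rgamma(n, shape = 2, scale = 3)   # baseline substance use
dep_post <- 0.7 * dep_pre + 0.3 * sub_pre + rnorm(n, sd = 3)

d <- data.frame(dep_pre, sub_pre, dep_post,
                delta_dep = dep_post - dep_pre)

# Change in depression regressed on baseline substance use,
# adjusting for baseline depression
fit_change <- lm(delta_dep ~ sub_pre + dep_pre, data = d)
print(coef(summary(fit_change)))
print(confint(fit_change))

# Equivalent parameterization: the post score as the outcome. The sub_pre
# coefficient is identical; only the dep_pre coefficient shifts by exactly 1.
fit_post <- lm(dep_post ~ sub_pre + dep_pre, data = d)
print(coef(fit_post))
```

Because the two parameterizations give the same sub_pre estimate, the choice between them is one of presentation once baseline depression is in the model.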

Other practical points
- Always inspect distributions and residuals; consider transformations if appropriate.
- Correct for multiple testing if running separate tests (e.g., Holm or FDR).
- Use effect sizes and confidence intervals, not just p-values.
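- Missing data: likelihood-based mixed models remain valid when data are missing at random (MAR). Standard GEE requires the stronger missing-completely-at-random (MCAR) assumption unless it is combined with inverse-probability weighting or multiple imputation. Multiple imputation itself also assumes MAR, so if data may be missing not at random (MNAR), run sensitivity analyses (e.g., pattern-mixture or selection models). The paired Wilcoxon test uses complete pairs only.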
- If sample size is small and assumptions are dubious, consider permutation tests or bootstrap CIs for paired differences.
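
Several of these points can be sketched together in base R, assuming simulated data with hypothetical outcome names: Holm and FDR adjustment across the three tests, percentile bootstrap CIs for the median paired difference, and a sign-flip permutation test for one outcome.

```r
# Hypothetical simulated pre/post scores for three outcomes (illustration only)
set.seed(5)
n <- 30
outcomes <- c("depression", "anxiety", "substance_use")
pre   <- matrix(rexp(n * 3, rate = 1 / 10), nrow = n,
                dimnames = list(NULL, outcomes))
post  <- pre * exp(matrix(rnorm(n * 3, mean = -0.15, sd = 0.4), nrow = n))
diffs <- post - pre            # paired differences, one column per outcome

# One Wilcoxon signed-rank test per outcome, then multiplicity adjustment
p_raw  <- apply(diffs, 2, function(d) wilcox.test(d)$p.value)
p_holm <- p.adjust(p_raw, method = "holm")
p_fdr  <- p.adjust(p_raw, method = "BH")
print(round(rbind(raw = p_raw, holm = p_holm, fdr = p_fdr), 4))

# Percentile bootstrap CI for the median paired difference
boot_ci <- function(d, B = 2000) {
  meds <- replicate(B, median(sample(d, replace = TRUE)))
  quantile(meds, c(0.025, 0.975))
}
ci <- apply(diffs, 2, boot_ci)   # 2 x 3 matrix: lower and upper bounds
print(round(ci, 2))

# Sign-flip permutation test for the mean paired difference: under the null
# of no change, each difference is equally likely to be positive or negative
sign_flip_p <- function(d, B = 5000) {
  obs  <- abs(mean(d))
  perm <- replicate(B, abs(mean(d * sample(c(-1, 1), length(d), replace = TRUE))))
  (sum(perm >= obs) + 1) / (B + 1)
}
p_perm <- sign_flip_p(diffs[, "depression"])
print(p_perm)
```

The bootstrap resamples whole paired differences, which keeps the within-subject pairing intact.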

Suggested workflow example
1. Plot pre/post distributions and paired differences for each measure.
2. If only two time points and no need to model interdependence: run Wilcoxon signed-rank for each variable.
3. If modeling jointly/adjusting covariates: fit a mixed model (outcome ~ time + covariates + (1|subject)). If you have 3 outcomes and want a joint test, fit a multivariate mixed model or a long-format mixed model with an outcome*time interaction.
4. If interested in reciprocal influences and you have ≥3 waves: specify a cross-lagged SEM or latent change model.

If you tell me:
- how many time points you have,
- sample size,
- whether outcomes are scale scores (and ranges),
- whether you want joint tests or causal directionality,

I can give a tailored analysis plan and example code (R).