Replies: 13 comments
-
Thanks for laying all this out, Kevin.
At Lilly, we routinely churn out non-informative-prior MMRMs for phase 2 chronic pain and neurodegeneration studies. The model is very often the primary analysis, despite the absence of informative priors. Even without informative priors, Bayesian models help us make probabilistic statements about the effect size and about transformed parameters. From my perspective, I foresee strong and enthusiastic uptake.
Absolutely! And I think we can condition on our use case to simplify prior specification relative to what
Currently, for other projects, I get around this with a no-intercept cell-means parameterization as in #4 (comment), with non-cell-means covariates centered to preserve the reference level. That's another reason I think it would add value to have both an informative-prior model and a non-informative-prior model. In the former case, our fixed effects parameterization may be restricted to something people may not be accustomed to seeing. But in the latter case, we can use the usual treatment effect parameterization of
For a
Which would be a game-changer.
For
I was hoping to go with an LKJ prior to avoid the biases of the inverse-Wishart (IW), but I admit the IW does have a "scale matrix" that could be useful for historical borrowing. Would it be useful to borrow information about the distribution of the residuals? I am not sure, but I think it would be worth having this discussion at a philosophical level first.
Joint priors for multiple parameters with nonzero correlation among scalar components?
That's an important use case. So then in the absence of the bijection from (4), we might have at least 3 different models:
-
More on these comments:
If we need different fixed effect parameterizations for different kinds of informative prior settings, then I agree it is important to identify the kinds of informative priors / borrowing we care about the most. At my company, there has been the most interest in placebo/control borrowing, which focuses on the marginal means. We have also talked about treatment effect borrowing internally, and there is definitely interest, but there is also a lot more risk and skepticism.
-
@wlandau let's stew a bit more on parameterization-independent informative priors. I feel this would be really worth spending some time on; everything else will be very opinionated and could be difficult to maintain. I am quite confident that this should be possible (again, looking at it more from a GP/functional-data perspective, we just want to specify functional priors on a grid of visits). This could also be extended to the synthesis of data at different timepoints, which would be difficult with parameterization-specific priors. Happy to discuss this more in a call. I also need to better understand your centering approach. It sounds like a way of integrating out covariates, which is exactly what we need (or we define marginal-mean priors for typical reference sets, as in emmeans).
-
Re: non-informative: I am sure people do it, but to me there is no direct benefit in exchanging a REML MMRM for a non-informative Bayesian one; I could just as well do a parametric bootstrap from my REML fit and interpret that as a non-informative posterior sample. Given the complexities of implementing Bayesian analyses, this is rather a nice-to-have from a philosophical-consistency perspective (and of course we need the non-informative model for generating MAP priors anyhow). Something that is difficult to justify (not to do!) in a frequentist framework is regularization (e.g. using MAP priors), which can stabilize estimates, especially in earlier phases. One could of course use a meta-analytic approach altogether (Bayesian or frequentist), but that has operational disadvantages (a complex model with potential fit problems at the end of the trial, availability of data, etc.). It seems a lot more robust to derive the MAP prior up front and then fit an informative Bayesian model. Hence my strong interest in how informative priors fit in the mid-term scope.
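The stabilizing effect of a MAP-style prior can be illustrated with a toy conjugate normal-normal update. This is a pure-Python sketch with hypothetical numbers, not package code, and real MAP priors are typically mixture distributions rather than a single normal component:

```python
# Toy illustration (not package code): a conjugate normal-normal update showing
# how a MAP-style prior with mean m and variance v regularizes a noisy arm
# estimate ybar (sampling variance s2 / n). Real MAP priors are usually
# mixtures; this collapses the idea to a single normal component.
def posterior_mean_var(m, v, ybar, s2, n):
    """Precision-weighted average of prior mean and sample mean."""
    prior_precision = 1.0 / v
    data_precision = n / s2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * m + data_precision * ybar)
    return post_mean, post_var

# Hypothetical numbers: prior centered at 0, small phase 2 arm observes 1.5.
post_mean, post_var = posterior_mean_var(m=0.0, v=1.0, ybar=1.5, s2=4.0, n=20)
# The estimate is shrunk from 1.5 toward the prior mean 0, and the posterior
# variance is smaller than either the prior or the sampling variance alone.
```

The point is only that the prior acts as a transparent, pre-specified regularizer, which is exactly what is hard to justify without a Bayesian model.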
-
Thanks, both. This is an interesting and very important discussion. It also seems key to me how to handle the incorporation of prior knowledge, as this ultimately justifies the package and will likely define its uptake and success. Some more thoughts: I can well imagine that it is possible to identify a few "target" MMRM specifications. A perspective (perhaps a starting point) that may also be helpful when thinking about informative priors in a Bayesian MMRM package is to consider the different types of prior evidence that users might want to incorporate, and their respective practical relevance. This includes the potential synthesis that could be required when the evidence comes from multiple sources. Very broadly, I could think of
and different combinations thereof. (More as a side note here: one complexity that came to my mind is that even in the comfortable situation where IPD from multiple trials are available in-house, the visits in these trials may be structured differently. This also calls for a more "marginal" view of the prior information.)

While I like this "prior evidence" perspective, since it starts from the need for a Bayesian MMRM package and the gaps it could fill in practice, it is perhaps not smart to prioritize it from a software-development point of view now. We do not yet seem to be in a situation where we regularly run Bayesian MMRMs (with or without informative priors) using different, still somewhat insufficient solutions and therefore need a new package, and thus would already have a good understanding of the scope and the prior evidence we would like to consider. This might be slightly different across companies, though. It seems we may want to refine and adapt the scope/functionality later. There may thus be merit in a rather flexible, generic implementation first.

Moving forward with package development (where you both are certainly more experienced), it would seem natural to me to first address the noninformative-prior case for a core set of MMRM variants (as we do, in my understanding), and then to consider the use of informative priors on all (or almost all) coefficients of such models, more or less ignoring how they could sensibly be derived. This may then be a flexible, generic and, in my view, already very valuable and useful implementation that also serves as a good "proof of principle". If there is a very clear understanding of a (frequently occurring) use case, we could of course immediately tailor the implementation accordingly, if needed.
However, generally, as a later or (perhaps better) parallel workstream, I would also like us to systematically think about the different types of prior evidence (data structures, relevance, complexity of prior derivation) first and then, based on priority, try to bring this together with the existing "basic" implementation, if feasible, or to develop that implementation further.
-
Thanks for your thoughts, Christian! I agree that there is value in finishing an implementation of the current
I think these are great scenarios to consider for MMRMs. It might also be worth distinguishing between a master-protocol-like scenario where different studies are very similar, versus a pediatric-trial-like scenario where the borrowed data is not likely to be exchangeable with the current data (e.g. borrowing data from adults to analyze pediatric data).
-
Yes, I am interested and eager to linger on this problem. It seems to be a major crux, and if we solve it, we will immediately be able to accommodate the scenarios @chstock mentioned through prior specification alone. The more I think about it, the more I think we can tackle parameterization-independent informative priors in a custom Stan model using the approach I proposed in #4 (comment). To expand on this proposal, consider a simplified model of the form:
`y ~ normal(beta0 + beta1 * x, 1)`

where `y` is the response and `x` is a binary treatment indicator. In Stan, the data and parameters are:

```stan
data {
  int<lower=0> n;
  array[n] real y;
  array[n] int<lower=0,upper=1> x;
}
parameters {
  real beta0;
  real beta1;
}
```

The marginal mean of each treatment group is a transformed parameter: `marginal_mean_placebo = beta0` for the placebo arm and `marginal_mean_treatment = beta0 + beta1` for the treatment arm.
We want to put informative priors on the interpretable marginal group means instead of on the non-interpretable parameters `beta0` and `beta1`:

```stan
model {
  y ~ normal(beta0 + beta1 * x, 1);
  marginal_mean_placebo ~ normal(1.27, 5.33);
  marginal_mean_treatment ~ normal(0.33, 8.12);
}
```

Only two things are missing from this simplified model:
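One consequence worth noting: independent priors on the two marginal means induce a *correlated* prior on the coefficients, because `beta0` equals the placebo mean while `beta1` is the difference of the two means. A pure-Python Monte Carlo sketch (illustrative only, reusing the hypothetical prior numbers from the Stan snippet):

```python
# Sketch (pure Python, illustrative): draw from independent priors on the two
# marginal means and map back to (beta0, beta1). Because the map is linear
# with constant Jacobian, transforming the draws gives the induced prior.
import random

random.seed(42)
n_draws = 100_000
beta0_draws = []
beta1_draws = []
for _ in range(n_draws):
    mu_placebo = random.gauss(1.27, 5.33)     # prior sds from the Stan sketch
    mu_treatment = random.gauss(0.33, 8.12)
    beta0_draws.append(mu_placebo)            # beta0 = mu_placebo
    beta1_draws.append(mu_treatment - mu_placebo)  # beta1 = difference

mean0 = sum(beta0_draws) / n_draws
mean1 = sum(beta1_draws) / n_draws
cov01 = sum(
    (b0 - mean0) * (b1 - mean1) for b0, b1 in zip(beta0_draws, beta1_draws)
) / n_draws
# In theory cov(beta0, beta1) = -Var(mu_placebo) = -5.33^2, so cov01 < 0.
```

This is part of why prior specification is so much easier on the marginal-means scale than on the coefficient scale.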
-
Is your proposed functional prior approach the same as what I sketched above? If it is, then it seems like the idea of arbitrary functions for transformations of variables might eventually get us to the point where we can consider non-linear link functions and tackle GLMMs.
-
I do not yet fully understand the specific algorithm, but here is how the centering approach works:

```r
library(tibble)
data <- tibble(
  response = seq(8, 3),
  group = rep(c("drug", "placebo"), each = 3),
  age = c(40, 50, 60, 40, 50, 60)
)
data
#> # A tibble: 6 × 3
#>   response group     age
#>      <dbl> <chr>   <dbl>
#> 1        8 drug       40
#> 2        7 drug       50
#> 3        6 drug       60
#> 4        5 placebo    40
#> 5        4 placebo    50
#> 6        3 placebo    60
```

A simple cell-means model gives coefficients that agree with the observed group means.

```r
coef(lm(response ~ 0 + group, data = data))
#>    groupdrug groupplacebo
#>            7            4
```

But if we naively adjust for age, we throw off the reference level, and the other model coefficients are no longer interpretable as group means.

```r
coef(lm(response ~ 0 + group + age, data = data))
#>    groupdrug groupplacebo          age
#>         12.0          9.0         -0.1
```

A simple solution is to subtract the mean:

```r
data$age <- data$age - mean(data$age)
coef(lm(response ~ 0 + group + age, data = data))
#>    groupdrug groupplacebo          age
#>          7.0          4.0         -0.1
```

I use the same technique on categorical covariates: first I turn them into sets of binary columns to represent the absence or presence of a level, then center those binary columns. The result is analogous. Does that make sense?
-
Insights from @chstock on this today:
-
I think this thread belongs as a discussion rather than an issue. I will convert it.
-
And it would be great to move discussions of the prior specification interface to #4. |
-
I think we are aligned on scope now.
-
I figured it might be good to align on scoping asynchronously. Here are a few thoughts; curious to hear what you think.
For `~ AVISIT*TRT01P` models (fully saturated, without covariates), there is a linear bijection between the coefficients and the marginal means; this can be used to pull back a prior on the marginal means to the parameter scale. Sensible priors will be correlated on the parameter scale, even if they are not on the marginal-means scale. Some degree of correlation on the marginal-means scale is, however, likely as well. Such correlated joint priors are not directly available in `brms`, but they are possible via the `stanvar` argument to `brm()`.

In a nutshell, the key problem to me seems to be enabling easy and transparent prior specification in a number of different borrowing use cases (TBD). This would probably only be feasible to implement when restricting the number of supported models to an absolute minimum. For anything out of scope, working directly with `brms` or `rstan` is still an option.
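The pull-back through the linear bijection can be written down explicitly. Here is a pure-Python sketch for a hypothetical single-visit, two-arm case (all numbers illustrative, not from any trial):

```python
# Sketch of "pulling back" a marginal-means prior through a linear bijection
# mu = A beta, where beta = (intercept, treatment effect). Illustrative only.
def matmul(P, Q):
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]

def matvec(P, v):
    return [sum(P[i][k] * v[k] for k in range(len(v))) for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

A = [[1.0, 0.0],       # mu_placebo   = beta0
     [1.0, 1.0]]       # mu_treatment = beta0 + beta1
A_inv = [[1.0, 0.0],   # inverse of A, written out by hand for this 2x2 case
         [-1.0, 1.0]]

prior_mean_mu = [1.27, 0.33]          # hypothetical prior on the marginal means
prior_cov_mu = [[5.33 ** 2, 0.0],     # independent components: diagonal cov
                [0.0, 8.12 ** 2]]

# If mu ~ N(m, S) and mu = A beta, then beta ~ N(A^-1 m, A^-1 S A^-T).
prior_mean_beta = matvec(A_inv, prior_mean_mu)
prior_cov_beta = matmul(matmul(A_inv, prior_cov_mu), transpose(A_inv))
# The off-diagonal of prior_cov_beta is -Var(mu_placebo), so the pulled-back
# prior on the coefficients is correlated even though the prior on mu was not.
```

For a saturated `AVISIT*TRT01P` model the same algebra applies, just with a larger `A` whose rows map visit-by-arm cell means to coefficients.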
Once scoping etc. is clear, we could think about hard-coding the models to get rid of the brms dependency and facilitate parallel processing.