Power Analysis by Data Simulation in R - Part IV

Click HERE to download the .Rmd file

This blog is also available on R-Bloggers

  • Power Analysis by simulation in R for really any design - Part IV

    This is the final part of my Power Analysis in R series, focusing on (Generalized) Linear Mixed Effects Models. I assume that, at this point, you already read Parts I to III of this tutorial, if you haven’t I suggest doing so before working through this. As I already mentioned in previous parts, there are good available options for Mixed Model power analysis already, an overview of which can be found in the SOP document of the D2P2 lab. This tutorial however, is meant to give you a deeper understanding of what you are doing when you simulate data and do power analysis by simulation. I hope that the last 3 parts already helped you in achieving this and that this final part will help to generalize this knowledge to mixed-effects models. The good news is that we will actually not be doing a lot of new stuff, the novel part about mixed-effects models compared to the models that we simulated in part III is mostly that we need to use the same methods and add some new things to them (after all this tutorial is about teaching a general skill so it’s good that we can build up on what we learned in the first three parts). Still, there is a lot of ground to cover so we better get started right away.

    Mixed-effects models are extremely powerful in that they are incredibly flexible to the data-structure that they can work with, most notably that they:

    1. They are able to handle between-subject and within-subject effects in the same model along with numeric predictors (covariates).
    2. They allow for non-independent observations, i.e. observations from the same unit of measurement (e.g. a participant, a school, a country etc.) and even from multiple nested or crossed (see infobox below) units of measurements simultaneously.

    A few words on nested vs. crossed random effects

    As mixed-effects model literature has unfortunately developed to use different terms to mean different things, I just want to spend a few words on what I mean by nested as opposed to crossed random effects.

    A data structure is nested when observations are organized in one or more clusters. The classic example of nested data is that we observe the performance of school-children over time. Because we collect data throughout multiple years, we have multiple observations per child, so we can say that observations are nested within children. We also observe multiple children per class, so children are nested within classes. If we involve not one but multiple schools in our study, then we have children nested in classes nested in schools. If we have schools from multiple countries we have …. well, you get the idea. It is important to realize that nested data is defined by “belong” and “contain” relationships. The data is fully nested if (per reasonable assumption in this case) each child only belongs to one specific class, each class only belongs to one school etc. This point is sometimes confusing as we might, in our data-frame, have multiple classes from different schools that have the same ID in the data-set, for instance there might be a class called 9A in schools 1 and 2. However, it is important to realize that these classes are not the same but different classes in reality. They contain different students and belong to a different school. Therefore the data is nested.

    In crossed, or cross-classified data, each observation does also belong to not only 1, but multiple nesting levels. However, the as opposed to fully nested data, the nesting levels are not bounded to each other by a “contain” or “belong” relationship as was the case in the example above where classes contained students and belonged to schools. A classic example of cross-classified data is experimental data in which, for instance, people had to press a button quickly whenever pictures appear on a computer screen. Imagine that we recruit 100 participants and each sees the same 20 pictures 5 times. In this case we have 100x20x5 = 10,000 observations. Imagine we would identify these observations by their row-number (i.e. 1 … 10,000). Now each of these observations “belongs” to a participant, and “belongs” to a specific stimulus. For instance, assuming that the data is ordered ascending by participant and stimulus, the first observation (observation 1) belongs to participant 1 and stimulus 1, the last observation (observation 10,000) belongs to participant 100 and stimulus 20. As you might already realize, in this case it is not the case that a participant either contains or belongs to a specific stimulus and vice versa: Each participant sees each stimulus, or framed differently, each stimulus presents itself to each participant - the two nesting levels are independent from each other, they are cross-classified.

Building up mixed-effects models

We will approach the simulation of mixed-effects models in a bottom-up approach. We will first start simple with a model that has only a single effect and nesting level and then explore different situations in which there are more fixed and random effects and both. Throughout this tutorial, we will use a new hypothetical research question focused on music preference. The overarching research goal will be to - once and for all - find out whether Rock or Pop music is better. Of course, we could just ask people what they prefer, but we do not want to do that because we want a more objective measure of what is Rock and Pop (people might have different ideas about the genres). Therefore, we will have participants listen to a bunch of different songs that are either from a Spotify “best-of-pop” or “best-of-rock” playlist and have them rate the song on a evaluation scale from 0-100 points. We will start this of really simple and make this design more complex along the way to demonstrate the different aspects that we need to consider when simulating mixed-model data.

Simulating a Within-Subject Effect

As a first demonstration, we will simulate a MEM with only the within-subject effect. We also will now write a function that allows us to, quite flexibly, simulate various data structures. No worries, it will make use of things that we already introduced in earlier parts of this tutorial (e.g. rnorm and mvrnorm) so you will hopefully be able to understand what is going on just fine.

To investigate the question stated above, we can use the following design:

  • DV : liking of Song (0-100)
  • IV (within-subject) : Genre of Song (Rock/Pop)

Note that we made the design choice that genre will be a within-subject effect. In theory, it could also be a between-subject design where a person listens to only 1 genre, pop or rock, but a within-subject design has the advantage that general liking of music across conditions (e.g. person 1 might like both genres more than person 2) does not affect the results (as much) as it would in a between-subject design.

I say “as much” because it could still affect the results even in a within subject design if, for instance, a person that likes music generally more will also show fewer difference between conditions compared to a person that dislikes music more generally. In that case, the overall score is related to the difference score which can impact our inference.

The first thing that we need to do is not entirely new for us at this point. We will use the rnorm function to generate the scores for the 2 conditions. However, we will now follow a slightly different approach where we will not create the 2 groups’ scores directly (i.e. use rnorm() for each group separately), but will instead only generate the design matrix of the data, i.e. the structure of the data without having any actual scores for the dependent variable in there. Eventually we will then add the DV in a second step by calculating it’s value for each row in the data-set based on the values for the other variables in that row. However, before we will use the “old” approach one last time because it allows us to explore some important insights.

Generating a design-matrix with 1 within-subject factor

As this is the first time we actually write a custom function in this tutorial, I will spend some time to explain. The following function generate_design will allow us to pass any number of subjects that we would like it to generate. In the function header, we pass the number of participants n_participants and the number of genres n_genres that our design should have. Moreover it allows us to specify the number of songs per genre n_songs which at this part does not do anything in the function (we will add this functionality in a minute). The function then calls expand.grid, a built-in R-function, that will do nothing but create one row in our data-frame for each possible combination of participant-number and genre. The final line of the function returns this created design matrix whenever we call on the function.

generate_design <- function(n_participants, n_genres, n_songs){
  
  design_matrix <- expand.grid(participant = 1:n_participants, genre = 1:n_genres) # here we create the data-frame
  design_matrix$genre <- ifelse(design_matrix$genre ==1, "rock", "pop") # we rename the genres so its easier to follow (this is not really necessary)
  return(design_matrix) # return the data-frame
}

Lets start slow and first create a data-frame that represents 10 people listening to one song from each genre. Thus, when we have 10 people and 2 genres with 1 song each, there should be 20 rows in our data-frame. Let’s see

data1 <- generate_design(n_participants = 10, n_genres = 2, n_songs = 1)
paged_table(data1)

Nice! Indeed our data has 20 rows, and each participant listens to rock and pop once - exactly what we wanted. Now, as said, we would like to have each person listen to more songs - lets say 20 songs per genre. The first instinct here could be to add n_songs to the function just as we did add n_genre. However, this does not work as songs are unique per genre. That is, while each participant listens to both genres, each song only belongs to one genre. For instance the specific song “Blank Space by Taylor Swift” only belongs to the “pop” but not the “rock” condition. Thus, we need to make sure that the songs will have unique identifiers, rather than numbers that will appear in both conditions. Luckily we can easily fix this:

generate_design <- function(n_participants, n_genres, n_songs){
  
  
  design_matrix <- expand.grid(participant = 1:n_participants, genre = 1:n_genres, song = 1:n_songs) # adding song to expand.grid
  design_matrix$genre <- ifelse(design_matrix$genre ==1, "rock", "pop")
  design_matrix$song <- paste0(design_matrix$genre, "_", design_matrix$song) 
  return(design_matrix)
}

As you can see, we just added a slight modification to the function. We added the song variable to expand.grid and in line 3 of the function we rename the song variable using paste0, a function that will just concatenate strings together. We use this to create song-names that now include the genre name. This way, song 1 in the rock-genre will have a different name from song 1 in the pop-genre, telling our model that these are in fact unique songs that cannot be interchanged. Note that this also means that we will pass the number of songs per genre instead of the total number of songs. Lets try this out. If our code functions correctly, we should have 10 participants x2 genres x20 songs per genre = 400 rows in our data frame.

data2 <- generate_design(n_participants = 10, n_genres = 2, n_songs = 20) # note that we pass the number of songs per GENRE
data2 <- data2[order(data2$participant),] # order the data by participant number so its easier to see how its structured
paged_table(data2)

Indeed, we get 400 rows and each participant listens to 20 rock songs and 20 pop songs, that all have unique names - neat! Now that we have the design-simulation out of the way, let us give scores to the songs. We first will do this with only 1 song per person per genre (that is the same across participants) to keep the model simple first.

Revisiting the old simulation approach once more

As mentioned our DV is the liking of songs on a 100-point scale.

Let us assume that, in the population, both genres are liked equally and that people also in general rather like music. We are pretty sure that people like music compared to disliking it, so we are pretty sure the population mean will not be smaller than 50 and we can be 100% sure it will not be larger than 100. Using the 2SD rule from earlier parts of the tutorial as a standard to put our assumed means into numbers we would therefore use a normal distribution with a mean 0f 75 and an SD of 7.5 for both groups. As we are working with a within-subject design, we would also assume the two distributions to be correlated in some way. For example, we could assume that the scores are positively correlated, meaning that a person with a relatively higher liking for rock, will also, in general, like pop music more and vice versa.

As a side note: We are working with a bounded scale here, where scores cannot be smaller than 0 or larger than 100. As normal distributions are actually unbounded, they do not perfectly represent this situation, and we might get scores larger than the boundaries. There are different ways we could deal with this.

First, we could specify the boundaries by censoring the normal distribution. This means we would set each score falls outside the scores to either the minimum or maximum of the scale. This will result in scores that only fall inside of the specified range, but will result in peaks at both ends of the scale, as we are keeping the scores.

We could also truncate the distribution, i.e. throw away all scores that do not fall inside our specified range. This will not result in peaks at the scale boundaries, but will however, result in data loss, that we would either have to accept or compensate for by resampling until we get scores inside the range. Both approaches have there advantages and disadvantages, and the question which one we should use boils down to what we believe will happen in our study: Will people overproportionally give very low or high ratings (i.e. people really HATING or LOVING songs)?

In that case we would expect peaks at the boundaries and might prefer censoring the normal distribution as it might be closer to what we will observe in our study. If we do not expect that and mainly care about keeping the scores inside the range, we could instead use truncation. Here, we will just ignore this problem for now (which is which is not that problematic after all as the underlying process that generates the data is the same normal distribution in all cases).

Before, we simulated a situation like this by specifying the group means that we would expect and the correlation between the group means that we expect using the mvrnorm function (if you do not know exactly know what this code means consider revisiting part II of this tutorial):

require(MASS) # load MASS package
## Loading required package: MASS
group_means <- c(rock = 75,pop = 75) # define means of both genres in a vector
rock_sd <- 7.5 # define sd of rock music
pop_sd <- 7.5 # define sd of pop music
correlation <- 0.2 # define their correlation

sigma <- matrix(c(rock_sd^2, rock_sd*pop_sd*correlation, rock_sd*pop_sd*correlation, pop_sd^2), ncol = 2) # define variance-covariance matrix

set.seed(1)
bivnorm <- data.frame(mvrnorm(nrow(data1)/2, group_means, sigma)) # simulate bivariate normal (we use nrow(data1)/2, the number of rows from the data-set above to simulate 10 observations per group)
par(mfrow=c(1,2))
hist(bivnorm$rock, main = "liking of rock music", xlab = "")
hist(bivnorm$pop, main = "liking of pop music", xlab = "")

The above histograms show how our simulation with 10 participants would turn out. As we would expect, with 10 participants the histograms for each group look quite wacky.

Previously, we analyzed these data with a paired t-test, for example:

t.test(bivnorm$rock, bivnorm$pop, paired = T)
## 
##  Paired t-test
## 
## data:  bivnorm$rock and bivnorm$pop
## t = -0.73577, df = 9, p-value = 0.4806
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.618983  4.897481
## sample estimates:
## mean of the differences 
##               -2.360751

showing us that there is no difference between the groups, just as we would expect. We could of course also model the same data in a mixed-effects model (we have seen in part II and III of this tutorial how t-tests can be rewritten as linear models).

So lets try this, here comes our first mixed model simulation! However, let us first increase the sample-size to very high so I can demonstrate some important things that we do not want to get distorted by sampling variation in a small sample-size by now. So instead of 10 people let us simulate the same data but with 10,000 people.

data1 <- generate_design(10000, 2, 1)

group_means <- c(rock = 75,pop = 75) # define means of both genres in a vector
rock_sd <- 7.5 # define sd of rock music
pop_sd <- 7.5 # define sd of pop music
correlation <- 0.2 # define their correlation

sigma <- matrix(c(rock_sd^2, rock_sd*pop_sd*correlation, rock_sd*pop_sd*correlation, pop_sd^2), ncol = 2) # define variance-covariance matrix

set.seed(123)
bivnorm <- data.frame(mvrnorm(nrow(data1)/2, group_means, sigma)) # simulate bivariate normal (we use nrow(data1)/2, the number of rows from the data-set above to simulate 10 observations per group)
par(mfrow=c(1,2))
hist(bivnorm$rock, main = "liking of rock music", xlab = "")
hist(bivnorm$pop, main = "liking of pop music", xlab = "")

We can already see that the histograms look much more normal now. So let us redo the t-test.

t.test(bivnorm$rock, bivnorm$pop, paired = T) 
## 
##  Paired t-test
## 
## data:  bivnorm$rock and bivnorm$pop
## t = 0.90922, df = 9999, p-value = 0.3633
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.09986109  0.27264389
## sample estimates:
## mean of the differences 
##               0.0863914

We can immediately see that the difference between groups got smaller, and the confidence interval is more narrow, showing us that we have a much more accurate simulation now. So now to the mixed model. First we have to restructure the above data as lme4::lmer uses long-format data, while the bivnorm data that we have has columns for the observations.

bivnorm_dat <- data.frame(cbind(liking = c(bivnorm$rock, bivnorm$pop), genre = c(rep("rock", (nrow(data1)/2)), rep("pop", (nrow(data1)/2))), participant = rep(1:(nrow(data1)/2), 2))) # this just converts bivnorm to long format and adds the genre variable
bivnorm_dat$liking <- as.numeric(bivnorm_dat$liking) # variable was converted to character for some reason in cbind so lets make it numerical again
bivnorm_dat$genre_f <- factor(bivnorm_dat$genre) # make genre a factor

lmer_bivnorm <- lmer(liking ~ genre_f + (1 | participant), bivnorm_dat)
summary(lmer_bivnorm)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: liking ~ genre_f + (1 | participant)
##    Data: bivnorm_dat
## 
## REML criterion at convergence: 136955.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.3987 -0.6185  0.0035  0.6294  3.8687 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  participant (Intercept) 11.09    3.330   
##  Residual                45.14    6.719   
## Number of obs: 20000, groups:  participant, 10000
## 
## Fixed effects:
##               Estimate Std. Error         df  t value Pr(>|t|)    
## (Intercept)   74.98622    0.05802 9999.00005 1292.519   <2e-16 ***
## genre_f1      -0.04320    0.04751 9999.00002   -0.909    0.363    
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr)
## genre_f1 0.000 
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

First we get a warning about the fit being “singular”. Second, we see that, even though the absolute covariance is just as large this time but negative (-11.25), the model tells us that the covariance of participants is 0!! Can that be right? The correlation is indeed -0.1971886 so using the formula to infer the covariance from the group SDs in the data we would expect

cor(subset(bivnorm_dat_neg, genre == "rock")$liking, subset(bivnorm_dat_neg, genre == "pop")$liking)
*sd(subset(bivnorm_dat_neg, genre == "rock")$liking)
*sd(subset(bivnorm_dat_neg, genre == "pop")$liking)
== 0

It is, however, -11.0874483, which is certainly not 0! However, take a moment to reflect on how a variance is defined as the squared standard-deviation, i.e. \(var = sd^2\). By definition, a squared number cannot be negative, and therefore the variance can also never be negative. Mixed models have a built-in restriction that honors this notion. If a covariance is negative, the mixed model gives us the closest number that it assumes reasonable for a random effect variance, namely 0 which is just an SD of 0 squared, and will tell us that the model is “singular” i.e. the variance component cannot be estimated.

Anova procedures do not “suffer” from this problem, as they use a sum-of-squares approach, where the sum-of-squares are always positive. This means that in cases where the observed correlation between the observations is negative, Anovas and mixed models will yield different results:

ez_an <- ez::ezANOVA(bivnorm_dat_neg, dv = .(liking), wid = .(participant), within = .(genre_f), detailed = T)

print(ez_an)
## $ANOVA
##        Effect DFn  DFd              SSn      SSd                F         p
## 1 (Intercept)   1 9999 112629624.416931 451366.6 2495053.03358603 0.0000000
## 2     genre_f   1 9999         3.796855 673093.4       0.05640339 0.8122785
##   p<.05            ges
## 1     * 0.990114992309
## 2       0.000003376591

Which one is the true result though? Both and neither, it is just a design choice that package developers made. As Jake Westfall put it better than I ever could for models that have these negative covariances:

“The best-fitting model is one that implies that several of the random variance components are negative. But lmer() (and most other mixed model programs) constrains the estimates of variance components to be non-negative. This is generally considered a sensible constraint, since variances can of course never truly be negative. However, a consequence of this constraint is that mixed models are unable to accurately represent datasets that feature negative intraclass correlations, that is, datasets where observations from the same cluster are less (rather than more) similar on average than observations drawn randomly from the dataset, and consequently where within-cluster variance substantially exceeds between-cluster variance. Such datasets are perfectly reasonable datasets that one will occasionally come across in the real world (or accidentally simulate!), but they cannot be sensibly described by a variance-components model, because they imply negative variance components. They can however be”non-sensibly" described by such models, if the software will allow it. aov() allows it. lmer() does not".

A second, very important point to mention is that lme4 actually scales the random effect terms by the residual variance when estimating them. What this means is that, as far as the fitting algorithm is concerned, a model with a random intercept SD of 10 and a residual SD of 20 is exactly identical to a model where the random intercept SD would be 100 and the residual would be 200 or a model where the random intercept SD would be 1 and the residual SD would be 2, at least regarding the random effect structures. All that the model cares about is that the random intercept is 0.5 times the residual SD. What this means for us is that when specifying the random effect structure, we do not actually need to keep the scale of the response in mind if that is difficult to do. In my experience, in my own research where I work a lot with rating scales and/or binary decisions, keeping estimates on the original scale often helps me figuring out what effect sizes I expect but I can imagine that there are many situations in which this is not so easy. Thus, it is good to know that in those situations you can just think about the residual SD first and then derive how much smaller / bigger you expect the random effects to be relative to the residual.

Back on track: Simulating a DV based on the data-frame row entries

I hope you forgive this rather long detour, but I felt that it was important to clarify how simulations for mixed models relate to simulations for simpler models. In the beginning of this tutorial, I mentioned that we would now use a simulation approach in which we would calculate each value for the DV directly from the other values in the data-frame for each row (i.e. directly from the design matrix). As we want to build more complicated models now, we have to leave the direct group mean simulation approach from before behind and move on to the regression weight simulation approach that I will now describe. It involves writing up the simulation in terms of a linear model notation once more (see part II and part III of this tutorial). This is not entirely new as we used linear models to simulate ANOVA designs before in part III, but what is new is that now we are not simulating groups directly before and use dummy variables for each factor but rather use a pre-baked dataset that entails all information for us to calculate the DV for each row of the dataset.

If we write the above model down in a linear model notation we get:

\[ y_i = \beta_0 + \beta_1genre_i + \epsilon_i \]

In the regression weight simulation approach, we will still start of with thinking about the (group) means. However, we will simulate each observation of the dependent variable \(y_i\) by directly using a regression equation describing the regression model that produces our expected means. Thus, for each row, the value for \(y_i\) (i.e. the liking of the particular song for the particular genre for a particular participant) will be the sum specified above.

The problem that we need to solve is that we need to figure out what \(beta_0\) and \(beta_1\) are to be able to calculate \(y_i\).

In this model, \(\beta_0\) represents the grand mean (mean of all expected group means), which is \(\mu_{grand} = (\mu_{rock}+\mu_{pop})/2 \Rightarrow \mu_{grand} = (75+75)/2 \Rightarrow \beta_0 = 75\)

As we use sum-to-zero coding for factors again (see part III of the tutorial), \(\beta_1\) represents the deviation from \(beta_0\) depending on the respective group. We start by making genre a factor variable.

data1$genre_f <- factor(data1$genre) # make a new variable for the factor
contrasts(data1$genre_f)
##      [,1]
## pop     1
## rock   -1

We see that when genre_f in the data is equal rock, it will assume the value -1 in our linear model, while it will assume the value 1 for pop.

If we put this information in the model we can calculate the \(beta_1\) by filling in one of the group means, for instance \(\mu_{rock}\): For instance, we can calculate \(\mu_{rock} = \beta_0 + \beta_1*genre_i\) We specified that \(\mu_{rock} = 75\), and we just calculated that \(\beta_0 = 75\). Moreover, we know that \(genre_i\) in this case is rock, so in the regression equation it will become -1:

\(75 = 75 + \beta_1*-1\). Solving this equation to \(beta_1\) yields \(0 =\beta_1*-1 \Rightarrow \beta_1 = 0\). This is unsurprisingly representing our initial wish that there should be no difference between group means.

The remaining part of the equation, \(\epsilon\), is the error that we specified above as sigma in mvrnorm. If we write this down in simulation code we get:

data1$genre_pop <- ifelse(data1$genre == "pop", 1, -1)
b0 <- 75
b1 <- 0
epsilon <- 7.5
set.seed(1)
for(i in 1:nrow(data1)){
  data1$liking[i] <- rnorm(1, b0+b1*data1$genre_pop[i], epsilon)
  # we simulate a normal distribution here with the mean being the regression equation (except the error term) and the SD being the error term of the regression equation 
}

lmem1 <- mixed(liking ~ genre + (1 | participant), data1, method = "S")
## Fitting one lmer() model. [DONE]
## Calculating p-values. [DONE]
summary(lmem1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: liking ~ genre + (1 | participant)
##    Data: data
## 
## REML criterion at convergence: 137424.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.2766 -0.6696 -0.0125  0.6715  3.7953 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  participant (Intercept)  0.2734  0.5228  
##  Residual                56.1596  7.4940  
## Number of obs: 20000, groups:  participant, 10000
## 
## Fixed effects:
##                Estimate  Std. Error          df  t value Pr(>|t|)    
## (Intercept)   74.959773    0.053248 9998.985098 1407.756   <2e-16 ***
## genre1         0.008801    0.052990 9998.984622    0.166    0.868    
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr)
## genre1 0.000

As you can see from the output, we get exactly what we would expect. Note that it was very easy to add the random intercept to the model, we just added the term U[data1$participant[i]] to the existing 1-liner for the simulation. If you are unfamiliar with indexing in R, this does nothing more than say that for each row i we want the value from the random intercept list U that belongs to the participant that we see at row i.

To make the random intercept simulation more traceable we can plot it. If we use the code above, but set the error term \(\epsilon\) to a very small number, we will see what influence the random effect has on the data.

epsilon_noerror <- .001
data1_noerror <- subset(data1, participant <= 20) # take only 20 participants for plotting
set.seed(1)
for(i in 1:nrow(data1_noerror)){
  data1_noerror$liking[i] <- rnorm(1, b0+U0[data1_noerror$participant[i]]+b1*data1_noerror$genre_pop[i], epsilon_noerror)
}

ggplot(data1_noerror, aes(x=genre, y=liking, colour=factor(participant), group = factor(participant))) + geom_line()

We can see in the above plot that each participant has exactly no difference in liking for rock and pop music, as \(beta_1\) is 0 and there is no error term that causes random variations. Still, the likings of each participant differ, as the random intercept causes overall ratings to be lower or higher for different participants.

Adding a random slope

Now, we might have the (reasonable) expectation that, while on average, the difference in liking between the two genres is 0 (represented by our fixed effect \(beta_1\) being 0), different people might still prefer one genre over the other, but that variation is just random in that we do not know where it comes from. Still, it might be there and we might want to model it using a random slope for the genre across participant, tracking, for each participant, what their personal difference from the average genre effect i.e. 0 is. In other word, we now expect that it is rarely the case that each participant likes rock and pop exactly the same. Rather, each participant has a preference but these preferences average out across the sample to be 0. We can add this to the regression equation again:

\[ y_{ij} = \beta_0 + u_{0j} + (\beta_1+u_{1j})\times genre_{i} + \epsilon_{i} \]

this equation denotes the fact the value for genre (1 or -1 representing rock and pop) does not only depend on \(beta_1\) but also the random slope term \(u_{1j}\). As you can see above the terms \(beta_1\) and \(u_{1j}\) are between brackets, so they need to be figured out before multiplying them with the value for genre. However, with the current design, we cannot attempt such a stunt, as each participant provides only 2 ratings (1 song from each genre).

Thus, only 1 possible difference score can be calculated per participant. From this single difference score, it is of course not possible to estimate two distinct parameters, namely the average effect \(beta_0\) across participants as well as the average deviation of each participant from that effect. We can only do this if each participant rates more than 2 songs per genre so that two quantities can be distinguished in the model.

Thus, we will now change our data-set a little, so that each participant provides ratings of not only 1 but 20 songs. We already did this in data2 in the beginning of this tutorial and will now just add more participants to that data:

data2 <- generate_design(n_participants = 500, n_genres = 2, n_songs = 20) # note that we pass the number of songs per GENRE
str(data2)
## 'data.frame':    20000 obs. of  3 variables:
##  $ participant: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ genre      : chr  "rock" "rock" "rock" "rock" ...
##  $ song       : chr  "rock_1" "rock_1" "rock_1" "rock_1" ...
##  - attr(*, "out.attrs")=List of 2
##   ..$ dim     : Named int [1:3] 500 2 20
##   .. ..- attr(*, "names")= chr [1:3] "participant" "genre" "song"
##   ..$ dimnames:List of 3
##   .. ..$ participant: chr [1:500] "participant=  1" "participant=  2" "participant=  3" "participant=  4" ...
##   .. ..$ genre      : chr [1:2] "genre=1" "genre=2"
##   .. ..$ song       : chr [1:20] "song= 1" "song= 2" "song= 3" "song= 4" ...

We have 500 participants now in data2, so that the overall number of observations is the same as in data1 for now.

Now, how big do we think that the random slope should be? The number should represent our expectation about the range that any given person’s difference in liking between rock and pop music should be in. For instance, if we take the normal distribution quantities and use a 2-SD rule, we could think that 95% of people should not have a difference in the 2 likings stronger than 10 points. If we divide that by 2, we get 5 as an sd for the random slope.

data2$genre_pop <- ifelse(data2$genre == "pop", 1, -1)
b0 <- 75
b1 <- 0
sd_u0 <- sqrt(11.25)
epsilon <- sqrt(56.25-11.25)
U0 = rnorm(length(unique(data2$participant)), 0, sd_u0)

sd_u1 = 5
U1 = rnorm(length(unique(data2$participant)), 0, sd_u1)
str(U1)
##  num [1:500] -2.092 1.776 2.567 0.093 6.592 ...

The above code shows that, again, we get 1 score per participant for the random intercept and random slope.

set.seed(1)
for(i in 1:nrow(data2)){
  data2$liking[i] <- rnorm(1, 
                           b0+U0[data2$participant[i]]+
                           (b1+U1[data2$participant[i]])*data2$genre_pop[i], 
                           epsilon)

}

lmem3 <- mixed(liking ~ genre + (1 + genre | participant), data2, method = "S", control = lmerControl(optimizer = "bobyqa"))
## Fitting one lmer() model. [DONE]
## Calculating p-values. [DONE]
summary(lmem3)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: liking ~ genre + (1 + genre | participant)
##    Data: data
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 135817.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8360 -0.6544 -0.0134  0.6536  3.5644 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr 
##  participant (Intercept) 11.42    3.379         
##              genre1      28.33    5.323    -0.03
##  Residual                45.21    6.724         
## Number of obs: 20000, groups:  participant, 500
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  74.9965     0.1584 499.0000 473.407   <2e-16 ***
## genre1       -0.2564     0.2427 498.9999  -1.056    0.291    
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr)
## genre1 -0.153

As you can see, the only thing that changed here is that we are using the matrix of random intercept and slope values here for the simulation here by assessing it at column 1 (U01[data2$participant[i], 1]) for the random intercepts and column 2 (U01[data2$participant[i], 2]) for the random slopes.

Adding a crossed random effects term.

It is possible, that some songs are universally more liked across people than others. For example if we imagine for a moment that the song pop_1 in our data-set represents, for example Blank Space by Taylor Swift - obviously an amazing song that most people will like. Similarly, universally liked songs might be in the rock genre and also similarly, there might be songs in both genres that most people do not like so much (remember though that we are working with a “most liked” playlist from Spotify so it makes sense that regardless our overall mean of 75 stays the same).

Thus, it is not only the fact that people differ in their general liking of music, but also that songs differ in their propensity to be liked. To represent this we can add another random intercept representing songs:

\[ y_{ijk} = \beta_0 + u_{0j} + w_{0k} + (\beta_1+u_{1j})\times genre_{i} + \epsilon_{i} \] The new term \(w_{0k}\) is doing just what I described above, and is very similar to \(u_{0j}\) but it represents the term \(w\), a term that tracks the deviation from the overall liking scores for each given song \(_k\).

We can, for instance, assume that the random intercept for songs will be half the size of random intercepts across people, making the assumption that differences between people are often larger than differences between things, in this case songs.

sd_w0 <- sqrt(11.25)/2
W0 <- rnorm(length(unique(data2$song)), 0, sd_w0) # note how we use the number of songs here to specify the size of W0 instead of the participant number

We can now just add this to the simulation of the DV again and add the random intercept to the lmer syntax. Note that in the code below, we need a way to index to the right song for each value in W0. For participants this was easy, as participant numbers are just numerals from 1-J that automatically also represent index values. For songs however, the syntax data2$song[i] will not return a number, but, for instance rock_1, as this is the song ID. We could either covert song IDs to numbers as well, or we can use syntax that retrieves the index of, for example rock_1 in the list of unique songs. This is what the slightly confusing W0[which(unique_songs == data2$song[i])] syntax does. Note that we create unique_songs outside of the loop, because creating the list of unique songs causes R to do a search across the entire data-set for unique song names, and it suffices to do this once instead of every iteration of the loop. You will note that the code will run significantly slower if you put it inside of the loop.

unique_songs <- unique(data2$song)
for(i in 1:nrow(data2)){
  data2$liking[i] <- rnorm(1, 
                           b0+U01[data2$participant[i], 1] + 
                           W0[which(unique_songs == data2$song[i])] +
                           (b1+U01[data2$participant[i], 2])*data2$genre_pop[i], 
                           epsilon)

}

lmem5 <- mixed(liking ~ genre + (1 + genre | participant) + (1 | song), data2, method = "S", control = lmerControl(optimizer = "bobyqa"))
## Fitting one lmer() model. [DONE]
## Calculating p-values. [DONE]
summary(lmem5)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: liking ~ genre + (1 + genre | participant) + (1 | song)
##    Data: data
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 135951.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9684 -0.6629  0.0116  0.6474  3.4745 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr 
##  participant (Intercept) 12.47    3.531         
##              genre1      25.39    5.038    -0.13
##  song        (Intercept)  5.20    2.280         
##  Residual                45.21    6.724         
## Number of obs: 20000, groups:  participant, 500; song, 40
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  75.0514     0.3965 53.5030 189.291   <2e-16 ***
## genre1        0.3312     0.4278 71.8909   0.774    0.441    
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr)
## genre1 -0.146
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

We can see that the model shows a singularity warning in this case which might be due to the fact that A) the random intercept term for country is very small compared to the other terms and B) the random correlation for the country intercept and slope is estimated based on only 5 pairs of observations (1 for each country), which is obviously very little information.

I wanted to demonstrate how such a nested effect can be added but will now continue without it, as it causes convergence problems here and I want to move on to models with more fixed effects now, and restore the 100% random effect for participants from above.

set.seed(1)
b0 <- 75
b1 <- 0
sd_u0 <- sqrt(11.25)
epsilon <- sqrt(56.25-11.25)
sd_u1 <- 5
corr_u01 <- -.20
sigma_u01 <- matrix(c(sd_u0^2, sd_u0*sd_u1*corr_u01, sd_u0*sd_u1*corr_u01, sd_u1^2), ncol = 2)

U01 <- mvrnorm(length(unique(data2$participant)),c(0,0),sigma_u01)
sd_w0 <- sqrt(11.25)/2
W0 <- rnorm(length(unique(data2$song)), 0, sd_w0) # note how we use the number of songs here to specify the size of W0 instead of the participant number

Adding more fixed effects

Above, we have seen how to simulate random intercepts and slopes. Together with the techniques that we learned in part III of this tutorial, where we already worked with mixed (between x within + covariate) designs, we should be able to build most of the models we work with now. However, we still haven’t done a proper power analysis yet (given that we did not expect a difference between genres above that would also have been pointless anyway), and will therefore add a few other terms to the model on which we will build the power analysis.

Of course, even though in the general population, people like pop and rock equally, there might still be factors that help us to predict a particular person’s liking for music. For instance, people who play an instrument might like music more in general. If we want to add this factor to our model, we will have to grant some of our participants in the ability to play an instrument. Now, the percentage of people playing an instrument is probably lower than 50%. Thus, we should not just make a 50-50 split but come up with some more realistic number. For instance, I would expect that 1 out of 5 (or 20%) of people plays an instrument. However, because it is not a factor that we are manipulating, the number of people playing an instrument in any particular sample will not be 20% but might be more or less close to that number. This is an important point - we could just always grant exactly 20% of our sample, e.g. 20 out of 100 participants, the ability to play an instrument. However, this would not keep in mind that we only expect 20% of the population to play an instrument and in a sample we should expect sampling variance regarding this number. Thus, instead of granting exactly 20% of people this ability, we will sample the probability that someone is playing an instrument with a probability of 20%.

generate_design <- function(n_participants, n_genres, n_songs, prop_instrument = .20){
  
  
  design_matrix <- expand.grid(participant = 1:n_participants, genre = 1:n_genres, song = 1:n_songs)
  design_matrix$genre <- ifelse(design_matrix$genre ==1, "rock", "pop")
  design_matrix$song <- paste0(design_matrix$genre, "_", design_matrix$song) 
  
  instrument_players <- sample(c("yes", "no"), n_participants, prob = c(prop_instrument, (1-prop_instrument)), replace = T) # sample whether people play instrument
  for(i in 1:nrow(design_matrix)){ # grant people the ability to play instrument
    design_matrix$instrument[i] <- instrument_players[design_matrix$participant[i]]
  }
  
  return(design_matrix)
}

data3 <- generate_design(n_participants = 500, n_genres = 2, n_songs = 20, prop_instrument = .20)
data3$genre_pop <- ifelse(data3$genre == "pop", 1, -1)
data3$instrument_yes <- ifelse(data3$instrument == "yes", 1, -1)

mean(data3$instrument == "yes")
## [1] 0.208

In this case, 20.6 percent of our 500 participants play an instrument, which is close to 20% as the sample-size is so large.

Now what does this additional effect imply for the random effect structure: Playing an instrument is obviously a between-subject factor, but it is, in fact, a within-song factor, as for each song, there are people who do and do not play an instrument and who are listening to it. Thus, we can estimate a random slope for instrument within songs, representing the idea that between different songs, the difference in liking between people who do and do not play instruments might differ. For example, if a song that has very sophisticated guitar-playing, the difference in liking between instrument-players and non-players might be very heavy, while for other songs that have average amounts of sophistication, the difference in liking might be relatively small. Thus, we will keep what we did above when adding a crossed random intercept, but now simulate it with a variance-covariance matrix just as we did for the random effects for participant above.

We also need to specify how big we expect that instrument effect to be. As a sum-to-zero coded effect, we could set it to 2.5 so that playing an instrument increases the average liking of music by 5 points compared to non-players.

set.seed(1)
b0 <- 75
b1 <- 0
b2 <- 2.5 # the instrument effect
sd_u0 <- sqrt(11.25)
epsilon <- sqrt(56.25-11.25)
sd_u1 <- 5
corr_u01 <- -.20
sigma_u01 <- matrix(c(sd_u0^2, sd_u0*sd_u1*corr_u01, sd_u0*sd_u1*corr_u01, sd_u1^2), ncol = 2)
U01 <- mvrnorm(length(unique(data2$participant)),c(0,0),sigma_u01)

sd_w0 <- sqrt(11.25)/2
sd_w1 <- sd_w0/4
corr_w01 <- .10
sigma_w01 <- matrix(c(sd_w0^2, sd_w0*sd_w1*corr_w01, sd_u0*sd_w1*corr_u01, sd_w1^2), ncol = 2)
W01 <- mvrnorm(length(unique(data2$song)),c(0,0),sigma_w01)

As you can see, we now have a random-effects variance-covariance matrix for the song effects as well. I set the correlation to be very small in this case, but for no particular reason (I think it does not matter much). Moreover, the random slope is 0.25x the size of the random intercept SD, as I do not expect the aforementioned instrument effect to be very big.

The only thing left is to simulate the DV and estimate the model:

unique_songs <- unique(data3$song)
for(i in 1:nrow(data3)){
  data3$liking[i] <- rnorm(1, 
                           b0+ # fixed intercept (average liking)
                             U01[data3$participant[i], 1] + # random intercept for participants
                           W01[which(unique_songs == data3$song[i]), 1] + # random intercept term for song
                           (b1+ # fixed effect of genre (which is 0)
                              U01[data3$participant[i], 2]) # random slope for genre across participants
                           *data3$genre_pop[i] # for each row whether its pop or rock
                           +(b2+ # fixed effect of instrument 
                               W01[which(unique_songs == data3$song[i]), 2])
                           *data3$instrument_yes[i]# random slope for instrument across songs
                           , epsilon) # residual SD

}

data3$genre <- factor(data3$genre)
data3$instrument <- factor(data3$instrument, levels = c("yes", "no")) # explicitly set yes to be 1 and no to be 0

lmem7 <- mixed(liking ~ genre + instrument + (1 + genre | participant) + (1 + instrument | song), data3, method = "S", control = lmerControl(optimizer = "bobyqa"))
## Fitting one lmer() model. [DONE]
## Calculating p-values. [DONE]
summary(lmem7)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: liking ~ genre + instrument + (1 + genre | participant) + (1 +  
##     instrument | song)
##    Data: data
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 135963.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9185 -0.6525 -0.0122  0.6531  3.5999 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev. Corr 
##  participant (Intercept) 12.6255  3.5532        
##              genre1      25.5115  5.0509   -0.16
##  song        (Intercept)  2.6381  1.6242        
##              instrument1  0.1525  0.3905   0.00 
##  Residual                45.2161  6.7243        
## Number of obs: 20000, groups:  participant, 500; song, 40
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)  75.28328    0.32770  88.67272 229.735   <2e-16 ***
## genre1       -0.04608    0.34708 109.64759  -0.133    0.895    
## instrument1   2.50729    0.21125 416.01733  11.869   <2e-16 ***
## 
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) genre1 genre2
## genre1      -0.730              
## genre2      -0.327  0.387       
## instrument1  0.277  0.005 -0.002

As you can see, the only thing that changed here compared to previous versions is that we split up the genre effect into b1a for the effect of pop and b1b for the effect of rock. Both of them get a line in the simulation code, with the U01[data4$participant[i], 2] and U01[data4$participant[i], 3] referring to the random effect distribution of pop and rock respectively.

Adding an interaction

What if we would like to add the genre by instrument interaction here? As this is a within-between interaction, there will be no random effects associated with this interaction. Therefore, we can add it just as we did earlier in part 3 of this tutorial. For instance the b1ab2 effect size refers to how much bigger we expect the difference of pop music from the grand mean to be for people who do play an instrument compared to people who do not play an instrument. b1bb2 Has the same interpretation for rock music.

b1ab2 <- -3
b1bb2 <- 12

unique_songs <- unique(data4$song)
for(i in 1:nrow(data4)){
  data4$liking[i] <- rnorm(1, 
                           b0+ # fixed intercept (average liking)
                             U01[data4$participant[i], 1] + # random intercept for participants
                           W01[which(unique_songs == data4$song[i]), 1] + # random intercept term for song
                           (b1a+ # fixed effect of genre (which is 0)
                              U01[data4$participant[i], 2]) # random slope for genre across participants
                           *data4$genre_pop[i]  # for each row whether its pop
                           +(b1b+ # fixed effect of genre (which is 0)
                              U01[data4$participant[i], 3]) # random slope for genre across participants
                           *data4$genre_rock[i]  # for each row whether its rock
                           +b1ab2*data4$genre_pop[i]*data4$instrument_yes[i] # pop*instrument interaction
                           +b1bb2*data4$genre_rock[i]*data4$instrument_yes[i] # rock*instrument interaction
                           +(b2+ # fixed effect of instrument 
                               W01[which(unique_songs == data4$song[i]), 2])
                           *data4$instrument_yes[i]# random slope for instrument across songs
                           , epsilon) # residual SD

}


lmem9 <- lmer(liking ~ genre * instrument + (1 + genre * instrument | participant) + (1 + instrument | song), data4, control = lmerControl(optimizer = "bobyqa"))
## boundary (singular) fit: see ?isSingular
summary(lmem9)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## liking ~ genre * instrument + (1 + genre * instrument | participant) +  
##     (1 + instrument | song)
##    Data: data4
## Control: lmerControl(optimizer = "bobyqa")
## 
## REML criterion at convergence: 4100
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9706 -0.6655  0.0055  0.6835  3.2152 
## 
## Random effects:
##  Groups      Name               Variance   Std.Dev. Corr                   
##  song        (Intercept)         1.3606443 1.16647                         
##              instrument1         0.0003257 0.01805  1.00                   
##  participant (Intercept)         5.4140813 2.32682                         
##              genre1             12.3449699 3.51354  -0.56                  
##              genre2             24.3080763 4.93032   0.11  0.65            
##              instrument1        15.0023384 3.87329   0.61 -0.87 -0.33      
##              genre1:instrument1 14.3134080 3.78331  -0.58  0.94  0.58 -0.91
##              genre2:instrument1 12.6011017 3.54980  -0.70  0.82  0.20 -0.90
##  Residual                       48.9195750 6.99425                         
##       
##       
##       
##       
##       
##       
##       
##       
##   0.89
##       
## Number of obs: 600, groups:  song, 60; participant, 10
## 
## Fixed effects:
##                    Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)         74.0413     1.7531  2.6051  42.235 0.000091 ***
## genre1              -1.6228     2.1450  2.1525  -0.757   0.5234    
## genre2               8.2713     2.2298  3.3231   3.709   0.0286 *  
## instrument1         -0.1673     1.7466  2.5670  -0.096   0.9307    
## genre1:instrument1   0.2586     2.1344  2.1105   0.121   0.9141    
## genre2:instrument1  11.9038     2.2196  3.2632   5.363   0.0102 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) genre1 genre2 instr1 gnr1:1
## genre1      -0.784                            
## genre2      -0.485  0.762                     
## instrument1  0.757 -0.769 -0.503              
## gnr1:nstrm1 -0.770  0.943  0.750 -0.791       
## gnr2:nstrm1 -0.504  0.750  0.542 -0.489  0.775
## optimizer (bobyqa) convergence code: 0 (OK)
## boundary (singular) fit: see ?isSingular

As you can see above the only thing left to do then is to multiply the effect by the respective indicators for each row identifying the genre and whether the person plays an instrument for each row. For b1ab2 we multiply by the genre_pop contrast variable, and for b1bb2 we use the genre_rock variable. For example, for a row of a person that does play an instrument and a song that is a pop song, the value will be: b1ab2*data4$genre_pop[i]*data4$instrument_yes[i] = -3*1*1 = -3.

Extending Random-Effect matrices

If the interaction would be fully within participants and we would like to add random effects, they can be added to the random effect matrix in the exact same way that we added the 2nd contrast for genre above. The matrix just needs to grow by two additional row and column (i.e. have 6x6 = 36 total entries). the first 3 columns would contain the same information as before in the first 3 rows and 3 additional rows for the main effect of instrument and the genre_pop*instrument_yes and genre_rock*instrument_yes interactions respectively. Columns 4, 5 and 6 would contain the variances and covariances of those additional variables. that now also contain the variance of the random effect for the main effect of instrument, as well as interaction and the covariances with the random intercept of participant and the random slopes of genre_pop and genre_rock.

As this can be very time-consuming, I created a function make_vcov_matrix that can help with creating the random effect matrices. As demonstrated below, you can pass to it the parameters and the correlations between the parameters (in this case 15 correlation for the 6 effects (6^2-6)/2).

sd_u0 <- 1 # random intercept
sd_u1a <- 2 # random slope of pop
sd_u1b <- 3 # random slope of rock
sd_u2 <- 4 # random slope of instrument
sd_u1a2 <- 5 # random slope of pop X instrument interaction slopes
sd_u1b2 <- 6 # random slope of rock X instrument interaction slopes

corr_u01a <- .01 # random correlation between intercept and pop slope
corr_u01b <- .02 # random correlation between intercept and rock slope
corr_u02 <- .03 # random correlation between intercept and instrument slope
corr_u01a2 <- .04 # random correlation between intercept and pop X instrument interaction slopes
corr_u01b2 <- .05 # random correlation between intercept and rock X instrument interaction slopes

corr_u1a1b <- .06 # random correlation between pop and rock slopes
corr_u1a2 <- .07 # random correlation between pop and instrument slopes
corr_u1a1a2 <- .08 # random correlation between pop and pop X instrument interaction slopes
corr_u1a1b2 <- .09 # random correlation between pop and rock X instrument interaction slopes

corr_u1b2 <- .10 # random correlation between rock and instrument slopes
corr_u1b1a2 <- .11 # random correlation between rock and pop X instrument interaction slopes
corr_u1b1b2 <- .12 # random correlation between rock and rock X instrument interaction slopes

corr_u21a2 <- .13 # random correlation between instrument and pop X instrument interaction slopes
corr_u21b2 <- .14 # random correlation between instrument and rock X instrument interaction slopes

corr_u1a21b2 <- .15 # random correlation between pop X instrument interaction and rock X instrument interaction slopes

pars <- c(sd_u0, sd_u1a, sd_u1b, sd_u2, sd_u1a2, sd_u1b2)

corrs <- c(corr_u01a, corr_u01b, corr_u02, corr_u01a2, corr_u01b2,
           corr_u1a1b, corr_u1a2, corr_u1a1a2, corr_u1a1b2,
           corr_u1b2, corr_u1b1a2, corr_u1b1b2,
           corr_u21a2, corr_u21b2,
           corr_u1a21b2
           )


make_vcov_matrix <- function(pars = c(), corrs = c()){
  
  # make empty square matrix of size length(pars)
  m <- matrix(rep(NA), nrow = length(pars), ncol = length(pars))
  k_low <- 1 # counter for below diagonal matrix
  k_high <- 1 # counter for above diagonal matrix
  
  # the loop basically iterates over the columns and rows and fills in the variances and covariances in both diagonals of the matrix
  for(col in 1:ncol(m)){
    for(row in 1:nrow(m)){
      if(col == row){
        m[row, col] <- pars[col]*pars[row]
      } else {
        if(col < row){
          m[row, col] <- pars[col]*pars[row]*corrs[k_low]
          k_low <- k_low+1
        } else {
          m[row, col] <- pars[col]*pars[row]*corrs[k_high]
          k_high <- k_high+1
        }

      }
    }

  }
  
  return(m) # return the filled-in matrix
  
}

sigma_u <- make_vcov_matrix(pars, corrs)

U <- mvrnorm(length(unique(data4$participant)),rep(0, length(pars)),sigma_u)

Note that the order that you pass the arguments to the function should be the same order that you will use it in the regression-equation simulation approach. This is, whatever effect you enter the function first (e.g. the random intercept), can be assessed later as U[data$participant[i], 1] for example, meaning that the first effect will be in the first column of U and the second effect in the second column etc. The correlation need to be entered in the following order:

  • E1 x E2, E1 x E3, E1 x E4, E1 x E5, E1 x E6
  • E2 x E3, E2 x E4 ….

which in terms of the example is

  • intercept x genre_pop, intercept x genre_rock, intercept x instrument, intercept x genre_pop:instrument, intercept x genre_rock:instrument
  • genre_pop x genre_rock, genre_pop x instrument …..

Thus, in terms of the variance-covariance matrix this corresponds to entering the correlations column-wise for the lower diagonal of the matrix.

Appendix C: Bayesian Data Simulation with brms

The Bayesian approach to power simulation - or more technically correct True-Positive Rate simulation as power is not really a Bayesian concept - is a little different from what we did above. We will still generate a design matrix but instead of simulating the outcome variable using rnorm or other simulation functions, we can make use of the fact that Bayesian models have prior information as a part of the model itself. This means that when running a simulation, we can tell the model our expectations by using them as priors, and then tell the model to simulate those priors, which will give us the expected outcome variables. This may sound weird but the nice thing about it is that we do not need imitate the model by writing the model formula ourselves in the simulation but can just use the formula from the model directly.

I assume that you already know brms at this point, so I will not go into detail too much and just give a demonstration on how to simulate data using brms.

First we will define the prior, setting the parameters for the simulation. brms does not really support setting random correlations independently for each random effect so we will not do this here.

Afterwards we will generate a design, just like above, and fit a brms model that only samples from the prior. This model will serve as our “simulation engine” that will generate the dependent variable predictions.

When specifying the priors, the class argument sets which kind of effect it is about (intercept, b = fixed or sd = random) and coef defines the name of the parameter (the 1 is added here (e.g. instrument1) because of the contrast coding). The group in the random effects terms specifies which grouping factor the effect refers to. What we called epsilon before, i.e. the residual variation, is now called sigma. Note that we defined all parameters as normal distributions with a standard deviation of 1. We could also decrease the standard deviation further, but it does not really matter here, as long as the standard deviations are reasonably small.

library(brms)

priors_sim <- c(set_prior("normal(75,1)", class = "Intercept"), 
                set_prior("normal(0,1)", class = "b", coef = "genre1"),
                set_prior("normal(2.5,1)", class = "b", coef = "instrument1"),
                set_prior("normal(3.354102,1)", class = "sd", coef = "Intercept", group = "participant"),
                set_prior("normal(5,1)", class = "sd", coef = "genre1", group = "participant"),
                set_prior("normal(1.677051,1)", class = "sd", coef = "Intercept", group = "song"),
                set_prior("normal(0.4192628,1)", class = "sd", coef = "instrument1", group = "song"),
                set_prior("normal(6.708204,1)", class = "sigma"))

init_dat <- generate_design(n_participants = 100, n_genres = 2, 
                               n_songs = 50, prop_instrument = .20, genre_names = c("pop", "rock"))

init_dat$liking <- rep(-999)

brm_prior <- brm(liking ~ genre + instrument + (1 + genre | participant) + (1 + instrument | song), init_dat, cores = 4, chains = 4, sample_prior = "only", prior = priors_sim)
## Compiling Stan program...
## Start sampling

The next step is to predict values of the outcome variable given our simulation parameters that we specified in the prior. We do this in the code below by using the predict function. Importantly, we need to set the summary=FALSE argument to tell brms to not summarize the predictions but instead give us the raw predicted values. We add these values to the data frame and fit a new model now that will test whether these predictions show a credible effect of the variable of interest. For this we use a custom function called brm_postprop that will provide the proportion of posterior density that is smaller or larger than zero.

init_dat$liking <- predict(brm_prior, newdata = init_dat, summary=FALSE, nsamples=1)[1,] 

brm_post <- brm(liking ~ genre + instrument + (1 + genre | participant) + (1 + instrument | song), init_dat, cores = 4, chains = 4)
## Compiling Stan program...
## Start sampling
# compute posterior probability > or < 0 for parameter.
brm_postprop <- function(brm_fit, pars, direction = c("<", ">")){
    
    post_samples <- c(posterior_samples(brm_fit, pars = pars))[[1]]
    if (direction == "<") {
      postprob <- mean(post_samples < 0)
    } else if (direction == ">") {
      postprob <- mean(post_samples > 0)
    } else {
      warning("direction must be either '<' or '>'")
    }
    return(postprob)
}
  

postprop <- brm_postprop(brm_post, "instrument1", direction = "<")
postprop
## [1] 0.12025

The above code shows everything that we need to do to simulate data in brms.

  1. Fit the prior model
  2. predict the outcome
  3. fit the post model to see whether the result is credible.

We can use this to do a power analysis by performing the second code chunk here in a loop, just as we did above and test the proportion of posterior probabilities that are smaller than the desired alpha-value. Specifically, you should run the code as shown above and fit the first brm_post model outside of the loop, such that you can refit the models in the loop using the brms::update function, to prevent the model from being recompiled every iteration of the loop:

while(pp_song_mat$power[n] < power_crit){
  n <- n+1 # increase n for next iteration
  skips_at_n <- 0 # initialize counter for how many trials were skipped
  
  #### increasing simulation number ####
  for(i in 1:n_sims){
    
    # make design-matrix
    tmp_dat <- generate_design(n_participants = pp_song_mat$participant[n], n_genres = 2, 
                               n_songs = pp_song_mat$song[n], 
                               prop_instrument = .20, genre_names = c("pop", "rock"))
    tmp_dat$liking <- predict(brm_prior, newdata = tmp_dat, summary=FALSE, nsamples=1, 
                              allow_new_levels =TRUE)[1,] 
    brm_tmp <- update(brm_post, newdata = tmp_dat)
    postprops[i] <- brm_postprop(brm_tmp, "instrument1", direction = "<")
  }
  pp_song_mat$power[n] <- mean(postprops < alpha)

}

I do not run this here as it would take too long, but as you can see the structure of the loop is in principle the same as in the frequentist case, such that you can just replace the relevant code components, even in the parallel case.

Footnotes


  1. Running more than 6 sessions on a 6 core / 12 thread processor will not increase the speed linearly anymore, as using both threads per core will decrease the speed per parallel session. In the end, 12 sessions might still be faster than 6 sessions though, but from my personal experience it does not matter that much.↩︎

  2. In fact, this parallel loop is not super optimal, as the threads that have already finished simulating for a given sample size need to wait for the other threads to be done before moving on to the next blocks of n’s. That is not really nice but it is good enough to still provide some speed-up without giving up the flexibility that comes with the optional stopping of the while loop, which would otherwise not be possible, had we just used foreach to iterate over the participant number + song number combination directly.↩︎