__Click [HERE](https://julianquandt.com/rmd_files/2020-05-12-power-analysis-by-data-simulation-in-r-part-ii.Rmd) to download the .Rmd file__ _This blog is also available on [R-Bloggers](https://www.r-bloggers.com/)_

# Power Analysis by simulation in `R` for really any design - Part II
This is Part II of my tutorial on how to do power-analysis by simulation.
In Part I, we saw how to do a simulation for a simple toy-example with a coin-toss.
In this part, we will use a more realistic problem that we might encounter in our daily research life and see how to simulate the power for these designs.
By looking at how to do power-simulation for the independent-samples t-test and the paired t-test, we will learn how to simulate normal distributions and how to specify their effect-sizes in terms of $Cohen's\ d$. Moreover, we will simulate correlated (i.e. multivariate) normal distributions for cases where we have correlated observations (e.g. the paired-sample t-test).
This will be an important tool for later parts of this tutorial.
In part III of this tutorial we will learn how we can conceptualize basically _any_ design as a linear model and thereby be very flexible in our power analysis.
In part IV we will learn how to apply this technique to complicated designs such as linear mixed-effects models and generalized mixed-effects models.
## Simulating a between-subjects t-test
Let's get to it.
The first thing we will need again in our simulation is one of the implemented simulation functions in R (those that let `R` run theoretical experiments for us), but this time it is not `rbinom` as we are not working with coin-flips but `rnorm` - the simulation function for the normal distribution.
Let's have a short look at that function as we will keep working with it throughout the tutorial.
`rnorm(n, mean, sd)` takes three arguments, a sample-size `n`, a `mean` and a standard-deviation `sd`.
By specifying these values, we can sample random 'people' (or observations) that are participating in our simulated experiments.
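For instance (a small illustration with made-up numbers, not part of the example study below), we can draw five random observations from a normal distribution with a mean of 100 and a standard-deviation of 15:

```{r rnorm_illustration}
set.seed(42) # make the random draw reproducible
scores <- rnorm(n = 5, mean = 100, sd = 15)
scores # five simulated 'people'
```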
Imagine, for example, that we have an intervention study in which we have a treatment group and a control group.
We can easily simulate both groups with `rnorm` but what should the means and sds of the groups be?
There are two ways we can approach this.
1. We could think about what group means we expect in our given case and what we expect the spread of the groups to be _on the measurement scale that we are working with_.
For example, if we use a 40-point scale for a clinical test we might know that a group with deficiencies on the thing that we measure would probably score around 10 points and that almost everyone from that group would score lower than 20 points.
This statement (most people score around 10, almost everyone scores lower than 20) can be quantified as a normal distribution with a mean of 10 and a standard-deviation of 5. In this case only 2.5% of the values (i.e. the values above the upper bound of the 95% interval) will be higher than 20.
1. In a new research project, we might not be able or willing to make these statements.
In this case, by making some extra assumptions, we can fall back to the approach that we also use in power-calculation software in most cases and define a _standardized effect size_ that we can use to simulate data rather than defining the group means and standard-deviations directly.
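To illustrate the first approach, we can check the claim made above that a normal distribution with a mean of 10 and a standard-deviation of 5 (the hypothetical clinical-scale numbers from the example) puts only about 2.5% of scores above 20:

```{r pnorm_check}
# probability of scoring higher than 20 on a N(mean = 10, sd = 5) scale
1 - pnorm(20, mean = 10, sd = 5) # roughly 0.023, i.e. about 2.5%
```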
I personally try to go with the first approach whenever possible, as I think that in many cases we know more about what we expect from our data than we think, even in new projects.
Even if we __do not__ know a lot about our data, we might still try out different assumptions (i.e. means and sds) for the groups in our simulation to see what power we would get for each of them.
This way, we can make informed decisions about our sample size that are more nuanced than the one in which we just assume a standardized effect size and see what sample-size it implies, and we are forced to think harder about our data - something that might seem difficult and annoying at first, but is extremely useful and educational.
Another advantage of specifying the groups directly is that we can do this for any arbitrarily complex design where standardized effect sizes are often difficult to calculate.
This said, for the cases where we might really not be willing to specify groups directly, and because it allows me to demonstrate some other interesting points, in this part I will discuss how we can use standardized effect-sizes in our simulation.
In part III and IV however, we will always specify effects on the raw scale.
If we were using GPower now, we would most likely just fill in a difference between groups in $Cohen's\ d$ and be done with it.
We could of course also follow this approach in a simulation by defining the groups based on the implied $Cohen's\ d$.
For instance, we can just simulate group 1 as `rnorm(n, 1, 2)` and group 2 as `rnorm(n, 0, 2)`.
Now, following from the formula for Cohen's d:
$$Cohen's\ d = \frac{(M_1 - M_2)}{pooled \ sd}$$
where
$$pooled\ sd = \sqrt\frac{(sd_1^2+sd_2^2)}{2}$$
and adhering to the student t-test assumption of equal variances we can fill in the pooled sd formula above as
$$pooled\ sd = \sqrt\frac{(2^2+2^2)}{2} = 2$$
to get a $Cohen's\ d$ of .50:
$$Cohen's\ d = \frac{(1 - 0)}{2} = 0.5$$
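We can verify this arithmetic directly in `R`, using the group means and standard-deviations assumed above:

```{r cohensd_check}
m1 <- 1; m2 <- 0   # group means assumed above
sd1 <- 2; sd2 <- 2 # equal sds, per the student t-test assumption
pooled_sd <- sqrt((sd1^2 + sd2^2) / 2) # = 2
(m1 - m2) / pooled_sd                  # Cohen's d = 0.5
```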
To get any other value for $Cohen's\ d$ we can just change the pooled sd value to whatever we want.
More generally, we want to solve the equation above for the pooled sd after specifying any $Cohen's\ d$, e.g.:
$$0.5= \frac{(1 - 0)}{pooled\ sd}$$
We can solve an equation like that with `R`'s somewhat unintuitive `solve` function like this:
```{r solve_cohensd}
solve(0.5,1) # cohens d of .5
solve(0.25,1) # cohens d of .25
solve(2,1) # cohens d of 2
```
giving us three examples of how we would need to specify pooled sd to arrive at a particular $Cohen's\ d$.
Thus, if we want to do a t-test with two simulated groups and a $Cohen's\ d$ of 0.5, we can simulate two groups of a particular sample-size by using the `rnorm` function.
Let's say we have 30 participants in each group.
```{r sim_groups1}
set.seed(1234)
group1 <- rnorm(30, 1, 2)
group2 <- rnorm(30, 0, 2)
```
We can visualize the groups that we got in a plot like this:
```{r plot_groups1}
hist(group1, col = "#addd8e", breaks = 10, main = "Histogram of both groups", xlab = "")
hist(group2, add = TRUE, breaks = 10, col= "#31a354")
```
We can already make important observations from this plot:
We wanted to get normal distributions, but what we got here does not really look normal.
Why is that? Because we only have 30 people per group and taking only 30 values from the specified normal distributions does not really give us a good approximation of the real distribution.
__This point is important__: The sampling variability in such small groups is high, and when small-sample studies (i.e. underpowered studies) find "effects", those effects are often rather big and a consequence of this sampling variability rather than of real differences between the groups.
For example, by looking at the means of our sampled groups `mean(group1)` = `r mean(group1)` and `mean(group2)` = `r mean(group2)` we see that the group mean of group 1 is actually closer to the mean that we specified for group 2 (i.e. 0) than to its own mean, while the mean for group 2 is far away from our intended mean.
Looking at the sds actually shows that they are quite close to what we wanted `sd(group1)` = `r sd(group1)` and `sd(group2)` = `r sd(group2)`.
The $Cohen's\ d$ that we wanted is also not presented very accurately at `(mean(group1)-mean(group2))/(sqrt((sd(group1)^2+sd(group2)^2)/2))` = `r (mean(group1)-mean(group2))/(sqrt((sd(group1)^2+sd(group2)^2)/2))`.
Again, if we did this in GPower and specified a $Cohen's\ d$, we would always work with an _exact_ $Cohen's\ d$; in a simulation approach we do __not__.
So let us run a t-test to see whether there is a significant difference here.
First, we need to decide on an alpha-level again.
What will we choose?
Well, to have a good justification we have to elaborate on what the groups actually represent.
Let us say that the difference between groups is related to an intervention that can alleviate depressive symptoms.
Thus, the control group (group1) did not get the intervention and scores higher on depressive symptoms while the treatment group (group2) is expected to score lower.
Let us assume that this is the first study that we run and that, if we find anything, we will follow it up with more extensive studies anyway. Therefore, we might not want to miss a possible effect by setting a too conservative alpha-level.
If we find something in this study, we will conduct further studies in which we are more strict about the alpha level.
Thus, we choose .10 for this first "pilot" study.
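With the alpha-level set to .10, one way to run the test on our two simulated groups is shown below (a sketch; `var.equal = TRUE` enforces the equal-variance assumption of the student t-test that we made above):

```{r ttest_sketch}
set.seed(1234)
group1 <- rnorm(30, 1, 2) # control group
group2 <- rnorm(30, 0, 2) # treatment group
test_result <- t.test(group1, group2, var.equal = TRUE)
test_result$p.value < .10 # is the difference significant at alpha = .10?
```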
", "__NOTE__: The alpha-level "jusficications" in this tutorial are for educational purposes and to provide a starting point. They are obviously not as rigorous as we would like in a real research project. If you find yourself in a situation where you want to justify your alpha-level see [Justify your alpha by Lakens et al.](https://www.nature.com/articles/s41562-018-0311-x) for a good discussion on this.