{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Group-level analyses\n",
"This week's lab is about group-level models, multiple comparison correction (MCC), and region-of-interest (ROI) analysis. In this notebook, we will focus on group-level models, which we'll demonstrate and explain using both FSL and Python. \n",
"\n",
"We'll focus on the \"summary statistics\" approach again, in which we'll demonstrate how we average $c\\beta$-terms across runs (in run-level analyses) and subjects (in group-level analyses) using the GLM. Then, we're going to show you how to test more extensive hypotheses in group-level models. \n",
"\n",
"**What you'll learn**: after this lab, you'll ...\n",
"\n",
"- understand the concept of the summary statistics approach\n",
"- be able to construct different group-level models (in FSL)\n",
"\n",
"**Estimated time needed to complete**: 2 hours"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Some imports\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What are group-level models?\n",
"Last week, we discussed multilevel models and the summary statistics approach. Specifically, we focused on how data from different runs are usually (in non-fMRI contexts) analyzed in a single multilevel GLM by \"concatenating\" the data. In fMRI, however, we usually don't take this approach because of its computational burden; instead, we use the summary statistics approach, which analyzes each run separately and subsequently aggregates the results in a second, run-level GLM. \n",
"\n",
"In this notebook, we will extend this idea of analyzing data from multiple levels by looking at data from multiple subjects and how to analyze this data in group-level models. We will look at two \"flavors\" of group-level analyses: parametric and non-parametric.\n",
"\n",
"![](https://docs.google.com/drawings/d/e/2PACX-1vQxCH3WU3nTqFlHUZb49rf9zioivGQ-flVfRpwmXQx7OF5Wm_1T6gFMYQqpqt-NPITNHUaRoVYEREgT/pub?w=965&h=745)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Parametric analyses\n",
"The most commonly used \"flavor\" of fMRI (group-level) analysis is *parametric*: it assumes that the data can be modeled using specific probability distributions. For example, we assume that the results of statistical tests of parameters (i.e., $t$-values), given that the null hypothesis is true, are distributed according to Student's $t$-distribution (with a particular number of degrees of freedom):\n",
"\n",
"\\begin{align}\n",
"t_{c\\hat{\\beta}} \\sim \\mathcal{T}(\\mathrm{df})\n",
"\\end{align}\n",
"\n",
"where you can read the $\\sim$ symbol as \"is distributed as\". Importantly, the validity of the computed $p$-values depends on whether the choice of distribution is appropriate. If not, you might risk inflated type 1 or type 2 errors.\n",
"\n",
"The first-level and run-level GLMs that we have discussed so far are examples of parametric analyses. There are also *non-parametric* versions of the GLM that do not assume any particular form of distribution; while somewhat more computationally expensive, these are becoming an increasingly popular alternative to (group-level) parametric analyses. Importantly, the difference between parametric and non-parametric analyses only matters for the *inference* (not the *estimation*) aspect of the (group-level) analyses."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's focus on the parametric version of group-level analyses. Basically, this amounts to doing the same thing as we did last week with the run-level analyses, but this time, the results from our run-level analyses ($c\\hat{\\beta}^{*}$) across different subjects will become our target ($y^{\\dagger}$). Note that we will use the \"dagger\" ($^{\\dagger}$) superscript to denote that the mathematical terms belong to the group-level model (just like the $^{*}$ superscript in our notebooks refers to terms belonging to the run-level models).\n",
"\n",
"To reiterate, the results from our run-level analyses ($c\\hat{\\beta}^{*}$), or first-level analyses if we only have a single run, become our dependent variable in our group-level analysis ($y^{\\dagger}$):\n",
"\n",
"\\begin{align}\n",
"y^{\\dagger} = c\\hat{\\beta}^{*}\n",
"\\end{align}\n",
"\n",
"Again, the group-level model is a GLM with a particular design matrix ($\\mathbf{X}^{\\dagger}$) and parameters ($\\beta^{\\dagger}$):\n",
"\n",
"\\begin{align}\n",
"y^{\\dagger} = \\mathbf{X}^{\\dagger}\\beta^{\\dagger} + \\epsilon^{\\dagger}\n",
"\\end{align}\n",
"\n",
"And the group-level parameters can be estimated with OLS:\n",
"\n",
"\\begin{align}\n",
"\\hat{\\beta}^{\\dagger} = (\\mathbf{X}^{\\dagger\\ T} \\mathbf{X}^{\\dagger})^{-1}\\mathbf{X}^{\\dagger\\ T}y^{\\dagger}\n",
"\\end{align}\n",
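"\n",
"As a small worked example, assume the simplest possible group-level design (our choice for illustration): a single column of ones, $\\mathbf{X}^{\\dagger} = \\mathbf{1}_{N}$, for $N$ subjects. Then $\\mathbf{X}^{\\dagger\\ T}\\mathbf{X}^{\\dagger} = N$ and $\\mathbf{X}^{\\dagger\\ T}y^{\\dagger} = \\sum_{i=1}^{N} y^{\\dagger}_{i}$, so the OLS estimate reduces to the average of the run-level contrasts across subjects:\n",
"\n",
"\\begin{align}\n",
"\\hat{\\beta}^{\\dagger} = \\frac{1}{N}\\sum_{i=1}^{N} y^{\\dagger}_{i}\n",
"\\end{align}\n",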
"\n",
"As mentioned last week, the parameter estimation procedure (i.e., estimating $\\hat{\\beta}^{\\dagger}$) is relatively straightforward, but the inference procedure depends on the specific variance approach: fixed, random, or mixed effects. If your aim is to perform inference to the population, *you should never use a fixed-effects type of analysis across subjects*, as this will typically inflate your type 1 error greatly.\n",
"\n",
"That leaves us with random effects and mixed effects GLMs. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Random effects\n",
"Let's first focus on a random effects type of analysis (which is commonly used in group-level analysis in the SPM software package, for example), as this is relatively easy to understand. For now, let's use simulated data. Suppose that we want to do a group-level analysis of our run-level `flocBLOCKED` data. Specifically, we are interested in the difference between faces and places ($H_{0}: \\beta_{face}^{*} = \\beta_{place}^{*}$, $H_{a}: \\beta_{face}^{*} > \\beta_{place}^{*}$). As such, we'll use the corresponding $c\\hat{\\beta}^{*}$ terms from our run-level analysis (i.e., the contrasts against baseline, COPE1 and COPE2) as our new target $y^{\\dagger}$ as follows:\n",
"\n",
"\\begin{align}\n",
"y^{\\dagger} = \\begin{bmatrix}\n",
"c\\hat{\\beta}^{*}_{1, F>0} \\\\\n",
"c\\hat{\\beta}^{*}_{2, F>0} \\\\\n",
"\\vdots \\\\\n",
"c\\hat{\\beta}^{*}_{N, F>0} \\\\\n",
"c\\hat{\\beta}^{*}_{1, P>0} \\\\\n",
"c\\hat{\\beta}^{*}_{2, P>0} \\\\\n",
"\\vdots \\\\\n",
"c\\hat{\\beta}^{*}_{N, P>0}\n",
"\\end{bmatrix}\n",
"\\end{align}\n",
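"\n",
"One design matrix that would fit this stacked target, sketched here under the assumption that each condition gets its own indicator column (the names $\\beta^{\\dagger}_{F}$, $\\beta^{\\dagger}_{P}$ and $c^{\\dagger}$ are introduced for illustration only), contains $N$ rows of $[1\\ 0]$ followed by $N$ rows of $[0\\ 1]$:\n",
"\n",
"\\begin{align}\n",
"\\mathbf{X}^{\\dagger} = \\begin{bmatrix}\n",
"1 & 0 \\\\\n",
"\\vdots & \\vdots \\\\\n",
"1 & 0 \\\\\n",
"0 & 1 \\\\\n",
"\\vdots & \\vdots \\\\\n",
"0 & 1\n",
"\\end{bmatrix}, \\quad c^{\\dagger} = \\begin{bmatrix} 1 & -1 \\end{bmatrix}\n",
"\\end{align}\n",
"\n",
"Under this assumed design, $\\hat{\\beta}^{\\dagger}_{F}$ and $\\hat{\\beta}^{\\dagger}_{P}$ are the across-subject averages of the face and place contrasts, and $c^{\\dagger}\\hat{\\beta}^{\\dagger}$ is their difference.\n",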
"\n",
"For our simulation, we'll assume that we have 20 subjects ($N=20$)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"