{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Representational Similarity Analysis\n", "This week's tutorial is about Representational Similarity Analysis (RSA)! We'll be looking at how to transform patterns into RDMs using various distance measures, how to test the relation between \"feature RDMs\" and \"brain RDMs\", and take a look at exploratory RDM visualization using multidimensional-scaling (MDS).\n", "\n", "**What you'll learn**: At the end of this tutorial, you ...\n", "\n", "* know you to preprocess patterns for RSA\n", "* understand the concept of an RDM and how it can be computed\n", "* can list different types of candidate RDMs and their differences\n", "* are able to test the association between candidate and neural RDMs\n", "\n", "**Estimated time needed to complete**: 8-12 hours" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Some imports for the rest of the tutorial\n", "import os\n", "import numpy as np\n", "import nibabel as nib\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from glob import glob\n", "from nilearn import image, datasets, plotting, masking\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading in the data\n", "In this notebook, we are going to work with real data straightaway! Like in the previous decoding tutorial, we'll work only with the data from a single run, though. In addition, to not use too much RAM, we'll analyze only the data from an specific ROI. In contrast to last week, we are going to use a \"functional ROI\" based on the localizer data from our \"flocBLOCKED\" task. 
To derive a functional ROI, we already computed (for each subject) multiple contrasts and the associated whole-brain $z$-score maps in both subject \"native\" space (*T1w*) and standard space (*MNI152NLin2009cAsym*):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "data_dir = os.path.join(os.path.expanduser('~'), 'NI-edu-data')\n", "\n", "print(\"Downloading ROIs for all subjects (+- ... MB) ...\")\n", "!aws s3 sync --no-sign-request s3://openneuro.org/ds003965 {data_dir} --exclude \"*\" --include \"derivatives/floc/*\"\n", "print(\"\\nDone!\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "floc_dir = os.path.join(data_dir, 'derivatives', 'floc', 'sub-03', 'rois')\n", "print(\"We have the following maps:\\n-\", '\\n- '.join(sorted(os.listdir(floc_dir))))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These maps have been computed using a \"condition > other conditions\" contrast. We are, of course, going to use the \"face > (place, character, body, object)\" map for our ROI and use the data in MNI space:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "floc_roi = os.path.join(floc_dir, 'sub-03_task-flocBLOCKED_space-MNI152NLin2009cAsym_desc-face_zscore.nii.gz')\n", "\n", "# Let's plot the unthresholded map as well\n", "plotting.plot_stat_map(floc_roi, cut_coords=(40, -46, -20))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see in the plot above, this subject shows a strong response to faces (relative to other conditions) in the right temporal lobe, just where you'd expect to find the fusiform face area (FFA). But to derive an ROI from this map, we should somehow binarize this image. 
While this choice is somewhat arbitrary, let's threshold this map at $z > 3$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "floc_roi_bin = image.math_img('(img > 3).astype(int)', img=floc_roi)\n", "plotting.plot_roi(floc_roi_bin, cut_coords=(40, -46, -20));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The current ROI, however, contains *a lot* of voxels, many of which lie far outside the region where we'd expect the FFA. One \"trick\" we can use to further restrict the number of voxels is to constrain our functional ROI to a particular anatomical location. Here, we'll pick the right temporal occipital fusiform (rTOF):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ho_atlas = datasets.fetch_atlas_harvard_oxford('cort-maxprob-thr25-2mm', symmetric_split=True)\n", "ho_map = ho_atlas['maps']\n", "rTOF_idx = ho_atlas['labels'].index('Right Temporal Occipital Fusiform Cortex')\n", "rTOF_roi = nib.Nifti1Image((ho_map.get_fdata() == rTOF_idx).astype(int), ho_map.affine)\n", "plotting.plot_roi(rTOF_roi, cut_coords=(40, -46, -20));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the anatomical ROI has a slightly higher spatial resolution ($2 \\times 2 \\times 2$ mm, instead of our functional $2.7 \\times 2.7 \\times 2.97$ mm). 
As such, we need to resample the anatomical mask to our functional resolution:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rTOF_roi_resamp = image.resample_to_img(rTOF_roi, floc_roi_bin, interpolation='nearest')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can intersect the two masks using the intersect_masks function from the Nilearn masking module:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Setting the threshold to 1 means: only select voxels that are in *both* masks\n", "ffa_mask = masking.intersect_masks((floc_roi_bin, rTOF_roi_resamp), threshold=1)\n", "plotting.plot_roi(ffa_mask, cut_coords=(40, -46, -20));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alright, now let's download the patterns from sub-03, run 1 (if not downloaded already):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download the patterns (if not done already)\n", "!aws s3 sync --no-sign-request s3://openneuro.org/ds003965 {data_dir} --exclude \"*\" --include \"derivatives/pattern_estimation/sub-03/ses-1/patterns/*\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... 
and load them in:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "patterns_dir = os.path.join(data_dir, 'derivatives', 'pattern_estimation', 'sub-03', 'ses-1', 'patterns')\n", "betas_path = os.path.join(patterns_dir, 'sub-03_ses-1_task-face_run-1_space-MNI152NLin2009cAsym_desc-trial_beta.nii.gz')\n", "\n", "# Load 4D array and immediately mask it\n", "R = masking.apply_mask(betas_path, ffa_mask)\n", "print(\"Shape of R:\", R.shape)\n", "\n", "plt.imshow(R, aspect='auto')\n", "plt.xlabel(\"Voxels\")\n", "plt.ylabel(\"Samples (trials)\")\n", "plt.title(r\"$\\mathbf{R}$\", fontsize=20)\n", "plt.colorbar()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lastly, we need some experimental variable(s) to relate to this brain pattern. For convenience, we included the events file in the same directory as the estimated patterns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "events_file = os.path.join(patterns_dir, 'sub-03_ses-1_task-face_run-1_events.tsv')\n", "events_df = pd.read_csv(events_file, sep='\\t')\n", "\n", "# Let's remove the rating/response events\n", "# The .query method is great for filtering!\n", "events_df = events_df.query(\"trial_type != 'rating' and trial_type != 'response'\")\n", "events_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many different experimental features that we could use for our analysis, but for now, we'll stick with a single categorical (binary) one: face sex (\"male\" or \"female\"). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "S = events_df['face_sex'].to_numpy()\n", "print(S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " ToDo (1 point): For our analysis, we'll need the labels ($S$) in numeric format. Convert the string labels (\"male\", \"female\") to a numeric format (male: 1, female: 0) and store the result in a variable called S_num (which should be a numpy array).\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "a49a965d54f54f9b93eaf2636b9ad6c5", "grade": false, "grade_id": "cell-11d325a59d869f56", "locked": false, "schema_version": 3, "solution": true, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Implement your ToDo here. '''\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "1436ae14b864d3fb5a40999a60a99ce4", "grade": true, "grade_id": "cell-516938df38717e06", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Tests the above ToDo. '''\n", "from niedu.tests.nipa.week_3 import test_str2num\n", "test_str2num(S, S_num)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preprocessing\n", "In terms of preprocessing, there are a couple of things that you need to keep in mind when planning to use representational similarity analyses. First, while standardization (ensuring zero mean and unit standard deviation for each brain feature) is a common preprocessing step in decoding analyses, it is somewhat of a controversial for RSA (see e.g. [this article](https://www.frontiersin.org/articles/10.3389/fnins.2013.00174/full)). As such, we are not going to apply standardization to our data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Multivariate noise normalization\n", "In week 1, we discussed univariate noise normalization, i.e., dividing each brain feature's activity estimate ($\\hat{\\beta}$) by the standard deviation of the noise ($\\hat{\\sigma}$), which allows you to \"downweigh\" noisy voxels. 
Specifically for RSA, some people use *multivariate* noise normalization, which additionally incorporates the noise *covariance* between voxels. Like the temporal \"uncorrelation\" method we discussed in week 1, multivariate noise normalization effectively uncorrelates the data, yet this time in the *spatial* dimension. One often-cited reason for multivariate noise normalization in representational similarity analyses is that some distance metrics (discussed later) assume that the brain features (e.g., voxels) of the data are independent.\n", "\n", "So, before going on, let's first load in the residuals from run 1:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# First, let's download them!\n", "print(\"Downloading residuals for ses-1, run-1, sub-03 (+- ... MB) ...\")\n", "!aws s3 sync --no-sign-request s3://openneuro.org/ds003965 {data_dir} --exclude \"*\" --include \"derivatives/pattern_estimation/sub-03/ses-1/model/*task-face*run-1*residuals.nii.gz\"\n", "print(\"\\nDone!\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_dir = patterns_dir.replace('patterns', 'model')\n", "resids_path = os.path.join(model_dir, 'sub-03_ses-1_task-face_run-1_space-MNI152NLin2009cAsym_desc-model_residuals.nii.gz')\n", "# We immediately apply the FFA mask to the residuals\n", "resids = masking.apply_mask(resids_path, ffa_mask)\n", "print(\"Shape of masked residuals (TxK):\", resids.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In week 1, we computed the voxelwise noise standard deviation by simply calling the numpy std function (or method) on our time axis:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "noise_std = np.std(resids, axis=0)\n", "\n", "plt.figure(figsize=(12, 3))\n", "plt.plot(noise_std)\n", "plt.xlabel(\"Voxel\", fontsize=15)\n", "plt.ylabel(r'$\\hat{\\sigma}$', fontsize=20)\n", "plt.xlim(0, 
noise_std.size)\n", "sns.despine()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, we can also get the voxelwise noise standard deviation by computing the noise variance-covariance matrix and extracting the (square root of the) diagonal, because the diagonal represents the variance of the voxels (the \"covariance with itself\", so to speak).\n", "\n", "Let's first compute the covariance matrix:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We need to transpose (.T) the residuals,\n", "# otherwise we'd get a TxT covariance matrix\n", "noise_cov = np.cov(resids.T, bias=True)\n", "plt.imshow(noise_cov)\n", "plt.title(\"Noise covariance matrix\", fontsize=18)\n", "plt.xlabel('Voxels', fontsize=15)\n", "plt.ylabel('Voxels', fontsize=15)\n", "plt.colorbar()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just to convince you that the (square root of the) diagonal is the same as the standard deviation we computed earlier:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "noise_std_from_cov = np.sqrt(np.diag(noise_cov))\n", "fig, axes = plt.subplots(ncols=2, figsize=(15, 4))\n", "axes[0].plot(noise_std)\n", "axes[1].plot(noise_std_from_cov, c='tab:orange')\n", "axes[0].set_xlabel(\"Voxel\", fontsize=15)\n", "axes[1].set_xlabel(\"Voxel\", fontsize=15)\n", "axes[0].set_ylabel(r'$\\hat{\\sigma}$', fontsize=20)\n", "fig.tight_layout()\n", "sns.despine()\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The reason we need the variance-covariance matrix is that for *multivariate* noise normalization, we use the *full* matrix, i.e., including the off-diagonal elements (the covariance between voxels). 
To do so, we first need to compute the *whitening* matrix, which is often denoted by $D$ and is computed by taking the square root of the inverse of the estimated variance-covariance matrix ($\\hat{\\Sigma}$):\n", "\n", "\\begin{align}\n", "D = \\hat{\\Sigma}^{-\\frac{1}{2}}\n", "\\end{align}\n", "\n", "Whitening using this particular whitening matrix ($\\hat{\\Sigma}^{-\\frac{1}{2}}$) is also called [ZCA or Mahalanobis whitening](https://en.wikipedia.org/wiki/Whitening_transformation). To apply this whitening matrix to our patterns ($\\mathbf{R}$), we simply take the dot product between the patterns and the whitening matrix:\n", "\n", "\\begin{align}\n", "R_{\\mathrm{mnn}} = RD\n", "\\end{align}\n", "\n", "where $R_{\\mathrm{mnn}}$ is the multivariate noise normalized pattern matrix. Note that this operation is *very* similar to the temporal uncorrelation method, but instead of uncorrelating the trials (i.e., the rows of the pattern matrix), it uncorrelates the brain features (i.e., the columns of the pattern matrix).\n", "\n", "Let's first compute the whitening matrix. 
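As a quick sanity check on the formula above, here is a standalone toy sketch with made-up data (independent of our fMRI patterns; all variable names here are invented for illustration). Whitening synthetic correlated data with $D = \hat{\Sigma}^{-\frac{1}{2}}$ should produce data whose sample covariance is (numerically) the identity matrix:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(42)

# Made-up data: 500 "timepoints" of 4 correlated "voxels"
mixing = rng.standard_normal((4, 4))
X = rng.standard_normal((500, 4)) @ mixing

Sigma = np.cov(X.T, bias=True)    # 4 x 4 sample covariance
D = sqrtm(np.linalg.inv(Sigma))   # ZCA/Mahalanobis whitening matrix
X_white = X @ D

# After whitening, the sample covariance is (up to rounding) the identity
print(np.round(np.cov(X_white.T, bias=True), 3))
```

The same logic is applied to the real patterns below, with the noise covariance estimated from the residuals.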
We can use the matrix square root function sqrtm from the scipy.linalg package:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# square root of inv = ^(-1/2)\n", "from scipy.linalg import sqrtm\n", "D = sqrtm(np.linalg.inv(noise_cov))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And to get the multivariate noise normalized patterns, we compute the dot product:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "R_mnn = R @ D" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's visualize the unnormalized, univariate noise normalized, and multivariate noise normalized patterns:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "R_unn = R / noise_std\n", "\n", "fig, axes = plt.subplots(ncols=3, figsize=(13, 3))\n", "axes[0].imshow(R)\n", "axes[0].set_title(\"No normalization\", fontsize=15)\n", "axes[1].imshow(R_unn)\n", "axes[1].set_title(\"UNN\", fontsize=15)\n", "axes[2].imshow(R_mnn)\n", "axes[2].set_title(\"MNN\", fontsize=15)\n", "for i in range(3):\n", " axes[i].set_xlabel(\"Voxels\", fontsize=15)\n", "axes[0].set_ylabel(\"Trials\", fontsize=15)\n", "fig.tight_layout()\n", "fig.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, our patterns consist of only 64 voxels. Often, however, you might want to use patterns with (many) more voxels. For multivariate noise normalization, which uses the full $K \\times K$ variance-covariance matrix, using more brain features ($K$) than samples ($N$) often leads to a very unstable variance-covariance matrix (or, in technical terms, a matrix that is singular and thus cannot be inverted). One trick that makes the variance-covariance matrix estimation more stable is to apply regularization (sometimes called \"shrinkage\"). 
This regularization will \"shrink\" the matrix more towards the identity matrix ($I$, i.e., a matrix with all zeros except the diagonal, which contains ones) when the ratio between brain features and samples becomes larger.
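To build some intuition for what shrinkage does, here is a minimal numpy sketch with made-up data. Note that the shrinkage weight `alpha` is hand-picked here purely for illustration; a proper estimator such as Ledoit-Wolf chooses this weight analytically from the data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unstable regime: fewer samples (10) than features (20)
X = rng.standard_normal((10, 20))
S = np.cov(X.T, bias=True)  # 20 x 20 sample covariance (singular here)

# Shrink towards a scaled identity target
alpha = 0.5  # hand-picked weight, for illustration only
target = np.trace(S) / S.shape[0] * np.eye(S.shape[0])
S_shrunk = (1 - alpha) * S + alpha * target

# The raw estimate is rank-deficient; the shrunk one has full rank
print("rank of S:       ", np.linalg.matrix_rank(S))
print("rank of S_shrunk:", np.linalg.matrix_rank(S_shrunk))
```

Because the shrunk matrix is invertible, the whitening matrix $D$ from the previous section can be computed even when $K > N$.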
\n", " \n", "One such shrinkage method is the \"Ledoit-Wolf\" covariance estimator. Below, we import this function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.covariance import ledoit_wolf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " ToDo (1 point): Read through the documentation of the ledoit_wolf function and then use it to compute the regularized covariance matrix of our pattern matrix and subsequently use this to multivariate noise normalize our pattern matrix. Store the result in a new variable called R_mnn_reg.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "ebd0422954133235bfdb1a67614995de", "grade": false, "grade_id": "cell-e36ec955d235d7be", "locked": false, "schema_version": 3, "solution": true, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Implement your ToDo here. '''\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "14fda138ce878ab1761565359a2f450d", "grade": true, "grade_id": "cell-b6a867862b399a43", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Tests the ToDo above. '''\n", "from niedu.tests.nipa.week_3 import test_ledoit_wolf \n", "test_ledoit_wolf(R, resids, R_mnn_reg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Neural RDMs\n", "The first step in a representational similarity analysis is to create a \"neural representational dissimilarity matrix\" (RDM). This matrix is a symmetric $N \\times N$ matrix which represents how \"dissimilar\" patterns of different samples (i.e., rows in your pattern matrix, $\\mathbf{R}$). Note that the samples could be trials (e.g., \"face 1\", \"face 2\", \"face 3\", etc.) but could also be conditions (e.g., \"faces\", \"houses\", \"objects\", etc.). In our case, we're focusing on trials.\n", "\n", "For example, suppose we have only data from four trials (i.e., four rows in our pattern matrix $\\mathbf{R}$). 
A corresponding $4\\times 4$ RDM could look like the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# some made up RDM\n", "example_rdm = np.array([\n", " [0, 7.2, 1.2, 5.2],\n", " [7.2, 0, 3.1, 4.7],\n", " [1.2, 3.1, 0, 6.5],\n", " [5.2, 4.7, 6.5, 0]\n", "])\n", "\n", "plt.imshow(example_rdm)\n", "plt.yticks(np.arange(4))\n", "plt.xlabel(\"Trials\", fontsize=15)\n", "plt.ylabel(\"Trials\", fontsize=15)\n", "plt.title(\"Example RDM\", fontsize=20)\n", "cbar = plt.colorbar()\n", "cbar.ax.set_ylabel('Dissimilarity', fontsize=15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example RDM, each cell represents a particular distance between two patterns. For example, the cell in the bottom left corner represents the dissimilarity between trial 1 and trial 4. Note that the RDM is symmetric, because the dissimilarity between trial 1 and 4 is the same as that between trial 4 and 1; also, the cells on the diagonal are all zero, as they represent the distance between a particular pattern and itself, which is zero! Note that these two properties (symmetry, zero diagonal) are only true when you compute your RDM on trials within the same run (unlike the between-run pattern distances, discussed previously).\n", "\n", "For now, we'll stick with within-run RDMs (as they're a little easier to compute).\n", "\n", "In a way, you can think about RDMs as \"inverse\" correlation matrices in which cells do not represent correlations (a kind of *similarity* metric) but distances.\n", "\n", "By now, you might ask yourself: \"but how do you actually compute these dissimilarities?\" Well, this depends on what *distance metric* you use! There are many different functions you can use to quantify the dissimilarity between two vectors (i.e., two rows in our pattern matrix). 
Actually, reflecting the intuition that an RDM is basically the inverse of a correlation matrix, one metric that is sometimes used is the $1-r$ distance\\*. This distance simply quantifies the dissimilarity between two patterns as 1 minus their correlation. For example, the $1-r$ distance between \"pattern A\" and \"pattern B\" is $1-\\mathrm{corr}(\\mathrm{pattern\\ A}, \\mathrm{pattern\\ B})$.\n", "\n", "---\n", "\\* In the RSA literature, some people use the *cosine distance*, which is based on the angle between two vectors; when the patterns are mean centered (i.e., the rows in $\\mathbf{R}$ have a mean of 0), this is exactly the same as the $1-r$ distance!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
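The footnote's claim is easy to check numerically with two made-up vectors (a small sketch using scipy's distance functions; note that scipy's `correlation` distance is its name for the $1-r$ distance):

```python
import numpy as np
from scipy.spatial.distance import cosine, correlation

rng = np.random.default_rng(1)
a, b = rng.standard_normal(50), rng.standard_normal(50)

# The 1 - r distance mean-centers internally, so centering changes nothing
a_c, b_c = a - a.mean(), b - b.mean()
print(np.isclose(correlation(a, b), correlation(a_c, b_c)))  # True

# For mean-centered patterns, the cosine distance equals the 1 - r distance
print(np.isclose(cosine(a_c, b_c), correlation(a, b)))  # True
```

For patterns that are *not* mean centered, the two metrics generally differ.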
\n", " ToDo (1 point): For our data (the variable R), compute the RDM using the $1 -r$ distance and store it in a variable named rdm_1minr. Note: no need for a for-loop! (You might want to check out the np.corrcoef function.)\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "3cd8c64b87eab3cef0169986ab56537a", "grade": false, "grade_id": "cell-c3b557a3e217662d", "locked": false, "schema_version": 3, "solution": true, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Implement your ToDo here. '''\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "e26208fb216bb99d281d4df9d9b168aa", "grade": true, "grade_id": "cell-cc4e63642875bc97", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Tests the above ToDo. '''\n", "from niedu.tests.nipa.week_3 import test_1minr \n", "test_1minr(R, rdm_1minr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another often-used distance metric used for RDMs, and perhaps the most intuitive one, is the Euclidean distance. This distance is computed as the square root of the sum of squared distances between two patterns (e.g., $p$ and $q$) consisting of $K$ elements:\n", "\n", "\\begin{align}\n", "\\delta_{euclidean} = \\sqrt{\\sum_{j=1}^{K}{(p_{j} - q_{j})^{2}}}\n", "\\end{align}\n", "\n", "Below, we define two example patterns with four features and compute the Euclidean distance between them:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "p = np.array([1, 3, 8, -4])\n", "q = np.array([5, -3, 2, 1])\n", "\n", "# No need for a for loop!\n", "euc_dist = np.sqrt(np.sum((p - q)**2))\n", "print(\"Euclidean distance between p and q:\", euc_dist)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " ToDo (2 points): While all pairwise distances were easy to calculate using the $1-r$ distance, it takes a little more code to do this using the Euclidean distance. Compute the RDM based on the Euclidean distance for our data (the variable R) and store this in a variable named rdm_euc. Also visualize the RDM using the pyplot imshow function. Unless you're a linear algebra wizard, you need to use for loops to compute the RDM. Do not use any external functions (beyond numpy). Hint: pre-allocate your RDM first and then fill it cell by cell in a nested for loop.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "f258e6587105a6a492a105d0d5bd1dab", "grade": false, "grade_id": "cell-38a3aa078cf54f70", "locked": false, "schema_version": 3, "solution": true, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Implement your ToDo here. '''\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6d4331401a7057166e2cfac08af352b1", "grade": true, "grade_id": "cell-8ba94c2fa75e9255", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Tests the above ToDo. '''\n", "from sklearn.metrics import euclidean_distances\n", "np.testing.assert_array_almost_equal(euclidean_distances(R), rdm_euc, decimal=3)\n", "print(\"Well done!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you might have seen in the test cell, scikit-learn actually provides several functions to quickly compute distance matrices (RDMs) using various distance metrics. 
We recommend using the generic pairwise_distances function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.metrics import pairwise_distances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To compute an $N\\times N$ distance matrix from an $N\\times K$ pattern array ($\\mathbf{R}$), you can use it as follows:\n", "\n", "```python\n", "rdm = pairwise_distances(R, metric='name_of_metric')\n", "```\n", "\n", "For example, to compute an RDM based on the \"cosine\" distance (which is similar to the $1-r$ distance), you can run:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rdm_cosine = pairwise_distances(R, metric='cosine')\n", "\n", "plt.imshow(rdm_cosine)\n", "plt.xlabel(\"Trials\", fontsize=15)\n", "plt.ylabel(\"Trials\", fontsize=15)\n", "plt.title(\"Cosine-based RDM\", fontsize=20)\n", "cbar = plt.colorbar()\n", "cbar.ax.set_ylabel('Cosine distance', fontsize=15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that there is no agreed-upon \"best\" distance metric! If you want to know more about the different (dis)similarity metrics for pattern analyses, check out [this article](https://link.springer.com/article/10.1007/s42113-019-00068-5).\n", "\n", "For the next couple of sections, we'll use the Euclidean distance-based RDM:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rdm_R = pairwise_distances(R, metric='euclidean')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " ToThink (1 point): Euclidean and $1-r$/cosine distances differ in one major aspect related to the \"type\" of information they can encode/pick up. What do you think this is?\n", "
" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "cell_type": "markdown", "checksum": "5a29df01772eb1575ba08e02cae1b2e7", "grade": true, "grade_id": "cell-c0afc38470e4125d", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploratory analysis using MDS\n", "Most applications of RSA involve relating *neural* RDMs with RDMs based on experimental features (which we'll discuss in the next section). However, you can also do exploratory analyses on your neural RDM only! This is usually done by investigating the (dis)similarity structure (or \"representational geometry\" in RSA terms), usually in a 2D or 3D space. \n", "\n", "As pattern analyses are often applied to very high-dimensional data (i.e., patterns with many brain features, $K$), people often project the data (i.e., the patterns) into a lower dimensional space. We already encountered one such method in week 1: PCA! However, when interested in the (dis)similarity structure of your data, *multidimensional scaling* (MDS) is more appropriate. Just like PCA, this technique aims to create combinations of features into a lower-dimensional subset of components, such that the high-dimensional distances are presented as much as possible in the lower-dimensional space. For example, if the distance between A and B is 436 in high-dimensional space (e.g., $K=500$), MDS tries to create a lower-dimensional space (usually 2 components) in which the distance between A and B is as close as possible to 436 (as well as all other distances between patterns).\n", "\n", "Of course, scikit-learn contains an implementation of MDS that uses the familiar fit/transform methods. 
Importantly, it can take in an $N\\times K$ matrix (like our pattern matrix $\\mathbf{R}$) and compute the high-dimensional distance structure (i.e., the RDM) internally, or you can give it your precomputed RDM. In the latter case, you need to initialize it with dissimilarity='precomputed', which is what we're going to do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.manifold import MDS\n", "mds = MDS(dissimilarity='precomputed', n_components=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we call the fit_transform method to compute a lower-dimensional ($K=2$) representation of the data (which is, again, an $N \\times K$ array, but this time, $K=2$!):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mds_R = mds.fit_transform(rdm_R)\n", "print(\"Shape of mds_R:\", mds_R.shape)\n", "plt.figure(figsize=(8, 5))\n", "plt.grid()\n", "plt.scatter(mds_R[:, 0], mds_R[:, 1])\n", "plt.xlabel('MDS component 1', fontsize=20)\n", "plt.ylabel('MDS component 2', fontsize=20)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " ToDo (1 point): Below, re-plot the MDS scatterplot, but this time, color the datapoints (i.e., the trial patterns) according to their trial onset (you can set the color of the points with c parameter in the scatter function), which you can extract from the events_df dataframe. This way, datapoints (i.e., patterns) with a similar onset have a similar hue.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "12bd81953e9dc46753c47eb5dbb1b46e", "grade": true, "grade_id": "cell-5e9e54173878c3e3", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " ToThink (1 point): Which phenomenon (that we discussed before) is clearly visible in this low-dimensional embedding of the data? \n", "
" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "cell_type": "markdown", "checksum": "6c634d6984186a43dd5410c48184b503", "grade": true, "grade_id": "cell-0cdeecd84c5b61e9", "locked": false, "points": 1, "schema_version": 3, "solution": true, "task": false } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of coloring the datapoints according to some experimental property/feature (such as onset), an even more potent way to visualize MDS-embeddings is to plot the actual images that correspond with the trial patterns (here: the faces shown to the subject)! This way, you may find patterns in the data that you might not have thought of!\n", "\n", "However, in our opinion, the true strength of RSA lies in their ability to test hypotheses about complex representational structures using experimental features, which is discussed next." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Categorical RDMs\n", "In most pattern analyses, we'd like to evaluate the (possibly) association between experimental features and brain patterns. In RSA, this is done by comparing neural RDMs (discussed in the previous sections) with RDMs based on experimental features. These experimental feature RDMs (let's call them \"feature RDMs\") are constructed in largely the same way as neural RDMs: by computing the pairwise distance between samples!\n", "\n", "The experimental features ($P$) that you use for your feature RDM of course depend on your hypothesis! Importantly, unlike decoding models, RSA naturally handles high-dimensional feature spaces very well. Like neural RDMs, no matter how many features you use, you'll always analyze the resulting $N\\times N$ RDM! \n", "\n", "After constructing your feature RDM, you can test whether the \"geometry\" of your hypothesized feature space matches the geometry of your brain patterns. 
In other words, you test whether the pattern of distances is similar in your brain data and your experimental features. Technically, you can use *any* (set of) feature(s) that you believe match the geometry of the corresponding brain patterns. How to actually test this will be discussed in section 5.\n", "\n", "In this section, we'll focus on the most straightforward type of feature RDM: the categorical RDM. This RDM essentially tests whether patterns belonging to the same condition are more similar than patterns belonging to different conditions (note the similarity to decoding models). For example, we could hypothesize that the FFA represents face gender, which should accordingly lead to relatively small neural distances between images of the same face gender and relatively large neural distances between images of different face genders. Before delving into how we should construct a corresponding feature RDM, let's first define our experimental feature: face gender. In the first section, we already extracted this from the events file. Now, let's convert it to a numeric format:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import LabelEncoder\n", "face_gen = LabelEncoder().fit_transform(S)\n", "print(\"Face gender, numeric:\", face_gen)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, to capture this feature in an RDM, we can construct an RDM with zeros in cells corresponding to trials of the same condition (both male or both female faces) and ones everywhere else. This captures the hypothesis that trials should have a smaller distance when they are of the same condition (0) than when they are of different conditions (1). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " ToDo (1 point): Create this face-gender RDM and store it in a variable named rdm_fg, an $N \\times N$ numpy array. Do not use any external functions for this. \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "064e0696b864162b2653a6b88b9a8097", "grade": false, "grade_id": "cell-f665a90f90a51748", "locked": false, "schema_version": 3, "solution": true, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Implement your ToDo here. '''\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "ede5ef16701b4274566f5b8888e84c5b", "grade": true, "grade_id": "cell-947cad7c5de5ea9b", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Tests the above ToDo. '''\n", "from niedu.tests.nipa.week_3 import test_rdm_fg\n", "test_rdm_fg(face_gen, rdm_fg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Technically, to create this categorical RDM (with only two levels), you can also use the pairwise_distances function with the \"manhattan\" metric (the sum of *absolute* distances):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Note the np.newaxis, which is needed because pairwise_distances\n", "# assumes that the input is 2D\n", "rdm_fg_pd = pairwise_distances(face_gen[:, np.newaxis], metric='manhattan')\n", "plt.imshow(rdm_fg_pd)\n", "plt.colorbar()\n", "plt.title(\"Categorical RDM\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Continuous/computational RDMs\n", "The categorical RDMs discussed in the previous section are the most simple implementation of feature RDMs, based on only a single categorical feature. 
RSA really shines, though, when relating more complex feature sets (sometimes called \"feature spaces\") consisting of multiple (continuous) variables. \n", "\n", "For example, for our data, we have subject-specific ratings of dominance, trustworthiness, and attractiveness for all of the faces shown to (the same) subjects. If we wanted to investigate whether a particular brain region represents these face properties (you might call them \"[social judgements](https://www.sciencedirect.com/science/article/abs/pii/S0959438813000147)\"), we could create a feature RDM based on these three features; in fact, we'll do that below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sj = events_df.loc[:, ['subject_dominance', 'subject_trustworthiness', 'subject_attractiveness']].to_numpy()\n", "rdm_sj = pairwise_distances(sj, metric='euclidean')\n", "\n", "plt.imshow(rdm_sj)\n", "plt.title(\"Social judgements RDM\", fontsize=15)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note, though, that all information about the individual features (attractiveness, trustworthiness, and dominance) is lost in this RDM! The feature RDM should represent the geometry of the entire feature space, not the effects of the individual features on brain activity (although we discuss a technique that allows for this type of inference in the next section). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
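To make this loss of information concrete, here is a small sketch with made-up ratings (the arrays `feat_a` and `feat_b` are hypothetical, not the actual `sj` data): two clearly different feature matrices can produce exactly the same Euclidean RDM, because pairwise distances are unaffected by shifting each feature by a constant.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(42)

# Hypothetical ratings: 6 trials x 3 features (NOT the real sj data)
feat_a = rng.normal(size=(6, 3))

# Shift each feature by a different constant: the feature values change,
# but all pairwise differences (and thus Euclidean distances) stay the same
feat_b = feat_a + np.array([10.0, -5.0, 3.0])

rdm_a = pairwise_distances(feat_a, metric='euclidean')
rdm_b = pairwise_distances(feat_b, metric='euclidean')
print(np.allclose(rdm_a, rdm_b))  # True: the RDM cannot tell feat_a and feat_b apart
```

In other words, an RDM encodes only the relative geometry of the samples; any aspect of the features that does not change the pairwise distances (here, the feature means) is invisible to RSA.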
\n", " ToDo (2 point): Note that as long as you can formalize your hypothesis into a feature matrix ($N \\times P$), you can create an RDM from it (and test it with RSA)! Suppose that I believe that social judgements are not represented in a bipolar fashion (from very unattractive to very attractive, very untrustworthy to very trustworthy, etc.; like we assumed before) but in an unipolar fashion relative from the mean rating. For example, this would mean that a face with an attractiveness/trusworthiness/dominance rating of -4 would be represented in the same way as a face with a rating of 4 (assuming a mean rating of 0) Moreover, suppose I believe that this representation is quadratic, not linear. For example, a face with an attractiveness rating of 2 is four times as attractive as a face with an attractiveness rating of 1. Using the social judgement data (i.e., the sj variable), create an RDM that represents this hypothesis.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "916924b6fd0ca2294e801926e83958e1", "grade": false, "grade_id": "cell-d559d825c36675e0", "locked": false, "schema_version": 3, "solution": true, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Implement the ToDo here. '''\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "169f734365a52c30cf1088308da79611", "grade": true, "grade_id": "cell-78c39806133dba91", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Tests the above ToDo. '''\n", "from niedu.tests.nipa.week_3 import test_rdm_sj2\n", "test_rdm_sj2(sj, rdm_sj2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "RSA also lends itself very well for testing computational models, i.e., models that yield a (set of) feature(s) that were directly computed from the data. For example, in vision science, there are many computational models that yield a set of (visual) feature that are computed from the image directly (i.e., from the pixels). Again, as long as you can specify a set of features that embody your hypothesis, you can create an RDM from it! \n", "\n", "In the next ToDo, you're going to practice a bit to get into this \"computational mindset\" by applying a very simple (and theoretically meaningless) computational model to the face stimuli. 
In the current directory there is a subfolder stims, which contains all the stimuli from the first run:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "imgs = sorted(glob(os.path.join('stims', '*.jpg')))\n", "print(imgs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To load an image as a numpy array, we can use the imageio library. Note that the imread function returns a 3D numpy array, where the first two dimensions represent height and width, and the third dimension represents the three color channels (red, green, and blue)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import imageio\n", "\n", "# Load the first stimulus as an example (imread expects a single path, not a list)\n", "example_stim = imageio.imread(imgs[0])\n", "print(\"Shape of image data:\", example_stim.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " ToDo (2 points): Suppose that I have a very naive theory about color representation in the brain. Specifically, suppose that I think that a particular brain area simply represents stimuli according to their average redness, greenness, and blueness. In other words, I believe that stimuli with similar average red/green/blue values have similar brain patterns. For the forty stimuli from run 1, construct an RDM that represents this hypothesis and store it in a variable named rdm_rgb (a $40\\times 40$ array). Use a Euclidean distance metric. Hint: before constructing your RDM, your feature matrix should be of shape $40 \\times 3$.\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "c6767c84d1f3755a3da70df9e6e74f11", "grade": false, "grade_id": "cell-df7ba50d7cbe4366", "locked": false, "schema_version": 3, "solution": true, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Implement your ToDo here. '''\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "97aa2d81cdbacea790719a684e2cf087", "grade": true, "grade_id": "cell-1e1db6ac3d3e3df1", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false }, "tags": [ "raises-exception", "remove-output" ] }, "outputs": [], "source": [ "''' Tests the above ToDo. '''\n", "from niedu.tests.nipa.week_3 import test_rdm_rgb \n", "test_rdm_rgb(imgs, rdm_rgb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just by eye, it's difficult to judge whether the neural RDM contains a similar representational geometry as the feature RDMs we created earlier. Fortunately, we have statistics!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing RDMs\n", "In this section, we'll discuss how to evaluate the \"fit\" of feature RDMs.\n", "\n", "### Correlation-based tests\n", "Alright, so now we got two RDMs: the feature RDM and the neural RDM. To evaluate to what extent the two RDMs share the same representational geometry, we can simply correlate them! Before doing so, we have to do one more thing: extract the lower (or upper) triangle of the RDM. This is because RDMs are symmetric: the values above and below the diagonal are exactly the same. If we used the entire (flattened) RDM, we'd \"artifically\" create twice as many datapoints (i.e. 
the pairwise dissimilarities) as there really are, which would inflate the significance of the correlation between the RDMs because of the increased sample size. So, instead of using all $N\\cdot N$ pairwise dissimilarities from the RDM, we need to extract only the flattened $N\\times (N-1)/2$ unique pairwise dissimilarity values, the \"representational dissimilarity vector\" (RDV) if you will. This means that we do not include the diagonal!\n", "\n", "Fortunately, there is a function that easily extracts the lower triangle of a square distance matrix: squareform (from the scipy.spatial.distance module):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy.spatial.distance import squareform\n", "\n", "# Let's extract the RDV from our neural RDM\n", "rdv_R = squareform(rdm_R.round(5))\n", "print(\"Shape rdv_R:\", rdv_R.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
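As a quick sketch of how squareform behaves (using random patterns rather than our actual data), note that it works in both directions: it converts a vector of the $N(N-1)/2$ unique dissimilarities into a square RDM, and a square, symmetric RDM with a zero diagonal back into that vector. The helper pdist (also from scipy.spatial.distance) computes the pairwise distances directly in vector form.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))        # 40 hypothetical trial patterns

rdv = pdist(X, metric='euclidean')  # the 780 unique pairwise distances, as a vector
rdm = squareform(rdv)               # RDV -> square 40 x 40 RDM
rdv_again = squareform(rdm)         # square RDM -> RDV (diagonal excluded)

print(rdm.shape, rdv_again.shape)        # (40, 40) (780,)
print(np.array_equal(rdv, rdv_again))    # True: squareform is its own inverse
```

Note that the round trip is lossless precisely because the diagonal (all zeros) and the redundant upper triangle carry no extra information.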
\n", " Tip: Sometimes, an RDM might not be exactly symmetric due to floating point inaccuracies, which will give an error my trying to extract the lower triangle using squareform. To circumvent this, you can round the RDM values to, e.g., 5 decimals using the .round(decimals) array method.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the shape of the rdv_R is as expected: $40 \\times (40-1) / 2 = 780$. Let's do the same for the face-gender RDM we created earlier (rdm_fg_pd):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rdv_fg = squareform(rdm_fg_pd)\n", "print(\"Shape rdv_fg:\", rdv_fg.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Importantly, the correlation between feature and neural RDMs is often evaluated using a rank-based correlation metric. For continuous feature RDMs, this is usually the Spearman correlation, but for categorical feature RDMs (such as our face-gender RDM), often the \"Kendall Tau $\\alpha$\" correlation is used, as it deals properly with tied ranks. Implementations of both correlations are available from the scipy.stats module. Here, we'll use Kendall's Tau $\\alpha$, because our face-gender RDM is categorical:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from scipy.stats import kendalltau\n", "rdm_corr, pval = kendalltau(rdv_fg, rdv_R)\n", "print(\"Correlation between RDMs (p-value): %.3f (%.3f)\" % (rdm_corr, pval))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given the slighly positive correlation between our face-gender RDM and the FFA RDM, this means that patterns related to trials with the same face gender are slightly less dissimilar than patterns related to trials with a different face gender! " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reweighting RDVs\n", "One more advanced RSA technique is \"reweighting\". This technique allows you to use multiple feature RDVs to explain your neural RDV. Essentially, you assume that the neural RDV can be approximated as a linear weighted sum of different feature RDVs. 
For example, for two feature RDVs ($\\mathrm{RDV}_{S_{1}}$ and $\\mathrm{RDV}_{S_{2}}$):\n", "\n", "\\begin{align}\n", "\\mathrm{RDV}_{R} = \\beta_{0} + \\mathrm{RDV}_{S_{1}}\\beta_{1} + \\mathrm{RDV}_{S_{2}}\\beta_{2} + \\epsilon\n", "\\end{align}\n", "\n", "You might recognize this formulation as a linear model (GLM) with the neural RDV as dependent variable and the feature RDVs as independent variables. Here, the parameters ($\\beta$) represent the \"reweighting\" factors. This technique is very useful to disentangle the contributions of different (possibly correlated) feature spaces. Note that, often, a variant of ordinary least squares (OLS) is used to determine the parameters: non-negative least squares (NNLS), which forces the parameters to be non-negative (for details about why NNLS should be used, see [this article](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003915); for more information about reweighting in general, see [this article](https://www.sciencedirect.com/science/article/pii/S0028393215301998#bib5)). \n", "\n", "After the reweighted RDV ($\\mathrm{RDV}_{S}\\hat{\\beta}$) is computed, it can again be evaluated using a (rank-based) correlation: $r(\\mathrm{RDV}_{S}\\hat{\\beta}, \\mathrm{RDV}_{R})$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
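The whole procedure can be sketched with synthetic RDVs (all arrays below are made-up stand-ins, not our actual feature or neural RDVs):

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_pairs = 780  # number of pairwise dissimilarities for N = 40 trials

# Synthetic feature RDVs and a "neural" RDV that is (noisily) a weighted sum of them
rdv_s1 = rng.random(n_pairs)
rdv_s2 = rng.random(n_pairs)
rdv_neural = 0.5 + 2.0 * rdv_s1 + 0.5 * rdv_s2 + rng.normal(scale=0.1, size=n_pairs)

# Design matrix: intercept plus the two feature RDVs
X = np.column_stack([np.ones(n_pairs), rdv_s1, rdv_s2])

# NNLS estimates the reweighting parameters, constrained to be non-negative
betas, _ = nnls(X, rdv_neural)
rdv_reweighted = X @ betas

# Evaluate the reweighted RDV against the "neural" RDV with a rank correlation
corr, _ = spearmanr(rdv_reweighted, rdv_neural)
print("Betas:", betas.round(2), "| Spearman r: %.3f" % corr)
```

Because the synthetic "neural" RDV was constructed as a weighted sum of the two feature RDVs, NNLS should recover parameters close to the true weights (0.5, 2.0, 0.5), and the reweighted RDV should correlate highly with the neural RDV.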
\n", " ToDo (2 points): Previously, we defined two different feature RDMs: one based on face-gender (rdm_fg) and another one based on social judgements (rdm_sj). While theoretically meaningless, perform a reweighting analysis with the two feature RDVs as independent variables (in addition to an intercept!) and the neural RDV (based on rdm_R) as dependent variable. You can use the nnls implementation from scipy.optimize to perform NNLS. Note that the nnls function returns two things — the first object is the array with parameters, and it takes two arguments: the design matrix (independent variables) and the dependent variable. Compute the Spearman correlation between the reweighted RDV and the neural RDV and store the result in a variable named corr_reweighted_analysis.\n", "
\n", " ToThink (2 points): Suppose that you have a very high-dimensional experimental feature space (e.g., $P=10,000$) and you decide, instead of using a single feature RDM, to use an individual RDM for each feature and analyze the neural RDM in with a reweighted analysis. Explain why, while this would guarantee amazing results, this is probably not a good idea and propose a practical solution. \n", "