8 Discussion

Psychology and cognitive neuroscience are relatively young scientific fields. Using the empirical cycle, these fields have developed and tested many different theories about the human mind and brain over the years, almost exclusively by means of hypothesis testing. In the past twenty years or so, we have seen a (renewed) interest in a more computational, predictive approach that complements the traditional hypothesis testing approach (Yarkoni & Westfall, 2017). In this thesis, inspired by techniques and models from machine learning, I explored how this predictive approach can be applied and adapted to behavioral research (chapters 6 and 7) and neuroimaging research (chapters 2 and 3). This thesis also addresses the importance of large, publicly accessible datasets, an example of which is described in chapter 4. Finally, highlighting the fact that arguably the most effective scientific methodology embraces both predictive modelling and hypothesis testing, chapter 5 outlined an example of a fully preregistered, confirmatory neuroimaging study. In the general introduction to this thesis, I described how predictive modelling can complement the hypothesis testing approach in psychology and cognitive neuroscience; in what follows, I will outline what I think is necessary to facilitate the adoption of this approach.

An important step in the adoption of a “predictive approach” is changing the way we derive research questions and hypotheses. The hypothetico-deductive tradition in psychology and cognitive neuroscience has taught us that research questions should be derived from theories and, ideally, be answerable using statistical tests of binary hypotheses (Kellen, 2019). The consequence of this approach is that research questions only capture a very specific part of a target system. Instead, I believe we should steer our research questions towards the mechanisms behind particular cognitive capacities and behaviors (Rooij & Baggio, 2021). For example, instead of investigating whether certain categorical emotions are universally recognized, one could try to construct a model of emotion recognition and show whether and how culture affects this model (Jack et al., 2009). As chapter 2 in this thesis shows, however, studies may feature both hypothesis tests and (elements of) predictive models. I think that if one aims to adopt a predictive approach, a good rule of thumb is to ask oneself whether the research question can be answered with “yes” or “no”; if that is the case, I would recommend rephrasing the question such that it revolves around the word “how” and cannot be answered with a simple “yes” or “no”.

Adopting a more predictive approach also means that we should perhaps not let theory guide our research as much as in the hypothesis testing approach. Of course, theories may inspire elements of predictive models (e.g., constructionism tells us that we should not limit our models of emotion to perceptual inputs only; Barrett et al., 2019), but they should not determine all aspects of the study design, experiment, and statistical model (Jack et al., 2017). Instead, the data used for predictive modelling, whether that is observational or experimental data, should ideally allow for exploration and comparison of different models (Gelfert, 2016). This way, the “information gain” from a single experiment or study can be much higher than when conducting a single hypothesis test.

The exploratory mindset of the predictive modelling approach calls for a different way to think about the data that we use for our models. In hypothesis testing, we usually want to limit our inferences to a single factor, which we explicitly manipulate in our experiments. For predictive modelling, on the other hand, the data should ideally vary in all the dimensions that are relevant for the capacity or behavior that is investigated. These “rich” datasets not only allow for more exploration and better generalizability, but are also less constrained (or “biased”) a priori by existing theories (Jack et al., 2017). Chapter 7 provides an illustrative example of an experimental approach to create datasets that are relatively rich and unbiased by theory. In this study, we used a “social psychophysics” approach (Jack & Schyns, 2017) to create facial expression stimuli that randomly varied in both facial movements (sampled uniformly from a large set of “action units”) and facial morphology (sampled from a large database of individuals). This allowed us to explore different models based on dynamic features (i.e., facial movement) and compare these to models based on static features (i.e., facial morphology).

The development of such rich datasets, however, may pose practical problems. One prominent issue is that, with each additional dimension that is considered, the space of the data grows exponentially, a phenomenon known as the “curse of dimensionality”. In increasingly high-dimensional spaces, randomly sampling data quickly becomes practically infeasible. One solution for this issue is to constrain the (co)variance of the data using prior information, as we did in chapter 7 by restricting facial movements to those that have been consistently shown to be important in affective face perception. For observational data, constraints on (random) sampling can also be achieved by the use of “naturalistic” or “ecologically valid” data, which have become increasingly popular in cognitive neuroscience (Nastase et al., 2020).
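To make the exponential growth concrete, the following is a minimal sketch rather than anything used in this thesis. It assumes, purely for illustration, that each stimulus dimension is discretized into a fixed number of levels; the function names `grid_cells` and `coverage` and all numbers are hypothetical. It counts the resulting cells and estimates which fraction of them a fixed budget of uniformly sampled stimuli actually visits.

```python
import random


def grid_cells(n_dims, levels_per_dim):
    """Number of distinct cells when every dimension has `levels_per_dim` levels."""
    return levels_per_dim ** n_dims


def coverage(n_dims, levels_per_dim, n_samples, seed=0):
    """Fraction of cells visited at least once by `n_samples` uniform random draws."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    visited = set()
    for _ in range(n_samples):
        # One random stimulus: an independent uniform level for every dimension.
        cell = tuple(rng.randrange(levels_per_dim) for _ in range(n_dims))
        visited.add(cell)
    return len(visited) / grid_cells(n_dims, levels_per_dim)


if __name__ == "__main__":
    # Hypothetical setup: 5 levels per dimension, 1000 sampled stimuli.
    for n_dims in (1, 2, 4, 8):
        print(n_dims, grid_cells(n_dims, 5), coverage(n_dims, 5, 1000))
```

With the sampling budget held constant, the visited fraction can never exceed the number of samples divided by the number of cells, so it necessarily shrinks as dimensions are added. This is why constraining the sampled space with prior information becomes attractive.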

Rich and naturalistic datasets by themselves are, however, not enough. As machine learning researchers have done in computer vision with datasets such as ImageNet, we should strive to collaboratively create large, rich, and, importantly, publicly available datasets that can be used as benchmarks for the development and evaluation of predictive models. An important prerequisite for this endeavor is that research communities can agree on which particular cognitive capacity or behavior should be targeted, how to operationalize this, and which stimulus or task dimensions should be sampled (Adjerid & Kelley, 2018). Although focused on hypothesis testing instead of predictive modelling, initiatives such as the Psychological Science Accelerator (Moshontz et al., 2018) and the different ManyLabs projects (Ebersole et al., 2016; Klein et al., 2018) have shown that such large-scale efforts are possible. Moreover, I think that competitions and challenges centered around these benchmark datasets can lead to rapid progress in explanation and understanding of specific cognitive capacities and behaviors, as ImageNet has done for object recognition.

The development of rich datasets affords the use of more complex predictive models, which in my opinion are necessary to capture the complex, high-dimensional nature of cognitive capacities and behavior (Jolly & Chang, 2019). Complexity, here, can mean two things. One interpretation of model complexity refers to the high dimensionality of models, i.e., models that work with data with many predictors. These high-dimensional inputs may be either directly measured or computationally derived from the data, and are subsequently related to the target variable using a (linear) model. An example of the former strategy in the context of neuroimaging is a “decoding model”, which aims to relate high-dimensional patterns of brain data to experimental features (see chapters 2 and 3). In my opinion, the latter strategy, which uses a computational model to explicitly derive model features, is more promising. In cognitive neuroscience, such computational models are known as “linearizing models” (Naselaris et al., 2011), because the model relates features resulting from a potentially non-linear computational model to the target variable using a linear model (i.e., the approach linearizes the mapping from input to the target variable). We used this approach in chapter 7, in which we used a computational model from computer vision (a 3D morphable model; Yu et al., 2012) to generate facial shape features, which were subsequently used to predict categorical emotion, valence, and arousal ratings using a logistic regression model.
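The logic of a linearizing model can be illustrated with a toy sketch. This is not the analysis from chapter 7: the feature model here is an arbitrary non-linear function (a sine) that stands in for a real computational model such as a 3D morphable model, the data are simulated without noise, and all names (`feature_model`, `fit_linear`, `predict`) are hypothetical. The point is only that the non-linear part is fixed in advance, so that the part that is actually fitted is an ordinary linear regression on the derived feature.

```python
import math


def feature_model(stimulus):
    """Stand-in for a fixed, non-linear computational model (an assumption)."""
    return math.sin(stimulus)


def fit_linear(features, targets):
    """Closed-form ordinary least squares for one feature plus an intercept."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    var = sum((f - mean_f) ** 2 for f in features)
    slope = cov / var
    intercept = mean_t - slope * mean_f
    return intercept, slope


def predict(stimulus, intercept, slope):
    """Non-linear feature extraction followed by a linear read-out."""
    return intercept + slope * feature_model(stimulus)


# Simulated, noise-free data: the target depends non-linearly on the raw
# stimulus, but linearly on the derived feature.
stimuli = [i * 0.1 for i in range(100)]
targets = [2.0 + 3.0 * math.sin(s) for s in stimuli]

# Step 1: derive features with the fixed model; step 2: fit the linear mapping.
features = [feature_model(s) for s in stimuli]
intercept, slope = fit_linear(features, targets)
```

Because the simulated targets are exactly linear in the derived feature, the fitted intercept and slope recover the generating values up to floating-point error. With real data, the quality of the fit instead indicates how well the chosen feature model captures the relevant structure, which is what makes comparing different feature models informative.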

Another, and more common, interpretation of model complexity is related to the number of model parameters. In the past, models usually limited the number of parameters to prevent overfitting, but the increase in available data and compute power in the 21st century has enabled training models with an increasingly large number of parameters. The most popular class of models resulting from these developments is known as “deep learning” models, a type of artificial neural network that maps inputs to outputs using a series of non-linear transformations (LeCun et al., 2015). Deep learning models represent the state-of-the-art in almost all domains of artificial intelligence, including object recognition, reinforcement learning, and natural language processing (Maas et al., 2021). In cognitive neuroscience, too, deep learning models tend to outperform traditional computational vision models in predicting neural activity in response to visual stimuli (Khaligh-Razavi & Kriegeskorte, 2014; Kriegeskorte, 2015).

As discussed in the general introduction to this thesis, using complex predictive models trained on observational data may feel like trading in one black box (the brain/mind) for another (a model), which does not yield an improved understanding of the investigated target system. This trade-off between prediction and explanation is to some extent unavoidable but, I would argue, a trade-off worth making. I would rather have a black box with 90% accuracy than a directly interpretable model with 10% accuracy. The reason for this preference stems from the fact that a highly predictive model can be used as a “surrogate” (or model organism; Scholte, 2018) of the target system, which can be inspected, manipulated, and experimented with in order to explain and gain understanding of it. With decades of experience with experimentation and deriving causal insights from empirical research, I think that psychologists and cognitive neuroscientists are superbly equipped for this role.

The development of a research climate that combines the strengths of both the predictive and the hypothesis testing approach is not something that will happen overnight. The predictive approach represents more than a choice of model. It requires a different type of experimental cycle, which revolves less around theory and hypotheses and more around exploration and post-hoc explanation. If we choose to focus more on prediction, we have to start asking different questions (about mechanisms, not effects); we have to embrace the complexity of human cognition and behavior and build datasets and models that reflect this complexity; and we have to sacrifice interpretability for accuracy. Like any break with tradition, this may feel uncomfortable at first, but in doing so, I believe that a promising future lies ahead.