The Narrative Brain Dataset: an fMRI dataset for the study of natural language processing in the brain

Alessandro Lopopolo, Stefan L. Frank, Antal van den Bosch, Annabel Nijhof, Roel M. Willems


We present the Narrative Brain Dataset (NBD), an fMRI dataset that was collected during spoken presentation of short excerpts of three stories in Dutch. Written versions of the stimuli are annotated with part-of-speech (PoS) tags. In addition, the texts are accompanied by stochastic (perplexity and entropy) and semantic computational-linguistic measures. The richness and unconstrained nature of the data allow the study of language processing in the brain in a more naturalistic setting than is common for fMRI studies. We hope that by making NBD available we serve the double purpose of providing useful neural data to researchers interested in natural language processing in the brain and further stimulating data sharing in the field of neuroscience of language.

Keywords: fMRI, neuro-linguistics, naturalistic stimuli, narrative, perplexity, surprisal, PoS