The Collaborative Puzzle Game corpus

Any enquiries should be sent to dknutsen@essex.ac.uk

The Collaborative Puzzle Game corpus is based on an experiment conducted in 2016 at the Department of Psychology of the University of Essex. Interactions between 30 pairs of participants (all native English speakers) were included in the corpus. The experiment is described in detail in:

Knutsen, D., Col, G., & Le Bigot, L. (submitted). An investigation of the determinants of dialogue navigation in joint activities.

Within each pair, one participant was the Director and the other participant was the Matcher. The participants played a puzzle game together. Specifically, the Director was given the solution to tangram puzzles, and the Matcher was given the pieces of the puzzle. Their task was to interact in order to enable the Matcher to recreate the puzzle based on the Director’s instructions. Each pair did two sessions of 10 minutes. During each session, the participants completed as many puzzles as possible.

Two variables were manipulated in this experiment.

The first variable was Visibility. In one condition (“With visibility”), the participants could see each other’s face as they performed the task; however, the Director could not see the Matcher’s progress, and the Matcher could not see the Director’s solutions to the puzzles, as they were separated by a small partition. In the other condition (“Without visibility”), the participants could not see each other’s face, nor could they monitor each other’s progress in the task at hand.

The second variable was Mental Load. In one condition (“With time pressure”), the participants experienced a high level of mental load (manipulated through time pressure) as they performed the task. In the other condition (“Without time pressure”), the participants were not under mental load as they performed the task.

A total of eight pictures were randomly used in the experiment; these were labelled A, B, C, D, E, F and G.

Visibility was a between-dyad variable: each pair performed the entire experiment either in the “With visibility” condition or in the “Without visibility” condition. Mental Load was a within-dyad variable: each pair performed one of the sessions in the “With time pressure” condition and the other session in the “Without time pressure” condition.

The data have been coded for project markers (Bangerter & Clark, 2003; Bangerter, Clark, & Katz, 2004). Specifically, “yeah”, “ok”, “uhuh”, “right”, “alright” and “got it” were initially coded. Furthermore, in some cases, these markers were produced with a rising intonation.

The study was approved by the Ethics Committee of the Department of Psychology of the University of Essex. All participants signed an informed consent form at the beginning of the study and were fully debriefed after the end of the study. In cases where one of the participants mentioned their partner’s name during the study, the name was not transcribed in the corpus.

Definition of column headings in the Excel spreadsheet:

Column A: Row number.
Column B: Dyad number.
Column C: Visibility variable.
Column D: Mental load variable.
Column E: Picture label.
Column F: Number of the picture being described (e.g., F01 = first figure discussed by the participants during the experimental session).
Column G: Identity of the participant producing the speech turn.
Column H: Session number.
Column I: Content of the speech turn (Director turns only).
Column J: Content of the speech turn (Matcher turns only).
Column K: Number of words produced in the speech turn.
Column L: Number of horizontal project markers produced in each turn.
Column M: Number of vertical project markers produced in each turn.

Transcription conventions:

Overlapping speech is transcribed in brackets [].
INC means that the participants’ speech temporarily became incomprehensible and was thus not transcribed fully.
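
For readers who want to process the spreadsheet programmatically, the following is a minimal Python sketch of how one row (columns A to M) might be turned into a single record. It is an illustration only, not part of the corpus. The field names, the function name parse_turn and the example row are hypothetical; in particular, how the speaker is coded in Column G and the exact text of the condition labels in the file are assumptions. The sketch merges the Director and Matcher text columns (I and J), collects overlapping speech transcribed in brackets, and flags turns containing INC.

```python
import re
from typing import List, NamedTuple, Optional, Sequence

# Hypothetical field names for columns A to M, in spreadsheet order.
# The header labels in the actual file may differ.
FIELDS = (
    "row", "dyad", "visibility", "mental_load", "picture", "picture_order",
    "speaker", "session", "director_text", "matcher_text",
    "n_words", "n_horizontal", "n_vertical",
)

# Text between square brackets marks overlapping speech.
OVERLAP = re.compile(r"\[([^\[\]]*)\]")
# INC as a standalone token marks incomprehensible speech.
INC = re.compile(r"\bINC\b")


class Turn(NamedTuple):
    dyad: object
    session: object
    speaker: object
    text: str
    overlaps: List[str]
    has_inc: bool


def parse_turn(cells: Sequence[Optional[object]]) -> Turn:
    """Build a Turn from one row given as 13 cell values (columns A to M)."""
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} cells, got {len(cells)}")
    record = dict(zip(FIELDS, cells))
    # Only one of columns I and J holds the turn content, depending on
    # which participant spoke; the other is assumed to be empty.
    text = str(record["director_text"] or record["matcher_text"] or "")
    return Turn(
        dyad=record["dyad"],
        session=record["session"],
        speaker=record["speaker"],
        text=text,
        overlaps=OVERLAP.findall(text),
        has_inc=bool(INC.search(text)),
    )


# Invented example row, for illustration only (not taken from the corpus).
example = (1, 3, "With visibility", "With time pressure", "A", "F01",
           "Director", 1, "put the big [triangle] on INC the left", None,
           8, 0, 0)
turn = parse_turn(example)
```

Rows could be read from the Excel file with any spreadsheet library and passed to parse_turn one at a time; keeping the parsing step separate from file reading makes the bracket and INC handling easy to check on its own.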