GENERAL INFORMATION
Title of dataset: Dairy herd bunching metrics and ambient temperatures.
Authors involved with sample collection, processing and/or analysis: Kareemah Chopra, Holly R. Hodges, Zoe E. Barker, Jorge A. Vazquez Diosdado, Jonathan R. Amory, Tom C. Cameron, Darren P. Croft, Nick J. Bell, Andy Thurman, David Bartlett, Edward A. Codling
Contact Author: Kareemah Chopra
ORCID ID: 0009-0003-1427-3384
Institution: University of Essex
Address: Wivenhoe Park, Colchester, Essex CO4 3SQ, UK
Email: km19088@essex.ac.uk
Date of data collection: 2014-08-01 to 2014-11-22.
Geographic location of data collection: Southeast England, UK.
Funding: This research was funded by the UK Biotechnology and Biological Sciences Research Council through grants BB/K002562/1 (EC and JVD), BB/K002376/1 (JA and ZB), BB/K001302/1 (DC), and BB/K003070/1 (NB). HH was supported by the Colin Spedding Memorial Research Studentship awarded by The Farm Animal Welfare Trust. KC was supported by an interdisciplinary faculty studentship from the University of Essex.

SHARING/ACCESS INFORMATION
Title of publication that uses the data: Bunching behaviour in housed dairy cows at higher ambient temperatures.

METHODOLOGICAL INFORMATION
Positioning data: A real-time local positioning system was used to track the spatial position and activity of a commercial dairy herd (c. 100 cows) in a freestall barn continuously at high temporal resolution. Bunching was quantified using four spatial measures calculated on an hourly basis: herd full range size, herd core range size, mean herd inter-cow distance (ICD), and mean herd nearest neighbour distance (NND).
Ambient temperature: Wall-mounted temperature sensors automatically recorded the ambient barn temperature (BT) continuously throughout the study period. A mean hourly measurement was calculated across all sensors (n = 26 sensors, of which 22 recorded every eight seconds and four recorded every hour).
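The hourly barn temperature calculation described above could be sketched as follows. This is only an illustration under stated assumptions: the function name, the input layout (sensor ID, hour label, temperature) and the sample values are hypothetical, and the choice to average each sensor within the hour first, so that the eight-second sensors do not outweigh the hourly sensors, is an assumption rather than a documented detail of the original processing.

```python
from collections import defaultdict
from statistics import mean


def hourly_barn_temperature(readings):
    """Return {hour: mean barn temperature} from raw sensor readings.

    `readings` is an iterable of (sensor_id, hour_label, temperature_c).
    Assumption: each sensor is first reduced to one mean value per hour,
    and the hourly BT is the mean of those per-sensor values.
    """
    # Collect all readings for each (hour, sensor) combination.
    per_sensor = defaultdict(list)
    for sensor_id, hour_label, temp_c in readings:
        per_sensor[(hour_label, sensor_id)].append(temp_c)

    # Reduce each sensor to a single hourly mean, grouped by hour.
    per_hour = defaultdict(list)
    for (hour_label, _sensor_id), temps in per_sensor.items():
        per_hour[hour_label].append(mean(temps))

    # Average across sensors within each hour.
    return {hour: mean(values) for hour, values in per_hour.items()}


# Hypothetical readings: one high-frequency sensor and one hourly sensor.
readings = [
    ("fast_1", "2014-08-01 01", 18.0),
    ("fast_1", "2014-08-01 01", 20.0),
    ("fast_1", "2014-08-01 01", 22.0),
    ("hourly_1", "2014-08-01 01", 16.0),
]
bt = hourly_barn_temperature(readings)
print(bt)
```

Pooling all raw readings instead would weight the hourly value towards the sensors that record every eight seconds; which of the two approaches was used is not stated here.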
Data selection: Extended interruptions occurred on 15 of the study days (04/08/2014, 07/09/2014, 13/09/2014, 16/09/2014, 17/09/2014, 22/09/2014, 24/09/2014, 09/10/2014, 27/10/2014, 31/10/2014, 06/11/2014 and 22/11/2014 to 30/11/2014) due to the system malfunctioning and resetting part-way through the day; these days were excluded from the subsequent analysis. The number of cows in the barn on a given day across the study period (not including removed days) varied from n = 86 to 111, and a total of 88,576,716 location data points were collected from these cows.
Data pre-processing: The hours when most cows were in the milking parlour or collecting yard (05:00-07:59, 12:00-14:59, 20:00-22:59) were excluded, since the cows' behaviour was constrained by farm staff during these periods. The sensor system reset at midnight each day for data upload, so times between 23:00 and 00:59 were also excluded. A total of 51,630,512 data points remained after removing milking and system reset periods. Subsequent data pre-processing consisted of:
1) Removing location data outside of a 3 m buffer of the functional zones due to minor positional inaccuracies (8,201,449 data points removed).
2) Removing nonsensical positional data (e.g., sensors stuck in exactly the same, or a similar, point location for multiple consecutive time points; 975,291 data points removed).
3) Smoothing using a simple moving average with a two-sided window size of 15 data points (166,555 data points removed).
4) Visually inspecting plots of trajectories and removing any further nonsensical data (1,119,142 data points removed).
A total of 41,167,975 data points remained for the analysis.
Bunching metrics were derived from the raw positioning data.
Range size was calculated by overlaying a virtual grid (1.5 m x 1.5 m = 2.25 m² cells) over the barn map. At each time step, location data were used to assign each individual cow within the herd to a given cell.
The number of individual data points assigned to each cell across the full herd and over a full hour (360 time steps at 0.1 Hz) was counted, giving a final hourly total for each virtual cell within the barn. The highest-density cells cumulatively adding to 50% or 95% of the hourly data points were used to create an hourly utility distribution corresponding to the core range (CR) and full range (FR), respectively. The number of unique cells included in each hourly core and full range distribution was then defined as the range size.
Inter-cow distance (ICD) was calculated at each time step as the mean distance across all possible dyads (pairs of cows) in the herd. An hourly value was calculated as the arithmetic mean of all (360) values recorded over that hour.
Nearest neighbour distance (NND) was calculated as the distance between a given individual and its closest neighbour in the herd. This corresponded to the smallest inter-cow distance when considering all dyads involving that individual. An hourly value was calculated across the full herd.

DATA-SPECIFIC INFORMATION
Number of variables: eight.
Number of rows: 1281 (including column names).
Variable List (note that all values are hourly averages):
Column 1. Month: values include 8 (August 2014), 9 (September 2014), 10 (October 2014) and 11 (November 2014).
Column 2. Day: values range from 1 (1st of a month) to 31 (31st of a month).
Column 3. Hour: values are 1 to 4 (01:00 to 04:00), 8 to 11 (08:00 to 11:00), 16 to 19 (16:00 to 19:00).
Column 4. Core range (50%; CR).
Column 5. Full range (95%; FR).
Column 6. Inter-cow distance (ICD; m).
Column 7. Nearest neighbour distance (NND; m).
Column 8. Barn temperature (BT; °C): the ambient temperature recorded within the barn.
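
For readers who want to see how the values in columns 4 to 7 relate to positional data, the bunching metrics described under METHODOLOGICAL INFORMATION could be sketched roughly as below. This is a minimal illustration, not the code used to produce the dataset: the function names, the coordinate layout (x, y in metres, with the grid origin at 0, 0) and the toy positions are hypothetical, and taking the hourly NND as the mean over cows and then over time steps is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import dist, floor
from statistics import mean

CELL_M = 1.5  # grid cell edge length in metres (1.5 m x 1.5 m cells)


def range_size(points, fraction):
    """Count the highest-density cells needed to reach `fraction` of points."""
    # Assign every (x, y) point to a grid cell and count points per cell.
    counts = Counter((floor(x / CELL_M), floor(y / CELL_M)) for x, y in points)
    target = fraction * sum(counts.values())
    cumulative, n_cells = 0, 0
    # Walk through cells from most to least used until the target is reached.
    for _cell, n in counts.most_common():
        cumulative += n
        n_cells += 1
        if cumulative >= target:
            break
    return n_cells


def mean_icd(positions):
    """Mean distance over all dyads at a single time step."""
    return mean(dist(a, b) for a, b in combinations(positions, 2))


def mean_nnd(positions):
    """Mean, over cows, of the distance to the closest other cow."""
    return mean(
        min(dist(p, q) for j, q in enumerate(positions) if j != i)
        for i, p in enumerate(positions)
    )


# Toy "hour" with two time steps and three cows (hypothetical positions).
time_steps = [
    [(0.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
    [(0.0, 0.0), (3.0, 0.0), (6.0, 0.0)],
]
all_points = [p for step in time_steps for p in step]

core_range = range_size(all_points, 0.50)   # CR: cells holding 50% of points
full_range = range_size(all_points, 0.95)   # FR: cells holding 95% of points
hourly_icd = mean(mean_icd(step) for step in time_steps)
hourly_nnd = mean(mean_nnd(step) for step in time_steps)
print(core_range, full_range, hourly_icd, hourly_nnd)
```

In the real data an hour would contain 360 time steps and roughly 100 cows, so the dyad loop would run over several thousand pairs per time step.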