Dataset

A data set (or dataset) is a collection of data.

Most commonly, a data set corresponds to the contents of a single database table or a single statistical data matrix, in which every column represents a particular variable and each row corresponds to a given member of the data set. The data set lists a value for each of the variables, such as the height and weight of an object, for each member. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows.
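The row-and-column structure described above can be sketched in a few lines of Python. This is only an illustration: the member names, the variable names (height_cm, weight_kg), and the values are invented for the example and are not taken from any real data set.

```python
# Hypothetical example data: each dict is one row, i.e. one member of the data set.
rows = [
    {"name": "A", "height_cm": 170.0, "weight_kg": 65.0},
    {"name": "B", "height_cm": 182.5, "weight_kg": 80.2},
    {"name": "C", "height_cm": 158.0, "weight_kg": 52.3},
]

# The columns are the variables shared by every row.
variables = list(rows[0].keys())


def column(rows, variable):
    """Return the values of one variable, one value per member."""
    return [row[variable] for row in rows]


n_members = len(rows)                # number of rows = number of members
heights = column(rows, "height_cm")  # one column = one variable
datum = rows[1]["weight_kg"]         # a single value is a datum

print(variables)
print(n_members, heights, datum)
```

A list of dictionaries is used here only because it needs no external library; a database table or a data frame expresses the same layout.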

The term data set may also be used more loosely to refer to the data in a collection of closely related tables that correspond to a particular experiment or event. Examples of this type are the data sets collected by space agencies performing experiments with instruments aboard space probes.


Research in severe traumatic brain injury (TBI) has historically been limited by studies with relatively small sample sizes, which result in low power to detect small yet clinically meaningful outcomes. Data sharing and integration from existing sources hold promise to yield larger, more robust sample sizes that improve the potential signal and generalizability of important research questions. However, curation and harmonization of data of different types and of disparate provenance are challenging. Yaseen et al. report their approach and experience integrating multiple TBI datasets containing collected physiological data, including both expected and unexpected challenges encountered in the integration process. The harmonized dataset included data on 1,536 patients from the COBRIT, EPO Severe TBI, BEST-TRIP, ProTECT III, TRACK-TBI, BOOST-2, and BTGH-Database studies. They conclude with process recommendations for data acquisition in future prospective studies to aid integration of these data with existing studies. These recommendations include using common data elements whenever possible, adopting a standardized recording system for labeling and timing of high-frequency physiological data, and, for secondary use of studies in systems like FITBIR, engaging the investigators who collected the original data 1).


1)
Yaseen A, Robertson CS, Cruz Navarro J, Chen J, Heckler B, DeSantis S, Temkin N, Barber J, Foreman B, Diaz-Arrastia RR, Chesnut RM, Manley GT, Wright D, Vassar M, Ferguson AR, Markowitz AJ, Yamal JM. Integrating, Harmonizing, and Curating Studies with High-Frequency and Hourly Physiological Data: Proof of Concept from Seven Traumatic Brain Injury Datasets. J Neurotrauma. 2023 Jun 21. doi: 10.1089/neu.2023.0023. Epub ahead of print. PMID: 37341031.