Big Data
“Big Data” refers to large, often aggregated real-world data sets that exceed the capacity of commonly available stand-alone personal computers and software to handle. “Big data studies” are not studies undertaken as primary data collection from or about individual persons; rather, they are studies of data that have already been (or are being, prospectively) collected from or about individuals in the course of other organized research, clinical care, and ordinary commercial and civic activities (SACHRP Recommendations – Attachment A: Human Subjects Research Implications of “Big Data” Studies).
‘Big data’ is a widely used term without a commonly accepted definition. The HMA/EMA Task Force on Big Data defines big data as ‘extremely large datasets which may be complex, multi-dimensional, unstructured and heterogeneous, which are accumulating rapidly and which may be analysed computationally to reveal patterns, trends, and associations. In general, big data sets require advanced or specialised methods to provide an answer within reliable constraints’ (EMA Website – Big Data).
Big Data includes real-world data such as electronic health records, registry data, claims data, pooled clinical trials data, datasets of spontaneously reported suspected adverse drug reactions, and genomics, proteomics and metabolomics datasets. Big Data can complement clinical trials and offers major opportunities to improve the evidence on which decisions about medicines are taken. Understanding the quality and representativeness of Big Data allows regulators to select the optimal data set to study an important question affecting the benefit-risk balance of a medicine (HMA/EMA Joint Task Force on Big Data – Final (Phase II) report – Evolving data-driven regulation, Jan 2020).
Big Data Studies
“Big data studies” are distinguished from real-world data research that uses already-collected data (such as retrospective medical records research) by the massive quantities of data aggregated and analyzed; the wide variety of data sources, locations (e.g., local servers, network servers, the cloud), and forms (e.g., numbers, addresses, images, text, emails, video, genetic sequences); and the increased velocity of analysis, achieved primarily through algorithmic programs designed to identify unique features or patterns in aggregated data (SACHRP Recommendations – Attachment A: Human Subjects Research Implications of “Big Data” Studies).