Big data

Massive amounts of data are generated on a daily basis that could potentially be harnessed to support medicines regulation. The European Medicines Agency (EMA) and Heads of Medicines Agencies (HMA) set up a joint task force to describe the big data landscape from a regulatory perspective and identify practical steps for the European medicines regulatory network to make best use of big data in support of innovation and public health in the European Union (EU). 

‘Big data’ is a widely used term without a commonly accepted definition. The HMA/EMA Big Data Task Force defined big data as ‘extremely large datasets which may be complex, multi-dimensional, unstructured and heterogeneous, which are accumulating rapidly and which may be analysed computationally to reveal patterns, trends, and associations. In general, big data sets require advanced or specialised methods to provide an answer within reliable constraints’.

A single dataset may not strictly meet the definition of big data but, when pooled or linked with other datasets, may become sufficiently large or complex to assume the characteristics of big data. Sources include real-world data (such as electronic health records, insurance claims data and data from patient registries), genomics, clinical trials, spontaneous adverse drug reaction reports, social media and wearable devices.

Medicines regulators will increasingly use insights derived from big data to assess the benefit-risk of medicines across their lifecycle.

HMA/EMA Big Data Steering Group

The joint HMA/EMA Big Data Steering Group advises the EMA Management Board and HMA on prioritisation and planning of actions to implement the ten priority recommendations in the Big Data Task Force final report (phase two).

The Steering Group began its work in May 2020. It is co-chaired by Jesper Kjær, Director of the Data Analytics Centre at the Danish Medicines Agency, and Peter Arlett, Head of Data Analytics and Methods at EMA.

The Steering Group reviews the workplan annually to cover any newly emerging topics. It last updated the workplan in July 2022.

The workplan aims to increase the utility of big data in regulation, from data quality through study methods to assessment and decision-making. It is patient-focused and guided by advances in science and technology.

Big data workplan 2022-2025

Implementation of the workplan will be flexible and certain actions may be re-scheduled, since the European medicines regulatory network has to prioritise the unprecedented public health challenge of the Coronavirus disease (COVID-19) pandemic.

Metadata list describing real world data

A list of metadata describing real-world data sources and studies is available below to help pharmaceutical companies and researchers to identify and use such data when investigating the use, safety and effectiveness of medicines.

Real-world data are observational data stored in repositories such as electronic health records and disease registries. Making use of these data sources can improve the evidence available to support benefit-risk decisions and facilitate getting better medicines to patients.

This metadata list will feed into two future EU catalogues on real-world data sources and studies.

The catalogues have the following aims:

  • Help regulators, researchers and pharmaceutical companies identify studies and data sources suitable to address research questions, based on the so-called ‘FAIR’ (findable, accessible, interoperable and reusable) data principles
  • Boost transparency of observational studies
  • Improve the ability of the aforementioned stakeholders to assess evidence from observational studies and real-world data sources

The HMA/EMA Big Data Steering Group adopted the metadata list in June 2022.

Improving the discoverability of data sources via these EU catalogues is a priority for the HMA-EMA joint Big Data Task Force. It is also in line with the European medicines agencies network strategy to 2025.


A draft good practice guide for the use of real-world metadata was available for public consultation until 16 November 2022.

    This draft guide aims to help regulators, data holders, researchers, pharmaceutical companies and other interested stakeholders to use the catalogue of data sources that will replace the currently available ENCePP catalogue.

    For instance, it provides recommendations on how to identify suitable real-world data sources for studies, and describes the required metadata elements.


    Data quality framework for EU medicine regulation

    A draft data quality framework for medicine regulation was available for public consultation until 18 November 2022. 

    This guidance document sets out the criteria for a more consistent and standardised approach to the quality of data used in medicine regulation to support benefit-risk decisions.

    It is meant to: 

    • help identify, define and further develop data quality assessment procedures and recommendations for current and novel data types;
    • support pharmaceutical companies and other stakeholders in selecting data sources for their studies;
    • ensure the trust of patients and healthcare professionals in data-driven regulatory decision-making.

    The data quality framework was co-produced by EMA, the Heads of Medicines Agencies (HMA) and the Joint Action Towards the European Health Data Space (TEHDAS).

    Establishing this framework is a key element in the HMA-EMA Big Data Steering Group workplan 2022-2025.

    A list of metadata describing real-world data has also been made available for public consultation. For more information, see Metadata list describing real world data.

    Data standardisation strategy

    The European medicines regulatory network's data standardisation strategy sets out principles to guide the definition, adoption and implementation of international data standards by the network.

    It aims to:

    • enable quicker uptake of international data standards across the EU;
    • improve data quality;
    • enable data linkage and data analysis to support medicine regulation.

    The strategy is a key deliverable of the Big Data Steering Group workplan.

    EMA and HMA published the strategy in December 2021 and will maintain it over time to reflect any changing priorities or new requirements.

    Research projects

    EMA has contracted several institutions to conduct research projects collecting and analysing real-world data from clinical practice to help monitor the safety and effectiveness of medicines.

    For research projects related to COVID-19, see Treatments and vaccines for COVID-19: post-authorisation.

    DARWIN EU

    EMA is establishing a coordination centre to provide timely and reliable evidence on the use, safety and effectiveness of medicines for human use, including vaccines, from real-world healthcare databases across the EU.

    This capability is called the Data Analysis and Real World Interrogation Network (DARWIN EU®).

    Pilot on using raw data in medicine evaluation

    Through a proof-of-concept pilot, selected applicants can submit 'raw data' to EMA as part of their initial and post-authorisation marketing authorisation applications. 

    Raw data refer to individual patient data from clinical trials. These include:

    • clinical laboratory results;
    • imaging data;
    • patient medical charts.

    Currently, applicants submit data either in an aggregated format as clinical summaries or as individual patient data in PDF listings. This can hinder data analysis and slow down the evaluation process.

    In contrast, raw data are stored in a structured electronic format. This enables regulators to visualise and analyse the data more easily if needed.

    The pilot aims to assess whether using raw data can help speed up and improve the medicine-evaluation process. The goal is to give patients faster and better-informed access to innovative medicines.

    EMA launched the pilot in July 2022.

    It will run for up to two years and include approximately ten regulatory procedures submitted to EMA from September 2022.

    For any queries and to apply to take part in the pilot, write to

    The pilot is a key activity under the priority recommendations of the HMA/EMA Big Data Task Force. It relates to the priority of building network capability to analyse data.


    Further information is available to support pharmaceutical companies with their participation in EMA's raw data pilot. 

    The documents include:

    • a questions and answers document on the raw data pilot;
    • a participation letter to confirm pilot participation for a specific regulatory procedure;
    • a cover letter for pilot participants to attach to their data packages.

    For information on data protection in the raw data proof-of-concept pilot, see:

    Progress updates

    EMA’s newsletter, published every three months, provides an update on progress in implementing the workplan of the HMA-EMA Big Data Steering Group.

    Work of the former HMA/EMA Big Data Task Force

    The HMA/EMA Big Data Task Force operated from 2017 until December 2019 to report on the challenges and opportunities posed by big data in medicines regulation. It carried out its work in two phases. 

    In phase one, the task force:

    • reviewed the landscape of big data from a regulatory perspective and identified opportunities for improvements in the operation of medicines regulation;
    • performed online surveys of national regulatory agencies and the pharmaceutical industry on perspectives, expertise and challenges. This helped develop an understanding of the challenges and the current state of expertise in the regulatory network.

    In phase two, the task force made practical recommendations to inform strategic decision-making and planning by the HMA and EMA and to contribute to the European medicines regulatory network's work on developing a five-year EU Network Strategy to 2025.

    The task force was composed of experienced medicines regulators and data experts appointed by the national competent authorities, EMA and the European Commission (EC). For more information, see HMA/EMA Big Data Task Force.

    Meetings and workshops

    Veterinary big data

    EMA and HMA established the veterinary big data initiative to explore the use of new digital technologies in key veterinary regulatory activities.

    It takes account of the increasing amount of data generated via new digital systems put in place to implement the Veterinary Medicinal Products Regulation.

    A European veterinary big data strategy sets out how the European medicines regulatory network intends to implement this initiative.


    Data protection

    EMA is preparing dedicated guidance on the impact of EU data protection legislation on the secondary use of health data in support of the development, evaluation and supervision of medicines.

    The aim is to help medicine developers, data providers and research bodies comply with EU data protection rules, and to help patients and consumers understand their rights and the existing safeguards to protect personal data.

    Secondary use of data refers to the use of data for a different purpose than the one for which it was originally collected. It typically involves the use of electronic health records, health insurance claims data, registry data or drug consumption data for medicines research and public health purposes.

    The guidance will cover various operational scenarios, including the development of medicines, the evaluation of marketing authorisation applications and post-authorisation safety monitoring.

    By July 2020 EMA had gathered input from patients and consumers as data contributors as well as from medicines developers, research-performing and research-supporting infrastructures and other data providers (e.g. payers of healthcare).

    In September 2020, stakeholders discussed with EMA the key questions concerning the application of the General Data Protection Regulation (GDPR) in the health sector and the secondary use of health data for medicines and public health purposes.

    EMA aims to finalise the guidance in consultation with the European Commission and the European Data Protection Supervisor (EDPS) in the last quarter of 2021. It will take into account stakeholder input and guidance from the EDPS on the processing of health data for research.

    Ensuring that personal data are managed and analysed within a secure and ethical governance framework in compliance with EU data protection legislation is one of the recommended priorities of the HMA/EMA Big Data Task Force.

    EU data protection legislation includes:

    • Regulation (EU) 2016/679, known as the General Data Protection Regulation (GDPR), which applies to private and public entities in the Member States;
    • Regulation (EU) 2018/1725, known as the EU Data Protection Regulation (EUDPR), which applies to all EU institutions and bodies.

    International collaboration on real-world evidence

    EMA and the International Coalition of Medicines Regulatory Authorities (ICMRA) work together to help integrate real-world evidence into regulatory decision-making across the world.

    ICMRA held a workshop for regulators to share experience in obtaining and using real-world evidence for the assessment of medicines, and issued a pledge in July 2022 to foster global efforts in this area. 


