Human Pangenome Reference Consortium Data Use: Best Practices
Authors: Human Pangenome Reference Consortium, Ethics, Legal and Social Implications Team
Date: 14 April 2025
Background:
The goal of the Human Pangenome Reference Consortium (HPRC) is to produce a new human genome reference that is more complete, more accurate, and more representative of the broad range of human genetic variation. To do this, HPRC seeks to generate many complete, high-quality, phased reference genomes from genetically and biogeographically diverse individuals to create a Pangenome Reference (see Definitions) containing almost all common human genetic variants. In 2023, HPRC released an initial draft of the Pangenome Reference (Release I). In 2025, HPRC released an updated and expanded version (Release II), which is freely available on the internet through the HPRC data browser, AnVIL, and the public nucleotide archives of the International Nucleotide Sequence Database Collaboration (INSDC). Release II includes:
- DNA sequence data for 230 samples, mostly derived from the 1000 Genomes Project, including long-read (PacBio and Oxford Nanopore Technologies) and proximity-ligation (Dovetail Hi-C) data, as well as data for several dozen additional legacy samples with open consent;
- Corresponding high-quality, phased, near-telomere-to-telomere assemblies;
- Annotations, including gene annotations, for all samples. For most, but not all, samples, the gene annotations integrate full-length RNA isoform data (PacBio Kinnex);
- Graph-based assembly alignments that create an integrated view of the pangenome.
Supplemental files are also available, including long-read methylation calls. The data produced by the HPRC are in the public domain. They are neither patented nor subject to copyright, and users may not claim intellectual property rights in the data, in whole or in part.
Purpose:
The following are the HPRC Ethics, Legal, and Social Implications Team’s recommended best practices for using the Release II Pangenome Reference. These best practices aim to promote ethical, legal, and fair use of the Pangenome Reference, and they apply to all users (see Definitions).
These best practices are rooted in several research ethics documents, including the World Medical Association’s Declaration of Helsinki, the International Ethical Guidelines of the Council for International Organizations of Medical Sciences (CIOMS), and the Belmont Report, as well as the regulations and norms for the protection of human research participants in the United States. HPRC expects users to follow the best practices outlined below.
Best Practices:
- Scientific Publication: When publishing research findings that use the Pangenome Reference, users are expected to include the relevant HPRC attribution statement (see below). In addition, publications arising from such research must not contain the personal data of any participant (see Definitions). Nor should they contain information that could reasonably be expected to cause participants, their families, their communities, or members of specific populations to experience harm or stigmatization.
For HPRC member users, the recommended (internal) attribution statement is:
“We would like to acknowledge the National Human Genome Research Institute (NHGRI) for funding the following grants supporting the creation of the human pangenome reference: U41HG010972, U01HG010971, U01HG013760, U01HG013755, U01HG013748, U01HG013744, R01HG011274, and the Human Pangenome Reference Consortium (BioProject ID: PRJNA730823).”
For other users, the recommended (external) attribution statement is:
“We would like to acknowledge the Human Pangenome Reference Consortium (BioProject ID: PRJNA730823) and its funder, the National Human Genome Research Institute (NHGRI).”
This document will be updated to include the marker publication associated with HPRC Release II when it becomes available.
Please also see the HPRC Publication Policy.
- Intellectual Property: HPRC expects users not to sell all or part of the Pangenome Reference in any medium. Users should not seek intellectual property protection in ways that would prevent or block access to, or use of, any element of the Pangenome Reference or conclusions drawn directly from it. Users may elect to perform further research that adds intellectual or commercial value (or both) to the Pangenome Reference and may obtain intellectual property rights on these downstream developments. Users should not implement licensing policies that obstruct further research or clinical use.
- Re-Identification and Harm: HPRC expects users not to take any actions that could reasonably be expected to result in participant re-identification, including contacting or communicating with participants. Users should not use the data in a manner that is reasonably anticipated to cause participants, their families, their communities, or members of specific populations to experience harm or stigmatization. HPRC expects users to follow all applicable international, national, and local laws and regulations, including the laws, regulations, and customary norms of Indigenous Peoples.
- Fairness and Beneficence: Users of the Pangenome Reference should carefully consider how the benefits and burdens of using the data, and of the applications and outcomes arising from such use, are distributed. This is of particular concern for groups that have historically not received the benefits of genomic research.
- Population Descriptors: Users should refer to the National Academies of Sciences, Engineering, and Medicine (NASEM) report “Using Population Descriptors in Genetics and Genomics Research” to guide the use of population descriptors with HPRC datasets.
- Users should use the report to determine whether population descriptors are essential to the research study. If population descriptors are needed, users should avoid continental-level labels unless they are carefully described and justified.
- Users should not attempt to assign an Indigenous population descriptor to any sample in the Pangenome Reference beyond what has already been assigned through community engagement for the 1000 Genomes Project samples.
- When using genomic data from 1000 Genomes Project or HapMap Project samples, users should follow the “Guidelines for Referring to Populations”.
- Disclaimers: The Pangenome Reference is provided as-is, without warranties or guarantees, express or implied, as to its fitness for a particular purpose, accuracy, quality, or comprehensiveness. HPRC accepts no liability or other responsibility for direct or indirect damages or losses arising from the use of the Pangenome Reference, nor for direct or indirect damages or losses caused by the unavailability of the Pangenome Reference. Each user agrees to bear all liability arising from its use, storage, and disposal of the Pangenome Reference.
Definitions:
Pangenome Reference: The Release II data resources produced by the HPRC, including raw sequence data, individual genome assemblies, sequence alignments, and annotations.
Users: Any individual, institution, or entity using the Pangenome Reference, including both commercial and non-commercial use.
Participant: Any individual whose data are included in the Pangenome Reference.