Population Descriptors
Human Pangenome Reference Consortium Population Descriptors: Best Practices
Authors: Human Pangenome Reference Consortium, Ethics, Legal and Social Implications Team
Date: 30 May 2025
Background:
The goal of the Human Pangenome Reference Consortium (HPRC) is to produce a new human genome reference that is more complete, more accurate, and better represents the broad range of human genetic variation. To do this, HPRC seeks to generate many complete high-quality phased reference genomes of genetically and biogeographically diverse individuals to create a Pangenome Reference (see Definition) containing almost all common human genetic variants. In 2023, HPRC released an initial draft version of the Pangenome Reference (Release I). In 2025, HPRC released an updated and expanded version of the Pangenome Reference (Release II) that is freely available on the internet through the HPRC data browser, AnVIL, and public nucleotide archives (INSDC). Release II includes:
- DNA sequence data for >230 samples, mostly derived from the 1000 Genomes Project, together with several dozen additional legacy samples with open consent; the data include long-read (PacBio and Oxford Nanopore Technologies) and proximity-ligation (Dovetail Hi-C) sequencing
- Corresponding high-quality, phased, near telomere-to-telomere (T2T) assemblies
- Annotations, including gene annotations, for all samples. Gene annotations integrate full-length RNA isoform data (PacBio Kinnex), which are available for most, but not all, samples
- Graph-based assembly alignments that create an integrated view of the pangenome
Supplemental files beyond those listed above are also available, including long-read methylation calls. The data produced by the HPRC are in the public domain. They are neither patented nor subject to copyright, and users may not claim intellectual property on the data in part or in whole.
Purpose:
This document outlines the HPRC’s recommended best practices for using population descriptors when analysing the HPRC Pangenome Reference. These best practices aim to promote the ethical, legal, and fair use of the Pangenome Reference, and they apply to all users (see Definition).
Best Practices:
Users should refer to the NASEM report “Using Population Descriptors in Genetics and Genomics Research” to guide the use of population descriptors in HPRC datasets.
- Users should use the report to determine whether population descriptors are essential to the research study. If population descriptors are needed, users should avoid continental-level labels unless these are carefully described and justified.
- Users should not attempt to assign an Indigenous population descriptor to any Pangenome Reference sample beyond what has already been assigned through community engagement for the 1000 Genomes Project samples.
- When using genomic data from 1000 Genomes Project or HapMap Project samples, users should follow the “Guidelines for Referring to Populations”.