The Human Pangenome Reference Consortium generates raw sequencing data, high-quality assemblies, and pangenomes. All data generated is open, publicly accessible, and can be downloaded or used in AWS, GCP, AnVIL, or locally.
We have two “hot off the presses” data sets for the HPRC project, available at the links below. Both use cutting-edge technologies that are being evaluated for future mainstream use. The first is a PacBio Revio sequencing run performed at PacBio using an HPRC-style library provided by Washington University (a large, discrete size fraction targeting ~20 kb). The second is an Oxford Nanopore duplex read data set generated at UCSC in close collaboration with Oxford Nanopore.
Revio data is now available for HG002!
The data was generated at PacBio using HG002 libraries created at Washington University. The data set has a large, tight size fraction targeting 20kb.

ONT duplex data for HG002 is now available!
The HPRC has publicly released an HG002 data set generated with Oxford Nanopore's new Duplex technology. It contains more than 70X of data with excellent quality (~Q29) and read length (35 kbp N50).
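
For readers unfamiliar with the metrics quoted above, the following is a minimal Python sketch (not part of any HPRC tooling) of how a Phred quality score maps to a per-base error probability and how a read-length N50 is computed. The function names and the toy read lengths are hypothetical and chosen only for illustration; they are not taken from the HG002 data.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    The N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    total = sum(read_lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Q29 corresponds to roughly one error per 800 bases.
    p = phred_to_error_probability(29)
    print(f"Q29 error probability: {p:.5f} (about 1 in {round(1 / p)})")

    # Made-up read lengths, for illustration only.
    toy_read_lengths = [50_000, 40_000, 35_000, 20_000, 10_000, 5_000]
    print(f"Toy N50: {n50(toy_read_lengths)} bp")
```

Under this definition, a 35 kbp N50 means that at least half of the sequenced bases sit in reads of 35 kbp or longer.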

Data Availability

Sequencing data, assemblies, and pangenomes are stored in publicly accessible cloud buckets, the AnVIL data ecosystem, and in SRA/ENA/DDBJ.

Data Sequencing

  • PacBio HiFi
  • Oxford Nanopore
  • Omni-C/Hi-C
In addition, we include high-coverage Illumina data produced by the NYGC for parents and children (when available). If you would like to download the files, data indexes are available in the GitHub repository.


Assemblies produced with Hifiasm are available, along with their annotations. If you would like to download the files, data indexes are available in the GitHub repository.


The HPRC has released pangenomes from its year 1 data. Currently there are three main approaches:
  • Minigraph
  • Minigraph-CACTUS
  • Pangenome Graph Builder (PGGB)
Each pangenome has different strengths and weaknesses. If you do not know which pangenome best suits your needs, see the GitHub repository.