Data

The Human Pangenome Reference Consortium generates raw sequencing data, high-quality assemblies and, for Data Release 1, pangenomes. All data are open and publicly accessible. Sequencing data and assemblies for Data Release 2 can be accessed through the HPRC Data Explorer, or through AnVIL and GitHub. Data Release 2 resources are being progressively added. Where Release 2 versions are not yet available, we advise using Data Release 1, accessible through the links below. Please refer to the HPRC Data Use Best Practices when using HPRC data.

HPRC Data Explorer

Data Availability

Sequencing data, assemblies, and pangenomes are stored in publicly accessible cloud buckets, in the AnVIL data ecosystem, and in SRA/ENA/DDBJ.

Sequencing Data (Release 2)

  • PacBio HiFi (with modification calls)
  • PacBio Kinnex
  • Oxford Nanopore Ultralong (with modification calls)
  • Dovetail
  • Omni-C/Hi-C

In addition, we include high-coverage Illumina data produced by the NYGC for parents and children (when available). To download the files, use the data indexes available in the GitHub repository.
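As a rough illustration of working with a data index before downloading, the sketch below reads a CSV-style index and collects file paths for one sample. The column names (`sample_id`, `path`), the sample names, and the bucket name are hypothetical placeholders, not the actual HPRC index layout; check the index files in the GitHub repository for the real columns.

```python
import csv
import io


def list_index_paths(index_text, path_column, sample_column=None, sample_id=None):
    """Return the non-empty values of `path_column` from a CSV index.

    If `sample_id` is given, only rows whose `sample_column` equals it are kept.
    Column names are supplied by the caller because the real index layout
    is not assumed here.
    """
    reader = csv.DictReader(io.StringIO(index_text))
    paths = []
    for row in reader:
        if sample_id is not None and row.get(sample_column) != sample_id:
            continue
        path = (row.get(path_column) or "").strip()
        if path:
            paths.append(path)
    return paths


# Hypothetical index content; real indexes may use different columns and paths.
example_index = """sample_id,path
SAMPLE_A,s3://example-bucket/SAMPLE_A/reads.bam
SAMPLE_B,s3://example-bucket/SAMPLE_B/reads.bam
"""

selected = list_index_paths(example_index, "path", "sample_id", "SAMPLE_A")
for p in selected:
    print(p)
```

The resulting list of paths can then be passed to whichever transfer tool suits the storage location.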

Available through: AnVIL, GitHub, NCBI

Assemblies (Release 2)

Assemblies produced with Hifiasm are available, together with their annotations. To download the files, use the data indexes available in the GitHub repository.

Available through: AnVIL, NCBI, UCSC, Ensembl, GitHub

Pangenomes (Release 1)

The HPRC has released pangenomes built from its year 1 data. They were constructed using three main approaches:

  • Minigraph
  • Minigraph-CACTUS
  • Pangenome Graph Builder (PGGB)

Each pangenome has different strengths and weaknesses. If you do not know which pangenome best suits your needs, see the GitHub repository.

Available through: AnVIL, GitHub, ENA