The Human Pangenome Reference Consortium generates raw sequencing data, high-quality assemblies, and
pangenomes. All data generated is open, publicly accessible, and can be downloaded or used in AWS,
GCP, AnVIL, or locally.
We have two “hot off the presses” data sets available at the links below for the HPRC project. Both
use cutting-edge technologies that are being evaluated for future mainstream use: a PacBio Revio
sequencing run performed at PacBio using an HPRC-style library provided by Washington University (large
discrete size fraction aiming for ~20 kb), and an Oxford Nanopore duplex read data set generated at
UCSC in close collaboration with Oxford Nanopore.
Sequencing data, assemblies, and pangenomes are stored in publicly accessible cloud buckets, the AnVIL
data ecosystem, and in SRA/ENA/DDBJ.
Data is uploaded to both AWS S3 and Google Cloud buckets. GitHub repositories have been created and
include details about the data generation as well as index files with locations of the data stored
in S3 and GCP. The HPRC S3 bucket does not charge egress fees, which makes it a good option if you
would like to download data to your local machine.
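As a rough illustration of how an index file might be turned into download links, the sketch below converts `s3://` locations into plain HTTPS URLs that any download tool can fetch. It assumes the index is a CSV with a column named `path` and that the bucket allows anonymous reads. The column name, the bucket name `example-bucket`, and the sample row are all hypothetical, so check the actual index files in the GitHub repositories for their real layout.

```python
import csv
import io
from urllib.parse import quote, urlparse


def s3_uri_to_https(uri: str) -> str:
    """Convert s3://bucket/key into a virtual-hosted-style HTTPS URL."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a valid S3 object URI: {uri!r}")
    # Percent-encode the key but keep "/" so the object path is preserved.
    return f"https://{parsed.netloc}.s3.amazonaws.com/{quote(key)}"


def download_urls(index_text: str, column: str = "path") -> list[str]:
    """Return HTTPS URLs for every s3:// entry in one column of a CSV index."""
    reader = csv.DictReader(io.StringIO(index_text))
    return [
        s3_uri_to_https(row[column])
        for row in reader
        if row[column].startswith("s3://")
    ]


# Hypothetical index content; real index files may use other columns.
example_index = "sample,path\nSAMPLE1,s3://example-bucket/reads/SAMPLE1.bam\n"
print(download_urls(example_index))
```

The resulting URLs can be passed to a standard downloader such as `curl` or `wget`. For large transfers, a dedicated S3 client is usually a better fit.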
AnVIL is a cloud environment that allows you to view the data organized in convenient data tables
that refer to copies of the data in GCP. AnVIL also includes a workflow runner so you can analyze
the data without downloading it.
All data is also uploaded to the INSDC databases (SRA/ENA/DDBJ) in BioProjects for sequencing data,
assemblies, and pangenomes.
The HPRC produces:
- PacBio HiFi
- Oxford Nanopore
In addition, we include high-coverage Illumina data produced by the NYGC for parents and children
(when available). If you would like to download the files, data indexes are available in the GitHub
repository.
Assemblies produced with Hifiasm are available alongside annotations for the assemblies. If you
would like to download the files, data indexes are available in the GitHub repository.
The HPRC has released pangenomes from its year 1 data. Currently there are three main approaches:
- Minigraph
- Minigraph-Cactus
- Pangenome Graph Builder (PGGB)
Each pangenome has different strengths and weaknesses. If you do not know which pangenome best
suits your needs, see the GitHub repository.