Polygenic scores
A polygenic score (PGS) aggregates the effects of many genetic variants into a single number that predicts genetic predisposition for a phenotype. A PGS typically comprises hundreds to millions of genetic variants (usually SNPs), combined as a weighted sum of allele dosages, where each weight is the variant's effect size as estimated from a relevant genome-wide association study (GWAS).
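The weighted sum described above can be sketched in a few lines of base R. This is only an illustration: the dosage matrix, the variant names (`snp1` to `snp3`), and the effect sizes are made-up values, not data from the PGS Catalog.

```r
# Toy allele dosages (0, 1 or 2 copies of the effect allele); made-up values.
dosages <- matrix(
  c(0, 1, 2,
    2, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ind1", "ind2"), c("snp1", "snp2", "snp3"))
)

# Hypothetical per-variant effect sizes (weights), as a GWAS might report them.
weights <- c(snp1 = 0.5, snp2 = -0.25, snp3 = 1)

# One score per individual: sum over variants of dosage times effect size.
# Indexing weights by column name keeps variants and weights aligned.
pgs <- drop(dosages %*% weights[colnames(dosages)])
pgs
```

Real scoring files in the PGS Catalog additionally specify the effect allele and genome assembly for each variant, which must be matched against the genotype data before any such sum is computed.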
PGS nomenclature is heterogeneous: they can also be referred to as genetic scores or genomic scores, and as polygenic risk scores (PRS) or genomic risk scores (GRS) if they predict a discrete phenotype, such as a disease.
PGS Catalog
The PGS Catalog is an open database of published polygenic scores (PGS). Each PGS in the Catalog is consistently annotated with relevant metadata, including scoring files (variants, effect alleles/weights), annotations of how the PGS was developed and applied, and evaluations of its predictive performance.
Getting PGS scores
You can search for polygenic scores by three criteria:
- pgs_id: PGS identifier
- efo_id: EFO trait identifier
- pubmed_id: PubMed identifier
These criteria are of limited use on their own, because you rarely know the identifiers beforehand, but they are the search criteria provided by the PGS Catalog REST API (the service quincunx communicates with).
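For reference, a direct query by each criterion might look like the sketch below. It assumes that get_scores() accepts efo_id and pubmed_id as named arguments in the same way as pgs_id; the identifiers are copied from the output shown later in this vignette. The calls need internet access, so they are skipped when quincunx is not installed.

```r
# Identifiers copied from the output shown in this vignette.
example_pgs_id    <- "PGS000088"
example_efo_id    <- "EFO_0005090"
example_pubmed_id <- "35072137"

# Assumption: get_scores() takes one named argument per search criterion.
# The calls contact the PGS Catalog REST API, so they require internet access.
if (requireNamespace("quincunx", quietly = TRUE)) {
  by_pgs_id    <- quincunx::get_scores(pgs_id = example_pgs_id)
  by_efo_id    <- quincunx::get_scores(efo_id = example_efo_id)
  by_pubmed_id <- quincunx::get_scores(pubmed_id = example_pubmed_id)
}
```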
Instead of using these criteria directly, we show you how you may retrieve polygenic score information by starting with a trait or disease of interest. See vignette('pgs-scores-mavaddat') for how to get polygenic scores if starting with a publication of interest.
Let’s say you are interested in basophil count. Basophils are one of several kinds of white blood cells and make up less than 1% of all circulating white blood cells. Basophils play a part in immune surveillance, and varying levels of basophils are associated with different medical conditions, e.g., allergies, inflammation, infection, leukemia or anemia.
We start by querying the PGS Catalog for traits with the term "basophil"; you may check vignette('getting-traits') for more details on how to use the get_traits() function.
(basophil_traits <- get_traits(trait_term = 'basophil', exact_term = FALSE))
#> An object of class "traits"
#> Slot "traits":
#> # A tibble: 3 × 6
#> efo_id parent_efo_id is_child trait description url
#> <chr> <chr> <lgl> <chr> <chr> <chr>
#> 1 EFO_0005090 NA FALSE basophil count The number… http…
#> 2 EFO_0803539 NA FALSE basophil measurement Quantifica… http…
#> 3 EFO_0007992 NA FALSE basophil percentage of l… A calculat… http…
#>
#> Slot "pgs_ids":
#> # A tibble: 12 × 4
#> efo_id parent_efo_id is_child pgs_id
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0005090 NA FALSE PGS000088
#> 2 EFO_0005090 NA FALSE PGS000163
#> 3 EFO_0005090 NA FALSE PGS001378
#> 4 EFO_0005090 NA FALSE PGS003940
#> 5 EFO_0005090 NA FALSE PGS004727
#> 6 EFO_0005090 NA FALSE PGS004728
#> 7 EFO_0005090 NA FALSE PGS004729
#> 8 EFO_0005090 NA FALSE PGS004730
#> 9 EFO_0007992 NA FALSE PGS000089
#> 10 EFO_0007992 NA FALSE PGS000164
#> 11 EFO_0007992 NA FALSE PGS001377
#> 12 EFO_0007992 NA FALSE PGS003945
#>
#> Slot "child_pgs_ids":
#> # A tibble: 16 × 4
#> efo_id parent_efo_id is_child child_pgs_id
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0005090 NA FALSE PGS000089
#> 2 EFO_0005090 NA FALSE PGS000164
#> 3 EFO_0005090 NA FALSE PGS001377
#> 4 EFO_0005090 NA FALSE PGS003945
#> 5 EFO_0803539 NA FALSE PGS000088
#> 6 EFO_0803539 NA FALSE PGS000089
#> 7 EFO_0803539 NA FALSE PGS000163
#> 8 EFO_0803539 NA FALSE PGS000164
#> 9 EFO_0803539 NA FALSE PGS001377
#> 10 EFO_0803539 NA FALSE PGS001378
#> 11 EFO_0803539 NA FALSE PGS003940
#> 12 EFO_0803539 NA FALSE PGS003945
#> 13 EFO_0803539 NA FALSE PGS004727
#> 14 EFO_0803539 NA FALSE PGS004728
#> 15 EFO_0803539 NA FALSE PGS004729
#> 16 EFO_0803539 NA FALSE PGS004730
#>
#> Slot "trait_categories":
#> # A tibble: 5 × 4
#> efo_id parent_efo_id is_child trait_categories
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0005090 NA FALSE Hematological measurement
#> 2 EFO_0005090 NA FALSE Inflammatory measurement
#> 3 EFO_0803539 NA FALSE Hematological measurement
#> 4 EFO_0007992 NA FALSE Hematological measurement
#> 5 EFO_0007992 NA FALSE Inflammatory measurement
#>
#> Slot "trait_synonyms":
#> # A tibble: 6 × 4
#> efo_id parent_efo_id is_child trait_synonyms
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0005090 NA FALSE blood basophil count
#> 2 EFO_0007992 NA FALSE basophil count as percentage of total whit…
#> 3 EFO_0007992 NA FALSE basophil count to total WBC count ratio
#> 4 EFO_0007992 NA FALSE basophil percentage
#> 5 EFO_0007992 NA FALSE basophil percentage of white cells
#> 6 EFO_0007992 NA FALSE blood basophil count to total leukocyte co…
#>
#> Slot "trait_mapped_terms":
#> # A tibble: 6 × 4
#> efo_id parent_efo_id is_child trait_mapped_terms
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0005090 NA FALSE CMO:0000034
#> 2 EFO_0005090 NA FALSE CMO:0000111
#> 3 EFO_0005090 NA FALSE MedDRA:10049695
#> 4 EFO_0005090 NA FALSE SNOMEDCT:42351005
#> 5 EFO_0803539 NA FALSE PMID:37596262
#> 6 EFO_0007992 NA FALSE CMO:0000368
The table (slot) pgs_ids in basophil_traits provides the associated PGS identifiers.
basophil_traits@pgs_ids
#> # A tibble: 12 × 4
#> efo_id parent_efo_id is_child pgs_id
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0005090 NA FALSE PGS000088
#> 2 EFO_0005090 NA FALSE PGS000163
#> 3 EFO_0005090 NA FALSE PGS001378
#> 4 EFO_0005090 NA FALSE PGS003940
#> 5 EFO_0005090 NA FALSE PGS004727
#> 6 EFO_0005090 NA FALSE PGS004728
#> 7 EFO_0005090 NA FALSE PGS004729
#> 8 EFO_0005090 NA FALSE PGS004730
#> 9 EFO_0007992 NA FALSE PGS000089
#> 10 EFO_0007992 NA FALSE PGS000164
#> 11 EFO_0007992 NA FALSE PGS001377
#> 12 EFO_0007992 NA FALSE PGS003945
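If only one of the matched traits is of interest, the table can be narrowed down before querying scores. The sketch below works on a small hand-copied subset of the rows printed above (the data frame `pgs_ids` is a stand-in, not the real slot) and keeps only the identifiers annotated with basophil count (EFO_0005090).

```r
# A hand-copied subset of the pgs_ids table printed above (stand-in data).
pgs_ids <- data.frame(
  efo_id = c("EFO_0005090", "EFO_0005090", "EFO_0007992", "EFO_0007992"),
  pgs_id = c("PGS000088", "PGS000163", "PGS000089", "PGS000164"),
  stringsAsFactors = FALSE
)

# Keep only the scores annotated with basophil count (EFO_0005090).
basophil_count_ids <- pgs_ids$pgs_id[pgs_ids$efo_id == "EFO_0005090"]
basophil_count_ids
```

With the real object, the same subsetting would be applied to basophil_traits@pgs_ids instead of the stand-in data frame.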
These identifiers can now be used to query score information using the function get_scores():
get_scores(pgs_id = basophil_traits@pgs_ids$pgs_id)
#> An object of class "scores"
#> Slot "scores":
#> # A tibble: 12 × 12
#> pgs_id pgs_name scoring_file matches_publication reported_trait
#> <chr> <chr> <chr> <lgl> <chr>
#> 1 PGS000088 baso https://ftp… TRUE Basophil count
#> 2 PGS000163 baso https://ftp… TRUE Basophil count
#> 3 PGS001378 GBE_INI30160 https://ftp… TRUE Basophil count
#> 4 PGS003940 INI30160 https://ftp… TRUE Basophil count
#> 5 PGS004727 Basophils_PRSmix_e… https://ftp… TRUE Basophil count
#> 6 PGS004728 Basophils_PRSmix_s… https://ftp… TRUE Basophil count
#> 7 PGS004729 Basophils_PRSmixPl… https://ftp… TRUE Basophil count
#> 8 PGS004730 Basophils_PRSmixPl… https://ftp… TRUE Basophil count
#> 9 PGS000089 baso_p https://ftp… TRUE Basophil perc…
#> 10 PGS000164 baso_p https://ftp… TRUE Basophil perc…
#> 11 PGS001377 GBE_INI30220 https://ftp… TRUE Basophil perc…
#> 12 PGS003945 INI30220 https://ftp… TRUE Basophil perc…
#> # ℹ 7 more variables: trait_additional_description <chr>,
#> # pgs_method_name <chr>, pgs_method_params <chr>, n_variants <int>,
#> # n_variants_interactions <int>, assembly <chr>, license <chr>
#>
#> Slot "publications":
#> # A tibble: 12 × 8
#> pgs_id pgp_id pubmed_id publication_date publication title author_fullname
#> <chr> <chr> <int> <date> <chr> <chr> <chr>
#> 1 PGS000088 PGP00… 35072137 2022-01-12 Cell Genom Mach… Xu Y
#> 2 PGS000163 PGP00… 32888494 2020-09-01 Cell The … Vuckovic D
#> 3 PGS001378 PGP00… 35324888 2022-03-24 PLoS Genet Sign… Tanigawa Y
#> 4 PGS003940 PGP00… 37890495 2023-09-19 AJHG Powe… Tanigawa Y
#> 5 PGS004727 PGP00… 38508198 2024-03-12 Cell Genom Inte… Truong B
#> 6 PGS004728 PGP00… 38508198 2024-03-12 Cell Genom Inte… Truong B
#> 7 PGS004729 PGP00… 38508198 2024-03-12 Cell Genom Inte… Truong B
#> 8 PGS004730 PGP00… 38508198 2024-03-12 Cell Genom Inte… Truong B
#> 9 PGS000089 PGP00… 35072137 2022-01-12 Cell Genom Mach… Xu Y
#> 10 PGS000164 PGP00… 32888494 2020-09-01 Cell The … Vuckovic D
#> 11 PGS001377 PGP00… 35324888 2022-03-24 PLoS Genet Sign… Tanigawa Y
#> 12 PGS003945 PGP00… 37890495 2023-09-19 AJHG Powe… Tanigawa Y
#> # ℹ 1 more variable: doi <chr>
#>
#> Slot "samples":
#> # A tibble: 14 × 15
#> pgs_id sample_id stage sample_size sample_cases sample_controls
#> <chr> <int> <chr> <int> <int> <int>
#> 1 PGS000088 1 gwas 404718 NA NA
#> 2 PGS000088 2 dev 323774 NA NA
#> 3 PGS000163 1 gwas 408112 NA NA
#> 4 PGS001378 1 dev 261890 NA NA
#> 5 PGS003940 1 dev 315523 NA NA
#> 6 PGS004727 1 dev 13541 NA NA
#> 7 PGS004728 1 dev 28384 NA NA
#> 8 PGS004729 1 dev 13541 NA NA
#> 9 PGS004730 1 dev 28384 NA NA
#> 10 PGS000089 1 gwas 404532 NA NA
#> 11 PGS000089 2 dev 323626 NA NA
#> 12 PGS000164 1 gwas 408112 NA NA
#> 13 PGS001377 1 dev 261893 NA NA
#> 14 PGS003945 1 dev 315527 NA NA
#> # ℹ 9 more variables: sample_percent_male <dbl>, phenotype_description <chr>,
#> # ancestry_category <chr>, ancestry <chr>, country <chr>,
#> # ancestry_additional_description <chr>, study_id <chr>, pubmed_id <int>,
#> # cohorts_additional_description <chr>
#>
#> Slot "demographics":
#> # A tibble: 4 × 11
#> pgs_id sample_id variable estimate_type unit variability_type variability
#> <chr> <int> <chr> <chr> <chr> <chr> <dbl>
#> 1 PGS000088 1 age mean years NA NA
#> 2 PGS000088 2 age mean years NA NA
#> 3 PGS000089 1 age mean years NA NA
#> 4 PGS000089 2 age mean years NA NA
#> # ℹ 4 more variables: estimate <dbl>, interval_type <chr>,
#> # interval_lower <dbl>, interval_upper <dbl>
#>
#> Slot "cohorts":
#> # A tibble: 14 × 4
#> pgs_id sample_id cohort_symbol cohort_name
#> <chr> <int> <chr> <chr>
#> 1 PGS000088 1 UKB UK Biobank
#> 2 PGS000088 2 UKB UK Biobank
#> 3 PGS000163 1 UKB UK Biobank
#> 4 PGS001378 1 UKB UK Biobank
#> 5 PGS003940 1 UKB UK Biobank
#> 6 PGS004727 1 AllofUs All of Us Research Program | National Inst…
#> 7 PGS004728 1 G&H Genes & Health
#> 8 PGS004729 1 AllofUs All of Us Research Program | National Inst…
#> 9 PGS004730 1 G&H Genes & Health
#> 10 PGS000089 1 UKB UK Biobank
#> 11 PGS000089 2 UKB UK Biobank
#> 12 PGS000164 1 UKB UK Biobank
#> 13 PGS001377 1 UKB UK Biobank
#> 14 PGS003945 1 UKB UK Biobank
#>
#> Slot "traits":
#> # A tibble: 12 × 5
#> pgs_id efo_id trait description url
#> <chr> <chr> <chr> <chr> <chr>
#> 1 PGS000088 EFO_0005090 basophil count The number of … http…
#> 2 PGS000163 EFO_0005090 basophil count The number of … http…
#> 3 PGS001378 EFO_0005090 basophil count The number of … http…
#> 4 PGS003940 EFO_0005090 basophil count The number of … http…
#> 5 PGS004727 EFO_0005090 basophil count The number of … http…
#> 6 PGS004728 EFO_0005090 basophil count The number of … http…
#> 7 PGS004729 EFO_0005090 basophil count The number of … http…
#> 8 PGS004730 EFO_0005090 basophil count The number of … http…
#> 9 PGS000089 EFO_0007992 basophil percentage of leukocytes A calculated m… http…
#> 10 PGS000164 EFO_0007992 basophil percentage of leukocytes A calculated m… http…
#> 11 PGS001377 EFO_0007992 basophil percentage of leukocytes A calculated m… http…
#> 12 PGS003945 EFO_0007992 basophil percentage of leukocytes A calculated m… http…
#>
#> Slot "stages_tally":
#> # A tibble: 26 × 4
#> pgs_id stage sample_size n_sample_sets
#> <chr> <chr> <int> <int>
#> 1 PGS000088 gwas 404718 NA
#> 2 PGS000088 dev 323774 NA
#> 3 PGS000088 eval NA 2
#> 4 PGS000163 gwas 408112 NA
#> 5 PGS000163 eval NA 2
#> 6 PGS001378 dev 261890 NA
#> 7 PGS001378 eval NA 5
#> 8 PGS003940 dev 315523 NA
#> 9 PGS003940 eval NA 5
#> 10 PGS004727 dev 13541 NA
#> # ℹ 16 more rows
#>
#> Slot "ancestry_frequencies":
#> # A tibble: 38 × 4
#> pgs_id stage ancestry_class_symbol frequency
#> <chr> <chr> <chr> <dbl>
#> 1 PGS000088 gwas EUR 100
#> 2 PGS000088 dev EUR 100
#> 3 PGS000088 eval EUR 100
#> 4 PGS000163 gwas EUR 100
#> 5 PGS000163 eval EUR 100
#> 6 PGS001378 dev EUR 100
#> 7 PGS001378 eval AFR 20
#> 8 PGS001378 eval EAS 20
#> 9 PGS001378 eval EUR 40
#> 10 PGS001378 eval SAS 20
#> # ℹ 28 more rows
#>
#> Slot "multi_ancestry_composition":
#> # A tibble: 12 × 4
#> pgs_id stage multi_ancestry_class_symbol ancestry_class_symbol
#> <chr> <chr> <chr> <chr>
#> 1 PGS003940 dev MAE EUR
#> 2 PGS003940 dev MAE SAS
#> 3 PGS003940 dev MAE AFR
#> 4 PGS003940 dev MAE OTH
#> 5 PGS003940 eval MAO EAS
#> 6 PGS003940 eval MAO OTH
#> 7 PGS003945 dev MAE EUR
#> 8 PGS003945 dev MAE SAS
#> 9 PGS003945 dev MAE AFR
#> 10 PGS003945 dev MAE OTH
#> 11 PGS003945 eval MAO EAS
#> 12 PGS003945 eval MAO OTH
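Because every table in the returned object carries a pgs_id column, the tables can be combined on that key. The following is a minimal sketch using base R's merge() on hand-copied excerpts of the scores and publications tables printed above; the data frames `scores` and `publications` are stand-ins built for illustration, not the actual slots.

```r
# Excerpts of the "scores" and "publications" tables printed above (stand-ins).
scores <- data.frame(
  pgs_id   = c("PGS000088", "PGS000163", "PGS001378"),
  pgs_name = c("baso", "baso", "GBE_INI30160"),
  stringsAsFactors = FALSE
)
publications <- data.frame(
  pgs_id          = c("PGS000088", "PGS000163", "PGS001378"),
  pubmed_id       = c(35072137L, 32888494L, 35324888L),
  author_fullname = c("Xu Y", "Vuckovic D", "Tanigawa Y"),
  stringsAsFactors = FALSE
)

# Join the two tables on their shared key, pgs_id.
scores_with_pubs <- merge(scores, publications, by = "pgs_id")
scores_with_pubs
```

With an object returned by get_scores(), the corresponding tables would be reached with the @ operator (for example, the scores and publications slots), in the same way as basophil_traits@pgs_ids above.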