Introduction
Polygenic scores (PGSs) are annotated with information about the
phenotype that they predict, i.e. the reported trait (as reported in the
original publication). This can be found in the column reported_trait, in the slot scores of scores objects:
pgs_01 <- get_scores('PGS000001')
pgs_01@scores
#> # A tibble: 1 × 12
#> pgs_id pgs_name scoring_file matches_publication reported_trait
#> <chr> <chr> <chr> <lgl> <chr>
#> 1 PGS000001 PRS77_BC https://ftp.ebi.ac.uk/p… TRUE Breast cancer
#> # ℹ 7 more variables: trait_additional_description <chr>,
#> # pgs_method_name <chr>, pgs_method_params <chr>, n_variants <int>,
#> # n_variants_interactions <int>, assembly <chr>, license <chr>
The predicted phenotype is also mapped to Experimental Factor
Ontology (EFO) terms (a controlled vocabulary for the unambiguous
identification of traits and diseases, and their relationships), namely,
the EFO trait. The EFO traits associated with a polygenic score can also
be found in scores objects, in the slot traits, column trait:
pgs_01@traits
#> # A tibble: 1 × 5
#> pgs_id efo_id trait description url
#> <chr> <chr> <chr> <chr> <chr>
#> 1 PGS000001 EFO_0000305 breast carcinoma A carcinoma that arises from epi… http…
Many PGSs have been developed and demonstrated to be predictive of common complex traits, e.g. body mass index (BMI) [1], blood lipids [2] and educational attainment [3].
Similarly, PGSs for various diseases have been shown to be predictive of disease incidence, defining marked increases in risk over the life course or at earlier ages for people with high PGSs, e.g. coronary artery disease [4,5], breast cancer [6] and schizophrenia [7].
Getting catalogued traits from PGS Catalog
If you are interested in retrieving polygenic scores from the
Catalog, you might want to search for them by the trait they predict.
get_scores() is the function that searches for PGSs; however, it only allows searching by pgs_id, efo_id or pubmed_id. So,
in order to search by a trait term, we first need to find the associated
EFO identifiers (efo_id).
To search for traits (or diseases), use the function get_traits(). With this function you can search by:
- The EFO trait identifier: efo_id;
- or the trait term (trait_term): a term to be matched in the EFO identifier (efo_id), label, description, synonyms, trait categories, or external mapped terms.
The most useful search criterion is the trait term, and that is typically what you will want to use. Unless you already know the EFO trait you are interested in and are looking for extra details about it, you won’t search directly with the EFO identifier.
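The two-step lookup described above (trait term to EFO identifiers, then EFO identifiers to scores) could be wrapped roughly as follows. This is a minimal sketch: scores_for_trait_term() is a hypothetical helper, not part of quincunx, and it assumes that get_scores() accepts a vector of identifiers in efo_id. Calling it requires the quincunx package and internet access.

```r
# Hypothetical helper (not part of quincunx): look up traits by term, then
# fetch the polygenic scores annotated with the matching EFO identifiers.
scores_for_trait_term <- function(term, exact = TRUE) {
  traits <- quincunx::get_traits(trait_term = term, exact_term = exact)
  efo_ids <- unique(traits@traits$efo_id)

  # No trait matched: return NULL instead of querying without identifiers.
  if (length(efo_ids) == 0L) return(NULL)

  # Assumption: efo_id may be a character vector of several identifiers.
  quincunx::get_scores(efo_id = efo_ids)
}

# Usage (needs quincunx installed and a network connection):
# stroke_scores <- scores_for_trait_term('stroke')
# stroke_scores@scores[, c('pgs_id', 'pgs_name', 'reported_trait')]
```

Defining the lookup as a function keeps the network calls out of the way until it is actually invoked.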
Basic example
Let’s say you are interested in PGSs related to a medical condition,
e.g. stroke. Then you can search for "stroke" with get_traits():
get_traits(trait_term = 'stroke')
#> An object of class "traits"
#> Slot "traits":
#> # A tibble: 2 × 6
#> efo_id parent_efo_id is_child trait description url
#> <chr> <chr> <lgl> <chr> <chr> <chr>
#> 1 EFO_0003763 NA FALSE cerebrovascular disorder A disorder … http…
#> 2 EFO_0000712 NA FALSE stroke A sudden lo… http…
#>
#> Slot "pgs_ids":
#> # A tibble: 29 × 4
#> efo_id parent_efo_id is_child pgs_id
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE PGS002053
#> 2 EFO_0000712 NA FALSE PGS000038
#> 3 EFO_0000712 NA FALSE PGS000039
#> 4 EFO_0000712 NA FALSE PGS000665
#> 5 EFO_0000712 NA FALSE PGS000911
#> 6 EFO_0000712 NA FALSE PGS001793
#> 7 EFO_0000712 NA FALSE PGS001798
#> 8 EFO_0000712 NA FALSE PGS002259
#> 9 EFO_0000712 NA FALSE PGS002724
#> 10 EFO_0000712 NA FALSE PGS002725
#> # ℹ 19 more rows
#>
#> Slot "child_pgs_ids":
#> # A tibble: 38 × 4
#> efo_id parent_efo_id is_child child_pgs_id
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE PGS000038
#> 2 EFO_0003763 NA FALSE PGS000039
#> 3 EFO_0003763 NA FALSE PGS000665
#> 4 EFO_0003763 NA FALSE PGS000911
#> 5 EFO_0003763 NA FALSE PGS001179
#> 6 EFO_0003763 NA FALSE PGS001793
#> 7 EFO_0003763 NA FALSE PGS001798
#> 8 EFO_0003763 NA FALSE PGS002052
#> 9 EFO_0003763 NA FALSE PGS002259
#> 10 EFO_0003763 NA FALSE PGS002724
#> # ℹ 28 more rows
#>
#> Slot "trait_categories":
#> # A tibble: 4 × 4
#> efo_id parent_efo_id is_child trait_categories
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE Cardiovascular disease
#> 2 EFO_0003763 NA FALSE Neurological disorder
#> 3 EFO_0000712 NA FALSE Cardiovascular disease
#> 4 EFO_0000712 NA FALSE Neurological disorder
#>
#> Slot "trait_synonyms":
#> # A tibble: 87 × 4
#> efo_id parent_efo_id is_child trait_synonyms
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE BRAIN VASCULAR DIS
#> 2 EFO_0003763 NA FALSE Brain Vascular Disorder
#> 3 EFO_0003763 NA FALSE Brain Vascular Disorders
#> 4 EFO_0003763 NA FALSE CEREBROVASCULAR DIS
#> 5 EFO_0003763 NA FALSE CVA
#> 6 EFO_0003763 NA FALSE CVA (cerebral vascular accident)
#> 7 EFO_0003763 NA FALSE Cerebrovascular Disease
#> 8 EFO_0003763 NA FALSE Cerebrovascular Disorders
#> 9 EFO_0003763 NA FALSE Cerebrovascular Insufficiencies
#> 10 EFO_0003763 NA FALSE Cerebrovascular Insufficiency
#> # ℹ 77 more rows
#>
#> Slot "trait_mapped_terms":
#> # A tibble: 35 × 4
#> efo_id parent_efo_id is_child trait_mapped_terms
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE DOID:6713
#> 2 EFO_0003763 NA FALSE ICD10:I66
#> 3 EFO_0003763 NA FALSE ICD10:I67
#> 4 EFO_0003763 NA FALSE ICD10:I69
#> 5 EFO_0003763 NA FALSE ICD10CM:I60-I69
#> 6 EFO_0003763 NA FALSE ICD9:430-438.99
#> 7 EFO_0003763 NA FALSE ICD9:434.91
#> 8 EFO_0003763 NA FALSE ICD9:437.8
#> 9 EFO_0003763 NA FALSE ICD9:437.9
#> 10 EFO_0003763 NA FALSE MEDGEN:858
#> # ℹ 25 more rows
As can be seen from the returned traits object, we get a
set of six tables (slots) that include several details about stroke.
In the first table, traits, we got two rows,
indicating that this query returned two traits in the Catalog. One
trait is named "stroke" (column trait), and is
unambiguously identified by the EFO identifier EFO_0000712; the other is "cerebrovascular disorder" (EFO_0003763).
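Since each slot is a plain table, it can be summarised with ordinary R tools. As an illustration, the sketch below counts how many scores are linked to each trait. pgs_per_trait() is a hypothetical helper, not part of quincunx, and the small data frame only reproduces the first three rows of the pgs_ids slot printed above, so that the example runs offline.

```r
# Hypothetical helper (not part of quincunx): count the scores linked to each
# trait, given a table shaped like the pgs_ids slot.
pgs_per_trait <- function(pgs_ids_tbl) {
  # split() groups the pgs_id values by efo_id; lengths() counts each group.
  lengths(split(pgs_ids_tbl$pgs_id, pgs_ids_tbl$efo_id))
}

# Offline illustration: first three rows of the pgs_ids slot printed above.
first_rows <- data.frame(
  efo_id = c('EFO_0003763', 'EFO_0000712', 'EFO_0000712'),
  pgs_id = c('PGS002053', 'PGS000038', 'PGS000039'),
  stringsAsFactors = FALSE
)

pgs_per_trait(first_rows)
# Named integer vector: EFO_0000712 = 2, EFO_0003763 = 1
```

With a live query, the same helper would be called on the slot itself, e.g. pgs_per_trait(stroke@pgs_ids), where stroke is the object returned by get_traits(trait_term = 'stroke').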
Exact matching
By default, the trait term is matched exactly. If you want to relax
the matching, set the parameter exact_term to FALSE. This way you will, potentially, get more
results. In this example, ischemic stroke (HP_0002140) is now also
returned:
get_traits(trait_term = 'stroke', exact_term = FALSE)
#> An object of class "traits"
#> Slot "traits":
#> # A tibble: 6 × 6
#> efo_id parent_efo_id is_child trait description url
#> <chr> <chr> <lgl> <chr> <chr> <chr>
#> 1 EFO_0003763 NA FALSE cerebrovascular disorder A disorder… http…
#> 2 EFO_0005669 NA FALSE intracerebral hemorrhage Bleeding i… http…
#> 3 HP_0002140 NA FALSE Ischemic stroke Acute isch… http…
#> 4 EFO_0010555 NA FALSE left ventricular stroke … Quantifica… http…
#> 5 EFO_0000712 NA FALSE stroke A sudden l… http…
#> 6 HP_0001297 NA FALSE Stroke Sudden imp… http…
#>
#> Slot "pgs_ids":
#> # A tibble: 40 × 4
#> efo_id parent_efo_id is_child pgs_id
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE PGS002053
#> 2 EFO_0005669 NA FALSE PGS003457
#> 3 EFO_0005669 NA FALSE PGS004943
#> 4 HP_0002140 NA FALSE PGS000039
#> 5 HP_0002140 NA FALSE PGS000665
#> 6 HP_0002140 NA FALSE PGS000911
#> 7 HP_0002140 NA FALSE PGS002724
#> 8 HP_0002140 NA FALSE PGS002725
#> 9 HP_0002140 NA FALSE PGS004322
#> 10 HP_0002140 NA FALSE PGS004597
#> # ℹ 30 more rows
#>
#> Slot "child_pgs_ids":
#> # A tibble: 46 × 4
#> efo_id parent_efo_id is_child child_pgs_id
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE PGS000038
#> 2 EFO_0003763 NA FALSE PGS000039
#> 3 EFO_0003763 NA FALSE PGS000665
#> 4 EFO_0003763 NA FALSE PGS000911
#> 5 EFO_0003763 NA FALSE PGS001179
#> 6 EFO_0003763 NA FALSE PGS001793
#> 7 EFO_0003763 NA FALSE PGS001798
#> 8 EFO_0003763 NA FALSE PGS002052
#> 9 EFO_0003763 NA FALSE PGS002259
#> 10 EFO_0003763 NA FALSE PGS002724
#> # ℹ 36 more rows
#>
#> Slot "trait_categories":
#> # A tibble: 11 × 4
#> efo_id parent_efo_id is_child trait_categories
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE Cardiovascular disease
#> 2 EFO_0003763 NA FALSE Neurological disorder
#> 3 EFO_0005669 NA FALSE Cardiovascular disease
#> 4 EFO_0005669 NA FALSE Neurological disorder
#> 5 HP_0002140 NA FALSE Cardiovascular disease
#> 6 HP_0002140 NA FALSE Neurological disorder
#> 7 EFO_0010555 NA FALSE Cardiovascular measurement
#> 8 EFO_0000712 NA FALSE Cardiovascular disease
#> 9 EFO_0000712 NA FALSE Neurological disorder
#> 10 HP_0001297 NA FALSE Cardiovascular disease
#> 11 HP_0001297 NA FALSE Neurological disorder
#>
#> Slot "trait_synonyms":
#> # A tibble: 94 × 4
#> efo_id parent_efo_id is_child trait_synonyms
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE BRAIN VASCULAR DIS
#> 2 EFO_0003763 NA FALSE Brain Vascular Disorder
#> 3 EFO_0003763 NA FALSE Brain Vascular Disorders
#> 4 EFO_0003763 NA FALSE CEREBROVASCULAR DIS
#> 5 EFO_0003763 NA FALSE CVA
#> 6 EFO_0003763 NA FALSE CVA (cerebral vascular accident)
#> 7 EFO_0003763 NA FALSE Cerebrovascular Disease
#> 8 EFO_0003763 NA FALSE Cerebrovascular Disorders
#> 9 EFO_0003763 NA FALSE Cerebrovascular Insufficiencies
#> 10 EFO_0003763 NA FALSE Cerebrovascular Insufficiency
#> # ℹ 84 more rows
#>
#> Slot "trait_mapped_terms":
#> # A tibble: 49 × 4
#> efo_id parent_efo_id is_child trait_mapped_terms
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_0003763 NA FALSE DOID:6713
#> 2 EFO_0003763 NA FALSE ICD10:I66
#> 3 EFO_0003763 NA FALSE ICD10:I67
#> 4 EFO_0003763 NA FALSE ICD10:I69
#> 5 EFO_0003763 NA FALSE ICD10CM:I60-I69
#> 6 EFO_0003763 NA FALSE ICD9:430-438.99
#> 7 EFO_0003763 NA FALSE ICD9:434.91
#> 8 EFO_0003763 NA FALSE ICD9:437.8
#> 9 EFO_0003763 NA FALSE ICD9:437.9
#> 10 EFO_0003763 NA FALSE MEDGEN:858
#> # ℹ 39 more rows
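To see at a glance which traits appear only under relaxed matching, the identifiers from the two queries can be compared with setdiff(). The sketch below uses the EFO identifiers copied from the two traits slots printed above, so it runs without querying the Catalog; the variable names are illustrative only.

```r
# EFO identifiers copied from the traits slots printed above.
exact_ids <- c('EFO_0003763', 'EFO_0000712')
relaxed_ids <- c('EFO_0003763', 'EFO_0005669', 'HP_0002140',
                 'EFO_0010555', 'EFO_0000712', 'HP_0001297')

# Traits returned only when exact_term = FALSE (order of relaxed_ids is kept).
only_relaxed <- setdiff(relaxed_ids, exact_ids)
only_relaxed
# "EFO_0005669" "HP_0002140" "EFO_0010555" "HP_0001297"
```

With live results, the two vectors would come from the traits slot of each returned object, i.e. its efo_id column.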
Subtraits (child traits)
By default, subtraits (child traits) are not retrieved by get_traits(). If you want to get all matching traits and
their child traits, set the parameter include_children to TRUE. Here is an
example with "breast cancer":
get_traits(trait_term = 'breast cancer', include_children = TRUE)
#> An object of class "traits"
#> Slot "traits":
#> # A tibble: 17 × 6
#> efo_id parent_efo_id is_child trait description url
#> <chr> <chr> <lgl> <chr> <chr> <chr>
#> 1 MONDO_0007254 NA FALSE breast cancer A primary … http…
#> 2 EFO_1000650 MONDO_0007254 TRUE estrogen-receptor neg… A subtype … http…
#> 3 EFO_1000649 MONDO_0007254 TRUE estrogen-receptor pos… A subtype … http…
#> 4 EFO_0005537 MONDO_0007254 TRUE triple-negative breas… An invasiv… http…
#> 5 EFO_0000305 MONDO_0007254 TRUE breast carcinoma A carcinom… http…
#> 6 EFO_0009780 MONDO_0007254 TRUE HER2 negative breast … A biologic… http…
#> 7 MONDO_0021115 MONDO_0007254 TRUE luminal B breast carc… A biologic… http…
#> 8 MONDO_0021116 MONDO_0007254 TRUE luminal A breast carc… A biologic… http…
#> 9 EFO_1000294 MONDO_0007254 TRUE HER2 Positive Breast … A biologic… http…
#> 10 EFO_0000305 NA FALSE breast carcinoma A carcinom… http…
#> 11 EFO_1000650 EFO_0000305 TRUE estrogen-receptor neg… A subtype … http…
#> 12 EFO_1000649 EFO_0000305 TRUE estrogen-receptor pos… A subtype … http…
#> 13 EFO_0005537 EFO_0000305 TRUE triple-negative breas… An invasiv… http…
#> 14 EFO_0009780 EFO_0000305 TRUE HER2 negative breast … A biologic… http…
#> 15 MONDO_0021115 EFO_0000305 TRUE luminal B breast carc… A biologic… http…
#> 16 MONDO_0021116 EFO_0000305 TRUE luminal A breast carc… A biologic… http…
#> 17 EFO_1000294 EFO_0000305 TRUE HER2 Positive Breast … A biologic… http…
#>
#> Slot "pgs_ids":
#> # A tibble: 326 × 4
#> efo_id parent_efo_id is_child pgs_id
#> <chr> <chr> <lgl> <chr>
#> 1 EFO_1000650 MONDO_0007254 TRUE PGS000003
#> 2 EFO_1000650 MONDO_0007254 TRUE PGS000006
#> 3 EFO_1000650 MONDO_0007254 TRUE PGS000009
#> 4 EFO_1000650 MONDO_0007254 TRUE PGS000047
#> 5 EFO_1000650 MONDO_0007254 TRUE PGS000346
#> 6 EFO_1000650 MONDO_0007254 TRUE PGS000775
#> 7 EFO_1000650 MONDO_0007254 TRUE PGS004867
#> 8 EFO_1000650 MONDO_0007254 TRUE PGS004895
#> 9 EFO_1000650 MONDO_0007254 TRUE PGS005106
#> 10 EFO_1000649 MONDO_0007254 TRUE PGS000002
#> # ℹ 316 more rows
#>
#> Slot "child_pgs_ids":
#> # A tibble: 231 × 4
#> efo_id parent_efo_id is_child child_pgs_id
#> <chr> <chr> <lgl> <chr>
#> 1 MONDO_0007254 NA FALSE PGS000001
#> 2 MONDO_0007254 NA FALSE PGS000002
#> 3 MONDO_0007254 NA FALSE PGS000003
#> 4 MONDO_0007254 NA FALSE PGS000004
#> 5 MONDO_0007254 NA FALSE PGS000005
#> 6 MONDO_0007254 NA FALSE PGS000006
#> 7 MONDO_0007254 NA FALSE PGS000007
#> 8 MONDO_0007254 NA FALSE PGS000008
#> 9 MONDO_0007254 NA FALSE PGS000009
#> 10 MONDO_0007254 NA FALSE PGS000015
#> # ℹ 221 more rows
#>
#> Slot "trait_categories":
#> # A tibble: 17 × 4
#> efo_id parent_efo_id is_child trait_categories
#> <chr> <chr> <lgl> <chr>
#> 1 MONDO_0007254 NA FALSE Cancer
#> 2 EFO_1000650 MONDO_0007254 TRUE Cancer
#> 3 EFO_1000649 MONDO_0007254 TRUE Cancer
#> 4 EFO_0005537 MONDO_0007254 TRUE Cancer
#> 5 EFO_0000305 MONDO_0007254 TRUE Cancer
#> 6 EFO_0009780 MONDO_0007254 TRUE Cancer
#> 7 MONDO_0021115 MONDO_0007254 TRUE Cancer
#> 8 MONDO_0021116 MONDO_0007254 TRUE Cancer
#> 9 EFO_1000294 MONDO_0007254 TRUE Cancer
#> 10 EFO_0000305 NA FALSE Cancer
#> 11 EFO_1000650 EFO_0000305 TRUE Cancer
#> 12 EFO_1000649 EFO_0000305 TRUE Cancer
#> 13 EFO_0005537 EFO_0000305 TRUE Cancer
#> 14 EFO_0009780 EFO_0000305 TRUE Cancer
#> 15 MONDO_0021115 EFO_0000305 TRUE Cancer
#> 16 MONDO_0021116 EFO_0000305 TRUE Cancer
#> 17 EFO_1000294 EFO_0000305 TRUE Cancer
#>
#> Slot "trait_synonyms":
#> # A tibble: 95 × 4
#> efo_id parent_efo_id is_child trait_synonyms
#> <chr> <chr> <lgl> <chr>
#> 1 MONDO_0007254 NA FALSE BC
#> 2 MONDO_0007254 NA FALSE breast cancer
#> 3 MONDO_0007254 NA FALSE breast tumor
#> 4 MONDO_0007254 NA FALSE breast tumour
#> 5 MONDO_0007254 NA FALSE cancer of breast
#> 6 MONDO_0007254 NA FALSE malignant breast neoplasm
#> 7 MONDO_0007254 NA FALSE malignant breast tumor
#> 8 MONDO_0007254 NA FALSE malignant breast tumour
#> 9 MONDO_0007254 NA FALSE malignant neoplasm of breast
#> 10 MONDO_0007254 NA FALSE malignant neoplasm of the breast
#> # ℹ 85 more rows
#>
#> Slot "trait_mapped_terms":
#> # A tibble: 101 × 4
#> efo_id parent_efo_id is_child trait_mapped_terms
#> <chr> <chr> <lgl> <chr>
#> 1 MONDO_0007254 NA FALSE DOID:1612
#> 2 MONDO_0007254 NA FALSE ICD10CM:C50
#> 3 MONDO_0007254 NA FALSE ICD9:174.8
#> 4 MONDO_0007254 NA FALSE MEDGEN:651
#> 5 MONDO_0007254 NA FALSE NCIT:C9335
#> 6 MONDO_0007254 NA FALSE SCTID:254837009
#> 7 MONDO_0007254 NA FALSE UMLS:C0006142
#> 8 EFO_1000650 MONDO_0007254 TRUE DOID:0060076
#> 9 EFO_1000650 MONDO_0007254 TRUE EFO:1000650
#> 10 EFO_1000650 MONDO_0007254 TRUE MONDO:0006513
#> # ℹ 91 more rows
The column is_child indicates whether a trait is
being retrieved as a direct result of the query or not.
is_child is TRUE when the trait is returned
because it is a child trait of a matching trait, and FALSE
if it is a direct result of the query.
In the case of child traits, the column parent_efo_id
indicates the EFO trait identifier of the parent trait, i.e. the directly
matching trait; it is NA otherwise.
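The is_child and parent_efo_id columns make it straightforward to separate direct matches from child traits. The sketch below illustrates this on four rows copied from the traits slot printed above (with the columns trimmed), so that it runs offline; the variable names are illustrative only.

```r
# A few rows copied from the traits slot printed above (columns trimmed).
traits_tbl <- data.frame(
  efo_id = c('MONDO_0007254', 'EFO_1000650', 'EFO_0000305', 'EFO_1000650'),
  parent_efo_id = c(NA, 'MONDO_0007254', NA, 'EFO_0000305'),
  is_child = c(FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Direct results of the query: rows where is_child is FALSE.
direct_ids <- traits_tbl$efo_id[!traits_tbl$is_child]

# Child traits, grouped by the directly matching parent they were pulled in by.
children <- traits_tbl[traits_tbl$is_child, ]
children_by_parent <- split(children$efo_id, children$parent_efo_id)

direct_ids          # "MONDO_0007254" "EFO_0000305"
children_by_parent  # one element per parent, each holding "EFO_1000650"
```

Note that the same child trait can appear more than once, once for each directly matching parent, as EFO_1000650 does here.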
Getting all traits
To retrieve all traits, simply leave the parameters efo_id and trait_term as NULL (default):