Introduction

Polygenic scores (PGSs) are annotated with information about the phenotype they predict, i.e. the reported trait (as given in the original publication). This annotation is found in the column reported_trait of the slot scores of scores objects:

pgs_01 <- get_scores('PGS000001')
pgs_01@scores
#> # A tibble: 1 × 12
#>   pgs_id    pgs_name scoring_file             matches_publication reported_trait
#>   <chr>     <chr>    <chr>                    <lgl>               <chr>         
#> 1 PGS000001 PRS77_BC https://ftp.ebi.ac.uk/p… TRUE                Breast cancer 
#> # ℹ 7 more variables: trait_additional_description <chr>,
#> #   pgs_method_name <chr>, pgs_method_params <chr>, n_variants <int>,
#> #   n_variants_interactions <int>, assembly <chr>, license <chr>

The predicted phenotype is also mapped to Experimental Factor Ontology (EFO) terms, a controlled vocabulary for the unambiguous identification of traits and diseases and of their relationships. The mapped term is referred to as the EFO trait. The EFO traits associated with a polygenic score are also found in scores objects, in the column trait of the slot traits:

pgs_01@traits
#> # A tibble: 1 × 5
#>   pgs_id    efo_id      trait            description                       url  
#>   <chr>     <chr>       <chr>            <chr>                             <chr>
#> 1 PGS000001 EFO_0000305 breast carcinoma A carcinoma that arises from epi… http…

Many PGSs have been developed and demonstrated to be predictive of common complex traits, e.g. body mass index (BMI) [1], blood lipids [2] and educational attainment [3].

Similarly, PGSs for various diseases have been shown to be predictive of disease incidence, with high PGSs marking substantially increased risk over the life course or at earlier ages, e.g. for coronary artery disease [4,5], breast cancer [6] and schizophrenia [7].

Getting catalogued traits from PGS Catalog

If you are interested in retrieving polygenic scores from the Catalog, you might want to search for them by the trait they predict. get_scores() is the function that searches for PGSs; however, it only allows searching by pgs_id, efo_id or pubmed_id. So, to search by a trait term, we first need to find the associated EFO identifiers (efo_id).

To search for traits (or diseases), you use the function get_traits(). With this function you can search by:

  • The EFO trait identifier: efo_id;
  • or by the trait term: a term to be matched in the EFO identifier (efo_id), label, description, synonyms, trait categories, or external mapped terms.

The most useful search criterion is the trait term, and that is typically what you will want to use. You would only search directly with the EFO identifier if you already know the EFO trait you are interested in and are looking for extra details about it.
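For the less common case of a direct lookup by identifier, a minimal sketch could look as follows. It assumes the quincunx package is installed and the PGS Catalog is reachable; EFO_0000712 is the identifier for stroke that appears in the example output further down, and the variable name stroke_by_id is purely illustrative.

```r
library(quincunx)

# Look up a single trait directly by its EFO identifier.
stroke_by_id <- get_traits(efo_id = 'EFO_0000712')

# The `traits` slot holds one row per retrieved trait.
stroke_by_id@traits[, c('efo_id', 'trait', 'description')]
```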

Basic example

Let’s say you are interested in PGSs related to a medical condition, e.g. stroke. You can then search for "stroke" with get_traits():

get_traits(trait_term = 'stroke')
#> An object of class "traits"
#> Slot "traits":
#> # A tibble: 2 × 6
#>   efo_id      parent_efo_id is_child trait                    description  url  
#>   <chr>       <chr>         <lgl>    <chr>                    <chr>        <chr>
#> 1 EFO_0003763 NA            FALSE    cerebrovascular disorder A disorder … http…
#> 2 EFO_0000712 NA            FALSE    stroke                   A sudden lo… http…
#> 
#> Slot "pgs_ids":
#> # A tibble: 29 × 4
#>    efo_id      parent_efo_id is_child pgs_id   
#>    <chr>       <chr>         <lgl>    <chr>    
#>  1 EFO_0003763 NA            FALSE    PGS002053
#>  2 EFO_0000712 NA            FALSE    PGS000038
#>  3 EFO_0000712 NA            FALSE    PGS000039
#>  4 EFO_0000712 NA            FALSE    PGS000665
#>  5 EFO_0000712 NA            FALSE    PGS000911
#>  6 EFO_0000712 NA            FALSE    PGS001793
#>  7 EFO_0000712 NA            FALSE    PGS001798
#>  8 EFO_0000712 NA            FALSE    PGS002259
#>  9 EFO_0000712 NA            FALSE    PGS002724
#> 10 EFO_0000712 NA            FALSE    PGS002725
#> # ℹ 19 more rows
#> 
#> Slot "child_pgs_ids":
#> # A tibble: 38 × 4
#>    efo_id      parent_efo_id is_child child_pgs_id
#>    <chr>       <chr>         <lgl>    <chr>       
#>  1 EFO_0003763 NA            FALSE    PGS000038   
#>  2 EFO_0003763 NA            FALSE    PGS000039   
#>  3 EFO_0003763 NA            FALSE    PGS000665   
#>  4 EFO_0003763 NA            FALSE    PGS000911   
#>  5 EFO_0003763 NA            FALSE    PGS001179   
#>  6 EFO_0003763 NA            FALSE    PGS001793   
#>  7 EFO_0003763 NA            FALSE    PGS001798   
#>  8 EFO_0003763 NA            FALSE    PGS002052   
#>  9 EFO_0003763 NA            FALSE    PGS002259   
#> 10 EFO_0003763 NA            FALSE    PGS002724   
#> # ℹ 28 more rows
#> 
#> Slot "trait_categories":
#> # A tibble: 4 × 4
#>   efo_id      parent_efo_id is_child trait_categories      
#>   <chr>       <chr>         <lgl>    <chr>                 
#> 1 EFO_0003763 NA            FALSE    Cardiovascular disease
#> 2 EFO_0003763 NA            FALSE    Neurological disorder 
#> 3 EFO_0000712 NA            FALSE    Cardiovascular disease
#> 4 EFO_0000712 NA            FALSE    Neurological disorder 
#> 
#> Slot "trait_synonyms":
#> # A tibble: 87 × 4
#>    efo_id      parent_efo_id is_child trait_synonyms                  
#>    <chr>       <chr>         <lgl>    <chr>                           
#>  1 EFO_0003763 NA            FALSE    BRAIN VASCULAR DIS              
#>  2 EFO_0003763 NA            FALSE    Brain Vascular Disorder         
#>  3 EFO_0003763 NA            FALSE    Brain Vascular Disorders        
#>  4 EFO_0003763 NA            FALSE    CEREBROVASCULAR DIS             
#>  5 EFO_0003763 NA            FALSE    CVA                             
#>  6 EFO_0003763 NA            FALSE    CVA (cerebral vascular accident)
#>  7 EFO_0003763 NA            FALSE    Cerebrovascular Disease         
#>  8 EFO_0003763 NA            FALSE    Cerebrovascular Disorders       
#>  9 EFO_0003763 NA            FALSE    Cerebrovascular Insufficiencies 
#> 10 EFO_0003763 NA            FALSE    Cerebrovascular Insufficiency   
#> # ℹ 77 more rows
#> 
#> Slot "trait_mapped_terms":
#> # A tibble: 35 × 4
#>    efo_id      parent_efo_id is_child trait_mapped_terms
#>    <chr>       <chr>         <lgl>    <chr>             
#>  1 EFO_0003763 NA            FALSE    DOID:6713         
#>  2 EFO_0003763 NA            FALSE    ICD10:I66         
#>  3 EFO_0003763 NA            FALSE    ICD10:I67         
#>  4 EFO_0003763 NA            FALSE    ICD10:I69         
#>  5 EFO_0003763 NA            FALSE    ICD10CM:I60-I69   
#>  6 EFO_0003763 NA            FALSE    ICD9:430-438.99   
#>  7 EFO_0003763 NA            FALSE    ICD9:434.91       
#>  8 EFO_0003763 NA            FALSE    ICD9:437.8        
#>  9 EFO_0003763 NA            FALSE    ICD9:437.9        
#> 10 EFO_0003763 NA            FALSE    MEDGEN:858        
#> # ℹ 25 more rows

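As can be seen from the returned traits object, we get a set of six tables (slots) that include several details about stroke.

In the first table, traits, we got two rows, indicating that this query returned two traits from the Catalog. One is named "stroke" (column trait) and is unambiguously identified by the EFO identifier EFO_0000712. The other, "cerebrovascular disorder" (EFO_0003763), does not carry the term in its label, so it presumably matched on one of the other searched fields, such as its synonyms or mapped terms.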
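With the EFO identifier in hand, the original goal of finding the scores for a trait can be completed. The following is a sketch of that workflow, assuming quincunx is installed and the Catalog is reachable; the variable names are illustrative, and selecting the row by its label "stroke" is just one possible way of picking the identifier.

```r
library(quincunx)

# Step 1: search the Catalog traits by a free-text term.
stroke_traits <- get_traits(trait_term = 'stroke')

# Step 2: keep the EFO identifier of the trait labelled "stroke".
traits_tbl <- stroke_traits@traits
stroke_efo_id <- traits_tbl$efo_id[traits_tbl$trait == 'stroke']

# Step 3: retrieve the polygenic scores annotated with that EFO identifier.
stroke_scores <- get_scores(efo_id = stroke_efo_id)

# Inspect a few columns of the `scores` slot.
stroke_scores@scores[, c('pgs_id', 'pgs_name', 'reported_trait')]
```

Filtering on the label keeps the query specific to stroke; passing the whole efo_id column instead would also bring in scores annotated with the broader cerebrovascular disorder term.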

Exact matching

By default, the trait term is matched exactly. To relax the matching, set the parameter exact_term to FALSE. This potentially returns more results: in this example, ischemic stroke (HP_0002140), among other traits, is now also returned:

get_traits(trait_term = 'stroke', exact_term = FALSE)
#> An object of class "traits"
#> Slot "traits":
#> # A tibble: 6 × 6
#>   efo_id      parent_efo_id is_child trait                     description url  
#>   <chr>       <chr>         <lgl>    <chr>                     <chr>       <chr>
#> 1 EFO_0003763 NA            FALSE    cerebrovascular disorder  A disorder… http…
#> 2 EFO_0005669 NA            FALSE    intracerebral hemorrhage  Bleeding i… http…
#> 3 HP_0002140  NA            FALSE    Ischemic stroke           Acute isch… http…
#> 4 EFO_0010555 NA            FALSE    left ventricular stroke … Quantifica… http…
#> 5 EFO_0000712 NA            FALSE    stroke                    A sudden l… http…
#> 6 HP_0001297  NA            FALSE    Stroke                    Sudden imp… http…
#> 
#> Slot "pgs_ids":
#> # A tibble: 40 × 4
#>    efo_id      parent_efo_id is_child pgs_id   
#>    <chr>       <chr>         <lgl>    <chr>    
#>  1 EFO_0003763 NA            FALSE    PGS002053
#>  2 EFO_0005669 NA            FALSE    PGS003457
#>  3 EFO_0005669 NA            FALSE    PGS004943
#>  4 HP_0002140  NA            FALSE    PGS000039
#>  5 HP_0002140  NA            FALSE    PGS000665
#>  6 HP_0002140  NA            FALSE    PGS000911
#>  7 HP_0002140  NA            FALSE    PGS002724
#>  8 HP_0002140  NA            FALSE    PGS002725
#>  9 HP_0002140  NA            FALSE    PGS004322
#> 10 HP_0002140  NA            FALSE    PGS004597
#> # ℹ 30 more rows
#> 
#> Slot "child_pgs_ids":
#> # A tibble: 46 × 4
#>    efo_id      parent_efo_id is_child child_pgs_id
#>    <chr>       <chr>         <lgl>    <chr>       
#>  1 EFO_0003763 NA            FALSE    PGS000038   
#>  2 EFO_0003763 NA            FALSE    PGS000039   
#>  3 EFO_0003763 NA            FALSE    PGS000665   
#>  4 EFO_0003763 NA            FALSE    PGS000911   
#>  5 EFO_0003763 NA            FALSE    PGS001179   
#>  6 EFO_0003763 NA            FALSE    PGS001793   
#>  7 EFO_0003763 NA            FALSE    PGS001798   
#>  8 EFO_0003763 NA            FALSE    PGS002052   
#>  9 EFO_0003763 NA            FALSE    PGS002259   
#> 10 EFO_0003763 NA            FALSE    PGS002724   
#> # ℹ 36 more rows
#> 
#> Slot "trait_categories":
#> # A tibble: 11 × 4
#>    efo_id      parent_efo_id is_child trait_categories          
#>    <chr>       <chr>         <lgl>    <chr>                     
#>  1 EFO_0003763 NA            FALSE    Cardiovascular disease    
#>  2 EFO_0003763 NA            FALSE    Neurological disorder     
#>  3 EFO_0005669 NA            FALSE    Cardiovascular disease    
#>  4 EFO_0005669 NA            FALSE    Neurological disorder     
#>  5 HP_0002140  NA            FALSE    Cardiovascular disease    
#>  6 HP_0002140  NA            FALSE    Neurological disorder     
#>  7 EFO_0010555 NA            FALSE    Cardiovascular measurement
#>  8 EFO_0000712 NA            FALSE    Cardiovascular disease    
#>  9 EFO_0000712 NA            FALSE    Neurological disorder     
#> 10 HP_0001297  NA            FALSE    Cardiovascular disease    
#> 11 HP_0001297  NA            FALSE    Neurological disorder     
#> 
#> Slot "trait_synonyms":
#> # A tibble: 94 × 4
#>    efo_id      parent_efo_id is_child trait_synonyms                  
#>    <chr>       <chr>         <lgl>    <chr>                           
#>  1 EFO_0003763 NA            FALSE    BRAIN VASCULAR DIS              
#>  2 EFO_0003763 NA            FALSE    Brain Vascular Disorder         
#>  3 EFO_0003763 NA            FALSE    Brain Vascular Disorders        
#>  4 EFO_0003763 NA            FALSE    CEREBROVASCULAR DIS             
#>  5 EFO_0003763 NA            FALSE    CVA                             
#>  6 EFO_0003763 NA            FALSE    CVA (cerebral vascular accident)
#>  7 EFO_0003763 NA            FALSE    Cerebrovascular Disease         
#>  8 EFO_0003763 NA            FALSE    Cerebrovascular Disorders       
#>  9 EFO_0003763 NA            FALSE    Cerebrovascular Insufficiencies 
#> 10 EFO_0003763 NA            FALSE    Cerebrovascular Insufficiency   
#> # ℹ 84 more rows
#> 
#> Slot "trait_mapped_terms":
#> # A tibble: 49 × 4
#>    efo_id      parent_efo_id is_child trait_mapped_terms
#>    <chr>       <chr>         <lgl>    <chr>             
#>  1 EFO_0003763 NA            FALSE    DOID:6713         
#>  2 EFO_0003763 NA            FALSE    ICD10:I66         
#>  3 EFO_0003763 NA            FALSE    ICD10:I67         
#>  4 EFO_0003763 NA            FALSE    ICD10:I69         
#>  5 EFO_0003763 NA            FALSE    ICD10CM:I60-I69   
#>  6 EFO_0003763 NA            FALSE    ICD9:430-438.99   
#>  7 EFO_0003763 NA            FALSE    ICD9:434.91       
#>  8 EFO_0003763 NA            FALSE    ICD9:437.8        
#>  9 EFO_0003763 NA            FALSE    ICD9:437.9        
#> 10 EFO_0003763 NA            FALSE    MEDGEN:858        
#> # ℹ 39 more rows
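
To see exactly which traits the relaxed matching added, the identifiers of the two traits tables can be compared. The sketch below hard-codes the efo_id values printed in the two outputs above so that it runs without querying the Catalog; with live objects one would take the efo_id column of each object's traits slot instead. The variable names are illustrative.

```r
# efo_id values copied from the `traits` slot of the exact-match query.
exact_ids <- c('EFO_0003763', 'EFO_0000712')

# efo_id values copied from the `traits` slot of the exact_term = FALSE query.
relaxed_ids <- c('EFO_0003763', 'EFO_0005669', 'HP_0002140',
                 'EFO_0010555', 'EFO_0000712', 'HP_0001297')

# Traits found only when exact matching is switched off.
only_relaxed <- setdiff(relaxed_ids, exact_ids)
only_relaxed

# Check that every exact match is also present in the relaxed result.
all(exact_ids %in% relaxed_ids)
```

Here only_relaxed holds the four identifiers EFO_0005669, HP_0002140, EFO_0010555 and HP_0001297, and the final check confirms that the relaxed query is a superset of the exact one.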

Subtraits (child traits)

By default, subtraits (child traits) are not retrieved by get_traits(). If you want to get all matching traits together with their child traits, set the parameter include_children to TRUE. Here is an example with "breast cancer":

get_traits(trait_term = 'breast cancer', include_children = TRUE)
#> An object of class "traits"
#> Slot "traits":
#> # A tibble: 17 × 6
#>    efo_id        parent_efo_id is_child trait                  description url  
#>    <chr>         <chr>         <lgl>    <chr>                  <chr>       <chr>
#>  1 MONDO_0007254 NA            FALSE    breast cancer          A primary … http…
#>  2 EFO_1000650   MONDO_0007254 TRUE     estrogen-receptor neg… A subtype … http…
#>  3 EFO_1000649   MONDO_0007254 TRUE     estrogen-receptor pos… A subtype … http…
#>  4 EFO_0005537   MONDO_0007254 TRUE     triple-negative breas… An invasiv… http…
#>  5 EFO_0000305   MONDO_0007254 TRUE     breast carcinoma       A carcinom… http…
#>  6 EFO_0009780   MONDO_0007254 TRUE     HER2 negative breast … A biologic… http…
#>  7 MONDO_0021115 MONDO_0007254 TRUE     luminal B breast carc… A biologic… http…
#>  8 MONDO_0021116 MONDO_0007254 TRUE     luminal A breast carc… A biologic… http…
#>  9 EFO_1000294   MONDO_0007254 TRUE     HER2 Positive Breast … A biologic… http…
#> 10 EFO_0000305   NA            FALSE    breast carcinoma       A carcinom… http…
#> 11 EFO_1000650   EFO_0000305   TRUE     estrogen-receptor neg… A subtype … http…
#> 12 EFO_1000649   EFO_0000305   TRUE     estrogen-receptor pos… A subtype … http…
#> 13 EFO_0005537   EFO_0000305   TRUE     triple-negative breas… An invasiv… http…
#> 14 EFO_0009780   EFO_0000305   TRUE     HER2 negative breast … A biologic… http…
#> 15 MONDO_0021115 EFO_0000305   TRUE     luminal B breast carc… A biologic… http…
#> 16 MONDO_0021116 EFO_0000305   TRUE     luminal A breast carc… A biologic… http…
#> 17 EFO_1000294   EFO_0000305   TRUE     HER2 Positive Breast … A biologic… http…
#> 
#> Slot "pgs_ids":
#> # A tibble: 326 × 4
#>    efo_id      parent_efo_id is_child pgs_id   
#>    <chr>       <chr>         <lgl>    <chr>    
#>  1 EFO_1000650 MONDO_0007254 TRUE     PGS000003
#>  2 EFO_1000650 MONDO_0007254 TRUE     PGS000006
#>  3 EFO_1000650 MONDO_0007254 TRUE     PGS000009
#>  4 EFO_1000650 MONDO_0007254 TRUE     PGS000047
#>  5 EFO_1000650 MONDO_0007254 TRUE     PGS000346
#>  6 EFO_1000650 MONDO_0007254 TRUE     PGS000775
#>  7 EFO_1000650 MONDO_0007254 TRUE     PGS004867
#>  8 EFO_1000650 MONDO_0007254 TRUE     PGS004895
#>  9 EFO_1000650 MONDO_0007254 TRUE     PGS005106
#> 10 EFO_1000649 MONDO_0007254 TRUE     PGS000002
#> # ℹ 316 more rows
#> 
#> Slot "child_pgs_ids":
#> # A tibble: 231 × 4
#>    efo_id        parent_efo_id is_child child_pgs_id
#>    <chr>         <chr>         <lgl>    <chr>       
#>  1 MONDO_0007254 NA            FALSE    PGS000001   
#>  2 MONDO_0007254 NA            FALSE    PGS000002   
#>  3 MONDO_0007254 NA            FALSE    PGS000003   
#>  4 MONDO_0007254 NA            FALSE    PGS000004   
#>  5 MONDO_0007254 NA            FALSE    PGS000005   
#>  6 MONDO_0007254 NA            FALSE    PGS000006   
#>  7 MONDO_0007254 NA            FALSE    PGS000007   
#>  8 MONDO_0007254 NA            FALSE    PGS000008   
#>  9 MONDO_0007254 NA            FALSE    PGS000009   
#> 10 MONDO_0007254 NA            FALSE    PGS000015   
#> # ℹ 221 more rows
#> 
#> Slot "trait_categories":
#> # A tibble: 17 × 4
#>    efo_id        parent_efo_id is_child trait_categories
#>    <chr>         <chr>         <lgl>    <chr>           
#>  1 MONDO_0007254 NA            FALSE    Cancer          
#>  2 EFO_1000650   MONDO_0007254 TRUE     Cancer          
#>  3 EFO_1000649   MONDO_0007254 TRUE     Cancer          
#>  4 EFO_0005537   MONDO_0007254 TRUE     Cancer          
#>  5 EFO_0000305   MONDO_0007254 TRUE     Cancer          
#>  6 EFO_0009780   MONDO_0007254 TRUE     Cancer          
#>  7 MONDO_0021115 MONDO_0007254 TRUE     Cancer          
#>  8 MONDO_0021116 MONDO_0007254 TRUE     Cancer          
#>  9 EFO_1000294   MONDO_0007254 TRUE     Cancer          
#> 10 EFO_0000305   NA            FALSE    Cancer          
#> 11 EFO_1000650   EFO_0000305   TRUE     Cancer          
#> 12 EFO_1000649   EFO_0000305   TRUE     Cancer          
#> 13 EFO_0005537   EFO_0000305   TRUE     Cancer          
#> 14 EFO_0009780   EFO_0000305   TRUE     Cancer          
#> 15 MONDO_0021115 EFO_0000305   TRUE     Cancer          
#> 16 MONDO_0021116 EFO_0000305   TRUE     Cancer          
#> 17 EFO_1000294   EFO_0000305   TRUE     Cancer          
#> 
#> Slot "trait_synonyms":
#> # A tibble: 95 × 4
#>    efo_id        parent_efo_id is_child trait_synonyms                  
#>    <chr>         <chr>         <lgl>    <chr>                           
#>  1 MONDO_0007254 NA            FALSE    BC                              
#>  2 MONDO_0007254 NA            FALSE    breast cancer                   
#>  3 MONDO_0007254 NA            FALSE    breast tumor                    
#>  4 MONDO_0007254 NA            FALSE    breast tumour                   
#>  5 MONDO_0007254 NA            FALSE    cancer of breast                
#>  6 MONDO_0007254 NA            FALSE    malignant breast neoplasm       
#>  7 MONDO_0007254 NA            FALSE    malignant breast tumor          
#>  8 MONDO_0007254 NA            FALSE    malignant breast tumour         
#>  9 MONDO_0007254 NA            FALSE    malignant neoplasm of breast    
#> 10 MONDO_0007254 NA            FALSE    malignant neoplasm of the breast
#> # ℹ 85 more rows
#> 
#> Slot "trait_mapped_terms":
#> # A tibble: 101 × 4
#>    efo_id        parent_efo_id is_child trait_mapped_terms
#>    <chr>         <chr>         <lgl>    <chr>             
#>  1 MONDO_0007254 NA            FALSE    DOID:1612         
#>  2 MONDO_0007254 NA            FALSE    ICD10CM:C50       
#>  3 MONDO_0007254 NA            FALSE    ICD9:174.8        
#>  4 MONDO_0007254 NA            FALSE    MEDGEN:651        
#>  5 MONDO_0007254 NA            FALSE    NCIT:C9335        
#>  6 MONDO_0007254 NA            FALSE    SCTID:254837009   
#>  7 MONDO_0007254 NA            FALSE    UMLS:C0006142     
#>  8 EFO_1000650   MONDO_0007254 TRUE     DOID:0060076      
#>  9 EFO_1000650   MONDO_0007254 TRUE     EFO:1000650       
#> 10 EFO_1000650   MONDO_0007254 TRUE     MONDO:0006513     
#> # ℹ 91 more rows

The column is_child indicates how a trait came to be retrieved: it is TRUE when the trait is returned because it is a child trait of a matching trait, and FALSE when it is a direct result of the query.

For child traits, the column parent_efo_id holds the EFO trait identifier of the parent trait, i.e. the directly matching trait; for direct matches it is NA.
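
These two columns make it straightforward to separate direct matches from child traits. The sketch below works on a small data frame with five rows transcribed from the traits slot printed above, so it runs offline; with a live object the same subsetting would be applied to its traits slot. The names traits_tbl, direct_ids and children_by_parent are illustrative.

```r
# Five rows transcribed from the `traits` slot printed above
# (columns trait, description and url omitted for brevity).
traits_tbl <- data.frame(
  efo_id        = c('MONDO_0007254', 'EFO_1000650', 'EFO_0000305',
                    'EFO_0000305', 'EFO_1000650'),
  parent_efo_id = c(NA, 'MONDO_0007254', 'MONDO_0007254',
                    NA, 'EFO_0000305'),
  is_child      = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Traits that matched the query term directly.
direct_ids <- traits_tbl$efo_id[!traits_tbl$is_child]

# Child traits, grouped by the parent trait that brought them in.
children <- traits_tbl[traits_tbl$is_child, ]
children_by_parent <- split(children$efo_id, children$parent_efo_id)

direct_ids
children_by_parent
```

Note that the same trait can appear more than once: EFO_0000305 (breast carcinoma) is both a direct match and a child of MONDO_0007254, and EFO_1000650 is listed once under each of the two parents.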

Getting all traits

To retrieve all traits, simply leave the parameters efo_id and trait_term as NULL (the default).
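
A minimal sketch of such a call is given below, assuming quincunx is installed and the Catalog is reachable. The variable name all_traits is illustrative. Depending on the package version, an interactive session may ask for confirmation before downloading everything.

```r
library(quincunx)

# No efo_id and no trait_term: request every trait in the Catalog.
# This may take some time because the whole traits listing is downloaded.
all_traits <- get_traits()

# Number of traits retrieved.
nrow(all_traits@traits)
```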

References

1.
2. Kuchenbaecker, K. et al. The transferability of lipid loci across African, Asian and European cohorts. Nature Communications 10, (2019).
3.
4. Inouye, M. et al. Genomic risk prediction of coronary artery disease in 480,000 adults. Journal of the American College of Cardiology 72, 1883–1893 (2018).
5.
6. Mavaddat, N. et al. Polygenic Risk Scores for Prediction of Breast Cancer and Breast Cancer Subtypes. The American Journal of Human Genetics 104, 21–34 (2019).
7.