The GWAS Catalog

The GWAS Catalog is a service provided by the EMBL-EBI and NHGRI that offers a manually curated and freely available database of published genome-wide association studies (GWAS). The Catalog website and infrastructure are hosted by the EMBL-EBI.

There are three ways to access the Catalog database: through the web graphical user interface, by downloading a database dump, or via the REST API.

gwasrapidd facilitates access to the Catalog via the REST API, allowing you to retrieve data programmatically, directly into R.

GWAS Catalog Entities

The Catalog REST API is organized around four core entities: studies, associations, variants, and traits. gwasrapidd provides four corresponding functions to get each of the entities: get_studies(), get_associations(), get_variants(), and get_traits().

Each function returns an S4 object of the correspondingly named class: studies, associations, variants, or traits (see Figure 1).

Figure 1 | gwasrapidd retrieval functions.
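
As a minimal sketch of this one-function-per-entity design, the same study accession can be passed to each of the four retrieval functions. It assumes network access to the GWAS Catalog REST API and that each function accepts study_id as a search criterion; the variable names are illustrative only.

```r
library(gwasrapidd)

# One study accession, queried through each of the four retrieval functions
# (requires an internet connection; results depend on the live Catalog).
accession <- 'GCST000858'

my_study        <- get_studies(study_id = accession)
my_associations <- get_associations(study_id = accession)
my_variants     <- get_variants(study_id = accession)
my_traits       <- get_traits(study_id = accession)

# Each result is an S4 object named after the entity it holds
class(my_study)
class(my_associations)
class(my_variants)
class(my_traits)
```

Inspecting the class of each result is a quick way to confirm which entity you are holding before accessing its slots.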

Each retrieval function accepts several search criteria, which can be combined (see Figure 2). For example, to get the studies that match either a study accession identifier or a variant identifier, you could run the following code:

library(gwasrapidd)
my_studies <- get_studies(study_id = 'GCST000858', variant_id = 'rs12752552')

This command returns all studies that match either 'GCST000858' or 'rs12752552'. It is equivalent to running get_studies() separately on each criterion and combining the results afterwards:

s1 <- get_studies(study_id = 'GCST000858')
s2 <- get_studies(variant_id = 'rs12752552')
my_studies <- union(s1, s2)

All four retrieval functions accept the set_operation parameter, which defines how the results obtained with each criterion are combined. Its two options are 'union' (the default) and 'intersection', which correspond to an OR and an AND operation, respectively.
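
The sketch below contrasts the two options on the same pair of criteria used above. It assumes network access to the GWAS Catalog REST API; the object names studies_or and studies_and are illustrative, and no output is shown because it depends on the live Catalog.

```r
library(gwasrapidd)

# OR (default): studies matching the accession or the variant
studies_or <- get_studies(study_id = 'GCST000858',
                          variant_id = 'rs12752552',
                          set_operation = 'union')

# AND: only studies matching both the accession and the variant
studies_and <- get_studies(study_id = 'GCST000858',
                           variant_id = 'rs12752552',
                           set_operation = 'intersection')

# The intersection can never hold more studies than the union
n(studies_and) <= n(studies_or)
```

Using 'intersection' is useful when a broad criterion, such as a variant, should be narrowed down by a second one.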

Figure 2 | gwasrapidd arguments for retrieval functions. Colors indicate the criteria that can be used for retrieving GWAS Catalog entities: studies (green), associations (red), variants (purple), and traits (orange).

Example 1 | Finding Risk Alleles Associated with Autoimmune Disease

As a first example, take the work by Light et al. (2014). The authors focused on variants that had previously been reported in GWAS for autoimmune disease.

With gwasrapidd we can find the relevant studies in the GWAS Catalog by searching for the EFO trait 'autoimmune disease'. To do that, let’s load gwasrapidd first:

library(gwasrapidd)

Then query the GWAS Catalog by EFO trait:

my_studies <- get_studies(efo_trait = 'autoimmune disease')

We can now check how many studies we got back:

n(my_studies)
#> [1] 3
my_studies@studies$study_id
#> [1] "GCST003097" "GCST007071" "GCST009873"

Only three studies came back: GCST003097, GCST007071, and GCST009873. Let’s see the associated publication titles:

my_studies@publications$title
#> [1] "Meta-analysis of shared genetic architecture across ten pediatric autoimmune diseases."                                     
#> [2] "Leveraging Polygenic Functional Enrichment to Improve GWAS Power."                                                          
#> [3] "Meta-analysis of Immunochip data of four autoimmune diseases reveals novel single-disease and cross-phenotype associations."

If you want to further inspect these publications, you can quickly browse the respective PubMed entries:

# This launches your web browser at https://www.ncbi.nlm.nih.gov/pubmed/26301688
open_in_pubmed(my_studies@publications$pubmed_id)

To find the variants previously associated with autoimmune disease, as used by Light et al. (2014), we need to retrieve the statistical association information for these variants and then filter them at the same significance level, \(P < 1\times 10^{-6}\) (Light et al. 2014).

So let’s start by getting the associations by study_id:

# You could have also used get_associations(efo_trait = 'autoimmune disease')
my_associations <- get_associations(study_id = my_studies@studies$study_id)

This returns 169 associations:

n(my_associations)
#> [1] 169

However, not all associations meet the significance level required by Light et al. (2014):

# Get the ids of associations whose p-value is less than 1e-6.
# filter() already drops rows with a missing p-value; drop_na() makes that explicit.
association_ids <- my_associations@associations %>%
  dplyr::filter(pvalue < 1e-6) %>% # Filter by p-value
  tidyr::drop_na(pvalue) %>%
  dplyr::pull(association_id) # Extract column association_id
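
The filtering step can be tried offline on a toy table. The sketch below assumes only that dplyr is installed; the table toy and its values are hypothetical stand-ins for my_associations@associations, not Catalog data. It shows that the threshold is strict (a p-value of exactly 1e-6 is excluded) and that rows with a missing p-value are dropped by the filter.

```r
library(dplyr)

# Hypothetical stand-in for my_associations@associations
toy <- tibble::tibble(
  association_id = c("a1", "a2", "a3", "a4"),
  pvalue         = c(1e-8, 5e-6, NA, 1e-6)
)

# Keep only rows with pvalue strictly below 1e-6.
# The comparison yields NA for a missing p-value, and filter() drops such rows.
kept <- toy %>%
  filter(pvalue < 1e-6) %>%
  pull(association_id)

kept
#> [1] "a1"
```

The result is a plain character vector of identifiers, which is the kind of vector used to subset the associations object.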

Here we subset the my_associations object by a vector of association identifiers (association_ids) into a smaller object, my_associations2:

# Extract associations by association id
my_associations2 <- my_associations[association_ids]
n(my_associations2)
#> [1] 167

Of the 169 associations found in the GWAS Catalog, 167 meet the p-value threshold of \(1\times 10^{-6}\). Here are the variants, with their respective risk alleles and risk allele frequencies:

my_associations2@risk_alleles[c('variant_id', 'risk_allele', 'risk_frequency')] %>%
  print(n = Inf)
#> # A tibble: 167 x 3
#>     variant_id  risk_allele risk_frequency
#>     <chr>       <chr>                <dbl>
#>   1 rs11580078  G                    0.43 
#>   2 rs6679677   A                    0.09 
#>   3 rs34884278  C                    0.3  
#>   4 rs6689858   C                    0.290
#>   5 rs2075184   T                    0.23 
#>   6 rs36001488  C                    0.48 
#>   7 rs4676410   A                    0.19 
#>   8 rs4625      G                    0.31 
#>   9 rs62324212  A                    0.42 
#>  10 rs7725052   C                    0.43 
#>  11 rs7731626   A                    0.39 
#>  12 rs4869313   T                    0.42 
#>  13 rs11741255  A                    0.42 
#>  14 rs755374    T                    0.32 
#>  15 rs36051895  T                    0.290
#>  16 rs4246905   T                    0.28 
#>  17 rs11145763  C                    0.4  
#>  18 rs706778    T                    0.41 
#>  19 rs10822050  C                    0.39 
#>  20 rs1250563   C                    0.290
#>  21 rs1332099   T                    0.46 
#>  22 rs17885785  T                    0.2  
#>  23 rs17466626  G                    0.02 
#>  24 rs1689510   C                    0.31 
#>  25 rs72743477  G                    0.21 
#>  26 rs12598357  G                    0.39 
#>  27 rs12928404  C                    0.38 
#>  28 rs117372389 T                    0.02 
#>  29 rs12232497  C                    0.45 
#>  30 rs62131887  T                    0.28 
#>  31 rs602662    G                    0.49 
#>  32 rs2738774   A                    0.32 
#>  33 rs2836882   A                    0.27 
#>  34 rs2066363   C                    0.34 
#>  35 rs114846446 A                    0.01 
#>  36 rs7672495   C                    0.18 
#>  37 rs7660520   A                    0.26 
#>  38 rs7831697   G                    0.25 
#>  39 rs7042370   T                    0.43 
#>  40 rs10988542  C                    0.08 
#>  41 rs7100025   G                    0.34 
#>  42 rs77150043  T                    0.23 
#>  43 rs2807264   C                    0.21 
#>  44 rs12863738  T                    0.17 
#>  45 rs10797431  <NA>                NA    
#>  46 rs72920202  <NA>                NA    
#>  47 rs10494079  <NA>                NA    
#>  48 rs2476601   <NA>                NA    
#>  49 rs1800601   <NA>                NA    
#>  50 rs11675342  <NA>                NA    
#>  51 rs1534430   <NA>                NA    
#>  52 rs67927699  <NA>                NA    
#>  53 rs5865      <NA>                NA    
#>  54 rs2075302   <NA>                NA    
#>  55 rs142647938 <NA>                NA    
#>  56 rs10202630  <NA>                NA    
#>  57 rs7568275   <NA>                NA    
#>  58 rs3087243   <NA>                NA    
#>  59 rs145268310 <NA>                NA    
#>  60 rs1921445   <NA>                NA    
#>  61 rs28583049  <NA>                NA    
#>  62 rs1530687   <NA>                NA    
#>  63 rs57791671  <NA>                NA    
#>  64 rs114558062 <NA>                NA    
#>  65 rs2030519   <NA>                NA    
#>  66 rs10937560  <NA>                NA    
#>  67 rs56817615  <NA>                NA    
#>  68 rs7441808   <NA>                NA    
#>  69 rs9683415   <NA>                NA    
#>  70 rs6840978   <NA>                NA    
#>  71 rs7655915   <NA>                NA    
#>  72 rs391851    <NA>                NA    
#>  73 rs114378220 <NA>                NA    
#>  74 rs11746555  <NA>                NA    
#>  75 rs1549922   <NA>                NA    
#>  76 rs9392504   <NA>                NA    
#>  77 rs72928038  <NA>                NA    
#>  78 rs761357    <NA>                NA    
#>  79 rs11757201  <NA>                NA    
#>  80 rs6914622   <NA>                NA    
#>  81 rs9356551   <NA>                NA    
#>  82 rs60600003  <NA>                NA    
#>  83 rs221781    <NA>                NA    
#>  84 rs3807307   <NA>                NA    
#>  85 rs1032129   <NA>                NA    
#>  86 rs11785816  <NA>                NA    
#>  87 rs865488    <NA>                NA    
#>  88 rs7005834   <NA>                NA    
#>  89 rs970987    <NA>                NA    
#>  90 rs1443438   <NA>                NA    
#>  91 rs13299616  <NA>                NA    
#>  92 rs10986284  <NA>                NA    
#>  93 rs706778    <NA>                NA    
#>  94 rs2181622   <NA>                NA    
#>  95 rs71508903  <NA>                NA    
#>  96 rs10748781  <NA>                NA    
#>  97 rs7088058   <NA>                NA    
#>  98 rs1199047   <NA>                NA    
#>  99 rs4409785   <NA>                NA    
#> 100 rs773107    <NA>                NA    
#> 101 rs4761587   <NA>                NA    
#> 102 rs1320344   <NA>                NA    
#> 103 rs7310615   <NA>                NA    
#> 104 rs191252491 <NA>                NA    
#> 105 rs9507287   <NA>                NA    
#> 106 rs76428106  <NA>                NA    
#> 107 rs2093816   <NA>                NA    
#> 108 rs9591325   <NA>                NA    
#> 109 rs55984493  <NA>                NA    
#> 110 rs11622435  <NA>                NA    
#> 111 rs10444776  <NA>                NA    
#> 112 rs11073337  <NA>                NA    
#> 113 rs8061370   <NA>                NA    
#> 114 rs78534766  <NA>                NA    
#> 115 rs11117433  <NA>                NA    
#> 116 rs35776863  <NA>                NA    
#> 117 rs13380830  <NA>                NA    
#> 118 rs73316435  <NA>                NA    
#> 119 rs1893217   <NA>                NA    
#> 120 rs1790588   <NA>                NA    
#> 121 rs10425559  <NA>                NA    
#> 122 rs34536443  <NA>                NA    
#> 123 rs11086102  <NA>                NA    
#> 124 rs12980063  <NA>                NA    
#> 125 rs240753    <NA>                NA    
#> 126 rs3765209   <NA>                NA    
#> 127 rs12482947  <NA>                NA    
#> 128 rs5754100   <NA>                NA    
#> 129 rs229541    <NA>                NA    
#> 130 rs6664969   A                   NA    
#> 131 rs1748041   C                   NA    
#> 132 rs2476601   A                   NA    
#> 133 rs1217403   C                   NA    
#> 134 rs10912267  A                   NA    
#> 135 rs13415465  G                   NA    
#> 136 rs12619531  G                   NA    
#> 137 rs10931468  A                   NA    
#> 138 rs6749371   T                   NA    
#> 139 rs7574865   T                   NA    
#> 140 rs7426056   A                   NA    
#> 141 rs3087243   A                   NA    
#> 142 rs35677470  A                   NA    
#> 143 rs17753641  G                   NA    
#> 144 rs16878091  A                   NA    
#> 145 rs1422673   T                   NA    
#> 146 rs72928038  A                   NA    
#> 147 rs11757201  C                   NA    
#> 148 rs58721818  T                   NA    
#> 149 rs212407    G                   NA    
#> 150 rs60600003  G                   NA    
#> 151 rs7780389   T                   NA    
#> 152 rs4731532   A                   NA    
#> 153 rs2812378   G                   NA    
#> 154 rs3118470   C                   NA    
#> 155 rs72776098  A                   NA    
#> 156 rs947474    G                   NA    
#> 157 rs3802604   G                   NA    
#> 158 rs1250568   C                   NA    
#> 159 rs10892299  T                   NA    
#> 160 rs11171739  C                   NA    
#> 161 rs8043085   T                   NA    
#> 162 rs34593439  A                   NA    
#> 163 rs1054609   C                   NA    
#> 164 rs2542148   C                   NA    
#> 165 rs74956615  A                   NA    
#> 166 rs1893592   C                   NA    
#> 167 rs66534072  G                   NA
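
Many rows in this table lack a risk allele, a risk frequency, or both. As a minimal offline sketch of how the complete rows could be kept, the example below rebuilds four rows copied from the table (rows 1, 2, 45 and 130) in a tibble; the object names risk_alleles and complete are illustrative. With live data, my_associations2@risk_alleles would take the place of the toy tibble.

```r
library(dplyr)

# Four rows copied from the table above (rows 1, 2, 45 and 130)
risk_alleles <- tibble::tibble(
  variant_id     = c("rs11580078", "rs6679677", "rs10797431", "rs6664969"),
  risk_allele    = c("G", "A", NA, "A"),
  risk_frequency = c(0.43, 0.09, NA, NA)
)

# Keep only rows where both the risk allele and its frequency are reported
complete <- risk_alleles %>%
  filter(!is.na(risk_allele), !is.na(risk_frequency))

complete$variant_id
#> [1] "rs11580078" "rs6679677"
```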

References

Light, Nicholas, Véronique Adoue, Bing Ge, Shu-Huang Chen, Tony Kwan, and Tomi Pastinen. 2014. “Interrogation of Allelic Chromatin States in Human Cells by High-Density Chip-Genotyping.” Epigenetics 9 (9). Taylor & Francis: 1238–51.