The GWAS Catalog

The GWAS Catalog is a service provided by the EMBL-EBI and NHGRI that offers a manually curated and freely available database of published genome-wide association studies (GWAS). The Catalog website and infrastructure are hosted by the EMBL-EBI.

There are three ways to access the Catalog database: the web search interface, the downloadable database files, and the REST API.

gwasrapidd facilitates access to the Catalog via the REST API, letting you retrieve data programmatically, directly into R.
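Under the hood, each query is an ordinary HTTP request to a REST resource. The sketch below only composes two such URLs in base R and sends nothing; the base URL and the resource names (studies, singleNucleotidePolymorphisms) are assumptions based on the Catalog's public REST API and should be checked against its current documentation.

```r
# Assumed base URL of the GWAS Catalog REST API; verify against the
# Catalog's API documentation before relying on it.
base_url <- "https://www.ebi.ac.uk/gwas/rest/api"

# Compose resource URLs for one study accession and one variant (rsID).
# Nothing is downloaded here; these are plain character strings.
study_url   <- paste(base_url, "studies", "GCST000858", sep = "/")
variant_url <- paste(base_url, "singleNucleotidePolymorphisms", "rs12752552", sep = "/")

print(study_url)
print(variant_url)
```

gwasrapidd takes care of composing and sending such requests for you, so you rarely need to build URLs by hand.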

GWAS Catalog Entities

The Catalog REST API is organized around four core entities: studies, associations, variants, and traits. gwasrapidd provides four corresponding functions to get each of the entities: get_studies(), get_associations(), get_variants(), and get_traits().

Each function returns an S4 object of the correspondingly named class: studies, associations, variants, or traits (see Figure 1).
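Because the returned objects are S4, their tables live in slots that are read with the @ operator, as in my_associations@associations later in this vignette. The toy class below is hypothetical and only mimics that access pattern; it is not gwasrapidd's actual studies class definition.

```r
library(methods)  # S4 infrastructure (attached by default in most R sessions)

# Hypothetical toy class: NOT gwasrapidd's `studies` class, just an S4 object
# with a single slot holding a data frame.
setClass("toy_studies", slots = c(studies = "data.frame"))

toy <- new("toy_studies",
           studies = data.frame(study_id = c("GCST003097", "GCST007071"),
                                stringsAsFactors = FALSE))

slotNames(toy)        # names of the slots defined by the class
toy@studies$study_id  # read a slot with `@`, then a column with `$`
```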

Figure 1 | gwasrapidd retrieval functions.

Each retrieval function accepts a combination of several search criteria, as shown in Figure 2. For example, to get studies matching either a study accession identifier or a variant identifier, you could run:

library(gwasrapidd)
my_studies <- get_studies(study_id = 'GCST000858', variant_id = 'rs12752552')

This command returns all studies that match either 'GCST000858' or 'rs12752552'. It is equivalent to running get_studies() separately on each criterion and combining the results afterwards:

s1 <- get_studies(study_id = 'GCST000858')
s2 <- get_studies(variant_id = 'rs12752552')
my_studies <- union(s1, s2)

All four retrieval functions accept a set_operation parameter, which defines how the results obtained with each criterion are combined. Its two options are 'union' (the default) and 'intersection', corresponding, respectively, to an OR and an AND operation.
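The difference between the two options can be illustrated with plain character vectors. The helper combine_ids() and the study identifiers below are hypothetical: the sketch only mirrors the OR/AND meaning of set_operation using base R's union() and intersect(), and is not how gwasrapidd implements it.

```r
# Hypothetical helper: combine per-criterion hits as an OR ('union')
# or an AND ('intersection'), mirroring the meaning of set_operation.
combine_ids <- function(x, y, set_operation = c("union", "intersection")) {
  set_operation <- match.arg(set_operation)
  if (set_operation == "union") union(x, y) else intersect(x, y)
}

# Made-up study identifiers matched by two different criteria.
hits_by_study_id   <- c("study_A", "study_B")
hits_by_variant_id <- c("study_B", "study_C")

combine_ids(hits_by_study_id, hits_by_variant_id)                  # OR
combine_ids(hits_by_study_id, hits_by_variant_id, "intersection")  # AND
```

With 'union' all three identifiers are kept; with 'intersection' only study_B, the one matched by both criteria, remains.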

Figure 2 | gwasrapidd arguments for retrieval functions. Colors indicate the criteria that can be used for retrieving GWAS Catalog entities: studies (green), associations (red), variants (purple), and traits (orange).

Example 1 | Finding Risk Alleles Associated with Autoimmune Disease

As a first example, consider the work of Light et al. (2014), in which the authors focused on variants previously reported in genome-wide association studies (GWAS) for autoimmune disease.

With gwasrapidd we can query the GWAS Catalog for the studies on autoimmune disease (an EFO trait). First, load gwasrapidd:

library(gwasrapidd)

Then query the GWAS Catalog by EFO trait:

my_studies <- get_studies(efo_trait = 'autoimmune disease')

We can now check how many GWAS studies we got back with n(my_studies). Only two studies were returned: GCST003097 and GCST007071.

The titles of the associated publications are stored in my_studies@publications. To inspect these publications further, you can quickly browse their PubMed entries with open_in_pubmed().

To find the variants previously associated with autoimmune disease, as used by Light et al. (2014), we need to retrieve the association statistics for these studies and then filter them at the same significance level, \(P < 1\times 10^{-6}\).

Start by retrieving the associations for these studies with get_associations(), querying by study_id.

This returns 131 associations:

n(my_associations)
#> [1] 131

However, not all of these associations meet the significance level required by Light et al. (2014):

library(magrittr) # provides the %>% pipe
# Get the identifiers of associations whose p-value is below 1e-6,
# discarding associations with a missing p-value.
association_ids <- my_associations@associations %>%
  tidyr::drop_na(pvalue) %>%
  dplyr::filter(pvalue < 1e-6) %>%
  dplyr::pull(association_id)
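In base R, comparing NA with a number yields NA rather than FALSE, so missing p-values have to be excluded explicitly. For readers who prefer to avoid the dplyr and tidyr dependencies, here is a base R sketch of the same filter on a small, made-up table that only borrows the association_id and pvalue column names.

```r
# Made-up stand-in for my_associations@associations (values are invented).
toy_assoc <- data.frame(
  association_id = c("a1", "a2", "a3", "a4"),
  pvalue         = c(3e-12, 2e-6, NA, 8e-7),
  stringsAsFactors = FALSE
)

# NA < 1e-6 is NA, so missing p-values are excluded with !is.na().
keep <- !is.na(toy_assoc$pvalue) & toy_assoc$pvalue < 1e-6
toy_ids <- toy_assoc$association_id[keep]
toy_ids
```

Only a1 and a4 pass: a2 is above the threshold and a3 has no p-value.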

Next, we subset my_associations by this vector of association identifiers (association_ids) to obtain a smaller object, my_associations2.
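Conceptually, subsetting by association identifiers keeps, in each of the object's tables, only the rows whose association_id is in the selection. The base R sketch below illustrates that idea on two hypothetical tables; subset_by_ids() is a made-up helper, not part of gwasrapidd, and gwasrapidd's own subsetting method may do more than this.

```r
# Hypothetical tables sharing the key column association_id (invented data).
toy_associations <- data.frame(association_id = c("a1", "a2", "a3"),
                               pvalue = c(3e-12, 2e-6, 8e-7),
                               stringsAsFactors = FALSE)
toy_risk_alleles <- data.frame(association_id = c("a1", "a2", "a3"),
                               variant_id = c("var_1", "var_2", "var_3"),
                               stringsAsFactors = FALSE)

# Made-up helper: keep the rows whose association_id is among `ids`.
subset_by_ids <- function(tbl, ids) {
  tbl[tbl$association_id %in% ids, , drop = FALSE]
}

toy_ids <- c("a1", "a3")
toy_associations2 <- subset_by_ids(toy_associations, toy_ids)
toy_risk_alleles2 <- subset_by_ids(toy_risk_alleles, toy_ids)

toy_risk_alleles2$variant_id
```

Filtering every table by the same key keeps them consistent with one another, so each remaining row still refers to a selected association.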

Of the 131 associations found in the GWAS Catalog, 129 meet the p-value threshold of \(1\times 10^{-6}\). Here are the corresponding variants, with their risk allele and risk frequency:

my_associations2@risk_alleles[c('variant_id', 'risk_allele', 'risk_frequency')] %>%
  print(n = Inf)
#> # A tibble: 129 x 3
#>     variant_id  risk_allele risk_frequency
#>     <chr>       <chr>                <dbl>
#>   1 rs2066363   C                    0.34 
#>   2 rs114846446 A                    0.01 
#>   3 rs7672495   C                    0.18 
#>   4 rs7660520   A                    0.26 
#>   5 rs7831697   G                    0.25 
#>   6 rs7042370   T                    0.43 
#>   7 rs10988542  C                    0.08 
#>   8 rs7100025   G                    0.34 
#>   9 rs77150043  T                    0.23 
#>  10 rs2807264   C                    0.21 
#>  11 rs12863738  T                    0.17 
#>  12 rs11580078  G                    0.43 
#>  13 rs6679677   A                    0.09 
#>  14 rs34884278  C                    0.3  
#>  15 rs6689858   C                    0.290
#>  16 rs2075184   T                    0.23 
#>  17 rs36001488  C                    0.48 
#>  18 rs4676410   A                    0.19 
#>  19 rs4625      G                    0.31 
#>  20 rs7725052   C                    0.43 
#>  21 rs7731626   A                    0.39 
#>  22 rs755374    T                    0.32 
#>  23 rs36051895  T                    0.290
#>  24 rs4246905   T                    0.28 
#>  25 rs706778    T                    0.41 
#>  26 rs10822050  C                    0.39 
#>  27 rs17885785  T                    0.2  
#>  28 rs17466626  G                    0.02 
#>  29 rs12598357  G                    0.39 
#>  30 rs12928404  C                    0.38 
#>  31 rs62131887  T                    0.28 
#>  32 rs2836882   A                    0.27 
#>  33 rs62324212  A                    0.42 
#>  34 rs4869313   T                    0.42 
#>  35 rs11741255  A                    0.42 
#>  36 rs11145763  C                    0.4  
#>  37 rs1250563   C                    0.290
#>  38 rs1332099   T                    0.46 
#>  39 rs72743477  G                    0.21 
#>  40 rs117372389 T                    0.02 
#>  41 rs602662    G                    0.49 
#>  42 rs2738774   A                    0.32 
#>  43 rs1689510   C                    0.31 
#>  44 rs12232497  C                    0.45 
#>  45 rs7088058   <NA>                NA    
#>  46 rs773107    <NA>                NA    
#>  47 rs4761587   <NA>                NA    
#>  48 rs191252491 <NA>                NA    
#>  49 rs2093816   <NA>                NA    
#>  50 rs9591325   <NA>                NA    
#>  51 rs11073337  <NA>                NA    
#>  52 rs8061370   <NA>                NA    
#>  53 rs13380830  <NA>                NA    
#>  54 rs73316435  <NA>                NA    
#>  55 rs1790588   <NA>                NA    
#>  56 rs12980063  <NA>                NA    
#>  57 rs240753    <NA>                NA    
#>  58 rs229541    <NA>                NA    
#>  59 rs10797431  <NA>                NA    
#>  60 rs72920202  <NA>                NA    
#>  61 rs10494079  <NA>                NA    
#>  62 rs2476601   <NA>                NA    
#>  63 rs1800601   <NA>                NA    
#>  64 rs11675342  <NA>                NA    
#>  65 rs1534430   <NA>                NA    
#>  66 rs67927699  <NA>                NA    
#>  67 rs5865      <NA>                NA    
#>  68 rs2075302   <NA>                NA    
#>  69 rs142647938 <NA>                NA    
#>  70 rs10202630  <NA>                NA    
#>  71 rs7568275   <NA>                NA    
#>  72 rs3087243   <NA>                NA    
#>  73 rs145268310 <NA>                NA    
#>  74 rs1921445   <NA>                NA    
#>  75 rs28583049  <NA>                NA    
#>  76 rs1530687   <NA>                NA    
#>  77 rs57791671  <NA>                NA    
#>  78 rs114558062 <NA>                NA    
#>  79 rs2030519   <NA>                NA    
#>  80 rs10937560  <NA>                NA    
#>  81 rs56817615  <NA>                NA    
#>  82 rs7441808   <NA>                NA    
#>  83 rs9683415   <NA>                NA    
#>  84 rs6840978   <NA>                NA    
#>  85 rs7655915   <NA>                NA    
#>  86 rs391851    <NA>                NA    
#>  87 rs114378220 <NA>                NA    
#>  88 rs11746555  <NA>                NA    
#>  89 rs1549922   <NA>                NA    
#>  90 rs9392504   <NA>                NA    
#>  91 rs72928038  <NA>                NA    
#>  92 rs761357    <NA>                NA    
#>  93 rs11757201  <NA>                NA    
#>  94 rs6914622   <NA>                NA    
#>  95 rs9356551   <NA>                NA    
#>  96 rs60600003  <NA>                NA    
#>  97 rs221781    <NA>                NA    
#>  98 rs3807307   <NA>                NA    
#>  99 rs1032129   <NA>                NA    
#> 100 rs11785816  <NA>                NA    
#> 101 rs865488    <NA>                NA    
#> 102 rs7005834   <NA>                NA    
#> 103 rs970987    <NA>                NA    
#> 104 rs1443438   <NA>                NA    
#> 105 rs13299616  <NA>                NA    
#> 106 rs10986284  <NA>                NA    
#> 107 rs706778    <NA>                NA    
#> 108 rs2181622   <NA>                NA    
#> 109 rs71508903  <NA>                NA    
#> 110 rs10748781  <NA>                NA    
#> 111 rs1199047   <NA>                NA    
#> 112 rs1320344   <NA>                NA    
#> 113 rs7310615   <NA>                NA    
#> 114 rs76428106  <NA>                NA    
#> 115 rs11622435  <NA>                NA    
#> 116 rs10444776  <NA>                NA    
#> 117 rs11117433  <NA>                NA    
#> 118 rs1893217   <NA>                NA    
#> 119 rs10425559  <NA>                NA    
#> 120 rs11086102  <NA>                NA    
#> 121 rs12482947  <NA>                NA    
#> 122 rs4409785   <NA>                NA    
#> 123 rs9507287   <NA>                NA    
#> 124 rs55984493  <NA>                NA    
#> 125 rs78534766  <NA>                NA    
#> 126 rs35776863  <NA>                NA    
#> 127 rs34536443  <NA>                NA    
#> 128 rs3765209   <NA>                NA    
#> 129 rs5754100   <NA>                NA
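In this listing, rows 45 to 129 have no risk allele or risk frequency recorded, and rs706778 appears twice (rows 25 and 107). If you only want rows with both values reported, or just the distinct variants, base R is enough; the sketch below applies it to four rows copied from the output above rather than to the full object.

```r
# Four rows copied from the output above.
ra <- data.frame(
  variant_id     = c("rs2066363", "rs706778", "rs7088058", "rs706778"),
  risk_allele    = c("C", "T", NA, NA),
  risk_frequency = c(0.34, 0.41, NA, NA),
  stringsAsFactors = FALSE
)

# Rows in which every column is non-missing.
ra_complete <- ra[complete.cases(ra), ]
ra_complete$variant_id

# Distinct variant identifiers, in order of first appearance.
unique(ra$variant_id)
```

The same calls should also work on the tibble in my_associations2@risk_alleles.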

References

Light, Nicholas, Véronique Adoue, Bing Ge, Shu-Huang Chen, Tony Kwan, and Tomi Pastinen. 2014. “Interrogation of Allelic Chromatin States in Human Cells by High-Density Chip-Genotyping.” Epigenetics 9 (9): 1238–51.