
Retrieves variants via the NHGRI-EBI GWAS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all variants that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default set_operation = 'union'. If you would rather have only the variants that simultaneously match all criteria provided, set set_operation to 'intersection'.
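To illustrate the difference between the two values of set_operation, the sketch below applies the documented semantics to plain vectors of variant identifiers. The two input vectors are a made-up split of identifiers that appear in the example output further down, standing in for results of two separate criteria. This is a base R illustration of 'union' versus 'intersection' only, not gwasrapidd's internal code (which combines variants objects rather than character vectors).

```r
# Hypothetical results of two separate criteria (e.g. study_id and gene_name).
by_study <- c("rs6469823", "rs200752", "rs3798440")
by_gene  <- c("rs3798440", "rs9350602")

# set_operation = 'union': bind all results together, removing duplicates.
union_ids <- unique(c(by_study, by_gene))
# -> "rs6469823" "rs200752" "rs3798440" "rs9350602"

# set_operation = 'intersection': keep only variants found by every criterion.
intersection_ids <- Reduce(intersect, list(by_study, by_gene))
# -> "rs3798440"
```

With more than two criteria, Reduce() simply folds intersect() over all result vectors, so a variant survives only if every criterion returned it.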

Usage

get_variants(
  study_id = NULL,
  association_id = NULL,
  variant_id = NULL,
  efo_id = NULL,
  pubmed_id = NULL,
  genomic_range = NULL,
  cytogenetic_band = NULL,
  gene_name = NULL,
  efo_trait = NULL,
  reported_trait = NULL,
  set_operation = "union",
  interactive = TRUE,
  std_chromosomes_only = TRUE,
  verbose = FALSE,
  warnings = TRUE
)

Arguments

study_id

A character vector of GWAS Catalog study accession identifiers.

association_id

A character vector of GWAS Catalog association identifiers.

variant_id

A character vector of GWAS Catalog variant identifiers.

efo_id

A character vector of EFO identifiers.

pubmed_id

An integer vector of PubMed identifiers.

genomic_range

A named list of three vectors:

chromosome

A character vector of chromosome names: 1 to 22, X or Y.

start

A numeric vector of start positions (1-based, i.e., the first base of a chromosome is position 1).

end

A numeric vector of end positions.

The three vectors must have the same length, so that chromosome names, start and end positions can be paired by position: the i-th element of each vector together defines the i-th range.

cytogenetic_band

A character vector of cytogenetic bands of the form '1p36.11'.

gene_name

A character vector of gene symbols according to the HUGO Gene Nomenclature Committee (HGNC).

efo_trait

A character vector of EFO trait descriptions, e.g., 'uric acid measurement'.

reported_trait

A character vector of phenotypic traits as reported by the original authors of the study.

set_operation

Either 'union' or 'intersection'. Determines how variants retrieved by different criteria are combined: 'union' binds all results together, removing duplicates, whereas 'intersection' keeps only the variants found by every criterion.

interactive

A logical. If all variants are requested, whether to ask interactively for confirmation before proceeding.

std_chromosomes_only

A logical. Whether to return only variants mapped to standard chromosomes: 1 through 22, X, Y and MT.

verbose

A logical. Whether to print messages about the individual REST API queries being made.

warnings

Whether to print warnings.
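As a sketch of how the genomic_range argument is laid out, the code below builds a named list with two ranges and checks it with is_valid_genomic_range(), a hypothetical helper that is not part of gwasrapidd. The helper encodes the stated requirements (the three named vectors, equal lengths, 1-based starts) plus one extra assumption of ours, that each end is not smaller than its start. The coordinates are illustrative values chosen around two variants from the example output; the actual query is guarded with if (FALSE), as on the rest of this page, because it needs network access.

```r
# Hypothetical helper (not part of gwasrapidd): checks the shape of a
# genomic_range list as described in the Arguments section.
is_valid_genomic_range <- function(gr) {
  is.list(gr) &&
    all(c("chromosome", "start", "end") %in% names(gr)) &&
    # all three vectors must have the same length to be paired by position
    length(unique(lengths(gr[c("chromosome", "start", "end")]))) == 1L &&
    all(gr$start >= 1) &&         # positions are 1-based
    all(gr$end >= gr$start)       # assumption: ranges are not reversed
}

# Two ranges: element i of each vector describes range i.
my_range <- list(
  chromosome = c("8", "20"),
  start      = c(119000000, 15600000),
  end        = c(119500000, 15700000)
)
ok <- is_valid_genomic_range(my_range)

# Mismatched lengths: chromosome "20" has no start/end to pair with.
bad_range <- list(chromosome = c("8", "20"), start = 119000000, end = 119500000)
not_ok <- is_valid_genomic_range(bad_range)

if (FALSE) {
  get_variants(genomic_range = my_range, warnings = FALSE)
}
```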

Value

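A variants object: an S4 object with the slots variants, genomic_contexts, ensembl_ids and entrez_ids (see the example output below).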

Details

All search criteria are vectorised, which allows batch searches: for example, you can search for multiple variants at once by passing a vector of identifiers to variant_id.
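The batch mode described above can be sketched in two parts. The first part passes two identifiers taken from the example output to variant_id in a single call; it is guarded with if (FALSE), like the examples on this page, because it needs network access. The second part is an offline illustration of the fan-out idea only: lookup_chromosome() is a hypothetical stand-in for one REST query per identifier, backed by chromosome names copied from the example output. It is an assumption made for illustration, not how gwasrapidd is implemented.

```r
# Batch search: several variant identifiers in one call (requires network).
if (FALSE) {
  my_variants <- get_variants(variant_id = c("rs3798440", "rs6469823"),
                              warnings = FALSE)
}

# Offline sketch: one lookup per identifier, results pooled afterwards.
# lookup_chromosome() is hypothetical; its values come from the example output.
lookup_chromosome <- function(id) {
  known <- c(rs3798440 = "6", rs6469823 = "8")
  data.frame(variant_id = id,
             chromosome_name = unname(known[id]),
             stringsAsFactors = FALSE)
}

ids <- c("rs3798440", "rs6469823")
batch <- do.call(rbind, lapply(ids, lookup_chromosome))
```

Pooling the per-identifier results with rbind() mirrors the 'union' behaviour for a single criterion: every identifier in the input vector contributes its own rows to one combined table.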

Examples

# Get variants by study identifier
get_variants(study_id = 'GCST001085', warnings = FALSE)
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 18 × 7
#>    variant_id merged functional_class        chromosome_name chromosome_position
#>    <chr>       <int> <chr>                   <chr>                         <int>
#>  1 rs6469823       1 intergenic_variant      8                         119341744
#>  2 rs200752        0 intron_variant          20                         15618886
#>  3 rs1372662       0 intron_variant          8                         134554803
#>  4 rs2469997       0 intergenic_variant      8                         119341027
#>  5 rs12522034      0 intergenic_variant      5                          36425491
#>  6 rs7735940       0 regulatory_region_vari… 5                          36423829
#>  7 rs7960483       0 intron_variant          12                         45531972
#>  8 rs6887846       0 intron_variant          5                          83140542
#>  9 rs3798440       0 intron_variant          6                          75846902
#> 10 rs6452524       0 intron_variant          5                          83137962
#> 11 rs7827545       0 intron_variant          8                         134554324
#> 12 rs10496288      0 intergenic_variant      2                          83065441
#> 13 rs13420028      0 intron_variant          2                         132430533
#> 14 rs10785581      0 intron_variant          12                         45537190
#> 15 rs10188442      0 intron_variant          2                         132431666
#> 16 rs10496289      0 intergenic_variant      2                          83066256
#> 17 rs9350602       0 intron_variant          6                          75850781
#> 18 rs200759        0 intron_variant          20                         15625776
#> # ℹ 2 more variables: chromosome_region <chr>, last_update_date <dttm>
#> 
#> Slot "genomic_contexts":
#> # A tibble: 138 × 12
#>    variant_id gene_name    chromosome_name chromosome_position distance
#>    <chr>      <chr>        <chr>                         <int>    <int>
#>  1 rs6469823  MIR548AZ     8                         119341744    16479
#>  2 rs6469823  CCN3         8                         119341744    74702
#>  3 rs6469823  MIR548AZ     8                         119341744    16479
#>  4 rs6469823  MAL2-AS1     8                         119341744    94887
#>  5 rs6469823  LOC124900266 8                         119341744    46604
#>  6 rs6469823  CCN3         8                         119341744    74702
#>  7 rs6469823  MAL2-AS1     8                         119341744    94896
#>  8 rs6469823  MAL2         8                         119341744    96071
#>  9 rs6469823  MAL2         8                         119341744    96071
#> 10 rs6469823  LOC124902009 8                         119341744     8146
#> # ℹ 128 more rows
#> # ℹ 7 more variables: is_mapped_gene <lgl>, is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
#> 
#> Slot "ensembl_ids":
#> # A tibble: 62 × 3
#>    variant_id gene_name ensembl_id     
#>    <chr>      <chr>     <chr>          
#>  1 rs6469823  MIR548AZ  ENSG00000276479
#>  2 rs6469823  CCN3      ENSG00000136999
#>  3 rs6469823  MAL2-AS1  ENSG00000253972
#>  4 rs6469823  MAL2      ENSG00000147676
#>  5 rs200752   ENSAP1    ENSG00000224274
#>  6 rs200752   RNA5SP475 ENSG00000222500
#>  7 rs200752   MACROD2   ENSG00000172264
#>  8 rs1372662  MTCO1P49  ENSG00000253916
#>  9 rs1372662  ZFAT      ENSG00000066827
#> 10 rs1372662  ZFAT-AS1  ENSG00000248492
#> # ℹ 52 more rows
#> 
#> Slot "entrez_ids":
#> # A tibble: 89 × 3
#>    variant_id gene_name    entrez_id
#>    <chr>      <chr>        <chr>    
#>  1 rs6469823  MIR548AZ     102466162
#>  2 rs6469823  CCN3         4856     
#>  3 rs6469823  MAL2-AS1     105375726
#>  4 rs6469823  LOC124900266 124900266
#>  5 rs6469823  MAL2         114569   
#>  6 rs6469823  LOC124902009 124902009
#>  7 rs200752   ENSAP1       170511   
#>  8 rs200752   RNA5SP475    100873717
#>  9 rs200752   MACROD2      140733   
#> 10 rs1372662  MTCO1P49     107075177
#> # ℹ 79 more rows
#> 

# Get a variant by its identifier
if (FALSE) {
get_variants(variant_id = 'rs3798440', warnings = FALSE)
}