Retrieves variants via the NHGRI-EBI GWAS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all variants that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default value of set_operation, 'union'. If you would rather have only the variants that match all of the supplied criteria simultaneously, set set_operation to 'intersection'.
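
The difference between the two set_operation modes can be illustrated with base R set functions on plain identifier vectors. This is only a sketch of the semantics, not the package's internal code; the vectors by_study and by_gene are hypothetical stand-ins for the variants returned by two separate criteria.

```r
# Hypothetical variant identifiers, as if returned by two separate criteria.
by_study <- c("rs10496288", "rs3798440", "rs7960483")
by_gene <- c("rs3798440", "rs9350602")

# 'union': everything found by either criterion, duplicates removed.
union_ids <- union(by_study, by_gene)

# 'intersection': only what both criteria found.
intersection_ids <- intersect(by_study, by_gene)

print(union_ids)
print(intersection_ids)
```

With 'union' the shared identifier appears once; with 'intersection' it is the only one kept.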

get_variants(
  study_id = NULL,
  association_id = NULL,
  variant_id = NULL,
  efo_id = NULL,
  pubmed_id = NULL,
  genomic_range = NULL,
  cytogenetic_band = NULL,
  gene_name = NULL,
  efo_trait = NULL,
  reported_trait = NULL,
  set_operation = "union",
  interactive = TRUE,
  std_chromosomes_only = TRUE,
  verbose = FALSE,
  warnings = TRUE
)

Arguments

study_id

A character vector of GWAS Catalog study accession identifiers.

association_id

A character vector of GWAS Catalog association identifiers.

variant_id

A character vector of GWAS Catalog variant identifiers.

efo_id

A character vector of EFO identifiers.

pubmed_id

An integer vector of PubMed identifiers.

genomic_range

A named list of three vectors:

chromosome

A character vector of chromosome names of the form 1-22, X or Y.

start

A numeric vector of start positions, starting at 1.

end

A numeric vector of end positions.

The three vectors need to be of the same length so that chromosome names, start and end positions can be matched by position.

cytogenetic_band

A character vector of cytogenetic bands of the form '1p36.11'.

gene_name

A character vector of gene symbols according to the HUGO Gene Nomenclature Committee (HGNC).

efo_trait

A character vector of EFO trait descriptions, e.g., 'uric acid measurement'.

reported_trait

A character vector of phenotypic traits as reported by the original authors of the study.

set_operation

Either 'union' or 'intersection'. This determines how variants retrieved by different criteria are combined: 'union' binds together all results, removing duplicates, whereas 'intersection' keeps only the variants found by every criterion.

interactive

A logical. If all variants are requested, whether to ask interactively for confirmation before proceeding.

std_chromosomes_only

Whether to return only variants mapped to standard chromosomes: 1 through 22, X, Y, and MT.

verbose

Whether the function should be verbose about the different queries or not.

warnings

Whether to print warnings.
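
As an illustration of the genomic_range argument described above, the following sketch builds the named list and checks that its three vectors line up. The helper make_genomic_range() is hypothetical and not part of the package; the check that end is not smaller than start is an extra assumption, and the coordinates are illustrative only.

```r
# Hypothetical helper (not part of the package): build a genomic_range list
# and check that the three vectors can be matched by position.
make_genomic_range <- function(chromosome, start, end) {
  stopifnot(
    is.character(chromosome),
    is.numeric(start),
    is.numeric(end),
    # Same length, so that entries are matched by position.
    length(chromosome) == length(start),
    length(start) == length(end),
    # Chromosome names of the form 1-22, X or Y.
    all(chromosome %in% c(as.character(1:22), "X", "Y")),
    # Positions start at 1; end >= start is an assumption of this sketch.
    all(start >= 1),
    all(end >= start)
  )
  list(chromosome = chromosome, start = start, end = end)
}

# Two regions: one on chromosome 6 and one on chromosome X.
my_range <- make_genomic_range(
  chromosome = c("6", "X"),
  start = c(75800000, 1),
  end = c(75900000, 1000000)
)
str(my_range)
```

The resulting list could then be passed as genomic_range = my_range.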

Value

A variants object.

Details

Please note that all search criteria are vectorised, which allows batch searches: for example, you can search by multiple variant identifiers at once by passing a vector of identifiers to variant_id.
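
A minimal sketch of such a batch search follows. It assumes the function is exported by a package named gwasrapidd (an assumption, since the package name is not stated on this page). The call is wrapped in interactive() so that no request is sent to the REST API when the code is run non-interactively.

```r
# Several variant identifiers searched in a single call.
variant_ids <- c("rs3798440", "rs7960483")

# Only query the REST API in an interactive session, and only if the
# (assumed) gwasrapidd package is installed.
if (interactive() && requireNamespace("gwasrapidd", quietly = TRUE)) {
  my_variants <- gwasrapidd::get_variants(variant_id = variant_ids)
  print(my_variants)
}
```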

Examples

# Get variants by study identifier
get_variants(study_id = 'GCST001085')
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 18 x 7
#>    variant_id merged functional_class chromosome_name chromosome_posi…
#>    <chr>       <int> <chr>            <chr>                      <int>
#>  1 rs10496288      0 intergenic_vari… 2                       83065441
#>  2 rs7827545       0 intron_variant   8                      134554324
#>  3 rs10785581      0 intron_variant   12                      45537190
#>  4 rs200759        0 intron_variant   20                      15625776
#>  5 rs7960483       0 intron_variant   12                      45531972
#>  6 rs1372662       0 intron_variant   8                      134554803
#>  7 rs10496289      0 intergenic_vari… 2                       83066256
#>  8 rs6452524       0 intron_variant   5                       83137962
#>  9 rs200752        0 intron_variant   20                      15618886
#> 10 rs3798440       0 intron_variant   6                       75846902
#> 11 rs7735940       0 regulatory_regi… 5                       36423829
#> 12 rs9350602       0 intron_variant   6                       75850781
#> 13 rs10188442      0 intron_variant   2                      132431666
#> 14 rs12522034      0 intergenic_vari… 5                       36425491
#> 15 rs13420028      0 intron_variant   2                      132430533
#> 16 rs2469997       0 intergenic_vari… 8                      119341027
#> 17 rs6887846       0 intron_variant   5                       83140542
#> 18 rs6469823       1 intergenic_vari… 8                      119341744
#> # … with 2 more variables: chromosome_region <chr>, last_update_date <dttm>
#>
#> Slot "genomic_contexts":
#> # A tibble: 148 x 12
#>    variant_id gene_name chromosome_name chromosome_posi… distance is_mapped_gene
#>    <chr>      <chr>     <chr>                      <int>    <int> <lgl>
#>  1 rs10496288 AC138623… 2                       83065441   153449 TRUE
#>  2 rs10496288 LOC10537… 2                       83065441   127905 FALSE
#>  3 rs10496288 AC098817… 2                       83065441   198933 TRUE
#>  4 rs10496288 LOC11226… 2                       83065441   148019 FALSE
#>  5 rs7827545  AC015599… 8                      134554324    43747 FALSE
#>  6 rs7827545  AC015599… 8                      134554324    43994 FALSE
#>  7 rs7827545  LOC10041… 8                      134554324   606901 FALSE
#>  8 rs7827545  AC105180… 8                      134554324   163837 FALSE
#>  9 rs7827545  ZFAT      8                      134554324        0 TRUE
#> 10 rs7827545  ZFAT-AS1  8                      134554324    43747 FALSE
#> # … with 138 more rows, and 6 more variables: is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
#>
#> Slot "ensembl_ids":
#> # A tibble: 86 x 3
#>    variant_id gene_name  ensembl_id
#>    <chr>      <chr>      <chr>
#>  1 rs10496288 AC138623.1 ENSG00000223977
#>  2 rs10496288 AC098817.1 ENSG00000286211
#>  3 rs7827545  AC015599.1 ENSG00000276140
#>  4 rs7827545  AC015599.2 ENSG00000277732
#>  5 rs7827545  AC105180.2 ENSG00000288067
#>  6 rs7827545  ZFAT       ENSG00000066827
#>  7 rs7827545  ZFAT-AS1   ENSG00000248492
#>  8 rs7827545  AC015599.3 ENSG00000278454
#>  9 rs10785581 AC079950.2 ENSG00000283545
#> 10 rs10785581 ANO6       ENSG00000177119
#> # … with 76 more rows
#>
#> Slot "entrez_ids":
#> # A tibble: 65 x 3
#>    variant_id gene_name    entrez_id
#>    <chr>      <chr>        <chr>
#>  1 rs10496288 LOC105374832 105374832
#>  2 rs10496288 LOC112268410 112268410
#>  3 rs7827545  LOC100419617 100419617
#>  4 rs7827545  ZFAT         57623
#>  5 rs7827545  ZFAT-AS1     594840
#>  6 rs10785581 LOC105369744 105369744
#>  7 rs10785581 ANO6         196527
#>  8 rs10785581 LOC100128248 100128248
#>  9 rs10785581 LOC105369743 105369743
#> 10 rs200759   MACROD2      140733
#> # … with 55 more rows
#>

# Get variants by association identifier
get_variants(association_id = '25389945')
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 27 x 7
#>    variant_id merged functional_class chromosome_name chromosome_posi…
#>    <chr>       <int> <chr>            <chr>                      <int>
#>  1 rs9486815       0 intron_variant   6                      108126842
#>  2 rs4245535       0 intron_variant   6                      108128008
#>  3 rs17069173      0 intergenic_vari… 6                      108017063
#>  4 rs9374021       0 intron_variant   6                      108106220
#>  5 rs9374007       0 intergenic_vari… 6                      108037124
#>  6 rs9386694       0 intergenic_vari… 6                      108025473
#>  7 rs9374002       0 non_coding_tran… 6                      108030386
#>  8 rs218289        0 intron_variant   6                      108133233
#>  9 rs1064346       0 3_prime_UTR_var… 6                      108043747
#> 10 rs9374013       0 intron_variant   6                      108060143
#> # … with 17 more rows, and 2 more variables: chromosome_region <chr>,
#> #   last_update_date <dttm>
#>
#> Slot "genomic_contexts":
#> # A tibble: 388 x 12
#>    variant_id gene_name chromosome_name chromosome_posi… distance is_mapped_gene
#>    <chr>      <chr>     <chr>                      <int>    <int> <lgl>
#>  1 rs9486815  NR2E1     6                      108126842    39180 FALSE
#>  2 rs9486815  OSTM1     6                      108126842        0 TRUE
#>  3 rs9486815  OSTM1-AS1 6                      108126842        0 FALSE
#>  4 rs9486815  Z98200.2  6                      108126842    96124 FALSE
#>  5 rs9486815  SNX3      6                      108126842    84380 FALSE
#>  6 rs9486815  OSTM1-AS1 6                      108126842        0 TRUE
#>  7 rs9486815  NR2E1     6                      108126842    39180 FALSE
#>  8 rs9486815  OSTM1     6                      108126842    52101 FALSE
#>  9 rs9486815  SNX3      6                      108126842    84380 FALSE
#> 10 rs9486815  AL078596… 6                      108126842    52029 FALSE
#> # … with 378 more rows, and 6 more variables: is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
#>
#> Slot "ensembl_ids":
#> # A tibble: 234 x 3
#>    variant_id gene_name  ensembl_id
#>    <chr>      <chr>      <chr>
#>  1 rs9486815  NR2E1      ENSG00000112333
#>  2 rs9486815  OSTM1      ENSG00000081087
#>  3 rs9486815  OSTM1-AS1  ENSG00000225174
#>  4 rs9486815  Z98200.2   ENSG00000271734
#>  5 rs9486815  SNX3       ENSG00000112335
#>  6 rs9486815  AL078596.1 ENSG00000279398
#>  7 rs9486815  Z98200.1   ENSG00000238490
#>  8 rs4245535  OSTM1-AS1  ENSG00000225174
#>  9 rs4245535  AL078596.1 ENSG00000279398
#> 10 rs4245535  NR2E1      ENSG00000112333
#> # … with 224 more rows
#>
#> Slot "entrez_ids":
#> # A tibble: 154 x 3
#>    variant_id gene_name    entrez_id
#>    <chr>      <chr>        <chr>
#>  1 rs9486815  NR2E1        7101
#>  2 rs9486815  OSTM1        28962
#>  3 rs9486815  OSTM1-AS1    100287366
#>  4 rs9486815  SNX3         8724
#>  5 rs9486815  LOC105377929 105377929
#>  6 rs4245535  OSTM1-AS1    100287366
#>  7 rs4245535  LOC105377929 105377929
#>  8 rs4245535  NR2E1        7101
#>  9 rs4245535  SNX3         8724
#> 10 rs4245535  OSTM1        28962
#> # … with 144 more rows
#>

# Get a variant by its identifier
get_variants(variant_id = 'rs3798440')
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 1 x 7
#>   variant_id merged functional_class chromosome_name chromosome_posi…
#>   <chr>       <int> <chr>            <chr>                      <int>
#> 1 rs3798440       0 intron_variant   6                       75846902
#> # … with 2 more variables: chromosome_region <chr>, last_update_date <dttm>
#>
#> Slot "genomic_contexts":
#> # A tibble: 9 x 12
#>   variant_id gene_name chromosome_name chromosome_posi… distance is_mapped_gene
#>   <chr>      <chr>     <chr>                      <int>    <int> <lgl>
#> 1 rs3798440  RNA5SP209 6                       75846902    18337 FALSE
#> 2 rs3798440  RNU6-155P 6                       75846902    78744 FALSE
#> 3 rs3798440  AL392166… 6                       75846902    81611 FALSE
#> 4 rs3798440  IMPG1     6                       75846902    74212 FALSE
#> 5 rs3798440  RNU6-155P 6                       75846902    78744 FALSE
#> 6 rs3798440  RNA5SP209 6                       75846902    18337 FALSE
#> 7 rs3798440  MYO6      6                       75846902        0 FALSE
#> 8 rs3798440  IMPG1     6                       75846902    74212 FALSE
#> 9 rs3798440  MYO6      6                       75846902        0 TRUE
#> # … with 6 more variables: is_closest_gene <lgl>, is_intergenic <lgl>,
#> #   is_upstream <lgl>, is_downstream <lgl>, source <chr>, mapping_method <chr>
#>
#> Slot "ensembl_ids":
#> # A tibble: 5 x 3
#>   variant_id gene_name  ensembl_id
#>   <chr>      <chr>      <chr>
#> 1 rs3798440  RNA5SP209  ENSG00000223169
#> 2 rs3798440  RNU6-155P  ENSG00000252156
#> 3 rs3798440  AL392166.1 ENSG00000200040
#> 4 rs3798440  IMPG1      ENSG00000112706
#> 5 rs3798440  MYO6       ENSG00000196586
#>
#> Slot "entrez_ids":
#> # A tibble: 4 x 3
#>   variant_id gene_name entrez_id
#>   <chr>      <chr>     <chr>
#> 1 rs3798440  RNA5SP209 100873469
#> 2 rs3798440  RNU6-155P 106481224
#> 3 rs3798440  IMPG1     3617
#> 4 rs3798440  MYO6      4646
#>

# Get variants by EFO trait identifier
get_variants(efo_id = 'EFO_0005537')
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 8 x 7
#>   variant_id merged functional_class chromosome_name chromosome_posi…
#>   <chr>       <int> <chr>            <chr>                      <int>
#> 1 rs2464195       0 missense_variant 12                     120997672
#> 2 rs17215231      0 5_prime_UTR_var… 6                       33272092
#> 3 rs10069690      0 intron_variant   5                        1279675
#> 4 rs2363956       0 missense_variant 19                      17283315
#> 5 rs4245739       0 3_prime_UTR_var… 1                      204549714
#> 6 rs10771399      0 intergenic_vari… 12                      28002147
#> 7 rs3757318       0 intron_variant   6                      151592978
#> 8 rs78378222      0 3_prime_UTR_var… 17                       7668434
#> # … with 2 more variables: chromosome_region <chr>, last_update_date <dttm>
#>
#> Slot "genomic_contexts":
#> # A tibble: 384 x 12
#>    variant_id gene_name chromosome_name chromosome_posi… distance is_mapped_gene
#>    <chr>      <chr>     <chr>                      <int>    <int> <lgl>
#>  1 rs2464195  HNF1A-AS1 12                     120997672    25380 FALSE
#>  2 rs2464195  CLIC1P1   12                     120997672    82349 FALSE
#>  3 rs2464195  SPPL3     12                     120997672    93314 FALSE
#>  4 rs2464195  AC069214… 12                     120997672    89735 FALSE
#>  5 rs2464195  LOC10537… 12                     120997672    71187 FALSE
#>  6 rs2464195  OASL      12                     120997672    21439 FALSE
#>  7 rs2464195  OASL2P    12                     120997672    56060 FALSE
#>  8 rs2464195  HNF1A-AS1 12                     120997672    16707 FALSE
#>  9 rs2464195  XLOC_009… 12                     120997672    89598 FALSE
#> 10 rs2464195  SPPL3     12                     120997672    93314 FALSE
#> # … with 374 more rows, and 6 more variables: is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
#>
#> Slot "ensembl_ids":
#> # A tibble: 188 x 3
#>    variant_id gene_name  ensembl_id
#>    <chr>      <chr>      <chr>
#>  1 rs2464195  HNF1A-AS1  ENSG00000241388
#>  2 rs2464195  CLIC1P1    ENSG00000231313
#>  3 rs2464195  SPPL3      ENSG00000157837
#>  4 rs2464195  AC069214.1 ENSG00000286493
#>  5 rs2464195  OASL       ENSG00000135114
#>  6 rs2464195  OASL2P     ENSG00000283542
#>  7 rs2464195  HNF1A      ENSG00000135100
#>  8 rs2464195  AC079602.4 ENSG00000279001
#>  9 rs2464195  AC079602.3 ENSG00000256963
#> 10 rs2464195  C12orf43   ENSG00000157895
#> # … with 178 more rows
#>
#> Slot "entrez_ids":
#> # A tibble: 91 x 3
#>    variant_id gene_name    entrez_id
#>    <chr>      <chr>        <chr>
#>  1 rs2464195  HNF1A-AS1    283460
#>  2 rs2464195  CLIC1P1      390363
#>  3 rs2464195  SPPL3        121665
#>  4 rs2464195  LOC105378258 105378258
#>  5 rs2464195  OASL         8638
#>  6 rs2464195  OASL2P       111216278
#>  7 rs2464195  XLOC_009911  105500240
#>  8 rs2464195  HNF1A        6927
#>  9 rs2464195  C12orf43     64897
#> 10 rs2464195  RPL12P33     643550
#> # … with 81 more rows
#>