Retrieves variants via the NHGRI-EBI GWAS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all variants that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default setting, set_operation = 'union'. If you would rather have only the variants that simultaneously match all of the criteria provided, set set_operation to 'intersection'.

get_variants(study_id = NULL, association_id = NULL,
  variant_id = NULL, efo_id = NULL, pubmed_id = NULL,
  genomic_range = NULL, cytogenetic_band = NULL, gene_name = NULL,
  efo_trait = NULL, reported_trait = NULL, set_operation = "union",
  interactive = TRUE, std_chromosomes_only = TRUE, verbose = FALSE,
  warnings = TRUE)

Arguments

study_id

A character vector of GWAS Catalog study accession identifiers.

association_id

A character vector of GWAS Catalog association identifiers.

variant_id

A character vector of GWAS Catalog variant identifiers.

efo_id

A character vector of EFO identifiers.

pubmed_id

An integer vector of PubMed identifiers.

genomic_range

A named list of three vectors:

chromosome

A character vector of chromosome names of the form 1-22, X or Y.

start

A numeric vector of start positions, starting at 1.

end

A numeric vector of end positions.

The three vectors need to be of the same length so that chromosome names, start and end positions can be matched by position.

cytogenetic_band

A character vector of cytogenetic bands of the form '1p36.11'.

gene_name

A character vector of gene symbols according to the HUGO Gene Nomenclature Committee (HGNC).

efo_trait

A character vector of EFO trait descriptions, e.g., 'uric acid measurement'.

reported_trait

A character vector of phenotypic traits as reported by the original authors of the study.

set_operation

Either 'union' or 'intersection'. This determines how variants retrieved by different criteria are combined: 'union' binds together all results, removing duplicates, whereas 'intersection' keeps only the variants found by every criterion.

interactive

A logical. If all variants are requested, whether to ask interactively for confirmation before proceeding.

std_chromosomes_only

Whether to return only variants mapped to standard chromosomes: 1 through 22, X, Y, and MT.

verbose

Whether the function should be verbose about the different queries or not.

warnings

Whether to print warnings.
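The shape expected by genomic_range can be sketched as below. The coordinates are arbitrary values picked for illustration only, and the sketch assumes that get_variants() is provided by the gwasrapidd package. The final call queries the remote API, so it is guarded to run only in an interactive session where that package is installed.

```r
# Two illustrative regions (arbitrary coordinates); element i of each vector
# describes region i.
regions <- list(
  chromosome = c("6", "20"),
  start      = c(75800000, 15600000),
  end        = c(75900000, 15700000)
)

# The three vectors must have the same length so that chromosome names,
# start and end positions can be matched by position.
stopifnot(length(unique(lengths(regions))) == 1L)

# Assumption: get_variants() comes from the gwasrapidd package.
if (interactive() && requireNamespace("gwasrapidd", quietly = TRUE)) {
  region_variants <- gwasrapidd::get_variants(genomic_range = regions)
}
```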

Value

A variants object.
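Judging by the printed output in the Examples, the returned object exposes four slots: variants, genomic_contexts, ensembl_ids and entrez_ids, each holding a tibble. A minimal sketch of how they might be inspected follows, assuming the object is an S4 object (so that `@` applies) and that get_variants() is provided by the gwasrapidd package.

```r
# Slot names as they appear in the printed example output.
expected_slots <- c("variants", "genomic_contexts", "ensembl_ids", "entrez_ids")

# Assumption: get_variants() comes from the gwasrapidd package; the call
# queries the remote API, hence the guard.
if (interactive() && requireNamespace("gwasrapidd", quietly = TRUE)) {
  my_variants <- gwasrapidd::get_variants(variant_id = "rs3798440")

  # List the slots actually present on the returned object.
  print(methods::slotNames(my_variants))

  # Extract individual tables with `@`.
  print(my_variants@variants)
  print(my_variants@ensembl_ids)
}
```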

Details

All search criteria are vectorised, which allows batch searches: for example, you can search by multiple variant identifiers at once by passing a vector of identifiers to variant_id.
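Batch searching and the two set_operation modes can be sketched as follows. The first part uses made-up placeholder identifiers and base R set functions purely to illustrate the described semantics; it is not the package's internal implementation. The second part assumes get_variants() is provided by the gwasrapidd package and reuses identifiers from the Examples section.

```r
# Made-up placeholder identifiers standing in for the results of two criteria.
found_by_a <- c("rs1", "rs2", "rs3")
found_by_b <- c("rs2", "rs3", "rs4")

# 'union': everything found by either criterion, duplicates removed.
union(found_by_a, found_by_b)

# 'intersection': only what every criterion found.
intersect(found_by_a, found_by_b)

# Assumption: get_variants() comes from the gwasrapidd package; the calls
# query the remote API, hence the guard.
if (interactive() && requireNamespace("gwasrapidd", quietly = TRUE)) {
  # Batch search: several variant identifiers in a single call.
  batch <- gwasrapidd::get_variants(variant_id = c("rs3798440", "rs4245739"))

  # Keep only variants that match both the study and the variant identifier.
  both <- gwasrapidd::get_variants(
    study_id      = "GCST001085",
    variant_id    = "rs3798440",
    set_operation = "intersection"
  )
}
```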

Examples

# Get variants by study identifier
get_variants(study_id = 'GCST001085')
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 18 x 7
#>    variant_id merged functional_class chromosome_name chromosome_posi…
#>    <chr>       <int> <chr>            <chr>                      <int>
#>  1 rs10496288      0 intergenic_vari… 2                       83065441
#>  2 rs7827545       0 intron_variant   8                      134554324
#>  3 rs10785581      0 intron_variant   12                      45537190
#>  4 rs200759        0 intron_variant   20                      15625776
#>  5 rs7960483       0 intron_variant   12                      45531972
#>  6 rs1372662       0 intron_variant   8                      134554803
#>  7 rs10496289      0 intergenic_vari… 2                       83066256
#>  8 rs6452524       0 intron_variant   5                       83137962
#>  9 rs200752        0 intron_variant   20                      15618886
#> 10 rs3798440       0 intron_variant   6                       75846902
#> 11 rs7735940       0 regulatory_regi… 5                       36423829
#> 12 rs9350602       0 intron_variant   6                       75850781
#> 13 rs10188442      0 intron_variant   2                      132431666
#> 14 rs12522034      0 intergenic_vari… 5                       36425491
#> 15 rs13420028      0 intron_variant   2                      132430533
#> 16 rs2469997       0 intergenic_vari… 8                      119341027
#> 17 rs6887846       0 intron_variant   5                       83140542
#> 18 rs6469823       1 intergenic_vari… 8                      119341744
#> # … with 2 more variables: chromosome_region <chr>, last_update_date <dttm>
#>
#> Slot "genomic_contexts":
#> # A tibble: 142 x 12
#>    variant_id gene_name chromosome_name chromosome_posi… distance is_mapped_gene
#>    <chr>      <chr>     <chr>                      <int>    <int> <lgl>
#>  1 rs10496288 LOC11226… 2                       83065441   148019 FALSE
#>  2 rs10496288 AC138623… 2                       83065441   153449 TRUE
#>  3 rs10496288 AC098817… 2                       83065441   198933 TRUE
#>  4 rs10496288 LOC10537… 2                       83065441   127905 FALSE
#>  5 rs7827545  ZFAT-AS1  8                      134554324    43747 FALSE
#>  6 rs7827545  RF02213   8                      134554324    45933 FALSE
#>  7 rs7827545  AC105180… 8                      134554324   233877 FALSE
#>  8 rs7827545  LOC10041… 8                      134554324   606901 FALSE
#>  9 rs7827545  RF02212   8                      134554324    43994 FALSE
#> 10 rs7827545  ZFAT-AS1  8                      134554324    43747 FALSE
#> # … with 132 more rows, and 6 more variables: is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
#>
#> Slot "ensembl_ids":
#> # A tibble: 80 x 3
#>    variant_id gene_name  ensembl_id
#>    <chr>      <chr>      <chr>
#>  1 rs10496288 AC138623.1 ENSG00000223977
#>  2 rs10496288 AC098817.1 ENSG00000286211
#>  3 rs7827545  ZFAT-AS1   ENSG00000248492
#>  4 rs7827545  RF02213    ENSG00000278454
#>  5 rs7827545  AC105180.1 ENSG00000253627
#>  6 rs7827545  RF02212    ENSG00000277732
#>  7 rs7827545  ZFAT       ENSG00000066827
#>  8 rs7827545  RF02211    ENSG00000276140
#>  9 rs10785581 AC079950.1 ENSG00000257657
#> 10 rs10785581 RF00026    ENSG00000283564
#> # … with 70 more rows
#>
#> Slot "entrez_ids":
#> # A tibble: 64 x 3
#>    variant_id gene_name    entrez_id
#>    <chr>      <chr>        <chr>
#>  1 rs10496288 LOC112268410 112268410
#>  2 rs10496288 LOC105374832 105374832
#>  3 rs7827545  ZFAT-AS1     594840
#>  4 rs7827545  LOC100419617 100419617
#>  5 rs7827545  ZFAT         57623
#>  6 rs10785581 LOC105369743 105369743
#>  7 rs10785581 ANO6         196527
#>  8 rs10785581 LOC100128248 100128248
#>  9 rs10785581 LOC105369744 105369744
#> 10 rs200759   LOC107985394 107985394
#> # … with 54 more rows
#>
# Get variants by association identifier
get_variants(association_id = '25389945')
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 27 x 7
#>    variant_id merged functional_class chromosome_name chromosome_posi…
#>    <chr>       <int> <chr>            <chr>                      <int>
#>  1 rs9486815       0 intron_variant   6                      108126842
#>  2 rs4245535       0 intron_variant   6                      108128008
#>  3 rs17069173      0 intergenic_vari… 6                      108017063
#>  4 rs9374021       0 intron_variant   6                      108106220
#>  5 rs9374007       0 intergenic_vari… 6                      108037124
#>  6 rs9386694       0 intergenic_vari… 6                      108025473
#>  7 rs9374002       0 non_coding_tran… 6                      108030386
#>  8 rs218289        0 intron_variant   6                      108133233
#>  9 rs1064346       0 3_prime_UTR_var… 6                      108043747
#> 10 rs9374013       0 intron_variant   6                      108060143
#> # … with 17 more rows, and 2 more variables: chromosome_region <chr>,
#> #   last_update_date <dttm>
#>
#> Slot "genomic_contexts":
#> # A tibble: 386 x 12
#>    variant_id gene_name chromosome_name chromosome_posi… distance is_mapped_gene
#>    <chr>      <chr>     <chr>                      <int>    <int> <lgl>
#>  1 rs9486815  OSTM1-AS1 6                      108126842        0 FALSE
#>  2 rs9486815  NR2E1     6                      108126842    39216 FALSE
#>  3 rs9486815  OSTM1     6                      108126842    52101 FALSE
#>  4 rs9486815  Z98200.1  6                      108126842    96124 FALSE
#>  5 rs9486815  SNX3      6                      108126842    84380 FALSE
#>  6 rs9486815  RF00019   6                      108126842    65912 FALSE
#>  7 rs9486815  LOC10537… 6                      108126842    86455 FALSE
#>  8 rs9486815  AL078596… 6                      108126842    52029 FALSE
#>  9 rs9486815  OSTM1     6                      108126842        0 TRUE
#> 10 rs9486815  SNX3      6                      108126842    84375 FALSE
#> # … with 376 more rows, and 6 more variables: is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
#>
#> Slot "ensembl_ids":
#> # A tibble: 233 x 3
#>    variant_id gene_name  ensembl_id
#>    <chr>      <chr>      <chr>
#>  1 rs9486815  OSTM1-AS1  ENSG00000225174
#>  2 rs9486815  NR2E1      ENSG00000112333
#>  3 rs9486815  OSTM1      ENSG00000081087
#>  4 rs9486815  Z98200.1   ENSG00000271734
#>  5 rs9486815  SNX3       ENSG00000112335
#>  6 rs9486815  RF00019    ENSG00000200298
#>  7 rs9486815  AL078596.1 ENSG00000279398
#>  8 rs4245535  OSTM1      ENSG00000081087
#>  9 rs4245535  AL078596.1 ENSG00000279398
#> 10 rs4245535  NR2E1      ENSG00000112333
#> # … with 223 more rows
#>
#> Slot "entrez_ids":
#> # A tibble: 153 x 3
#>    variant_id gene_name    entrez_id
#>    <chr>      <chr>        <chr>
#>  1 rs9486815  OSTM1-AS1    100287366
#>  2 rs9486815  NR2E1        7101
#>  3 rs9486815  OSTM1        28962
#>  4 rs9486815  SNX3         8724
#>  5 rs9486815  LOC105377929 105377929
#>  6 rs4245535  OSTM1        28962
#>  7 rs4245535  NR2E1        7101
#>  8 rs4245535  SNX3         8724
#>  9 rs4245535  LOC105377929 105377929
#> 10 rs4245535  OSTM1-AS1    100287366
#> # … with 143 more rows
#>
# Get a variant by its identifier
get_variants(variant_id = 'rs3798440')
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 1 x 7
#>   variant_id merged functional_class chromosome_name chromosome_posi…
#>   <chr>       <int> <chr>            <chr>                      <int>
#> 1 rs3798440       0 intron_variant   6                       75846902
#> # … with 2 more variables: chromosome_region <chr>, last_update_date <dttm>
#>
#> Slot "genomic_contexts":
#> # A tibble: 9 x 12
#>   variant_id gene_name chromosome_name chromosome_posi… distance is_mapped_gene
#>   <chr>      <chr>     <chr>                      <int>    <int> <lgl>
#> 1 rs3798440  IMPG1     6                       75846902    74212 FALSE
#> 2 rs3798440  MYO6      6                       75846902        0 TRUE
#> 3 rs3798440  RNA5SP209 6                       75846902    18337 FALSE
#> 4 rs3798440  RNU6-155P 6                       75846902    78744 FALSE
#> 5 rs3798440  RNA5SP209 6                       75846902    18337 FALSE
#> 6 rs3798440  RF00019   6                       75846902    81611 FALSE
#> 7 rs3798440  MYO6      6                       75846902        0 FALSE
#> 8 rs3798440  RNU6-155P 6                       75846902    78744 FALSE
#> 9 rs3798440  IMPG1     6                       75846902    74212 FALSE
#> # … with 6 more variables: is_closest_gene <lgl>, is_intergenic <lgl>,
#> #   is_upstream <lgl>, is_downstream <lgl>, source <chr>, mapping_method <chr>
#>
#> Slot "ensembl_ids":
#> # A tibble: 5 x 3
#>   variant_id gene_name ensembl_id
#>   <chr>      <chr>     <chr>
#> 1 rs3798440  IMPG1     ENSG00000112706
#> 2 rs3798440  MYO6      ENSG00000196586
#> 3 rs3798440  RNA5SP209 ENSG00000223169
#> 4 rs3798440  RNU6-155P ENSG00000252156
#> 5 rs3798440  RF00019   ENSG00000200298
#>
#> Slot "entrez_ids":
#> # A tibble: 4 x 3
#>   variant_id gene_name entrez_id
#>   <chr>      <chr>     <chr>
#> 1 rs3798440  IMPG1     3617
#> 2 rs3798440  MYO6      4646
#> 3 rs3798440  RNA5SP209 100873469
#> 4 rs3798440  RNU6-155P 106481224
#>
# Get variants by EFO trait identifier
get_variants(efo_id = 'EFO_0005537')
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 5 x 7
#>   variant_id merged functional_class chromosome_name chromosome_posi…
#>   <chr>       <int> <chr>            <chr>                      <int>
#> 1 rs4245739       0 3_prime_UTR_var… 1                      204549714
#> 2 rs10069690      0 intron_variant   5                        1279675
#> 3 rs3757318       0 intron_variant   6                      151592978
#> 4 rs2363956       0 missense_variant 19                      17283315
#> 5 rs10771399      0 intergenic_vari… 12                      28002147
#> # … with 2 more variables: chromosome_region <chr>, last_update_date <dttm>
#>
#> Slot "genomic_contexts":
#> # A tibble: 85 x 12
#>    variant_id gene_name chromosome_name chromosome_posi… distance is_mapped_gene
#>    <chr>      <chr>     <chr>                      <int>    <int> <lgl>
#>  1 rs4245739  LOC10029… 1                      204549714    20007 FALSE
#>  2 rs4245739  AL512306… 1                      204549714    20022 FALSE
#>  3 rs4245739  PIK3C2B   1                      204549714    54899 FALSE
#>  4 rs4245739  AL512306… 1                      204549714    53321 FALSE
#>  5 rs4245739  TRK-TTT3… 1                      204549714    43115 FALSE
#>  6 rs4245739  AL512306… 1                      204549714    77061 FALSE
#>  7 rs4245739  PIK3C2B   1                      204549714    54990 FALSE
#>  8 rs4245739  MDM4      1                      204549714        0 TRUE
#>  9 rs4245739  LOC10537… 1                      204549714     9872 FALSE
#> 10 rs4245739  RNA5SP74  1                      204549714    12699 FALSE
#> # … with 75 more rows, and 6 more variables: is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
#>
#> Slot "ensembl_ids":
#> # A tibble: 43 x 3
#>    variant_id gene_name  ensembl_id
#>    <chr>      <chr>      <chr>
#>  1 rs4245739  AL512306.1 ENSG00000236779
#>  2 rs4245739  PIK3C2B    ENSG00000133056
#>  3 rs4245739  AL512306.3 ENSG00000240710
#>  4 rs4245739  AL512306.2 ENSG00000240219
#>  5 rs4245739  MDM4       ENSG00000198625
#>  6 rs4245739  RNA5SP74   ENSG00000200408
#>  7 rs4245739  LRRN2      ENSG00000170382
#>  8 rs10069690 TERT       ENSG00000164362
#>  9 rs10069690 MIR4457    ENSG00000263670
#> 10 rs10069690 CLPTM1L    ENSG00000049656
#> # … with 33 more rows
#>
#> Slot "entrez_ids":
#> # A tibble: 39 x 3
#>    variant_id gene_name    entrez_id
#>    <chr>      <chr>        <chr>
#>  1 rs4245739  LOC100291628 100291628
#>  2 rs4245739  PIK3C2B      5287
#>  3 rs4245739  TRK-TTT3-1   100189122
#>  4 rs4245739  MDM4         4194
#>  5 rs4245739  LOC105371692 105371692
#>  6 rs4245739  RNA5SP74     100873308
#>  7 rs4245739  TRK-TTT3-2   100189425
#>  8 rs4245739  LRRN2        10446
#>  9 rs10069690 TERT         7015
#> 10 rs10069690 MIR4457      100616235
#> # … with 29 more rows
#>