Retrieves variants via the NHGRI-EBI GWAS Catalog REST API. The REST
API is queried multiple times with the criteria passed as arguments (see
below). By default, all variants that match any of the criteria supplied in
the arguments are retrieved: this corresponds to the default option
set_operation set to 'union'. If you would rather have only the
variants that match all the criteria simultaneously, then set
set_operation to 'intersection'.
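The difference between the two options can be sketched with base R set operations on plain identifier vectors. This is only an illustration of the semantics, not the package's internal implementation; the two result sets below (by_study, by_variant) are hypothetical.

```r
# Hypothetical result sets: variant identifiers matched by two separate criteria.
by_study   <- c("rs6469823", "rs200752", "rs3798440")
by_variant <- c("rs3798440", "rs9350602")

# 'union': every variant matched by at least one criterion, duplicates removed.
all_hits <- union(by_study, by_variant)

# 'intersection': only the variants matched by every criterion.
common_hits <- intersect(by_study, by_variant)

print(all_hits)
print(common_hits)

# The corresponding request against the live API (not run here: it needs
# the gwasrapidd package and network access).
if (FALSE) {
  gwasrapidd::get_variants(
    study_id = "GCST001085",
    variant_id = "rs3798440",
    set_operation = "intersection"
  )
}
```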
Usage
get_variants(
  study_id = NULL,
  association_id = NULL,
  variant_id = NULL,
  efo_id = NULL,
  pubmed_id = NULL,
  genomic_range = NULL,
  cytogenetic_band = NULL,
  gene_name = NULL,
  efo_trait = NULL,
  reported_trait = NULL,
  set_operation = "union",
  interactive = TRUE,
  std_chromosomes_only = TRUE,
  verbose = FALSE,
  warnings = TRUE
)
Arguments
- study_id
A character vector of GWAS Catalog study accession identifiers.
- association_id
A character vector of GWAS Catalog association identifiers.
- variant_id
A character vector of GWAS Catalog variant identifiers.
- efo_id
A character vector of EFO identifiers.
- pubmed_id
An integer vector of PubMed identifiers.
- genomic_range
A named list of three vectors:
- chromosome
A character vector of chromosome names of the form 1–22, X or Y.
- start
A numeric vector of start positions, starting at 1.
- end
A numeric vector of end positions.
The three vectors need to be of the same length so that chromosome names, start and end positions can be matched by position.
- cytogenetic_band
A character vector of cytogenetic bands of the form '1p36.11'.
- gene_name
A character vector of gene symbols according to the HUGO Gene Nomenclature Committee (HGNC).
- efo_trait
A character vector of EFO trait descriptions, e.g., 'uric acid measurement'.
- reported_trait
A character vector of phenotypic traits as reported by the original authors of the study.
- set_operation
Either 'union' or 'intersection'. This determines how variants retrieved by different criteria are combined: 'union' binds all results together, removing duplicates, whereas 'intersection' keeps only the variants found by every criterion.
- interactive
A logical. If all variants are requested, whether to ask interactively for confirmation before proceeding.
- std_chromosomes_only
Whether to return only variants mapped to standard chromosomes: 1 through 22, X, Y, and MT.
- verbose
Whether the function should be verbose about the different queries or not.
- warnings
Whether to print warnings.
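As a minimal sketch of the genomic_range argument described above, the list can be built and sanity-checked before it is passed on. The coordinates are illustrative only, and the two checks are suggestions for catching mismatched input early; they are not necessarily the validation the package itself performs.

```r
# Two hypothetical regions; element i of each vector describes region i.
genomic_range <- list(
  chromosome = c("8", "X"),
  start      = c(119000000, 1),
  end        = c(120000000, 1000000)
)

# The three vectors are matched by position, so their lengths must agree.
stopifnot(length(unique(lengths(genomic_range))) == 1)

# Each region should start at or before its end.
stopifnot(all(genomic_range$start <= genomic_range$end))

# Query the live API (not run here: it needs the gwasrapidd package and
# network access).
if (FALSE) {
  gwasrapidd::get_variants(genomic_range = genomic_range)
}
```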
Value
A variants object.
Details
Please note that all search criteria are vectorised, thus allowing for batch
mode search, e.g., one can search by multiple variant identifiers at once by
passing a vector of identifiers to variant_id.
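A batch search of this kind might look as follows. The identifiers are taken from the example output on this page; the live call is wrapped in if (FALSE) because it needs the gwasrapidd package and network access, and the result name my_variants is just a placeholder.

```r
# Several variant identifiers collected into one character vector.
variant_ids <- c("rs3798440", "rs7960483", "rs10785581")

# One call retrieves all of them (not run here).
if (FALSE) {
  my_variants <- gwasrapidd::get_variants(
    variant_id = variant_ids,
    warnings = FALSE
  )

  # The "variants" slot holds one row per retrieved variant.
  my_variants@variants$variant_id
}
```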
Examples
# Get variants by study identifier
get_variants(study_id = 'GCST001085', warnings = FALSE)
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 18 × 7
#> variant_id merged functional_class chromosome_name chromosome_position
#> <chr> <int> <chr> <chr> <int>
#> 1 rs6469823 1 intergenic_variant 8 119341744
#> 2 rs200752 0 intron_variant 20 15618886
#> 3 rs1372662 0 intron_variant 8 134554803
#> 4 rs2469997 0 intergenic_variant 8 119341027
#> 5 rs12522034 0 intergenic_variant 5 36425491
#> 6 rs7735940 0 regulatory_region_vari… 5 36423829
#> 7 rs7960483 0 intron_variant 12 45531972
#> 8 rs6887846 0 intron_variant 5 83140542
#> 9 rs3798440 0 intron_variant 6 75846902
#> 10 rs6452524 0 intron_variant 5 83137962
#> 11 rs7827545 0 intron_variant 8 134554324
#> 12 rs10496288 0 intergenic_variant 2 83065441
#> 13 rs13420028 0 intron_variant 2 132430533
#> 14 rs10785581 0 intron_variant 12 45537190
#> 15 rs10188442 0 intron_variant 2 132431666
#> 16 rs10496289 0 intergenic_variant 2 83066256
#> 17 rs9350602 0 intron_variant 6 75850781
#> 18 rs200759 0 intron_variant 20 15625776
#> # ℹ 2 more variables: chromosome_region <chr>, last_update_date <dttm>
#>
#> Slot "genomic_contexts":
#> # A tibble: 138 × 12
#> variant_id gene_name chromosome_name chromosome_position distance
#> <chr> <chr> <chr> <int> <int>
#> 1 rs6469823 MIR548AZ 8 119341744 16479
#> 2 rs6469823 CCN3 8 119341744 74702
#> 3 rs6469823 MIR548AZ 8 119341744 16479
#> 4 rs6469823 MAL2-AS1 8 119341744 94887
#> 5 rs6469823 LOC124900266 8 119341744 46604
#> 6 rs6469823 CCN3 8 119341744 74702
#> 7 rs6469823 MAL2-AS1 8 119341744 94896
#> 8 rs6469823 MAL2 8 119341744 96071
#> 9 rs6469823 MAL2 8 119341744 96071
#> 10 rs6469823 LOC124902009 8 119341744 8146
#> # ℹ 128 more rows
#> # ℹ 7 more variables: is_mapped_gene <lgl>, is_closest_gene <lgl>,
#> # is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> # mapping_method <chr>
#>
#> Slot "ensembl_ids":
#> # A tibble: 62 × 3
#> variant_id gene_name ensembl_id
#> <chr> <chr> <chr>
#> 1 rs6469823 MIR548AZ ENSG00000276479
#> 2 rs6469823 CCN3 ENSG00000136999
#> 3 rs6469823 MAL2-AS1 ENSG00000253972
#> 4 rs6469823 MAL2 ENSG00000147676
#> 5 rs200752 ENSAP1 ENSG00000224274
#> 6 rs200752 RNA5SP475 ENSG00000222500
#> 7 rs200752 MACROD2 ENSG00000172264
#> 8 rs1372662 MTCO1P49 ENSG00000253916
#> 9 rs1372662 ZFAT ENSG00000066827
#> 10 rs1372662 ZFAT-AS1 ENSG00000248492
#> # ℹ 52 more rows
#>
#> Slot "entrez_ids":
#> # A tibble: 89 × 3
#> variant_id gene_name entrez_id
#> <chr> <chr> <chr>
#> 1 rs6469823 MIR548AZ 102466162
#> 2 rs6469823 CCN3 4856
#> 3 rs6469823 MAL2-AS1 105375726
#> 4 rs6469823 LOC124900266 124900266
#> 5 rs6469823 MAL2 114569
#> 6 rs6469823 LOC124902009 124902009
#> 7 rs200752 ENSAP1 170511
#> 8 rs200752 RNA5SP475 100873717
#> 9 rs200752 MACROD2 140733
#> 10 rs1372662 MTCO1P49 107075177
#> # ℹ 79 more rows
#>
# Get a variant by its identifier
if (FALSE) {
get_variants(variant_id = 'rs3798440', warnings = FALSE)
}