
Retrieves variants via the NHGRI-EBI GWAS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all variants that match any of the criteria supplied are retrieved; this corresponds to the default set_operation = 'union'. If you would rather have only the variants that match all the criteria simultaneously, set set_operation to 'intersection'.
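The difference between the two set operations can be illustrated with plain vectors. This is a sketch, not the package's internal code: it assumes each criterion has already returned a character vector of variant identifiers (the vectors by_study and by_gene are hypothetical) and combines them as the description above implies.

```r
# Hypothetical results of two separate queries, one per criterion
by_study <- c("rs6469823", "rs1372662", "rs2469997")
by_gene  <- c("rs1372662", "rs10785581")

# set_operation = 'union': every variant found by any criterion, duplicates removed
combined_union <- Reduce(union, list(by_study, by_gene))

# set_operation = 'intersection': only variants found by every criterion
combined_intersection <- Reduce(intersect, list(by_study, by_gene))

combined_union
combined_intersection
```

With 'union' the shared variant rs1372662 appears once; with 'intersection' it is the only variant kept.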

Usage

get_variants(
  study_id = NULL,
  association_id = NULL,
  variant_id = NULL,
  efo_id = NULL,
  pubmed_id = NULL,
  genomic_range = NULL,
  cytogenetic_band = NULL,
  gene_name = NULL,
  efo_trait = NULL,
  reported_trait = NULL,
  set_operation = "union",
  interactive = TRUE,
  std_chromosomes_only = TRUE,
  verbose = FALSE,
  warnings = TRUE
)

Arguments

study_id

A character vector of GWAS Catalog study accession identifiers.

association_id

A character vector of GWAS Catalog association identifiers.

variant_id

A character vector of GWAS Catalog variant identifiers.

efo_id

A character vector of EFO identifiers.

pubmed_id

An integer vector of PubMed identifiers.

genomic_range

A named list of three vectors:

chromosome

A character vector of chromosome names: 1 to 22, X or Y.

start

A numeric vector of start positions, starting at 1.

end

A numeric vector of end positions.

The three vectors need to be of the same length so that chromosome names, start and end positions can be matched by position.
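A genomic_range value is an ordinary named list. The sketch below builds one covering two regions and checks its shape with check_genomic_range(), a hypothetical helper that is not part of the package. The coordinates are arbitrary, chosen around variants in the example output, and the final call is not run because it needs network access.

```r
# Two regions: chr8:119,300,000-119,400,000 and chr5:36,400,000-36,450,000
range_query <- list(
  chromosome = c("8", "5"),
  start      = c(119300000, 36400000),
  end        = c(119400000, 36450000)
)

# Hypothetical helper (not part of the package): validates the list shape
check_genomic_range <- function(gr) {
  stopifnot(
    is.list(gr),
    setequal(names(gr), c("chromosome", "start", "end")),
    length(unique(lengths(gr))) == 1L,  # same length, so matched by position
    all(gr$start >= 1),                 # positions start at 1
    all(gr$start <= gr$end)
  )
  invisible(TRUE)
}

check_genomic_range(range_query)

# Requires network access to the GWAS Catalog REST API
if (FALSE) {
  get_variants(genomic_range = range_query)
}
```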

cytogenetic_band

A character vector of cytogenetic bands of the form '1p36.11'.
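Before querying, it can help to check that band names look like '1p36.11'. The pattern below is a rough approximation assumed for illustration only (chromosome 1 to 22, X or Y, then arm p or q, then a band number with an optional sub-band); it is not the validation the package or the GWAS Catalog performs, and looks_like_band() is a hypothetical name.

```r
# Hypothetical check: chromosome, arm (p/q), band, optional ".sub-band"
band_pattern <- "^([1-9]|1[0-9]|2[0-2]|X|Y)[pq][0-9]+(\\.[0-9]+)?$"
looks_like_band <- function(x) grepl(band_pattern, x)

looks_like_band(c("1p36.11", "Xq28", "chr1p36", "23p11"))
```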

gene_name

A character vector of gene symbols as defined by the HUGO Gene Nomenclature Committee (HGNC).

efo_trait

A character vector of EFO trait descriptions, e.g., 'uric acid measurement'.

reported_trait

A character vector of phenotypic traits as reported by the original authors of the study.

set_operation

Either 'union' or 'intersection'. Determines how variants retrieved by the different criteria are combined: 'union' binds all results together, removing duplicates, whereas 'intersection' keeps only the variants found by every criterion.

interactive

A logical. If all variants are requested, whether to ask interactively for confirmation before proceeding.

std_chromosomes_only

A logical. Whether to return only variants mapped to standard chromosomes: 1 through 22, X, Y, and MT.
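Conceptually, std_chromosomes_only = TRUE drops rows whose chromosome name is not a standard one. The sketch below applies such a filter to a toy data frame; it is an illustration, not the package's implementation. The variant rs_hypothetical and its chromosome name are made up. Rows with a missing chromosome name are kept, an assumption based on the example output at the end of this page, where unmapped variants still appear with the default setting.

```r
# Standard chromosome names as listed for std_chromosomes_only
std_chromosomes <- c(as.character(1:22), "X", "Y", "MT")

# Toy table: rs_hypothetical and its chromosome name are made up
variants_tbl <- data.frame(
  variant_id      = c("rs6469823", "rs200752", "rs_hypothetical"),
  chromosome_name = c("8", NA, "CHR_HSCHR6_MHC_COX_CTG1"),
  stringsAsFactors = FALSE
)

# Keep standard chromosomes; also keep unmapped rows (NA chromosome name)
keep <- is.na(variants_tbl$chromosome_name) |
  variants_tbl$chromosome_name %in% std_chromosomes
std_only <- variants_tbl[keep, ]
std_only$variant_id
```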

verbose

A logical. Whether the function should print messages about the individual queries.

warnings

Whether to print warnings.

Value

A variants object.

Details

Note that all search criteria are vectorised, allowing for batch searches: for example, you can search for several variants at once by passing a vector of identifiers to variant_id.
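A batch search might look like the sketch below, using identifiers taken from the example output. The query itself is not run because it needs network access. The returned object is an S4 object (its printed form lists slots), so its tables are read with the @ operator.

```r
# Several variant identifiers searched in a single call
ids <- c("rs6469823", "rs1372662", "rs10785581")

# Requires network access to the GWAS Catalog REST API
if (FALSE) {
  my_variants <- get_variants(variant_id = ids, warnings = FALSE)
  # The variants table is stored in the "variants" slot
  my_variants@variants
}
```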

Examples

# Get variants by study identifier
get_variants(study_id = 'GCST001085', warnings = FALSE)
#> An object of class "variants"
#> Slot "variants":
#> # A tibble: 18 × 7
#>    variant_id merged functional_class        chromosome_name chromosome_position
#>    <chr>       <int> <chr>                   <chr>                         <int>
#>  1 rs6469823       1 intergenic_variant      8                         119341744
#>  2 rs200752        0 intron_variant          NA                               NA
#>  3 rs1372662       0 intron_variant          8                         134554803
#>  4 rs2469997       0 intergenic_variant      8                         119341027
#>  5 rs12522034      0 intergenic_variant      5                          36425491
#>  6 rs7735940       0 regulatory_region_vari… NA                               NA
#>  7 rs7960483       0 intron_variant          NA                               NA
#>  8 rs6887846       0 intron_variant          5                          83140542
#>  9 rs3798440       0 intron_variant          NA                               NA
#> 10 rs6452524       0 intron_variant          NA                               NA
#> 11 rs7827545       0 intron_variant          NA                               NA
#> 12 rs10496288      0 intergenic_variant      NA                               NA
#> 13 rs13420028      0 intron_variant          NA                               NA
#> 14 rs10785581      0 intron_variant          12                         45537190
#> 15 rs10188442      0 intron_variant          2                         132431666
#> 16 rs10496289      0 intergenic_variant      2                          83066256
#> 17 rs9350602       0 intron_variant          6                          75850781
#> 18 rs200759        0 intron_variant          20                         15625776
#> # ℹ 2 more variables: chromosome_region <chr>, last_update_date <dttm>
#> 
#> Slot "genomic_contexts":
#> # A tibble: 60 × 12
#>    variant_id gene_name chromosome_name chromosome_position distance
#>    <chr>      <chr>     <chr>                         <int>    <int>
#>  1 rs6469823  CCN3      8                         119341744    74702
#>  2 rs6469823  MAL2      8                         119341744    96071
#>  3 rs6469823  MIR548AZ  8                         119341744    16479
#>  4 rs6469823  MAL2      8                         119341744    96071
#>  5 rs6469823  MIR548AZ  8                         119341744    16479
#>  6 rs6469823  MAL2-AS1  8                         119341744    94887
#>  7 rs6469823  CCN3      8                         119341744    74702
#>  8 rs6469823  MAL2-AS1  8                         119341744    94896
#>  9 rs1372662  ZFAT      8                         134554803        0
#> 10 rs1372662  ZFAT      8                         134554803        0
#> # ℹ 50 more rows
#> # ℹ 7 more variables: is_mapped_gene <lgl>, is_closest_gene <lgl>,
#> #   is_intergenic <lgl>, is_upstream <lgl>, is_downstream <lgl>, source <chr>,
#> #   mapping_method <chr>
#> 
#> Slot "ensembl_ids":
#> # A tibble: 28 × 3
#>    variant_id gene_name ensembl_id     
#>    <chr>      <chr>     <chr>          
#>  1 rs6469823  CCN3      ENSG00000136999
#>  2 rs6469823  MAL2      ENSG00000147676
#>  3 rs6469823  MIR548AZ  ENSG00000276479
#>  4 rs6469823  MAL2-AS1  ENSG00000253972
#>  5 rs1372662  ZFAT      ENSG00000066827
#>  6 rs1372662  ZFAT-AS1  ENSG00000248492
#>  7 rs2469997  MAL2      ENSG00000147676
#>  8 rs2469997  MIR548AZ  ENSG00000276479
#>  9 rs2469997  CCN3      ENSG00000136999
#> 10 rs2469997  MAL2-AS1  ENSG00000253972
#> # ℹ 18 more rows
#> 
#> Slot "entrez_ids":
#> # A tibble: 32 × 3
#>    variant_id gene_name entrez_id
#>    <chr>      <chr>     <chr>    
#>  1 rs6469823  CCN3      4856     
#>  2 rs6469823  MAL2      114569   
#>  3 rs6469823  MIR548AZ  102466162
#>  4 rs6469823  MAL2-AS1  105375726
#>  5 rs1372662  ZFAT      57623    
#>  6 rs1372662  ZFAT-AS1  594840   
#>  7 rs2469997  MAL2      114569   
#>  8 rs2469997  MIR548AZ  102466162
#>  9 rs2469997  CCN3      4856     
#> 10 rs2469997  MAL2-AS1  105375726
#> # ℹ 22 more rows
#> 

# Get a variant by its identifier
if (FALSE) {
get_variants(variant_id = 'rs3798440', warnings = FALSE)
}