Retrieves studies via the NHGRI-EBI GWAS Catalog REST API. The REST API is queried multiple times with the criteria passed as arguments (see below). By default, all studies that match any of the criteria supplied in the arguments are retrieved: this corresponds to the default value of set_operation, 'union'. If you would rather have only the studies that simultaneously match all of the criteria provided, set set_operation to 'intersection'.

get_studies(study_id = NULL, association_id = NULL,
  variant_id = NULL, efo_id = NULL, pubmed_id = NULL,
  user_requested = NULL, full_pvalue_set = NULL, efo_uri = NULL,
  efo_trait = NULL, reported_trait = NULL, set_operation = "union",
  interactive = TRUE, verbose = FALSE, warnings = TRUE)

Arguments

study_id

A character vector of GWAS Catalog study accession identifiers.

association_id

A character vector of GWAS Catalog association identifiers.

variant_id

A character vector of GWAS Catalog variant identifiers.

efo_id

A character vector of EFO identifiers.

pubmed_id

An integer vector of PubMed identifiers.

user_requested

A logical scalar (not a vector) indicating whether to retrieve studies requested by users of the Catalog (TRUE) or studies that were not (FALSE).

full_pvalue_set

A logical scalar (not a vector) indicating whether to retrieve studies with full summary statistics (TRUE) or studies without them (FALSE).

efo_uri

A character vector of EFO URIs.

efo_trait

A character vector of EFO trait descriptions, e.g., 'uric acid measurement'.

reported_trait

A character vector of phenotypic traits as reported by the original authors of the study.

set_operation

Either 'union' or 'intersection'. This determines how studies retrieved by different criteria are combined: 'union' binds together all results, removing duplicates, whereas 'intersection' keeps only the studies that were found by every criterion supplied.

interactive

A logical. If all studies are requested, whether to ask interactively for confirmation before proceeding.

verbose

Whether the function should be verbose about the different queries or not.

warnings

Whether to print warnings.
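
To illustrate how set_operation combines the results obtained from different criteria, here is a minimal base R sketch. It does not call the REST API or show the function's actual implementation: the per-criterion results are made-up vectors of study accession identifiers, and combine_hits() is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical per-criterion results: accession IDs that each criterion
# might have returned. These are illustrative values, not real query output.
hits <- list(
  variant_id = c("GCST001085", "GCST005108"),
  efo_trait  = c("GCST005108", "GCST002305")
)

# combine_hits() is a hypothetical helper mimicking the documented semantics
# of set_operation; it is not part of any package.
combine_hits <- function(hits, set_operation = c("union", "intersection")) {
  set_operation <- match.arg(set_operation)
  combine <- if (set_operation == "union") union else intersect
  # Fold the chosen set operation over the per-criterion results.
  Reduce(combine, hits)
}

# 'union': every study found by at least one criterion, duplicates removed.
combine_hits(hits, "union")
#> [1] "GCST001085" "GCST005108" "GCST002305"

# 'intersection': only the studies found by every criterion.
combine_hits(hits, "intersection")
#> [1] "GCST005108"
```

With more than two criteria, the same fold applies: a study survives 'intersection' only if it appears in every per-criterion result.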

Value

A studies object.

Details

Please note that all search criteria are vectorised, thus allowing for batch mode search, e.g., one can search by multiple variant identifiers at once by passing a vector of identifiers to variant_id.
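
As a sketch of batch mode, the call below passes several study accession identifiers at once. It assumes that this page documents get_studies() from the gwasrapidd package and that the package and a network connection are available; the guard and the tryCatch() wrapper are additions for this illustration, so that the snippet degrades gracefully when they are not.

```r
# The three accession IDs used elsewhere on this page, queried in one call.
study_ids <- c("GCST001085", "GCST005108", "GCST002305")

result <- NULL
# Assumption: get_studies() is provided by the gwasrapidd package.
if (requireNamespace("gwasrapidd", quietly = TRUE)) {
  # The query needs network access; fall back to NULL if it fails.
  result <- tryCatch(
    gwasrapidd::get_studies(study_id = study_ids),
    error = function(e) NULL
  )
}

if (!is.null(result)) {
  # The "studies" slot is a tibble with one row per study retrieved.
  print(result@studies$study_id)
}
```

The same pattern applies to the other vectorised criteria, such as variant_id or efo_id.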

Examples

# Get a study by its accession identifier
get_studies(study_id = 'GCST001085')
#> An object of class "studies"
#> Slot "studies":
#> # A tibble: 1 x 13
#>   study_id reported_trait initial_sample_… replication_sam… gxe   gxg
#>   <chr>    <chr>          <chr>            <chr>            <lgl> <lgl>
#> 1 GCST001… Hypertension … ~2,000 European… <NA>             FALSE TRUE
#> # … with 7 more variables: snp_count <int>, qualifier <chr>, imputed <lgl>,
#> #   pooled <lgl>, study_design_comment <chr>, full_pvalue_set <lgl>,
#> #   user_requested <lgl>
#>
#> Slot "genotyping_techs":
#> # A tibble: 1 x 2
#>   study_id   genotyping_technology
#>   <chr>      <chr>
#> 1 GCST001085 Genome-wide genotyping array
#>
#> Slot "platforms":
#> # A tibble: 1 x 2
#>   study_id   manufacturer
#>   <chr>      <chr>
#> 1 GCST001085 Affymetrix
#>
#> Slot "ancestries":
#> # A tibble: 1 x 4
#>   study_id   ancestry_id type    number_of_individuals
#>   <chr>            <int> <chr>                   <int>
#> 1 GCST001085           1 initial                  5000
#>
#> Slot "ancestral_groups":
#> # A tibble: 1 x 3
#>   study_id   ancestry_id ancestral_group
#>   <chr>            <int> <chr>
#> 1 GCST001085           1 European
#>
#> Slot "countries_of_origin":
#> # A tibble: 1 x 5
#>   study_id   ancestry_id country_name major_area region
#>   <chr>            <int> <chr>        <chr>      <chr>
#> 1 GCST001085           1 <NA>         <NA>       <NA>
#>
#> Slot "countries_of_recruitment":
#> # A tibble: 1 x 5
#>   study_id   ancestry_id country_name major_area region
#>   <chr>            <int> <chr>        <chr>      <chr>
#> 1 GCST001085           1 <NA>         <NA>       <NA>
#>
#> Slot "publications":
#> # A tibble: 1 x 7
#>   study_id pubmed_id publication_date publication title author_fullname
#>   <chr>        <int> <date>           <chr>       <chr> <chr>
#> 1 GCST001…  21626137 2011-05-27       Hum Genet   Two-… Slavin TP
#> # … with 1 more variable: author_orcid <chr>
#>
# Get a study by association identifier
get_studies(association_id = '25389945')
#> An object of class "studies"
#> Slot "studies":
#> # A tibble: 1 x 13
#>   study_id reported_trait initial_sample_… replication_sam… gxe   gxg
#>   <chr>    <chr>          <chr>            <chr>            <lgl> <lgl>
#> 1 GCST005… Major depress… 2,605 European … 8,508 European … FALSE FALSE
#> # … with 7 more variables: snp_count <int>, qualifier <chr>, imputed <lgl>,
#> #   pooled <lgl>, study_design_comment <chr>, full_pvalue_set <lgl>,
#> #   user_requested <lgl>
#>
#> Slot "genotyping_techs":
#> # A tibble: 1 x 2
#>   study_id   genotyping_technology
#>   <chr>      <chr>
#> 1 GCST005108 Genome-wide genotyping array
#>
#> Slot "platforms":
#> # A tibble: 1 x 2
#>   study_id   manufacturer
#>   <chr>      <chr>
#> 1 GCST005108 Illumina
#>
#> Slot "ancestries":
#> # A tibble: 2 x 4
#>   study_id   ancestry_id type        number_of_individuals
#>   <chr>            <int> <chr>                       <int>
#> 1 GCST005108           1 replication                 25035
#> 2 GCST005108           2 initial                     18773
#>
#> Slot "ancestral_groups":
#> # A tibble: 2 x 3
#>   study_id   ancestry_id ancestral_group
#>   <chr>            <int> <chr>
#> 1 GCST005108           1 European
#> 2 GCST005108           2 European
#>
#> Slot "countries_of_origin":
#> # A tibble: 2 x 5
#>   study_id   ancestry_id country_name major_area region
#>   <chr>            <int> <chr>        <chr>      <chr>
#> 1 GCST005108           1 <NA>         <NA>       <NA>
#> 2 GCST005108           2 <NA>         <NA>       <NA>
#>
#> Slot "countries_of_recruitment":
#> # A tibble: 2 x 5
#>   study_id   ancestry_id country_name major_area region
#>   <chr>            <int> <chr>        <chr>      <chr>
#> 1 GCST005108           1 U.K.         Europe     Northern Europe
#> 2 GCST005108           2 U.K.         Europe     Northern Europe
#>
#> Slot "publications":
#> # A tibble: 1 x 7
#>   study_id pubmed_id publication_date publication title author_fullname
#>   <chr>        <int> <date>           <chr>       <chr> <chr>
#> 1 GCST005…  29187746 2017-11-30       Transl Psy… Geno… Howard DM
#> # … with 1 more variable: author_orcid <chr>
#>
# Get studies by variant identifier
get_studies(variant_id = 'rs3798440')
#> An object of class "studies"
#> Slot "studies":
#> # A tibble: 1 x 13
#>   study_id reported_trait initial_sample_… replication_sam… gxe   gxg
#>   <chr>    <chr>          <chr>            <chr>            <lgl> <lgl>
#> 1 GCST001… Hypertension … ~2,000 European… <NA>             FALSE TRUE
#> # … with 7 more variables: snp_count <int>, qualifier <chr>, imputed <lgl>,
#> #   pooled <lgl>, study_design_comment <chr>, full_pvalue_set <lgl>,
#> #   user_requested <lgl>
#>
#> Slot "genotyping_techs":
#> # A tibble: 1 x 2
#>   study_id   genotyping_technology
#>   <chr>      <chr>
#> 1 GCST001085 Genome-wide genotyping array
#>
#> Slot "platforms":
#> # A tibble: 1 x 2
#>   study_id   manufacturer
#>   <chr>      <chr>
#> 1 GCST001085 Affymetrix
#>
#> Slot "ancestries":
#> # A tibble: 1 x 4
#>   study_id   ancestry_id type    number_of_individuals
#>   <chr>            <int> <chr>                   <int>
#> 1 GCST001085           1 initial                  5000
#>
#> Slot "ancestral_groups":
#> # A tibble: 1 x 3
#>   study_id   ancestry_id ancestral_group
#>   <chr>            <int> <chr>
#> 1 GCST001085           1 European
#>
#> Slot "countries_of_origin":
#> # A tibble: 1 x 5
#>   study_id   ancestry_id country_name major_area region
#>   <chr>            <int> <chr>        <chr>      <chr>
#> 1 GCST001085           1 <NA>         <NA>       <NA>
#>
#> Slot "countries_of_recruitment":
#> # A tibble: 1 x 5
#>   study_id   ancestry_id country_name major_area region
#>   <chr>            <int> <chr>        <chr>      <chr>
#> 1 GCST001085           1 <NA>         <NA>       <NA>
#>
#> Slot "publications":
#> # A tibble: 1 x 7
#>   study_id pubmed_id publication_date publication title author_fullname
#>   <chr>        <int> <date>           <chr>       <chr> <chr>
#> 1 GCST001…  21626137 2011-05-27       Hum Genet   Two-… Slavin TP
#> # … with 1 more variable: author_orcid <chr>
#>
# Get studies by EFO trait identifier
get_studies(efo_id = 'EFO_0005537')
#> An object of class "studies"
#> Slot "studies":
#> # A tibble: 1 x 13
#>   study_id reported_trait initial_sample_… replication_sam… gxe   gxg
#>   <chr>    <chr>          <chr>            <chr>            <lgl> <lgl>
#> 1 GCST002… Breast cancer… 1,529 European … 2,148 European … FALSE FALSE
#> # … with 7 more variables: snp_count <int>, qualifier <chr>, imputed <lgl>,
#> #   pooled <lgl>, study_design_comment <chr>, full_pvalue_set <lgl>,
#> #   user_requested <lgl>
#>
#> Slot "genotyping_techs":
#> # A tibble: 1 x 2
#>   study_id   genotyping_technology
#>   <chr>      <chr>
#> 1 GCST002305 Genome-wide genotyping array
#>
#> Slot "platforms":
#> # A tibble: 1 x 2
#>   study_id   manufacturer
#>   <chr>      <chr>
#> 1 GCST002305 Illumina
#>
#> Slot "ancestries":
#> # A tibble: 2 x 4
#>   study_id   ancestry_id type        number_of_individuals
#>   <chr>            <int> <chr>                       <int>
#> 1 GCST002305           1 replication                  3457
#> 2 GCST002305           2 initial                      4928
#>
#> Slot "ancestral_groups":
#> # A tibble: 2 x 3
#>   study_id   ancestry_id ancestral_group
#>   <chr>            <int> <chr>
#> 1 GCST002305           1 European
#> 2 GCST002305           2 European
#>
#> Slot "countries_of_origin":
#> # A tibble: 0 x 5
#> # … with 5 variables: study_id <chr>, ancestry_id <int>, country_name <chr>,
#> #   major_area <chr>, region <chr>
#>
#> Slot "countries_of_recruitment":
#> # A tibble: 10 x 5
#>    study_id   ancestry_id country_name major_area       region
#>    <chr>            <int> <chr>        <chr>            <chr>
#>  1 GCST002305           1 U.S.         Northern America <NA>
#>  2 GCST002305           1 Greece       Europe           Southern Europe
#>  3 GCST002305           1 Germany      Europe           Western Europe
#>  4 GCST002305           1 U.K.         Europe           Northern Europe
#>  5 GCST002305           1 Norway       Europe           Northern Europe
#>  6 GCST002305           2 Finland      Europe           Northern Europe
#>  7 GCST002305           2 U.S.         Northern America <NA>
#>  8 GCST002305           2 Australia    Oceania          Australia/New Zealand
#>  9 GCST002305           2 Germany      Europe           Western Europe
#> 10 GCST002305           2 U.K.         Europe           Northern Europe
#>
#> Slot "publications":
#> # A tibble: 1 x 7
#>   study_id pubmed_id publication_date publication title author_fullname
#>   <chr>        <int> <date>           <chr>       <chr> <chr>
#> 1 GCST002…  24325915 2013-12-09       Carcinogen… Geno… Purrington KS
#> # … with 1 more variable: author_orcid <chr>
#>