
Gets linkage disequilibrium data for variants from the Ensembl REST API. There are four ways to query:

Genomic window centred on variants:

get_ld_variants_by_window(variant_id, genomic_window_size, ...)

Pairs of variants:

get_ld_variants_by_pair(variant_id1, variant_id2, ...)

Genomic range:

get_ld_variants_by_range(genomic_range, ...)

All pair combinations of variants:

get_ld_variants_by_pair_combn(variant_id, ...)
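To make the four query modes concrete, the sketch below builds the Ensembl REST request URLs that they are assumed to correspond to: ld/:species/:id/:population_name for a window around a variant, ld/:species/pairwise/:id1/:id2 for a pair, and ld/:species/region/:region/:population_name for a range. get_ld_variants_by_pair_combn() presumably issues one pairwise request per pair. The helper ld_request_url() and its argument names are hypothetical, not part of the package; it only formats strings and sends no request.

```r
# Hypothetical helper (not part of the package): format the Ensembl REST LD
# request URLs that the query functions are ASSUMED to wrap. No HTTP request
# is made; this only builds strings.
ld_request_url <- function(type = c("window", "pair", "region"),
                           species_name = "homo_sapiens",
                           population = "1000GENOMES:phase_3:CEU",
                           id = NULL, id2 = NULL, region = NULL,
                           window_size = 500L, d_prime = 0, r2 = 0.05) {
  type <- match.arg(type)
  base <- "https://rest.ensembl.org/ld"
  # Assumed path layout for each endpoint.
  path <- switch(type,
    window = paste(base, species_name, id, population, sep = "/"),
    pair   = paste(base, species_name, "pairwise", id, id2, sep = "/"),
    region = paste(base, species_name, "region", region, population, sep = "/")
  )
  # Assumed query parameters; NULL entries from the if () calls are dropped.
  query <- c(
    if (type == "window") paste0("window_size=", window_size),
    if (type == "pair") paste0("population_name=", population),
    paste0("d_prime=", d_prime),
    paste0("r2=", r2)
  )
  paste0(path, "?", paste(query, collapse = "&"))
}

ld_request_url("window", id = "rs123", window_size = 1L)
ld_request_url("pair", id = "rs123", id2 = "rs122")
ld_request_url("region", region = "7:100000..100500")
```

Note that in the pair endpoint the population is sent as a query parameter rather than as part of the path.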

Usage

get_ld_variants_by_window(
  variant_id,
  genomic_window_size = 500L,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair(
  variant_id1,
  variant_id2,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_range(
  genomic_range,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair_combn(
  variant_id,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

variant_id

Variant identifiers, e.g., 'rs123'. Use this argument with get_ld_variants_by_window() or get_ld_variants_by_pair_combn(). With get_ld_variants_by_pair_combn(), all pairwise combinations of the elements of variant_id define the pairs of variants to query. Note that this argument is not the same as variant_id1 or variant_id2, which are used with get_ld_variants_by_pair().
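The pair-building step of get_ld_variants_by_pair_combn() can be pictured with base R's utils::combn(): n identifiers yield n(n-1)/2 unordered pairs. This is a sketch that assumes the package enumerates pairs in this way; variant_pairs() is a hypothetical name, not part of the package.

```r
# Hypothetical illustration (not the package's internal code): enumerate every
# unordered pair of variant identifiers, as described for
# get_ld_variants_by_pair_combn().
variant_pairs <- function(variant_id) {
  stopifnot(is.character(variant_id), length(variant_id) >= 2)
  pairs <- utils::combn(variant_id, 2)  # 2 x n(n-1)/2 character matrix
  data.frame(variant_id1 = pairs[1, ],
             variant_id2 = pairs[2, ],
             stringsAsFactors = FALSE)
}

variant_pairs(c('rs6978506', 'rs12718102', 'rs13307200'))
```

For the three identifiers used in the Examples section, this yields the same three pairs, in the same order, as the get_ld_variants_by_pair_combn() output shown there.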

genomic_window_size

An integer vector specifying the genomic window size in kilobases (kb) around the variant indicated in variant_id. Use this argument with get_ld_variants_by_window(). At the moment, the Ensembl REST API does not allow values greater than 500 kb. A window size of 500 means looking 250 kb upstream and 250 kb downstream of the variant passed as variant_id. The minimum value for this argument is 1L, not 0L.
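The window arithmetic described above (half the window on each side of the variant, with sizes from 1 to 500 kb) can be checked with a small sketch. window_flanks_bp() is a hypothetical helper; the package performs its own argument validation, which may differ.

```r
# Hypothetical helper (not part of the package): check window sizes in kb
# against the stated limits (1 to 500) and report the flank searched on each
# side of the variant, in base pairs.
window_flanks_bp <- function(genomic_window_size) {
  stopifnot(is.numeric(genomic_window_size),
            all(genomic_window_size >= 1),
            all(genomic_window_size <= 500))
  flank_bp <- genomic_window_size * 1000 / 2  # half the window on each side
  data.frame(genomic_window_size = genomic_window_size,
             upstream_bp = flank_bp,
             downstream_bp = flank_bp)
}

window_flanks_bp(c(1L, 500L))
```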

species_name

The species name, i.e., the scientific name with all letters lowercase and spaces replaced by underscores. Examples: 'homo_sapiens' (human), 'ovis_aries' (domestic sheep) or 'capra_hircus' (goat).
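The formatting rule above can be applied to ordinary scientific names with a one-line transformation. to_ensembl_species_name() is a hypothetical helper; it only applies the lowercase-and-underscore rule and does not check that Ensembl actually hosts the species.

```r
# Hypothetical helper (not part of the package): convert a scientific name such
# as "Homo sapiens" to the lowercase, underscore-separated form used for
# species_name. Runs of whitespace become a single underscore.
to_ensembl_species_name <- function(scientific_name) {
  x <- tolower(trimws(scientific_name))
  gsub("[[:space:]]+", "_", x)
}

to_ensembl_species_name(c("Homo sapiens", "Ovis aries", "Capra hircus"))
```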

population

Population for which to compute linkage disequilibrium. See get_populations() for how to find the populations available for a species.

d_prime

\(D'\) is a measure of linkage disequilibrium. d_prime defines a cut-off threshold: only pairs of variants with \(D' \ge\) d_prime are returned.
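For reference, the textbook (Lewontin) definition of \(D'\) for two biallelic variants is given below, with \(p_{AB}\) the frequency of the haplotype carrying alleles A and B, \(p_A\) and \(p_B\) the allele frequencies, \(q_A = 1 - p_A\) and \(q_B = 1 - p_B\). This is the standard definition, not a description of how Ensembl computes the value; the d_prime values in the examples lie between 0 and 1, which suggests that the absolute value \(|D'|\) is what is reported.

\[
D = p_{AB} - p_A p_B,
\qquad
D_{\max} =
\begin{cases}
\min(p_A q_B,\; q_A p_B) & \text{if } D \ge 0,\\
\min(p_A p_B,\; q_A q_B) & \text{if } D < 0,
\end{cases}
\qquad
D' = \frac{D}{D_{\max}}.
\]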

r_squared

\(r^2\) is a measure of linkage disequilibrium. r_squared defines a cut-off threshold: only pairs of variants with \(r^2 \ge\) r_squared are returned. The lower bound for r_squared is 0.05, not 0; the upper bound is 1.
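The two cut-offs can be illustrated with a minimal R sketch that computes both measures for one pair of biallelic variants from haplotype and allele frequencies, using the standard formulas \(D = p_{AB} - p_A p_B\), \(D'\) as \(|D|\) divided by Lewontin's \(D_{\max}\), and \(r^2 = D^2 / (p_A q_A p_B q_B)\) with \(q = 1 - p\). It then applies the "at or above both thresholds" rule. ld_measures() and passes_ld_cutoffs() are hypothetical helpers, and the sketch does not reproduce how Ensembl estimates these quantities from population data.

```r
# Hypothetical sketch (not the package's or Ensembl's code): D' and r^2 for a
# single pair of biallelic variants, from the A-B haplotype frequency p_AB and
# the allele frequencies p_A and p_B.
ld_measures <- function(p_AB, p_A, p_B) {
  q_A <- 1 - p_A
  q_B <- 1 - p_B
  D <- p_AB - p_A * p_B
  # Lewontin's normalisation: D_max depends on the sign of D.
  D_max <- if (D >= 0) min(p_A * q_B, q_A * p_B) else min(p_A * p_B, q_A * q_B)
  c(d_prime = abs(D) / D_max,
    r_squared = D^2 / (p_A * q_A * p_B * q_B))
}

# Keep a pair only if both measures are at or above their thresholds, which is
# the rule described for the d_prime and r_squared arguments.
passes_ld_cutoffs <- function(ld, d_prime = 0, r_squared = 0.05) {
  unname(ld["d_prime"] >= d_prime & ld["r_squared"] >= r_squared)
}

strong <- ld_measures(p_AB = 0.5, p_A = 0.5, p_B = 0.5)  # complete LD
weak <- ld_measures(p_AB = 0.2, p_A = 0.6, p_B = 0.3)
passes_ld_cutoffs(strong)
passes_ld_cutoffs(weak)
```

In the second case \(r^2 \approx 0.008\), below the default r_squared of 0.05, so that pair would not be returned.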

verbose

Whether to be verbose about the HTTP requests and the status of their responses.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

variant_id1

The first variant of each pair of variants; used together with variant_id2. Note that this argument is not the same as variant_id. Use this argument with get_ld_variants_by_pair().

variant_id2

The second variant of each pair of variants; used together with variant_id1. Note that this argument is not the same as variant_id. Use this argument with get_ld_variants_by_pair().

genomic_range

Genomic range formatted as a string "chr:start..end", e.g., "X:1..10000". See function genomic_range() to easily create these ranges from vectors of start and end positions. Use this argument with get_ld_variants_by_range().
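A minimal sketch that builds the same "chr:start..end" format from vectors is shown here. make_genomic_range() is hypothetical and is not the package's genomic_range(), whose signature may differ. One design point worth noting is that formatting positions with sprintf("%d") on integers avoids R's scientific notation: as.character(100000) returns "1e+05", which would produce an invalid range string.

```r
# Hypothetical helper (not the package's genomic_range()): build
# "chr:start..end" strings from vectors of chromosomes, starts and ends.
# Converting to integer and using %d avoids output such as "1e+05".
make_genomic_range <- function(chr, start, end) {
  start <- as.integer(start)
  end <- as.integer(end)
  stopifnot(!anyNA(start), !anyNA(end), all(start >= 1L), all(start <= end))
  sprintf("%s:%d..%d", chr, start, end)
}

make_genomic_range(c("X", "7"), c(1, 100000), c(10000, 100500))
```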

Value

A tibble of 6 variables:

species_name

Ensembl species name: the name used internally by Ensembl to uniquely identify a species. It is the scientific name in lowercase, with spaces replaced by underscores, e.g., 'homo_sapiens'.

population

Population for which linkage disequilibrium was computed.

variant_id1

First variant identifier.

variant_id2

Second variant identifier.

d_prime

\(D'\) between the two variants.

r_squared

\(r^2\) between the two variants.
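Because the cut-offs are simple "greater than or equal to" filters on the returned columns, a result can also be narrowed locally after the query. The sketch below uses a base data.frame in place of the returned tibble, with the six columns listed above and the rounded values printed in the get_ld_variants_by_pair_combn() example. It keeps only pairs with \(r^2 \ge 0.2\) and \(D' \ge 0.9\); these thresholds are arbitrary and chosen only for illustration.

```r
# Illustration only: a stand-in for a returned result, using the rounded
# values printed in the get_ld_variants_by_pair_combn() example.
ld <- data.frame(
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  variant_id1 = c("rs6978506", "rs6978506", "rs12718102"),
  variant_id2 = c("rs12718102", "rs13307200", "rs13307200"),
  r_squared = c(0.111, 0.320, 0.266),
  d_prime = c(0.999, 1.00, 0.875),
  stringsAsFactors = FALSE
)

# Same ">=" rule as the r_squared and d_prime arguments, with stricter values.
strong_ld <- ld[ld$r_squared >= 0.2 & ld$d_prime >= 0.9, ]
strong_ld
```

With dplyr, dplyr::filter() on the tibble would express the same condition.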

Examples

# Retrieve variants in LD within a window of 1 kb:
# 1 kb: 500 bp upstream and 500 bp downstream of the variant.
get_ld_variants_by_window('rs123', genomic_window_size = 1L)

# Retrieve LD measures for pairs of variants:
get_ld_variants_by_pair(
  variant_id1 = c('rs123', 'rs35439278'),
  variant_id2 = c('rs122', 'rs35174522')
)
#> # A tibble: 2 × 6
#>   species_name population              variant_id1 variant_id2 r_squared d_prime
#>   <chr>        <chr>                   <chr>       <chr>           <dbl>   <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs123       rs122          0.722     1.00
#> 2 homo_sapiens 1000GENOMES:phase_3:CEU rs35439278  rs35174522     0.0973    1.00

# Retrieve variants in LD within a genomic range
get_ld_variants_by_range('7:100000..100500')
#> # A tibble: 1 × 6
#>   species_name population              variant_id1 variant_id2 r_squared d_prime
#>   <chr>        <chr>                   <chr>       <chr>           <dbl>   <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs35439278  rs35174522     0.0973    1.00

# Retrieve all pair combinations of variants in LD
get_ld_variants_by_pair_combn(c('rs6978506', 'rs12718102', 'rs13307200'))
#> # A tibble: 3 × 6
#>   species_name population              variant_id1 variant_id2 r_squared d_prime
#>   <chr>        <chr>                   <chr>       <chr>           <dbl>   <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs6978506   rs12718102      0.111   0.999
#> 2 homo_sapiens 1000GENOMES:phase_3:CEU rs6978506   rs13307200      0.320   1.00 
#> 3 homo_sapiens 1000GENOMES:phase_3:CEU rs12718102  rs13307200      0.266   0.875