Biomarker Data Set by Vermeulen et al. (2009)

vermeulen provides the Biomarker data set by Vermeulen et al. (2009) in tidy format.

The data set comes from a real-time quantitative PCR (qPCR) experiment and comprises:

The raw fluorescence data of 24,576 amplification curves.
64 targets: 59 genes of interest and 5 reference genes.
366 neuroblastoma cDNA samples and 18 dilution series samples.
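
As a quick sanity check, the counts above are mutually consistent if every target was measured once on every sample. That design is an assumption made here for illustration, although it does reproduce the stated number of curves:

```r
# Assumption: each target is measured once on each sample, giving one
# amplification curve per target-sample pair.
n_targets <- 59 + 5    # genes of interest + reference genes
n_samples <- 366 + 18  # neuroblastoma cDNA + dilution series samples
n_curves <- n_targets * n_samples
n_curves
#> [1] 24576
```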

Installation

Install vermeulen from CRAN:

# Install from CRAN
install.packages("vermeulen")

You can instead install the development version of vermeulen from GitHub:

# install.packages("remotes")
remotes::install_github("ramiromagno/vermeulen")

Usage

Because of CRAN size limits, the data is not bundled with the package. After installation, retrieve it from this GitHub repository with the function get_biomarker_dataset().

library(vermeulen)
library(tibble)
library(dplyr)

# Takes a few seconds (downloading from GitHub...)
biomarker <- as_tibble(get_biomarker_dataset())
biomarker
#> # A tibble: 1,226,880 × 11
#>    plate well  dye   target target_type sample sample_type copies dilution cycle
#>    <fct> <fct> <fct> <fct>  <fct>       <chr>  <fct>        <int>    <dbl> <int>
#>  1 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA     1
#>  2 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA     2
#>  3 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA     3
#>  4 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA     4
#>  5 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA     5
#>  6 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA     6
#>  7 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA     7
#>  8 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA     8
#>  9 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA     9
#> 10 AHCY  A1    SYBR  AHCY   toi         1495   unk             NA       NA    10
#> # ℹ 1,226,870 more rows
#> # ℹ 1 more variable: fluor <dbl>
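
Each row is one cycle of one amplification curve, and a curve is identified by its plate and well. The following is a minimal sketch of extracting a single curve; get_curve() is a hypothetical helper written for this example, not a function exported by vermeulen:

```r
library(dplyr)

# Hypothetical helper (not part of vermeulen): return one amplification
# curve, identified by plate and well, with its rows ordered by cycle.
get_curve <- function(data, plate_id, well_id) {
  curve <- filter(data, plate == plate_id, well == well_id)
  curve <- arrange(curve, cycle)
  # Keep only the columns needed to inspect or plot the curve
  select(curve, plate, well, target, sample, cycle, fluor)
}
```

With the data loaded as above, get_curve(biomarker, "AHCY", "A1") would return the curve whose first cycles appear in the printout.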

Types of samples:

count(
  distinct(biomarker, plate, well, sample_type, copies, dilution),
  sample_type,
  copies,
  dilution
)
#> # A tibble: 7 × 4
#>   sample_type copies dilution     n
#>   <fct>        <int>    <dbl> <int>
#> 1 ntc              0      Inf   192
#> 2 std             15    10000   192
#> 3 std            150     1000   192
#> 4 std           1500      100   192
#> 5 std          15000       10   192
#> 6 std         150000        1   192
#> 7 unk             NA       NA 23424
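
In this table, the standards (std) form a five-point, 10-fold dilution series in which copies × dilution is always 150,000, and the ntc rows (presumably no-template controls) have zero copies. The following is a minimal sketch of pulling out the standards of one plate, for example as input to a standard-curve fit; get_standards() and the log10_copies column are hypothetical names introduced here, not part of vermeulen:

```r
library(dplyr)

# Hypothetical helper (not part of vermeulen): keep the dilution-series
# standards of one plate and add the log10 of the known copy number.
get_standards <- function(data, plate_id) {
  std <- filter(data, plate == plate_id, sample_type == "std")
  mutate(std, log10_copies = log10(copies))
}
```

For instance, get_standards(biomarker, "AHCY") would keep only the std rows of the AHCY plate.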

Code of Conduct

Please note that the vermeulen project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

Vermeulen et al. Predicting outcomes for children with neuroblastoma using a multigene-expression signature: a retrospective SIOPEN/COG/GPOH study. The Lancet Oncology 10, 663–671 (2009). doi: 10.1016/S1470-2045(09)70154-8.
Ruijter et al. Evaluation of qPCR curve analysis methods for reliable biomarker discovery: Bias, resolution, precision, and implications. Methods 59, 32–46 (2013). doi: 10.1016/j.ymeth.2012.08.011.