
Scores Differentially Methylated Regions (DMRs) by how well they discriminate between sample groups under cross-validated Support Vector Machine (SVM) classification. For each DMR, the function runs stratified k-fold cross-prediction with an RBF-kernel SVM and computes a margin-sensitive classification score from the decision values. This score is a complementary measure of the DMR's discriminative power: use it alongside the DMR-level pval, qval, and effect-size columns, not as a replacement for statistical evidence. The scores are then smoothed along the genome with a Gaussian-kNN approach, and piecewise-linear segments are detected with the PELT algorithm under an expected rising -> plateau -> decreasing score pattern. Finally, DMRs are assigned to localized blocks based on the smoothed score profiles and the specified gap rules.
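To illustrate the smoothing step, here is a minimal base-R sketch of Gaussian-weighted kNN smoothing of per-DMR scores along one chromosome. It is not the CMEnt implementation: the helper name gaussian_knn_smooth() and its k and bandwidth_bp parameters are hypothetical and are not scoreDMRs() arguments. Each DMR's smoothed value is a weighted mean of the scores of its k nearest DMRs by midpoint, with weights decaying as a Gaussian in genomic distance.

```r
# Hypothetical helper (not part of CMEnt): Gaussian-weighted kNN smoothing of
# per-DMR scores along a single chromosome.
# `k` and `bandwidth_bp` are illustrative parameters, not scoreDMRs() arguments.
gaussian_knn_smooth <- function(midpoints, scores, k = 5, bandwidth_bp = 10000) {
  stopifnot(length(midpoints) == length(scores), k >= 1, bandwidth_bp > 0)
  n <- length(scores)
  k <- min(k, n)
  vapply(seq_len(n), function(i) {
    d <- abs(midpoints - midpoints[i])
    nn <- order(d)[seq_len(k)]               # k nearest DMRs, including DMR i itself
    w <- exp(-0.5 * (d[nn] / bandwidth_bp)^2) # Gaussian weights; DMR i always has weight 1
    sum(w * scores[nn]) / sum(w)
  }, numeric(1))
}

# Usage: five DMRs, two of them far from the rest
mid <- c(1000, 2000, 3000, 50000, 51000)
s <- c(0.2, 0.8, 0.6, 0.9, 0.1)
sm <- gaussian_knn_smooth(mid, s, k = 3, bandwidth_bp = 2000)
```

Because the weights decay with distance, distant DMRs contribute almost nothing even when they fall inside the k-neighbourhood, so isolated DMRs are smoothed mainly toward themselves.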

Usage

scoreDMRs(
  dmrs,
  beta,
  pheno,
  covariates = NULL,
  genome = "hg38",
  array = "450K",
  sorted_locs = NULL,
  sample_group_col = "Sample_Group",
  block_gap_mode = c("adaptive", "fixed", "none"),
  block_gap_fixed_bp = NULL,
  block_gap_quantile = 0.95,
  block_gap_multiplier = 1.5,
  block_gap_min_bp = 2500,
  block_gap_max_bp = 50000,
  njobs = getOption("CMEnt.njobs", .defaultNJobs()),
  verbose = getOption("CMEnt.verbose", 1L)
)

Arguments

dmrs

Data frame or GRanges object containing DMR coordinates and metadata

beta

Character path or object. Beta values, supplied as a path to a beta value file, tabix file, or BED file, or as a beta matrix or BetaHandler object.

pheno

Data frame. Phenotype data containing sample group information

covariates

Character vector of covariate columns in pheno to regress out before scoring. Default is NULL.

genome

Character. Genome version (e.g., "hg38", "hg19", "hs1", "mm10"). Default is "hg38"

array

Character. Array platform type (e.g., "450K", "EPIC", "EPICv2"). Default is "450K"

sorted_locs

Data frame. Optional pre-computed sorted genomic locations. Default is NULL

sample_group_col

Character. Column name in pheno containing sample group information. Default is "Sample_Group"

block_gap_mode

Character. Distance rule for block construction: "adaptive" (default), "fixed", or "none".

block_gap_fixed_bp

Numeric. Maximum allowed midpoint gap (bp) when block_gap_mode = "fixed". Ignored otherwise.

block_gap_quantile

Numeric in (0, 1). Quantile of chromosome DMR midpoint gaps used in adaptive thresholding. Default is 0.95.

block_gap_multiplier

Numeric > 0. Multiplier applied to the adaptive gap quantile. Default is 1.5.

block_gap_min_bp

Numeric >= 0. Lower clamp for the adaptive gap threshold (bp). Default is 2500.

block_gap_max_bp

Numeric >= block_gap_min_bp. Upper clamp for the adaptive gap threshold (bp). Default is 50000.
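The block_gap_* arguments can be read as one distance rule. The base-R sketch below assumes the adaptive threshold is the block_gap_quantile quantile of consecutive midpoint gaps, multiplied by block_gap_multiplier and clamped to [block_gap_min_bp, block_gap_max_bp]. That combination is an assumption based on the argument descriptions, not confirmed CMEnt code, and the helper names block_gap_threshold() and split_at_gaps() are hypothetical. The sketch covers only the distance split; the score-profile part of block detection is not modelled.

```r
# Hypothetical helpers (not part of CMEnt) mirroring the documented arguments.
# ASSUMPTION: threshold = clamp(quantile(gaps, q) * multiplier, min_bp, max_bp).
block_gap_threshold <- function(midpoints,
                                mode = c("adaptive", "fixed", "none"),
                                fixed_bp = NULL,
                                quantile_p = 0.95,
                                multiplier = 1.5,
                                min_bp = 2500,
                                max_bp = 50000) {
  mode <- match.arg(mode)
  if (mode == "none") return(Inf)            # never split on distance
  if (mode == "fixed") {
    stopifnot(is.numeric(fixed_bp), length(fixed_bp) == 1, fixed_bp >= 0)
    return(fixed_bp)
  }
  gaps <- diff(sort(midpoints))              # consecutive midpoint gaps on one chromosome
  if (length(gaps) == 0) return(max_bp)      # a single DMR cannot be split
  raw <- unname(stats::quantile(gaps, probs = quantile_p)) * multiplier
  min(max(raw, min_bp), max_bp)              # clamp to [min_bp, max_bp]
}

# Assign run IDs to sorted midpoints, starting a new run at every gap > threshold.
split_at_gaps <- function(midpoints, threshold) {
  stopifnot(!is.unsorted(midpoints))
  cumsum(c(TRUE, diff(midpoints) > threshold))
}

# Usage: one large gap (192 kb) separates two clusters of DMRs
mid <- c(1000, 3000, 5000, 8000, 200000, 203000)
thr <- block_gap_threshold(mid)   # adaptive quantile rule, clamped at max_bp
runs <- split_at_gaps(mid, thr)
```

In this example the raw adaptive value exceeds block_gap_max_bp, so the upper clamp keeps a single outlying gap from inflating the threshold enough to merge distant clusters.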

njobs

Integer. Number of parallel jobs used for cross-validated scoring. Default comes from getOption("CMEnt.njobs").

verbose

Numeric. Logging verbosity level. Default comes from getOption("CMEnt.verbose").

Value

GRanges object with DMRs ordered by the complementary classification score, with these additional metadata columns:

  • score: Margin-sensitive cross-validated classification score for the DMR

  • cv_accuracy: Raw cross-validated classification accuracy for the DMR

  • score_smoothed: Gaussian-kNN smoothed score trajectory per chromosome

  • segment_id: Piecewise-linear segment index estimated with PELT

  • segment_slope: Estimated slope of the segment that each DMR belongs to

  • block_id: Localized DMR block label (NA for DMRs not assigned to a block)

Details

The function uses stratified k-fold cross-prediction so that each fold has a balanced representation of the sample groups. The number of folds is controlled by the option "CMEnt.scoring_nfold" (default 5). An RBF (Radial Basis Function) kernel SVM is trained on the beta values of the sites within each DMR. For reproducible fold assignments, call set.seed() before scoreDMRs().
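Stratified fold assignment can be sketched in a few lines of base R. This is an illustration, not the CMEnt code: the helper stratified_folds() is hypothetical. It reads the documented "CMEnt.scoring_nfold" option for its default, shuffles samples within each group, and deals them round-robin into folds. Because the shuffle uses R's RNG, set.seed() makes the assignment reproducible, just as the paragraph above recommends for scoreDMRs().

```r
# Hypothetical helper (not part of CMEnt): stratified k-fold assignment.
# Each sample group is shuffled, then dealt round-robin across the folds,
# so every fold receives a near-equal share of every group.
stratified_folds <- function(groups, nfold = getOption("CMEnt.scoring_nfold", 5L)) {
  groups <- as.factor(groups)
  folds <- integer(length(groups))
  for (g in levels(groups)) {
    idx <- which(groups == g)
    idx <- idx[sample.int(length(idx))]               # shuffle within the group
    folds[idx] <- rep_len(seq_len(nfold), length(idx)) # deal into folds 1..nfold
  }
  folds
}

# Usage: 12 cases and 8 controls split into 4 folds
set.seed(1)
grp <- rep(c("Case", "Control"), times = c(12, 8))
f <- stratified_folds(grp, nfold = 4)
table(grp, f)   # 3 cases and 2 controls per fold
```

Using sample.int() on the index count, rather than sample() on the indices, avoids R's surprising behaviour when a group has exactly one sample.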

The score combines classification correctness with margin confidence, which makes it more sensitive than plain cross-validated accuracy when many DMRs classify perfectly. It is a complementary ranking and diagnostic measure, particularly useful for assessing sample-level separation. The cv_accuracy column stores the raw cross-validated accuracy for reference. Blocks are detected from the smoothed score profiles and split at large midpoint gaps according to the selected block_gap_mode.
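To show why a margin-sensitive score separates DMRs that plain accuracy cannot, here is a toy base-R example. The formula (1 + mean(tanh(y * f))) / 2, with labels y coded -1/+1 and cross-validated decision values f, is a hypothetical stand-in chosen for illustration. It is not the score computed by scoreDMRs(), and the helpers margin_score() and plain_cv_accuracy() are likewise hypothetical.

```r
# Hypothetical illustration (NOT the CMEnt score formula).
# y: true labels coded -1/+1; f: cross-validated SVM decision values.
margin_score <- function(y, f) {
  stopifnot(length(y) == length(f), all(y %in% c(-1, 1)))
  m <- y * f                    # signed margin: > 0 means correctly classified
  (1 + mean(tanh(m))) / 2       # bounded, rescaled to [0, 1]
}

# Plain accuracy only checks the sign of the margin.
plain_cv_accuracy <- function(y, f) mean(y * f > 0)

# Usage: two DMRs that both classify all four samples correctly
y <- c(-1, -1, 1, 1)
f_weak <- c(-0.1, -0.2, 0.1, 0.2)   # correct, but close to the decision boundary
f_strong <- c(-2, -3, 2, 3)         # correct, with wide margins
plain_cv_accuracy(y, f_weak)        # 1
plain_cv_accuracy(y, f_strong)      # 1
margin_score(y, f_weak) < margin_score(y, f_strong)  # TRUE
```

Both DMRs tie at an accuracy of 1, while the margin-aware score ranks the widely separated DMR higher. This is the tie-breaking behaviour the paragraph above describes.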

Examples

library(CMEnt)

# Load example data (provides beta and pheno)
loadExampleInputDataChr5And11()

# Load pre-computed DMRs
dmrs <- readRDS(system.file("extdata", "example_outputChr5And11.rds", package = "CMEnt"))

# Fix the RNG seed for reproducible fold assignments, then score the first DMR
set.seed(1)
scoring_dmrs <- scoreDMRs(
    dmrs = dmrs[1],
    beta = beta,
    pheno = pheno,
    sample_group_col = "Sample_Group"
)