Scores Differentially Methylated Regions (DMRs) based on their ability to
discriminate between sample groups using cross-validated Support Vector Machine (SVM)
classification. For each DMR, this function performs stratified k-fold cross-prediction
using an RBF kernel SVM and computes a margin-sensitive classification score based on
decision values, which serves as a complementary measure of the DMR's discriminative
power. Use this score alongside DMR-level pval, qval, and effect-size columns
rather than as a replacement for statistical evidence.
The scores are then smoothed along the genome using a Gaussian-kNN approach,
and piecewise-linear segments are detected using the PELT algorithm, under the
expectation that scores follow a rising -> plateau -> decreasing pattern.
Finally, DMRs are assigned to localized blocks based on the smoothed score profiles
and specified gap rules.
Usage
scoreDMRs(
dmrs,
beta,
pheno,
covariates = NULL,
genome = "hg38",
array = "450K",
sorted_locs = NULL,
sample_group_col = "Sample_Group",
block_gap_mode = c("adaptive", "fixed", "none"),
block_gap_fixed_bp = NULL,
block_gap_quantile = 0.95,
block_gap_multiplier = 1.5,
block_gap_min_bp = 2500,
block_gap_max_bp = 50000,
njobs = getOption("CMEnt.njobs", .defaultNJobs()),
verbose = getOption("CMEnt.verbose", 1L)
)
Arguments
- dmrs
Data frame or GRanges object containing DMR coordinates and metadata
- beta
Character path or object. Beta values, given as a path to a beta value file, tabix file, or bed file, or as a beta matrix or BetaHandler object
- pheno
Data frame. Phenotype data containing sample group information
- covariates
Character vector of covariate columns in pheno to regress out before scoring. Default is NULL.
- genome
Character. Genome version (e.g., "hg38", "hg19", "hs1", "mm10"). Default is "hg38"
- array
Character. Array platform type (e.g., "450K", "EPIC", "EPICv2"). Default is "450K"
- sorted_locs
Data frame. Optional pre-computed sorted genomic locations. Default is NULL
- sample_group_col
Character. Column name in pheno containing sample group information. Default is "Sample_Group"
- block_gap_mode
Character. Distance rule for block construction: "adaptive" (default), "fixed", or "none".
- block_gap_fixed_bp
Numeric. Maximum allowed midpoint gap (bp) when block_gap_mode = "fixed". Ignored otherwise.
- block_gap_quantile
Numeric in (0, 1). Quantile of chromosome DMR midpoint gaps used in adaptive thresholding. Default is 0.95.
- block_gap_multiplier
Numeric > 0. Multiplier applied to the adaptive gap quantile. Default is 1.5.
- block_gap_min_bp
Numeric >= 0. Lower clamp for adaptive gap threshold (bp). Default is 2500.
- block_gap_max_bp
Numeric >= block_gap_min_bp. Upper clamp for adaptive gap threshold (bp). Default is 50000.
- njobs
Integer. Number of parallel jobs used for cross-validated scoring. Default comes from getOption("CMEnt.njobs").
- verbose
Numeric. Logging verbosity level. Default comes from
getOption("CMEnt.verbose").
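The adaptive rule described by block_gap_quantile, block_gap_multiplier, block_gap_min_bp, and block_gap_max_bp can be sketched in base R as follows. This is only an illustration of the documented quantile, multiplier, and clamp steps, not the package's internal code: the helper name adaptive_gap_threshold, the toy midpoints, and the rule of starting a new block when a gap exceeds the threshold are assumptions.

```r
# Hypothetical helper: adaptive midpoint-gap threshold for one chromosome.
adaptive_gap_threshold <- function(midpoints, gap_quantile, gap_multiplier,
                                   min_bp, max_bp) {
  gaps <- diff(sort(midpoints))             # gaps between consecutive DMR midpoints
  if (length(gaps) == 0L) return(NA_real_)  # a single DMR has no gaps
  raw <- unname(stats::quantile(gaps, probs = gap_quantile)) * gap_multiplier
  min(max(raw, min_bp), max_bp)             # clamp to [min_bp, max_bp]
}

# Toy DMR midpoints in bp (made-up values)
mid <- c(1000, 3000, 4000, 60000, 61000, 500000)
thr <- adaptive_gap_threshold(mid, gap_quantile = 0.95, gap_multiplier = 1.5,
                              min_bp = 2500, max_bp = 50000)

# Assumed splitting rule: start a new block wherever the gap to the
# previous DMR exceeds the threshold.
block <- cumsum(c(TRUE, diff(sort(mid)) > thr))
```

Here the raw threshold is far above the upper clamp, so the clamp decides the result and the two large gaps split the toy DMRs into three blocks.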
Value
GRanges object with DMRs ordered by complementary classification score and additional metadata columns:
score: Margin-sensitive cross-validated classification score for the DMR
cv_accuracy: Raw cross-validated classification accuracy for the DMR
score_smoothed: Gaussian-kNN smoothed score trajectory per chromosome
segment_id: Piecewise-linear segment index estimated with PELT
segment_slope: Estimated slope of the segment that each DMR belongs to
block_id: Localized DMR block label (NA for DMRs not assigned to a block)
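As a rough illustration of using the score alongside statistical evidence, the sketch below filters on qval first and only then ranks by score. The data frame is a made-up stand-in for the returned metadata columns, and the 0.05 cutoff and the block labels are assumptions chosen for the example.

```r
# Toy stand-in for the metadata of a scoreDMRs() result; all values are made up
res <- data.frame(
  score    = c(0.91, 0.85, 0.40, 0.78),
  qval     = c(0.001, 0.20, 0.01, 0.03),
  block_id = c("b1", "b1", NA, "b2")
)

# Keep DMRs with statistical support (assumed cutoff), then rank by score
keep <- res[res$qval < 0.05, ]
keep <- keep[order(keep$score, decreasing = TRUE), ]

# Count retained DMRs per block; table() drops DMRs with NA block_id
blocks <- table(keep$block_id)
```

Filtering before ranking keeps a high score from promoting a DMR that lacks statistical support, such as the second row here.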
Details
The function uses stratified k-fold cross-prediction to ensure balanced representation
of sample groups in each fold. The number of folds can be controlled using the
option "CMEnt.scoring_nfold" (default is 5). An RBF (Radial Basis Function) kernel
SVM is trained on the beta values of the sites within each DMR. For
reproducible fold assignments, call set.seed() before scoreDMRs().
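The fold setup described above can be sketched as follows. The option name CMEnt.scoring_nfold comes from this page; the helper stratified_folds and the toy group labels are hypothetical and only illustrate what a stratified assignment means, not how the package implements it.

```r
# Use 3 folds instead of the default 5 (option name taken from this page)
options(CMEnt.scoring_nfold = 3)
nfold <- getOption("CMEnt.scoring_nfold", 5)

# Hypothetical helper: assign fold ids within each group, so that every
# fold receives a near-equal share of each group.
stratified_folds <- function(groups, nfold) {
  folds <- integer(length(groups))
  for (g in unique(groups)) {
    idx <- which(groups == g)
    # Shuffle the group's samples, then deal fold ids round-robin
    folds[idx[sample.int(length(idx))]] <- rep_len(seq_len(nfold), length(idx))
  }
  folds
}

set.seed(1)  # fixes the shuffle, so the fold assignment is reproducible
groups <- rep(c("case", "control"), times = c(6, 9))
folds <- stratified_folds(groups, nfold)
counts <- table(groups, folds)  # each fold holds 2 cases and 3 controls
```

Which samples land in which fold depends on the seed, but the per-fold group counts do not, which is the point of stratification.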
The score combines classification correctness and margin confidence,
making it more sensitive than plain cross-validated accuracy when many DMRs
classify perfectly. It is a complementary ranking and diagnostic measure,
especially useful for sample-level separation. The cv_accuracy column stores
the raw cross-validated accuracy for reference. Blocks are detected from smoothed score profiles and
split at large midpoint gaps using the selected block_gap_mode.
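One plausible reading of Gaussian-kNN smoothing is a Gaussian-weighted average over each DMR's k nearest neighbors along the chromosome. The sketch below follows that reading only; the kernel, the neighbor count, and the bandwidth used by scoreDMRs() are not stated on this page, so the helper gaussian_knn_smooth, k = 3, and the bandwidth choice are all assumptions.

```r
# Hypothetical helper: Gaussian-weighted mean over the k nearest positions.
gaussian_knn_smooth <- function(pos, score, k = 3) {
  stopifnot(length(pos) == length(score), k >= 1, k <= length(pos))
  vapply(seq_along(pos), function(i) {
    d <- abs(pos - pos[i])
    nn <- order(d)[seq_len(k)]          # k nearest DMRs, including DMR i itself
    sigma <- max(d[nn], 1)              # assumed bandwidth: farthest neighbor distance
    w <- exp(-0.5 * (d[nn] / sigma)^2)  # Gaussian weights by genomic distance
    sum(w * score[nn]) / sum(w)         # weighted mean of neighbor scores
  }, numeric(1))
}

# Toy midpoints (bp) and scores, made-up values
pos <- c(100, 200, 300, 400, 5000)
score <- c(0.2, 0.9, 0.3, 0.8, 0.1)
smoothed <- gaussian_knn_smooth(pos, score, k = 3)
```

Because each smoothed value is a weighted average of observed scores, it always stays within the range of the raw scores.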
Examples
# Load example data
loadExampleInputDataChr5And11()
# Load pre-computed DMRs
dmrs <- readRDS(system.file("extdata", "example_outputChr5And11.rds", package = "CMEnt"))
# Score the first DMR only
scoring_dmrs <- scoreDMRs(
dmrs = dmrs[1],
beta = beta,
pheno = pheno,
sample_group_col = "Sample_Group"
)
