This helper function identifies differentially methylated positions (DMPs) from a BSseq object using the DSS package. It allows for flexible specification of sample groups, covariates, and chromosome filtering.
Usage
findDMPsBSSeq(
  bsseq,
  samplesheet,
  samplesheet_sep = "\t",
  sample_group_col = "Sample_Group",
  id_col = "Sample_ID",
  chr = "auto",
  case_group = NULL,
  covariates = NULL,
  output_file = NULL,
  njobs = 1L
)
Arguments
- bsseq
A BSseq object or a file path to a saved BSseq object (RDS format).
- samplesheet
A data frame or a file path to a delimited text file (tab-delimited by default; see samplesheet_sep) containing sample metadata. Must include columns for sample IDs and group labels.
- samplesheet_sep
The separator used in the samplesheet file if a file path is provided. Default is tab ("\t").
- sample_group_col
The name of the column in the samplesheet that contains the group labels for comparison. Default is "Sample_Group".
- id_col
The name of the column in the samplesheet that contains the sample IDs. Default is "Sample_ID".
- chr
A character vector of chromosome names to include in the analysis, or "auto" to automatically include chr1-chr22, or "all" to include chr1-chr22 plus chrX and chrY. Default is "auto".
- case_group
The specific group label in the sample_group_col to treat as the "case" group for comparison. If NULL, the first unique group in sample_group_col will be used as the case group. Default is NULL.
- covariates
A character vector of additional covariate column names from the samplesheet to include in the DSS model, or a comma-separated string of covariate names. Default is NULL (no additional covariates).
- output_file
An optional file path to save the DMP results as a tab-delimited text file. If the file name ends with ".gz", the output will be gzipped. Default is NULL (no file output).
- njobs
The number of parallel jobs to use for chromosome-level analysis. Default is 1.
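The chr and covariates arguments each accept more than one input form. The sketch below illustrates how those forms could be normalized, based only on the descriptions above. The helper names resolve_chr and parse_covariates are hypothetical and are not part of the package; the function's actual internals may differ.

```r
# Hypothetical helpers (not exported by the package) that mirror the
# documented behavior of the `chr` and `covariates` arguments.

resolve_chr <- function(chr = "auto") {
  autosomes <- paste0("chr", 1:22)
  if (identical(chr, "auto")) {
    return(autosomes)                      # chr1-chr22
  }
  if (identical(chr, "all")) {
    return(c(autosomes, "chrX", "chrY"))   # autosomes plus sex chromosomes
  }
  as.character(chr)                        # user-supplied chromosome names
}

parse_covariates <- function(covariates = NULL) {
  if (is.null(covariates)) {
    return(NULL)                           # no additional covariates
  }
  # Accept either a character vector or a comma-separated string.
  trimws(unlist(strsplit(covariates, ",", fixed = TRUE)))
}

length(resolve_chr("auto"))
tail(resolve_chr("all"), 2)
parse_covariates("Age, Sex")
parse_covariates(c("Age", "Sex"))
```

Under this reading, covariates = "Age, Sex" and covariates = c("Age", "Sex") describe the same model terms.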
Value
A data frame of identified DMPs with columns for chromosome, position, site ID, p-value, q-value, delta beta, and DMP score.
Examples
if (requireNamespace("bsseqData", quietly = TRUE) &&
    requireNamespace("DSS", quietly = TRUE)) {
  # Load example BSseq data
  data(BS.cancer.ex, package = "bsseqData")
  BS.cancer.ex <- BS.cancer.ex[seq_len(1000), ]
  # Create a sample metadata data frame
  samplesheet <- data.frame(
    Sample_ID = colnames(BS.cancer.ex),
    Sample_Group = c(rep("Condition1", 3), rep("Condition2", 3)),
    Age = c(30, 32, 31, 28, 29, 27)
  )
  # Find DMPs with DSS
  # \donttest{
  dmps <- findDMPsBSSeq(
    bsseq = BS.cancer.ex,
    samplesheet = samplesheet,
    sample_group_col = "Sample_Group",
    id_col = "Sample_ID",
    case_group = "Condition2",
    covariates = "Age",
    output_file = NULL,
    njobs = 4
  )
  print(head(dmps))
  # }
}
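The samplesheet argument also accepts a file path. As a minimal sketch using only base R, the snippet below writes a tab-delimited samplesheet to a temporary file and reads it back to confirm that the required columns are present. The sample IDs are made up for illustration, and the commented call at the end assumes a hypothetical BSseq object named my_bsseq whose column names match the Sample_ID values.

```r
# Toy sample metadata; the IDs here are illustrative only.
samplesheet <- data.frame(
  Sample_ID = paste0("S", 1:6),
  Sample_Group = rep(c("Condition1", "Condition2"), each = 3),
  Age = c(30, 32, 31, 28, 29, 27),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file, matching the default samplesheet_sep = "\t".
sheet_path <- tempfile(fileext = ".tsv")
write.table(samplesheet, sheet_path, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back and check the columns that the default arguments expect.
check <- read.delim(sheet_path, sep = "\t", stringsAsFactors = FALSE)
stopifnot(all(c("Sample_ID", "Sample_Group") %in% colnames(check)))

# The path could then be passed in place of the data frame, e.g.:
# dmps <- findDMPsBSSeq(
#   bsseq = my_bsseq,               # hypothetical object, see note above
#   samplesheet = sheet_path,
#   output_file = file.path(tempdir(), "dmps.tsv.gz")
# )
```

A file name ending in ".gz" for output_file requests gzipped output, as described under Arguments.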