Extracts motif frequencies around sites within differentially methylated regions (DMRs). For each DMR, retrieves the sequences around the start and end sites, calculates base frequencies at each position, and stores the results in the DMR metadata.
Usage
extractDMRMotifs(
dmrs,
genome = "hg38",
array = "450k",
beta_locs = NULL,
motif_site_flank_size = 5,
plot_dir = NULL
)
Arguments
- dmrs
Data frame or GRanges object containing DMR coordinates and site indices
- genome
Character. Genome version to use for sequence extraction (default: "hg38")
- array
Character. Array platform type (e.g., "450K", "EPIC"). Ignored if input is not array-based. (default: "450K")
- beta_locs
Data frame. Optional pre-computed genomic locations. If NULL, locations will be retrieved using getSortedGenomicLocs (default: NULL)
- motif_site_flank_size
Integer. Number of base pairs to include as flanking regions on each side of a site (default: 5)
- plot_dir
Character. Optional directory where diagnostic motif plots may be written (default: NULL)
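As a rough illustration of what motif_site_flank_size controls, the sketch below computes the coordinate window around a site position. It assumes the window is symmetric and includes the site itself, giving 2 * flank + 1 positions; the page does not confirm this, and `flank_window` is a hypothetical helper, not a function from the package.

```r
# Hypothetical helper (not part of the package): coordinate window around a
# site, ASSUMING a symmetric window that includes the site position itself.
flank_window <- function(pos, flank_size = 5L) {
  stopifnot(flank_size >= 0)
  data.frame(
    window_start = pos - flank_size,
    window_end   = pos + flank_size,
    width        = 2L * flank_size + 1L
  )
}

# Site positions here are placeholders taken from the example DMR starts
w <- flank_window(c(53468112, 37459206), flank_size = 5L)
print(w)
```

Under this assumption, the default flank size of 5 would give an 11-position window per site.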
Value
The input data frame/GRanges object with two additional metadata columns:
- pwm: For each DMR, a matrix of base frequencies (rows: positions relative to the site, columns: bases A, C, G, T)
- consensus_seq: A character string representing the consensus sequence derived from the PWM
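The matrix layout described above (rows as positions, columns as bases) can be sketched in base R as follows. This is a minimal illustration on made-up sequences, not the package's implementation: the input `seqs`, the relative-position row names, and the tie-breaking rule for the consensus (first maximum wins) are all assumptions.

```r
# Hypothetical input: equal-length sequences centred on a site
seqs  <- c("ACGTA", "ACGTT", "TCGTA", "ACGAA")
bases <- c("A", "C", "G", "T")

# One row per sequence, one column per position
chars <- do.call(rbind, strsplit(seqs, "", fixed = TRUE))
n_pos <- ncol(chars)

# Base frequencies at each position; vapply yields bases x positions,
# so transpose to get positions in rows and bases in columns
pwm <- t(vapply(seq_len(n_pos), function(i) {
  as.numeric(table(factor(chars[, i], levels = bases))) / nrow(chars)
}, numeric(length(bases))))

# Label rows by position relative to the central site (assumed convention)
rownames(pwm) <- seq_len(n_pos) - (n_pos + 1) / 2
colnames(pwm) <- bases

# Consensus: most frequent base at each position (ties go to the first base)
consensus <- paste(bases[apply(pwm, 1, which.max)], collapse = "")
print(pwm)
print(consensus)  # "ACGTA"
```

Each row of `pwm` sums to 1, which is a quick sanity check when inspecting the matrices returned for real DMRs.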
Examples
# Extract motif frequencies for DMRs
dmrs <- data.frame(
  chr = c("chr16", "chr3"),
  start = c(53468112, 37459206),
  end = c(53468712, 37493431),
  start_site = c("cg00000029", "cg00000108"),
  start_seed = c("cg00000029", "cg00000108"),
  end_site = c("cg13426503", "cg08730726"),
  end_seed = c("cg13426503", "cg08730726"),
  seeds = c("cg00000029,cg13426503", "cg00000108,cg08730726")
)
dmrs_with_motifs <- extractDMRMotifs(dmrs, genome = "hg38", array = "450K")
# Access motif frequencies for the first DMR
motif_freqs_dmr1 <- dmrs_with_motifs$pwm[[1]]