Retrieves the DNA sequences corresponding to genomic regions specified in a GRanges object. This function is useful for extracting the actual DNA sequence of identified DMRs for downstream analyses such as motif finding or sequence composition analysis.
Usage
getDMRSequences(
dmrs,
genome,
use_online = FALSE,
uflank_size = 0,
dflank_size = 0,
batch_size = 100,
njobs = 1
)
Arguments
- dmrs
GRanges object containing genomic coordinates of DMRs
- genome
Character. Genome version to use for sequence extraction, e.g. "hg38" or "hs1".
- use_online
Logical. If TRUE, forces use of the online UCSC API instead of BSgenome packages. If FALSE (default), uses BSgenome packages with online fallback when packages are unavailable.
- uflank_size
Integer. Number of base pairs to add as flanking regions upstream of each DMR (default: 0)
- dflank_size
Integer. Number of base pairs to add as flanking regions downstream of each DMR (default: 0)
- batch_size
Integer. For online API, number of regions to process per batch (default: 100)
- njobs
Integer. For online API, number of cores for parallel processing (default: 1)
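To illustrate how the two flank arguments might widen a region, the base R sketch below extends a start/end pair by an upstream and a downstream amount. It assumes unstranded regions, where upstream means toward lower coordinates; how getDMRSequences treats minus-strand regions is not stated here, and extend_region is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: widen a region by upstream/downstream flanks.
# Assumption: regions are unstranded, so "upstream" means lower coordinates.
extend_region <- function(start, end, uflank_size = 0, dflank_size = 0) {
  new_start <- pmax(1, start - uflank_size)  # clamp at position 1
  new_end <- end + dflank_size               # no clamp at the chromosome end
  data.frame(start = new_start, end = new_end)
}

# The example DMR chr1:100000-100100 with 50 bp upstream and 20 bp downstream
flanked <- extend_region(100000, 100100, uflank_size = 50, dflank_size = 20)
print(flanked)
```

With both flank sizes left at 0, the helper returns the region unchanged, which matches the documented defaults.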
Details
The function first attempts to use genome-appropriate BSgenome packages:
hg19: BSgenome.Hsapiens.UCSC.hg19
hg38: BSgenome.Hsapiens.UCSC.hg38
hs1: BSgenome.Hsapiens.UCSC.hs1
mm10: BSgenome.Mmusculus.UCSC.mm10
mm39: BSgenome.Mmusculus.UCSC.mm39
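Before calling the function, it can be useful to check whether the package for a given genome is installed. The sketch below copies the mapping listed above into a named vector and tests availability with requireNamespace(); the variable names are illustrative and not part of the package.

```r
# Genome-to-package mapping copied from the list above.
bsgenome_packages <- c(
  hg19 = "BSgenome.Hsapiens.UCSC.hg19",
  hg38 = "BSgenome.Hsapiens.UCSC.hg38",
  hs1  = "BSgenome.Hsapiens.UCSC.hs1",
  mm10 = "BSgenome.Mmusculus.UCSC.mm10",
  mm39 = "BSgenome.Mmusculus.UCSC.mm39"
)

genome <- "hg38"
pkg <- bsgenome_packages[[genome]]

# requireNamespace() returns TRUE only if the package can be loaded.
if (requireNamespace(pkg, quietly = TRUE)) {
  message(pkg, " is installed; local retrieval should be possible.")
} else {
  message(pkg, " is not installed; try BiocManager::install(\"", pkg, "\").")
}
```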
If the required BSgenome package is not installed, the function raises an
error with installation instructions. Set use_online = TRUE to query
sequences from the UCSC Genome Browser REST API instead. The online method
processes sequences in batches with optional parallel processing for
improved performance with large datasets.
For large numbers of DMRs (more than 10,000), consider setting njobs > 1 when using the online API, or installing the appropriate BSgenome package for much faster local sequence retrieval.
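The batching described above can be pictured as splitting the region indices into consecutive groups of at most batch_size. The base R sketch below shows one way to do that; make_batches is a hypothetical helper, and the function's internal batching may differ.

```r
# Hypothetical helper: split region indices into consecutive batches
# of at most batch_size elements.
make_batches <- function(n_regions, batch_size = 100) {
  idx <- seq_len(n_regions)
  split(idx, ceiling(idx / batch_size))
}

batches <- make_batches(250, batch_size = 100)
lengths(batches)  # batch sizes: 100, 100, 50

# Each batch could then be handed to a worker. lapply() stands in here for a
# parallel map such as parallel::mclapply(batches, f, mc.cores = njobs).
results <- lapply(batches, function(i) length(i))
```

Note that parallel::mclapply relies on forking, so values of mc.cores above 1 are not supported on Windows.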
Examples
dmrs <- GenomicRanges::GRanges("chr1", IRanges::IRanges(100000, 100100))
# \donttest{
# Extract sequences for DMRs using BSgenome packages
sequences <- getDMRSequences(dmrs, "hg19")
# Force use of online UCSC API with parallel processing
sequences <- getDMRSequences(dmrs, "hg19", use_online = TRUE, njobs = 4)
# Calculate GC content
gc_content <- vapply(sequences, function(s) {
(stringr::str_count(s, "G") + stringr::str_count(s, "C")) / nchar(s)
}, numeric(1))
# }
