
Retrieves the DNA sequences corresponding to genomic regions specified in a GRanges object. This function is useful for extracting the actual DNA sequence of identified DMRs for downstream analyses such as motif finding or sequence composition analysis.

Usage

getDMRSequences(
  dmrs,
  genome,
  use_online = FALSE,
  uflank_size = 0,
  dflank_size = 0,
  batch_size = 100,
  njobs = 1
)

Arguments

dmrs

GRanges object containing genomic coordinates of DMRs

genome

Character. Genome version to use for sequence extraction, e.g. "hg38" or "hs1".

use_online

Logical. If TRUE, forces use of the online UCSC API instead of BSgenome packages. If FALSE (default), uses BSgenome packages with online fallback when packages are unavailable.

uflank_size

Integer. Number of base pairs to add as flanking regions upstream of each DMR (default: 0)

dflank_size

Integer. Number of base pairs to add as flanking regions downstream of each DMR (default: 0)

batch_size

Integer. For online API, number of regions to process per batch (default: 100)

njobs

Integer. For online API, number of cores for parallel processing (default: 1)
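
The effect of uflank_size and dflank_size on the queried coordinates can be illustrated with plain coordinate arithmetic. This is only a sketch: extend_regions is a hypothetical helper, not part of the package, and it assumes unstranded regions, so that "upstream" means lower coordinates. How getDMRSequences treats strand and chromosome ends is not specified here.

```r
# Hypothetical helper: widen regions by upstream/downstream flanks.
# Assumption: unstranded regions, so "upstream" means lower coordinates.
extend_regions <- function(start, end, uflank_size = 0L, dflank_size = 0L) {
  data.frame(
    start = pmax(1L, start - uflank_size),  # clamp at the first base
    end   = end + dflank_size               # no clamping at the chromosome end
  )
}

extended <- extend_regions(100000L, 100100L, uflank_size = 50L, dflank_size = 25L)
extended$end - extended$start + 1L  # width of the extended region
```

Under these assumptions a 101 bp region grows by the sum of the two flank sizes, so downstream tools should expect sequences longer than the original DMRs whenever either flank is non-zero.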

Value

A character vector containing one DNA sequence per DMR

Details

The function first attempts to use genome-appropriate BSgenome packages:

  • hg19: BSgenome.Hsapiens.UCSC.hg19

  • hg38: BSgenome.Hsapiens.UCSC.hg38

  • hs1: BSgenome.Hsapiens.UCSC.hs1

  • mm10: BSgenome.Mmusculus.UCSC.mm10

  • mm39: BSgenome.Mmusculus.UCSC.mm39

If the required BSgenome package is not installed, the function raises an error with installation instructions. Set use_online = TRUE to query sequences from the UCSC Genome Browser REST API instead. The online method processes sequences in batches with optional parallel processing for improved performance with large datasets.
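
To see ahead of time which path a call would take, the package list above can be checked with base R. The following is a minimal sketch, not the function's actual implementation: choose_backend and bsgenome_packages are hypothetical names introduced for illustration, and only requireNamespace() from base R is used.

```r
# Lookup table copied from the list above
bsgenome_packages <- c(
  hg19 = "BSgenome.Hsapiens.UCSC.hg19",
  hg38 = "BSgenome.Hsapiens.UCSC.hg38",
  hs1  = "BSgenome.Hsapiens.UCSC.hs1",
  mm10 = "BSgenome.Mmusculus.UCSC.mm10",
  mm39 = "BSgenome.Mmusculus.UCSC.mm39"
)

# Hypothetical helper (not part of the package): report which backend
# would be usable for a given genome on this machine.
choose_backend <- function(genome, use_online = FALSE) {
  if (use_online) return("online")
  if (!genome %in% names(bsgenome_packages)) {
    stop("No BSgenome package listed for genome '", genome, "'")
  }
  pkg <- bsgenome_packages[[genome]]
  # requireNamespace() returns FALSE instead of failing when pkg is missing
  if (requireNamespace(pkg, quietly = TRUE)) "bsgenome" else "not installed"
}

choose_backend("hg38", use_online = TRUE)
choose_backend("hg38")
```

A result of "not installed" indicates that the BSgenome package should be installed first, or that use_online = TRUE should be passed to getDMRSequences.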

For large numbers of DMRs (>10k), consider setting njobs > 1 when using the online API, or installing the appropriate BSgenome package for much faster local sequence retrieval.
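
How batch_size partitions a large set of regions can be pictured with a short base R sketch. make_batches and fetch_batch are hypothetical stand-ins introduced for illustration; the package's internal batching and its calls to the UCSC API may be organised differently.

```r
# Hypothetical helper: split region indices into consecutive batches
make_batches <- function(n_regions, batch_size = 100L) {
  idx <- seq_len(n_regions)
  unname(split(idx, ceiling(idx / batch_size)))
}

batches <- make_batches(250L, batch_size = 100L)
lengths(batches)  # number of regions in each batch

# Stand-in for a per-batch request; a real version would query a remote API
fetch_batch <- function(idx) paste0("region_", idx)

# Process batches one after another and flatten back to one result per region
results <- unlist(lapply(batches, fetch_batch))
```

On Unix-alikes, replacing lapply with parallel::mclapply(batches, fetch_batch, mc.cores = 4) would spread the batches over several cores, which is the kind of speed-up njobs > 1 is meant to provide. Larger batches mean fewer requests, while smaller batches spread more evenly across workers.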

Examples

dmrs <- GenomicRanges::GRanges("chr1", IRanges::IRanges(100000, 100100))
# \donttest{
# Extract sequences for DMRs using BSgenome packages
sequences <- getDMRSequences(dmrs, "hg19")

# Force use of online UCSC API with parallel processing
sequences <- getDMRSequences(dmrs, "hg19", use_online = TRUE, njobs = 4)

# Calculate GC content (upper-casing first in case sequences are soft-masked)
gc_content <- vapply(sequences, function(s) {
    s <- toupper(s)
    (stringr::str_count(s, "G") + stringr::str_count(s, "C")) / nchar(s)
}, numeric(1), USE.NAMES = FALSE)
# }