Extracts motif frequencies around sites within DMRs. For each DMR, retrieves the sequences around its start and end sites, calculates base frequencies at each position, and stores the results in the DMR metadata.

Usage

extractDMRMotifs(
  dmrs,
  genome = "hg38",
  array = "450k",
  beta_locs = NULL,
  motif_site_flank_size = 5,
  plot_dir = NULL
)

Arguments

dmrs

Data frame or GRanges object containing DMR coordinates and site indices

genome

Character. Genome version to use for sequence extraction (default: "hg38")

array

Character. Array platform type (e.g., "450K", "EPIC"). Ignored if input is not array-based. (default: "450k", as shown in Usage)

beta_locs

Data frame. Optional pre-computed genomic locations. If NULL, locations will be retrieved using getSortedGenomicLocs (default: NULL)

motif_site_flank_size

Integer. Number of base pairs to include as flanking regions around each site (default: 5)

plot_dir

Character. Optional directory where diagnostic motif plots may be written (default: NULL)

Value

The input data frame or GRanges object with two additional metadata columns:

  • pwm: A matrix of base frequencies (rows: positions relative to site, columns: bases A, C, G, T)

  • consensus_seq: A character string representing the consensus sequence derived from the PWM
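
To make the layout of these two columns concrete, here is a minimal sketch in base R. It builds a made-up matrix with the shape described above and derives a consensus string by taking the most frequent base at each position. The majority rule and the toy values are assumptions for illustration; the function may use a different rule (for example, ambiguity codes) internally.

```r
# Toy PWM with the documented layout: rows are positions relative to the
# site, columns are the bases A, C, G, T. All values are invented.
pwm <- matrix(
    c(0.7, 0.1, 0.1, 0.1,
      0.0, 1.0, 0.0, 0.0,
      0.0, 0.0, 1.0, 0.0,
      0.2, 0.2, 0.1, 0.5),
    ncol = 4, byrow = TRUE,
    dimnames = list(c("-1", "0", "1", "2"), c("A", "C", "G", "T"))
)

# Assumed majority rule: pick the column (base) holding the row maximum.
# max.col() returns that column index for every row.
top_base <- colnames(pwm)[max.col(pwm, ties.method = "first")]
consensus_seq <- paste(top_base, collapse = "")
consensus_seq
```

With a real result, the same inspection would start from one element of the pwm column, such as dmrs_with_motifs$pwm[[1]].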

Examples

# Extract motif frequencies for DMRs
dmrs <- data.frame(
    chr = c("chr16", "chr3"),
    start = c(53468112, 37459206),
    end = c(53468712, 37493431),
    start_site = c("cg00000029", "cg00000108"),
    start_seed = c("cg00000029", "cg00000108"),
    end_site = c("cg13426503", "cg08730726"),
    end_seed = c("cg13426503", "cg08730726"),
    seeds = c("cg00000029,cg13426503", "cg00000108,cg08730726")
)
dmrs_with_motifs <- extractDMRMotifs(dmrs, genome = "hg38", array = "450K")
# Access motif frequencies for the first DMR
motif_freqs_dmr1 <- dmrs_with_motifs$pwm[[1]]
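
The per-position base frequency step mentioned in the description can be sketched independently of the package. The example below uses invented, equal-length sequences and assumes each window spans `flank` bases on either side of the site (width 2 * flank + 1); the names `seqs`, `flank`, and `freqs` are hypothetical and this is not the package's own implementation.

```r
# Hypothetical sequences centred on three sites, with flank = 2 (made-up data).
flank <- 2
seqs <- c("ACGTA", "ACGTT", "TCGAA")
stopifnot(all(nchar(seqs) == 2 * flank + 1))

# One row per sequence, one column per position.
chars <- do.call(rbind, strsplit(seqs, ""))

# Count each base at each position, then divide by the number of sequences.
bases <- c("A", "C", "G", "T")
freqs <- t(vapply(
    seq_len(ncol(chars)),
    function(j) tabulate(match(chars[, j], bases), nbins = length(bases)) / nrow(chars),
    numeric(length(bases))
))

# Label rows by offset from the site and columns by base.
dimnames(freqs) <- list(as.character(-flank:flank), bases)
freqs
```

Each row of `freqs` sums to 1, which matches the rows-as-positions, columns-as-bases layout described under Value.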