This helper function sorts a methylation beta values file by genomic coordinates (chromosome and position), as required by the buildDMRs function. It reads the beta file, sorts the sites by genomic position using the array annotation, and writes the sorted data to a new file.

Usage

sortBetaFileByCoordinates(
  beta_file,
  output_file = NULL,
  array = c("450K", "27K", "EPIC", "EPICv2"),
  genome = "hg38",
  genomic_locs = NULL,
  overwrite = FALSE
)

Arguments

beta_file

Character. Path to the input beta values file to be sorted

output_file

Character. Path for the sorted output beta file. If NULL (the default), the path is derived from the input path by adding a "_sorted" suffix

array

Character. Array platform type: one of "450K", "27K", "EPIC", or "EPICv2" (default: "450K")

genome

Character. Genome version (default: "hg38")

genomic_locs

Data frame. Optional pre-computed genomic locations. If NULL, locations will be retrieved automatically (default: NULL)

overwrite

Logical. Whether to overwrite existing output file (default: FALSE)

Value

Character. Path to the sorted output file

Details

The function performs the following steps:

  1. Reads the beta values file

  2. Loads the annotation for the selected array platform (or uses genomic_locs if supplied)

  3. Sorts sites by genomic coordinates (chr:start)

  4. Writes the sorted data to a new file

  5. Validates that the output is properly sorted
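
The steps above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name sort_beta_sketch is hypothetical, the file layout (a header of sample names followed by one row per site with the site ID first) is assumed from the example on this page, sites without a known location are simply dropped, and chromosomes are ordered by plain string comparison, which may differ from the order the package uses.

```r
# Hypothetical helper (not part of the package): sorts a beta file by
# (chr, start) using a locations data frame whose row names are site IDs.
sort_beta_sketch <- function(beta_file, output_file, genomic_locs) {
  # 1. Read the beta values. The header has one field fewer than the data
  #    rows, so read.table() takes the first column as row names.
  beta <- read.table(beta_file, header = TRUE, sep = "\t",
                     check.names = FALSE)

  # 2. Look up locations; assumption: sites without a location are dropped.
  ids <- intersect(rownames(beta), rownames(genomic_locs))
  locs <- genomic_locs[ids, , drop = FALSE]

  # 3. Order by chromosome (string comparison), then by start position.
  ord <- order(locs$chr, locs$start)
  sorted <- beta[ids[ord], , drop = FALSE]

  # 4. Write the sorted table in the same layout as the input.
  write.table(sorted, output_file, sep = "\t", quote = FALSE,
              row.names = TRUE, col.names = TRUE)

  # 5. Validate: re-ordering the written rows must leave them unchanged.
  check <- genomic_locs[rownames(sorted), , drop = FALSE]
  stopifnot(!is.unsorted(order(check$chr, check$start)))

  output_file
}

# Usage with the same toy data as the example on this page.
beta_file <- tempfile(fileext = ".tsv")
writeLines(c("sample1", "cg2\t0.2", "cg1\t0.1"), beta_file)
locs <- data.frame(
    chr = c("chr1", "chr1"),
    start = c(100L, 200L),
    row.names = c("cg1", "cg2")
)
out <- sort_beta_sketch(beta_file, tempfile(fileext = ".tsv"), locs)
sorted_lines <- readLines(out)
```

Reading the whole file into memory keeps the sketch short; for large beta files a chunked or streaming approach would likely be preferable.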

Note

If you want to convert the beta file to tabix format, consider calling the convertBetaToTabix function directly instead; it performs the sorting internally.

Examples

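# Write a small beta file whose sites are out of order (cg2 before cg1)
beta_file <- tempfile(fileext = ".tsv")
writeLines(c("sample1", "cg2\t0.2", "cg1\t0.1"), beta_file)
# Provide genomic locations directly, with site IDs as row names
locs <- data.frame(
    chr = c("chr1", "chr1"),
    start = c(100L, 200L),
    row.names = c("cg1", "cg2")
)
# Sort the file; the return value is the path to the sorted output
sorted_file <- sortBetaFileByCoordinates(beta_file, genomic_locs = locs)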