
Sort Beta File by Genomic Coordinates
Source: R/sort_beta_file_by_coordinates.R
This helper function sorts a methylation beta values file by genomic coordinates (chromosome and position), as required by the buildDMRs function. It reads the beta file, sorts the sites by genomic position using the array annotation, and writes the sorted data to a new file.
Usage
sortBetaFileByCoordinates(
  beta_file,
  output_file = NULL,
  array = c("450K", "27K", "EPIC", "EPICv2"),
  genome = "hg38",
  genomic_locs = NULL,
  overwrite = FALSE
)
Arguments
- beta_file
Character. Path to the input beta values file to be sorted
- output_file
Character. Path for the sorted output beta file. If NULL (the default), a "_sorted" suffix is added to the input file name
- array
Character. Array platform type (default: "450K")
- genome
Character. Genome version (default: "hg38")
- genomic_locs
Data frame. Optional pre-computed genomic locations. If NULL, locations will be retrieved automatically (default: NULL)
- overwrite
Logical. Whether to overwrite existing output file (default: FALSE)
Details
The function performs the following steps:
1. Reads the beta values file
2. Loads the appropriate array annotation (450K or EPIC)
3. Sorts the sites by genomic coordinates (chr:start)
4. Writes the sorted data to a new file
5. Validates that the output is properly sorted
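The sort and validation steps above can be sketched in base R. This is a minimal illustration, not the package's internal code: the toy `beta` and `locs` objects, the `chr_levels` ordering (chr1 to chr22, then chrX, chrY, chrM), and the `is_sorted` check are all assumptions made for the example.

```r
# Hypothetical illustration in base R; not the package's implementation.
# Toy beta matrix with site IDs as row names, in unsorted order.
beta <- data.frame(
  sample1 = c(0.2, 0.1, 0.3),
  row.names = c("cg2", "cg1", "cg3")
)

# Toy genomic locations, in the same layout as the genomic_locs argument.
locs <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(100L, 200L, 50L),
  row.names = c("cg1", "cg2", "cg3")
)

# Assumed chromosome order: chr1..chr22, then chrX, chrY, chrM.
chr_levels <- paste0("chr", c(1:22, "X", "Y", "M"))

# Look up each site's location by its ID, then order by chromosome and position.
site_locs <- locs[rownames(beta), , drop = FALSE]
ord <- order(factor(site_locs$chr, levels = chr_levels), site_locs$start)
sorted_beta <- beta[ord, , drop = FALSE]

# Validate: re-ordering the sorted locations should leave them unchanged.
sorted_locs <- site_locs[ord, , drop = FALSE]
chr_rank <- as.integer(factor(sorted_locs$chr, levels = chr_levels))
is_sorted <- !is.unsorted(order(chr_rank, sorted_locs$start))
```

Converting the chromosome names to a factor with explicit levels avoids the plain alphabetical order, in which "chr10" would sort before "chr2".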
Note
To convert a beta file to tabix format, consider calling the convertBetaToTabix function directly instead; it performs the sorting internally.
Examples
beta_file <- tempfile(fileext = ".tsv")
writeLines(c("sample1", "cg2\t0.2", "cg1\t0.1"), beta_file)
locs <- data.frame(
  chr = c("chr1", "chr1"),
  start = c(100L, 200L),
  row.names = c("cg1", "cg2")
)
sorted_file <- sortBetaFileByCoordinates(beta_file, genomic_locs = locs)
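The example file has a header line holding only the sample name, followed by rows that start with a site ID. As a hedged sketch of how that layout can be inspected in base R (the package's own reader may work differently), `read.table` treats the first column as row names when the header has one fewer field than the data rows:

```r
# Recreate the example input: header with sample names only,
# then one row per site (site ID, tab, beta value).
beta_file <- tempfile(fileext = ".tsv")
writeLines(c("sample1", "cg2\t0.2", "cg1\t0.1"), beta_file)

# The header has one fewer field than the data rows, so read.table
# uses the first column (site IDs) as row names.
beta <- read.table(beta_file, header = TRUE, sep = "\t", check.names = FALSE)

# Site IDs appear in file order, which is not yet coordinate order here.
site_ids <- rownames(beta)
```

The same call can be pointed at the sorted output file to confirm that the site order has changed as expected.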