Converts a methylation beta values file to a tabix-indexed BED file
for faster random access during DMR analysis. The function uses a memory-efficient
chunk-based approach to handle large files and can persist the derived tabix file
next to analysis outputs when output_prefix is supplied.
Usage
convertBetaToTabix(
  beta_file,
  sorted_locs = NULL,
  array = c("450K", "27K", "EPIC", "EPICv2"),
  genome = "hg38",
  locations_file = NULL,
  output_file = NULL,
  chunk_size = 50000,
  njobs = 1,
  .bed_file = NULL,
  output_prefix = NULL
)
Arguments
- beta_file
Character. Path to the input beta values file
- sorted_locs
Data frame with genomic locations containing 'chr' and 'start' columns. If NULL, will be retrieved automatically using getSortedGenomicLocs() (default: NULL)
- array
Character. Array platform type. Only used if sorted_locs is NULL (default: "450K")
- genome
Character. Genome version. Only used if sorted_locs is NULL (default: "hg38")
- locations_file
Character. Optional path to an explicit genomic locations file passed through to
getSortedGenomicLocs().
- output_file
Character. Path for the output tabix file. If NULL, a temporary file is used unless
output_prefix is supplied.
- chunk_size
Integer. Number of rows to process in each chunk (default: 50000)
- njobs
Integer. Number of parallel jobs for sorting (default: 1)
- .bed_file
Character. Internal precomputed BED path used to skip beta-to-BED conversion.
- output_prefix
Character. Optional prefix used to persist derived tabix artifacts next to analysis outputs.
Details
The function performs the following steps:
1. Checks that the tabix and bgzip tools are available in the system PATH
2. Processes the beta file in chunks of chunk_size rows (50,000 by default) to limit memory usage
3. Converts beta values to BED format with genomic coordinates
4. Sorts, compresses (bgzip), and indexes (tabix) the file
5. Persists the derived file if an explicit output_file path or output_prefix is provided
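The chunked conversion described in the steps above can be illustrated with a minimal base R sketch. This is not the package's implementation: beta_to_bed_chunked() is a hypothetical helper, and the BED layout it writes (chr, start, start + 1, probe ID, beta values) is an assumption made for illustration. It reads the beta file through a connection chunk_size lines at a time, so only one chunk is held in memory, and it leaves out the later sort, bgzip, and tabix steps.

```r
# Hypothetical helper illustrating chunk-based beta-to-BED conversion.
# Assumptions: the beta file is tab-separated with probe IDs in the first
# column and a header line; locs has 'chr' and 'start' columns and probe IDs
# as row names; the BED end coordinate is taken to be start + 1.
beta_to_bed_chunked <- function(beta_file, locs, bed_file, chunk_size = 50000L) {
  con <- file(beta_file, open = "r")
  on.exit(close(con), add = TRUE)
  out <- file(bed_file, open = "w")
  on.exit(close(out), add = TRUE)

  readLines(con, n = 1L)  # skip the header line (sample names)
  n_written <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)  # at most chunk_size rows in memory
    if (length(lines) == 0L) break

    probe_ids <- sub("\t.*$", "", lines)     # text before the first tab
    values <- sub("^[^\t]*\t", "", lines)    # everything after the first tab
    idx <- match(probe_ids, rownames(locs))
    keep <- !is.na(idx)                      # drop probes without a known location
    if (any(keep)) {
      start <- as.integer(locs$start[idx[keep]])
      bed <- paste(locs$chr[idx[keep]], start, start + 1L,
                   probe_ids[keep], values[keep], sep = "\t")
      writeLines(bed, out)
      n_written <- n_written + sum(keep)
    }
  }
  invisible(n_written)
}

# Small usage example: three probes, one of which has no location.
beta_file <- tempfile(fileext = ".tsv")
writeLines(c("\tsample1", "cg1\t0.5", "cg2\t0.9", "cgX\t0.1"), beta_file)
locs <- data.frame(chr = c("chr1", "chr1"), start = c(100L, 250L),
                   row.names = c("cg1", "cg2"))
bed_file <- tempfile(fileext = ".bed")
n <- beta_to_bed_chunked(beta_file, locs, bed_file, chunk_size = 2L)
bed_lines <- readLines(bed_file)
```

A small chunk_size is used here only to exercise the loop more than once. The resulting BED file would still need to be sorted by chromosome and position before bgzip compression and tabix indexing.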
Examples
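# Run only when both external tools are available on the PATH
if (nzchar(Sys.which("tabix")) && nzchar(Sys.which("bgzip"))) {
  # Minimal beta file: one probe (row) and one sample (column)
  beta_file <- tempfile(fileext = ".tsv")
  writeLines(c("\tsample1", "cg1\t0.5"), beta_file)
  # Genomic location for the probe, keyed by row name
  locs <- data.frame(chr = "chr1", start = 100L, row.names = "cg1")
  tabix_file <- convertBetaToTabix(beta_file, sorted_locs = locs)
}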
