Converts a methylation beta values file into a bgzip-compressed, tabix-indexed BED file for faster random access during DMR analysis. The function reads the input in chunks to keep memory usage low on large files, and it can persist the derived tabix file next to analysis outputs when output_prefix is supplied.

Usage

convertBetaToTabix(
  beta_file,
  sorted_locs = NULL,
  array = c("450K", "27K", "EPIC", "EPICv2"),
  genome = "hg38",
  locations_file = NULL,
  output_file = NULL,
  chunk_size = 50000,
  njobs = 1,
  .bed_file = NULL,
  output_prefix = NULL
)

Arguments

beta_file

Character. Path to the input beta values file

sorted_locs

Data frame of genomic locations containing 'chr' and 'start' columns. If NULL, the locations are retrieved automatically using getSortedGenomicLocs() (default: NULL)

array

Character. Array platform type. Only used if sorted_locs is NULL (default: "450K")

genome

Character. Genome version. Only used if sorted_locs is NULL (default: "hg38")

locations_file

Character. Optional path to an explicit genomic locations file passed through to getSortedGenomicLocs().

output_file

Character. Path for the output tabix file. If NULL, a temporary file is used unless output_prefix is supplied.

chunk_size

Integer. Number of rows to process in each chunk (default: 50000)

njobs

Integer. Number of parallel jobs for sorting (default: 1)

.bed_file

Character. Internal precomputed BED path used to skip beta-to-BED conversion.

output_prefix

Character. Optional prefix used to persist derived tabix artifacts next to analysis outputs.

Value

Character. Path to the created tabix file, or NULL if conversion failed

Details

The function performs the following steps:

  1. Checks that the tabix and bgzip tools are available on the system PATH

  2. Processes the beta file in chunks of chunk_size rows (50,000 by default) to minimize memory usage

  3. Converts beta values to BED format with genomic coordinates

  4. Sorts, compresses (bgzip), and indexes (tabix) the file

  5. Persists the derived file if an explicit output path or output_prefix is provided
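
The chunked conversion in steps 2 and 3 can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: beta_to_bed_chunked() is a hypothetical helper, probe IDs are assumed to be the row names of the locations data frame (as in the example on this page), and the BED end coordinate is assumed to be start + 1. The output is unsorted, so it would still need the sort, bgzip, and tabix pass of step 4.

```r
# Hypothetical helper (not part of the package): stream a beta file into
# BED lines, reading at most `chunk_size` rows at a time.
beta_to_bed_chunked <- function(beta_file, locs, bed_file, chunk_size = 50000L) {
  con <- file(beta_file, open = "r")
  on.exit(close(con), add = TRUE)

  # First line holds the sample names; the first field (probe ID column) is empty
  header <- readLines(con, n = 1L)
  samples <- strsplit(header, "\t", fixed = TRUE)[[1]][-1]
  writeLines(paste(c("#chr", "start", "end", "id", samples), collapse = "\t"), bed_file)

  n_written <- 0L
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break

    ids <- sub("\t.*$", "", lines)          # probe ID before the first tab
    values <- sub("^[^\t]*\t", "", lines)   # remaining tab-separated beta values

    # Keep only probes that have a known genomic location
    idx <- match(ids, rownames(locs))
    keep <- !is.na(idx)
    if (any(keep)) {
      start <- as.integer(locs$start[idx[keep]])
      # Assumption: a one-base interval starting at `start`
      bed <- paste(locs$chr[idx[keep]], start, start + 1L,
                   ids[keep], values[keep], sep = "\t")
      cat(bed, file = bed_file, sep = "\n", append = TRUE)
      n_written <- n_written + sum(keep)
    }
  }
  invisible(n_written)
}

# Usage on a toy file, with chunk_size = 1 to force several chunks
beta_file <- tempfile(fileext = ".tsv")
writeLines(c("\tsample1", "cg1\t0.5", "cg2\t0.9"), beta_file)
locs <- data.frame(chr = c("chr1", "chr1"), start = c(100L, 50L),
                   row.names = c("cg1", "cg2"))
bed_file <- tempfile(fileext = ".bed")
n <- beta_to_bed_chunked(beta_file, locs, bed_file, chunk_size = 1L)
```

Only one chunk of text is held in memory at a time, which is the point of the chunk_size argument; larger values trade memory for fewer read calls.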

Examples

if (nzchar(Sys.which("tabix")) && nzchar(Sys.which("bgzip"))) {
    beta_file <- tempfile(fileext = ".tsv")
    writeLines(c("\tsample1", "cg1\t0.5"), beta_file)
    locs <- data.frame(chr = "chr1", start = 100L, row.names = "cg1")
    tabix_file <- convertBetaToTabix(beta_file, sorted_locs = locs)
}
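
To show the random access that a tabix-indexed file provides, the following sketch builds a tiny sorted BED file, compresses and indexes it with the bgzip and tabix command-line tools, and then pulls a single region. It does not call convertBetaToTabix(); query_tabix_region() is a hypothetical helper and the file contents are made up for illustration. The block does nothing when the tools are not on the PATH.

```r
# Hypothetical helper: return the lines of a bgzip-compressed,
# tabix-indexed file that overlap `region` (e.g. "chr1:1-1000").
query_tabix_region <- function(tabix_file, region) {
  system2("tabix", c(shQuote(tabix_file), shQuote(region)), stdout = TRUE)
}

if (nzchar(Sys.which("tabix")) && nzchar(Sys.which("bgzip"))) {
  # Toy BED file, already sorted by chromosome and start
  bed_file <- tempfile(fileext = ".bed")
  writeLines(c("chr1\t100\t101\tcg1\t0.5",
               "chr1\t5000\t5001\tcg2\t0.9"), bed_file)

  # bgzip replaces the file with <file>.gz; tabix -p bed writes <file>.gz.tbi
  system2("bgzip", c("-f", shQuote(bed_file)))
  gz_file <- paste0(bed_file, ".gz")
  system2("tabix", c("-p", "bed", shQuote(gz_file)))

  # Only records overlapping the region are read from disk
  hits <- query_tabix_region(gz_file, "chr1:1-1000")
  print(hits)
}
```

The same kind of region query should work on the path returned by convertBetaToTabix(), assuming the call succeeded and did not return NULL.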