
tdhock/datatable-foverlaps


Overlap Death Match!

Intersect/overlap of genomic data (possibly from bed/bedGraph files) is implemented by

Demonstration

See demo.Rterm for the terminal output during my talk.

Which is fastest?

Recent versions of these packages are all pretty fast; see the slides for details. The only big winner is data.table::fread, which is much faster than read.table or rtracklayer::import for reading big bed/bedGraph files.
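A minimal sketch of reading a bedGraph file with fread, assuming a tab-separated file with the four standard bedGraph columns and no `track` header line. The file written to a temp path and the column names `chrom`, `chromStart`, `chromEnd`, `count` are illustrative choices, not the benchmark data (which is not distributed).

```r
library(data.table)

# Write a tiny bedGraph-like file so the example is self-contained.
# Hypothetical data: chrom, 0-based chromStart, chromEnd, value.
bg.file <- tempfile(fileext = ".bedGraph")
writeLines(c(
  "chr1\t0\t100\t1.5",
  "chr1\t100\t250\t0",
  "chr2\t50\t80\t3"
), bg.file)

# bedGraph has no header row, so name the four columns explicitly.
coverage <- fread(
  bg.file,
  header = FALSE,
  col.names = c("chrom", "chromStart", "chromEnd", "count"))
print(coverage)
```

For real files, fread also detects the separator and column types automatically, which is part of why it is convenient for large inputs.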

Do they all give the same results?

They all give correct results when used correctly. The only issue is that in bed and bedGraph files chromStart is 0-based and chromEnd is 1-based (equivalently, intervals are 0-based and half-open), whereas IRanges and data.table::foverlaps treat start and end as 1-based and both inclusive. So you need to use chromStart+1 to get correct results in R. More specifically, if you read a bed file into R as a data.frame with columns chrom, chromStart, chromEnd, you need to use IRanges(chromStart+1L, chromEnd) or data.table(chromStart=chromStart+1L, chromEnd) as input to findOverlaps/foverlaps.
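A minimal sketch of the conversion with data.table::foverlaps, using small hypothetical tables `peaks.bed` and `genes.bed` in bed coordinates. Peak 1 covers bed [0, 100), i.e. bases 1 to 100, and the gene covers bed [100, 150), i.e. bases 101 to 150, so they do not overlap. Skipping the +1L shift makes them appear to share position 100.

```r
library(data.table)

# Hypothetical intervals in bed coordinates
# (0-based chromStart, chromEnd exclusive).
peaks.bed <- data.table(
  chrom = "chr1",
  chromStart = c(0L, 99L),
  chromEnd = c(100L, 200L))
genes.bed <- data.table(
  chrom = "chr1",
  chromStart = 100L,
  chromEnd = 150L,
  gene = "geneA")

# Correct: shift chromStart to get the 1-based closed intervals foverlaps uses.
peaks <- peaks.bed[, .(chrom, start = chromStart + 1L, end = chromEnd)]
genes <- genes.bed[, .(chrom, start = chromStart + 1L, end = chromEnd, gene)]
setkey(genes, chrom, start, end)  # foverlaps needs y keyed, interval columns last
hits <- foverlaps(peaks, genes, nomatch = NULL)

# Naive: raw bed coordinates, which yields a spurious hit at position 100.
naive.peaks <- peaks.bed[, .(chrom, start = chromStart, end = chromEnd)]
naive.genes <- genes.bed[, .(chrom, start = chromStart, end = chromEnd, gene)]
setkey(naive.genes, chrom, start, end)
naive.hits <- foverlaps(naive.peaks, naive.genes, nomatch = NULL)

print(hits)        # only the second peak overlaps geneA
print(naive.hits)  # both peaks appear to overlap geneA
```

The same shift applies to IRanges: IRanges(chromStart+1L, chromEnd) before calling findOverlaps.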

How to reproduce these results?

The bedGraph files are big so I did not put them online anywhere, which makes it impossible to re-do the timings in TF.benchmark.RData.

However, a subset of the data is available:

About

benchmarking foverlaps
