The F# BioProviders simplify programmatic access to bioinformatics data.
This library provides strongly typed access to over 240 million genomic sequences through a set of type providers, including the GenBankProvider and RefSeqProvider. For more information, see the detailed documentation.
The BioProviders parse genomic data files with the .NET Bio library and represent the parsed data using types from the BioFSharp library.
The following example finds the complement of the genomic sequence of a Staphylococcus lugdunensis assembly:
#r "nuget: BioProviders"
open BioProviders
open BioFSharp
let [<Literal>] Species = "Staphylococcus lugdunensis"
let [<Literal>] Accession = "GCA_001546615.1"
let genome = GenBankProvider<Species, Accession>.Genome()
genome.Sequence |> BioSeq.complement
The above code produces the result:
BioSeq.BioSeq<Nucleotides.Nucleotide> = seq [C; T; A; C; ...]
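Conceptually, the complement replaces each nucleotide with its Watson-Crick pair (A↔T, C↔G) without reversing the sequence. The sketch below illustrates only that mapping, using a hypothetical four-case Nucleotide type and hypothetical complementOf and complement functions; it is not BioFSharp's implementation, whose own Nucleotide type is richer than this.

```fsharp
// Hypothetical minimal nucleotide type for illustration only;
// BioFSharp's Nucleotides.Nucleotide type is not reproduced here.
type Nucleotide =
    | A
    | T
    | G
    | C

/// Watson-Crick complement of a single DNA nucleotide.
let complementOf nucleotide =
    match nucleotide with
    | A -> T
    | T -> A
    | G -> C
    | C -> G

/// Element-wise complement of a sequence (no reversal, unlike a reverse complement).
let complement (sequence: seq<Nucleotide>) = Seq.map complementOf sequence

// An input starting G, A, T, G yields C, T, A, C.
let result = complement [ G; A; T; G ] |> List.ofSeq
printfn "%A" result
```

Because the mapping is applied element by element, an input beginning G, A, T, G produces a result beginning C, T, A, C, the same shape as the output shown above.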
To build the BioProviders package:
- Install the .NET SDK version specified in the global.json file.
- Run build.sh -t Build (or build.cmd -t Build on Windows).
BioProviders uses a set of data files, generated from the assembly lists on the NCBI FTP server, for species and assembly lookup.
- To generate these files, run dotnet fsi DataFileGenerator.fsx, which saves them to build\data.
  - Approximately 1 GB of disk space is required for the downloads. The downloaded files are deleted when the process completes; pass the argument -keepDownloads to keep them.
  - To save the files in the type provider's cache folder, pass the argument -saveToCache.
- By default, the package downloads these files from this repository to AppData\Local\Temp\BioProviders. To use your own version of the files, change the URL in the file remote.txt in src\DesignTime.
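The species and assembly lookup can be pictured as an index from organism name to assembly accessions. The sketch below builds such an index, assuming input rows in the tab-separated layout of NCBI's assembly_summary files (comment lines start with "#", column 1 is assembly_accession, column 8 is organism_name). The functions buildIndex and accessionsFor and the sample rows are hypothetical; the format of the files DataFileGenerator.fsx actually writes is not described here, so this only illustrates the idea.

```fsharp
/// Hypothetical sketch: index assembly accessions by organism name.
/// Assumes NCBI assembly_summary-style rows: tab-separated, comment lines start
/// with '#', column 1 = assembly_accession, column 8 = organism_name.
let buildIndex (lines: seq<string>) : Map<string, string list> =
    lines
    |> Seq.filter (fun line -> not (System.String.IsNullOrWhiteSpace line) && not (line.StartsWith "#"))
    |> Seq.map (fun line -> line.Split '\t')
    |> Seq.filter (fun fields -> fields.Length >= 8)
    |> Seq.map (fun fields -> fields.[7], fields.[0])
    |> Seq.groupBy fst
    |> Seq.map (fun (organism, pairs) -> organism, pairs |> Seq.map snd |> List.ofSeq)
    |> Map.ofSeq

/// Look up the accessions recorded for a species; empty if the species is unknown.
let accessionsFor species (index: Map<string, string list>) =
    index |> Map.tryFind species |> Option.defaultValue []

// Illustrative rows only: real files contain many more columns and rows.
// "na" fills the columns this sketch ignores; the second row is a placeholder.
let sampleLines =
    [ "# assembly_accession\tbioproject\tbiosample\twgs_master\trefseq_category\ttaxid\tspecies_taxid\torganism_name"
      String.concat "\t" [ "GCA_001546615.1"; "na"; "na"; "na"; "na"; "na"; "na"; "Staphylococcus lugdunensis" ]
      String.concat "\t" [ "HYPOTHETICAL_ACCESSION"; "na"; "na"; "na"; "na"; "na"; "na"; "Example species" ] ]

let index = buildIndex sampleLines
printfn "%A" (accessionsFor "Staphylococcus lugdunensis" index)
```

Grouping by organism name lets one species map to several assemblies, which is why the lookup returns a list rather than a single accession.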
The BioProviders package code is formatted using Fantomas.
- To format the code, run build.sh -t Format (or build.cmd -t Format).
- To check the formatting, run build.sh -t CheckFormat (or build.cmd -t CheckFormat).
BioProviders is covered by the MIT license.
The package also uses:
- BioFSharp - MIT license
- .NET Bio - Apache-2.0 license
- FluentFTP - MIT license
- FSharp.Data - Apache-2.0 license
Current maintainers are Alex Kenna, Samuel Smith and James Hogan.