Home
This project aims to develop a Bioinformatics library for common tasks in the field of Computational Biology using the BASH programming language. It is led by Andrés M. Pinzón, full-time professor at the Bioinformatics and Systems Biology Laboratory, Institute for Genetics, National University of Colombia, in South America, and started as part of his 2022-2023 sabbatical leave.
This library has been around for several years in our laboratory as a collection of routines for common bioinformatics tasks, such as dealing with FASTA headers and FQ files, or manipulating lists of genes.
After years of using this first version of BioBash it was clear that it was really, really useful for our common tasks, but at the same time it was lacking in features and scope. Moreover, it was based on the pure BASH programming language and I found myself re-inventing the wheel ...in BASH! So, since there are hundreds of useful and optimized bioinformatics tools, why create something like BioBash? I can think of at least two good answers to that:
- Not everything has already been done in computational biology, and there is still room for improvement.
- We can take advantage of a whole universe of computational biology tools and make them even more accessible in a common bioinformatics environment: BASH. In this regard, BioBASH is built on top of a set of "Core Principles" that drive its development.
So one of the main aims of BioBash is to offer a consistent interface for common analyses in the field, without re-inventing the wheel (it re-uses as much code as possible and interfaces with already existing tools), in an environment familiar to Computational Biologists (i.e. BASH). On one hand, BioBash is a wrapper for several pre-existing bioinformatics tools, such as clustalw, seqtk, BWA, Bowtie, NCBI-BLAST etc., with a consistent interface for all of them. On the other hand, it also provides brand new routines for file manipulation and other Bioinformatics-related tasks common in the field (such as dealing with lists), and for that purpose it uses core utils that should come with any UNIX-like installation (please refer to BioBASH core principles for detailed information).
For example, if you have a list of genes in a text file and you want to know how many of them are unique, and how many are over-represented in the list, one way is to use common core commands such as sort and uniq to obtain that information, OR use BioBASH and forget about all the command line options needed for each program.
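As a rough sketch of the "core commands" route, the following uses only sort, uniq and awk. The function names and the sample gene list are made up for illustration; they are not BioBASH commands, and the sketch assumes one gene identifier per line.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (NOT part of BioBASH). Both read a gene list on stdin,
# one identifier per line; blank lines are ignored.

# Print the number of distinct gene identifiers.
count_unique_genes() {
  awk 'NF { print $1 }' | sort -u | awk 'END { print NR }'
}

# Print "gene<TAB>count" for every gene that occurs more than once,
# most frequent first (ties broken alphabetically).
list_repeated_genes() {
  awk 'NF { print $1 }' | sort | uniq -c |
    awk '$1 > 1 { print $2 "\t" $1 }' |
    sort -t $'\t' -k2,2nr -k1,1
}

# Made-up sample data for the demonstration.
genes=$(printf '%s\n' TP53 BRCA1 TP53 EGFR BRCA1 TP53)

printf '%s\n' "$genes" | count_unique_genes    # prints 3
printf '%s\n' "$genes" | list_repeated_genes   # prints TP53 (3), then BRCA1 (2)
```

Even for such a small task you need to remember that uniq only collapses adjacent lines (hence the sort) and which sort flags give a numeric, descending order.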
Another example: if you have two multi-FASTA files and you want to BLAST one against the other and see how they match (and perhaps plot the results), you can use NCBI-BLAST's makeblastdb command (formatdb in legacy BLAST) to create the database, then run blastp or blastn (or any other variant) with all the necessary options to perform the alignment and obtain your results. OR you can use BioBash and go for a cup of coffee, and let BioBash deal with paths, temporary files, renaming, threading, plotting etc.
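To give an idea of the manual route, here is a minimal sketch assuming NCBI BLAST+ (makeblastdb and blastn) is installed and both files contain nucleotide sequences. The function name, default output file and e-value cutoff are illustrative choices, not BioBASH's actual implementation.

```bash
#!/usr/bin/env bash
# Hypothetical helper (NOT a BioBASH command): BLAST one nucleotide FASTA file
# (query) against another (subject) with NCBI BLAST+.
# Usage: blast_fasta_pair query.fa subject.fa [output.tsv] [threads]
blast_fasta_pair() {
  local query="${1:-}" subject="${2:-}"
  local out="${3:-blast_results.tsv}" threads="${4:-1}"
  local file tool tmpdir status=0

  # Fail early if an input file is unreadable or BLAST+ is not installed.
  for file in "$query" "$subject"; do
    [[ -r "$file" ]] || { echo "cannot read: $file" >&2; return 1; }
  done
  for tool in makeblastdb blastn; do
    command -v "$tool" >/dev/null 2>&1 ||
      { echo "missing tool: $tool" >&2; return 127; }
  done

  # Keep the temporary database files out of the working directory.
  tmpdir="$(mktemp -d)" || return 1

  # Build the database from the subject, then align the query against it.
  # -outfmt 6 writes a tab-separated table of hits.
  makeblastdb -in "$subject" -dbtype nucl -out "$tmpdir/subject_db" >/dev/null &&
    blastn -query "$query" -db "$tmpdir/subject_db" \
      -outfmt 6 -evalue 1e-5 -num_threads "$threads" -out "$out" || status=$?

  rm -rf "$tmpdir"
  return "$status"
}

# Example call (file names are placeholders):
#   blast_fasta_pair query.fa subject.fa hits.tsv 4
```

For protein sequences the same sketch would need `-dbtype prot` and blastp instead of blastn, which is exactly the kind of detail a consistent interface is meant to hide.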
So I believe BioBash can make you more efficient through a consistent interface for several computational biology tools. To this end, all commands in BioBash behave, respond and produce output in a similar way, no matter what is happening behind the scenes (please refer to BioBASH core principles for detailed information).
In practice, BioBASH is primarily intended for end users (researchers, students) as a way to speed up analysis by flattening the learning curve of BASH and of the huge diversity of computational biology tools. It provides the following:
- A consistent command line interface (CLI).
- A super simple installation procedure.
- Ready to use BioBASH programs targeted for useful and common Bioinformatics tasks.
- Detailed documentation.
- Support for the most common UNIX-like operating systems (sadly, we stopped supporting OSX as of version 0.2.1).
On the other hand, BioBASH can also be seen as a Computational Biology library (somewhat "similar" to BioPython or BioPerl) that provides modules of functions that can be used to develop new BioBASH programs (scripts). In fact, this library was first developed as a way to build the end-user BioBASH programs (for more information refer to the official BioBASH documentation).
Although BioBASH is perhaps the most complete Bash library for Bioinformatics, this is not really a new idea in the community. Several other projects with the same name have been started (and abandoned), as we also did with the first version of this library, which we started developing around 2019. To our knowledge, amongst all those biobash projects the only one worth mentioning is Simon Frost's Biobash: really nice scripts, although it was poorly documented, structured more like a group of independent scripts than as a library, and was also abandoned around 2018.
BB relies on other third-party libraries and coding standards worth mentioning.
A library called SHML (Shell Markup Language) is used for "stylizing" shell output. All coloring, font sizes, icons, emojis etc. used in our scripts are possible thanks to SHML.
Another library used behind the scenes by BB is Bash-Utility, which provides a series of functions and helpers for Bash programming that save you much time and effort.
Process_optargs is also used throughout BioBASH for managing function flags and key/value pairs.
A key component of any coding project is to follow a good coding standard and do your best to apply good programming practices. This makes the code more accessible to anyone willing to contribute, makes debugging easier (and bugs less common) and also helps to speed up the generation of documentation.
The standards followed by BB are the Shell Style Guide from the Google Style Guides, as well as the ones suggested by Jeff Lindsay at Progrium. I think these are must-follow rules for everyone programming in BASH.
Code documentation is automatically generated with another amazing library called shdoc.
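For readers who have not seen shdoc: it generates Markdown documentation from structured comments placed directly above each function. Below is a minimal sketch of what such annotations look like; the function itself is a made-up example, not taken from the BioBASH sources.

```bash
#!/usr/bin/env bash

# @description Count the sequence records in FASTA text read from stdin.
#
# @example
#   count_fasta_records < proteins.fasta
#
# @noargs
#
# @stdout Number of header lines (lines starting with ">").
#
# @exitcode 0 If the input could be read.
count_fasta_records() {
  # n + 0 forces a numeric 0 when the input has no header lines.
  awk '/^>/ { n++ } END { print n + 0 }'
}
```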
In case this is of interest to anyone, this project has been developed on both OSX (using Parallels and Lubuntu 24) and Linux machines (depending on whether I am at home or at work), using VSCode as the code editor, supported by the following plug-ins:
- Bash IDE
- Bats
- indent-rainbow
- shell-format
- ShellCheck
Several sources were used while developing BioBash (apart from the Third Party libraries above). Some of these are: