A refactoring of the Adaptive DNA Storage Codec (ADS Codex) in "modern" C++.
- External Dependencies
- Installation
- Running the Program
- Features
- Documentation
- Plans for Future Releases
- Acknowledgments
- zlib for C++: a library for handling .gz files, needed to read compressed FASTQ files directly. More information about zlib can be found here.
- CMake for building: download CMake from cmake.org or install it with your package manager.
- Doxygen for documentation: generates documentation automatically from source code comments. More information about Doxygen can be found here.
Note that the Doxygen documentation for this project adheres to the style guide available here.
- Clone the DecodeNcodeAnything repository:
git clone https://github.com/rdnajac/DecodeNcodeAnything.git
cd DecodeNcodeAnything
- Create a build directory:
mkdir build
cd build
- Configure the project with CMake:
On Debian-based systems:
cmake ..
On Windows (make sure your MinGW-w64 version is newer than 11.0.0):
cmake -G "MinGW Makefiles" ..
- Build the project:
make
Adjust the build commands to suit your build system or requirements. Alternatively, run the default installation script, scripts/build.sh, from the repo's top-level directory.
To generate documentation using Doxygen, follow these steps:
- Install Doxygen:
Ensure that Doxygen is installed on your system. If it is not, you can typically install it with your package manager.
For example, on Debian-based systems:
sudo apt-get install doxygen
On Windows, clone the Doxygen sources:
git clone https://github.com/doxygen/doxygen.git
Refer to the Doxygen installation guide for more details.
- Navigate to the project root:
cd /path/to/DecodeNcodeAnything
- Run the documentation generation script:
On Debian-based systems:
scripts/gen_docs.sh
This script generates a new Doxyfile, configures it, and runs Doxygen to generate documentation in the ./docs folder.
On Windows:
Build the cloned Doxygen project with Visual Studio, which produces doxygen.exe.
- Access the documentation:
Open the generated documentation by navigating to the output directory:
cd /path/to/DecodeNcodeAnything/docs
Open the index.html file in a web browser to explore the generated documentation.
After cloning and building the program, the executables (including test programs) are located in the build directory. Running the program is simple:
./build/app/encode <file-to-be-encoded>
or alternatively,
./build/app/decode <file-to-be-decoded>
The decoder expects FASTQ files, while the encoder can handle any readable file.
Library written in C++ for module export.
Reed–Solomon Error Correction is a mathematical technique that allows the correction of errors in transmitted or stored data to enhance reliability and robustness. It is widely used in various applications, including data storage, QR codes, and digital communication.
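As a rough illustration of the idea, the sketch below appends Reed–Solomon parity bytes to a message and then checks a codeword for corruption by computing its syndromes. It assumes the field GF(2^8) with primitive polynomial 0x11d and generator roots alpha^0 through alpha^(nparity-1); these parameters and all names in the rs_sketch namespace are illustrative assumptions, not necessarily what ADS Codex uses. Locating and correcting errors (for example with Berlekamp–Massey) is not shown.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace rs_sketch {

// Exp/log tables for GF(2^8), primitive polynomial 0x11d (assumed parameters).
struct Tables {
    std::array<std::uint8_t, 512> exp{};
    std::array<int, 256> log{};
    Tables() {
        int x = 1;
        for (int i = 0; i < 255; ++i) {
            exp[i] = static_cast<std::uint8_t>(x);
            log[x] = i;
            x <<= 1;
            if (x & 0x100) x ^= 0x11d;  // reduce modulo the primitive polynomial
        }
        for (int i = 255; i < 512; ++i) exp[i] = exp[i - 255];
    }
};

inline const Tables& tables() {
    static const Tables t;
    return t;
}

// Field multiplication via log/exp lookup.
inline std::uint8_t mul(std::uint8_t a, std::uint8_t b) {
    if (a == 0 || b == 0) return 0;
    return tables().exp[tables().log[a] + tables().log[b]];
}

// g(x) = product of (x + alpha^i) for i in [0, nparity); highest degree first.
// Requires nparity <= 255.
inline std::vector<std::uint8_t> generator_poly(std::size_t nparity) {
    std::vector<std::uint8_t> g{1};
    for (std::size_t i = 0; i < nparity; ++i) {
        std::vector<std::uint8_t> next(g.size() + 1, 0);
        const std::uint8_t root = tables().exp[i];
        for (std::size_t j = 0; j < g.size(); ++j) {
            next[j] ^= g[j];                 // contribution of g[j] * x
            next[j + 1] ^= mul(g[j], root);  // contribution of g[j] * alpha^i
        }
        g = next;
    }
    return g;
}

// Systematic encoding: returns msg followed by nparity parity bytes, where the
// parity is the remainder of msg(x) * x^nparity divided by g(x).
inline std::vector<std::uint8_t> encode(const std::vector<std::uint8_t>& msg,
                                        std::size_t nparity) {
    const std::vector<std::uint8_t> g = generator_poly(nparity);
    std::vector<std::uint8_t> buf(msg);
    buf.resize(msg.size() + nparity, 0);
    for (std::size_t i = 0; i < msg.size(); ++i) {
        const std::uint8_t coef = buf[i];
        if (coef == 0) continue;
        for (std::size_t j = 1; j < g.size(); ++j) buf[i + j] ^= mul(g[j], coef);
    }
    std::copy(msg.begin(), msg.end(), buf.begin());  // restore the message part
    return buf;
}

// Evaluates the codeword at each generator root; all zeros means no error detected.
inline std::vector<std::uint8_t> syndromes(const std::vector<std::uint8_t>& codeword,
                                           std::size_t nparity) {
    std::vector<std::uint8_t> s(nparity, 0);
    for (std::size_t i = 0; i < nparity; ++i) {
        std::uint8_t acc = 0;  // Horner evaluation at alpha^i
        for (std::uint8_t c : codeword)
            acc = static_cast<std::uint8_t>(mul(acc, tables().exp[i]) ^ c);
        s[i] = acc;
    }
    return s;
}

}  // namespace rs_sketch
```

With nparity parity bytes, a full decoder can correct up to nparity / 2 corrupted bytes per codeword; the syndromes computed here are the starting point for that step.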
Resources for understanding Reed–Solomon error correction:
TODO: Add tests using the Google Test framework.
Example:
#include <gtest/gtest.h>

// Test the encoding functionality of the ADS Codex.
TEST(ADSCodexTest, EncodingTest) {
    // ...
    const bool encoding_succeeded = true;  // placeholder: replace with a real check
    ASSERT_TRUE(encoding_succeeded);
}

// Test the decoding functionality of the ADS Codex.
TEST(ADSCodexTest, DecodingTest) {
    // ...
    const bool decoding_succeeded = true;  // placeholder: replace with a real check
    ASSERT_TRUE(decoding_succeeded);
}

// Add more tests as needed...

int main(int argc, char **argv) {
    ::testing::InitGoogleTest(&argc, argv);
    return RUN_ALL_TESTS();
}
- Performance Optimization with Lookup Tables:
- Introduce and leverage lookup tables for performance optimization. Lookup tables can enhance the efficiency of certain operations, contributing to faster encoding and decoding processes.
- Abstract Interface for Oligo Viability Criteria (H4G2):
- Introduce an abstract interface for evaluating the viability of oligonucleotides based on specific criteria. One such criterion, denoted H4G2, rejects oligos containing homopolymers longer than 4 nucleotides (for A, T, and C) or longer than 2 nucleotides (for G). Developers can extend this interface to implement custom viability criteria.
- Expanded Documentation:
- Enhance and expand the documentation to provide comprehensive guidance on usage and potential extensions.
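One possible shape for the planned viability interface is sketched below, with H4G2 as a concrete criterion. The per-base run limits are stored in a small lookup table, which also gives a flavor of the lookup-table item above. The class names OligoCriterion and H4G2 and the method is_viable are hypothetical, and the sketch assumes uppercase A, C, G, T input; any other character is treated as non-viable.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical abstract interface: one viability rule applied to an oligo.
class OligoCriterion {
public:
    virtual ~OligoCriterion() = default;
    virtual bool is_viable(const std::string& oligo) const = 0;
};

// H4G2: reject homopolymer runs longer than 4 for A, C, T or longer than 2 for G.
class H4G2 : public OligoCriterion {
public:
    bool is_viable(const std::string& oligo) const override {
        // Lookup table: maximum allowed run length for each character.
        static const std::array<std::size_t, 256> max_run = make_table();
        std::size_t run = 0;
        char prev = '\0';
        for (char base : oligo) {
            run = (base == prev) ? run + 1 : 1;
            prev = base;
            if (run > max_run[static_cast<unsigned char>(base)]) return false;
        }
        return true;
    }

private:
    static std::array<std::size_t, 256> make_table() {
        std::array<std::size_t, 256> t{};  // zero limit: unknown characters are rejected
        t['A'] = t['C'] = t['T'] = 4;
        t['G'] = 2;
        return t;
    }
};
```

Custom criteria would derive from OligoCriterion in the same way, so an encoder could hold a list of criteria and keep only oligos that pass all of them.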
These plans are subject to change based on community feedback and project priorities. Stay tuned for updates and announcements related to future releases.
If you have specific features or improvements you would like to see in future releases, feel free to contribute to the discussion on our GitHub repository or open a new issue.
- Adaptive DNA Storage Codec (ADS Codex): The foundation for this project.
- Google Test: For providing a robust testing framework.
- CMake: Used for building the project.
- Doxygen: Used for generating documentation from source code comments.
- Illumina: For contributions to DNA sequencing technology.
- Oxford Nanopore Technology: For advancements in nanopore sequencing.
- Kilobaser: For innovations in DNA synthesis technology.
- Bjarne Stroustrup: For his foundational contributions to C++ and for his guidance in this project.