- Refactor code to modern Python (black, mypy, isort)
- Added GitHub Actions
- Added pre-commit hooks
- Changed the build configuration to Poetry
- Added a Docker container
- Changed tests to pytest
- Updated versions of dependencies
- Performed code optimizations, refactorings and clean ups
- Added more tests for almost full coverage
- Bumped taggd to 0.4.0
- Changed documentation
- Added --demultiplex-chunk-size option
- Added annotation (htseq) feature type as parameter
- Fix a small bug when having the UMI before the BARCODE
- Improved the unit-tests
- Made input data a positional argument in the extra scripts
- --ref-map is no longer a required parameter when --disable_mapping is used
- Create dataset and compute saturation steps are only performed if the annotated file is present
- Fixed bug in merge_fastq.py
- Fixed bugs in st_qa.py
- Updated docs
- Added flags to skip trimming, mapping and annotation
- Fixed a bug in the compute saturation option
- Restored the unittests
- Improvements when dealing with ambiguous and no_feature genes
- Ported to Python 3
- Fixed a small bug when sorting the output of taggd
- Improved the st_qa.py script
- Removed parallel code for unnecessary parts
- Added option to provide saturation points
- Improved the st_qa.py script
- A few small improvements in the annotation step
- Added support to use a transcriptome (--transcriptome)
- Fixed an error in the parameters
- Added option to disable the barcode demultiplexing step
- Added option to disable the UMI filtering step
- Made the parsing of unique UMIs gene by gene and parallel
- AdjacentBi is now the default method for UMI counting
- Made the trimming function output the trimmed R2 in BAM format with the barcode and UMI
- Made the mapping function work with a BAM file as input (latest STAR release)
- Made the annotation function parallel
- Made the quality step parallel
- Improvements in speed and memory (constant memory use)
- The STAR genome loading strategy can now be set
- Added an affinity based method to cluster UMIs
- Added option to set the STAR BAM sort memory limit
- Fixed a bug that would output the matrix of counts inverted
- st_qa.py now generates expression heatmap plots
- Fixed a minor bug in the computation of the saturation curves
- adjust_matrix_coordinates no longer updates the coordinates by default
- adjust_matrix_coordinates works with the latest ST Spot detector format
- Small updates in the fastq merging script
- Slightly relaxed the restriction checks for some parameters
- Added extra scripts:
- merge_bcl : merge BCL files based on patterns
- filter_gene_type_matrix : filter genes in output data based on Ensembl gene types
- Bumped Pysam and HTSeq
- Small update to make the PIP installation more robust
- Small update to make the PIP installation more robust
- Optimized the counting of UMIs by strand, start-pos and offset
- Fixed a typo in one of the parameters that caused the pipeline to not run
- Disabled spliced alignments by default
- Optimized the mapping and annotation steps
- The number of homopolymer mismatches is now a parameter
- PolyNs are now also removed (parameter)
- Added more methods to cluster UMIs
- Optimized the UMI counting algorithm
- Optimized the memory use
- Take into account soft-clipped bases when computing start/end positions
- Changed the limit range of some parameters
- Fixed small bugs
- Small improvements in st_qa.py and convertEnsemblToNames.py
- Bumped TaggD version
- Added more stats to the dataset output
- Added scripts to compute stats
- Added new option for TaggD
- Fixed bugs in convertEnsemblToNames
- Added some parameters for TaggD demultiplexing
- Bumped version of TaggD
- Made homopolymers filters enabled by default
- Added a test dataset to the docs
- Fixed a small bug in the deletion of the tmp folder
- Make sure to remove tmp files even if an error happens
- Fixed bug that would leave some files in /tmp
- The number of allowed mismatches when removing adaptors is now 2
- Removed some unnecessary parameters
- Simplified the two pass mode
- Added flag to discard reads mapping to anti-sense strand
- Added separate parameters for the GC content filter instead of using the same value as the AT content filter
- Fixed a small bug in the logging of some parameters
- When removing adaptors (homopolymer stretches), up to 3 mismatches are now allowed
- Added GC content filter (same % as AT content)
- Fixed a minor bug in the counting of UMIs on the - strand
- If no temp folder is given a new unique one is created on top of the execution folder
- Integrated createDataset.py into the code of the pipeline
- Adjusted some parameter names and descriptions (no UMI is default)
- Added sliding window when counting unique molecules
- Added support for bzip
- Fixed small bug in the parsing of the umi quality parameter
- Added option to check for UMI quality
- Optimized the UMI template check code
- Optimized how the unique molecules are counted
- Better stats for the quality filter step
- Updated convertEnsemblToNames script
- Updated docstrings
- Small bug fixes
- Fixed a bug with the non ambiguous option
- Fix a bug in the saturation computation
- When an R2 is trimmed, its corresponding R1 is trimmed as well
- Fixed a stupid bug in the compute saturation option
- Changed the rRNA filter so the BAM output does not need to be sorted
- Fixed a bug in the parsing of parameters
- Fixed a small bug with the location of discarded files
- Replaced JSON with a data frame as the output format
- Replaced Python gzip with a system call (faster)
- Changed the logic of how the filenames are stored and handled
- Improved the error messages and error handling
- Removed barcode IDs from the output file
- Updated comments, manual and license
- Small improvements
- Fixed a bug in the computation of saturation curves
- Added a normal hash with INT keys to increase speed and reduce memory
- Using the gene_id for annotation again
- Added parameter for strandness in annotation (yes by default)
- Simplified the quality trimming step a bit (do not account for user input trimmed bases)
- Added stats for annotated reads
- Replaced shelve dict with sqldict
- Fixed some small bugs in the annotation
- Removed the pair mode keep option
- Removed unnecessary pair mode and mapped checks after alignment
- Added option to do the STAR 2 pass mode
- Removed option to run pipeline without IDs
- Speed improvements
- Perform demultiplex after mapping
- The barcode is no longer attached to reverse reads
- Removing some parameters
- Some improvements in stDataPlotter
- Option to use BAM format
- Removed annotation filtering step
- Removed forward trimming parameters
- Output gene names even with ENSEMBL
- Small memory improvements
- Updates in plotting script
- End coordinates now contain the whole read length
- Make annotation strand aware (reverse)
- Updated to STAR 2.5
- Fixed a small bug
- Added some memory improvements
- Added parameters for inverse trimming
- Memory and speed optimizations in createDatasets
- Added option for low_memory use
- Added unique genes to saturation points
- Added option to keep non-annotated reads
- Fixed some small bugs
- Fixed a bug in the saturation points
- Removed counttrie as option for clustering
- Updated and improved CTTS scripts
- Updated data plotter color list
- Fixed a bug in the saturation points
- Improved speed and memory in createDatasets
- Changed saturation points to fixed values that grow exponentially
- Improved speed in computation of saturation points
- Small bug fixes
- Upgraded json2Scatter with many improvements
- Rename json2scatter to stDataPlotter
- Fixed a bug in the hierarchical clustering
- Added the input parameter to qa_stats
- Append experiment name to output files
- Added option to compute saturation points
- Added tool to plot stdata and clusters with aligned image
- Fixed a bug in the hierarchical clustering
- Fixed a bug in the printed stats
- Fixed a bug in retrieving the version of the software
- Added time stamps in different steps
- Added a UMI template quality filter
- Fixed a bug in counttrie clustering method
- Improved sorting of molecular barcodes prior to clustering
- Added hierarchical clustering option
- Removed reads.json
- Added qa_stats.json to the output
- Restored old versioning system
- Removed hadoop related stuff
- Added support for gzipped input files
- Improved the log a bit
- Added parameters for max and min intron size and max gap size
- Fixed some bugs in the prefix trie
- Added an option to find molecular barcodes clusters using a prefix trie
- Fixed a bug in the function to retrieve the pipeline version
- Fixed a bug with --disable-multimap option
- Fixed a typo in a parameter
- Fixed a bug that caused some parameters to not work
- Added some extra debugging info in createDatasets
- Output the read name in the BED output file
- Replaced --allowed-kimera with --allowed-kmer
- Added version as parameter and log message
- Added parameter to disable soft clipping in mapping
- Disable softclipping in rRNA filter
- Make sure that discarded reads after rRNA filter are replaced by Ns
- Improved stats info a bit
- Bumped Taggd to 0.2.2
- Fixed a bug in the rRNA filter that prevented rRNA mapped reads from being discarded
- Added check when UMI is the same as barcode
- Added more stats
- Added percentile distribution stats for createDataset
- Added support for BAM and SAM (not functional now)
- Added option to disable multiple aligned reads
- Fixed a bug in the bed file
- Added AT content filter in quality trimming
- Added min mapped length filter after mapping
- Make sure one of the multiple aligned reads is set as not multiple aligned so it can be annotated
- Discard the other multiple aligned reads after mapping
- Disable sorting
- Restored back to use gene_id as column for annotation
- Changed naming convention
- Added support for normal RNA analysis
- Improved STAR configuration
- Added mapping post processing to filter out and adjust reversed reads
- Changed to use gene_name for annotation
- Fixed some bugs and made some improvements
- Fixed bugs in the trimming
- Improved stats
- Fixed a bug that would remove original input files
- Added a script to convert ENSEMBL ids to gene names
- Fixed a bug that would not compute the number of discarded reads when using molecular barcodes
- Fixed a bug in the barcodes JSON output
- Fixed a bug in the molecular barcodes algorithm
- Fixed a bug that would keep the original fastq reads in the system
- Update taggd version
- Small improvements with error checking and log in the mapping
- Fixed a bug that would remove the file after filtering annotated reads
- Made the sorting by name instead of by position due to a bug in htseq-count
- Fixed a bug in the capture of parameters
- Improved the logs
- Fixed a few bugs
- Added back taggd
- Added BED file to output
- Added STAR
- Optimized workflow
- The rRNA filter is now done first
- Optimized annotation
- Optimized trimming
- Output reads do not contain duplicates
- Allowing molecular barcodes to be before the barcodes
- Added back findIndexes
- Removed cutadapt dependency
- Fixed a bug in the installation
- Added options to remove PolyC and fixed bugs in adaptor removal
- Added test for STAR and STAR binary to dependencies
- Added TAGGD and removed findIndexes
- Improved install script
- Added options to remove adaptors (PolyA, PolyT and PolyG)
- Replaced Bowtie with STAR as the primary mapper
- Added option to keep files with discarded reads/barcodes
- Internal refactoring and optimization
- The output reads JSON now only contains the portion of the read that was used to map
- Cutadapt is integrated but only using the quality trimming for now
- Internal refactoring and optimizations
- Added small unit-test for molecular barcodes
- Added more molecular barcodes algorithms (using a naive one for now)
- Fixed small issues in JSON parsing libraries
- Rewrite createDatasets.py
- Clean up repository and deprecated files
- Change the unit-test library and structure
- Refactor the unit-test (use pipeline API instead of command line calls)
- Ensure unit-test remove tmp files when failing
- Add better error handling
- Add unit-test for Molecular Barcodes
- Add Molecular Barcodes functionality
- General refactor and clean up
- Add invoke options (clean, build, install)
- Fix an important bug in createDatasets that caused incorrect computation of reads counts
- Improved installers
- Small bug fixes
- Added basic unit-test to do a run of the pipeline
- Some optimizations and bug fixes
- Fixed an error with the new version of HTSeq-count that would discard more reads
- Added extra parameters
- Fixed some typos
- Fixed a bug that removed some bases from the barcode ID in the rw reads
- Code refactored and modularized
- Add argparse for parameters parsing
- Add API for Amazon EMR and terminal version
- Better error handling
- Optimized code
- New version of FindIndexes
- Remove dependencies
- Added proper installers and documentation