- Completely restructured taxonomic data processing:
- Introduced new modular functions for taxa handling in model-taxa.R
- Added efficient batch processing for species matching
- Implemented optimized FAO area retrieval system
- Streamlined length-weight coefficient calculations
- Enhanced integration with FishBase and SeaLifeBase
- New taxonomic processing functions (a usage sketch follows this change list):
  - `load_taxa_databases()`: Unified database loading from FishBase and SeaLifeBase
  - `process_species_list()`: Enhanced species list processing with taxonomic ranks
  - `match_species_from_taxa()`: Improved species matching across databases
  - `get_species_areas_batch()`: Efficient FAO area retrieval
  - `get_length_weight_batch()`: Optimized length-weight parameter retrieval
- Enhanced performance through batch processing
- Reduced API calls to external databases
- Better error handling and input validation
- More comprehensive documentation
- Improved code organization and modularity
- Removed legacy taxonomic processing functions
- Deprecated redundant species matching methods
- Removed outdated data transformation utilities
- Added detailed function documentation
- Updated vignettes with new workflows
- Improved code examples
- Enhanced README with new features
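The wrappers listed above live in model-taxa.R; as a rough illustration of the batch-query idea behind them, the sketch below uses rfishbase directly, assuming a recent rfishbase version where `length_weight()`, `faoareas()`, and the `server` argument are available. The species names are placeholders, and the actual signatures of the new package functions may differ.

```r
library(rfishbase)
library(dplyr)

# Placeholder species list; in the pipeline this would come from the survey taxa
species <- c("Lutjanus gibbus", "Siganus sutor", "Lethrinus lentjan")

# One batched request for length-weight parameters instead of one call per species
lw <- length_weight(species_list = species) |>
  group_by(Species) |>
  summarise(a = median(a, na.rm = TRUE), b = median(b, na.rm = TRUE))

# FAO areas for the same species, again in a single call
areas <- faoareas(species_list = species)

# Invertebrates are looked up in SeaLifeBase by switching the server argument
lw_inverts <- length_weight(species_list = "Octopus cyanea", server = "sealifebase")
```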
- Enhanced taxonomic and catch data processing capabilities:
- Added comprehensive functions for species and catch data processing
- Implemented length-weight coefficient retrieval from FishBase and SeaLifeBase
- Created functions for calculating catch weights using multiple methods (a worked sketch follows this change list)
- Added new data reshaping utilities for species and catch information
- Extended Wild Fishing (WF) survey validation with detailed quality checks
- Updated cloud storage and data download/upload functions
- Complete overhaul of the data pipeline architecture
- Added PDS (Pelagic Data Systems) integration:
- New trip ingestion and preprocessing functionality
- GPS track data processing capabilities
- Implemented MongoDB export and storage functions
- Removed renv dependency management for improved reliability
- Updated Docker configuration for more robust builds
- Enhanced validation system for survey data
- Added new data processing steps:
- GPS track preprocessing
- Catch data validation
- Length measurements validation
- Market data validation
- Flexible data export capabilities
- Improved GitHub Actions workflow with additional processing steps
- Streamlined package dependencies
- Updated build and deployment processes
- Enhanced data storage and retrieval mechanisms
- All functions are now documented and indexed by keyword
- Thinned out the R folder by grouping functions into modules
- Moved to the Parquet format rather than CSV/RDS
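As a worked example of the catch-weight calculation referenced above, individual weight can be estimated from length with the allometric relation W = a * L^b and scaled by the number of individuals; the sketch below applies made-up coefficients to made-up lengths and stores the result as Parquet, matching the move away from CSV/RDS. Column names and values are illustrative, not the package's actual schema.

```r
library(dplyr)
library(arrow)

# Made-up length observations (cm) with per-species length-weight coefficients
catch <- tibble::tibble(
  species       = c("Siganus sutor", "Siganus sutor", "Lethrinus lentjan"),
  length_cm     = c(21, 24, 30),
  n_individuals = c(12, 8, 3),
  a             = c(0.0158, 0.0158, 0.0240),
  b             = c(3.02, 3.02, 2.96)
)

catch_weights <- catch |>
  mutate(
    weight_g        = a * length_cm^b,                 # W = a * L^b, grams per individual
    catch_weight_kg = weight_g * n_individuals / 1000  # scale to the whole catch
  )

# Intermediate outputs are written as Parquet instead of CSV/RDS
write_parquet(catch_weights, "catch_weights.parquet")
catch_weights_back <- read_parquet("catch_weights.parquet")
```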
Added the validation step and updated the preprocessing step for WCS Kobo surveys data; see the `preprocess_wcs_surveys()` and `validate_wcs_surveys()` functions. Currently, validation for catch weight, length, and market values uses the median absolute deviation (MAD) method, leveraging the k parameter of the `univOutl::LocScaleB()` function. To spot outliers accurately, validation is performed by gear type and species.
N.B. VALIDATION PARAMETERS ARE NOT YET TUNED
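A minimal sketch of that MAD-based check, assuming the univOutl package and made-up catch weights for a single gear-type and species group; the `k` value here is arbitrary since, as noted, the parameters are not yet tuned.

```r
library(univOutl)

# Made-up catch weights (kg) for one gear type x species combination
weights_kg <- c(1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 14.0, 1.0)

# MAD-based location and scale bounds; k widens or narrows the acceptance interval
out <- LocScaleB(weights_kg, k = 3, method = "MAD")

out$bounds    # lower and upper fences
out$outliers  # positions of the observations flagged as outliers (here, the 14 kg entry)
```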
There is no need to run the pipeline every two days; the schedule has been decreased to every four days.
Dropped the parent repository code (peskas.timor.pipeline) and added infrastructure to download WCS survey data and upload it to cloud storage providers.

- The ingestion of WCS Zanzibar surveys is implemented in `ingest_wcs_surveys()`.
- The function `retrieve_wcs_surveys()` downloads WCS Zanzibar surveys data.
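As a rough illustration of what the retrieval step does, the sketch below pulls survey submissions from the KoboToolbox v2 REST API with httr2; the asset UID and token are placeholders, and `retrieve_wcs_surveys()` may use different arguments and mechanics internally.

```r
library(httr2)

# Placeholders: the real asset UID and API token come from the pipeline configuration
asset_uid <- "<ASSET_UID>"
kobo_url  <- paste0("https://kf.kobotoolbox.org/api/v2/assets/", asset_uid, "/data/")

submissions <- request(kobo_url) |>
  req_headers(Authorization = paste("Token", Sys.getenv("KOBO_TOKEN"))) |>
  req_url_query(format = "json") |>
  req_perform() |>
  resp_body_json()

length(submissions$results)  # number of survey submissions returned
```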
- Updated configuration management:
- Moved configuration settings to inst/conf.yml
- Improved configuration structure and organization
- Enhanced configuration flexibility
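A minimal sketch of reading such a packaged configuration, assuming the config package; the key names below are assumptions, and the actual fields depend on the structure of inst/conf.yml.

```r
# During development, read the file straight from the source tree; once installed,
# the same file can be located with system.file("conf.yml", package = <this package's name>).
conf <- config::get(file = "inst/conf.yml", config = "default")

# Illustrative access; the key names depend on the actual conf.yml structure
conf$storage$google$options$bucket
```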