All notable changes to the Datawaza project will be documented in this file.
- New functions added to Model module:
- compare_models() - Find the best classification model and hyper-parameters for a dataset.
- create_nn_binary() - Create a binary classification neural network model.
- create_nn_multi() - Create a multi-class classification neural network model.
- eval_model() - Produce a detailed evaluation report for a classification model.
- plot_acf_residuals() - Plot residuals, histogram, ACF, and PACF of a time series ARIMA model.
- plot_train_history() - Plot the training and validation history of a fitted Keras model.
- New functions added to Explore module:
- plot_scatt() - Create a scatter plot using Seaborn's scatterplot function.
- print_ascii_image() - Print an ASCII image from one or more tensors.
- New functions added to Tools module:
- DebugPrinter - Conditionally print debugging information during the execution of a script.
- model_summary() - Create a DataFrame summary of a Keras model's architecture and parameters.
- Package configuration
- Support for Python 3.9 - 3.12, modified requirements, separate [doc] and [test] tags.
- Python 3.8 is not supported because Cartopy, a required dependency, does not support it.
- Additional dependencies and updated minimum versions: importlib_resources, scikeras, xgboost, imbalanced-learn, tensorflow, keras, pytorch. See requirements.txt for the full list.
- Explore module:
- plot_map_ca() - Detect the Python version when locating the package 'data' directory that stores map files: use importlib.resources for Python >= 3.10, otherwise importlib_resources.
- Model module:
- eval_model() - Changed logic for handling class labels/display names to now use class_map dictionary. Bug fixes.
- iterate_model() - Added ability to do Random Grid Search.
- plot_results() - Added ability to switch from line chart to bar chart.
- The issue with KerasClassifier calling OneHotEncoder with the 'sparse' parameter is resolved in SciKeras 0.13.0.
- Minor bug fixes and changes for compatibility with library updates
- Updated documentation and User Guide notebook
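
The Python version check described for plot_map_ca() above can be sketched as follows. This is a minimal illustration, not Datawaza's actual code: the helper name `package_data_dir` and its parameters are hypothetical, and it assumes the `importlib_resources` backport is installed on Python versions below 3.10.

```python
import sys

# Pick the standard-library module on Python >= 3.10, otherwise the backport.
if sys.version_info >= (3, 10):
    from importlib import resources
else:
    import importlib_resources as resources  # third-party backport (assumed installed)


def package_data_dir(package, subdir="data"):
    """Return a path-like object for a subdirectory inside an installed package.

    Hypothetical helper for illustration only.
    """
    return resources.files(package) / subdir


# The standard-library 'email' package stands in for a package that ships data files.
print(package_data_dir("email", "mime").is_dir())
```

Both modules expose the same `files()` interface, so only the import needs to branch on the Python version.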
First pre-release to test package installation.
- Explore module for data exploration and visualization:
- get_corr() - Display the top n positive and negative correlations with a target variable in a DataFrame.
- get_outliers() - Detect and summarize outliers for the specified numeric columns in a DataFrame, based on an IQR ratio.
- get_unique() - Print the unique values of all variables below a threshold n, including counts and percentages.
- plot_3d() - Create a 3D scatter plot using Plotly Express.
- plot_charts() - Display multiple bar plots and histograms for categorical and/or continuous variables in a DataFrame, with an option to dimension by the specified hue.
- plot_corr() - Plot the top n correlations of one variable against others in a DataFrame.
- plot_map_ca() - Plot longitude and latitude data on a geographic map of California.
- Clean module for data cleaning:
- convert_data_values() - Convert mixed data values (ex: GB, MB, KB) to a common unit of measurement.
- convert_dtypes() - Convert specified columns in a DataFrame to the desired data type.
- convert_time_values() - Convert time values in specified columns of a DataFrame to a target format.
- reduce_multicollinearity() - Reduce multicollinearity in a DataFrame by removing highly correlated features.
- split_outliers() - Split a DataFrame into two based on the presence of outliers.
- Model module for model iteration and evaluation:
- create_pipeline() - Create a custom pipeline for data preprocessing and modeling.
- create_results_df() - Initialize the results_df DataFrame with the columns required for iterate_model.
- iterate_model() - Iterate and evaluate a model pipeline with specified parameters.
- plot_results() - Plot the results of model iterations and select the best metric.
- Tools module with helper functions:
- LogTransformer - Apply logarithmic transformation to numerical features.
- calc_pfi() - Calculate Permutation Feature Importance for a trained model.
- calc_vif() - Calculate the Variance Inflation Factor (VIF) for each feature.
- check_for_duplicates() - Check for duplicate items (ex: column names) across multiple lists.
- extract_coef() - Extract feature names and coefficients from a trained model.
- format_df() - Format columns of a DataFrame as either large or small numbers.
- log_transform() - Apply a log transformation to specified columns in a DataFrame.
- split_dataframe() - Split a DataFrame into categorical and numerical columns.
- thousand_dollars() - Format a number as currency with thousands separators on a matplotlib chart axis.
- thousands() - Format a number with thousands separators on a matplotlib chart axis.
- Documentation site:
- API module documentation based on Sphinx
- User Guide in the form of a Jupyter notebook with examples of every function
- Test cases via Doctest examples in each function, minimal coverage
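
The IQR-based rule that get_outliers() is described as using can be illustrated with the standard library alone. This is a minimal sketch, not Datawaza's implementation: it assumes the common convention that a value is an outlier when it lies more than `ratio` times the IQR below the first quartile or above the third. The function name `iqr_outliers`, the default ratio of 1.5, and the quartile method are all assumptions.

```python
import statistics


def iqr_outliers(values, ratio=1.5):
    """Return the values lying outside [Q1 - ratio*IQR, Q3 + ratio*IQR].

    Hypothetical helper for illustration only.
    """
    # quantiles() with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - ratio * iqr
    upper = q3 + ratio * iqr
    return [v for v in values if v < lower or v > upper]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_outliers(sample))  # [100]
```

A larger `ratio` widens the accepted band and flags fewer values.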
- 2023-08-20: Initial setup of the package structure