Enhance Numeric Data Inspection and Introduce Positive/Negative Filtering #217
for more information, see https://pre-commit.ci
include zero
@jalr4ever Please help me review this PR.
The suggestions in the PR are optional changes.
I modified the code according to the code review suggestions, and all unit tests pass.
@@ -14,69 +14,132 @@ class NumericInspector(Inspector):

This class is a subclass of `Inspector` and is designed to provide methods for inspecting
and analyzing numeric data. It includes methods for detecting int or float data type.

In August 2024, we introduced a new feature that will continue to judge the positivity or
Perhaps we should indicate the PR and release version here, rather than the date?
Good idea. I also have another branch in development, so I'll cut a release after merging that PR. For various reasons, we haven't released a new version in a long time :(
Never mind, thanks for your work!
Enhance NumericInspector and Implement PositiveNegativeFilter
Description
This PR introduces enhancements to the Synthetic Data Generator (SDG) framework: it updates the `NumericInspector` class and adds a new `PositiveNegativeFilter` class. `NumericInspector` can now identify both positive and negative numeric columns, which improves the quality of synthetic data generation. `PositiveNegativeFilter` filters data based on the positivity or negativity of values in specified columns, so that these sign constraints are preserved during processing.
Key changes include:
- `NumericInspector`: classifies columns as positive or negative based on defined thresholds.
- `PositiveNegativeFilter`: enforces positivity or negativity constraints on specified columns during data processing.
Motivation and Context
The goal of these changes is to strengthen data quality assurance in the SDG framework. Identifying positive and negative columns lets us ensure that the generated synthetic data respects the sign constraints observed in the real data, which matters for applications such as model training and data sharing. This addresses the need for more robust data validation and filtering, and makes the generated synthetic data more reliable.
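To make the inspect-then-filter flow described above concrete, here is a minimal, stdlib-only Python sketch. It is not the sdgx API: `classify_sign_columns`, `filter_by_sign`, the `threshold` parameter, the row-of-dicts data layout, and the rule that zero counts toward both signs are all hypothetical choices for illustration. The real `NumericInspector` and `PositiveNegativeFilter` may differ in both interface and thresholds.

```python
from typing import Dict, Iterable, List, Mapping, Optional, Sequence

# Hypothetical row type: one record maps column name -> numeric value (or None).
Row = Mapping[str, Optional[float]]


def classify_sign_columns(
    rows: Sequence[Row], columns: Iterable[str], threshold: float = 1.0
) -> Dict[str, List[str]]:
    """Group numeric columns into "positive" and "negative".

    Assumptions (not taken from sdgx): a column is "positive" when the share of
    its non-missing values that are >= 0 reaches `threshold`, and "negative"
    when the share of values <= 0 does. Zero counts toward both sides; a column
    that qualifies for both (e.g. all zeros) is labeled "positive".
    """
    groups: Dict[str, List[str]] = {"positive": [], "negative": []}
    for col in columns:
        values = [row[col] for row in rows if row.get(col) is not None]
        if not values:
            continue  # no observed values, so no sign constraint is inferred
        n = len(values)
        if sum(v >= 0 for v in values) / n >= threshold:
            groups["positive"].append(col)
        elif sum(v <= 0 for v in values) / n >= threshold:
            groups["negative"].append(col)
    return groups


def filter_by_sign(
    rows: Sequence[Row], positive: Iterable[str], negative: Iterable[str]
) -> List[Row]:
    """Keep only rows that satisfy every sign constraint.

    A row whose constrained value is missing is dropped (an assumption).
    """
    # Materialize once so generator arguments are not exhausted after row one.
    positive, negative = list(positive), list(negative)
    kept: List[Row] = []
    for row in rows:
        ok_pos = all(row.get(c) is not None and row[c] >= 0 for c in positive)
        ok_neg = all(row.get(c) is not None and row[c] <= 0 for c in negative)
        if ok_pos and ok_neg:
            kept.append(row)
    return kept


# Infer sign constraints from (made-up) real data...
real = [
    {"age": 34, "balance": -120.0, "delta": 3},
    {"age": 51, "balance": -15.5, "delta": -2},
    {"age": 0, "balance": 0.0, "delta": 1},
]
groups = classify_sign_columns(real, ["age", "balance", "delta"])
print(groups)  # {'positive': ['age'], 'negative': ['balance']}

# ...then enforce them on (made-up) synthetic rows.
synthetic = [
    {"age": 29, "balance": -40.0, "delta": 5},
    {"age": -3, "balance": -10.0, "delta": 0},  # violates age >= 0
    {"age": 45, "balance": 12.0, "delta": -1},  # violates balance <= 0
]
clean = filter_by_sign(synthetic, groups["positive"], groups["negative"])
print(clean)  # [{'age': 29, 'balance': -40.0, 'delta': 5}]
```

Keeping inference and enforcement as separate steps means the sign groups are computed once from the real data and can then be reused to filter any number of generated batches; `delta`, which mixes signs, receives no constraint at all.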
How has this been tested?
The changes were tested with a dedicated test suite, which includes the following tests:
- Tests for `NumericInspector`, to ensure correct identification of positive and negative columns.
- Tests for `PositiveNegativeFilter`, to verify that it correctly filters data based on the positivity and negativity of values in specified columns.
Types of changes
Checklist: