-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python] support Dataset.get_data for Sequence input. #4472
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyfdecyf Many thanks for this PR! Please check my initial thoughts below:
160c1de
to
37b781b
Compare
elif isinstance(self.data, Sequence): | ||
self.data = self.data[self.used_indices] | ||
elif isinstance(self.data, list) and len(self.data) > 0 and all(isinstance(x, Sequence) for x in self.data): | ||
self.data = np.array([row for row in self._yield_row_from_seqlist(self.data, self.used_indices)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add test cases with Sequence
and List[Sequence]
for add_features_from()
here
# test that method works but sets raw data to None in case of immergeable data types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please enhance this test (or just add a new one without a parametrization to save some CI time)
def test_sequence(tmpdir, sample_count, batch_size, include_0_and_nan, num_seq): |
with new
get_data()
functionality for Sequence(s)
raw data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. I'll add test cases later this week.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Test cases are added now. It's a little hacky because I'm not familiar with the use cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyfdecyf Thank you very much for continuing to enhance Sequence
support!
Test cases are added now. It's a little hacky because I'm not familiar with the use cases.
No problem! I can slightly refactor them later. Don't want to delay this PR.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Implement features for #4403.