Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python] support Dataset.get_data for Sequence input. #4472

Merged
merged 5 commits into from
Jul 30, 2021

Conversation

cyfdecyf
Copy link
Contributor

Implement features for #4403.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyfdecyf Many thanks for this PR! Please check my initial thoughts below:

Comment on lines +2257 to +2260
elif isinstance(self.data, Sequence):
self.data = self.data[self.used_indices]
elif isinstance(self.data, list) and len(self.data) > 0 and all(isinstance(x, Sequence) for x in self.data):
self.data = np.array([row for row in self._yield_row_from_seqlist(self.data, self.used_indices)])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test cases with Sequence and List[Sequence] for add_features_from() here

# test that method works but sets raw data to None in case of immergeable data types

Copy link
Collaborator

@StrikerRUS StrikerRUS Jul 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please enhance this test (or just add a new one without a parametrization to save some CI time)

def test_sequence(tmpdir, sample_count, batch_size, include_0_and_nan, num_seq):

with new get_data() functionality for Sequence(s) raw data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I'll add test cases later this week.

@cyfdecyf
Copy link
Contributor Author

Test cases are added now. It's a little hacky because I'm not familiar with the use cases.

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyfdecyf Thank you very much for continuing to enhance Sequence support!

Test cases are added now. It's a little hacky because I'm not familiar with the use cases.

No problem! I can slightly refactor them later. Don't want to delay this PR.

@StrikerRUS StrikerRUS merged commit 1d21d1a into microsoft:master Jul 30, 2021
@cyfdecyf cyfdecyf deleted the sequence-dataset branch August 2, 2021 23:17
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants