Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[python-package] fix mypy error about pandas categorical features #6253

Merged
merged 2 commits into from
Jan 3, 2024

Conversation

jameslamb
Copy link
Collaborator

Contributes to #3756.
Contributes to #3867.

Fixes the following error from mypy:

python-package/lightgbm/basic.py:824: error:
Incompatible return value type (got "tuple[ndarray[Any, Any], list[str], list[str] | list[int], list[list[Any]]]", expected "tuple[ndarray[Any, Any], list[str], list[str], list[list[Any]]]")  [return-value]

@@ -786,7 +786,7 @@ def _data_from_pandas(
feature_name: _LGBM_FeatureNameConfiguration,
categorical_feature: _LGBM_CategoricalFeatureConfiguration,
pandas_categorical: Optional[List[List]]
) -> Tuple[np.ndarray, List[str], List[str], List[List]]:
) -> Tuple[np.ndarray, List[str], Union[List[str], List[int]], List[List]]:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hint was wrong.

categorical_feature can be a list of integers (indicating 0-based column positions).

categorical_feature : list of str or int, or 'auto', optional (default="auto")
Categorical features.
If list of int, interpreted as indices.

And it's passed back out of this function unmodified if it's provided like that.

if categorical_feature == 'auto': # use cat cols from DataFrame
categorical_feature = cat_cols_not_ordered
else: # use cat cols specified by user
categorical_feature = list(categorical_feature) # type: ignore[assignment]

categorical_feature = cat_cols_not_ordered
else: # use cat cols specified by user
categorical_feature = list(categorical_feature) # type: ignore[assignment]
Copy link
Collaborator Author

@jameslamb jameslamb Dec 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

categorical_feature can only be a list of strings, a list of integers, or the literal string 'auto'.

So I don't think this else: clause is helpful. When it's reached, under correctly-provided inputs it's just calling list() on something that's already a list, which has no effect.

I guess this could be providing a small bit of help for cases where some other type like a tuple was passed in, but if we want to support that then:

  • it should be done at all public interfaces that support categorical_feature
  • it shouldn't be buried inside a pandas-specific function

I think this can and should be removed, in the interest of further simplifying the flow of pandas data through the library.

@jameslamb jameslamb marked this pull request as ready for review December 30, 2023 04:55
@jameslamb jameslamb changed the title WIP: [python-package] fix mypy error about pandas categorical features [python-package] fix mypy error about pandas categorical features Dec 30, 2023
@jameslamb jameslamb merged commit 48e3629 into master Jan 3, 2024
41 checks passed
@jameslamb jameslamb deleted the mypy/pandas-cat-features branch January 3, 2024 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants