-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: clean color selection in _matplotlib/style #37203
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @ivanovmg
emm, maybe @simonjayhawkins can provide some feedbacks on the mypy error here?
pandas/plotting/_matplotlib/style.py
Outdated
|
||
def get_standard_colors( | ||
num_colors: int, colormap=None, color_type: str = "default", color=None | ||
num_colors: int, | ||
colormap=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you are doing the refactor here, maybe can you annotate this as well?
pandas/plotting/_matplotlib/style.py
Outdated
num_colors: int, | ||
colormap=None, | ||
color_type: str = "default", | ||
color=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also annotate this?
pandas/plotting/_matplotlib/style.py
Outdated
def _get_colors_from_colormap( | ||
colormap: Union[str, "Colormap"], | ||
num_colors: int, | ||
) -> Collection[Color]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, i have seen you use Collection
in some places in this file, while based on the code, seems almost they are all List
, should we just use List
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can use Collection for arguments to be permissive where appropriate, but return type should be more precise, in this case List.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will adjust for return types. I remember exactly @simonjayhawkins suggesting to use more generic types. Thus, I used collection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. that's correct for function arguments. return types should be precise.
pandas/plotting/_matplotlib/style.py
Outdated
|
||
def _get_colors_from_color( | ||
color: Union[Color, Dict[str, Color], Collection[Color]], | ||
) -> Union[Dict[str, Color], Collection[Color]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe change Collection
to List
?
pandas/plotting/_matplotlib/style.py
Outdated
) -> Union[Dict[str, Color], Collection[Color]]: | ||
"""Get colors from user input color.""" | ||
if isinstance(color, Iterable) and len(color) == 0: | ||
raise ValueError("Invalid color argument: {color}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need an f
in front?
just to clarify: what was the reason to check ValueError here? seems original error message gives an empty string, while here is an empty iterable? is this on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this check to avoid try/except ZeroDivisionError (in the original code), which would be happening when color == "". I will make it an f-string, I missed that.
pandas/plotting/_matplotlib/style.py
Outdated
if isinstance(color, dict): | ||
return color | ||
|
||
if isinstance(color, str): | ||
if _is_single_color(color): | ||
# GH #36972 | ||
return [color] | ||
else: | ||
return list(color) | ||
|
||
# ignoring mypy error here | ||
# error: Argument 1 to "list" has incompatible type | ||
# "Union[Sequence[float], Collection[Union[str, Sequence[float]]]]"; | ||
# expected "Iterable[Union[str, Sequence[float]]]" [arg-type] | ||
# A this point color may be sequence of floats or series of colors, | ||
# all convertible to list | ||
return list(color) # type: ignore [arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, this seems to be a bit more complex than the original comprehension (IIUC, you are adding a patch to convert single color to list in this, right?) is it possible to also write in a comprehension expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original portion of the code was exactly quite dynamic. I decided to make it more explicit. I now understand that I missed one case here (which does not hit in tests). This case is when color is sequence of floats (apparently the internal code calls get_standard_colors only with color being sequence of sequence of floats). I will add this edge case and the corresponding test. Presumably, mypy errors will also be resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly speaking, I kind of liked the original comprehension. The problem is that it was not compatible with typing.
pandas/plotting/_matplotlib/style.py
Outdated
return list(color) # type: ignore [arg-type] | ||
|
||
|
||
def _get_colors_from_color_type(color_type: str, num_colors: int) -> Collection[Color]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List should be enough
pandas/plotting/_matplotlib/style.py
Outdated
raise ValueError("color_type must be either 'default' or 'random'") | ||
|
||
|
||
def _get_default_colors(num_colors: int) -> Collection[Color]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List should be enough
pandas/plotting/_matplotlib/style.py
Outdated
colors = [c["color"] for c in list(plt.rcParams["axes.prop_cycle"])] | ||
except KeyError: | ||
colors = list(plt.rcParams.get("axes.color_cycle", list("bgrcmyk"))) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the original code, seems there is a check of colors
in this branch, so does it mean after this refactoring, this won't happen? or colors
shouldn't be a str in the first place once reaching this part of code?
if isinstance(colors, str):
colors = list(colors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, in this particular function colors will not be a string. This check is effectively carried out in _get_colors_from_color
.
pandas/plotting/_matplotlib/style.py
Outdated
def _get_colors_from_colormap( | ||
colormap: Union[str, "Colormap"], | ||
num_colors: int, | ||
) -> Collection[Color]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can use Collection for arguments to be permissive where appropriate, but return type should be more precise, in this case List.
pandas/plotting/_matplotlib/style.py
Outdated
# expected "Iterable[Union[str, Sequence[float]]]" [arg-type] | ||
# A this point color may be sequence of floats or series of colors, | ||
# all convertible to list | ||
return list(color) # type: ignore [arg-type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ci failure could be fixed with
return list(color) # type: ignore [arg-type] | |
return list(color) # type: ignore[arg-type] |
but please avoid using ignore instead.
It turns out that even if "axes.prop_cycle" is not defined, then it will be composed of CN colors (hex notation). However, in the current matplotlib there is no "axes.color_cycle" property in rcParams at all (not even allowed to setup).
797ce7a
to
79b0f08
Compare
Here is a problem. I want to install Cycler (https://matplotlib.org/cycler/) for testing purposes, to mimic matplotlib behavior on default color selection. When I add it into |
8dd5d3c
to
f513bdb
Compare
I have some weird problem with the new test module
|
All checks are good except Windows py38_np18, where tests related to numba failed. I guess it is irrelevant of my changes. |
I checked that we actually can use cycler. When I use cycler I would prefer to keep only required number of colors:
This looks simple to understand, IMHO. |
pandas/tests/plotting/test_style.py
Outdated
with suppress(ImportError): | ||
from pandas.plotting._matplotlib.style import get_standard_colors | ||
|
||
|
||
@td.skip_if_no_mpl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just use pytest.importorskip see pandas\tests\plotting\test_converter.py or other test modules testing optional dependencies. e.g. pytables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did that. Please check if this is what you suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for late response, i was fallen ill in the past week.
@ivanovmg looks very good to me and original logics seem to be covered and unchanged! nice job!
only two minor suggestions on the tests to make the tests slightly shorter, also okay if not change.
(7, ["b", "g", "r", "y", "b", "g", "r"]), | ||
], | ||
) | ||
def test_default_colors_named_from_prop_cycle_string(self, num_colors, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you could combine this one with the test above, and add cycler_color
to parameterize since those two tests look almost identical and test basically the same except how colors are defined
(1, ["r", "g", "b", "k"]), | ||
(2, ["r", "g", "b", "k"]), | ||
(3, ["r", "g", "b", "k"]), | ||
(4, ["r", "g", "b", "k"]), | ||
(5, ["r", "g", "b", "k", "r"]), | ||
(6, ["r", "g", "b", "k", "r", "g"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's nicer to just test the case where num_color </>/= color, so 3 parametrizations is enough. no need to change here though ^^
pandas/plotting/_matplotlib/style.py
Outdated
return [color] | ||
|
||
if _is_floats_color(color): | ||
color = cast(Sequence[float], color) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cast seems unnecessary - is there an error this solves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, cast is necessary.
At this point color
is of type Union[Color, Collection[Color]]
. And for some reason _is_floats_color
check does not filter out Collection[Color]
. So, instead of ignoring, I added cast here.
mypy error if removing cast.
pandas\plotting\_matplotlib\style.py:114: error: List item 0 has incompatible type "Union[Union[str, Sequence[float]], Collection[Union[str, Sequence[float]]]]"; expected "Union[str, Sequence[float]]" [list-item]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functions can't narrow types yet in mypy, but regardless this is pretty confusing. Can you try to refactor this to make things clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not know how to make it even clearer.
The logic is the following.
- Start with color being maybe
a. a single string color
b. or multiple string color (like 'rgbk')
c. or single float color (0.1, 0.2, 0.3)
d. or multiple float colors - Address option a.
- Address option b.
- Address options c and d.
Is the logic confusing or the two cast statements are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd, I updated the logic a little bit.
- Address one color (either string or sequence of floats)
- Address collection of colors
Casting is still there as there are no other options to make mypy happy.
Please let me know what you think about this implementation.
1. Rename _is_single_color -> _is_single_string_color 2. Extract new method _is_single_color, which checks whether the color provided is a single color, that can be either string or a sequence of floats. 3. Allow integers to be a part of float sequence color, (1, 0.4, 0.2, 0.5), for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. a couple of comments (aside from @WillAyd and @charlesdong1991 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few minor comments, ping on green.
pandas/plotting/_matplotlib/style.py
Outdated
color_type: str, | ||
num_colors: int, | ||
) -> List[Color]: | ||
"""Get colors from user input.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string.
also the current summary is not very descriptive, nor is the function name .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it into _derive_colors
. I am not sure if you would consider it a better alternative. I thought of having a more verbose name (like _derive_colors_from_cmap_color_and_type
, but that does not seem reasonable to me, although I added the explanation in the docstring.
Anyway derive seems a better verb, implying some logic underneath.
ce2361e
to
b369834
Compare
thanks @ivanovmg very nice |
… (#37655) * Moving the file test_frame.py to a new directory * Сreated file test_frame_color.py * Transfer tests of test_frame.py to test_frame_color.py * PEP 8 fixes * Transfer tests of test_frame.py to test_frame_groupby.py and test_frame_subplots.py * Removing unnecessary imports * PEP 8 fixes * Fixed class name * Transfer tests of test_frame.py to test_frame_subplots.py * Transfer tests of test_frame.py to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py * Changed class names * Removed unnecessary imports * Removed import * catch FutureWarnings (#37587) * TST/REF: collect indexing tests by method (#37590) * REF: prelims for single-path setitem_with_indexer (#37588) * ENH: __repr__ for 2D DTA/TDA (#37164) * CLN: de-duplicate _validate_where_value with _validate_setitem_value (#37595) * TST/REF: collect tests by method (#37589) * TST/REF: move remaining setitem tests from test_timeseries * TST/REF: rehome test_timezones test * move misplaced arithmetic test * collect tests by method * move misplaced file * REF: Categorical.is_dtype_equal -> categories_match_up_to_permutation (#37545) * CLN refactor non-core (#37580) * refactor core/computation (#37585) * TST/REF: share method tests between DataFrame and Series (#37596) * BUG: Index.where casting ints to str (#37591) * REF: IntervalArray comparisons (#37124) * regression fix for merging DF with datetime index with empty DF (#36897) * ERR: fix error message in Period for invalid frequency (#37602) * CLN: remove rebox_native (#37608) * TST/REF: tests.generic (#37618) * TST: collect tests by method (#37617) * TST/REF: collect test_timeseries tests by method * misplaced DataFrame.values tst * misplaced dataframe.values test * collect test by method * TST/REF: share tests across Series/DataFrame (#37616) * Gh 36562 typeerror comparison not supported between float and str (#37096) * docs: fix punctuation (#37612) * REGR: pd.to_hdf(..., dropna=True) not dropping missing rows (#37564) * parametrize set_axis tests (#37619) * CLN: clean color selection in _matplotlib/style (#37203) * DEPR: DataFrame/Series.slice_shift (#37601) * REF: re-use validate_setitem_value in Categorical.fillna (#37597) * PERF: release gil for ewma_time (#37389) * BUG: Groupy dropped nan groups from result when grouping over single column (#36842) * ENH: implement timeszones support for read_json(orient='table') and astype() from 'object' (#35973) * REF/BUG/TYP: read_csv shouldn't close user-provided file handles (#36997) * BUG/REF: read_csv shouldn't close user-provided file handles * get_handle: typing, returns is_wrapped, use dataclass, and make sure that all created handlers are returned * remove unused imports * added IOHandleArgs.close * added IOArgs.close * mostly comments * move memory_map from TextReader to CParserWrapper * moved IOArgs and IOHandles * more comments Co-authored-by: Jeff Reback <jeff@reback.net> * more typing checks to pre-commit (#37539) * TST: 32bit dtype compat test_groupby_dropna (#37623) * BUG: Metadata propagation for groupby iterator (#37461) * BUG: read-only values in cython funcs (#37613) * CLN refactor core/arrays (#37581) * Fixed Metadata Propogation in DataFrame (#37381) * TYP: add Shape alias to pandas._typing (#37128) * DOC: Fix typo (#37630) * CLN: parametrize test_nat_comparisons (#37195) * dataframe dataclass docstring updated (#37632) * refactor core/groupby (#37583) * BUG: set index of DataFrame.apply(f) when f returns dict (#37544) (#37606) * BUG: to_dict should return a native datetime object for NumPy backed dataframes (#37571) * ENH: memory_map for compressed files (#37621) * DOC: add example & prose of slicing with labels when index has duplicate labels (#36814) * DOC: add example & prose of slicing with labels when index has duplicate labels #36251 * DOC: proofread the sentence. Co-authored-by: Jun Kudo <jun-lab@junnoMacBook-Pro.local> * DOC: Fix typo (#37636) "columns(s)" sounded odd, I believe it was supposed to be just "column(s)". * CI: troubleshoot win py38 builds (#37652) * TST/REF: collect indexing tests by method (#37638) * TST/REF: collect tests for get_numeric_data (#37634) * misplaced loc test * TST/REF: collect get_numeric_data tests * REF: de-duplicate _validate_insert_value with _validate_scalar (#37640) * CI: catch windows py38 OSError (#37659) * share test (#37679) * TST: match matplotlib warning message (#37666) * TST: match matplotlib warning message * TST: match full message * pd.Series.loc.__getitem__ promotes to float64 instead of raising KeyError (#37687) * REF/TST: misplaced Categorical tests (#37678) * REF/TST: collect indexing tests by method (#37677) * CLN: only call _wrap_results one place in nanmedian (#37673) * TYP: Index._concat (#37671) * BUG: CategoricalIndex.equals casting non-categories to np.nan (#37667) * CLN: _replace_single (#37683) * REF: simplify _replace_single by noting regex kwarg is bool * Annotate * CLN: remove never-False convert kwarg * TYP: make more internal funcs keyword-only (#37688) * REF: make Series._replace_single a regular method (#37691) * REF: simplify cycling through colors (#37664) * REF: implement _wrap_reduction_result (#37660) * BUG: preserve fold in Timestamp.replace (#37644) * CLN: Clean indexing tests (#37689) * TST: fix warning for pie chart (#37669) * PERF: reverted change from commit 7d257c6 to solve issue #37081 (#37426) * DataFrameGroupby.boxplot fails when subplots=False (#28102) * ENH: Improve error reporting for wrong merge cols (#37547) * Transfer tests of test_frame.py to test_frame_color.py * PEP8 * Fixes for linter * Сhange pd.DateFrame to DateFrame * Move inconsistent namespace check to pre-commit, fixup more files (#37662) * check for inconsistent namespace usage * doc * typos * verbose regex * use verbose flag * use verbose flag * match both directions * add test * don't import annotations from future * update extra couple of cases * 🚚 rename * typing * don't use subprocess * don't type tests * use pathlib * REF: simplify NDFrame.replace, ObjectBlock.replace (#37704) * REF: implement Categorical.encode_with_my_categories (#37650) * REF: implement Categorical.encode_with_my_categories * privatize * BUG: unpickling modifies Block.ndim (#37657) * REF: dont support dt64tz in nanmean (#37658) * CLN: Simplify groupby head/tail tests (#37702) * Bug in loc raised for numeric label even when label is in Index (#37675) * REF: implement replace_regex, remove unreachable branch in ObjectBlock.replace (#37696) * TYP: Check untyped defs (except vendored) (#37556) * REF: remove ObjectBlock._replace_single (#37710) * Transfer tests of test_frame.py to test_frame_color.py * TST/REF: collect indexing tests by method (#37590) * PEP8 * Сhange DateFrame to pd.DateFrame * Сhange pd.DateFrame to DateFrame * Removing imports * Bug fixes * Bug fixes * Fix incorrect merge * test_frame_color.py edit * Transfer tests of test_frame.py to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py * Removing unnecessary imports * PEP8 * # Conflicts: # pandas/tests/plotting/frame/test_frame.py # pandas/tests/plotting/frame/test_frame_color.py # pandas/tests/plotting/frame/test_frame_subplots.py * Moving the file test_frame.py to a new directory * Transfer tests of test_frame.py to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py * Removing unnecessary imports * PEP8 * CLN: clean categorical indexes tests (#37721) * Fix merge error * PEP 8 fixes * Fix merge error * Removing unnecessary imports * PEP 8 fixes * Fixed class name * Transfer tests of test_frame.py to test_frame_subplots.py * Transfer tests of test_frame.py to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py * Changed class names * Removed unnecessary imports * Removed import * TST/REF: collect indexing tests by method (#37590) * TST: match matplotlib warning message (#37666) * TST: match matplotlib warning message * TST: match full message * TST: fix warning for pie chart (#37669) * Transfer tests of test_frame.py to test_frame_color.py * PEP8 * Fixes for linter * Сhange pd.DateFrame to DateFrame * Transfer tests of test_frame.py to test_frame_color.py * PEP8 * Сhange DateFrame to pd.DateFrame * Сhange pd.DateFrame to DateFrame * Removing imports * Bug fixes * Bug fixes * Fix incorrect merge * test_frame_color.py edit * Fix merge error * Fix merge error * Removing unnecessary features * Resolving Commit Conflicts daf999f 365d843 * black fix Co-authored-by: jbrockmendel <jbrockmendel@gmail.com> Co-authored-by: Marco Gorelli <m.e.gorelli@gmail.com> Co-authored-by: Philip Cerles <philip.cerles@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Co-authored-by: Sven <sven.schellenberg@paradynsystems.com> Co-authored-by: Micael Jarniac <micael@jarniac.com> Co-authored-by: Andrew Wieteska <48889395+arw2019@users.noreply.github.com> Co-authored-by: Maxim Ivanov <41443370+ivanovmg@users.noreply.github.com> Co-authored-by: Erfan Nariman <34067903+erfannariman@users.noreply.github.com> Co-authored-by: Fangchen Li <fangchen.li@outlook.com> Co-authored-by: patrick <61934744+phofl@users.noreply.github.com> Co-authored-by: attack68 <24256554+attack68@users.noreply.github.com> Co-authored-by: Torsten Wörtwein <twoertwein@users.noreply.github.com> Co-authored-by: Jeff Reback <jeff@reback.net> Co-authored-by: Janus <janus@insignificancegalore.net> Co-authored-by: Joel Whittier <rootbeerfriend@gmail.com> Co-authored-by: taytzehao <jtth95@gmail.com> Co-authored-by: ma3da <34522496+ma3da@users.noreply.github.com> Co-authored-by: junk <juntrp0207@gmail.com> Co-authored-by: Jun Kudo <jun-lab@junnoMacBook-Pro.local> Co-authored-by: Alex Kirko <alexander.kirko@gmail.com> Co-authored-by: Yassir Karroum <ukarroum17@gmail.com> Co-authored-by: Kaiqi Dong <kaiqi@kth.se> Co-authored-by: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com> Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
There was a comment in the top of the module that the code was a bit too dynamic.
The logic in the function was too complex.
I refactored the code
Note: there is one issue with mypy, which I do not know how to handle. Currently I suggested to ignore the error.