Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: list-likes of list-likes in str.cat #22264

Merged
merged 5 commits into from
Aug 31, 2018

Conversation

h-vetinari
Copy link
Contributor

As mentioned in #21950, my suggestion is to modify the allowed combinations (as of v0.23) as follows:

Type of "others"                        |  action  |  comment
---------------------------------------------------------------------
list-like of strings                    |   keep   |  as before; mimics behavior elsewhere,
                                                      cf.: pd.Series(range(3)) + [2,4,6]
Series                                  |   keep   |
np.ndarray (1-dim)                      |   keep   |
DataFrame                               |   keep   |  sequential concatenation
np.ndarray (2-dim)                      |   keep   |  sequential concatenation
list-like of
    Series/Index/np.ndarray (1-dim)     |   keep   |  sequential concatenation
list-like containing list-likes (1-dim)
    other than Series/Index/np.ndarray  |   DEPR   |  sequential concatenation

In other words, if the user wants sequential concatenation, there are many possibilities available, and list-of-lists does not have to be one of them, IMO. This would substantially simplify (post-deprecation) the code for str.cat._get_series_list:

def _get_series_list(self, others):
    """
    Auxiliary function for :meth:`str.cat`. Turn potentially mixed input
    into a list of Series (elements without an index must match the length
    of the calling Series/Index).

    Parameters
    ----------
    others : Series, DataFrame, np.ndarray, list-like or list-like of
        objects that are either Series, Index or np.ndarray (1-dim)

    Returns
    -------
    list : others transformed into list of Series
    """
    from pandas import Index, Series, DataFrame

    # self._orig is either Series or Index
    idx = self._orig if isinstance(self._orig, Index) else self._orig.index

    # Generally speaking, all objects without an index inherit the index
    # `idx` of the calling Series/Index - i.e. must have matching length.
    # Objects with an index (i.e. Series/Index/DataFrame) keep their own.
    if isinstance(others, Series):
        return [others]
    elif isinstance(others, Index):
        return [Series(others.values, index=others)]
    elif isinstance(others, DataFrame):
        return [others[x] for x in others]
    elif isinstance(others, np.ndarray) and others.ndim == 2:
        others = DataFrame(others, index=idx)
        return [others[x] for x in others]
    elif is_list_like(others):
        others = list(others)  # ensure iterators do not get read twice etc

        # in case of list-like `others`, all elements must be
        # either Series/Index/np.ndarray (1-dim)...
        if all(isinstance(x, (Series, Index))
               or (isinstance(x, np.ndarray) and x.ndim == 1) for x in others):
            los = []
            while others:  # iterate through list and append each element
                los = los + self._get_series_list(others.pop(0))
            return los
        # ... or just strings
        elif all(not is_list_like(x) for x in others):
            return [Series(others, index=idx)]
    raise TypeError('others must be Series, Index, DataFrame, np.ndarrary or '
                    'list-like (either containing only strings or containing '
                    'only objects of type Series/Index/np.ndarray[1-dim])')

@h-vetinari h-vetinari force-pushed the depr_str_cat_LLoLL branch 2 times, most recently from 0eeff33 to 372cc24 Compare August 9, 2018 18:24
@gfyoung gfyoung added Strings String extension data type and string data Deprecate Functionality to remove in pandas labels Aug 9, 2018
@@ -306,20 +306,20 @@ The same alignment can be used when ``others`` is a ``DataFrame``:
Concatenating a Series and many objects into a Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All one-dimensional list-likes can be arbitrarily combined in a list-like container (including iterators, ``dict``-views, etc.):
Several objects of type ``Series``, ``Index`` or ``np.ndarray`` can be arbitrarily combined in a list-like container (including iterators, ``dict``-views, etc.):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Several objects of type Series, Index, or np.ndarray" - are you sure that's what you meant to say? I think that should be reworded.

join_warn = join_warn or wnx

if depr_warn:
warnings.warn('list-likes other than Series, Index or '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add a comma after "Index"

@gfyoung gfyoung requested review from TomAugspurger and WillAyd and removed request for WillAyd August 9, 2018 20:03
@pep8speaks
Copy link

pep8speaks commented Aug 9, 2018

Hello @h-vetinari! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 28, 2018 at 17:09 Hours UTC

@codecov
Copy link

codecov bot commented Aug 9, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@fa47b8d). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #22264   +/-   ##
=========================================
  Coverage          ?   92.03%           
=========================================
  Files             ?      169           
  Lines             ?    50785           
  Branches          ?        0           
=========================================
  Hits              ?    46742           
  Misses            ?     4043           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.44% <100%> (?)
#single 42.22% <0%> (?)
Impacted Files Coverage Δ
pandas/core/strings.py 98.63% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fa47b8d...7e74d80. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, ex comments on the text.rst


.. ipython:: python

s
u
s.str.cat([u.values, ['A', 'B', 'C', 'D'], map(str, u.index)], na_rep='-')
s.str.cat([u.values, u, u.index.astype(str)], na_rep='-')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you save these in another variable, rather than doing the combination in-line. its easier to see it printed out (with comments)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "comments"?


All elements must match in length to the calling ``Series`` (or ``Index``), except those having an index if ``join`` is not None:

.. ipython:: python

v
s.str.cat([u, v, ['A', 'B', 'C', 'D']], join='outer', na_rep='-')
s.str.cat([u, v, u.values], join='outer', na_rep='-')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, u and u.values will have been shown 3 lines above, and v directly above. I think this should be plenty, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use w as you defined above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here is to show how join works with objects that have no index, so can't use w (which has an index).

@jreback jreback added this to the 0.24.0 milestone Aug 9, 2018

.. ipython:: python

s
u
s.str.cat([u.values, ['A', 'B', 'C', 'D'], map(str, u.index)], na_rep='-')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put some comments here, this is just too much code in a row

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my POV, it's just one line of code, while showing all the ingredients. But I'm gonna change it back to just using variants of u in different "boxes".

I think it's fair enough to show just u and not u.values and pd.Index(u) as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so pls show u.values, I have no idea what it actually is, and as I said, pls either add comments or split up showing s, u, u.values, this is just too long

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback u is shown directly above...?! Also in the screenshot I posted - in context it's IMO very clear (especially as it's at the end of the concatenation section, where the reader has seen the same objects 10 times already).

I can of course show both u and u.values, but I thought that understanding the relationship between a Series and its .values would be safe to expect of a users pandas-fu.


All elements must match in length to the calling ``Series`` (or ``Index``), except those having an index if ``join`` is not None:

.. ipython:: python

v
s.str.cat([u, v, ['A', 'B', 'C', 'D']], join='outer', na_rep='-')
s.str.cat([u, v, u.values], join='outer', na_rep='-')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use w as you defined above

@jreback
Copy link
Contributor

jreback commented Aug 10, 2018

maybe would be helpful to show a rendered version here of text.rst

@h-vetinari
Copy link
Contributor Author

@jreback here you go:

1

@h-vetinari
Copy link
Contributor Author

@jreback @TomAugspurger ping :)


.. ipython:: python

s
u
s.str.cat([u.values, ['A', 'B', 'C', 'D'], map(str, u.index)], na_rep='-')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so pls show u.values, I have no idea what it actually is, and as I said, pls either add comments or split up showing s, u, u.values, this is just too long

@@ -474,12 +474,14 @@ Other API Changes
Deprecations
~~~~~~~~~~~~

- :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated the ``encoding`` argument. The encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`).
- :meth:`MultiIndex.to_hierarchical` is deprecated and will be removed in a future version (:issue:`21613`)
- :meth:`DataFrame.to_stata`, :meth:`read_stata`, :class:`StataReader` and :class:`StataWriter` have deprecated the ``encoding`` argument. The encoding of a Stata dta file is determined by the file type and cannot be changed (:issue:`21244`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls don't edit other things. The longer the PR the more time things will take. I guess since its already done leave it.

- :meth:`Series.ptp` is deprecated. Use ``numpy.ptp`` instead (:issue:`21614`)
- :meth:`Series.compress` is deprecated. Use ``Series[condition]`` instead (:issue:`18262`)
- :meth:`Categorical.from_codes` has deprecated providing float values for the ``codes`` argument. (:issue:`21767`)
- :func:`pandas.read_table` is deprecated. Instead, use :func:`pandas.read_csv` passing ``sep='\t'`` if necessary (:issue:`21948`)
- :meth:`Series.str.cat` has deprecated using arbitrary list-likes *within* list-likes. A list-like container may still contain
arbitrarily many ``Series``, ``Index`` or 1-dimenstional ``np.ndarray``, or alternatively, only scalar values. (:issue:`21950`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the word arbitrarily (and in text.rst)

others : Series, DataFrame, np.ndarray, list-like or list-like of
objects that are either Series, np.ndarray (1-dim) or list-like
others : Series, Index, DataFrame, np.ndarray, list-like or list-like
of objects that are Series, Index, np.ndarray (1-dim)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or np.ndarray

@h-vetinari
Copy link
Contributor Author

@jreback Incorporated feedback, except the u.values part, where my comment above doesn't show up (so copying it here):

so pls show u.values, I have no idea what it actually is, and as I said, pls either add comments or split up showing s, u, u.values, this is just too long

u is shown directly above...?! Also in the screenshot I posted. I can of course show both u and u.values, but I thought that understanding the relationship between a Series and its .values would be safe to expect of a users' pandas-fu.

In context it's IMO very clear (especially as it's at the end of the concatenation section, where the reader has seen the same objects 10 times already).

@h-vetinari
Copy link
Contributor Author

@jreback Same screenshot as above. IMO the relationship u <--> u.values does not need to be explained - if you disagree with that, please let me know. (side note: the example in the current docs is even more complicated; but ultimately its easy to grok from the result what's the content of the inputs -- see [70] below)

1_alt

@jreback
Copy link
Contributor

jreback commented Aug 22, 2018

ok this is fine. can you rebase once more and ping on green.

@h-vetinari
Copy link
Contributor Author

@jreback All green.

@h-vetinari h-vetinari force-pushed the depr_str_cat_LLoLL branch 2 times, most recently from 730ab79 to 80547eb Compare August 27, 2018 18:15
@h-vetinari
Copy link
Contributor Author

@jreback

ok this is fine. can you rebase once more and ping on green.

All green. Just fixed an indentation error on the way.

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny typo, ping on push.

- :meth:`Series.ptp` is deprecated. Use ``numpy.ptp`` instead (:issue:`21614`)
- :meth:`Series.compress` is deprecated. Use ``Series[condition]`` instead (:issue:`18262`)
- The signature of :meth:`Series.to_csv` has been uniformed to that of doc:meth:`DataFrame.to_csv`: the name of the first argument is now 'path_or_buf', the order of subsequent arguments has changed, the 'header' argument now defaults to True. (:issue:`19715`)
- :meth:`Categorical.from_codes` has deprecated providing float values for the ``codes`` argument. (:issue:`21767`)
- :func:`pandas.read_table` is deprecated. Instead, use :func:`pandas.read_csv` passing ``sep='\t'`` if necessary (:issue:`21948`)
- :meth:`Series.str.cat` has deprecated using arbitrary list-likes *within* list-likes. A list-like container may still contain
many ``Series``, ``Index`` or 1-dimenstional ``np.ndarray``, or alternatively, only scalar values. (:issue:`21950`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dimensional

@h-vetinari
Copy link
Contributor Author

@jreback

tiny typo, ping on push.

Side note: [ci skip] does not seem to be respected by the new CircleCI.

@h-vetinari
Copy link
Contributor Author

@jreback

This is all done, approved and green.

@jreback
Copy link
Contributor

jreback commented Aug 31, 2018

Side note: [ci skip] does not seem to be respected by the new CircleCI.

I don't like these skip's generally. they are not necessary as all of the CI's cancel prior builds anyhow.

@jreback jreback merged commit 98fb53c into pandas-dev:master Aug 31, 2018
@jreback
Copy link
Contributor

jreback commented Aug 31, 2018

thanks @h-vetinari

@h-vetinari h-vetinari deleted the depr_str_cat_LLoLL branch September 2, 2018 00:03
@h-vetinari
Copy link
Contributor Author

@jreback

Side note: [ci skip] does not seem to be respected by the new CircleCI.

I don't like these skip's generally. they are not necessary as all of the CI's cancel prior builds anyhow.

This was not about cancelling prior builds but not having to run the full CI suite if (say) a single comment (like for the GH number) gets added. I view the CI as somewhat limited resources (appveyor recently having something like 12h waiting time), so [ci skip] can be useful for avoiding a CI run where it's pointless (any code/doc changes should of course run through).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEPR: list of lists in Series.str.cat
4 participants