API: add top-level melt function as method #15521

Closed
wants to merge 7 commits into from
1 change: 1 addition & 0 deletions doc/source/api.rst
@@ -933,6 +933,7 @@ Reshaping, sorting, transposing
DataFrame.swaplevel
DataFrame.stack
DataFrame.unstack
DataFrame.melt
DataFrame.T
DataFrame.to_panel
DataFrame.to_xarray
9 changes: 5 additions & 4 deletions doc/source/reshaping.rst
@@ -265,7 +265,7 @@ the right thing:
Reshaping by Melt
-----------------

The :func:`~pandas.melt` function is useful to massage a
Member

Can you mention both here?

Contributor Author

Done (GitHub is not folding this review component for some reason, however).

The ``melt`` and :func:`~DataFrame.melt` functions are useful to massage a
DataFrame into a format where one or more columns are identifier variables,
while all other columns, considered measured variables, are "unpivoted" to the
row axis, leaving just two non-identifier columns, "variable" and "value". The
@@ -281,10 +281,11 @@ For instance,
'height' : [5.5, 6.0],
'weight' : [130, 150]})
cheese
pd.melt(cheese, id_vars=['first', 'last'])
pd.melt(cheese, id_vars=['first', 'last'], var_name='quantity')
cheese.melt(id_vars=['first', 'last'])
cheese.melt(id_vars=['first', 'last'], var_name='quantity')

Another way to transform is to use the ``wide_to_long`` panel data convenience function.
Another way to transform is to use the ``wide_to_long`` panel data convenience
function.

.. ipython:: python

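The reshaping docs now show the function and method spellings side by side. A quick way to confirm they produce the same result is to run both on one frame. This sketch assumes a pandas version that includes `DataFrame.melt` (0.20.0 or later); the `first`/`last` values are placeholders, since the frame's full construction falls outside the diff context shown.

```python
import pandas as pd

# Illustrative wide frame shaped like the ``cheese`` example; the name
# values are placeholders, only 'height' and 'weight' appear in the diff.
cheese = pd.DataFrame({'first': ['John', 'Mary'],
                       'last': ['Doe', 'Bo'],
                       'height': [5.5, 6.0],
                       'weight': [130, 150]})

# Top-level function and method form with identical arguments.
via_function = pd.melt(cheese, id_vars=['first', 'last'], var_name='quantity')
via_method = cheese.melt(id_vars=['first', 'last'], var_name='quantity')

# The method forwards to the function, so the two frames should match.
pd.testing.assert_frame_equal(via_function, via_method)
print(via_method)
```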
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -324,6 +324,7 @@ Other Enhancements
- ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)

- ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
- ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`).
- ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).

- ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
105 changes: 104 additions & 1 deletion pandas/core/frame.py
@@ -105,7 +105,9 @@
optional_by="""
by : str or list of str
Name or list of names which refer to the axis items.""",
versionadded_to_excel='')
versionadded_to_excel='',
versionadded_melt='\n.. versionadded:: 0.20.0\n',
Member

This needs extra spaces (you can check pd.DataFrame.melt? if you are on this branch).

Contributor Author

I'm not sure what you mean by this, can you clarify?

Member

If you do pd.DataFrame.melt? you see:

Signature: pd.DataFrame.melt(self, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None)
Docstring:
    "Unpivots" a DataFrame from wide format to long format, optionally leaving
    identifier variables set.

    This function is useful to massage a DataFrame into a format where one
    or more columns are identifier variables (`id_vars`), while all other
    columns, considered measured variables (`value_vars`), are "unpivoted" to
    the row axis, leaving just two non-identifier columns, 'variable' and
    'value'.

.. versionadded:: 0.20.0


    Parameters
    ----------
    ....

so the versionadded directive is not indented like the other lines.

But I think Jeff fixed it before merging
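The indentation problem in this thread can be reproduced with the standard library alone. `help()` cleans docstrings with `inspect.cleandoc`, which strips only the smallest indentation shared by the non-blank body lines, so one directive at column 0 leaves every other line indented by four spaces. The template below is an abbreviated stand-in for the shared `melt` docstring, not the pandas text; only the `versionadded_melt` key mirrors the diff.

```python
import inspect

# Abbreviated stand-in for the shared docstring template: body lines are
# indented four spaces, as they would be inside a class body.
TEMPLATE = '''
    "Unpivots" a DataFrame from wide format to long format.
    %(versionadded_melt)s
    Parameters
    ----------
    frame : DataFrame
'''

# Substitution as written in the diff: the directive lands at column 0.
unindented = TEMPLATE % {'versionadded_melt': '\n.. versionadded:: 0.20.0\n'}
# With four leading spaces the directive matches the surrounding lines.
indented = TEMPLATE % {'versionadded_melt': '\n    .. versionadded:: 0.20.0\n'}

# cleandoc removes the minimum indentation of the non-blank body lines.
print(inspect.cleandoc(unindented))  # 'Parameters' stays indented 4 spaces
print(inspect.cleandoc(indented))    # every line is flush left
```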

other_melt='melt')

_numeric_only_doc = """numeric_only : boolean, default None
Include only float, int, boolean data. If None, will attempt to use
@@ -4051,6 +4053,107 @@ def unstack(self, level=-1, fill_value=None):
from pandas.core.reshape import unstack
return unstack(self, level, fill_value)

_shared_docs['melt'] = """
"Unpivots" a DataFrame from wide format to long format, optionally leaving
identifier variables set.

This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.
%(versionadded_melt)s

Parameters
----------
frame : DataFrame
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or string, optional
If columns are a MultiIndex then use this level to melt.

See also
--------
%(other_melt)s
pivot_table
DataFrame.pivot

Examples
--------
>>> import pandas as pd
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

>>> pd.melt(df, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

The names of 'variable' and 'value' columns can be customized:

>>> pd.melt(df, id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5

If you have multi-index columns:

>>> df.columns = [list('ABC'), list('DEF')]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6

>>> pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5

"""

@Appender(_shared_docs['melt'] % _shared_doc_kwargs)
def melt(self, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None):
from pandas.core.reshape import melt
return melt(self, id_vars=id_vars, value_vars=value_vars,
var_name=var_name, value_name=value_name,
col_level=col_level)

# ----------------------------------------------------------------------
# Time series-related

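The pattern this diff introduces (one docstring template in `_shared_docs`, filled with a different `%` substitution for the function and for the method, plus a method that forwards to the module-level function) can be sketched without pandas. `appender` is a hypothetical stand-in for `pandas.util.decorators.Appender`, assumed here to append text to `__doc__`, and `Frame` is a toy dict-of-lists container, not `DataFrame`.

```python
# Shared docstring template; each caller fills in its own "See also" entry.
_shared_docs = {}
_shared_docs['melt'] = """
Unpivot a frame from wide format to long format.

See also
--------
%(other_melt)s
"""


def appender(addendum):
    """Hypothetical stand-in for pandas' Appender: append text to __doc__."""
    def decorate(func):
        func.__doc__ = (func.__doc__ or '') + addendum
        return func
    return decorate


@appender(_shared_docs['melt'] % {'other_melt': 'Frame.melt'})
def melt(frame, id_vars=(), var_name='variable', value_name='value'):
    # Every column not used as an identifier is unpivoted.
    value_vars = [c for c in frame.columns if c not in id_vars]
    n_rows = len(next(iter(frame.columns.values()), []))
    records = []
    for var in value_vars:          # one block of rows per measured column
        for i in range(n_rows):
            row = {k: frame.columns[k][i] for k in id_vars}
            row[var_name] = var
            row[value_name] = frame.columns[var][i]
            records.append(row)
    return records


class Frame:
    """Toy column store (hypothetical), standing in for DataFrame."""

    def __init__(self, columns):
        self.columns = dict(columns)

    @appender(_shared_docs['melt'] % {'other_melt': 'melt'})
    def melt(self, id_vars=(), var_name='variable', value_name='value'):
        # Only forwards to the module-level function; ``melt`` here resolves
        # to the global function, not to this method.
        return melt(self, id_vars=id_vars, var_name=var_name,
                    value_name=value_name)


wide = Frame({'A': ['a', 'b'], 'B': [1, 3], 'C': [2, 4]})
print(wide.melt(id_vars=['A']))
```

Keeping the reshaping logic in one place means the two spellings cannot drift apart; only the `See also` line differs between their docstrings.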
96 changes: 6 additions & 90 deletions pandas/core/reshape.py
@@ -28,6 +28,11 @@
import pandas.core.algorithms as algos
from pandas._libs import algos as _algos, reshape as _reshape

from pandas.core.frame import _shared_docs
from pandas.util.decorators import Appender
_shared_docs_kwargs = dict(
versionadded_melt="", other_melt='DataFrame.melt')

from pandas.core.index import MultiIndex, _get_na_value


@@ -701,98 +706,9 @@ def _convert_level_number(level_num, columns):
return result


@Appender(_shared_docs['melt'] % _shared_docs_kwargs)
def melt(frame, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None):
"""
"Unpivots" a DataFrame from wide format to long format, optionally leaving
identifier variables set.

This function is useful to massage a DataFrame into a format where one
or more columns are identifier variables (`id_vars`), while all other
columns, considered measured variables (`value_vars`), are "unpivoted" to
the row axis, leaving just two non-identifier columns, 'variable' and
'value'.

Parameters
----------
frame : DataFrame
id_vars : tuple, list, or ndarray, optional
Column(s) to use as identifier variables.
value_vars : tuple, list, or ndarray, optional
Column(s) to unpivot. If not specified, uses all columns that
are not set as `id_vars`.
var_name : scalar
Name to use for the 'variable' column. If None it uses
``frame.columns.name`` or 'variable'.
value_name : scalar, default 'value'
Name to use for the 'value' column.
col_level : int or string, optional
If columns are a MultiIndex then use this level to melt.

See also
--------
pivot_table
DataFrame.pivot

Examples
--------
>>> import pandas as pd
>>> df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
...                    'B': {0: 1, 1: 3, 2: 5},
...                    'C': {0: 2, 1: 4, 2: 6}})
>>> df
   A  B  C
0  a  1  2
1  b  3  4
2  c  5  6

>>> pd.melt(df, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5
3  a        C      2
4  b        C      4
5  c        C      6

The names of 'variable' and 'value' columns can be customized:

>>> pd.melt(df, id_vars=['A'], value_vars=['B'],
...         var_name='myVarname', value_name='myValname')
   A myVarname  myValname
0  a         B          1
1  b         B          3
2  c         B          5

If you have multi-index columns:

>>> df.columns = [list('ABC'), list('DEF')]
>>> df
   A  B  C
   D  E  F
0  a  1  2
1  b  3  4
2  c  5  6

>>> pd.melt(df, col_level=0, id_vars=['A'], value_vars=['B'])
   A variable  value
0  a        B      1
1  b        B      3
2  c        B      5

>>> pd.melt(df, id_vars=[('A', 'D')], value_vars=[('B', 'E')])
  (A, D) variable_0 variable_1  value
0      a          B          E      1
1      b          B          E      3
2      c          B          E      5

"""
# TODO: what about the existing index?
if id_vars is not None:
if not is_list_like(id_vars):
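The shared `melt` docstring states that when `var_name` is None, the unpivoted column takes its name from `frame.columns.name`, falling back to 'variable'. A small check of that documented default, assuming a pandas version that includes `DataFrame.melt`; the frame is the same toy data as the docstring examples, trimmed to two rows:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'], 'B': [1, 3], 'C': [2, 4]})

# Without a name on the columns axis, the new column is called 'variable'.
print(df.melt(id_vars=['A']).columns.tolist())

# Naming the columns axis changes the default var_name.
df.columns.name = 'measure'
print(df.melt(id_vars=['A']).columns.tolist())
```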