DOC: Fix DataFrame.to_xarray doctests and allow the CI to run it. #22673
Changes from 3 commits
670e768
a7ecbb2
ce5098a
08561d2
93edaca
@@ -2494,80 +2494,91 @@ def to_xarray(self):

Returns
-------
a DataArray for a Series
a Dataset for a DataFrame
a DataArray for higher dims
xarray.DataArray or xarray.Dataset
Data in the pandas structure converted to Dataset if the object is
a DataFrame, or a DataArray if the object is a Series.

See Also
--------
DataFrame.to_hdf : Write DataFrame to an HDF5 file.
DataFrame.to_parquet : Write a DataFrame to the binary parquet format.

Examples
--------
>>> df = pd.DataFrame({'A' : [1, 1, 2],
'B' : ['foo', 'bar', 'foo'],
'C' : np.arange(4.,7)})
>>> df = pd.DataFrame([('falcon', 'bird', 389.0, 2),
... ('parrot', 'bird', 24.0, 2),
... ('lion', 'mammal', 80.5, 4),
... ('monkey', 'mammal', np.nan, 4)],
... columns=['name', 'class', 'max_speed',
... 'num_legs'],
... index=[0, 2, 3, 1])
>>> df
A B C
0 1 foo 4.0
1 1 bar 5.0
2 2 foo 6.0
name class max_speed num_legs
0 falcon bird 389.0 2
2 parrot bird 24.0 2
3 lion mammal 80.5 4
1 monkey mammal NaN 4

>>> df.to_xarray()
<xarray.Dataset>
Dimensions: (index: 3)
Dimensions: (index: 4)
Coordinates:
* index (index) int64 0 1 2
* index (index) int64 0 2 3 1
Data variables:
A (index) int64 1 1 2
B (index) object 'foo' 'bar' 'foo'
C (index) float64 4.0 5.0 6.0

>>> df = pd.DataFrame({'A' : [1, 1, 2],
'B' : ['foo', 'bar', 'foo'],
'C' : np.arange(4.,7)}
).set_index(['B','A'])
>>> df
C
B A
foo 1 4.0
bar 1 5.0
foo 2 6.0

>>> df.to_xarray()
name (index) object 'falcon' 'parrot' 'lion' 'monkey'
class (index) object 'bird' 'bird' 'mammal' 'mammal'
max_speed (index) float64 389.0 24.0 80.5 nan
num_legs (index) int64 2 2 4 4

>>> df_multiindex = df.set_index(['class', 'name'])
>>> df_multiindex
max_speed num_legs
class name
bird falcon 389.0 2
parrot 24.0 2
mammal lion 80.5 4
monkey NaN 4

>>> df_multiindex.to_xarray()
<xarray.Dataset>
Dimensions: (A: 2, B: 2)
Dimensions: (class: 2, name: 4)
Coordinates:
* B (B) object 'bar' 'foo'
* A (A) int64 1 2
* class (class) object 'bird' 'mammal'
* name (name) object 'falcon' 'lion' 'monkey' 'parrot'
Data variables:
C (B, A) float64 5.0 nan 4.0 6.0

>>> p = pd.Panel(np.arange(24).reshape(4,3,2),
items=list('ABCD'),
major_axis=pd.date_range('20130101', periods=3),
minor_axis=['first', 'second'])
>>> p
<class 'pandas.core.panel.Panel'>
Dimensions: 4 (items) x 3 (major_axis) x 2 (minor_axis)
Items axis: A to D
Major_axis axis: 2013-01-01 00:00:00 to 2013-01-03 00:00:00
Minor_axis axis: first to second

>>> p.to_xarray()
<xarray.DataArray (items: 4, major_axis: 3, minor_axis: 2)>
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
max_speed (class, name) float64 389.0 nan nan 24.0 nan 80.5 nan nan
num_legs (class, name) float64 2.0 nan nan 2.0 nan 4.0 4.0 nan

>>> index = pd.MultiIndex(levels=[[pd.to_datetime("2018-01-01"),
... pd.to_datetime("2015-05-23"), pd.to_datetime("2015-06-06"),
... pd.to_datetime("2011-02-13"), pd.to_datetime("2014-07-06")],
... ['one', 'two']],
... labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
... names=['first', 'second'])

>>> s = pd.Series(np.arange(8), index=index)
>>> s

I find this too complicated for what we need to show. To have a Series with a MultiIndex with a datetime level, we can have something like:

I haven't used xarray much myself, and I'm not sure what makes sense to show here. Maybe:

If that makes sense, I think with the first example, we can have:

@jreback does this make sense? Sorry for requesting the changes @Moisan, but I feel like the current version gives the idea that we're trying to show something more complex than what we are actually showing.

No problem, I'm happy to make the examples more relevant :).

first second
2018-01-01 one 0
two 1
2015-05-23 one 2
two 3
2015-06-06 one 4
two 5
2011-02-13 one 6
two 7
dtype: int64

>>> s.to_xarray()
<xarray.DataArray (first: 5, second: 2)>
array([[ 0., 1.],
[ 2., 3.],
[ 4., 5.],
[ 6., 7.],
[nan, nan]])
Coordinates:
* items (items) object 'A' 'B' 'C' 'D'
* major_axis (major_axis) datetime64[ns] 2013-01-01 2013-01-02 2013-01-03 # noqa
* minor_axis (minor_axis) object 'first' 'second'
* first (first) datetime64[ns] 2018-01-01 2015-05-23 2015-06-06 ...
* second (second) object 'one' 'two'

Notes
-----
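The MultiIndex examples in this diff rely on one behavior: to_xarray turns each index level into its own dimension and fills every level combination that has no row with NaN. The following is a minimal stdlib-only sketch of that reshaping, not xarray's implementation. It rebuilds the max_speed and num_legs grids of df_multiindex.to_xarray() from the example rows. densify is a hypothetical helper, float("nan") stands in for np.nan, and the level values are assumed to be the sorted values actually present in the index, so unused index levels (such as the fifth first value in the Series example, which shows up as an all-NaN row) are not modeled.

```python
import math

# Rows of the docstring example, keyed by (class, name) -> (max_speed, num_legs).
# float("nan") stands in for np.nan so the sketch only needs the standard library.
ROWS = {
    ("bird", "falcon"): (389.0, 2),
    ("bird", "parrot"): (24.0, 2),
    ("mammal", "lion"): (80.5, 4),
    ("mammal", "monkey"): (float("nan"), 4),
}


def densify(rows, column):
    """Hypothetical helper: lay one column out on the full (outer x inner) grid.

    Level values are the sorted values present in the keys; combinations with
    no row become NaN, so every cell is stored as a float.
    """
    outer = sorted({key[0] for key in rows})
    inner = sorted({key[1] for key in rows})
    grid = [
        [
            float(rows[(o, i)][column]) if (o, i) in rows else math.nan
            for i in inner
        ]
        for o in outer
    ]
    return outer, inner, grid


classes, names, max_speed = densify(ROWS, 0)
_, _, num_legs = densify(ROWS, 1)
print(classes)   # ['bird', 'mammal']
print(names)     # ['falcon', 'lion', 'monkey', 'parrot']
print(num_legs)  # [[2.0, nan, nan, 2.0], [nan, 4.0, 4.0, nan]]
```

Read row by row, the grids match the max_speed and num_legs lines of the expected Dataset output, and they show why num_legs changes from int64 to float64: NaN cannot be stored in an integer array.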

Any reason why we don't use the default index (so we don't specify it), or specify a sorted one? Maybe I'm missing the point, but it seems like this should have a meaning, and I couldn't see it in the rest of the example. If there is no reason (maybe you just copied it from an example where it was needed for something?), I'd just remove it, so we save some space and avoid distractions.

The indentation of 'num_legs' seems wrong; I think it should be aligned with 'name'. Once possible, we'll start validating PEP8 in the examples automatically, so if we can get this fixed already, that would be great.
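The goal of this PR is to let the CI execute the docstring examples, and the comment above anticipates validating them automatically. As a minimal sketch of what executing docstring examples involves, using only the standard library doctest module (this is not a description of pandas' actual CI configuration), run_examples below collects the >>> examples of one function and reports (failed, attempted). leg_count and run_examples are hypothetical names. The ELLIPSIS option flag lets ... in the expected output stand for any text, which is handy when a long repr is abbreviated in a docstring.

```python
import doctest


def leg_count(n_birds, n_mammals):
    """Hypothetical example function: total legs for a group of animals.

    >>> leg_count(2, 2)
    12
    >>> [leg_count(0, n) for n in range(10)]
    [0, 4, 8, ..., 36]
    """
    return 2 * n_birds + 4 * n_mammals


def run_examples(func, optionflags=doctest.ELLIPSIS):
    """Run the >>> examples in func's docstring and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    failed = attempted = 0
    # The function itself is the only global the examples need.
    for test in finder.find(func, globs={func.__name__: func}):
        result = runner.run(test)  # failure reports are printed to stdout
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


print(run_examples(leg_count))                 # (0, 2)
print(run_examples(leg_count, optionflags=0))  # (1, 2): "..." is matched literally
```

Because every >>> line is executed, the prompts have to be exact. The removed example in this diff that starts with >>> df = pd.DataFrame({'A' : [1, 1, 2], continues on lines without the ... prompt, so doctest would read those lines as expected output and try to run the incomplete first line on its own, which fails with a SyntaxError.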