pandas-illustrated


This repo contains code for a number of helper functions mentioned in the Pandas Illustrated guide.

Installation:

pip install pandas-illustrated

Contents

Basic operations:

  • find(s, x, pos=False)
  • findall(s, x, pos=False)
  • insert(dst, pos, value, label, axis=0, ignore_index = False, order=None, allow_duplicates=False, inplace=False)
  • append(dst, value, label = lib.no_default, axis=0, ignore_index = False, order=None, allow_duplicates: bool = False, inplace=False)
  • drop(obj, items=None, like=None, regex=None, axis=None)
  • move(obj, pos, label=None, column=None, index=None, axis=None, reset_index=False)
  • join(dfs, on=None, how="left", suffixes=None)

Visualization improvements:

  • patch_series_repr(footer=True)
  • unpatch_series_repr()
  • sidebyside(*dfs, names=[], index=True, valign="top")
  • sbs = sidebyside

MultiIndex helpers:

  • patch_mi_co()
  • from_dict(d)
  • from_kw(**kwargs)

Locking column order:

  • locked(obj, level=None, axis=None, categories=None, inplace=False)
  • lock = locked with inplace=True
  • vis_lock(obj, checkmark="✓")
  • vis_patch()
  • vis_unpatch()
  • from_product(iterables, sortorder=None, names=lib.no_default, lock=True)

MultiIndex manipulations:

  • get_level(obj, level_id, axis=None)
  • set_level(obj, level_id, labels, name=lib.no_default, axis=None, inplace=False)
  • move_level(obj, src, dst, axis=None, inplace=False, sort=False)
  • insert_level(obj, pos, labels, name=lib.no_default, axis=None, inplace=False, sort=False)
  • drop_level(obj, level_id, axis=None, inplace=False)
  • swap_levels(obj, i: Axis = -2, j: Axis = -1, axis: Axis = None, inplace=False, sort=False)
  • join_levels(obj, name=None, sep="_", axis=None, inplace=False)
  • split_level(obj, names=None, sep="_", axis=None, inplace=False)
  • rename_level(obj, mapping, level_id=None, axis=None, inplace=False)

Usage

find and findall

By default, find(series, value) looks for the first occurrence of the given value in a series and returns the corresponding index label.

>>> import pandas as pd
>>> import pdi

>>> s = pd.Series([4, 2, 4, 6], index=['cat', 'penguin', 'dog', 'butterfly'])

>>> pdi.find(s, 2)
'penguin' 

>>> pdi.find(s, 4)
'cat' 

If the value is not found, find() raises a ValueError.

findall(series, value) returns a (possibly empty) index of all matching occurrences:

>>> pdi.findall(s, 4)
Index(['cat', 'dog'], dtype='object')

With the pos=True keyword argument, find() and findall() return the positional index instead:

>>> pdi.find(s, 2, pos=True)
1 

>>> pdi.find(s, 4, pos=True)
0

There are a number of ways to find the index label for a given value. The most efficient of them are:

s.index[s.tolist().index(x)]       # faster for Series with less than 1000 elements
s.index[np.where(s == x)[0][0]]    # faster for Series with over 1000 elements

find() chooses the optimal implementation depending on the series size; findall() always uses the where implementation.
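The size-based dispatch described above can be sketched in plain pandas and NumPy. This is a minimal illustration, not the library's actual code: the names find_sketch, findall_sketch and SIZE_THRESHOLD are hypothetical, the cutoff of 1000 is taken from the comments above, and the return type of the pos=True branch of findall_sketch is an assumption.

```python
import numpy as np
import pandas as pd

# Cutoff quoted in the text above; the library's exact value is an assumption.
SIZE_THRESHOLD = 1000


def find_sketch(s, x, pos=False):
    """Return the label (or position, if pos=True) of the first x in s."""
    if len(s) < SIZE_THRESHOLD:
        # list.index raises ValueError on its own when x is absent
        i = s.tolist().index(x)
    else:
        hits = np.where(s == x)[0]
        if len(hits) == 0:
            raise ValueError(f"{x!r} is not in Series")
        i = int(hits[0])
    return i if pos else s.index[i]


def findall_sketch(s, x, pos=False):
    """Return all labels (or positions, if pos=True) where s equals x."""
    hits = np.where(s == x)[0]
    # Positions come back as a NumPy array here; the library may differ.
    return hits if pos else s.index[hits]


s = pd.Series([4, 2, 4, 6], index=['cat', 'penguin', 'dog', 'butterfly'])
first_label = find_sketch(s, 2)
all_labels = findall_sketch(s, 4)
```

Both branches of find_sketch raise a ValueError for a missing value, which matches the behaviour documented for find().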

Improving Series Representation

Run pdi.patch_series_repr() to make Series look better.

If you want to display several Series from one cell, call display(s) for each.

Displaying several Pandas objects side by side

To display several dataframes, series or indices side by side, run pdi.sidebyside(s1, s2, ...).
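For readers curious how a side-by-side layout can work in a notebook, here is a rough sketch built only on DataFrame.to_html(). It is not how pdi.sidebyside is implemented: the helper sidebyside_html is hypothetical, it handles only DataFrames, and it borrows the names and valign parameter names from the signature listed in Contents purely for illustration.

```python
import pandas as pd


def sidebyside_html(*dfs, names=(), valign="top"):
    """Build one HTML string that lays the given DataFrames out in a row."""
    cells = []
    for i, df in enumerate(dfs):
        # Optional caption above each table
        caption = f"<div><b>{names[i]}</b></div>" if i < len(names) else ""
        cells.append(
            f'<div style="display:inline-block; vertical-align:{valign}; '
            f'margin-right:1em">{caption}{df.to_html()}</div>'
        )
    return "".join(cells)


df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"b": [3, 4, 5]})
html = sidebyside_html(df1, df2, names=["first", "second"])
# In a Jupyter notebook the result could be rendered with:
#   from IPython.display import HTML, display
#   display(HTML(html))
```

Wrapping each table in an inline-block container keeps the tables on one row, and vertical-align controls how tables of different heights line up.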

Testing

Run pytest in the project root.